Last year we had a problem that showed up only after we started making the product in 1,000-piece runs: some builds of the system took a very long time to power up. We had built about 10 prototypes and tested the design over thousands of power-ups, and it tested just fine (thanks to POC-IT). Then the 1,000-piece run uncovered about a half-dozen units that had variable power-up times, ranging from a few seconds to more than an hour! Replacing the watchdog chip that controlled the RESET line to an ARM9 processor fixed the problem.
But why did these half dozen fail?
Many hours into the analysis, we discovered that the RESET line out of the watchdog chip on the failed units would pulse but then stay low for long periods of time. A shot of cold air instantly caused the chip to release RESET. Was it a faulty chip lot? Nope. On a closer read of the documentation, we found that you cannot have a pull-up resistor on the RESET line. For years we had always put pull-ups on RESET lines, and we'd missed that in the documentation.
Like it or not, we have to pore over the documentation for the chips and software library calls we use and digest its content carefully. We cannot rely on what is intuitive.
Finally, and this is much more necessary than in years past, we have to pore over the errata sheets, and we need to do it before we commit the design. A number of years ago, a customer designed a major new product line around an Atmel ARM9 that, on paper, could directly address up to 128 MB of NOR memory. The errata, however, said that due to a bug it could address only 16 MB. Ouch! Later we had problems with the I2C bus on the same chip: at times the bus would lock up, and nothing except a power cycle would unlock it. Enter the errata. Under some unspecified conditions, the I2C state machine can lock up. Ouch! In this case we were able to use a bit-bang algorithm rather than the built-in I2C peripheral, but obviously at a cost in money, schedule, and real time.—Bob Japenga, CC25, 2013
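The I2C workaround mentioned above, bit-banging the bus on GPIO pins, can be sketched in C as follows. This is a minimal illustration of the master side of a byte write, not the code from that project. The pin functions `scl_write`, `sda_write`, `sda_read`, and `half_bit_delay` are hypothetical stand-ins; here they drive a tiny simulated slave that ACKs every byte so the sketch can run on a PC. On a real board they would become open-drain GPIO accesses (write 0 to pull the line low, 1 to release it to the pull-up), and `half_bit_delay` would wait roughly 5 µs for 100-kHz operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin layer. On real hardware: open-drain GPIOs, where writing 1
 * releases the line to its pull-up and writing 0 pulls it low. Here the
 * functions drive a tiny simulated slave that ACKs every byte. */
static int scl = 1, sda_master = 1;   /* levels currently driven by the master */
static int slave_bits;                /* rising SCL edges since START/last ACK */
static uint8_t slave_shift;           /* byte being shifted into the slave */
static uint8_t slave_last_byte;       /* last complete byte the slave received */

static void scl_write(int level) {
    if (level && !scl) {                        /* rising edge: slave samples SDA */
        if (slave_bits < 8)
            slave_shift = (uint8_t)((slave_shift << 1) | sda_master);
        slave_bits++;
        if (slave_bits == 8)
            slave_last_byte = slave_shift;
    } else if (!level && scl && slave_bits == 9) {
        slave_bits = 0;                         /* falling edge ends the ACK clock */
    }
    scl = level;
}

static void sda_write(int level) {
    if (scl && sda_master && !level) {          /* SDA falls while SCL high: START */
        slave_bits = 0;
        slave_shift = 0;
    } else if (scl && !sda_master && level) {   /* SDA rises while SCL high: STOP */
        slave_bits = 0;
    }
    sda_master = level;
}

static int sda_read(void) {
    /* The slave holds SDA low from after the 8th data bit through the 9th clock. */
    bool acking = (slave_bits == 8 && !scl) || (slave_bits == 9 && scl);
    return sda_master && !acking;
}

static void half_bit_delay(void) { /* ~5 us at 100 kHz on real hardware */ }

/* --- Bit-banged master ------------------------------------------------ */
static void i2c_start(void) {
    sda_write(1); scl_write(1); half_bit_delay();
    sda_write(0); half_bit_delay();             /* START: SDA falls, SCL high */
    scl_write(0);
}

static void i2c_stop(void) {
    sda_write(0); scl_write(1); half_bit_delay();
    sda_write(1); half_bit_delay();             /* STOP: SDA rises, SCL high */
}

/* Shift one byte out MSB first; returns true if the slave ACKed it. */
static bool i2c_write_byte(uint8_t byte) {
    assert(scl == 0);                           /* SDA may only change while SCL is low */
    for (int i = 7; i >= 0; i--) {
        sda_write((byte >> i) & 1);
        half_bit_delay();
        scl_write(1); half_bit_delay();         /* slave samples while SCL is high */
        scl_write(0);
    }
    sda_write(1);                               /* release SDA for the ACK bit */
    half_bit_delay();
    scl_write(1); half_bit_delay();
    bool ack = (sda_read() == 0);               /* ACK = slave pulled SDA low */
    scl_write(0);
    return ack;
}
```

A typical write is `i2c_start(); i2c_write_byte(0xA0); i2c_write_byte(0x5C); i2c_stop();`. The sketch omits reads, repeated starts, and clock stretching; a production driver would also read SCL back after releasing it, in case a slave is holding it low.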
RF designers, as well as more and more digital-oriented designers, are used to thinking about impedance matching. But it is very easy to forget about it when you are designing a non-RF project. An unmatched circuit will cause power losses as well as nasty reflection phenomena. (Refer to my article, “TDR Experiments,” Circuit Cellar 225, 2009.)
Impedance matching must be handled at the schematic stage, for example, by adding provisional matching pads for every integrated antenna so that you can correct a slightly mismatched antenna (see Figure 1).
Figure 1: Impedance matching requirements must be anticipated. In particular, any embedded antenna will surely need manual matching for optimal performance. If you forget to include some area for a matching network like this one on your PCB, you won’t achieve the best performance.
Impedance matching is also a PCB design issue. As a rule of thumb, you can't avoid impedance-matched tracks when you are working with frequencies higher than the speed of light divided by 10 times the board size. For a typical 10-cm board, that translates to a cutoff frequency of 300 MHz (3 × 10^8 m/s ÷ (10 × 0.1 m)). A digital designer might then say: “Cool, my clock is only 100 MHz. No problem!” But a 100-MHz square-wave clock or digital signal has harmonics reaching into the gigahertz range, so it would be wise to show some concern.
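The rule of thumb above is simple enough to check in a few lines. This is a minimal C sketch, assuming (as the rule does) the free-space speed of light and an ideal square wave, which contains only odd harmonics. The function names are hypothetical.

```c
#include <assert.h>

#define SPEED_OF_LIGHT_M_PER_S 3.0e8   /* free-space value, as in the rule of thumb */

/* Above roughly c / (10 x board size), plan on impedance-matched tracks. */
static double matching_cutoff_hz(double board_size_m) {
    assert(board_size_m > 0.0);
    return SPEED_OF_LIGHT_M_PER_S / (10.0 * board_size_m);
}

/* An ideal square wave has only odd harmonics (f, 3f, 5f, ...). Return the
 * lowest harmonic number whose frequency is strictly above the cutoff. */
static int first_harmonic_above(double clock_hz, double cutoff_hz) {
    assert(clock_hz > 0.0);
    int n = 1;
    while (n * clock_hz <= cutoff_hz)
        n += 2;
    return n;
}
```

For the 10-cm board and 100-MHz clock in the text, `matching_cutoff_hz(0.10)` gives 300 MHz. The 3rd harmonic lands exactly on that cutoff, and `first_harmonic_above(100e6, 300e6)` returns 5, the 500-MHz harmonic, with every higher odd harmonic above it as well.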
The problem can also occur with very slow clocks when you're using fast devices. Do you want an example? Last year, one of my colleagues developed a complex system with plenty of large, fast FPGAs. These chips were programmed through a common JTAG link, and we ended up with nasty problems on the JTAG bus. We still had issues even after slowing the JTAG clock down to 100 kHz. So it couldn't have been an impedance-matching problem, right? Wrong. It was, simply because the FPGA drives JTAG with the same ultra-fast logic cells that handle its fast logic, with stratospheric slew rates that translated into very fast transitions on the JTAG lines. Those edges caused ringing because of improper impedance matching, which produced false transitions on the bus. The problem was easy to solve once we pinpointed it, but we lost some days in between.—Robert Lacoste, CC25, 2013
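The JTAG story shows that the edge rate, not the clock rate, sets the frequencies you have to worry about. One common approximation (added here, not from the original text) puts the significant bandwidth of a digital edge at about 0.35 divided by its 10% to 90% rise time. The C sketch below combines that approximation with the board-size rule of thumb given earlier; the function names and the 1-ns rise time in the usage note are illustrative assumptions, not measurements from that project.

```c
#include <assert.h>
#include <stdbool.h>

/* Common single-pole approximation: an edge with a 10%-90% rise time t_r
 * has significant spectral content up to roughly 0.35 / t_r. */
static double edge_bandwidth_hz(double rise_time_s) {
    assert(rise_time_s > 0.0);
    return 0.35 / rise_time_s;
}

/* Compare the edge bandwidth with the c / (10 x board size) rule of thumb.
 * Note that the clock frequency is not an input at all. */
static bool needs_matching(double rise_time_s, double board_size_m) {
    assert(board_size_m > 0.0);
    double cutoff_hz = 3.0e8 / (10.0 * board_size_m);
    return edge_bandwidth_hz(rise_time_s) > cutoff_hz;
}
```

With an assumed 1-ns edge on a 10-cm board, `edge_bandwidth_hz(1e-9)` is about 350 MHz, so `needs_matching(1e-9, 0.10)` returns true even if the JTAG clock runs at only 100 kHz.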
When I first started designing, I did not understand the need for scope posts as hardware test points. I could always tack on a wire or, with many through-hole parts, connect my scope right to the chip. But now test points are essential. My eyesight and steady hands are long gone, but it goes way beyond that. Many of the points you'd want to probe are buried under the chips, and those that are exposed are smaller than grains of sand. Provide yourself access to the critical points.
Thinking about where you'll want to probe the software can also be useful. Linux has done a great job of providing hundreds of “test points” for the OS. We should learn to do the same with our applications. Planning in advance the places you want to test is also a useful exercise for the whole development cycle because it forces you to think about testing early.—Bob Japenga, CC25, 2013
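One way to give an application its own planned test points is sketched below in C. It is loosely inspired by the OS tracepoint idea but is not a Linux API: the `TEST_POINT` macro, the IDs, and `tp_last` are hypothetical. When enabled, each test point records an ID and a value into a small ring buffer that a debugger or diagnostic command can read; when disabled, it costs one branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TP_RING_SIZE 16u   /* power of two, so indices wrap with a mask */

typedef struct { uint16_t id; int32_t value; } tp_record;

static tp_record tp_ring[TP_RING_SIZE];
static uint32_t tp_head;            /* total records written so far */
static volatile bool tp_enabled;    /* flip at run time, e.g. from a debug shell */

/* Record (id, value) when tracing is on; otherwise just one branch. */
#define TEST_POINT(id, value)                                     \
    do {                                                          \
        if (tp_enabled) {                                         \
            tp_ring[tp_head & (TP_RING_SIZE - 1u)] =              \
                (tp_record){ (uint16_t)(id), (int32_t)(value) };  \
            tp_head++;                                            \
        }                                                         \
    } while (0)

/* Hypothetical IDs for the places you planned to probe. */
enum { TP_ADC_SAMPLE = 1, TP_FILTER_OUT = 2 };

/* Find the most recent value recorded for a test point, newest first. */
static bool tp_last(uint16_t id, int32_t *value_out) {
    assert(value_out != NULL);
    uint32_t n = tp_head < TP_RING_SIZE ? tp_head : TP_RING_SIZE;
    for (uint32_t i = 1; i <= n; i++) {
        const tp_record *r = &tp_ring[(tp_head - i) & (TP_RING_SIZE - 1u)];
        if (r->id == id) {
            *value_out = r->value;
            return true;
        }
    }
    return false;
}
```

As written, the ring buffer is not safe to write from several threads or interrupts at once; a real version would need a lock or per-context buffers.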
Watchdog timers are essential to many complete electronic system designs. As Bob Japenga explains, following a few guidelines will help make your designs more effective.
Independent watchdog timers are no longer used only in fault-tolerant systems. We put them on systems because we know something can go wrong and keep the system from being fully functional. Sometimes the dogs reset the processor, and sometimes they just put the device in a safe state and notify a user. However they work, they are an essential part of any system design. Here are the main guidelines we use:
- Make it independent of the processor. The last thing you want is for the processor to lock up and the watchdog to lock up too.
- Do not tickle the watchdog during an interrupt. (The interrupt can keep going while a critical thread is locked up.)
- Do not tickle the watchdog until you are sure that all of your threads are running and stable.
- Provide a way for your debugger to use breakpoints without tripping the watchdog.
- If the watchdog can be disabled, provide a mechanism for this to be detected.
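The first three guidelines can be combined into a task check-in scheme, sketched here in C under stated assumptions. `hw_watchdog_kick`, the task IDs, and the `system_stable` flag are hypothetical; on real hardware the kick would toggle the external watchdog's input pin. Each task sets its own bit from its main loop, and a low-priority supervisor (not an interrupt) feeds the watchdog only when every bit is set, so a single hung task lets the watchdog expire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook: on real hardware, toggle the external watchdog's input.
 * Here it just counts kicks so the logic can be exercised. */
static unsigned kick_count;
static void hw_watchdog_kick(void) { kick_count++; }

enum { TASK_COMMS, TASK_CONTROL, TASK_LOGGER, TASK_COUNT };
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

static volatile uint32_t checkin_flags;  /* one bit per task */
static bool system_stable;               /* set once start-up has completed */

/* Called by each task from its own loop, never from an ISR. On an RTOS this
 * read-modify-write needs a critical section or an atomic OR. */
static void task_checkin(unsigned task_id) {
    assert(task_id < TASK_COUNT);
    checkin_flags |= 1u << task_id;
}

/* Run periodically from a low-priority supervisor task. Kicks the watchdog
 * only if start-up is done and every task has checked in since last time. */
static bool watchdog_supervise(void) {
    if (!system_stable)
        return false;                     /* don't feed until threads are stable */
    if ((checkin_flags & ALL_TASKS_MASK) != ALL_TASKS_MASK)
        return false;                     /* some task missed its check-in */
    checkin_flags = 0;                    /* demand fresh check-ins next period */
    hw_watchdog_kick();
    return true;
}
```

This sketch assumes the watchdog's start-up timeout is long enough to cover initialization, since nothing feeds it until `system_stable` is set. On an RTOS, the test-and-clear in `watchdog_supervise` should also be done in a critical section; otherwise a check-in that lands between the test and the clear is lost and could cause a spurious reset.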
I provide many more guidelines for watchdog design in a white paper that’s posted on our website.—Bob Japenga, CC25, 2013
Electrical engineers often develop “headless” electronic systems, that is, systems without user interfaces. Many of those systems are embedded within products and are generally out of reach when problems occur. Bob Japenga offers some advice about logging and how it can help you troubleshoot problems as they occur.
Many of our designs are buried in some product or located in remote areas. No one is there when the device hiccoughs. Well-defined logging can help make your system more robust: there are going to be problems, and you might as well find out as much as you can about them when they happen. Here are some general guidelines we apply:
• Use an existing logging facility if you can. It should have all of the features discussed here.
• Unless you truly have unlimited disk space, provide a self-pruning cap on all logs. The Linux syslog feature has this built in.
• Attempt to provide the most information in the least space. One way we do this is by limiting the number of times the same error can be logged. I cannot tell you how many times I have looked at a log file and found the same error logged over and over again. Based on your memory limits and the frequency of the error, after a set number of identical entries, start logging only one in every 100, or only allow so many per hour, and include the total count. Some failures are best kept in error counters. For example, communications errors in a noisy environment should be logged periodically with a counter; you don't usually need to know about every occurrence.
• Create separate logs for separate areas. For example, network errors and communications errors are better kept in their own logs, apart from processing errors. This helps keep a single error from flooding all the logs.
• Timestamp all logs, ideally with date and time, though I understand that not all of our systems have date and time. At a minimum, use milliseconds since power-up.
• Establish levels of logging. Some logging is only applicable during debugging. Build that into your logging.
• Avoid side effects. I hate it when the designer tells me that if he turns logging on, the system will come to its knees. That’s just a bad logging design.
• Make the logs easy to automatically parse.—Bob Japenga, CC25, 2013
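Several of these guidelines (rate limiting with a total count, levels, timestamps, and an easily parsed format) fit in one small C function. This is a minimal sketch, not a specific library's API: `log_event`, the thresholds, and the fake millisecond clock are hypothetical. It writes each accepted entry as a single key=value line into a caller-supplied buffer; on a device, that line would go to syslog or to a capped ring buffer that provides the self-pruning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { LOG_DEBUG, LOG_INFO, LOG_WARN, LOG_ERROR } log_level;
static log_level log_threshold = LOG_INFO;   /* raise to LOG_DEBUG while debugging */

#define LOG_BURST  5u     /* log the first 5 occurrences of an error in full */
#define LOG_SAMPLE 100u   /* after that, log only one in every 100 */
#define MAX_IDS    32u

static uint32_t occurrences[MAX_IDS];        /* running total per error id */

/* Stand-in time base: milliseconds since power-up (a 32-bit count wraps
 * after about 49.7 days). */
static uint32_t fake_ms;
static uint32_t millis_since_boot(void) { return fake_ms; }

/* Decide whether the n-th occurrence (1-based) should be written. */
static bool should_emit(uint32_t n) {
    return n <= LOG_BURST || (n - LOG_BURST) % LOG_SAMPLE == 0;
}

/* Format one key=value line into buf; returns true if it should be stored. */
static bool log_event(char *buf, size_t len, log_level lvl,
                      unsigned id, const char *msg) {
    assert(id < MAX_IDS);
    if (lvl < log_threshold)
        return false;                        /* filtered by level */
    uint32_t n = ++occurrences[id];
    if (!should_emit(n))
        return false;                        /* counted but suppressed */
    static const char *names[] = { "DEBUG", "INFO", "WARN", "ERROR" };
    snprintf(buf, len, "t_ms=%lu level=%s id=%u count=%lu msg=\"%s\"",
             (unsigned long)millis_since_boot(), names[lvl], id,
             (unsigned long)n, msg);
    return true;
}
```

With these settings, 300 identical warnings produce 7 lines: the first 5, then occurrences 105 and 205, each carrying the running count. Because every line has the same key=value shape, a tool can parse it with a single `sscanf` pattern or a short script.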