When it comes to software development, quality is king. George Novacek provides some tips for developing high-quality software for embedded controllers.
Quality software should be the goal of every engineer. As the old engineering saying goes: “Quality must be built in; it can’t be tested in.” In this article I’ll suggest a few ways to develop quality software for embedded controllers.
A few generally accepted truths permeate software development. For example, defective or misunderstood requirements are considered to be the source of at least 56% of all faults. Concurrent engineering, which often forces software developers to start design before the requirements have been finalized, therefore achieves the exact opposite of what it is intended to do, and the budget and the schedule are wrecked. The productivity of individual programmers has been observed to differ by as much as 10:1, which makes scheduling and cost estimating, not to mention “software reliability prediction,” tantamount to staring into a crystal ball. Fixing a bug that has escaped into the field can cost as much as 1,000 times more than it would have cost to catch or avoid it during development.
I intentionally put “software reliability” in quotation marks because the term is misleading. It is closer to reality to speak of software quality, reflected by the potential number of latent faults (bugs), just as we talk about defects in an assembled product. The product’s reliability is a totally different issue.
Figure 1 is the ubiquitous hardware reliability bathtub curve I described in several of my previous articles. Briefly, there is an initial period of failures caused by component infant mortality. Quality manufacturers strive to get past this period with a screening process before releasing the product to the customer. A period of steady, statistically determined failure rate caused by stress follows. At the end of the life cycle, the failure rate increases due to fatigue and wear-out.
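To make the shape of the curve concrete, the C sketch below adds three hazard terms, a common way to model independent failure mechanisms: a decreasing infant-mortality term (the hazard of a Lomax distribution, α/(λ + t)), a constant stress-related rate, and an increasing wear-out term (a Weibull hazard with shape 2, which reduces to 2t/η²). The model choice, the names bathtub_model, bathtub_hazard() and screening_hours(), and every parameter value are assumptions for illustration only; real figures come from failure statistics. screening_hours() estimates how long units would have to be stressed before the infant-mortality term falls to a chosen fraction of the steady rate.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bathtub model; all parameter values are hypothetical.
 * Hazard rates are in failures per hour, times in hours. */
typedef struct {
    double early_shape;  /* alpha of the decreasing (Lomax) term          */
    double early_scale;  /* lambda of the Lomax term, hours               */
    double steady_rate;  /* constant stress-related failure rate, 1/hour  */
    double wear_scale;   /* eta of the shape-2 Weibull wear-out term, h   */
} bathtub_model;

/* Infant mortality: alpha / (lambda + t), decreasing with time. */
static double early_hazard(const bathtub_model *m, double t)
{
    return m->early_shape / (m->early_scale + t);
}

/* Wear-out: Weibull hazard with shape 2, which reduces to 2t / eta^2. */
static double wear_hazard(const bathtub_model *m, double t)
{
    return 2.0 * t / (m->wear_scale * m->wear_scale);
}

/* Independent failure mechanisms: the total hazard is the sum of the terms. */
double bathtub_hazard(const bathtub_model *m, double t)
{
    assert(m != NULL && t >= 0.0);
    return early_hazard(m, t) + m->steady_rate + wear_hazard(m, t);
}

/* Hours of screening after which the infant-mortality term has fallen to
 * `fraction` of the steady rate: solve alpha / (lambda + t) = fraction * rate. */
double screening_hours(const bathtub_model *m, double fraction)
{
    assert(m != NULL && fraction > 0.0 && m->steady_rate > 0.0);
    double t = m->early_shape / (fraction * m->steady_rate) - m->early_scale;
    return t > 0.0 ? t : 0.0;  /* already below the target at t = 0 */
}
```

With the illustrative values { 0.001, 10.0, 1e-5, 1e5 }, screening_hours(&m, 0.5) comes to about 190 hours, while the total hazard bottoms out a little above 2,200 hours and then climbs as the wear-out term takes over. Software has no equivalent of any of these terms.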
Software is different. There are no physical factors at play; software does not wear out. Faults are related to human factors only: defects caused by workmanship are embedded in the code. That is the fundamental difference between hardware and software. The hardware reliability curve (see Figure 1), being based on failure statistics, can be used to determine how long the hardware must be exposed to factory-induced stress to get past infant mortality. It can also be used to determine the field failure rate and shelf life. None of this can be done with any reasonable accuracy for software. You may take a stab at it based on historical data, but the results are debatable.
Whenever a software defect becomes patent, it needs to be fixed. Fixing it often introduces new defects or unmasks already existing ones. An often inevitable drop in quality, reflected by an increased number of latent faults, therefore follows and requires further testing. This is illustrated by Figure 2. The graph cannot be scaled with any reasonable accuracy, and some latent faults may never become patent throughout the life of the product.
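The C sketch below is a toy expected-value model of that effect, not a prediction, since, as noted above, the graph cannot be scaled with any accuracy. It assumes (hypothetically) that in each test cycle a fraction `detect` of the latent faults become patent and are fixed, and that each fix introduces or unmasks `injected` new faults on average. The function name latent_after_cycles() and both parameters are illustrative assumptions.

```c
#include <assert.h>

/* Toy expected-value model of latent faults over test-and-fix cycles.
 * Hypothetical parameters, not derived from any real project:
 *   detect   - fraction of latent faults that become patent in one cycle
 *   injected - average number of new faults introduced or unmasked per fix
 */
double latent_after_cycles(double initial, double detect, double injected,
                           unsigned cycles)
{
    assert(initial >= 0.0);
    assert(detect >= 0.0 && detect <= 1.0);
    assert(injected >= 0.0);

    double latent = initial;
    for (unsigned k = 0; k < cycles; ++k) {
        double fixed = detect * latent;      /* faults found and fixed this cycle */
        latent += injected * fixed - fixed;  /* remove them, add the side effects */
    }
    return latent;
}
```

Each cycle multiplies the latent-fault count by 1 − detect × (1 − injected), so the count falls toward zero only while fixes introduce fewer than one new fault each; with injected = 1.0, testing makes no net progress at all.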
Figure 2 represents software issues, not failures caused by hardware. During development testing, the initial number of bugs can be quite high, but it diminishes as integration and verification and validation (V&V) progress. At some point, the V&V results should provide sufficient confidence for the product to be released. Where that point lies depends on the application; it differs substantially between commercial and safety-critical applications.
Sometime after the software release, a failure may occur. If you’re unlucky, as I once was, it might happen on the very first day, on the first product delivered. This happened with Level A dual-redundant software after the customer had signed off on the V&V. Embarrassing as the failure was, its cause was traced to a bug in the assembler. The unique conditions under which the bug became patent had never occurred during testing. The assembler vendor fixed the bug, but the entire costly V&V had to be repeated because of the new assembler version.
Introducing new faults while fixing the discovered ones is not unusual. A thorough, costly V&V must usually follow, unless the modification can be classified as minor. An updated assembler or compiler version, or a major software change, requires full V&V. With time, the number of latent faults should decrease until, eventually, it approaches zero.
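The rule in the paragraph above can be written down as a small policy function, for example as part of a release checklist tool. The change categories, the names required_vv() and required_vv_for_release(), and the meaning of VV_REDUCED (whatever lesser verification a project’s own process permits for a change classified as minor) are hypothetical; what counts as minor is defined by the project and its applicable standard, not by this code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical change categories; a real project would define these in its
 * configuration-management plan. */
typedef enum {
    CHANGE_MINOR,          /* classified as minor under the project's rules */
    CHANGE_MAJOR_SOFTWARE, /* major software change                         */
    CHANGE_TOOLCHAIN       /* updated assembler or compiler version          */
} change_kind;

typedef enum {
    VV_REDUCED, /* lesser verification permitted for minor changes */
    VV_FULL     /* repeat the complete V&V                          */
} vv_scope;

vv_scope required_vv(change_kind kind)
{
    switch (kind) {
    case CHANGE_MINOR:
        return VV_REDUCED;
    case CHANGE_MAJOR_SOFTWARE:
    case CHANGE_TOOLCHAIN:
        return VV_FULL;
    }
    return VV_FULL; /* unknown category: err on the side of full V&V */
}

/* A release needs full V&V if any single change in it does. */
vv_scope required_vv_for_release(const change_kind *changes, size_t count)
{
    assert(changes != NULL || count == 0u);
    for (size_t i = 0; i < count; ++i) {
        if (required_vv(changes[i]) == VV_FULL)
            return VV_FULL;
    }
    return VV_REDUCED;
}
```

A release that bundles a minor fix with a new assembler or compiler version still triggers full V&V, which is exactly the situation in the assembler incident described above.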
Some authors end the curve in Figure 2 with an increase in failure rate similar to the wear-out effects in Figure 1 and call it obsolescence. I can’t see obsolescence having any relation to software quality; it introduces no faults into the software. It is related to hardware wear-out requiring repair, hardware obsolescence requiring redesign, or a decrease in market demand. In this sense, it can only be considered a life-cycle factor for embedded controllers.
SOFTWARE DESIGN STANDARDS
As I noted previously, preventing defects from happening in the first place is a lot less costly than fixing them later. To that end, adhering to good engineering practices will take you a long way toward generating quality, robust software. The first step is simple: write Software Design Standards and then stick to them. Even if you are just a one-man show or a hobbyist, develop your own coding style and be consistent. Use whichever method works best for you.
You can develop your own style for headers, placement of curly braces, indentation, and so forth. You want to make sure your code is easy to read and well commented. There is nothing more frustrating than trying to modify your own code a year later and struggling to understand what you did, how you did it, and why.
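As one possible illustration, not a prescribed standard, here is a function written to a hypothetical house style: a fixed comment header stating purpose, inputs, and return value, the opening brace of each function on its own line, and four-space indentation. The function adc_to_millivolts() and its assumption of an ideal 12-bit converter (V = raw × Vref / 4096) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*--------------------------------------------------------------------------
 * Function:  adc_to_millivolts
 * Purpose:   Convert a raw 12-bit ADC reading to millivolts.
 * Inputs:    raw     - ADC count, 0..4095
 *            vref_mv - ADC reference voltage, millivolts
 * Returns:   Input voltage in millivolts, truncated toward zero.
 * Notes:     Assumes the ideal transfer function V = raw * Vref / 4096.
 *            The product fits in 32 bits for any realistic reference.
 *------------------------------------------------------------------------*/
uint32_t adc_to_millivolts(uint16_t raw, uint32_t vref_mv)
{
    assert(raw <= 4095u);

    return ((uint32_t)raw * vref_mv) / 4096u;
}
```

With a 3,300 mV reference, adc_to_millivolts(2048, 3300) returns 1650.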
After initialization, most of my programs, even very small or simple ones, run a main() loop that does practically nothing but call functions. Everything else, including variable declarations, defines, function prototypes, and the functions themselves, I place in individual header files. Most of those header files contain just a single function. The resulting code can be easily traced. Every function is clear and often separately testable. Should a change be required, the affected functions can be quickly identified and isolated.
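A minimal sketch of that structure follows. The task names, the simulated registers sim_sensor and sim_actuator, and the placeholder computation are hypothetical, and the tasks are kept in one file only so the example is self-contained; in the layout described above, each would sit in its own header file. On the target, main() would initialize once and then loop forever; the cycle limit in run_superloop() exists only so the loop can be exercised on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for hardware registers so the sketch runs on a PC. */
uint16_t sim_sensor = 0u;    /* pretend 12-bit ADC result */
uint16_t sim_actuator = 0u;  /* pretend output register   */
uint32_t loop_count = 0u;

static uint16_t input_value;
static uint16_t output_value;

/* Each task would normally live in its own header file. */
static void init_system(void)
{
    input_value = 0u;
    output_value = 0u;
    loop_count = 0u;
}

static void read_inputs(void)
{
    assert(sim_sensor <= 4095u);  /* 12-bit ADC range */
    input_value = sim_sensor;
}

static void compute_outputs(void)
{
    output_value = (uint16_t)(input_value / 2u);  /* placeholder control law */
}

static void write_outputs(void)
{
    sim_actuator = output_value;
}

/* On the target: init once, then loop forever. The cycle limit exists only
 * so the loop can be exercised on a PC. */
void run_superloop(uint32_t cycles)
{
    init_system();
    while (loop_count < cycles) {
        read_inputs();
        compute_outputs();
        write_outputs();
        loop_count++;
    }
}
```

Because the loop body is nothing but calls, replacing compute_outputs() or adding a task touches one function and one line of the loop.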
At the risk of repeating myself, it is absolutely essential to have the high-level requirements complete, well defined, and understood before hardware/software partitioning can begin in earnest and the system architecture can be developed. This remains a continuing nightmare due to the misapplication of the presumably progressive methods of concurrent engineering, which are believed to accelerate the schedule and reduce development cost while achieving the exact opposite. Concurrent engineering also violates the rules for hardware and software development under standards (e.g., DO-178, DO-254, ISO 26262, or IEC 61508), potentially leading to errors and loss of configuration control. Yet it continues to be a common practice.
When beginning the development of functions, the engineer should always think about how those functions could be tested, preferably independently of the rest of the software. It is advantageous to test each function separately, for example by simulating it on a PC, before integration. Strong data typing should be the rule. In some cases, when hardware cost is not a major issue, some data typing can be performed externally by hardware to improve robustness. This is often the preferred method of providing the level of safety required by critical systems. It also often pays to include indicators, such as LEDs, that show the successful execution of their corresponding functions. It may sound primitive, but it greatly speeds up troubleshooting, especially when a field problem cannot be repeated in the lab and one has to rely on the user’s report.
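The C sketch below combines three of the ideas above under stated assumptions. Single-member structs give millivolts and temperature distinct types, so the compiler rejects a bare integer or the wrong unit; the function has no hardware dependencies, so it can be tested on a PC before integration; and a status bit, which on real hardware might drive an LED, records whether the last conversion succeeded. The sensor characteristics (500 mV at 0 °C, 10 mV/°C, valid from 100 to 1750 mV), the function and type names, and the status_leds variable are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Single-member structs make units distinct types: passing a bare int, or a
 * temperature where millivolts are expected, is a compile-time error. */
typedef struct { int32_t value; } millivolts_t;
typedef struct { int32_t value; } decicelsius_t;  /* tenths of a degree C */

/* Hypothetical status byte; on the target each bit might drive an LED. */
uint8_t status_leds = 0u;
#define LED_TEMP_OK ((uint8_t)(1u << 0))

/* Hypothetical sensor: 500 mV at 0 degC, 10 mV per degC, valid 100..1750 mV.
 * At 10 mV per degC, each millivolt above the offset is 0.1 degC. */
bool mv_to_decicelsius(millivolts_t in, decicelsius_t *out)
{
    assert(out != NULL);  /* programming error, caught in debug builds */

    if (in.value < 100 || in.value > 1750) {
        status_leds &= (uint8_t)~LED_TEMP_OK;  /* indicator off: bad reading */
        return false;
    }
    out->value = in.value - 500;
    status_leds |= LED_TEMP_OK;                /* indicator on: success */
    return true;
}
```

Calling mv_to_decicelsius(750, &t) with a bare integer will not compile; the caller must write millivolts_t mv = { 750 }; and pass mv, which makes the unit explicit at every call site. For that input, t.value becomes 250, that is, 25.0 °C.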
I cannot stress enough that quality must be built in. Don’t rely on test engineers to fix your work. It costs money to produce quality, but remember that fixing an escaped defect can cost as much as 1,000 times more. There are ways to accelerate testing; for example, running ten prototypes concurrently for 48 hours yields 480 hours of test time. The designer should never perform the acceptance testing or V&V of his own software. A dedicated test engineer’s job is to break the product. Designers have too much parental feeling for their creations and unwittingly avoid making them fail.
Before release, it is never a bad idea to have the product handled by somebody who knows little about it, or by the future customer. I find that people can misuse or abuse products beyond their designers’ wildest imagination. In fact, too many products on the market suggest that not even their designers have tried to use them. Don’t join their company.
Thanks to the myriad modern tools now available, software quality has greatly improved. It will hardly ever be 100%, but it is continuously improving. I see the next major software challenge as making software security bulletproof. The Internet of Things (IoT) will need to be secure against hackers to achieve major commercial success, and that is not an easy task.
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • APRIL 2016 #309
George Novacek was a retired president of an aerospace company. He was a professional engineer with degrees in Automation and Cybernetics. George’s dissertation project was a design of a portable ECG (electrocardiograph) with wireless interface. George has contributed articles to Circuit Cellar since 1999, penning over 120 articles over the years. George passed away in January 2019. But we are grateful to be able to share with you several articles he left with us to be published.