Bugs that Bug Us
This month, Bob continues his series on debugging embedded real-time systems. In this article, we will look at bugs that are neither bugs of scale nor bugs of time, but that bother us all the same.
In a previous article on IoT security [1], I related a security bug that was found in one of our designs. Just recently I found a website called “The IoT Hall of Shame.” It was with some trepidation that I searched their list for our system. Phew! It wasn’t there—but perhaps could have been. Bugs are our enemy and can destroy a ton of good work.
We started this series by looking at the Chinese military general Sun-tzu’s tactics for war. In particular, Sun-tzu’s first principle of successful warfare was to know your enemy. As embedded systems designers, we too need to know the kinds of bugs that plague us, both to avoid them and to find them. In the previous two articles, we looked at two types: bugs of scale and bugs of time. Bugs of scale are where our code fails because of too much of something: too many inputs, too much data, too much noise, and so forth. In other words, these bugs happen when the system experiences a load for which it wasn’t designed. Bugs of time are where our code fails after running for a long time. This month we will look at a catch-all category that I’ll call bugs that bug us.
Pointers
Pointer bugs are so pernicious that some languages (like Java) eliminate pointers or at least de-fang them. Pointer bugs, your name is legion. It is hard to know even where to begin. And it is also hard to describe the kinds of problems that wrong handling of pointers can cause. Generally, I start by looking at the pointers in the code when I have segmentation faults or memory corruption. But pointer bugs are difficult to find (Figure 1).

FIGURE 1
English Pointer chasing a bug
Uninitialized pointers: The following code fragment shows a simple uninitialized pointer:
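int *aPtr;   // never initialized: holds an indeterminate address
*aPtr = 12;  // undefined behavior: writes to wherever aPtr happens to point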
This is simple to illustrate but where we get into trouble is with the complex use of pointers (for example, an array of pointers to pointers), which are loaded dynamically. The index might be off by one and we fail to initialize the last pointer.
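One defensive habit for dynamically loaded pointer tables can be sketched as follows. This is a minimal illustration, not code from the designs described here: SLOT_COUNT, load_slots, and store_to_slot are hypothetical names. Every slot is set to NULL before loading, so a slot missed by an off-by-one index is caught by a check instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 4   /* hypothetical table size, for illustration only */

/* Point the first `count` slots at consecutive elements of `values` and
 * leave every remaining slot NULL. Returns the number of slots filled. */
size_t load_slots(int *slots[SLOT_COUNT], int *values, size_t count)
{
    assert(slots != NULL && values != NULL);

    for (size_t i = 0; i < SLOT_COUNT; i++) {
        slots[i] = NULL;            /* known state before any loading */
    }
    if (count > SLOT_COUNT) {
        count = SLOT_COUNT;
    }
    for (size_t i = 0; i < count; i++) {
        slots[i] = &values[i];
    }
    return count;
}

/* Write through a slot only if it was actually loaded.
 * Returns 0 on success, -1 if the slot is out of range or unset. */
int store_to_slot(int *slots[SLOT_COUNT], size_t index, int value)
{
    assert(slots != NULL);

    if (index >= SLOT_COUNT || slots[index] == NULL) {
        return -1;
    }
    *slots[index] = value;
    return 0;
}
```

With three values loaded, store_to_slot on index 3 returns -1 rather than writing through a garbage address.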
Thinking that C arrays are pointers: I hate to admit how often I botched this. I still find many inexperienced C programmers who do not know the difference between:
char str[] = "foo2_literal";
and
const char *str = "foo2_literal";
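You cannot always use these two ‘str’s interchangeably. Sometimes you add a cast to quiet a compiler complaint, and the next thing you know you have a bug. Simply stated, the first example has no pointer: str is an array holding a modifiable copy of the string, and its name just stands for the address where the string starts. The second is a pointer variable that holds the address of a string literal, which must not be modified.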
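The difference shows up as soon as you ask for the size. The sketch below reuses the two declarations above inside hypothetical helper functions (array_bytes, pointer_bytes, and edit_array are names invented for this illustration).

```c
#include <assert.h>
#include <stddef.h>

/* Array: the object itself is 12 characters plus the terminating '\0'. */
size_t array_bytes(void)
{
    char str[] = "foo2_literal";
    return sizeof str;              /* size of the whole array */
}

/* Pointer: the object is a single address, whatever it points at. */
size_t pointer_bytes(void)
{
    const char *str = "foo2_literal";
    return sizeof str;              /* size of one pointer, not the text */
}

/* The array is a private, writable copy, so editing it is legal.
 * Doing the same through a pointer to a literal is undefined behavior. */
char edit_array(void)
{
    char str[] = "foo2_literal";
    str[0] = 'F';
    return str[0];
}
```

Code that computes a buffer length with sizeof works for the array and silently yields the pointer size (typically 4 or 8) for the pointer.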
Star (*) crossed pointers: C does limit what you can do with a pointer, but it is easy to increment the address when you meant to increment the variable. aPtr++ increments the address, while (*aPtr)++ increments the variable. Precedence matters here: *aPtr++ parses as *(aPtr++), so you get the data pointed to and then the pointer, not the data, is incremented. Leaving out the * or the parentheses is an easy mistake to make.
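The two forms can be compared side by side in a short sketch. The function names sum_and_advance and bump_value are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* *p++ parses as *(p++): read the element, then advance the pointer.
 * Sums `count` elements without changing any of them. */
int sum_and_advance(const int *p, size_t count)
{
    assert(p != NULL);

    int total = 0;
    while (count-- > 0) {
        total += *p++;
    }
    return total;
}

/* (*p)++ increments the pointed-to variable; the pointer stays put. */
void bump_value(int *p)
{
    assert(p != NULL);
    (*p)++;
}
```

Given an array {1, 2, 3}, sum_and_advance leaves the data untouched, while bump_value on the same array changes the first element to 2.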
Pointers to data that is deleted: This is easy to do if you are manipulating a lot of heap data. A data block is allocated on the heap and a pointer is returned to the first location of that data block. Imagine that you create another pointer to point to that data. Perhaps you are both filling and emptying the buffer at the “same” time, so you keep track of the fill point and the emptying point. Later that data is deleted and the extra pointer is now invalid.
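One way to reduce the risk is to keep every pointer into a heap block in one place and clear them all when the block is freed. The following is a sketch under that assumption; struct byte_queue, queue_init, and queue_release are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer with several pointers into one heap block. */
struct byte_queue {
    unsigned char *base;    /* start of the heap block */
    unsigned char *fill;    /* next byte to write */
    unsigned char *drain;   /* next byte to read */
    size_t capacity;
};

/* Allocate the block and aim all pointers at it. Returns 0 or -1. */
int queue_init(struct byte_queue *q, size_t capacity)
{
    assert(q != NULL);

    q->base = malloc(capacity);
    if (q->base == NULL) {
        q->fill = NULL;
        q->drain = NULL;
        q->capacity = 0;
        return -1;
    }
    q->fill = q->base;
    q->drain = q->base;
    q->capacity = capacity;
    return 0;
}

/* Free the block and invalidate every alias in the same step, so a
 * later use shows up as a NULL pointer instead of silent corruption. */
void queue_release(struct byte_queue *q)
{
    assert(q != NULL);

    free(q->base);
    q->base = NULL;
    q->fill = NULL;
    q->drain = NULL;
    q->capacity = 0;
}
```

This only protects pointers stored in the structure. A copy of fill or drain held elsewhere is still dangling after queue_release, which is why such copies deserve a close look in code review.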
4+1 = 8 errors: Adding 1 to a pointer advances the address by the size of the object it points to, so incrementing a pointer to a 64-bit object moves the address by 8. This and other math errors can create bugs when manipulating pointers. For example, you might step through a table of addresses by hand in code for a 32-bit processor and then port it to a 64-bit processor. The step is 4 bytes on the 32-bit processor and 8 on the other. Use sizeof to normalize addresses. Use it often. Table 1 highlights arithmetic that is not allowed with pointers.

TABLE 1
Pointer arithmetic errors
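The scaling rule can be checked directly. This sketch assumes 8-bit bytes and uses hypothetical function names (step_bytes_int32, step_bytes_int64, entry_offset) to show why sizeof is safer than a hard-coded 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance covered by advancing an int32_t pointer by one element. */
size_t step_bytes_int32(void)
{
    int32_t pair[2] = {0, 0};
    int32_t *p = pair;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* The same "+ 1" on an int64_t pointer covers twice as many bytes. */
size_t step_bytes_int64(void)
{
    int64_t pair[2] = {0, 0};
    int64_t *p = pair;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Byte offset of entry `index` in a table of pointers. Using sizeof keeps
 * the result right on 32-bit and 64-bit targets; a literal 4 would not. */
size_t entry_offset(size_t index)
{
    return index * sizeof(void *);
}
```

On a 64-bit target entry_offset(3) is 24, while code written as index * 4 would still return 12 and land in the middle of an entry.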
Returning a pointer to a local variable:
None of you would ever do this:
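char *bad1(void)
{
    char buffer[] = "test_123"; // local array on bad1's stack frame
    return buffer;              // BUG: dangling once bad1 returns
}
int *bad2(void)
{
    int t[3] = {1,2,3};         // local array, same problem
    return t;                   // BUG: address of storage that no longer exists
}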
Thankfully, good compilers issue a warning for this, but sometimes we have so many warnings that we miss this one.
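Two common ways to avoid the problem are sketched below: let the caller supply the storage, or return a pointer to static storage that outlives the call. The names good1 and good2 are hypothetical counterparts to bad1 and bad2 above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The caller owns the storage, so it is still valid after the call.
 * Returns 0 on success, -1 if the caller's buffer is too small. */
int good1(char *out, size_t out_size)
{
    static const char text[] = "test_123";

    assert(out != NULL);
    if (out_size < sizeof text) {
        return -1;
    }
    memcpy(out, text, sizeof text);   /* copies the '\0' as well */
    return 0;
}

/* Static storage lives for the whole program, so the pointer stays valid.
 * The data is shared by all callers, hence the const. */
const int *good2(void)
{
    static const int t[3] = {1, 2, 3};
    return t;
}
```

The caller-supplied buffer is usually the better fit for embedded code because it needs no heap and stays reentrant.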
Too many indirections: It was not unusual for us to create table-driven software (a topic for another day!). When we did this, we often had variables that looked like:
int ***a[10];
Once you start manipulating a pointer to a pointer to a pointer, it is easy to make a mistake. I still think pointer tables can be used to create really maintainable table-driven code—but if you are having problems, I would closely review every variable with more than one * and how it is used.
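When reviewing such variables, it can help to make each hop explicit. The helper below is a hypothetical sketch (read_through is not from any table-driven design described here) that follows three levels of indirection and checks every level before using it.

```c
#include <assert.h>
#include <stddef.h>

/* Follow three levels of indirection, checking each hop on the way.
 * Returns 0 and stores the value in *out, or -1 if any level is NULL. */
int read_through(int ***p, int *out)
{
    assert(out != NULL);

    if (p == NULL) {
        return -1;      /* no table */
    }
    if (*p == NULL) {
        return -1;      /* no row */
    }
    if (**p == NULL) {
        return -1;      /* no entry */
    }
    *out = ***p;
    return 0;
}
```

The checks cost a few instructions, but they turn a crash at some unrelated address into an error code at the point where the table was wrong.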
“Off-by-One”
In C, the first element of an array is index 0, so the last element of an N-element array is index N-1. The buffer size of a zero-terminated string is one more than the maximum length of the string. These two C language facts set us up for problems. When we find that not all of our data is being loaded or that our buffers are overrunning, it is always wise to check for “off-by-one” bugs (Figure 2).
>= or > AND <= or <

FIGURE 2
Off-by-one error
If loops are terminating early or data is not complete, it is always good to double-check all of your comparisons. Is it “greater than” or “greater than or equal to?” I hate to admit that this never became second nature to me. I always had to think twice about whether to use one or the other.
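Both language facts can be seen in one short sketch. LABEL_MAX_LEN, copy_label, and sum_all are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LABEL_MAX_LEN  8                     /* hypothetical: visible characters */
#define LABEL_BUF_SIZE (LABEL_MAX_LEN + 1)   /* one more for the '\0' */

/* Copy at most LABEL_MAX_LEN characters and always terminate the result. */
void copy_label(char dst[LABEL_BUF_SIZE], const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t i;
    /* "<", not "<=": the last character index is LABEL_MAX_LEN - 1 */
    for (i = 0; i < LABEL_MAX_LEN && src[i] != '\0'; i++) {
        dst[i] = src[i];
    }
    dst[i] = '\0';      /* i is at most LABEL_MAX_LEN, the final slot */
}

/* Sum all `count` elements: valid indexes run from 0 to count - 1. */
int sum_all(const int *values, size_t count)
{
    assert(values != NULL);

    int total = 0;
    for (size_t i = 0; i < count; i++) {    /* "<=" would read past the end */
        total += values[i];
    }
    return total;
}
```

Changing either "<" to "<=" gives the two classic symptoms: a write one byte past the buffer, or a read of one element that does not exist.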
Bugs due to Precedence Issues
Precedence determines which operator is executed first in an expression with more than one operator. Although well defined in the C language, relying on the defined language precedence makes the code hard to read and hard to maintain. Not everyone has the precedence table memorized. And if the code is hard to read for us, then the probability of having a bug is increased. Most of us don’t follow the good coding guidelines to use parentheses to make it clear how non-trivial expressions are evaluated. For example:
int a = 0, b = 1, c = 0;
int x = (a & b) == c; // x equals 1
int y = a & (b == c); // y equals 0
int z = a & b == c; //
What does z equal?
When we encounter a bug, it is easy to forget the exact order of precedence and gloss over the wrong logic. The less we have to remember the better—especially when we are debugging. USE PARENTHESES, not language-defined precedence! In the above example:
int z = a & (b == c); // z equals 0
The bitwise & has lower precedence than the equality check ==, so the comparison is evaluated first and z equals 0.
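The same trap appears constantly in embedded code when testing a bit in a status register. In this sketch STATUS_READY is a hypothetical bit mask, and the first function is written with the parentheses the compiler effectively supplies when you type status & STATUS_READY == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_READY 0x04u   /* hypothetical status bit */

/* What the compiler sees for "status & STATUS_READY == 0":
 * the comparison runs first and yields 0, so the result is always false. */
bool not_ready_as_parsed(uint8_t status)
{
    return status & (STATUS_READY == 0);
}

/* What was meant: mask first, then compare. */
bool not_ready(uint8_t status)
{
    return (status & STATUS_READY) == 0;
}
```

The mis-parsed version never reports "not ready", so the bug only shows up when the hardware really is not ready.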
Associative Errors
Did you know that some C operators have right-to-left associativity and the rest have left-to-right? For example:
int this = 2*5/2;
int that = 2*(5/2);
int other = (2*5)/2;
What are the values of this and that? Because multiplication and division are equal in precedence, we need to look at their associativity. Since they associate left to right, this is (2*5)/2 = 5, the same as other, and that is 2*(5/2) = 2*2 = 4, because integer division truncates 5/2 to 2. Same advice as above: USE PARENTHESES!
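A few more cases, sketched with hypothetical function names, show both directions of associativity and the scaling bug that integer division makes possible.

```c
#include <assert.h>

/* Division associates left to right: 100 / 10 / 5 is (100 / 10) / 5. */
int left_to_right(void)
{
    return 100 / 10 / 5;
}

/* Parentheses force the other grouping and a very different answer. */
int forced_right(void)
{
    return 100 / (10 / 5);
}

/* Assignment associates right to left: a = b = v is a = (b = v). */
int chained_assign(int v)
{
    int a;
    int b;
    a = b = v;
    return a + b;
}

/* Scaling by 3/4: multiply first, then divide. */
int three_quarters(int value)
{
    return value * 3 / 4;
}

/* Grouping the fraction first truncates 3 / 4 to 0. */
int three_quarters_broken(int value)
{
    return value * (3 / 4);
}
```

Note that three_quarters depends on the default grouping being the right one, which is exactly why writing (value * 3) / 4 is the safer habit.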
Short Circuit Operations
Because the C language is designed for efficiency, the right-hand operand of a logical && or || is not evaluated when the left-hand operand has already decided the result. This can be particularly confusing when the skipped operand contains a function call or a side effect. For example:
if (function_a(variable) && function_b(variable++))
If function_a returns 0, function_b never runs and variable is not incremented. This type of bug (Figure 3) is easily missed in code review and even in testing. And generally, our real code is more complicated than my example.
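The effect can be demonstrated, and avoided, by moving the side effect out of the condition. In this sketch function_a and function_b are given hypothetical bodies purely so the example is complete; check_hidden and check_explicit are also invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two functions in the condition. */
static bool function_a(int v) { return v > 0; }
static bool function_b(int v) { return v < 100; }

/* Same shape as the example: the increment is skipped whenever
 * function_a returns false, because function_b's argument is never evaluated. */
bool check_hidden(int *variable)
{
    assert(variable != NULL);
    return function_a(*variable) && function_b((*variable)++);
}

/* Side effect hoisted out of the condition: it always happens,
 * and the condition itself has nothing left to hide. */
bool check_explicit(int *variable)
{
    assert(variable != NULL);

    int current = *variable;
    (*variable)++;
    return function_a(current) && function_b(current);
}
```

Whether the increment should happen on every call is a design decision. The point of the second form is that the decision is visible in the code.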
Not Properly Handling “Break” Statements
In 1990, AT&T (Figure 4) suffered a nine-hour outage of long-distance service because of a patch intended to speed up the handling of long-distance calls. Listing 1 shows a representative pseudocode fragment of the problem, developed by Dennis Burke of California Polytechnic State University:
LISTING 1
From Burke’s write up of the problem
In pseudocode, the program read as follows:
1  while (ring receive buffer not empty
          and side buffer not empty) DO
2    Initialize pointer to first message in side buffer
       or ring receive buffer
3    get copy of buffer
4    switch (message) {
5      case (incoming_message):
6        if (sending switch is out of service) {
7          if (ring write buffer is empty) {
8            send "in service" to status map
9          else
10           break
           }
         }
11       process incoming message, set up pointers to optional parameters
12       break
     }
13   do optional parameter work
When the destination switch received the second of the two closely timed messages while it was still busy with the first (buffer not empty, line 7), the program should have dropped out of the if clause (line 7), processed the incoming message, and set up the pointers to the database (line 11). Instead, because of the break statement in the else clause (line 10), the program dropped out of the case statement entirely and began doing optional parameter work which overwrote the data (line 13).
Error correction software detected the overwrite and shut the switch down while it reset. Because every switch contained the same software, the resets cascaded down the network, incapacitating the system. If certain code is not being executed, check the placement of the break statements in your switch statements. In the same way, every case of a switch statement should be handled explicitly or covered by a default.
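The control-flow trap is easy to reproduce in C. What follows is a loose model of the listing's structure, not the actual switch software: MSG_INCOMING, handle_buggy, and handle_fixed are hypothetical names, and the return value simply records whether the "process incoming message" step (line 11) was reached.

```c
#include <assert.h>
#include <stddef.h>

enum { MSG_INCOMING = 1 };   /* hypothetical message code */

/* Returns 1 if the message was processed, 0 if that step was skipped. */
int handle_buggy(int message, int sender_out_of_service,
                 int write_buffer_empty, int *status_sent)
{
    assert(status_sent != NULL);

    int processed = 0;
    switch (message) {
    case MSG_INCOMING:
        if (sender_out_of_service) {
            if (write_buffer_empty) {
                *status_sent = 1;
            } else {
                break;      /* leaves the whole switch, not just the if */
            }
        }
        processed = 1;      /* skipped whenever the break above runs */
        break;
    default:
        break;
    }
    return processed;
}

/* Same logic without a break inside the if: processing always happens. */
int handle_fixed(int message, int sender_out_of_service,
                 int write_buffer_empty, int *status_sent)
{
    assert(status_sent != NULL);

    int processed = 0;
    switch (message) {
    case MSG_INCOMING:
        if (sender_out_of_service && write_buffer_empty) {
            *status_sent = 1;
        }
        processed = 1;
        break;
    default:
        break;
    }
    return processed;
}
```

In C, break binds to the nearest enclosing loop or switch and never to an if, which is the misunderstanding the listing illustrates.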
How do we tell if we have this kind of bug?
Well, if 74 million of your long-distance calls don’t get through, you have this bug! Or if you lose 200,000 airplane reservations you might check out this bug!
Failure to Check All Return Values
I have mentioned this before, but it bears repeating. We must check all return values or they will come back to bug us.
In 1990, we designed, developed, and sold the fastest (and safest) hard disk defragmenter in the world. It caught the eye of an Israeli company that wanted to buy the source code. When we delivered the source code, they found that there were a ton of functions where we did not check the return values. Since we still retained the right to market it under our own name (Speedbak), it was worth our while to correct these deficiencies and we did.
It is difficult to predict the kinds of problems that an unchecked return value can create. If you find your system producing incorrect results or crashing at random, do a rigorous code review of every function call to verify that you are handling all possible return values.
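As a small illustration of what "all possible return values" means in practice, the sketch below wraps the standard strtol call. The wrapper name parse_int is hypothetical. strtol reports failure in three different ways, and the wrapper checks each of them.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a decimal int. Returns 0 and stores the result in *out,
 * or returns -1 and leaves *out untouched on any failure. */
int parse_int(const char *text, int *out)
{
    assert(text != NULL && out != NULL);

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text) {
        return -1;          /* no digits were found at all */
    }
    if (*end != '\0') {
        return -1;          /* digits followed by trailing junk */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return -1;          /* value does not fit */
    }
    *out = (int)value;
    return 0;
}
```

The caller of parse_int has the same obligation in turn: its return value has to be checked, or the error handling inside it was wasted effort.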
More Help
This list covers just those bugs that have bugged me, but there are many more. In a previous article [2] I introduced you to the Common Weakness Enumeration project. This is a community-driven effort that attempts to define a list of common software weaknesses and vulnerabilities. You would be well advised to become familiar with their list. In this article, of course, we have covered this only in thin slices.
Next time we will look at tools we can use for debugging. This will not be a review of the tools available on the market, but a look at the tools that you need to debug an embedded system’s software.
REFERENCES
[1] “Embedded in Thin Slices: Internet of Things Security (Part 6): Identifying Threats,” Circuit Cellar, Issue 341, December 2018.
[2] “Embedded in Thin Slices: The Internet of Things: Internet of Things Security,” Circuit Cellar, Issue 311, June 2016.
RESOURCES
Code Curmudgeon | https://codecurmudgeon.com
AT&T | www.att.com
Cal Poly | www.calpoly.edu
Common Weakness Enumeration | https://cwe.mitre.org
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • AUGUST 2022 #385
Bob Japenga has been designing embedded systems since 1973. From 1988 to 2020, Bob led a small engineering firm specializing in creating a variety of real-time embedded systems. Bob has been awarded 11 patents in many areas of embedded systems and motion control. Now retired, he enjoys building electronic projects with his grandchildren. You can reach him at
Bob@ListeningToGod.org