
Debugging Embedded Real-Time Systems

Written by Bob Japenga

Strategies for Finding Bugs

This month, I wrap up this article series with a list of proven strategies that I’ve used in finding bugs in embedded real-time systems.

  • What are some good strategies to find bugs in embedded real-time systems?
  • How can I use a binary search to find coding bugs?
  • What’s a good use of conditional breakpoints?
  • Embedded Real-Time Systems

A number of years ago, a product we developed for one of our customers had a bug that prevented us from shipping. (Oh, would that this were true for all companies!) I was called in, not because I was an expert but because I was a good wall to bounce ideas off, and spent several days with our top designer futilely trying to find that bugger. Over those days, I realized that the two of us were applying different strategies to find the bug. It was the first time that I realized how many different strategies there are for finding bugs. And I learned from him. I honestly don’t remember whose strategy finally proved successful, but in all probability, it was his. But it showed me that we need to apply multiple strategies in finding those bugs that bug us. This month, we’ll look at some of my favorite strategies.

STRATEGY #1: “COULD IT BE?”

One strategy that I have found useful in finding bugs revolves around building a series of hypotheses as to what could cause the problem. First, gather as much data as you reasonably can about the bug. Ask questions like: What is going on when it happens? How does it manifest itself? How often does it happen? And so on. Then brainstorm with others to create as many hypotheses as you can regarding what could be causing the bug. For each hypothesis, think about what else would have to be going on if it were true. In other words, ask yourself: “If this is the problem, what else also must be happening?” Next, prioritize these hypotheses from most likely to least likely. This prioritization doesn’t need to be perfect. Then attempt to prove the hypotheses wrong by building test cases around each one, one at a time, in order of priority.

A trivial example might be the hypothesis that a concurrency issue is causing some data corruption. With most good real-time operating systems, you can change to a non-preemptive scheduler with a compiler switch. If the corruption disappears, the hypothesis gains support; if it persists, the hypothesis becomes less likely. We implemented this strategy once and found something we weren’t looking for. We were using real-time extensions in Linux (before they were built in) to achieve deterministic operation on an output controlling a balloon pump triggered by an electrode measuring heartbeats (Figure 1) [1]. Much to our surprise, we discovered that non-preemptive scheduling of our tasks did not appear to change the repeatability of our balloon pump. But repeatability after two days in the lab with one system is not the same as 10 years with 10,000 units in operation. Needless to say, we stuck with the real-time extensions.

Figure 1 
Left Ventricular Assist Device (LVAD)
STRATEGY #2: HÄNSEL AND GRETEL

Most of us are familiar with the story of Hänsel and Gretel (Figure 2). Having overheard a plot by their stepmother to abandon them deep in the forest, they drop pebbles along the path and easily find their way home following their abandonment. This strategy uses some kind of marker that can lead us to the source of the bug. The marker can be printf calls, LEDs, log files, GPIO outputs, or any appropriate pebble that can indicate the steps leading up to the bug. Of course, on the stepmother’s second attempt, the children use bread crumbs instead, and the crumbs are eaten before Hänsel and Gretel try to come back. Make sure your markers don’t get “eaten.”

Figure 2
Hänsel and Gretel

This strategy is incredibly important, given the nature of many hard-to-find bugs. A study published in Communications of the ACM found that a large percentage of hard-to-find bugs happen when large temporal or spatial chasms exist between the root cause and the manifestation of the bug [2]. A pebble trail can help bridge that chasm.

For example, imagine state-driven software that “randomly” fails in some way. Recording every state with a timestamp could reveal that the failure occurs only when the software enters one particular state after having been in another particular state. And the failure could be temporally distant from the software being in either state.
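
One way to lay such a pebble trail is a small in-memory ring buffer of state records. The following is only a minimal sketch in C under stated assumptions: the names TraceEntry, trace_state, trace_size, and trace_get are hypothetical, the caller supplies the timestamp from whatever clock the target has, and no protection against interrupts or other tasks is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record: which state was entered, and when. */
typedef struct {
    uint32_t timestamp_ms;
    uint8_t  state;
} TraceEntry;

#define TRACE_DEPTH 8u   /* kept small for illustration */

static TraceEntry trace_buf[TRACE_DEPTH];
static size_t     trace_count;   /* total entries ever recorded */

/* Drop a pebble. Once the buffer is full, the oldest entry is overwritten. */
void trace_state(uint8_t state, uint32_t now_ms)
{
    TraceEntry *e = &trace_buf[trace_count % TRACE_DEPTH];
    e->timestamp_ms = now_ms;
    e->state = state;
    trace_count++;
}

/* Number of entries currently held (at most TRACE_DEPTH). */
size_t trace_size(void)
{
    return trace_count < TRACE_DEPTH ? trace_count : TRACE_DEPTH;
}

/* Fetch the i-th oldest surviving entry, with i in [0, trace_size()). */
TraceEntry trace_get(size_t i)
{
    assert(i < trace_size());
    size_t oldest = trace_count < TRACE_DEPTH ? 0 : trace_count % TRACE_DEPTH;
    return trace_buf[(oldest + i) % TRACE_DEPTH];
}
```

When the failure shows up, dumping the buffer from oldest to newest gives the sequence of states that led up to it. Because old entries are overwritten, the depth has to be chosen so that the pebbles you need are not "eaten" before you read them.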

STRATEGY #3: EAT YOUR EGO

We have already mentioned bringing in extra help. But I’m embarrassed at how often, when I’ve done so, my helper has found the bug immediately. For those of us who have not learned humility, this works counterproductively: the more often it happens, the more reluctant I am to call in help the next time. “I’m not going to be embarrassed again.” Or, “I have to be able to find this on my own.” So, eat your ego and get help, and soon! See Figure 3 for a visual reminder.

Figure 3
Eat your ego!
STRATEGY #4: DIVIDE (BY TWO) AND CONQUER

I distinctly remember the first time I used a binary search on some large structures. (I think it was in FORTRAN IV, to date myself.) I was pleasantly surprised at how much faster it was than a simple linear search. The “divide and conquer” strategy starts by simply taking out code to isolate the bug. With interrupts turned off, does the bug still happen? When we don’t run the built-in test thread, does the problem go away? I’ve found that using the binary search technique with this strategy helps immensely to narrow down the source of the bug quickly.

For example, imagine that you’re experiencing some buffer overwrite condition. But who’s causing it? Start paring down your system in a binary fashion. First, cut out half of the possibilities. Then cut out half again based on your results. If you have 10 tasks, cut out five, then two or three of those that remain, and so on, rather than removing them one at a time. Or use the technique to pare down a function: not in little chunks, but by half and then half again.
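
The halving can be sketched in C as follows. This is only an illustration under stated assumptions: bisect_tasks and the BugProbe hook are hypothetical names, exactly one task is assumed to be the culprit, and the bug is assumed to reproduce reliably whenever that task is enabled. The stand-in probe fake_probe simply pretends that task 7 is at fault; on a real system, the probe would be a build-and-run cycle with only the selected tasks enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook: run the system with only the suspect tasks in
 * `enabled` (bit i = task i) and report whether the bug still shows up. */
typedef bool (*BugProbe)(uint32_t enabled);

/* Assumes exactly one of the first n_tasks tasks (1 <= n_tasks <= 32)
 * causes the bug and that the probe is deterministic.
 * Returns the culprit's index and stores the number of probe runs in *runs. */
int bisect_tasks(BugProbe bug_reproduces, int n_tasks, int *runs)
{
    assert(n_tasks >= 1 && n_tasks <= 32);
    int lo = 0, hi = n_tasks;              /* culprit lies in [lo, hi) */
    *runs = 0;
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        uint32_t lower_half = 0;
        for (int i = lo; i < mid; i++) {
            lower_half |= (uint32_t)1 << i;
        }
        (*runs)++;
        if (bug_reproduces(lower_half)) {
            hi = mid;                      /* culprit is in the lower half */
        } else {
            lo = mid;                      /* culprit is in the upper half */
        }
    }
    return lo;
}

/* Stand-in probe for illustration only: pretends task 7 is the culprit. */
bool fake_probe(uint32_t enabled)
{
    return ((enabled >> 7) & 1u) != 0u;
}
```

With 10 tasks, this sketch needs at most four probe runs, where removing tasks one at a time could need up to nine.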

Time is money in our game, and the divide-and-conquer strategy can help you find bugs more quickly.

STRATEGY #5: PRESS THE PAUSE BUTTON

I am amazed at how our brains work. They often solve problems in our sleep. Sometimes the best debug strategy is to pause and give it a rest. Take a walk. Go home. Take a nap. As with Strategy #3, our egos will resist this strategy. “I’m almost there.” “If I can find this, I will be able to go home in peace.” Don’t give in to these siren calls. Give your brain a rest. Press the pause button (Figure 4).

Figure 4
Pause
STRATEGY #6: WRITE DOWN WHAT YOU’VE TRIED

Often, I have tried so many things to find some stupid bug that I start repeating things I had tried before. I know it takes more time, but write down the methods you’ve tried, the steps that you’ve taken, the paths that you’ve already traveled. And start doing this right from the beginning, before you know how long it’s going to take to find the bug.

STRATEGY #7: BE AS DUMB AS THE COMPUTER

A former co-worker, Mike Scudder, wrote me this as I was writing this article: “Another technique I often use, once I have identified a relatively manageable series of statements that should contain the bug, is to process them mentally (or perhaps on paper) while striving to be as dumb as a computer. Often the bug stems from treating the computer as if it had common sense, when it is actually and always instead doing exactly what you told it to do.”

In Zen terminology, this is called harnessing the beginner’s mind. The Zen monk Shunryū Suzuki once said: “In the beginner’s mind there are many possibilities; in the expert’s mind there are few.” We all have probably been there. We are in a marathon session with a deadline and a boss breathing down our neck. Our intense focus can cause us to miss the obvious. We miss that we have confused our assumptions with facts. We accept things that are just plain wrong. We have rejected some possibilities for no good reason. This is where adopting a beginner’s mind can help.

STRATEGY #8: LOOK AT THE CODE, NOT THE COMMENTS OR LABELS

The names and comments in the program code can throw us off since they indicate what the computer is supposed to be doing, not necessarily what we actually told it to do. A classic bug that I often introduced was reversing the Boolean logic on the return of a function or the labeling of a variable. For example:

if (OutputIsOn(someOutput))
{
    // Operate on the On condition
}
else
{
    // Operate on the Off condition
}

while the function itself returns false when the output is on. Or we define a variable whose name is the opposite of its value, like:

bool TankIsEmpty;

which we define as true when the tank is not empty.
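
One way to catch this kind of mismatch early is a small self-check that ties the name to the behavior. The following is a minimal sketch in C, not the code from the product described above: output_latch, SetOutput, and OutputIsOnMatchesItsName are hypothetical, and on real hardware OutputIsOn would read a register or pin rather than an array.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_OUTPUTS 4

/* Hypothetical driver state: true means the output is energized. */
static bool output_latch[NUM_OUTPUTS];

void SetOutput(int output, bool on)
{
    assert(output >= 0 && output < NUM_OUTPUTS);
    output_latch[output] = on;
}

/* The name promises "true when on"; the body has to keep that promise. */
bool OutputIsOn(int output)
{
    assert(output >= 0 && output < NUM_OUTPUTS);
    return output_latch[output];
}

/* Self-check: returns true only if OutputIsOn() agrees with its name.
 * Leaves the output switched off afterward. */
bool OutputIsOnMatchesItsName(int output)
{
    SetOutput(output, true);
    bool on_reads_true = OutputIsOn(output);
    SetOutput(output, false);
    bool off_reads_false = !OutputIsOn(output);
    return on_reads_true && off_reads_false;
}
```

If someone later inverts the return value of OutputIsOn, the self-check returns false, which points at the code rather than at the comment or the label.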

STRATEGY #9: DUMP AND DIFF

Digitally comparing core dumps or extensive log files is often a helpful strategy for finding differences in voluminous data structures. Use automated tools whenever they are easy to use. They can prevent eye strain, after all!
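
When no ready-made diff tool fits the dump format, a few lines of code can do the comparison. The following is a minimal sketch in C, assuming two equally sized memory dumps, one from a good run and one from a bad run, already loaded into buffers. DiffRun and diff_dumps are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of bytes that differ between the two dumps. */
typedef struct {
    size_t offset;
    size_t length;
} DiffRun;

/* Compare two equally sized dumps and record up to max_runs differing runs.
 * Returns the total number of runs found, which may exceed max_runs. */
size_t diff_dumps(const uint8_t *good, const uint8_t *bad, size_t len,
                  DiffRun *runs, size_t max_runs)
{
    assert(good != NULL && bad != NULL);
    size_t found = 0;
    size_t i = 0;
    while (i < len) {
        if (good[i] == bad[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && good[i] != bad[i]) {
            i++;
        }
        if (found < max_runs) {
            runs[found].offset = start;
            runs[found].length = i - start;
        }
        found++;
    }
    return found;
}
```

The reported offsets can then be matched against the linker map or the structure layout to see which fields changed between the two runs.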

STRATEGY #10: CONDITIONAL BREAKPOINT

The first time my debugger had the ability to perform a breakpoint on a condition was with the HP64000. I fell in love with that feature and strongly recommend that you invest in a debugger that gives you the ability to break on complex expressions. The more complex the better: not just, “Break when the variable changes,” but, “Break when the variable is cleared for the eighth time.”

Most good debuggers have this feature now, so learn how to use it to its full capacity. You will thank me later. I’ll send you my Venmo handle.
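
If your debugger cannot express a condition like “cleared for the eighth time,” you can approximate it in code and put an ordinary breakpoint on a trap function. The following is a minimal sketch in C under stated assumptions: watched_value, set_watched_value, arm_trap, and debug_trap are hypothetical names, every write to the variable is assumed to go through the setter, and a “clear” is taken to mean a nonzero value being overwritten with zero.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t watched_value;       /* hypothetical variable under suspicion */
static unsigned clear_count;         /* clears seen since the trap was armed */
static unsigned trap_on_clear = 8u;  /* fire on the eighth clear by default */
static unsigned trap_hits;           /* how many times the trap has fired */

/* Place an ordinary, unconditional breakpoint on this function. */
void debug_trap(void)
{
    trap_hits++;
}

/* Choose which clear should fire the trap and restart the count. */
void arm_trap(unsigned nth_clear)
{
    assert(nth_clear > 0u);
    trap_on_clear = nth_clear;
    clear_count = 0u;
}

/* All writes go through this setter so the condition is checked in one place. */
void set_watched_value(uint32_t new_value)
{
    if (new_value == 0u && watched_value != 0u) {
        clear_count++;
        if (clear_count == trap_on_clear) {
            debug_trap();
        }
    }
    watched_value = new_value;
}

unsigned debug_trap_hits(void)
{
    return trap_hits;
}
```

This costs code space and changes timing, so a debugger that evaluates the condition for you is still preferable where one is available.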

STRATEGY #11: TREAD BACKWARDS

I’m always shocked at how hard it is to run my elliptical backward. I think (without any real evidence) this difficulty has more to do with my brain than with my body. Our bodies and our brains don’t like to go backward. In a similar way, the concept of back-planning is not a natural process for our brains. When I first learned about back-planning [3], it was a major innovation for me. Start with the end in mind and work backward. For years, all my planning was done going forward. “Okay, we are here. What is the next step to get to our end goal?” Back-planning taught me to start with the goal and work backward. In debugging, we start with the erroneous output and trace it backward to the inputs. Remember, that is not the way our brains naturally think.

STRATEGY #12: HISTORY

All of us tend to have repeat failures that plague us. I never struggled with accidentally placing an = sign instead of a == in a C program. (Most modern compilers help us identify those now.) But I do quite often confuse == with === in React Native [4]. Also, pointer arithmetic has always been a problem for me. As we said in an earlier article, know your enemy. Sometimes the enemy is you. Or as Pogo said, “We have met the enemy and he is us.”

I wish I had recorded my “bug traits” early on in my career. I would have been able to pick up patterns and learn from my mistakes instead of repeating them. Just don’t let your boss see the list of bugs that you have fixed!

SUMMARY

This article series has come to an end. Over its course, we have looked at:

  • the bugs that bug us (April, June, and August 2022)
  • the tools needed for debugging (October and December 2022)
  • strategies for determining if the bug’s in software or hardware (February 2023)
  • strategies for duplicating bugs (April 2023)
  • flawed assumptions in debugging (June 2023)
  • and now strategies for finding bugs.

See Circuit Cellar’s Article Materials and Resources web page for more information on those articles. We have certainly covered a lot—but, as always, only in thin slices. 

REFERENCES
[1] Balloon Pump description: https://myhealth.alberta.ca/health/pages/conditions.aspx?hwid=tx4071abc
[2] Marc Eisenstadt, “My Hairiest Bug War Stories,” Communications of the ACM, Vol. 40, No. 4, April 1997: https://dl.acm.org/doi/pdf/10.1145/248448.248456
[3] Back-planning example: https://www.eventdrive.com/en/ressources/blog/backward-planning
[4] React Native, a framework developed by Meta for building cross-platform mobile apps: https://reactnative.dev/

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • AUGUST 2023 #397


Bob Japenga has been designing embedded systems since 1973. From 1988 to 2020, Bob led a small engineering firm specializing in creating a variety of real-time embedded systems. Bob has been awarded 11 patents in many areas of embedded systems and motion control. Now retired, he enjoys building electronic projects with his grandchildren. You can reach him at Bob@ListeningToGod.org.

Copyright © KCK Media Corp.
All Rights Reserved
