Arduino Program Timing

Ed Nisely explains how output pulses can reveal the progress of invisible firmware routines. After presenting a straightforward way to instrument your code, he covers the detailed timing produced by Arduino statements and functions.

While writing the Arduino program for the RGB LED lights mentioned in my May column (Circuit Cellar 310), I wondered how much time the floating-point trigonometric calculations required. Rather than wrap a software measurement around the code, I added a pair of instructions to flip an output bit, then watched the results on an oscilloscope. It’s a classic technique that helps you debug software when there’s no other way to see what’s going on inside your code or when elaborate tracing routines will disturb the firmware’s normal operation.

In this column, I’ll show a simple way to instrument your code, then take a look at the detailed timing produced by Arduino statements and functions. You’ll need an oscilloscope or logic analyzer for this technique, but even the least-expensive instruments will suffice.

FUNDAMENTAL PULSE WIDTHS

Listing 1 shows the simplest Arduino program that produces a visible output. Those two lines toggle Pin 13, the LED found on nearly all Arduino boards, to form the 5.1 µs pulses in Photo 1, with each pulse marking one iteration of the loop() that executes continuously after finishing the configuration in the setup() section. Although the oscilloscope cursor readout provides three fractional digits, the true resolution is considerably worse, because each horizontal pixel covers 0.040 µs.

LISTING 1: The simplest possible Arduino program produces a single pulse on an output bit during each loop() iteration. Although the ATmega328 microcontroller can flip a bit with a single assembly language instruction, each of these high-level functions executes dozens of assembly instructions before and after the bit changes.

LISTING 1: The simplest possible Arduino program produces a single pulse on an output bit during each loop() iteration. Although the ATmega328 microcontroller can flip a bit with a single assembly language instruction, each of these high-level functions executes dozens of assembly instructions before and after the bit changes.

The Atmel ATmega328 microcontroller can toggle an output bit with a single instruction, which makes the digitalWrite() function in the Arduino runtime infrastructure seem overly elaborate: fifteen lines of code with a path length of 12 lines. Among other tasks, the code must select and verify the hardware port, produce a bitmask selecting the pin, handle the special case of a PWM pin, and disable interrupts while manipulating the port.

PHOTO 1 Each of the digitalWrite() functions producing this pulse requires 5.1 µs, roughly 50 instructions, on the 16 MHz ATmega328 microcontroller. The Arduino loop() adds 0.40 µs to the low portion of the waveform.

PHOTO 1: Each of the digitalWrite() functions producing this pulse requires 5.1 µs, roughly 50 instructions, on the 16 MHz ATmega328 microcontroller. The Arduino loop() adds 0.40 µs to the low portion of the waveform.

Homework: Find and examine the digitalWrite() function in the Arduino source code, then explain why all that additional logic isn’t optional.

The single AVR machine-level instruction that produces the waveform in Photo 1 is located near the end of the digitalWrite() function, with most of the execution time occurring before the output changes. The high part of the pulse therefore includes the short tail end of the first digitalWrite() and the longer beginning of the second, but the output pulse width shows the execution time through the complete function.

The 16 MHz crystal oscillator on the Arduino UNO clone board I used for these measurements produces a 62.5 ns instruction cycle, so each 5 µs pulse contains about 80 cycles. Most AVR arithmetic instructions require one or two cycles, branches require two or three, and subroutine CALL/RETURN pairs soak up eight, so assuming 1.6 cycles/instruction for short subroutines seems reasonable and allows you to figure the results in your head: 10 instructions per microsecond. That means the compiler translates the dozen high-level lines of code in the digitalWrite() function into 50 machine-level instructions.

Homework: Examine the compiler’s machine-level output to determine the actual number of instructions.

The low part of the pulse occurs as execution continues through the Arduino infrastructure as the loop() returns control to the first digitalWrite() line, adding 400 ns behind the scenes. That’s only six machine cycles, perhaps three or four machine instructions, and not much overhead at all.

TRACING A MISSING PULSE

Triggering your oscilloscope on the signal produced by Listing 1 should produce a steady display, but you’ll see “ghost” traces that seem to show missing pulses. Setting the trigger to produce single sweeps will eventually produce a display similar to Photo 2, with an obviously missing pulse.

PHOTO 2: A “missing pulse” occurs when a timer interrupt executes additional instructions that don’t appear in the user program. The interrupt can also occur when the pulse is high, causing an unexpectedly long pulse.

PHOTO 2: A “missing pulse” occurs when a timer interrupt executes additional instructions that don’t appear in the user program. The interrupt can also occur when the pulse is high, causing an unexpectedly long pulse.

Because the program’s logic (if you can call it that) lacks branches that could omit a pulse, the problem must lie elsewhere. Carefully examining the pulse timings shows that there’s not quite enough time for another pulse in that gap, which means the extended time came from a delay, rather than an omission.

The Arduino infrastructure includes a millis() function that reports the elapsed time in milliseconds since the microcontroller emerged from its most recent reset. A timer interrupt occurs every millisecond, with the interrupt handler updating a counter that the millis() function returns to the caller in a four byte unsigned long integer.

Because the loop() in Listing 1 iterates every 10.4 µs, the timer interrupt will occur once in every 96 pulses, stretching either the high or low part of the pulse. The advantage of an oscilloscope should be obvious: a problem that occurs 1% of the time might not appear amid all the “correct” values shown by a simple software measurement.

The if() statement in Listing 2 determines whether the return value from millis() has changed and, if so, produces a pulse on Pin 11. The lower trace in Photo 3 shows that pulse, with the oscilloscope triggering a single sweep on that pin and the trigger point at the middle of the screen. The upper trace comes from Pin 13, as before, and shows the same “missing” pulse due to the timer interrupt handler.

LISTING 2: The millis() function returns the number of milliseconds since the microcontroller emerged from the most recent reset. The comparison will be false 98.7% of the time, but that test adds 2 µs to the average loop() time.

LISTING 2: The millis() function returns the number of milliseconds since the microcontroller emerged from the most recent reset. The comparison will be false 98.7% of the time, but that test adds 2 µs to the average loop() time.

The cursors bracketing the 7.4 µs pulse in the upper trace show that the if() statement adds 2.3 µs to the execution time of the code in Listing 1, even when the value of millis() doesn’t change. The CPU uses those 35 clock cycles and two dozen instructions to call the millis() function, test its return value, then branch around the code inside the conditional block. You can see the effect of every instruction at this time scale!

PHOTO 3: The pulse in the lower trace shows that the “missing pulse” occurs when the return value of the millis() function changes. Testing the millis() value increases the loop() duration, as shown comparing the pulses in the top trace to those in Photo 2.

PHOTO 3: The pulse in the lower trace shows that the “missing pulse” occurs when the return value of the millis() function changes. Testing the millis() value increases the loop() duration, as shown comparing the pulses in the top trace to those in Photo 2.

The long pulse in the middle of the upper trace shows what happens just after the timer interrupt occurs: detecting the change, generating the sync pulse in the lower trace, and updating the Millis_Previous variable requires 22 µs, roughly 15 µs and 150 instructions more than the previous code.

A logic analyzer can trigger a capture based on the Boolean combination of many inputs, a situation that depends on having access to all those signals. A microcontroller buries those signals inside the CPU, where combining them into a trigger requires software, rather than hardware. A similar situation occurs with gate array logic, where a dab of additional logic inside an FPGA can generate suitable triggers, if you have a spare output pin, and may be the only way to isolate elusive glitches due to combinations that occur so rarely as to be invisible.

Homework: Move the statement that updates Millis_Previous between the two digitalWrite() functions, then measure the change in pulse width to determine the time required to update that variable.

As the number of statements inside the loop() increases, the likelihood that the timer interrupt will occur during the program also increases. This won’t make any difference to most programs, but when you’re watching a pulse on the oscilloscope, you don’t want a timer interrupt disturbing your measurements. The code in the next few examples will disable all interrupts just before raising the timing pulse and enable them after lowering it, forcing the timer interrupt to occur outside the critical section.

Download the entire article. This article appeared in Circuit Cellar 316 November 2016.

Ed Nisley is an electrical engineer and author in Poughkeepsie, NY. He has been writing for Circuit Cellar since Circuit Cellar 1, 1988. His regular column, “Above the Ground Plane,” appears every other month in Circuit Cellar magazine.