
Breaking the Loop with Fault Injection

Written by Colin O'Flynn

A Simple Experiment on an Embedded System’s Code

Simple demos of fault injection attacks often seem “too good to be true.” In this article, Colin demonstrates why simple fault injection attacks that break out of infinite loops are still a threat to bootloaders, and how you can experiment with this on your own code.


  • How does fault injection work?

  • How can you recreate the fault injection experiment yourself?

  • What is an EMFI fault?

  • How does EMFI cause faults?

  • Why do you use a compiler explorer?

  • How do you fix the fault?

  • PicoEMP

  • ChipSHOUTER

  • Flip-flops

  • Compiler explorer


At Embedded World this year, I had a demo that showed how fault injection (electromagnetic fault injection in particular) can be used in a surprisingly simple way to bypass an example secure bootloader. In this article, I’m going to show you how this works, and how you can recreate this type of experiment yourself.

THE SETUP

Let’s start with the demo itself to give you an idea of what we can show. The demo is shown in Figure 1, with the microcontroller at the center of a development board (marked with a “1”). To cause a fault, we use a PicoEMP (which I discussed in the March 2022 #380 issue of Circuit Cellar), marked with a “2,” which generates a very strong electromagnetic “pulse” over the top of the chip (more on this shortly). You’ll also see videos of this demo done with the original (more powerful) ChipSHOUTER instead of the PicoEMP; the principle is the same.

Figure 1
The complete demo shows a microcontroller under attack at (1) and the EMFI tool at (2).

If the fault is successful, the display updates as in Figure 2. In this figure, I’ve marked the original display strings in the green outline, and the display updates after the fault occurs in the red outline. My claim is that this demo shows how electromagnetic fault injection can be used to bypass something like a “real” bootloader, but you might wonder how much of this is just simple demo code and how much resembles a real-life implementation.

Figure 2
The initial output is shown in the green box, and after a fault attack the output in red comes up.


To start with, let’s investigate the demo version of the code. This is shown in Listing 1. This sort of code-flow is relatively common for bootloaders, where a signature of a new image sent is checked to see if it appears to be valid. If the signature is invalid, the device should not boot the image as it might be a malicious image an attacker is trying to load that will bypass any protection built into the device.

This is enforced with a simple comparison on Line 1 of Listing 1 and if the signature is invalid, the panic() function is called on Line 3. The panic() function ultimately ends up in a simple while(1) loop that causes the bootloader to stop processing and wait for a system reset. The attack comes because if someone jumps over the panic() function, the code instead continues executing at line 5. This means the code boots the image that it previously decided was untrustworthy.

First, let’s look at how EMFI causes faults, and then we’ll come back to how you can view the assembly of the code from Listing 1 to better understand how realistic that code is in practice.

HOW EMFI FAULTS

The electromagnetic fault injection (EMFI) technique places an inductor (coil) over the top of the chip and drives a high-voltage pulse through it. This effectively forms a transformer between the coil and structures on the chip die, which means voltages get coupled into the chip itself. I’ve previously discussed this in the September 2019 #350 issue of Circuit Cellar, and I also go into more detail in The Hardware Hacking Handbook.

But if you accept that some voltages get coupled into the chip, why does that cause faults? And in particular, why does it cause the instruction skip on Line 3 of Listing 1? To understand this, consider a typical Cortex-M instruction pipeline, as in Figure 3.

Figure 3
Typical pipeline stages of the Cortex-M microcontroller being targeted

The instruction pipeline is how things get executed on the chip. Between each pipeline stage, we typically have registers holding data, including the instruction itself and the intermediate results of decoding and executing it.

A register itself is built up of flip-flops, which have a data and clock input as shown in Figure 4. The timing between the data and the clock is critical: If the clock comes too late, the wrong data will be latched in. And if the clock comes too close to the data edge, we can get a condition called metastability where the flip-flop takes even longer to reach a final value (I demonstrated this in the December 2014 issue of Circuit Cellar, Issue #293).

Figure 4
Data getting loaded into registers relies on the clock and data paths having known timing.

The link between EMFI and the registers is that EMFI introduces local voltage variations on the chip. These variations push the supply voltages for the logic (including the combinational logic and flip-flops) inside the chip out of spec, meaning different signals will propagate more slowly than planned. The result is timing errors, which means incorrect data is processed by the pipeline.

What exactly “incorrect” means will vary, but for example, it could mean a new instruction isn’t loaded. It could also mean some (or all) of the bits are incorrect. For example, instead of loading the “branch” op-code, what if a simple ADD or even a NOP is loaded? This would mean the branch is never taken, since that op-code isn’t the one loaded.

If the branch that forms part of the panic() statement on line 3 isn’t taken, the code will continue executing the remaining section of the code.

This was the claim for the demo, but you may be wondering if real code would be that vulnerable. For the final part of this article, I’ll demonstrate how you can see this yourself without leaving your keyboard.

USING COMPILER EXPLORER

To easily explore how C code is converted into the vulnerable assembly, we’ll use Compiler Explorer (godbolt.org). A similar version of the code from Listing 1 is shown in Listing 2. You’ll notice that in Listing 2 the logic, as written in C, does not look susceptible to the simple glitch: the do_boot() call comes before the infinite loop, so escaping the loop would not obviously lead into it.

Head over to godbolt.org, and copy the code from Listing 2 into the “C++ source” tab. On the right-hand side, you’ll see some assembly code. To get the same results as me, specify the compiler as “ARM GCC 11.2.1 (none)”. Finally, in the compiler options field specify the “-O3” option to enable those specific compiler optimizations.

Hopefully, you end up with the assembly code found in Listing 3. The important part to notice is that our infinite while(1) loop from Listing 2 becomes the infinite loop at the .L2 label. Looking at the assembly code, you’ll notice the do_boot() call now sits directly after the infinite loop.

This simple experiment demonstrates why my “extremely vulnerable” code from Listing 1 is a lot more realistic than you might think. The compiler has taken our code from Listing 2, which at first glance doesn’t look as vulnerable, and moved the infinite loop in front of the sensitive code. You can play around with the compiler optimizations to see how changing them changes the compiled code.

Listing 1
Pseudo-code from the Demo shown in Figure 1 and Figure 2
1: if (signature_check(&image) == SIGNATURE_FAIL) {
2:    display_message("Signature   [FAIL]");
3:    panic();
4: } else {
5:    display_message("Signature   [OK]");
6:    do_boot(&image);
7: }


Listing 2
A simplified version of the code from Listing 1
int signature_check(void * image);
void do_boot(void * image);

void bootload(void * image) {
  if (signature_check(image)) {
    do_boot(image);
  } else {
    while(1);
  }
}


Listing 3
The assembly output of the code from Listing 2. Note the location of the infinite loop at the .L2 label.
bootload(void*):
        push    {r4, lr}
        mov     r4, r0
        bl      signature_check(void*)
        cmp     r0, #0
        bne     .L8
.L2:
        b       .L2
.L8:
        mov     r0, r4
        bl      do_boot(void*)
        pop     {r4, lr}
        bx      lr


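Listing 4
Mangling the image pointer means the signature_check() function result needs to be correct for the image pointer to be valid.
#include <stddef.h>
#include <stdint.h>

/* Assumed interface: returns the value needed to unmangle the pointer
   and sets *image_valid when the signature is good. */
uintptr_t signature_check(void * image_xored, int * image_valid);
void do_boot(void * image);

void bootload(void * image_xored) {
  int image_valid = 0;
  void * image = NULL;
  /* C cannot XOR a pointer directly, so go through uintptr_t. */
  image = (void *)((uintptr_t)image_xored ^ signature_check(image_xored, &image_valid));
  if (image_valid) {
    do_boot(image);
  } else {
    while(1);
  }
}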

FIXING THE FAULTS

I discussed some countermeasures in my article in the January 2020 #354 issue of Circuit Cellar. But knowing that the compiler will change your code flow is an important part of understanding how to apply countermeasures. If you decide to add countermeasures that rely on specific code-flow changes, be sure you either automatically validate the output assembly or fix the code-flow using assembly-based functions.

But often simple countermeasures can be added that don’t rely on double-checks or other hardening, but instead “entangle” the data with the code flow itself. For example, in Listing 4 I’ve modified the code such that the output of signature_check() is XORed with the pointer to the image.

This implies that we pass a pointer that has been XORed with some constant, such that if you attempt to use the pointer directly it will result in an error (such as a hard fault). This means that signature_check() does not just return a flag, but also unmangles the pointer. If an attacker skips the check later in the code, it doesn’t matter, since the image pointer will be invalid.

You can extend this further by making the value that you used to mangle the pointer part of the valid signature itself. Taking this final step eliminates the point where an attacker could glitch the signature_check() function itself.

CONCLUSION

You can probably think of all sorts of other useful methods, such as adding multiple checks and adding random delays to complicate timing. But the point of this article is to show you how easily you can perform these experiments—and you can get started without even needing to use a fault injection tool.

If you want to give them a try on a development board, you can also use a debugger (or even code emulator) to see the effect. Simply set a breakpoint before your decision point, and manually change the program counter to give you the desired glitch effect.

Fault injection doesn’t need to be black magic, and it’s something of which all embedded developers working with security-conscious decisions should be aware. Hopefully, this article helped you understand a simple path you can take to start experimenting with fault injection attacks on your systems. 

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • SEPTEMBER 2022 #386


Colin O’Flynn has been building and breaking electronic devices for many years. He is an assistant professor at Dalhousie University, and also CTO of NewAE Technology both based in Halifax, NS, Canada. Some of his work is posted on his website (see link above).


Copyright © KCK Media Corp.
All Rights Reserved
