Fault Injection and Power Analysis

Written by Colin O'Flynn

Part of Your Appliance Repair Toolkit?

This article walks through an attack on an obsolete microcontroller (MCU) used in an oven controller. The goal was to recover the firmware to understand whether the oven had a hardware failure or a software bug, which required applying both power analysis and fault injection to a real-life board. It turns out the device has a perfect set of interfaces to make the attack not only possible, but fun!

  • How do oven controllers work?
  • How can I use fault injection and power analysis to repair appliances?
  • What is a fun “hacking” project?
  • TMP91FW27UG
  • ChipWhisperer Husky

In cybersecurity, a common “game” is Capture the Flag (CTF). In a CTF, a participant tries to solve a series of increasingly difficult challenges. These are a lot of fun, but they sometimes feel a little unrealistic: how often do you need to solve such a series of challenges in real life?

In this article, I’m going to describe an attack on a device that felt just like this sort of unrealistic challenge. Starting with some basic reverse engineering, it eventually encompasses power analysis and fault injection to ultimately bypass the security of an old microcontroller (MCU). It’s a particularly enjoyable problem to describe to you since the device in question is in End-of-Life (EOL) status, so we can speak more candidly without worrying that we are disrupting new products.

Now We’re Cooking

The background to this article involves an unexpected source: my oven. The oven in question had the annoying issue that despite being equipped with a digital display, it would show the actual (measured) temperature only during the “preheating” mode, and once it thought the oven was ready it switched to a static display of the setpoint. But the oven seemed to do a poor job of holding that temperature, and you couldn’t tell from the display because it showed only the setpoint.

Having worked with PID controllers before, my oven felt like it had a poorly tuned PID controller. At first, you might assume it was just a failure of a heating element or sensor. I started to wonder if it was more than just something going wrong with my specific oven when I noticed a claimed class action lawsuit about it, which concentrated on the temperature sensor itself (part number DG32-00002B). But if I could dump the measured temperature from the device as it’s running, I could see if the oven was incorrectly measuring the temperature (sensor failure), or if this was some form of a software bug.

How to do this? The answer of course would be to reverse engineer the firmware, or at least see if I could debug the firmware running in my oven to dump the values storing the measured temperature. If I could get access to these internal variables, I could finally see what temperature my oven thinks it’s running at, and have a better idea of how the internal control loop works.

The control board in question (Figure 1) has a TMP91FW60 MCU on the bottom side, which is a 16-bit MCU from Toshiba. It’s part of the TLCS-900/L1 series, which had the most active releases in the early 2000s (the TMP91FW60 datasheet is from around 2006). The naming here is a bit confusing: for example, the closely named TLCS-900/H1 is a 32-bit MCU, and the older TLCS-900 series was more active in the 1990s. Unfortunately, many of the resources for this device are no longer online, so if you’ve got old development kits for this series, I’d love to hear from you.

FIGURE 1 The bottom side of my controller board showing the TMP91FW60DG MCU.

For some of my work I did find an old development kit that included the precious CD-ROM with a compiler/simulator and programming software (which seemed to work only on Windows XP). I’ve mirrored some of the relevant information in my Toshiba-TLCS-900-L-Resources repository [1].

Building a Target

While my oven board used a TMP91FW60, I knew I’d want to build a custom target to better understand this device. In addition, I wanted this custom target to be more accessible to readers than a specific oven control board. In the same series as my TMP91FW60, I found a large stock of the TMP91FW27UG devices available on eBay and other sites. These devices use the same security method, with some minor differences in flash size, so they would make a good reference.

Figure 2 shows the target board plugged into my ChipWhisperer-Husky. As of the writing of this article, I’m not sure about the commercial availability of this target board, but if you want to build your own you can download the design and Gerber files from the open-source chipwhisperer-target-cw308t repository [2]. You can also check if it’s an available target through NewAE Technology Inc. I haven’t duplicated the full schematic in this article, but you can find that online as well. The only interesting feature of the target I’d point out in the schematic is the shunt resistor inserted into the VCC rail to allow us to perform power analysis.

FIGURE 2
This CW312 target uses a similar TMP91FW27UG MCU as in Figure 1.

We can use this TMP91FW27UG to understand how the firmware protection works. First, let’s look at the datasheet to see what features we get. The MCU has a built-in bootloader, which supports a number of commands. The bootloader has both a password and a protection flag feature. Unlike some devices, the password is not enough to overcome the read protection. This means that even if we knew the password, we still wouldn’t be able to read out the flash memory.

Table 1 shows the various commands supported by the bootloader. You’ll notice some require just the password, and some require both the password and the protect flag to be cleared. And there is no way to clear the protect flag except to perform a flash erase, so once the protect flag is set you can’t go back. As in previous columns, we’re going to investigate the actual implementation to understand how this might be done.

RAM Loader and Readout

To understand the bootloader implementation, we’ll simply read it out of the memory of the chip. You might notice from Table 1 there is no “read flash” or similar command. Instead, we use the RAM program to load a second-stage bootloader. Luckily the resource CD included the Segger ToshLoad program (something I couldn’t find online, so I was happy to have the “old school” CD-ROM), which included the required second-stage bootloader implementations. Happily, this also included the source code for each of the bootloaders. That will be relevant when I go to use this attack on my oven controller.

Since the ToshLoad program is rather old and runs only on Windows XP, I decided to re-implement the important parts in Python. You can find this implementation in my PyToshLoad repository [3]. This lets us simply read out the bootloader memory (or any other memory).

Unfortunately, the specific version of the TLCS-900/L1 processor didn’t work well for me in a recent decompiler like Ghidra. But the TLCS-900 IDE on the CD-ROM I had did in fact let me load the code in a simulator, which included a robust decompiler. From here, I could “simply” analyze the code to understand some of the important parts.

A part of the code which processes the “run RAM program” command is shown in Listing 1. The result of this analysis is that the code first checks the protection flag before it checks the password. This is important as it means we need to bypass the protection flag logic before the password logic.

LISTING 1
The “run RAM program” function first checks a protection flag, and if OK goes to the password check and the rest of the logic.

FUNCTION START: RAM WRITE FUNCTION
00fff2f5 CALR    0x0FFF75F <-- Load protection flag
00fff2f8 CPB     A,0x0FF   <-- Check protection flag
00fff2fb JR      NZ,0x0FFF290 <-- Send error if protection enabled
00fff2fd CALR    0x0FFF2A2 <-- Password check
...rest of function...

The password logic is something we expect that we can use power analysis to recover, which should be easier than using fault injection attacks. But the protection flag means we’ll need to use fault injection, since we need to alter the actual code flow. So, we now know that attacking this device will require both a power analysis attack and a fault injection attack. Let’s look at the power analysis attack first.
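
The order of the checks in Listing 1 can be modeled in a few lines of Python. This is only an illustrative sketch: the function and constant names are hypothetical, and treating 0xFF as the "not protected" value is inferred from the CPB/JR NZ pair in the listing rather than taken from the datasheet.

```python
FLAG_CLEARED = 0xFF  # assumption: the value Listing 1 compares the flag against


def run_ram_program(protect_flag, password_ok, skip_flag_check=False):
    """Model the order of checks in Listing 1 (all names are hypothetical)."""
    # The protect flag is tested first. A successful glitch is modeled here
    # as the conditional branch to the error handler simply not being taken.
    if protect_flag != FLAG_CLEARED and not skip_flag_check:
        return "error: protected"
    # Only once the flag check has passed is the password examined.
    if not password_ok:
        return "error: bad password"
    return "ok: waiting for RAM program"


print(run_ram_program(0x00, password_ok=True))
print(run_ram_program(0x00, password_ok=True, skip_flag_check=True))
```

The model makes the dependency explicit: knowing the password is useless on a protected part unless the first branch is also defeated.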

Password Power Analysis

Attacking the password will require us to use a feature of the device which uses the password. Inspecting the functions from Table 1, you can see that “Enable Protect Flag” uses the password but not the protect flag. This is important since on our practical device, we want to attack the password separately from the protect flag. To understand the importance of this, let’s look at the password attack.

TABLE 1
The ROM-based bootloader includes several commands, which can be protected by a “password” and/or a “protect flag”.

The code for processing the password could either read the entire 12-byte password and compare it in a single test, or it could compare each character as it is received. Either way, we will use power analysis to understand the code flow of the target with various data.

The attack targets the password one byte at a time. We’ll start with the first byte, recording the power usage of the device as it processes this byte. We’ll capture one of these “power traces” for each possible value of the byte: 0, 1, 2, 3, …, 255. An example is shown in Figure 3, which (for clarity) shows only four guesses for a single byte. You’ll notice that all four power traces start out following the same path. Around clock cycle 42 they diverge, and one of them looks different. This difference is because it took a different code flow path (such as a comparison passing instead of failing). If you plotted all guesses, you’d see 255 traces following the same path, and a single outlier. This outlier is the correct value of the password byte.

FIGURE 3
Power traces show the difference in internal program flow when the password byte comparison passes compared to when it fails. Here the correct byte is ‘s’ (hex 0x73).

Knowing the first byte, we simply repeat the guessing process for the second and subsequent bytes. For example, to find the second byte, we start by sending the first byte which we now know, followed by a guess of the second byte. Using power analysis, we can determine which guess for the second byte was correct. Between each guess, we reset the device, since we need each guessing operation to start from the exact same point. Once we know the second byte, we can send the correct first and second byte, and perform the guess on the third byte.
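
The byte-at-a-time loop can be sketched against a simulated target. Everything here is an assumption made for illustration: the password, the noise level, and the `capture_trace` function, which stands in for "reset the device, send the attempt, record the power trace" and simply shifts the trace after sample 42 when every byte sent so far is correct.

```python
import random

random.seed(1)
SECRET = b"s3cr3tpasswd"   # made-up 12-byte password for the simulation
DIVERGE_AT = 42            # sample where the two code paths separate


def capture_trace(attempt, n_samples=60):
    """Simulated capture: reset target, send attempt, record power trace."""
    trace = [random.gauss(0.0, 0.01) for _ in range(n_samples)]
    if attempt == SECRET[:len(attempt)]:
        # Comparison passed: a different code path, so a different trace.
        for i in range(DIVERGE_AT, n_samples):
            trace[i] += 0.5
    return trace


def recover_password(length=len(SECRET)):
    known = b""
    for _ in range(length):
        scores = {}
        for guess in range(256):
            trace = capture_trace(known + bytes([guess]))
            scores[guess] = sum(trace[DIVERGE_AT:])
        # The correct guess is the one furthest from the typical score.
        ordered = sorted(scores.values())
        median = ordered[len(ordered) // 2]
        best = max(scores, key=lambda g: abs(scores[g] - median))
        known += bytes([best])
    return known


recovered = recover_password()
print(recovered)
```

On real hardware the divergence point is not known in advance, which is why a metric computed over the whole trace is more practical than the fixed window used in this toy.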

While the power traces in Figure 3 make sense visually, they can be difficult to understand algorithmically (and thus to automate our attack). Instead, to simplify the work, I can plot the “difference” between the average trace and each measured trace. This makes it easier to see how obvious the processing difference is, as shown in Figure 4. Here each of the possible 256 values is shown for the byte, and you’ll notice one of them sticks way out. This difference is because when the code processes the correct value of the byte, it takes a different code path which results in a different power trace.
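
One simple way to compute such a difference, sketched below, is to average all the traces sample by sample and then sum the absolute difference between each trace and that average. This is an assumed metric for illustration and not necessarily the one used to produce Figure 4; the trace values are synthetic.

```python
def mean_trace(traces):
    """Sample-by-sample average of a list of equal-length traces."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


def difference_scores(traces_by_guess):
    """Sum of absolute differences between each trace and the mean trace."""
    avg = mean_trace(list(traces_by_guess.values()))
    return {
        guess: sum(abs(s - a) for s, a in zip(trace, avg))
        for guess, trace in traces_by_guess.items()
    }


# Synthetic data: 255 identical traces and one that diverges part-way.
base = [0.0, 0.1, 0.2, 0.1, 0.0, 0.1]
traces = {guess: list(base) for guess in range(256)}
traces[0x73] = [0.0, 0.1, 0.2, 0.6, 0.5, 0.4]

scores = difference_scores(traces)
best_guess = max(scores, key=scores.get)
print(hex(best_guess))
```

Because 255 of the 256 traces follow the same path, the mean is dominated by the "wrong guess" shape, so only the correct guess ends up with a large score.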

FIGURE 4
The single outlier shows that one of the power traces looks very different, showing the correct value of the byte.

One final note on the topic is that the need to record so many power traces explains why I wanted to use the “Enable Protect Flag” function, which does not check the protect flag (and thus I don’t need to bypass it). If I needed to perform a fault injection attack on the protect flag first, it would require me to perform the attack for each guess operation for each of the twelve password bytes. I need at least 256 guess/checks per byte, and in practice of course the experiment setup requires many more tries to understand the data. Not having to deal with the protect flag is helpful.
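
The arithmetic behind that remark is worth a quick look. The sketch below counts the minimum number of captures for the byte-at-a-time attack, compares it with guessing the whole password at once, and shows how the count would scale if every capture first needed a successful glitch. The "1 success in 20 attempts" figure is purely hypothetical.

```python
PASSWORD_LEN = 12
VALUES_PER_BYTE = 256

# One capture per candidate value, for each byte position.
byte_at_a_time = PASSWORD_LEN * VALUES_PER_BYTE

# Guessing the whole password at once, for comparison.
brute_force = VALUES_PER_BYTE ** PASSWORD_LEN


def attempts_with_glitch(one_success_in):
    """Captures needed if each one also required a successful glitch."""
    return byte_at_a_time * one_success_in


print(byte_at_a_time)
print(brute_force == 2 ** 96)
print(attempts_with_glitch(20))   # hypothetical 1-in-20 glitch success rate
```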

If you’re curious about all the details, I’ve posted the sources for this as part of my “Embedded System Essentials” GitHub repository [4]. This includes all the Python code used in generating figures in the article too, as I know I had to skip a few details (such as generating the differences to make Figure 4).

And if you got a little lost about what exactly power analysis is, you can see some similar demos as part of the open-source ChipWhisperer-Jupyter repository [5], many of which I walked through in previous articles (see, for example, “Breaking a Password with Power Analysis Attacks,” Circuit Cellar 323, June 2017).

Fault Injection

Now, with the password recovery working, we can turn our attention to fault injection work to bypass the protection flag. To start, I need to understand if this target is vulnerable to fault injection, and what sort of settings I need to use with it. In previous examples, I’ve compiled some sort of “calibration” firmware, but this is a real target and I need to use existing code.

Looking at the bootloader commands from Table 1, we need to select something that will give us feedback about our ability to impact the code flow on the device. We find in that list the “Get CRC” function. This is an alluring function: if we can cause part of the CRC loop to skip (or perform the wrong operation), we’ll see the wrong CRC returned. If our fault is so aggressive that it crashes the target, we’ll also see that it stops responding. Once we know how to cause wrong operations (faults), we’ll go on to target the operations at the protect flag check and hopefully bypass that protection.
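
To see why a CRC makes a good fault detector, here is a minimal sketch in which a glitch is modeled as one pass of the CRC loop being skipped. The CRC-16/CCITT-FALSE polynomial is an assumption chosen for the example (the bootloader's actual CRC parameters are not given here), and `classify` is a hypothetical helper for sorting responses into the three outcomes described above.

```python
def crc16_ccitt(data, skip_index=None):
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF).

    skip_index models a glitch that skips one pass of the byte loop.
    """
    crc = 0xFFFF
    for i, byte in enumerate(data):
        if i == skip_index:
            continue  # the "glitched" iteration never runs
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def classify(response, expected):
    """Sort a target response into normal / fault / crash."""
    if response is None:          # no reply at all: the target hung or reset
        return "crash"
    return "normal" if response == expected else "fault"


data = bytes(range(64))
good = crc16_ccitt(data)
glitched = crc16_ccitt(data, skip_index=3)
print(classify(good, good), classify(glitched, good), classify(None, good))
```

Since every byte feeds into the running CRC state, a disturbance in almost any loop iteration shows up in the final value, so the glitch does not need precise timing to be detected.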

There are two obvious types of fault injection to consider: clock fault injection and voltage fault injection. Clock fault injection is an option because the target device directly uses the external clock input, so I’ll use that here (unlike in my article “Recreating Code Protection Bypass: An LPC MCU Attack” in Circuit Cellar 338, September 2018, in which I used voltage fault injection on the LPC1114 target).

With clock fault injection, we insert short extra clock edges that cause the internal logic to violate the setup and hold timing specifications of the internal flip-flops. This causes incorrect data to be latched, including in systems such as the instruction decode logic. In practice, this leads to the processor often processing the wrong instruction or simply skipping instructions. In this case, we can experiment with the width of the short clock pulses along with the location (offset) relative to the normal clock edge.

By trying various offsets and widths, we can generate a graph such as Figure 5. This shows various settings where the returned CRC is incorrect, which means a glitch with those settings caused a fault. One question you might have is how I triggered the glitch to affect only the CRC itself. This is simple: as the CRC process is relatively slow, I just trigger it sometime after sending the CRC request and before getting the response back. You’ll see how to trigger this more precisely when we look at attacking the protection flag itself.
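
The structure of such a sweep is sketched below. The `attempt_glitch` function is a stand-in for "reset the target, arm the glitch, request the CRC, read the reply", and the region where it reports a fault is invented for the simulation. On real hardware the two loop variables would be the glitch width and offset settings of the capture tool.

```python
from collections import Counter


def attempt_glitch(width, offset):
    """Simulated attempt; the sensitive region below is made up."""
    if 8 <= width <= 12 and -20 <= offset <= -10:
        return "fault"     # wrong CRC came back
    if width > 30:
        return "crash"     # target stopped responding
    return "normal"        # correct CRC came back


def sweep(widths, offsets, repeats=3):
    """Try every (width, offset) pair several times and tally outcomes."""
    results = {}
    for width in widths:
        for offset in offsets:
            results[(width, offset)] = Counter(
                attempt_glitch(width, offset) for _ in range(repeats)
            )
    return results


def promising(results):
    """Settings that produced faults without ever crashing the target."""
    return sorted(
        setting for setting, tally in results.items()
        if tally["fault"] > 0 and tally["crash"] == 0
    )


results = sweep(range(0, 41, 4), range(-40, 41, 10))
candidates = promising(results)
print(candidates)
```

Repeating each setting several times matters on real hardware, where the same width and offset may fault only some of the time.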

FIGURE 5
This map shows that there are various settings for offsets and widths that cause invalid CRC results.

Protection Flag Glitch Attack

Knowing the parameters for the clock glitch that are likely to be successful, I now must figure out when in time to apply them. From the code in Listing 1, I know I need to apply the attack sometime after sending the “Run RAM Program” command. While the ChipWhisperer-Husky supports a trigger based on the UART data, I decided to simply trigger on the rising edge of the UART data.

The UART data being sent is the actual command, hex 0x10, or binary 00010000. The rising edge will be that single “1” bit. Knowing the baud rate (9600), and the frequency we are using for the glitch clock (16MHz), we can calculate how many cycles from the rising edge of the first 1-bit until the end of the message. Our glitch is going to need to be applied sometime later than that, since we need to affect the processing of the running function.

Because the data is sent least significant bit first, we expect four data bits (the “1” bit itself plus the three zero bits after it) and one stop bit, or five bit-times total. This means we can expect the instructions to start around (16,000,000/9600) × 5 ≈ 8333 cycles after the rising edge. Of course, we don’t know what other processing is happening, so we’ll need to figure out the exact timing from this approximate one.
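
The bit counting can be checked with a short script. It assumes a standard frame of one start bit, eight data bits sent least significant bit first, and one stop bit, which is consistent with the five bit-times counted above; the variable names are my own.

```python
CLOCK_HZ = 16_000_000
BAUD = 9600


def uart_frame_bits(byte):
    """Line levels for one frame: start bit, 8 data bits LSB first, stop bit."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


bits = uart_frame_bits(0x10)

# Index of the first 0 -> 1 transition within the frame (the trigger point).
first_rising = next(
    i for i in range(1, len(bits)) if bits[i - 1] == 0 and bits[i] == 1
)

bit_times_after_edge = len(bits) - first_rising
cycles_per_bit = CLOCK_HZ / BAUD
cycles_to_end = cycles_per_bit * bit_times_after_edge

print(bits)
print(bit_times_after_edge, round(cycles_to_end))
```

The result is only a starting estimate for the sweep, since it says nothing about what the bootloader does between receiving the byte and reaching the flag check.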

This is done with a simple guess and check—I sweep the location from around clock cycle 8000 to cycle 20000 (the upper end being a bit of a guess). The protocol used by the device responds with either an error that the protect flag is enabled, or an acknowledge that it is now waiting for the password. This feedback makes it simple for us to detect when the glitch is successful. My glitch was successful at cycle 8100, so either I was off on my timing of the trigger or the bit timing was a little faster than I expected.
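
The search itself is a plain linear sweep that stops at the first acknowledge. In this sketch `send_run_ram_command` is a hypothetical stand-in for the real sequence (reset, arm the glitch at the given cycle after the trigger, send 0x10, read the reply), and its success window is invented to sit near the cycle 8100 reported above.

```python
def send_run_ram_command(glitch_cycle):
    """Simulated target reply; the success window is made up."""
    if 8098 <= glitch_cycle <= 8102:
        return "ack"            # bootloader is now waiting for the password
    return "protect_error"      # bootloader reports the protect flag is set


def find_glitch_cycle(start=8000, stop=20000, step=1):
    """Return the first glitch location that yields an acknowledge."""
    for cycle in range(start, stop + 1, step):
        if send_run_ram_command(cycle) == "ack":
            return cycle
    return None


print(find_glitch_cycle())
```

A real sweep would also retry each location several times and allow for the target crashing, since a glitch at the right cycle does not succeed on every attempt.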

Once the glitch is successful, I feed the device the password (which I found with power analysis), and then finally the real bootloader which allows me to read back the flash memory. This was all done on the development board I built that used the TMP91FW27UG microcontroller (Figure 2), so the final task is to perform the attack on the real oven controller.

From Theory to Practice

As is often the case, things aren’t quite as simple in real life as they are in theory. To perform the attack in practice, I modified the board from Figure 1 to allow me to do both power analysis and fault injection. On the board, there was a rather convenient jumper marked “5V” which supplied the voltage for just the MCU. By desoldering this jumper I could add a shunt resistor, which gave me reasonably clean power traces. The MCU seemed to run at 3.3V, which eliminated the need for a voltage translator.

I also removed the 16MHz crystal and mounted a header. This would let me perform the clock glitching attack, very similar to the development board. I had to re-run the calibration cycle, since the physical setup of this board (along with the part number of the device) was different. You can see the setup in Figure 6. Note that it looks like Figure 2, but with the real target in place of my test board.

FIGURE 6
The topside of the board from Figure 1, now modified by removing the crystal and adding a shunt to allow power analysis. The board already had a header for the required UART and reset signals.

Once the glitch was successful, all that should have been left was to download the second-stage bootloader and read out the memory. Unfortunately, this stage always failed. After running the test many times, my glitch eventually triggered the “mass erase” function by accident. Oops! This meant I needed a new oven board (this type of damage is a risk of glitch attacks). But it also meant I had a blank chip I could perform more complete experiments on.

The problem turned out to be that the second-stage bootloader images I had weren’t correct for the TMP91FW60 chip on my oven controller. The location of several registers was different on this chip than on my development board, which meant the images wouldn’t work. I mentioned earlier that I had the source code for the second-stage bootloader, so I could see the exact registers I needed to modify.

In this case, it took changing the address of the INTES1 register and the value written to the INTCLR register to clear the interrupt. I didn’t even need to recompile the source code. Instead, I just modified the binary to replace the register addresses. The specifics of the bootloader changes are documented in my PyToshLoad repository in case you need them [3].
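
A binary patch of this kind can be sketched as a checked find-and-replace on the image bytes. The byte values below are placeholders and not real TLCS-900 opcodes or register addresses (the actual changes are in the PyToshLoad repository [3]); `patch_bytes` is a hypothetical helper.

```python
def patch_bytes(image, old, new, expected_count=1):
    """Replace a byte pattern, refusing to patch if the match is ambiguous."""
    if len(old) != len(new):
        raise ValueError("patch must not change the image length")
    count = image.count(old)
    if count != expected_count:
        raise ValueError(f"expected {expected_count} match(es), found {count}")
    return image.replace(old, new)


# Placeholder image and pattern, for illustration only.
image = bytes.fromhex("08f30a00f80e")
patched = patch_bytes(image, bytes.fromhex("f30a"), bytes.fromhex("f60a"))
print(patched.hex())
```

Checking the match count before replacing guards against silently patching an unrelated occurrence of the same bytes elsewhere in the image.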

With this fix (and after getting a new oven controller board), I was successfully able to download the firmware from my oven. Running it through the disassembler quickly showed me that I could easily identify the ADC operations (and thus temperature reads), as shown in Listing 2. The annotations and names in this listing are added by me, so it’s a “best guess” based on the program flow and hardware setup.

LISTING 2
This snippet of code reads the temperature sensor, and if I can patch the code to send this temperature out a serial port, I can finally understand what my oven is seeing.

;Start of function ADC_ReadTemps
0xff3952 PUSHW   0x0B             ;Channel AN11 (Upper Temp)
0xff3955 CALL    ADC_StartConv    ;Start ADC
0xff3959 INCL    0x2,XSP
0xff395b CALL    ADC_ReadResults  ;Load result into HL
0xff395f ADDW    (0x12AE),HL      ;Accumulate temp
0xff3964 NOP
0xff3965 NOP
0xff3966 PUSHW   0x0C             ;Channel AN12 (Lower Temp)
0xff3969 CALL    ADC_StartConv    ;Start ADC
0xff396d INCL    0x2,XSP
0xff396f CALL    ADC_ReadResults  ;Load result into HL
0xff3973 ADDW    (0x12B0),HL      ;Accumulate temp
0xff3978 INCB    0x1,(0x12B6)     ;Increment conversion count
0xff397d CPB     (0x12B6),0x32    ;Check conversion count
0xff3983 RET     C                ;Return if not yet at 0x32
0xff3985 LDB     (0x12B6),0x0     ;Clear conversion count
0xff398b LDW     WA,(0x12AE)      ;Load upper temp accumulator
0xff3990 EXTZL   XWA
0xff3992 DIVW    XWA,0x32         ;Divide by 0x32
0xff3996 LDW     (0x12B2),WA      ;Store avg upper to 0x12B2
0xff399b LDW     WA,(0x12B0)      ;Load lower temp accumulator
0xff39a0 EXTZL   XWA
0xff39a2 DIVW    XWA,0x32         ;Divide by 0x32
0xff39a6 LDW     (0x12B4),WA      ;Store avg lower to 0x12B4
0xff39ab LDW     (0x12AE),0x0     ;Clear accumulator
0xff39b2 LDW     (0x12B0),0x0     ;Clear accumulator
0xff39b9 LDB     (0x12B8),0x1
0xff39bf LDB     (0x12BA),0x1
0xff39c5 RET
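
The averaging logic in Listing 2 can be restated as a small Python model. This follows my annotations of the listing, so it carries the same “best guess” caveat; the class and attribute names are invented, and the two flag bytes written at the end of the routine are left out because their purpose isn't identified.

```python
N_SAMPLES = 0x32  # 50 conversions per average (operand of CPB and DIVW)


class TempAverager:
    """Model of ADC_ReadTemps; comments give the RAM address being modeled."""

    def __init__(self):
        self.acc_upper = 0      # 0x12AE: upper sensor accumulator
        self.acc_lower = 0      # 0x12B0: lower sensor accumulator
        self.count = 0          # 0x12B6: conversion count
        self.avg_upper = None   # 0x12B2: averaged upper reading
        self.avg_lower = None   # 0x12B4: averaged lower reading

    def add_reading(self, upper, lower):
        """Accumulate one ADC pair; return True when new averages are stored."""
        self.acc_upper = (self.acc_upper + upper) & 0xFFFF  # ADDW is 16-bit
        self.acc_lower = (self.acc_lower + lower) & 0xFFFF
        self.count += 1
        if self.count < N_SAMPLES:      # RET C: not yet at 0x32
            return False
        self.count = 0
        self.avg_upper = self.acc_upper // N_SAMPLES
        self.avg_lower = self.acc_lower // N_SAMPLES
        self.acc_upper = 0
        self.acc_lower = 0
        return True


averager = TempAverager()
updates = [averager.add_reading(500, 300) for _ in range(N_SAMPLES)]
print(averager.avg_upper, averager.avg_lower)
```

The averaged words at 0x12B2 and 0x12B4 are the natural values to send out a serial port, since they update once per 50 conversions rather than on every sample.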

This leaves the door open for a future article to discuss what I do with the firmware. But for now, I think you can see that the process of getting a working attack on this device was a great challenge that encompassed the perfect amount of reverse engineering, power analysis, and fault attacks. Sometimes it really is true that the journey is the reward. 

REFERENCES
[1] Toshiba-TLCS-900-L-Resources Repository: https://github.com/colinoflynn/Toshiba-TLCS-900-L-Resources
[2] chipwhisperer-target-cw308t Repository: https://github.com/newaetech/chipwhisperer-target-cw308t
[3] PyToshLoad Repository: https://github.com/colinoflynn/pytoshload
[4] Embedded Systems Essentials Repository: https://github.com/colinoflynn/circuitcellar-EmbeddedSystemEssentials
[5] Open-source ChipWhisperer-Jupyter Repository: https://github.com/newaetech/chipwhisperer-jupyter

RESOURCES
NewAE Technology Inc. | newae.com

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • MAY 2023 #394

Colin O’Flynn has been building and breaking electronic devices for many years. He is an assistant professor at Dalhousie University and also CTO of NewAE Technology, both based in Halifax, NS, Canada. Some of his work is posted on his website.

Copyright © 2024 KCK Media Corp.
