Breaking a Password with Power Analysis Attacks

Breaking a Password with Power Analysis Attacks

In his previous column, Colin showed how timing attacks could be used to break a password check. This article brings out a more advanced type of attack called a power analysis attack, which exploits small leaks about internal states of a microcontroller to recover the password.

By Colin O’Flynn

Article originally published in Circuit Cellar June 2017, Issue #323

Last month, I introduced a type of attacks on embedded systems called power analysis attacks. I used these to attack a simple PIN code check, where the power analysis attack told us what steps the code was performing. This was possible because different instructions had unique signatures we could see in a detailed measurement of the power of the device as it was performing operations. I won’t replicate the hardware setup I discussed in the previous column, but again the example figures here will be measured on my open-source ChipWhisperer-Lite platform.

I’ll be returning to the PIN code check I have in Listing 1. This code uses an XOR of the input PIN code (could be a password or anything else) with the correct code. If the input and correct code are the same value, the result of all the XORs will be zero. If a single bit differs, the XOR will output a 1 for that bit. The accumulating OR circuit will then keep that bit set to “1” for the remainder of the comparisons.

int check_pin( uint8_t entered_pin[]){
 uint8_t correct_pin[4] = {1,2,3,4};

 uint8_t pin_fail = 0;

 for (int i = 0; i < 4; i++){
 pin_fail |= correct_pin[i] ^ entered_pin[i];
 }
 
 if(pin_fail){
 return 0;
 } else {
 return 1;
 }
}
Listing 1
This password check code came from my previous column, as it was written to avoid timing attacks. We’re going to use a more advanced type of attack in this column to break the code.

BACKGROUND
Let’s begin with a little background. Consider a digital device like our microcontroller. Internally, it has a data bus, which moves data from one section (e.g., a register) to another section (e.g., the arithmetic logic unit, or ALU).

Is there some way an external observer could detect details of that data? It turns out there might be, and it might be a lot easier than you expect. That data bus contains a number of lines, which we can model as capacitors. Changing the logic state of those lines is the same as changing the voltage on those lines, as in Figure 1.

OFlynn #323 - Figure 1

Figure 1
Changing the voltage on an internal data bus is equivalent to charging or discharging a capacitor, something that takes a tiny amount of energy.

While changing the voltage on a capacitor takes energy—a tiny amount of energy—but it still physically requires a little bit of power. When four data lines change from a 0 to a 1 state, it actually takes more power than when only one of the data lines change state. And when it comes to a microcontroller, as we make a more complete picture, things get even easier for us. Most buses on microcontrollers use a precharge state, which you can consider a state partway between a 0 and a 1.

To transfer data on the bus, the bus goes from this intermediate state, to the final state, and then back to the intermediate state. What this means for us is the amount of power consumed may depend not on the difference between number of bits set in the data, but in fact just on the number of bits in the data. For example, if you transfer 0xFF on the data bus, you’ll see a slightly higher spike at that instant in time than if you transferred 0x00 on the data bus. This probably still seems a little abstract, so let’s keep working and see two different ways this can be used to break the XOR code of Listing 1.

DPA ATTACK
The first attack I’ll discuss will be the “classic” differential power analysis (DPA) attack, which was published by Paul Kocher, Joshua Jaffee, and Benjamin Jun in the paper entitled “Differential Power Analysis” around 1999. For this attack to work, assume we have a method of sending in a four-digit guess for the pin-code of Listing 1, and we can trigger such that we can record the power consumption around when the XOR is happening. We don’t need to guarantee we get the exact moment; just that we know roughly when the XOR test is happening. Practically, this can be pretty easy. You know at some point after sending the input data the XOR will happen, so you just need to record a section of power after sending the input data.

Next, assume we could send a bunch of wrong guesses. For each wrong guess, we record the guess and the power trace of the system processing this guess. Figure 2 shows a number of such power traces overlaid on each other. Notice that the traces are mostly uniform, but certain small areas seem to have minor differences.

OFlynn-323-F2

Figure 2
An example power trace as the code in Listing 1 is executed an a XMEGA device.

Next, we’ll do the most important part, which is to take the power traces and move them into two groups. Our attack will work by looking at a single bit of the secret pin at a time. Let’s say we want to get the value of byte 0, bit 0. Taking our set of known inputs and associated power traces, we can split them into two groups—one where byte 0, bit 0 is “0” and one where that same bit is “1.” We’ll take the average of these two groups to end up with two traces. Finally, taking the difference between these “average” traces (a difference of means) tells us specifically where the amount of power varied for each operation.

What has all this fuss accomplished? First off, we’d expect to see a very small spike in power consumption at the point that byte 0 is manipulated. If bit 0 of byte 0 is “1,” it will take a tiny bit more power than when that bit is “0.” “But what about the other bits?” you might ask, as they are also being flipped. The rest of the bits are set to random values, so the average of them should be the same between the two groups. The only difference between those groups was the value of byte 0, bit 0. And it’s that bit we are concentrating on.

Then there will be a second spike, as the “correct” PIN code is a constant that will basically either flip (if the bit of the pin-code is “1”) or not flip (if the bit of the PIN code is “0”) that spike. This is shown in Figure 3, where the bit of the secret key is “1,” so we see two opposite polarity spikes. These are from real measurements performed on Listing 1 running on an Atmel XMEGA microcontroller measured with my ChipWhisperer-Lite. These tiny differences are clear as day—it might seem impossible from the text, but it works in real life!

OFlynn-323-F3

Figure 3
This shows the power difference when attacking a single bit of a password byte. I’ve averaged two groups of traces and subtracted them to see the difference between the groups. See Listing 2 for the code that generated this plot.

And as in my other article, I encourage you to try this yourself. This is something you can measure with a regular oscilloscope and using a shunt resistor in the voltage line of a microcontroller, as discussed in my April 2017 column.

If you need a hint, the code in Listing 2 shows a simple Python listing that performs this splitting of an array of data into two groups, averages them, does the difference, and plots this for you. This will give the value of a single bit of the secret key.

from chipwhisperer.common.api.CWCoreAPI import CWCoreAPI
from matplotlib.pylab import *

cwapi = CWCoreAPI()
cwapi.openProject(‘xortest_1000.cwp’)

tm = cwapi.project().traceManager()
number_traces = tm.numTraces()

zerolist = []
onelist = []

for tnum in range(0, number_traces):
 entered_pin = tm.getTextin(tnum)
 trace_data = tm.getTrace(tnum)

 #Get value of bit 1 in data we sent
 bit_value = entered_pin[0] & 0x02
 
 #Seperate into group based on bit value
 if bit_value:
 onelist.append(trace_data)
 else:
 zerolist.append(trace_data)
 
#Take mean of both groups of traces
one_mean = np.mean(onelist, axis=0)
zero_mean = np.mean(zerolist, axis=0) 

#Get difference
diff = one_mean - zero_mean

plot(diff)
Listing 2
This Python code performs a single-bit DPA attack, by attempting to determine the value of bit 0 of the key. The resulting plot is given in Figure 3.

BREAKING A REAL SYSTEM
Moving from that single-bit break to a real system requires little more than taking the same power traces, and iterating through each bit and byte to recover the complete value. You’ll be able to get the entire PIN code (or password) out of the system, even though there appears to be no timing or similar errors.

As a test, we can do this for the case where we know the “secret key.” I’ve done this for Byte 0 in Figure 4, where you can see all the bits with a certain state have a positive power difference, and all the bits with the opposite state have a negative power difference. The red and blue coloring is only possible as I know the secret key, if I hadn’t known it we would recover it based on the difference direction.

OFlynn-323-F4

Figure 4
This shows differences for all 8 bits of a guessed password byte, where red power traces are bits where the value of the key-bit ‘0’, and blue power traces are values of the key are ‘1’. You can see all the bits of each value go in opposite directions.

A complete attack is shown in Listing 3. Note that I just consider a single point to determine if the bit is a “0” or a “1.” This point moves for each byte. Because this is an 8-bit microcontroller, the byte moves further in time every 8 bits that are processed. If I had a 32-bit microcontroller then it could have processed 4 bytes at once, for example. But looking at the difference traces (such as in Figure 3) helps you determine where exactly to look for a large difference, even if you don’t know much about the device you are attacking or how the code works. The only tricky part is getting a nice trigger. In many systems, this can be done by triggering on the communication line. For example, if you have a UART protocol to send the password, you can trigger when you see the last byte go over the UART.

from chipwhisperer.common.api.CWCoreAPI import CWCoreAPI
from matplotlib.pylab import *

cwapi = CWCoreAPI()
cwapi.openProject(‘xortest_1000.cwp’)

tm = cwapi.project().traceManager()
number_traces = tm.numTraces()

for byte in range(0, 4):
    recovered_byte = 0
    for bit in range(0, 8):
        zerolist = []
        onelist = []
        for tnum in range(0, number_traces):
            entered_pin = tm.getTextin(tnum)
            trace_data = tm.getTrace(tnum)
            
            #Get value of bit in input guess for this trace            
            bit_value = entered_pin[byte] & (1<<bit)
            
            #Seperate into group based on bit value
            if bit_value:
                onelist.append(trace_data)
            else:
                zerolist.append(trace_data)
        #Take mean of all traces where one, all traces where zero
        one_mean = np.mean(onelist, axis=0)
        zero_mean = np.mean(zerolist, axis=0)        
        #Get difference
        diff = one_mean - zero_mean
        
        #Based on our graphical plotting, we identified point 129 in byte 0
        #and that point occurs 92 samples later in each successive byte
        print “byte %d, bit %d = “%(byte, bit),
        if diff[129 + 92*byte] < 0:
            print “0”            
        else:
            print “1”
            recovered_byte |= (1<<bit)
    print “Guess for byte %d: 0x%02x”%(byte, recovered_byte)
}
Listing 3
This is Python code for breaking complete system iterates through the test done in Listing 2. (See text for details.)

You can even get fancy by triggering on patterns in the analog waveform. Certain oscilloscopes provide this capability, and it’s possible with custom hardware such as I built for the ChipWhisperer-Pro (a higher-end version of the same capture hardware). But in most practical cases it’s enough to trigger on communication lines that are already present. The open-source ChipWhisperer software I’m using here also has capabilities to resynchronize traces with some “jitter” in them by looking for patterns that appear in both traces and lining them up.

Hopefully, this article has opened your eyes to how it’s possible to attack real systems using side-channel power analysis. This is just the tip of the iceberg for advanced hardware attacks that are possible, and I’ll be sharing more of these with you in the coming columns.

If you want more detailed examples, I’ll link them from a blog post for this article on oflynn.com, but they are all part of the open-source ChipWhisperer project. I’m creating some unique examples for my columns here, but the overall goals will be the same.

Read this article in the June #323 issue of Circuit Cellar

Stay informed, subscribe today:

 

Single issues can be purchased in the  CC-Webshop

Reliability and Failure Prediction: A New Take

HALT methodology has been a popular way to test harsh environment reliability. A new approach involves PCB design simulation for vibration and acceleration for deeper yet faster analyses.

By Craig Armenti & Dave Wiens—Mentor Board Systems Division

Many electronic products today are required to operate under significant environmental stress for countless hours. The need to design a reliable product is not a new concept, however, the days of depending on a product’s “made in” label as an indicator of reliability are long gone. PCB designers now realize the importance of capturing the physical constraints and fatigue issues for a design prior to manufacturing to reduce board failure and improve product quality.

Simulation results should be available in a two-phase post-processor for each simulation, providing broad input on the PCB’s behavior under the defined conditions.

Simulation results should be available in a two-phase post-processor for each simulation, providing broad input on the PCB’s behavior under the defined conditions.

Although every product is expected to fail at some point. That’s inevitable. But premature failures can be mitigated through proper design when proper attention is paid to potential issues due to vibration and acceleration. ….

Read this article in the August 325 issue of Circuit Cellar

Not a Subscriber yet? Become one today:

 

Or purchase the August 2017 issue at the  CC-Webshop

 

Power Analysis of a Software DES Encryption Routine

This article continues the foray into breaking software security routines, now targeting a software implementation of DES. This builds on a previous example of breaking a hardware AES example.

By Colin O’Flynn

In the previous column, I broke a simple XOR password check using side-channel power analysis. How can we apply this to more complex algorithms though? In my Circuit   Cellar   313   (August   2016) story, I demonstrated how to break the AES encryption standard running on a FPGA.

The EFF’s “Deep Crack” board could brute force a DES key in a matter of days. (Photo courtesy of Electronic Frontier Foundation)

The EFF’s “Deep Crack” board could brute force a DES key in a matter of days. (Photo courtesy of Electronic Frontier Foundation)

While I originally considered breaking a software implementation of AES in this column, there was just too much overlap between those columns. So instead I decided to pick on something new. This time, I’ll cover how we can break a software implementation of DES. The actual process ends up being very similar. But by using a different algorithm, it might help give you a bit of perspective on how the underlying  attack  works.  ….

Read this article in the August 325 issue of Circuit Cellar

Not a Subscriber yet? Become one today:

 

Or purchase the August 2017 issue at the  CC-Webshop