On an FPGA
Soft-core RISC-V designs allow you to run a custom microcontroller (MCU) on a field-programmable gate array (FPGA). This article introduces the higher-performance Ibex RISC-V core, and then demonstrates how it generates similar power signatures to a “real,” hard-core MCU. This sets the stage for future experimentation, since you can evaluate the security of an “off-the-shelf” design, and then add in features you might need such as specific cryptographic accelerators.
This article is a follow-up to my piece in the May 2022 issue of Circuit Cellar (“Taking a Look at RISC-V Power Analysis,” Circuit Cellar 382, May 2022), where I showed how you can run the NEORV32 core on a small iCE40 FPGA target. This iCE40 FPGA target was part of my ChipWhisperer-Husky project. At the time, I also planned on building a larger FPGA target using the Artix-7 35T, the same FPGA used on the popular Digilent Arty board.
The hardware setup for this is shown in Figure 1, where the Artix-7 35T target is on the right side of the photo. This hardware target is specially adapted to allow me to perform power analysis and fault injection on it, so we can do more than just run the soft core as normal.
In the previous article, I explained how RISC-V has an open-source Instruction Set Architecture (ISA). This has opened the door to a wide variety of RISC-V implementations: everything from proprietary commercial implementations to small hobbyist implementations. This article will use the Ibex RISC-V core, instead of the NEORV32.
NEORV32 AND IBEX
At a glance, the cores may seem similar. Both have various options that can be enabled to tune the size of the core, such as turning on or off compressed instructions (“C” extension) and hardware multipliers (“M” extension). Differences become more obvious once you start exploring the performance, in terms of instructions per cycle. The Ibex core targets higher performance than the NEORV32 core. For example, with the Ibex core you have the option of single-cycle multipliers. The Ibex core also has additional performance features you can enable, such as going from a two-stage pipeline to a three-stage pipeline (the third stage that is added is a “writeback” stage), and enabling a dedicated branch-address calculator to reduce latency with branch instructions.
The other major difference with Ibex is that it supports a high level of design verification for use when mistakes are expensive, such as with application-specific integrated circuits (ASICs). This isn’t to say NEORV32 isn’t suitable for such use; it is simply that Ibex has emphasized this verification to a higher degree. In practice, you’ll notice subtle differences between projects. For example, I find that the NEORV32 configuration examples are more diverse, whereas with Ibex the configuration examples emphasize the verified designs. The Ibex repository also shows the status of its tests (1,380 as of my writing), which run automatically every night.
To get an idea of the performance difference, see Table 1, which is based on CoreMark/MHz numbers published by both projects. This shows that there is a range of options for both cores; on both, the performance option provides about three times higher CoreMark/MHz than the smallest option. The Ibex skews higher, but the area (not shown in the table) of the Ibex core is also larger than that of the NEORV32. As I mentioned, I previously used the small iCE40 FPGA to run the NEORV32 core, which is not currently possible with the Ibex core.
The general design of the Ibex core is shown in Figure 2. This figure shows the pipeline at the center of the Ibex design. The first two stages, “Instruction Fetch” and “Decode and Execute,” are always present. The third stage, “Writeback,” is optional. If the writeback stage isn’t present, the output is directly written to the target (register file or ALU).
Of course, this is just a core, without any of the surrounding blocks you need to actually use the thing. For this, we can turn to the Ibex demo system repository, the link for which is available on Circuit Cellar’s Article Materials and Resources web page. This wraps the core with the required blocks such as RAM, a serial port, GPIO, and a debug module.
FROM IBEX DEMO TO CHIPWHISPERER
The Ibex demo system repository is designed to work with the Arty A7-35T board. We designed a similar board using the Artix-7 35T. My colleague, Jean-Pierre Thibault, ported the demo platform to our board and got our various security examples running on it. We designed our own board so that it works with the ChipWhisperer system, as in Figure 1.
A more detailed block diagram is shown in Figure 3, which also shows the additional blocks around the Ibex core that provide the serial and GPIO interfaces. One advantage of this setup is that we can use the JTAG support built into the ChipWhisperer to debug our Ibex soft core, just as if it were a normal microcontroller (MCU) core.
An interesting feature here is that the Ibex demo core supports using the “existing” FPGA JTAG port. The normal FPGA JTAG port would be used for configuration of the FPGA fabric, along with any debug running in the FPGA fabric itself. But you can also expose this JTAG core to the internal logic running in the FPGA, which means this JTAG port can also serve to debug the soft-core RISC-V processor core. This can be helpful when using an FPGA board where you have an existing debug interface hooked to the FPGA JTAG port, as you can (likely) repurpose it to also serve as the debugger for the RISC-V core.
The board from Figure 1 has some other features not found on standard FPGA targets. In particular, a shunt is used to measure the power consumption of the soft core running inside the FPGA. This lets us perform the sort of power analysis tests that I’ve done in previous columns, and I’ll show you a demonstration of two different power analysis attacks on firmware running in my Ibex soft core.
SIMPLE POWER ANALYSIS
The first thing we’ll look at is how the instructions themselves have a noticeable power signature. For this, I’m going to simply run AES. A measurement of the power used by my Ibex core running on the FPGA is shown in Figure 4.
The AES-128 encryption operation runs in 10 rounds. You can see the repeating nature of the first two rounds, which I’ve annotated in the figure. You can also see the structure within each round. This type of information would be useful if we had an algorithm with conditions based on secret data. Such algorithms include password checks and many implementations of asymmetric cryptography (such as RSA). But for attacking AES, we need to use a more powerful form of power analysis.
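To make “conditions based on secret data” concrete, here is a minimal sketch in Python. It is not the firmware used in this article, and the function names are hypothetical. The early-exit check stops at the first wrong byte, so the amount of work it does (and therefore the length of the matching pattern you would expect in a power trace) depends on how much of the guess is correct. The second version always touches every byte.

```python
def leaky_check(guess, secret):
    """Early-exit compare: returns (match, number of bytes compared).

    The loop stops at the first mismatch, so the work done depends on
    how many leading bytes of the guess are correct.
    """
    if len(guess) != len(secret):
        return False, 0
    compared = 0
    for g, s in zip(guess, secret):
        compared += 1
        if g != s:
            return False, compared
    return True, compared


def constant_check(guess, secret):
    """Compare every byte regardless of mismatches (equal lengths assumed)."""
    if len(guess) != len(secret):
        return False, 0
    compared = 0
    diff = 0
    for g, s in zip(guess, secret):
        compared += 1
        diff |= g ^ s  # accumulate differences without branching on them
    return diff == 0, compared


if __name__ == "__main__":
    secret = b"abcd"  # hypothetical stored password
    for guess in (b"Xbcd", b"aXcd", b"abXd", b"abcd"):
        print(guess, leaky_check(guess, secret), constant_check(guess, secret))
```

In the leaky version, an attacker who can count the loop iterations in a trace can recover the password one byte at a time instead of guessing it all at once.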
DIFFERENTIAL POWER ANALYSIS
A more powerful attack called differential power analysis (DPA) takes advantage of differences in power consumption based only on data stored on the internal databus. This relies on the fact that the power consumption differs based on the Hamming weight (that is, the number of 1’s) of data on the internal databus. This difference is a small but statistically significant amount of information that the processor will leak.
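As a quick illustration of the Hamming weight, the short Python sketch below counts the 1 bits in a few example 32-bit words. The helper name and the example values are my own choices for illustration.

```python
def hamming_weight(value):
    """Return the number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


if __name__ == "__main__":
    # Example 32-bit bus values, chosen only for illustration.
    for word in (0x00000000, 0x000000FF, 0xA5A5A5A5, 0xFFFFFFFF):
        print(f"0x{word:08X} has Hamming weight {hamming_weight(word)}")
```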
I’ve detailed the attack in previous articles (such as “Programmable Logic in Practice: Breaking Unbreakable Cryptography,” Circuit Cellar 313, August 2016), so I won’t repeat how it works here. But the quick summary is that we use a guess-and-check operation, where we “guess” the value of a single byte of the secret key, and we “check” by comparing the power consumption of our physical device with a simple model of how we expect the power consumption to behave. Only when the model and physical device match is our guess of the secret key byte likely to be correct.
If you haven’t run across this before and it sounds like magic, see my previous article, or check out the open-source labs that are part of the ChipWhisperer project and which explain this in more detail.
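The guess-and-check idea can be sketched in a few lines of Python. This is not the attack code from the article's repository: it runs on simulated measurements rather than real traces, and it assumes the leaking value is the Hamming weight of the first-round AES S-box output, which is a common choice but an assumption here. The function names, noise level, and trace count are all hypothetical.

```python
import random


def build_aes_sbox():
    """Build the AES S-box from its definition (GF(2^8) inverse + affine map)."""
    exp, log = [0] * 256, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        doubled = (x << 1) ^ (0x11B if x & 0x80 else 0)  # x * 2 in GF(2^8)
        x ^= doubled                                      # x * 3 = (x * 2) xor x
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else exp[(255 - log[a]) % 255]
        s = inv
        for shift in (1, 2, 3, 4):                        # affine transform
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def hw(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def simulate_measurements(key_byte, count, rng, noise_sigma=1.0):
    """Simulated device: one power sample per encryption, HW leakage + noise."""
    plaintexts = [rng.randrange(256) for _ in range(count)]
    power = [hw(SBOX[p ^ key_byte]) + rng.gauss(0.0, noise_sigma)
             for p in plaintexts]
    return plaintexts, power


def recover_key_byte(plaintexts, power):
    """Guess every key byte value; keep the one whose model best matches."""
    best_guess, best_corr = 0, -1.0
    for guess in range(256):
        model = [hw(SBOX[p ^ guess]) for p in plaintexts]
        corr = abs(pearson(model, power))  # abs: probe polarity may be inverted
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess, best_corr


if __name__ == "__main__":
    rng = random.Random(1)
    plaintexts, power = simulate_measurements(0x2B, 500, rng)
    guess, corr = recover_key_byte(plaintexts, power)
    print(f"best guess 0x{guess:02X} with |correlation| {corr:.2f}")
```

A real attack repeats this for every sample point in the trace and for each of the 16 key bytes; the sketch keeps a single sample per trace to show only the guess-and-check step.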
An important part of this is the idea that we can see the difference between how many 1’s are on an internal databus (the Hamming weight). The Hamming weight of the internal databus and the power consumption have a strong linear correlation. Using the soft core running on the FPGA, I measured the average power consumption at a given point in time when the processor is handling a specific Hamming weight of data. I plotted this power consumption against the Hamming weight in Figure 5.
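The averaging step described above can be sketched as follows. This is a minimal Python example on synthetic data, not the measurements behind Figure 5: the baseline, slope, and noise values are made up, the samples are in arbitrary units, and the function names are hypothetical. It groups the power samples by the Hamming weight of the data being handled, averages each group, and fits a straight line through the averages.

```python
import random
from collections import defaultdict


def mean_power_by_hw(values, samples):
    """Average the power samples for each Hamming weight of the handled data."""
    groups = defaultdict(list)
    for value, sample in zip(values, samples):
        groups[bin(value).count("1")].append(sample)
    return {weight: sum(g) / len(g) for weight, g in sorted(groups.items())}


def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def synthetic_capture(count, rng, baseline=0.10, per_bit=0.002, noise=0.005):
    """Made-up data: power = baseline + per_bit * HW + Gaussian noise."""
    values = [rng.randrange(256) for _ in range(count)]
    samples = [baseline + per_bit * bin(v).count("1") + rng.gauss(0.0, noise)
               for v in values]
    return values, samples


if __name__ == "__main__":
    rng = random.Random(0)
    values, samples = synthetic_capture(20000, rng)
    means = mean_power_by_hw(values, samples)
    for weight, mean in means.items():
        print(f"HW {weight}: mean power {mean:.4f}")
    slope, intercept = fit_line(list(means), list(means.values()))
    print(f"fitted slope per bit: {slope:.4f}, intercept: {intercept:.4f}")
```

Even though the noise on a single sample is larger than the per-bit step in this synthetic data, the averages fall close to a straight line, which is the effect the plot relies on.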
You’ll probably be amazed at how linear the relationship looks, especially as this is a soft-core processor running inside an FPGA, so there’s a lot of “logic” going on to run this experiment. But fundamentally we see minor variations in power consumption, and we can correlate this power consumption with the data being processed. The existence of this correlation allows the power analysis attack to succeed. In fact, we can use simplified metrics to detect whether there is a correlated signal and so better measure this leakage (a topic for the next article).
CUSTOMIZING RISC-V FOR SECURITY
In this article, I ran a standard software AES implementation on the RISC-V core, which leaked much like it would on any other MCU. But unlike a standard MCU, we have control over the low-level implementation details.
One of the emerging topics in RISC-V is the inclusion of AES acceleration operations. Currently, the instructions are like those in Arm and Intel processors, which accelerate each round of the AES operation. Unfortunately, such instructions mean that the same sort of data leakage you see in Figure 5 will be present. This leakage occurs since the “intermediate” values are written back to registers, and this writeback will likely see some Hamming weight (or related) leakage.
But RISC-V is also looking at the inclusion of an all-rounds instruction, where a single instruction would execute a full AES operation. This type of operation could include protection against the side-channel leakage I described here. At this time it is not clear whether this will be part of the final specification, but it’s interesting as it shows that we could get such powerful instructions as part of the core instruction set.
In the meantime, most high-security AES cores will likely use peripheral-specific register sets (just like existing hardware AES accelerators do). But the flexibility of RISC-V means you can do much more than just use peripheral sets—you can experiment with architecture-level changes, and the flexibility of a soft-core FPGA board makes it possible to do this without fabricating a new chip. You can test these changes to understand the security impact immediately as well.
Circuit Cellar’s Article Materials and Resources page has a GitHub link where you can see the code for the attack demonstrations from this article. In my next piece, I’ll introduce you to some methods of evaluating the leakage of various cryptographic cores and implementations using “Test Vector Leakage Assessment” (TVLA), sometimes called the T-Test. But I wanted to give you an updated platform first where you could see why we might want to compare variations of a processor design, and I hope this article gave you some inspiration to get your hands dirty with one of the many RISC-V soft-core processors yourself.
O’Flynn, Colin: Taking a Look at RISC-V Power Analysis. Circuit Cellar, Issue 382, May 2022, p. 64-67.
lowRISC, Ibex repository: https://github.com/lowRISC/ibex
lowRISC, Ibex demo system repository: https://github.com/lowrisc/ibex-demo-system
O’Flynn, Colin: Programmable Logic in Practice: Breaking Unbreakable Cryptography. Circuit Cellar, Issue 313, August 2016, p. 42-47.
Code for attack demonstrations from this article: https://github.com/colinoflynn/circuitcellar-EmbeddedSystemEssentials/tree/main/396_2023_07_JULY
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JULY 2023 #396
Colin O’Flynn has been building and breaking electronic devices for many years. He is an assistant professor at Dalhousie University and also CTO of NewAE Technology, both based in Halifax, NS, Canada. Some of his work is posted on his website (see link above).