The ChipWhisperer-Lite is an open-source tool for power analysis and fault injection, and Colin described its design in an article six years ago. Since then, the world has moved on. Here, he talks about some of the adjustments required to build a new version of the ChipWhisperer, using a more recent FPGA while addressing the supply chain headaches that are affecting hardware builds in 2021 and beyond.
Back in 2015 I wrote an article, “USB-to-FPGA Communications” (Circuit Cellar 299, June 2015), describing the open-source ChipWhisperer-Lite, a device I designed for working with side-channel power analysis and fault injection. This used a Xilinx Spartan 6 FPGA alongside a microcontroller (MCU), resulting in a highly flexible architecture that lets you perform power measurements of devices along with clock and voltage fault injection.
I’ve used this device (or the higher-end version, called the ChipWhisperer-Pro) in many of my articles since then. But the Spartan 6 used in this device has been outdated for some time, and does not work with modern design tools. In this article, I’ll be describing the next generation of tools developed to help you work with power analysis and fault injection. This will start with the ChipWhisperer-Husky, which has been in development—and even has a few units in the field—since early 2021.
One major change with the ChipWhisperer-Husky is that I’m now describing the work of our small team. The majority of the real development (FPGA design, MCU firmware and Python interface) has been done by my colleagues Jean-Pierre Thibault and Alex Dewar. On top of this change in how the design gets done, supply chain challenges make launching a hardware product in 2021 much more difficult than it was in 2015, so the iterations on the design have come more sporadically than in the past. But enough of where we’ve come from; let’s see where we ended up.
To skip right to the end, the ChipWhisperer-Husky main unit is shown in Figure 1. This is our current “prototype,” but is effectively the same as the production units. It’s heavily modeled on our PhyWhisperer-USB, which was a completely open-source USB sniffer and triggering tool I’ve used in a few previous articles.
Compared to the ChipWhisperer-Lite, the major physical difference is the addition of an enclosure, which is helpful when working with the device on your desk. The ChipWhisperer-Husky also adds I/O pins: a second 20-pin header on the top side of the unit allows an 8-bit data bus (eight data lines and one clock line) to be mapped out. This topside header also matches the standard 20-pin Arm JTAG header, which will allow some unique features such as triggering on Arm trace packets.
The Arm trace decoder and trigger is currently an alternate build for the PhyWhisperer-USB and, depending on the available FPGA space, may also fit in the ChipWhisperer-Husky. But we are leaving lots of room for future growth in the hardware here.
The actual PCB is shown in Figure 2. The oddly angled parts might cause some people a bit of mental anguish, I admit! But we were trying to fit them around the mounting holes of the enclosure, and we made a late addition of a phase-locked loop (PLL) chip for the ADC, which further cramped the space. Let’s look at what all those parts do next.
The block diagram is shown in Figure 3. Starting at the left, we’ve got our USB interface. It uses the same Microchip Technology (formerly Atmel) SAM3U MCU used in the ChipWhisperer-Lite (and PhyWhisperer-USB), which talks to the FPGA over an address/data bus. Besides the FPGA and MCU, the ADC is the other main star, here upgraded to a 12-bit 200MS/s ADC (up from the 10-bit 105MS/s ADC on the ChipWhisperer-Lite). A big part of ChipWhisperer is being able to synchronize the clock of the ADC to the clock of the target device, something you can’t do with a regular oscilloscope. This takes a lot of effort in the clock routing. Why go to such effort?
The way a regular oscilloscope samples is shown in Figure 4. You’ll notice there is a small delay from the rising edge of the device clock to the rising edge of the sample clock. This delay changes on every rising edge of the device clock, since the two clocks have no fixed phase relationship (they are asynchronous).
For the purpose of power analysis, we are trying to determine what a device is doing on certain clock edges, so this jitter adds noise to our measurement. The ChipWhisperer instead uses synchronous sampling, which drives the ADC from the target device clock. This requires extra effort, since we can’t just consider the ADC clock by itself. But the advantage is that there is almost no jitter between the ADC measurements and the “source” of our signal, giving us a much higher effective signal-to-noise ratio (SNR) compared to driving the ADC asynchronously.
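The difference between the two sampling schemes can be illustrated numerically. The following is only a sketch under stated assumptions, not a model of the actual hardware: the 8MHz device clock, the slightly off-nominal 100MS/s free-running sample clock, and the 4x synchronous multiplier are all hypothetical values chosen for illustration, and both clocks are idealized as jitter-free edges starting at t = 0.

```python
from fractions import Fraction
import math

def edge_to_sample_delays(device_hz, sample_hz, n_edges):
    """Exact delay (in seconds) from each device-clock rising edge to the
    first sample-clock edge at or after it. Both clocks start at t = 0."""
    t_dev = Fraction(1, device_hz)
    t_smp = Fraction(1, sample_hz)
    delays = []
    for k in range(n_edges):
        edge = k * t_dev
        n = math.ceil(edge / t_smp)  # index of the next sample edge
        delays.append(n * t_smp - edge)
    return delays

def spread_ns(delays):
    """Peak-to-peak variation of the delays, in nanoseconds."""
    return float((max(delays) - min(delays)) * 10**9)

DEVICE_HZ = 8_000_000  # hypothetical target clock

# Free-running scope clock: nominally 100MS/s, slightly off (assumed value).
free_running = edge_to_sample_delays(DEVICE_HZ, 100_000_007, 1000)

# Synchronous clock: an integer multiple of the device clock (4x is assumed).
synchronous = edge_to_sample_delays(DEVICE_HZ, 4 * DEVICE_HZ, 1000)

print("free-running spread (ns):", spread_ns(free_running))
print("synchronous spread (ns): ", spread_ns(synchronous))
```

In the free-running case the delay wanders across the whole sample period, while in the synchronous case every device clock edge lands at the same position relative to a sample edge.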
With the ChipWhisperer-Lite, this is done with blocks inside the Spartan 6 FPGA. The PLL blocks in an FPGA have much higher jitter than an external PLL, however, and the ADC datasheet even goes out of its way to warn you not to drive the ADC clock from an FPGA directly.
Despite such warnings, I admit to first trying to drive the ADC directly from the FPGA, and the results were better than I expected. I came close to leaving out the PLL chip, as the difficulty of sourcing components in 2021 made it attractive to reduce the total number of “active” parts in the design. But one downside of the change from the Spartan 6 to the Xilinx Artix 7 is that some of the “minimum clock frequencies” increased, meaning that for typical embedded use, where a device might run on an 8MHz clock, we were on the edge of the allowable frequency ranges. Using an external PLL means the ChipWhisperer-Husky accepts a very wide range of input frequencies, so in the end we added the PLL chip to the design.
GLITCHING AND MONITORING
The other major focus of the ChipWhisperer-Husky is generating signals you can use to drive glitch generators. For this we still use the blocks inside the FPGA, which I’ve described in previous articles when discussing glitching with the ChipWhisperer-Lite.
Getting feedback on the type of glitching you are inserting has always been tricky, since you are measuring very fast signals. The glitches may be only nanoseconds wide, meaning the required analog bandwidth is very high and calls for a very fast oscilloscope or logic analyzer. So, we’re solving this by building a logic analyzer inside the ChipWhisperer-Husky, meaning it can measure the “source” signals without all the bandwidth issues that come into play once you try adding probes onto the lines. Of course, this still requires a fast sample rate, but a fast-sampling logic analyzer is something we can do inside the FPGA.
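To put rough numbers on the sample rate requirement, here is a small arithmetic sketch. The function names and the 2ns example glitch are assumptions made for illustration; the article does not state the logic analyzer’s actual sample rate. The 200MS/s figure is used only as a familiar reference point (it is the ADC rate quoted earlier, not a logic analyzer specification).

```python
PS_PER_SECOND = 10**12

def min_sample_rate_hz(pulse_width_ps, min_samples):
    """Lowest sample rate that guarantees at least `min_samples` samples
    land inside a pulse of the given width, whatever the sampling phase."""
    return min_samples * PS_PER_SECOND / pulse_width_ps

def guaranteed_samples(pulse_width_ps, sample_rate_hz):
    """Worst-case number of samples that land inside the pulse."""
    return (pulse_width_ps * sample_rate_hz) // PS_PER_SECOND

GLITCH_PS = 2_000  # hypothetical 2ns glitch

print("rate for 2 samples per glitch (Hz):", min_sample_rate_hz(GLITCH_PS, 2))
print("worst-case samples at 200MS/s:", guaranteed_samples(GLITCH_PS, 200_000_000))
print("worst-case samples at 1GS/s:  ", guaranteed_samples(GLITCH_PS, 1_000_000_000))
```

A worst-case count of zero means a glitch of that width can fall entirely between two samples and be missed, which is why the sampling has to run much faster than the clocks being observed.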
An example output of this is shown in Figure 5. This design is entirely the brainchild of J-P Thibault, which shows what you can accomplish with a small team compared to trying to do everything yourself! In this example it’s being used to debug the glitch generation logic by capturing the original source clock (the top square wave) along with the two phase-shifted versions that are then turned into the glitchy clock (the second and third square waves from the top). By adjusting the phase shift we can adjust where in the output clock the glitch happens, and this feedback lets us see exactly what we are injecting.
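The idea of building a glitch from two phase-shifted clocks can be sketched in a few lines. This is an illustrative model only: the article does not spell out the combining logic, so forming the glitch pulse as “first shifted clock AND NOT second shifted clock” and inserting it by XOR with the source clock are assumptions here, as are the step counts. Time is modeled in discrete steps per clock period.

```python
def square(step, steps_per_period, shift=0):
    """50% duty-cycle square wave, delayed by `shift` steps."""
    return 1 if (step - shift) % steps_per_period < steps_per_period // 2 else 0

def glitch_waveforms(n_periods, steps_per_period, offset, width):
    """Model a glitchy clock built from two phase-shifted clock copies.

    `offset` sets where in the period the glitch starts and `width` how
    long it lasts (both in steps, with 0 < width < steps_per_period // 2).
    """
    rows = {"source": [], "shift_a": [], "shift_b": [], "glitch": [], "output": []}
    for step in range(n_periods * steps_per_period):
        src = square(step, steps_per_period)
        a = square(step, steps_per_period, offset)
        b = square(step, steps_per_period, offset + width)
        glitch = a & (1 - b)  # high only between the two shifted rising edges
        rows["source"].append(src)
        rows["shift_a"].append(a)
        rows["shift_b"].append(b)
        rows["glitch"].append(glitch)
        rows["output"].append(src ^ glitch)  # assumed XOR insertion
    return rows

def render(bits):
    """Draw a waveform as text: '#' for high, '_' for low."""
    return "".join("#" if b else "_" for b in bits)

waves = glitch_waveforms(n_periods=2, steps_per_period=20, offset=3, width=2)
for name, bits in waves.items():
    print(f"{name:>8} {render(bits)}")
```

Changing `offset` moves the notch within the clock period and changing `width` stretches it, which mirrors the phase-shift adjustment described above.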
Of course, this logic analyzer block has uses beyond simple glitch monitoring. With such a block we can monitor the state of other pins, such as the serial communication pins and the bonus data connector pins. This data can be analyzed with open-source tooling such as Sigrok to decode serial data buses captured from the ChipWhisperer-Husky.
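As a rough picture of what a serial decoder does with captured logic samples, here is a minimal sketch of an 8N1 UART decoder working on a list of 0/1 samples. This is not Sigrok’s implementation and does not use the ChipWhisperer API; the function names, the framing (idle-high, LSB first, one stop bit), and the oversampling factor are all assumptions for illustration. The `encode_uart` helper exists only to generate test data.

```python
def encode_uart(data, samples_per_bit, idle=8):
    """Build an idle-high 8N1 sample stream (LSB first) to use as test data."""
    samples = [1] * idle
    for byte in data:
        bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
        for bit in bits:
            samples.extend([bit] * samples_per_bit)
    samples.extend([1] * idle)
    return samples

def decode_uart(samples, samples_per_bit):
    """Decode 8N1 frames by sampling the middle of each bit period."""
    out = bytearray()
    i = 1
    n = len(samples)
    while i < n:
        # A falling edge on an idle-high line marks a candidate start bit.
        if samples[i - 1] == 1 and samples[i] == 0:
            mid = i + samples_per_bit // 2       # middle of the start bit
            last = mid + 9 * samples_per_bit     # middle of the stop bit
            if last >= n:
                break  # incomplete frame at the end of the capture
            if samples[mid] == 0 and samples[last] == 1:
                byte = 0
                for bit in range(8):
                    byte |= samples[mid + (bit + 1) * samples_per_bit] << bit
                out.append(byte)
                i = last  # resume the edge search inside the stop bit
                continue
        i += 1
    return bytes(out)

capture = encode_uart(b"Husky", samples_per_bit=4)
print(decode_uart(capture, samples_per_bit=4))
```

A real decoder also has to cope with baud rate mismatch, noise, and framing errors, which is a good reason to hand the captured data to existing tooling rather than rolling your own.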
SUPPLY CHAIN FUN
In this final part of the article, you’d probably like to learn how you could buy or build your own unit. Unfortunately, this part of the article is the least exciting, as anyone who has been involved in the hardware production space will know! The main FPGA used is an Artix 7A35, which we preordered in March 2021 for a small run (500 units). Unfortunately, our scheduled delivery date of September 2021 has already been bumped to March 2022, which means we’ll have limited ability to do a larger run. Other parts have been preordered but remain on long lead times, so we’ll be able to build hardware “sometime.”
But if you’re interested in seeing the next generation of our capture hardware, keep an eye out for the ChipWhisperer-Husky making an appearance. We were planning on using Crowd Supply to launch our first run, which by the time you read this might be in progress. Of course, it’s been a little disappointing not to have all of our suppliers perfectly lined up for a quick turnaround, but we’re still hopeful that we’ll get all of the parts we need to physically build this thing.
Reference: “USB-to-FPGA Communications,” Circuit Cellar 299, June 2015
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • NOVEMBER 2021 #376