Finding a $Billion Dollar Fault Mode

Written by Colin O'Flynn

Using EMFI Analysis

Claims of improper software design related to the electronic throttle system have cost Toyota several billion dollars in settlements, government fines and other business losses. Yet the exact fault hasn’t been recreated, even though at least one candidate exists in the code. In this article, Colin explores using electromagnetic fault injection (EMFI), often used in security analysis, to try and trigger a fault mode in a similar vehicle computer to the one implicated in the Toyota lawsuit.

  • How to use EMFI to try and trigger a fault mode in a similar vehicle computer to the one implicated in the Toyota lawsuit.

  • What was the Toyota ETCS-i issue?

  • How previous simulated environment tests worked

  • How to use EMFI for safety testing

  • How to build the benchtop test system

  • What are the relevant auto safety standards?

  • How to introduce an EMFI soft attack

  • Renesas RH850 MCU

  • NXP MPC565

  • Throttle body

  • Accelerator pedal sensor

  • Ignition switch and start button

  • Simulator to generate the CAM and crank signals

  • OBD-II reader

  • Oscilloscope

Depending on where you are in the world, you may or may not have heard of the extensive “unintended acceleration” issues faced by Toyota starting in 2009. In most cases, drivers claimed that their cars suddenly “took off” on them; that is, the car started accelerating on its own. And because a wide-open throttle may change the brake pedal feel due to loss of power braking, drivers had trouble stopping the cars. Two dueling narratives were put forward. The first came from Toyota, which maintained that the cause was either driver error (in a near-miss, with your heart pounding, maybe you catch the accelerator pedal while trying to brake) or a physical issue, such as the floor mat catching the pedal. The second narrative pointed to the throttle control itself, which was a (relatively) new electronic system at the time.

An increase in unintended acceleration complaints could be related to the introduction of what Toyota calls the “Electronic Throttle Control System-intelligence” (ETCS-i). The idea of this control loop is shown in Figure 1, where the driver’s pedal is little more than another input to the electronic control unit (ECU). Thus, there was considerable suspicion that the issue could instead be caused by electrical (or software) issues. To try and resolve this mystery, extensive analysis of the controller has been performed by both government and private engineers.

FIGURE 1 – The control loop of ETCS-i involves a throttle body motor, which is commanded to a certain position based on the accelerator pedal.

The government analysis was performed by the NASA Engineering and Safety Center (NESC), which has experience in safety-critical coding. They had access to the Toyota source code along with engineers to assist with testing, but the scope of the investigation was limited due to time constraints. The result of this review was NESC claiming that there was no failure path that resulted in the unintended acceleration. Electromagnetic interference (EMI) testing was done as part of their research. This involved subjecting the car to strong electromagnetic fields in order to explore what happens if interference causes memory corruption or similar issues. These EMI tests caused mechanical issues, such as the engine stalling, but no example of the unintended acceleration was found through this experiment.

A more in-depth analysis was performed by Barr Group, an investigation triggered by a lawsuit from the two victims of a specific unintended acceleration incident. For this analysis, Barr Group also had access to the source code (though their access was tightly controlled). The final full report is highly confidential, so we are limited to two public sources of information: slides from Michael Barr’s 2014 keynote at EELive!, and expert witness testimony slides. See the RESOURCES at the end of the article for links to slides from both Michael Barr [1][2] and Philip Koopman [3].

SOFTWARE ERRORS 101

Barr Group found many serious software issues. For example, the watchdog timer is not serviced by the individual tasks, which would allow it to detect task death. Instead it is serviced in one of the timer interrupts, which will continue to run in almost all conditions! More ominously, if the task that monitors the accelerator pedal position dies, the throttle sticks in the last requested position. Because the state of that task was stored in a single memory bit, a random flip of that bit would kill the task. Such a bit flip is consistent with what is known about soft errors in memory: rare, random bit flips caused by electromagnetic interference, cosmic rays or voltage spikes on the power supply rails.
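
To make the watchdog point concrete, here is a minimal Python sketch of the two servicing patterns. It is not based on the Toyota code, which is not public. The class name, the task names and the check-in scheme are all assumptions chosen for illustration. The idea is simply that the watchdog should only be kicked when every task has reported in during the last period.

```python
from dataclasses import dataclass, field


@dataclass
class CheckInWatchdog:
    """Allow a watchdog kick only if every task has checked in (hypothetical design)."""
    task_names: tuple
    checked_in: set = field(default_factory=set)

    def check_in(self, name: str) -> None:
        # Called by each task at the end of a successful loop iteration.
        self.checked_in.add(name)

    def service(self) -> bool:
        # Called once per period, for example from a timer tick.
        # Returns True if the hardware watchdog should be kicked.
        all_alive = self.checked_in == set(self.task_names)
        self.checked_in.clear()  # every task must check in again next period
        return all_alive


def naive_isr_service() -> bool:
    # The pattern criticized above: the timer interrupt kicks the watchdog
    # unconditionally, so a dead task is never noticed.
    return True


# Task names are made up for this example.
wd = CheckInWatchdog(task_names=("pedal_monitor", "throttle_control"))

# Period 1: both tasks run and check in.
wd.check_in("pedal_monitor")
wd.check_in("throttle_control")
period1_kick = wd.service()

# Period 2: the pedal task has died and never checks in.
wd.check_in("throttle_control")
period2_kick = wd.service()

print(period1_kick, period2_kick, naive_isr_service())
```

In the second period the check-in design withholds the kick, so a hardware watchdog would reset the system, while the unconditional interrupt-based service keeps reporting that all is well.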

This was tested in a “simulated environment”, which was a physical car on a dynamometer. By flipping the bit and killing the task, the car could be made to continue accelerating without further user input. The fail-safes that should shut down the engine in those scenarios also had issues. For example, if the throttle task died, one particular fail-safe would only kick in after the driver removed their foot from the brake (a non-intuitive action in this scenario). Questions remain about whether these conditions matched what actual users experienced. The specific case at the center of this trial involved a driver on an exit ramp, and the defense raised the question of how the throttle could “stick” at a value if the driver was not accelerating.

This specific simulated test demonstrated one example of memory corruption causing the loss of throttle control, an example which could have been easily corrected with proper software design. There could be many other issues, and perhaps some of those other situations more closely match what was reported, where the throttle itself ended up set at a higher value. Regardless, the jury in the trial decided that the evidence was sufficient to find Toyota at fault. Toyota settled a number of lawsuits after losing this first test case, with settlements reaching $1.2 billion. Additional government fines ($1 billion), costs of recall and the loss of brand value made this entire episode even more expensive for Toyota (likely more than $3 billion). In addition to the financial cost, these acceleration incidents caused a number of fatalities and injuries.

A SIMPLE QUESTION

With this background, I’m going to answer a simple question: Without access to the ECU source code, without access to the confidential report by Barr Group and without man-years (or even man-months) of effort, could I introduce memory or control flow errors into an ECU to cause unintended operation modes? In previous articles, I have discussed how electromagnetic fault injection (EMFI) can corrupt memory and alter control flow. Can we use it for safety testing?

To clarify, I’m not claiming that the results here match any on-the-road results. Instead, what I’m exploring here is how we can improve our safety testing by using techniques from the embedded security analysis work to trigger fault conditions that might explain unintended behaviors. In my article “Low-Level Automotive ECU Security” (Circuit Cellar 364, November 2020) [4], I discussed how EMFI can be used to bypass security of a modern (2019) ECU, and I’m going to use similar techniques to perform safety testing of a slightly older (2006) ECU.

When I started this work, I planned on using a similar ECU to the one analyzed by Barr and company. To that end, I found a locally wrecked 2005 Toyota Corolla, which allowed me to remove the ECU and related devices (throttle body, pedal sensor and so forth). It turns out that this ECU uses a different architecture. The ECU from the car in the lawsuit (2005 Toyota Camry) has a Renesas RH850 based main microcontroller (MCU), whereas the ECU from my car uses an NXP MPC565 (PowerPC architecture). While I anticipate that the code itself is different, the overall control-flow design should be almost identical. There was a separate monitor MCU, and presumably various other hardware fail-safes on the main board. A photo of the main ECU board is shown in Figure 2 for reference.

FIGURE 2 – A photo of the main ECU. The large BGA on the upper right is a Freescale (NXP) MPC565. A secondary Microchip device in a TQFP package to the upper left of that BGA is assumed to be a PIC device, but its custom part number does not lend itself to a specific cross reference.

BENCH-TOP TESTING

I built a test bench, which you can see in Figure 3. This contains several important sections: (1) the main ECU, (2) throttle body, (3) accelerator pedal sensor, (4) ignition switch and start button, (5) simulator to generate the CAM and crank signals, which the ECU expects during the engine operation, (6) OBD-II reader which can be used to read data from the ECU and confirm overall operation and (7) oscilloscope to monitor drive signals.

FIGURE 3 – This test bench enables me to use the ECU with a physical throttle body and acceleration pedal sensor, with several other sensors “faked out” to try and keep the ECU operating somewhat normally. A standard automotive diagnostic tool is used to confirm operation. Shown here: (1) the main ECU, (2) throttle body, (3) accelerator pedal sensor, (4) ignition switch and start button, (5) simulator to generate the CAM and crank signals, which the ECU expects during the engine operation, (6) OBD-II reader which can be used to read data from the ECU and confirm overall operation and (7) oscilloscope to monitor drive signals.

This setup allows me to exercise much of the ECU functionality while trying to observe negative effects in the control loop of the throttle body itself. Note there are still some missing areas here, and many of the sensor readings do not respond to changes in the commanded throttle. One further restriction in my current experiment is that, since this is a used ECU, the immobilizer is preventing the fuel injectors from firing, so I cannot confirm whether all fail-safes are tripped. However, being able to push the throttle to invalid values, while showing that the system appears to otherwise run, is an interesting result for further experimentation.

Unfortunately, the debug access port of the device is disabled so I could not immediately watch code execution to try and track the corruption. This is part of ongoing work (watch for a future article or a blog post), so I’ll hopefully be able to better map the specific corruption.

In a normal situation, the throttle position reports itself as going from about 20% to 81% (as measured using the OBD-II code reading tool). The throttle position should be linked to the accelerator pedal sensor position. The accelerator pedal sensor has dual sensors to reduce the chance of an error from a single wire getting cut or similar. It also has an over-extended range that is above the “pedal to the metal” range, and when a reading falls in that range, it triggers a “limp home” mode that limits the throttle opening. If the over-extended range is seen, it is most likely caused by a physical error, such as a mounting coming loose or the pedal stop breaking.
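
The dual-sensor and over-range checks described above can be sketched as a simple plausibility test. This is only an illustration of the idea. It assumes both sensor readings have already been scaled to percent of pedal travel, and every threshold and the limp-home cap are made-up values, not figures from the Toyota ECU.

```python
from enum import Enum


class PedalStatus(Enum):
    OK = "ok"
    SENSOR_MISMATCH = "sensor mismatch"
    OVER_RANGE = "over range"


# All thresholds below are illustrative assumptions, not Toyota values.
MAX_DISAGREEMENT_PCT = 5.0     # allowed difference between the two sensors
FULL_PEDAL_PCT = 100.0         # "pedal to the metal"
LIMP_HOME_THROTTLE_PCT = 20.0  # throttle cap applied in limp-home mode


def check_pedal(sensor_a_pct: float, sensor_b_pct: float) -> PedalStatus:
    """Classify a pair of pedal readings (both in percent of travel)."""
    if abs(sensor_a_pct - sensor_b_pct) > MAX_DISAGREEMENT_PCT:
        # For example, one wire cut or shorted.
        return PedalStatus.SENSOR_MISMATCH
    if max(sensor_a_pct, sensor_b_pct) > FULL_PEDAL_PCT:
        # Reading beyond the mechanical stop, likely a physical fault.
        return PedalStatus.OVER_RANGE
    return PedalStatus.OK


def throttle_request(sensor_a_pct: float, sensor_b_pct: float) -> float:
    """Return the throttle request, capped when the readings are implausible."""
    status = check_pedal(sensor_a_pct, sensor_b_pct)
    pedal = max(0.0, min(sensor_a_pct, sensor_b_pct))  # trust the lower reading
    if status is PedalStatus.OK:
        return pedal
    return min(pedal, LIMP_HOME_THROTTLE_PCT)  # limp-home mode


print(throttle_request(50.0, 51.0))    # plausible readings
print(throttle_request(50.0, 70.0))    # sensors disagree
print(throttle_request(104.0, 105.0))  # over-extended range
```

Using the lower of the two readings is a design choice made here for the sketch: when in doubt, it errs toward a smaller throttle opening.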

EMFI BACKGROUND

If you haven’t read my previous articles, here is a short introduction to EMFI. EMFI discharges a capacitor bank into an inductor that is placed near the target device. The inductor forms a transformer with conductive structures inside the target device, which induces voltages in the device. These voltages have effects such as flipping bits in memory or changing values in transit on buses.

The specific effect we have is very hard to control. It could be corrupting large sections of memory, or it could be affecting just a few registers. It is less likely that the corruption is only a single bit at a time. Such single-bit corruption is normally called a “Single Event Upset” (SEU), as might be seen due to charged particles striking the device. SEUs are of great concern when a device is in space. That’s because more charged particles reach the device without the atmosphere and the Earth’s magnetic field to absorb or deflect them. However, SEUs can still be an issue on Earth, and thus safety-critical systems often try to detect such failures.

SAFETY STANDARDS

In order to help design safety-critical items in automotive environments, we have an entire set of ISO standards to work through. In particular, I’m going to look at the ISO 26262 family, and go through how to ensure that we have fault-tolerant devices. Of interest to us, ISO 26262 details how we can expect faults (referred to as soft errors) to occur in digital devices. The ISO 26262 family also deals with failure of a harder type (in other words, permanent device damage). In particular, ISO 26262-11 has detailed failure calculation models for the hard failures of semiconductor devices.

Details of the soft error calculations in ISO 26262-11 are slightly less specific. Instead, they point to the JEDEC JESD89A standard, which details various types of soft errors (faults) that may occur. Note that a major assumption here is about the faults occurring due to effects such as cosmic radiation and similar. In reality, we can expect memory corruption to occur through many other mechanisms. I’ve written about voltage fault injection several times in this magazine, most recently in my article “Building the ChipJabber-Unplugged” (Circuit Cellar 360, July 2020) [5]. Voltage fault injection can easily corrupt memory and control flow mechanisms. Such a voltage fault injection is similar to noise or spikes on the power supply rails of your circuit, which may occur in many real-life situations.

If we consider memory corruption to be a soft error, we must also include software causes such as stack overflow and bad C pointer usage. There is no shortage of ways to kill your memory, and assuming that only a single bit may be corrupted at a time carries the danger of missing more complex memory corruption. Testing only specific patterns, as is the practice in some “industry standards,” limits your ability to catch more complex mistakes, which in turn could cost a lot of money and people’s lives.
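
A small Python sketch shows why the single-bit assumption is risky. A single parity bit, which is designed around one flipped bit, catches a one-bit upset but is blind to a two-bit corruption of the same word. A CRC is used here only as an example of a stronger check. The data bytes and bit positions are arbitrary, and nothing here is taken from ISO 26262 or JESD89A.

```python
import zlib


def parity_bit(data: bytes) -> int:
    """Even parity over every bit in data (returns 0 or 1)."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones & 1


def flip_bits(data: bytes, bit_positions) -> bytes:
    """Return a copy of data with the given bit positions inverted."""
    corrupted = bytearray(data)
    for pos in bit_positions:
        corrupted[pos // 8] ^= 1 << (pos % 8)
    return bytes(corrupted)


original = bytes([0x12, 0x34, 0x56, 0x78])  # arbitrary example contents
ref_parity = parity_bit(original)
ref_crc = zlib.crc32(original)

single = flip_bits(original, [3])      # SEU-style single-bit upset
double = flip_bits(original, [3, 17])  # two bits flipped by one event

parity_catches_single = parity_bit(single) != ref_parity
parity_catches_double = parity_bit(double) != ref_parity
crc_catches_double = zlib.crc32(double) != ref_crc

print(parity_catches_single, parity_catches_double, crc_catches_double)
```

The parity check flags the single-bit case and misses the double-bit case, while the CRC flags it. A fault injection campaign that only ever flips one bit would never reveal that gap.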

A SOFT ATTACK

In order to introduce soft errors, an EMFI “attack” is performed while the device is in various operating modes. As you can see in Figure 3, an XYZ table allows sweeping of the injection location across the surface of the chip in order to corrupt different locations. For example, we might have RAM under the injection tip (inductor) at one location, and a group of registers at another.
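
The location sweep can be sketched as a raster scan that records what the target does after each pulse. This is a generic outline, not the software used for the experiments in this article. The `inject` and `observe` callbacks are hypothetical placeholders for whatever moves the XYZ table, fires the pulse and classifies the ECU behavior, and the stub target below is invented so the example can run on its own.

```python
from typing import Callable, Iterator, List, Tuple

Point = Tuple[float, float]


def raster_points(cols: int, rows: int, step_mm: float) -> Iterator[Point]:
    """Yield (x, y) positions in a serpentine raster over the chip surface."""
    for iy in range(rows):
        # Reverse every other row so the table never has to travel back empty.
        xs = range(cols) if iy % 2 == 0 else reversed(range(cols))
        for ix in xs:
            yield (round(ix * step_mm, 3), round(iy * step_mm, 3))


def sweep(points, inject: Callable[[float, float], None],
          observe: Callable[[], str]) -> List[Tuple[Point, str]]:
    """Inject once at each point and record the observed behavior."""
    results = []
    for x, y in points:
        inject(x, y)  # hypothetical: move the table, fire one pulse
        results.append(((x, y), observe()))
    return results


# Stub target for demonstration only: one location is "sensitive".
state = {"pos": None}


def fake_inject(x: float, y: float) -> None:
    state["pos"] = (x, y)


def fake_observe() -> str:
    return "erratic" if state["pos"] == (1.0, 0.5) else "normal"


points = list(raster_points(cols=3, rows=2, step_mm=0.5))
results = sweep(points, fake_inject, fake_observe)
hits = [pos for pos, outcome in results if outcome != "normal"]
print(hits)
```

On real hardware the observation step is the hard part, since the target may need a full power cycle between injections to return to a known state.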

After a single injection at one location, I noticed that the throttle motor suddenly became noisier. Compare the normal PWM drive signal (pre-fault injection) in Figure 4 with the PWM drive signal (post-fault injection) in Figure 5, and you can see that there is a drastic change in the drive waveforms. The incorrect drive waveforms caused an increase in current draw. The draw becomes erratic, spiking beyond 5 A and averaging around 3.5 A, whereas the system averaged 1.5 A when operating normally. The control loop is still controlling the throttle, such that it follows the expected value. It simply appears to be “struggling” to control it now.

FIGURE 4 – The two throttle body motor wires show a PWM signal operating normally, where one signal is pulsing at a constant duty cycle, and the other signal is a constant value.

FIGURE 5 – The same throttle body motor wires as in Figure 4, but under erratic operation after the fault, where the waveform does not match the square wave expected in PWM.

Interestingly, this erratic mode persists even if the ignition is turned off and on again. As the car maintains some power to the ECU even when the ignition is switched off normally, I assume some calibration or similar variable has been corrupted. It could also be a register within the MCU (such as a PWM control register) that is only initialized on a full reset or power cycle. I found that only a total power cycle (removing all power from the ECU) caused the system to return to normal operation.

The final objective is to find a situation where the throttle appears to stick open. This erratic mode appeared to be the gateway. After some time in this mode (normally about 30 seconds to 2 minutes), the throttle body did jam either fully open or fully closed. The problem seems to be accelerated by reducing the current limit of the power supply, causing the voltage to dip during the current spikes. This effect of voltage dips on the throttle matches the work published in 2016 on the topic by Park et al. [6]. In that previous research, the throttle changes were not permanent (only changing during the voltage dips), but in my work here the throttle appears to stick fully open, more closely matching some of the claims from consumers.

Once the throttle is stuck (open in this case), the ECU will continue to communicate with my OBD-II tool, and appears to be controlling the spark igniters in response to my simulated crank and cam signals. For example, adjusting the signal input creates a corresponding change in the igniter signals. Using the OBD-II tool, I can confirm that the throttle position is commanded to be fully open, as in Figure 6. Note that the set position of 88% is above the “allowed” regular operating range that is reached with the accelerator pedal. The throttle seems to remain in this state until the ignition is switched off. I only waited a few minutes before shutting off system power, but the fault does not clear within a few seconds as it should if the watchdog or similar kicked in.
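
A check like this is easy to automate. The sketch below decodes a standard OBD-II Mode 01 throttle position response (PID 0x11, where the value is the data byte times 100/255) and flags readings outside the roughly 20% to 81% range seen on the bench. Which parameter the diagnostic tool actually displays is not stated, so the choice of PID 0x11 is an assumption, as are the example response bytes and the 2% margin.

```python
NORMAL_RANGE_PCT = (20.0, 81.0)  # range observed on the bench in normal operation


def decode_throttle_position(response: bytes) -> float:
    """Decode a Mode 01 PID 0x11 response (bytes 0x41 0x11 A) to percent."""
    if len(response) < 3 or response[0] != 0x41 or response[1] != 0x11:
        raise ValueError("not a Mode 01 PID 0x11 response")
    return response[2] * 100.0 / 255.0


def outside_normal_range(position_pct: float, margin_pct: float = 2.0) -> bool:
    """True if the reading is outside the normal range plus an assumed margin."""
    low, high = NORMAL_RANGE_PCT
    return position_pct < low - margin_pct or position_pct > high + margin_pct


# Example payloads are made up for illustration.
normal = decode_throttle_position(bytes([0x41, 0x11, 0x80]))
stuck = decode_throttle_position(bytes([0x41, 0x11, 0xE0]))

print(round(normal, 1), outside_normal_range(normal))
print(round(stuck, 1), outside_normal_range(stuck))
```

Polling this during an injection sweep would give a simple pass or fail signal for each location without needing to watch the throttle body by eye.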

FIGURE 6 – Wide-open throttle, with the OBD-II reader showing that the throttle is commanded to 88%. The motor is being actively driven with a constant “on” voltage (no PWM operation is observed).

SAFETY TESTING

While this article hasn’t demonstrated that the failure mode would happen in an actual vehicle, the real goal here is to provide a specific example of how relatively simple fault injection testing could have flagged potential issues. Performing this work did not require access to secret or confidential information (such as source code). In the world of security evaluations, performing a “no knowledge” attack is relatively common, and the lack of such information would not stop an evaluation.

By helping to bring security evaluation techniques into safety testing, we can perform more advanced evaluation of safety-critical devices. This is just the beginning of such work. Many of the standards that are relevant to the safety domain (such as ISO 26262 and JESD89A) do not mention the fault injection techniques covered here (such as voltage, clock or EM fault injection), which are common in security research. I believe there is an important crossover here, and that those involved in embedded security and embedded safety have an opportunity to learn many tools and techniques from each other. Look out for future exploration of the specific fault mode that I’ve found here. I plan on testing it on a physical car, which should help demonstrate if this was a purely laboratory condition problem or a real-life problem.

RESOURCES

References:
[1] Michael Barr. EELive! 2014 Keynote Slides.
https://barrgroup.com/sites/default/files/KillerApps_Barr_OFFICIAL.pdf
[2] Michael Barr. Expert Witness slides in Bookout v. Toyota trial.
https://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUBBED.pdf
[3] Phil Koopman. “A Case Study of Toyota Unintended Acceleration and Software Safety”
https://users.ece.cmu.edu/~koopman/pubs/koopman14_toyota_ua_slides.pdf
[4] “Low-Level Automotive ECU Security” (Circuit Cellar 364, November 2020)
[5] “Building the ChipJabber-Unplugged” (Circuit Cellar 360, July 2020)
https://circuitcellar.com/research-design-hub/building-the-chipjabber-unplugged
[6] Park S, Choi Y, Choi W. Experimental study for the reproduction of sudden unintended acceleration incidents. Forensic Sci Int. 2016 Oct;267:35-41.

NASA Engineering and Safety Center reports on unintended acceleration.
https://one.nhtsa.gov/About-NHTSA/Press-Releases/ci.NHTSA%E2%80%93NASA-Study-of-Unintended-Acceleration-in-Toyota-Vehicles.print

Microchip Technology | www.microchip.com
NXP Semiconductors | www.nxp.com
Renesas Electronics | www.renesas.com

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JANUARY 2021 #366

Colin O’Flynn has been building and breaking electronic devices for many years. He is an assistant professor at Dalhousie University and also CTO of NewAE Technology, both based in Halifax, NS, Canada. Some of his work is posted on his website.

Copyright © 2021 KCK Media Corp.