Talking Hands: American Sign Language Gesture Recognition Glove

Roberto developed a glove that enables communication between the user and those
around him. While the design is intended for use by people communicating in American Sign Language, you can apply what you learn in this article to a variety of communications applications.
Photo 1: Here you see the finished product with all of the sensors sewn in. The use of string, as opposed to adhesive, for attaching the sensors allowed the components to slide smoothly back and forth as the hand was articulated.

By Roberto Villalba

While studying at Cornell University in 2014, my lab partner Monica Lin and I designed and built a glove to be worn on the right hand that uses a machine learning (ML) algorithm to translate sign language into spoken English (see Photo 1). Our goal was to create a way for the speech impaired to communicate with the general public more easily. Since every person's hand is a unique size and shape, we aimed to create a device that could provide reliable translations regardless of those differences. Our device relies on a variety of sensors, such as flex sensors, a gyroscope, an accelerometer, and touch sensors, to quantify the state of the user's hand. These sensors allow us to capture the flex of each finger, the hand's orientation and rotation, and points of contact. By collecting a moderate amount of this data for each sign and feeding it into an ML algorithm, we are able to learn the association between sensor readings and their corresponding signs. We use a microcontroller to read, filter, and send the data from the glove to a PC. Initially, some data is gathered from the users, and this information is used to train a classifier that learns to differentiate between signs. Once the training is done, the user is able to put on the glove and make gestures, which the computer then turns into audible output.

Figure 1: After performing some calculations and characterizing our flex sensors, we decided to use a 10-kΩ resistor. Note that the rightmost point goes into one of the microcontroller's ADC channels.

HIGH-LEVEL DESIGN
We use the microcontroller’s analog-to digital converter (ADC) to read the voltage drop across each of the flex sensors. We then move on to reading the linear acceleration and rotation values from the accelerometer and gyro sensor using I 2C. And finally, we get binary readings from each of the touch sensors regarding if there exists contact or not. We perform as many readings as possible within a given window of time and use all of this data to do some smoothing. This information is then sent through serial to the PC where it is gathered and processed. Python must listen to information coming in from the microprocessor and either store data or predict based on already learned information. Our code includes scripts for gathering data, loading stored data, classifying the data that is being streamed live, and some additional scripts to help with visualization of sensor readings and so on.

MCU & SENSORS
The design comprises an Atmel ATmega1284P microcontroller and a glove onto which the various sensors and necessary wires were sewn. Each finger has one Spectra Symbol flex sensor stitched on the backside of the glove. The accelerometer and gyro sensors are attached to the center of the back of the glove. The two contact sensors were made out of copper tape and wire that was affixed to four key locations.

Since each flex sensor has a resistance that varies depending on how much the finger is bent, we attached each flex sensor as part of a voltage divider circuit in order to obtain a corresponding voltage that can then be input into the microcontroller.

Vout = Vin × R1 / (R1 + Rflex)

We determined a good value for R1 by analyzing expected values from the flex sensor. Each one has a flat resistance of 10 kΩ and a maximum expected resistance (obtained by measuring its resistance on a clenched fist) of about 27 kΩ. In order to obtain the maximum range of possible output voltages from the divider circuit given an input voltage of 5 V, we plotted the expected ranges using the above equation and values of R1 in the range of 10 to 22 kΩ. We found that the differences between the ranges were negligible and opted to use 10 kΩ for R1 (see Figure 1).
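As a rough illustration of that comparison, the following Python sketch computes the expected output swing for a few candidate values of R1. It assumes the divider equation above, with the flex sensor as the upper leg and R1 as the lower leg; the exact topology in Figure 1 may differ, although the swing comes out the same either way.

# Minimal sketch: compare the output-voltage swing of the flex-sensor
# divider for candidate values of R1. Assumes Vout = Vin * R1 / (R1 + Rflex),
# i.e., the flex sensor is the upper leg and R1 the lower leg of the divider.
VIN = 5.0          # supply voltage (V)
R_FLAT = 10e3      # flex sensor resistance, hand flat (ohms)
R_BENT = 27e3      # flex sensor resistance, clenched fist (ohms)

def vout(r1, r_flex, vin=VIN):
    """Voltage at the divider midpoint that feeds the ADC."""
    return vin * r1 / (r1 + r_flex)

for r1 in (10e3, 12e3, 15e3, 18e3, 22e3):
    swing = vout(r1, R_FLAT) - vout(r1, R_BENT)
    print(f"R1 = {r1 / 1e3:4.0f} kOhm: "
          f"{vout(r1, R_BENT):.2f} V to {vout(r1, R_FLAT):.2f} V "
          f"(swing {swing:.2f} V)")

For every candidate R1 the swing works out to roughly 1.1 to 1.2 V, which is consistent with the negligible differences noted above.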

Our resulting voltage divider has an output range of about 1 V. We were initially concerned that the resulting values from the microcontroller's ADC would be too close together for the learning algorithm to distinguish reliably. We planned to address this by increasing the input voltage to the voltage divider if necessary, but we found that the range of voltages described earlier was sufficient and performed extremely well.

The InvenSense MPU-6050 accelerometer and gyroscope package operates on a lower VCC (3.3 V) than the microcontroller's 5 V. So as not to burn out the chip, we created a voltage regulator using an NPN transistor and a trimpot, connected as shown. The trimpot was adjusted so that the output of the regulator reads 3.3 V. This voltage also serves as the source for the pull-up resistors on the SDA and SCL lines to the microcontroller. Since the I2C devices are only capable of driving these lines low, we connect them to VCC via two 4.7-kΩ pull-up resistors (see Figure 2).

As described later, we found that we needed to add contact sensors to several key spots on the glove (see Figure 3). These essentially function as switches that pull the microcontroller input pins to ground to signal contact (be sure to configure the microcontroller pins to use the internal pull-up resistors).

Figure 2: Here we see the schematic of the voltage regulator circuit that we created in order to obtain 3.3 V. The bottom of the schematic shows how this same regulator was used to pull up the signals at SCL and SDA.

Figure 3: The contact sensor circuitry was quite simple. The input pins of the microcontroller are set to use the internal pull-up resistors, and whenever the two corresponding copper contacts on the fingers touch, the input is pulled low.

I2C COMMUNICATIONS
Interfacing with the MPU-6050 required I2C communication, for which we chose to use Peter Fleury's public I2C library for AVR microcontrollers. I2C is designed to support multiple devices using a single dedicated data bus (SDA) and a single clock bus (SCL). Even though we were only using the interface for the microcontroller to regularly poll the MPU-6050, we had to adhere to the I2C protocol. Fleury's library provided us with macros for issuing start and stop conditions from the microcontroller (signals indicating that the microcontroller is requesting data from the MPU-6050 or releasing control of the bus). These macros allowed us to easily initialize the I2C interface, set up the MPU-6050, and request and receive the accelerometer and gyroscope data (described later).

Figure 4: The image is the visual output produced by plotting sequences of sensor readings. The clear divisions across the horizontal axis mark the different signs A, B, C, and D, respectively.

While testing our I2C communication with the MPU-6050, we found that the microcontroller would on rare occasions hang while waiting for data from the I2C bus. To prevent this from stalling our program, we enabled a watchdog timer that would reset the system every 0.5 seconds unless our program continued to reach regular checkpoints, at which point we would reset the watchdog timer to keep it from unnecessarily restarting the system. We were able to leverage the fact that our microcontroller's work consists primarily of continuously collecting sensor data and sending packets to a separate PC.

Photo 2: In this image we see the hand gestures for R, U, and V. As you can tell, there is not much difference in the hand’s orientation or the amount of flex on the fingers. However, note that the copper pieces make different kinds of contact for each of the signs.

TINYREALTIME
For the majority of the code, we used Dan Henriksson and Anton Cervin’s TinyRealTime kernel. The primary reason for using this kernel is that we wanted to take advantage of the already implemented non-blocking UART library in order to communicate with the PC. While we only had a single thread running, we tried to squeeze in as much computation as possible while the data was being transmitted.

The program first initializes the I2C interface, the MPU-6050, and the ADC. It then enters an infinite loop in which it resets the watchdog timer and gets 16 readings from all of the sensors: accelerometer, gyroscope, flex sensors, and touch sensors. We then compute filtered values by summing the 16 readings from each sensor. Since summing the IMU readings can produce overflow, we shift each of their values right by 8 bits before summing them. The data is then wrapped up into a byte-array packet organized as a header (0xA1B2C3D4), the data, and a checksum of the data. Each sensor value is stored in 2 bytes, and the checksum is calculated by summing the unsigned representation of each of the bytes in the data portion of the packet into a 2-byte integer. Once the packet has been created, it is sent over the USB cable to the computer and the process repeats.

PYTHON COMMUNICATION
Communication with the microcontroller was established through the use of Python's socket and struct libraries. We created a class called SerialWrapper whose main job is to receive data from the microcontroller. It does so by opening a port and running a separate thread that waits for new data to become available. The incoming stream is scanned for the header, and a packet of the right length is removed when available. The checksum is then calculated and verified and, if valid, the data is unpacked into the appropriate values and fed into a queue for other processes to consume. Since we know the format of the packet, we can use the struct library to extract all of the data from the packet, which arrives as a byte array. We then provide the user with two modes of use: one continuously captures and labels data in order to build a dataset, and the other continuously tries to classify incoming data. Support Vector Machines (SVMs) are a widely used family of ML algorithms that learn to classify by using a kernel. While the kernel can take various forms, the most common is the linear kernel. Simply put, the classification, or sign, for a set of readings is decided by taking the dot product of the readings and the learned weight vector. While this may seem like a simple approach, the results are quite impressive. For more information about SVMs, take a look at scikit-learn's "Support Vector Machines" page (http://scikit-learn.org/stable/modules/svm.html).
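To make the packet handling concrete, here is a minimal Python sketch of the header check, checksum verification, and struct unpacking described above. The header value, 2-byte sensor fields, and byte-sum checksum come from the article; the field order and big-endian byte order are assumptions, and parse_packet is an illustrative helper rather than the project's actual SerialWrapper code.

# Minimal sketch of the PC-side packet handling: check the header,
# verify the checksum, and unpack the 13 sensor values with struct.
# Field order and big-endian byte order are assumptions.
import struct

HEADER = b"\xA1\xB2\xC3\xD4"
N_SENSORS = 13
PACKET_LEN = len(HEADER) + 2 * N_SENSORS + 2   # header + data + checksum

def parse_packet(packet: bytes):
    """Return a tuple of 13 sensor values, or None if the packet is invalid."""
    if len(packet) != PACKET_LEN or not packet.startswith(HEADER):
        return None
    data = packet[len(HEADER):-2]
    (checksum,) = struct.unpack(">H", packet[-2:])
    if sum(data) & 0xFFFF != checksum:
        return None                      # corrupted packet: drop it
    return struct.unpack(">%dH" % N_SENSORS, data)

A SerialWrapper-style reader thread would scan the incoming byte stream for HEADER, slice out PACKET_LEN bytes, and push whatever parse_packet returns onto the shared queue.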

PYTHON MACHINE LEARNING
For the purposes of this project we chose to focus primarily on the alphabet, A to Z, and we added two more labels, "nothing" and "relaxed," to the set. Our rationale for giving the classifier a "nothing" class was to have a class made up of mostly noise. This class would not only provide negative instances to help learn the other classes, but it also gave the classifier a way of indicating that the gestured sign is not recognized as one of the ones we care about. In addition, we didn't want the classifier to try to predict any of the letters when the user was simply standing by, so we taught it what a "relaxed" state was. This state was simply the position in which the user held his or her hand when not signing anything. In total there were 28 signs, or labels.

For our project we made extensive use of Python's scikit-learn library. Since we were using various kinds of sensors with drastically different ranges of values, it was important to scale all of our data so that the SVM would have an easier time classifying. To do so we made use of the preprocessing tools available in scikit-learn. We chose to take all of our data and scale it so that the readings for each sensor were centered at zero mean and had unit variance. This approach brought about drastic improvements in our performance and is strongly recommended. The classifier that we ended up using was an SVM provided by scikit-learn under the name SVC.
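The scaling and training steps might look something like the following sketch. The StandardScaler and SVC calls follow the description above; the linear kernel is an assumption based on the earlier discussion of linear SVMs (SVC actually defaults to an RBF kernel), and the function names are illustrative.

# Minimal sketch of the scaling and training steps described above.
# X is an (n_samples, 13) array of filtered readings, y the labels
# ("a" ... "z", "nothing", "relaxed"). The linear kernel is an assumption.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_classifier(X: np.ndarray, y: np.ndarray):
    """Fit the scaler and the SVM on labeled sensor data."""
    scaler = StandardScaler().fit(X)
    clf = SVC(kernel="linear").fit(scaler.transform(X), y)
    return scaler, clf

def predict_sign(scaler, clf, reading):
    """Classify a single 13-value sensor reading."""
    x = np.asarray(reading, dtype=float).reshape(1, -1)
    return clf.predict(scaler.transform(x))[0]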

Figure 5: The confusion matrix demonstrates how many times each label is predicted and how many times that prediction is accurate. We would like to see a perfect diagonal line, but we see that one square does not adhere to this. This square corresponds to “predicted V when it was really U” and it shows about a 66% accuracy.

Another part that was crucial to us as developers was plotting, which let us visualize the data and gauge how well a learning algorithm should be able to predict the various signs. The main tool we developed for this was the plotting of a sequence of sensor readings as an image (see Figure 4). Since each packet contains a value for each of the sensors (13 in total), we can concatenate multiple packets to create a matrix. Each row thus corresponds to one sensor, and scanning a row from left to right gives progressively later readings from that sensor; every packet makes up one column. This matrix can then be plotted with instances of the same sign grouped together, and the differences between one sign and the others can be observed. If the difference is clear to us, then the learning algorithm should have no issue telling the signs apart. If it is not, then the algorithm might struggle and changes to the approach could be necessary.
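A minimal sketch of that visualization is shown below, assuming matplotlib is available and that packets arrive as 13-value tuples; plot_readings is an illustrative name, not the project's actual script.

# Minimal sketch of the visualization: stack packets as columns so each
# row tracks one sensor over time, then display the matrix as an image,
# as in Figure 4.
import numpy as np
import matplotlib.pyplot as plt

def plot_readings(packets, title="Sensor readings over time"):
    """packets: a list of 13-value tuples, one per received packet."""
    matrix = np.array(packets, dtype=float).T   # shape: (13 sensors, n packets)
    plt.imshow(matrix, aspect="auto", interpolation="nearest")
    plt.xlabel("packet index (time)")
    plt.ylabel("sensor index")
    plt.title(title)
    plt.colorbar(label="filtered reading")
    plt.show()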

The final step in classification is to pass the output of the classifier through a last level of filtering and debouncing before it reaches the user. To accomplish this, we fill a buffer with the last 10 predictions and only consider something a valid prediction if it accounts for at least nine of those 10. Furthermore, we debounce this output and only notify the user if it is a novel prediction and not just a continuation of the previous one. We print the result on the screen and also use Peter Parente's pyttsx cross-platform text-to-speech library to output the result as audio whenever it is neither "nothing" nor "relaxed."
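The filtering and debouncing stage could be sketched roughly as follows; the Debouncer class and its thresholds mirror the nine-out-of-ten rule described above but are otherwise illustrative.

# Minimal sketch of the output filtering: keep the last ten predictions
# and only announce a sign once it accounts for at least nine of them
# and differs from the sign announced previously.
from collections import Counter, deque

class Debouncer:
    def __init__(self, size=10, threshold=9):
        self.history = deque(maxlen=size)
        self.threshold = threshold
        self.last_output = None

    def update(self, prediction):
        """Feed one raw classifier output; return a sign to speak, or None."""
        self.history.append(prediction)
        sign, count = Counter(self.history).most_common(1)[0]
        if count < self.threshold or sign == self.last_output:
            return None
        self.last_output = sign
        if sign in ("nothing", "relaxed"):
            return None                  # valid state, but nothing to say aloud
        return sign

The returned sign, when not None, would then be printed and handed to the text-to-speech engine.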

RESULTS
Our original glove did not have contact sensors on the index and middle fingers. As a result, it had a hard time predicting “R,” “U,” and “V” properly. These signs are actually quite similar to each other in terms of hand orientation and flex. To mitigate this, we added two contact sensors: one set on the tips of the index and middle fingers to detect “R,” and another pair in between the index and middle fingers to discern between “U” and “V.”

As you might have guessed, the speed of our approach is limited by the rate of communication between the microcontroller and the computer and by the rate at which we are able to poll the ADC on the microcontroller. We determined how quickly we could send data to the PC by sending data serially and increasing the send rate until we noticed a difference between the rate at which data was being received and the rate at which it was being sent. We then reduced the send frequency back to a reasonable value and converted this into a loop interval (about 3 ms).

We then aimed to gather as much data as possible from the sensors between packet transmissions. In addition to each packet, the microcontroller also sent the number of readings it had performed since the previous packet. We used this number to settle on a reasonable number of values to poll before aggregating the data and sending it to the PC. We concluded that the microcontroller was capable of reading each of the sensors 16 times between packets, which for our purposes provided enough room to do some averaging.

The Python algorithm is currently limited by the rate at which the microcontroller sends data to the PC and by the time it takes the speech engine to say the word or letter. The rate of transfer is currently about 30 Hz, and we wait to fill a buffer with about 10 nearly unanimous predictions. This means that the fastest we could output a prediction is about three times per second, which for our needs was suitable. Of course, one can adjust these values to get faster but slightly less accurate predictions. However, we felt that the glove was responsive enough at three predictions per second.

While we were able to get very accurate predictions, we did see some slight variations in accuracy depending on the size of the person's hands. The accuracy of each flex sensor is limited beyond a certain point. Smaller hands result in a larger degree of bend. As a result, the difference between slightly different signs with a lot of flex tends to be smaller for users with more petite hands. For example, consider the signs for "M" and "S." The only difference between these signs is that "S" elicits slightly more flex in the fingers. However, for smaller hands, the change in resistance from the flex sensor is small, and the algorithm may be unable to discern the difference between these signs.

Figure 6: We can see that even with very small amounts of data the classifier does quite well. After gathering just over 60 readings per sign it achieves an accuracy of over 98%.

In the end, our classifier was able to achieve an accuracy of 98% (with the error composed almost solely of U/V confusion) on a task of 28 signs: the full alphabet plus "relaxed" and "nothing" (see Figure 5). A random classifier would guess correctly only about 4% of the time, so our device is clearly quite accurate. It is worth noting, however, that the algorithm could greatly benefit from improved touch sensors (seeing as the most common mistake is confusing U for V), from being trained on a larger population of users, and especially from larger datasets. With a broad enough dataset we could provide new users with a short test script that covers only the letters that are difficult to predict and rely on the already available data for the rest. The software has currently been trained on the two team members and has been tested on some users outside of the team. The results were excellent for the team members who trained the glove and mostly satisfying, though not perfect, for the other volunteers. Since the volunteers did not have a chance to train the glove and were not very familiar with the signs, it is hard to say whether their lower accuracy was a result of overfitting, individual variations in signing, or inexperience with American Sign Language. Regardless, the accuracy of the software was near perfect for users who trained it and mostly accurate for users who had no prior experience with American Sign Language and did not train the glove.
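For readers who want to reproduce this kind of evaluation, a hedged sketch using scikit-learn's standard utilities is shown below, reusing the scaled data X and labels y from the earlier training sketch. The split ratio and helper name are assumptions, not the exact procedure used in the project.

# Hedged sketch of how an accuracy figure and confusion matrix such as
# those in Figures 5 and 6 could be produced with scikit-learn.
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def evaluate(X, y, test_size=0.25, seed=0):
    """Hold out part of the data, train, and report accuracy plus a confusion matrix."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    clf = SVC(kernel="linear").fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print("accuracy:", accuracy_score(y_te, pred))
    return confusion_matrix(y_te, pred, labels=sorted(set(y)))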

Lastly, it is worth noting that the amount of data necessary for training the classifier was surprisingly small. With about 60 instances per label, the classifier was able to reach the 98% mark. Given that we receive 30 samples per second and that there are 28 signs, the roughly 1,680 samples (60 × 28) needed for training could be gathered in under a minute (see Figure 6).

FUTURE UPGRADES
The project met our expectations. Our initial goal was to create a system capable of recognizing and classifying gestures. We were able to do so with more than 98% average accuracy across all 28 classes. While we did not have a firm timing requirement for the rate of prediction, the resulting speed made using the glove comfortable and it did not feel sluggish. Looking ahead, it would make sense to improve our approach to the touch sensors, since the majority of the ambiguity in signs comes from the difference between U and V. We want to use materials that lend themselves more seamlessly to clothing and provide a more reliable connection. In addition, it would be beneficial to test and train our project on a larger group of people, since this would provide us with richer data and more consistency. Lastly, we hope to make the glove wireless, which would allow it to easily communicate with phones and other devices and make the system truly portable.

RESOURCES
Arduino, "MPU-6050 Accelerometer + Gyro," http://playground.arduino.cc/Main/MPU-6050.

Atmel Corp., "8-Bit AVR Microcontroller with 128K Bytes In-System Programmable Flash: ATmega1284P," 8059D-AVR-11/09, 2009, www.atmel.com/images/doc8059.pdf.

P. Fleury, "AVR-Software," 2006, http://homepage.hispeed.ch/peterfleury/avrsoftware.html.

Lund University, "Tiny Real Time," 2006, www.control.lth.se/~anton/tinyrealtime/.

P. Parente, "pyttsx – Text-to-speech x-platform."

Python Software Foundation, "struct – Interpret Strings as Packed Binary Data," https://docs.python.org/2/library/struct.html.

scikit-learn, "Preprocessing Data," http://scikit-learn.org/stable/modules/preprocessing.html.

scikit-learn, "Support Vector Machines," http://scikit-learn.org/stable/modules/svm.html.

Spectra Symbol, "Flex Sensor FS," 2015, www.spectrasymbol.com/wp-content/themes/spectra/images/datasheets/FlexSensor.pdf.

R. Villalba and M. Lin, "Sign Language Glove," ECE 4760, Cornell University, 2014, http://people.ece.cornell.edu/land/courses/ece4760/FinalProjects/f2014/rdv28_mjl256/webpage/.

SOURCES
ATmega1284P Microcontroller Atmel | www.atmel.com

MPU-6050 MEMS MotionTracking Device InvenSense | www.invensense.com

Article originally published in Circuit Cellar June 2016, Issue #311

The Future of IoT Security

By Haydn Povey

Unlimited opportunity. That’s what comes to mind when I think about the future of the Internet of Things (IoT). And, that is both a blessing and a curse.

As the IoT proliferates, billions of cloud-connected devices are expected to be designed, manufactured, and deployed over the next decade. Our increasingly connected world will become hyper-connected, transforming our lives in ways we likely never thought possible. We will see smarter cities where the commuter is automatically guided, smarter farming where livestock health is individually monitored with on-call veterinary services, smarter healthcare to reduce the spiraling costs, integration between smart white goods and utilities to manage grid loading, and the integration of smart retail and personal assistant AI to provide a “curated” shopping experience. That future is limitless and exciting. But it is also frightening. We have already seen the headlines of how attacks have impacted businesses and people with valuable data being stolen or ransomed. It is widely believed the attacks are just starting.

Devices—not often seen as likely hacking targets—now have the potential to be weaponized. No one wants a device or application that is prone to hacking or theft. Hacks, malware, and IP theft have a significant dollar cost and can destroy corporate brands and reputations. And these devices may have extended lifecycles of decades. And a “secure” connected device does not guarantee a secure system. All too often, security has been an after-thought in the development of systems.

Hardware, software, communications and communications protocols, device commissioning, application layers, and other system considerations can all impact the security of a device and its data. The future of IoT must see security become an integral part of the design and deployment process, not merely an after-thought or add-on.

Delivering security-orientated embedded systems is a major challenge today. It will take a strong ecosystem and the development of a “supply chain of trust” to deliver truly secure product creation, deployment, and lifecycle management for the rapidly evolving IoT marketplace.

Security needs to be architected into devices from the moment of inception. In addition, it needs to be extended across the supply chain, from security-orientated chips through to manufacturing and management for the lifecycle of the product.

To deliver secure manufacturing and ensure no malware can be injected, cold, hard cryptographic principles must be relied upon to ensure solutions are secured. Security principles should be embedded in every aspect of the system, from the delivery of secure foundations in the silicon device through to the secure mastering and encryption of the OEM codebase to ensure it is protected. The programming and manufacturing stages may then freely handle the encrypted code base, while the use of secure appliances, which integrate high-integrity and high-availability hardware security modules, enables secure enclaves to be integrated into the process to manage and orchestrate all key material. Furthermore, the ability to encrypt applications within the development process and subsequently decrypt the images in place within the device is critical to securing the intellectual property.

While simple in theory, there are multiple aspects of a system that must be secured, encompassing the device, the mastering of the application, the handling and sharing of the keys, and the loading of the application on to the device. The only real solution is to develop a “zero trust” approach across the supply chain to minimize vulnerabilities and continually authenticate and individualize deliverables as far as possible.

While this integrated approach cannot resolve all aspects of counterfeiting, it does mark a key rallying point for the industry, and finally enables the industry to start to draw a line under the mass counterfeiting and over-production of devices. And all stakeholders in the process—including device platform providers, OEMs, programming centers, contract manufacturers, end users, security experts, and standards bodies—must do their parts to make cyber-secure programming and manufacturing ubiquitous, easy to use, and easily adoptable.

As I said, the future of IoT holds limitless opportunity, and that will drive new solutions. There will be new business models and new ecosystems. The threats are real, and the cost of failure could be astronomical. So, for the future of IoT to be bright, it must start with security.

This article appears in Circuit Cellar 324.

Haydn Povey is the Founder/CEO of Secure Thingz, a company focused on developing and delivering next-generation security technology into the Internet of Things (IoT) and other connected systems. He also currently sits on the Executive Steering Board of the IoT Security Foundation. Haydn has been in senior management at leading global technology companies for more than 20 years, including 10 years in senior marketing and business development roles at ARM.

The Future of Embedded FPGAs

The embedded FPGA is not new, but only recently has it started becoming a mainstream solution for designing chips, SoCs, and MCUs. A key driver is today's high mask costs for advanced ICs. For a chip company designing at advanced nodes, a change in RTL could cost millions of dollars and set the design schedule back by months. Another driver is constantly changing standards. The embedded FPGA is so compelling because it provides designers with the flexibility to update RTL at any time after fabrication, even in-system. Chip designers, management, and even the CFO like it.

Given these benefits, the embedded FPGA is here to stay. However, like any technology, it will evolve to become better and more widespread. Looking back to the 1990s when ARM and others offered embedded processor IP, the technology evolved to where embedded processors appear widely on most logic chips today. This same trend will happen with embedded FPGAs. In the last few years, the number of embedded FPGA suppliers has increased dramatically: Achronix, Adicsys, Efinix, Flex Logix, Menta, NanoXplore, and QuickLogic. The first sign of market adoption was DARPA’s agreement with Flex Logix to provide TSMC 16FFC embedded FPGA for a wide range of US government applications. This first customer was critical as it validated the technology and paved the way for others to adopt.

There are a number of things driving the adoption of the embedded FPGA:

  • Mask costs are increasing rapidly: approximately $1 million for 40 nm, $2 million for 28 nm, and $4 million for 16 nm.
  • The size of the design teams required to design advanced-node chips is increasing. Fewer chips are being designed, but they need to cover the same range of functions as in the past.
  • Standards are constantly changing.
  • Data centers require programmable protocols.
  • AI and machine learning algorithms are evolving rapidly.

Surprisingly, embedded FPGAs don’t compete with FPGA chips. FPGA chips are used for rapid prototyping and lower-volume products that can’t justify the increasing cost of ASIC development. When systems with FPGAs hit high volume, FPGAs are generally converted to ASICs for cost reduction.

In contrast, embedded FPGAs don’t use external FPGAs and they can do things external FPGAs can’t, such as:

  • They are lower power because SERDES aren't needed. Standard CMOS interfaces can run at 1 GHz+ in 16 nm for embedded FPGAs, with hundreds to thousands of interconnects available.
  • Embedded FPGAs are lower cost per LUT. There is no expensive packaging, and roughly one-third of the die area of an FPGA chip is SERDES, PLLs, DDR PHYs, and other circuitry that is no longer needed.
  • They can run 1-GHz operations in the control path.
  • Embedded FPGAs can be optimized: lots of MACs (multiplier-accumulators) for DSP, or none; exactly the kind of RAM needed, or none.
  • They scale from tiny embedded FPGAs of just 100 LUTs up to very large arrays of more than 100K LUTs.
  • Embedded FPGAs can be optimized for very low power operation or very high performance.

The following markets are likely to see widespread utilization of embedded FPGAs: the Internet of Things (IoT); MCUs and customizable programmable blocks on the processor bus; defense electronics; networking chips; reconfigurable wireless base stations; flexible, reconfigurable ASICs and SoCs; and AI and deep learning accelerators.

To integrate embedded FPGAs, chip designers need them to have the following characteristics: silicon proven IP; density in LUTs/square millimeters similar to FPGA chips; a wide range of array sizes from hundreds of LUTs to hundreds of thousands of LUTs; options for a lot of DSP support and the kind of RAM a customer needs; IP proven in the process node a company wants with support of their chosen VT options and metal stack; an IP implementation optimized for power or performance; and proven software tools.

Over time, embedded FPGA IP will be available on every significant foundry from 180 to 7 nm supporting a wide range of applications. This means embedded FPGA suppliers must be capable of cost-effectively “porting” their architecture to new process nodes in a short time (around six months). This is especially true because process nodes keep getting updated over time and each major step requires an IP redesign.

Early adopters of embedded FPGA will have chips with wider market potential, longer life, and higher ROI, giving designers a competitive edge over late adopters. Similar benefits will accrue to systems designers. Clearly, this technology is changing the way chips are designed, and companies will soon learn that they can’t afford to “not” adopt embedded FPGA.

This article appears in Circuit Cellar 323.

Geoff Tate is CEO/Cofounder of Flex Logix Technologies. He earned a BSc in Computer Science from the University of Alberta and an MBA from Harvard University. Prior to cofounding Rambus in 1990, Geoff served as Senior Vice President of Microprocessors and Logic at AMD.

The Future of Network-on-Chip (NoC) Architectures

Adding multiple processing cores on the same chip has become the de facto design choice as we continue extracting increasing performance per watt from our chips. Chips powering smartphones and laptops comprise four to eight cores. Those powering servers comprise tens of cores. And those in supercomputers have hundreds of cores. As transistor sizes decrease, the number of cores on-chip that can fit in the same area continues to increase, providing more processing capability each generation. But to use this capability, the interconnect fabric connecting the cores is of paramount importance to enable sharing or distributing data. It must provide low latency (for high-quality user experience), high throughput (to maintain a rate of output), and low power (so the chip doesn’t overheat).

Ideally, each core should have a dedicated connection to a core with which it’s intended to communicate. However, having dedicated point-to-point wires between all cores wouldn’t be feasible due to area, power, and wire layout constraints. Instead, for scalability, cores are connected by a shared network-on-chip (NoC). For small core counts (eight to 16), NoCs are simple buses, rings, or crossbars. However, these topologies aren’t too scalable: buses require a centralized arbiter and offer limited bandwidth; rings perform distributed arbitration but the maximum latency increases linearly with the number of cores; and crossbars offer tremendous bandwidth but are area and power limited. For large core counts, meshes are the most scalable. A mesh is formed by laying out a grid of wires and adding routers at the intersections which decide which message gets to use each wire segment each cycle, thus transmitting messages hop by hop. Each router has four ports (one in each direction) and one or more ports connecting to a core. Optimized mesh NoCs today take one to two cycles at every hop.

Today’s commercial many-core chips are fairly homogeneous, and thus the NoCs within them are also homogeneous and regular. But the entire computing industry is going through a massive transformation due to emerging technology, architecture, and application trends. These, in turn, will have implications for the NoC designs of the future. Let’s consider some of these trends.

An exciting and potentially disruptive technology for on-chip networks is photonics. Its advantage is extremely high bandwidth and no electrical power consumption once the signal becomes optical, enabling links from a few millimeters to a few meters at the same power. Optical fibers have already replaced electronic cables for inter-chassis interconnections within data centers, and optical backplanes are emerging as viable alternatives between racks of a chassis. Research in photonics for shorter interconnects—from off-die I/O to DRAM and for on-chip networks—is currently active. In 2015, researchers at Berkeley demonstrated a microprocessor chip with on-chip photonic devices for the modulation of an external laser light source and on-chip silicon waveguides as the transmission medium. These chips communicated directly via optical signals. In 2016, researchers at the Singapore-MIT Alliance for Research and Technology demonstrated LEDs as on-chip light sources using novel III-V materials. NoC architectures inspired by these advances in silicon photonics (light sources, modulators, detectors, and photonic switches) are being actively researched. Once challenges related to reliable and low-power photonic devices and circuits are addressed, silicon photonics might partially or completely replace on-chip electrical wires and provide high-bandwidth data delivery to multiple processing cores.


The performance and energy scaling that used to accompany transistor technology scaling has diminished. While we have billions of transistors on-chip, switching them simultaneously can exceed a chip's power budget. Thus, general-purpose processing cores are being augmented with specialized accelerators that would only be turned on for specific applications. This is known as dark silicon. For instance, GPUs accelerate graphics and image processing, DSPs accelerate signal processing, cryptographic accelerators perform fast encryption and decryption, and so on. Such domain-specific accelerators are 100× to 1000× more efficient than general-purpose cores. Future chips will be built using tens to hundreds of cores and accelerators, with only a subset of them being active at any time depending on the application. This places an additional burden on the NoC. First, since the physical area of each accelerator isn't uniform (unlike cores), future NoCs are expected to be irregular and heterogeneous. This creates questions about topologies, algorithms for routing, and managing contention. Second, traffic over the NoC may have dynamically varying latency/bandwidth requirements based on the currently active cores and accelerators. This will require quality-of-service guarantees from the NoC, especially for chips operating in real-time IoT environments or inside data centers with tight end-to-end latency requirements.

The aforementioned domain-specific accelerators are massively parallel engines with NoCs within them, which need to be tuned for the algorithm. For instance, there's a great deal of interest in architectures/accelerators for deep neural networks (DNN), which have shown unprecedented accuracy in vision and speech recognition tasks. Example ASICs include IBM's TrueNorth, Google's Tensor Processing Unit (TPU), and MIT's Eyeriss. At an abstract level, these ASICs comprise hundreds of interconnected multiply-accumulate units (the basic computation inside a neuron). The traffic follows a map-reduce style: inputs (e.g., image pixels or filter weights in the case of convolutional neural networks) are mapped, or scattered, to the MAC units, and partial or final outputs are reduced, or gathered, and then mapped again to neurons of this or subsequent layers. The NoC needs to perform this map-reduce operation for massive datasets in a pipelined manner such that MAC units are not idle. Building a highly scalable, low-latency, high-bandwidth NoC for such DNN accelerators will be an active research area, as novel DNN algorithms continue to emerge.

You now have an idea of what’s coming in terms of the hardware-software co-design of NoCs. Future computer systems will become even more heterogeneous and distributed, and the NoC will continue to remain the communication backbone tying these together and providing high performance at low energy.

This essay appears in Circuit Cellar 322.

Dr. Tushar Krishna is an Assistant Professor of ECE at Georgia Tech. He holds a PhD (MIT), an MSE (Princeton), and a BTech (IIT Delhi). Dr. Krishna spent a year as a post-doctoral researcher at Intel and a semester at the Singapore-MIT Alliance for Research and Technology.

The Future of Embedded Computing

Although my academic background is in cybernetics and artificial intelligence, and my career started out in production software development, I have been lucky enough to spend the last few years diving head first into embedded systems development. There have been some amazing steps forward in embedded computing in recent years, and I’d like to share with you some of my favorite recent advances, and how I think they will progress.

While ever-decreasing costs of embedded computing hardware are expected and not too exciting, I think there have been a few key price points that are an indicator of things to come. In the last few months, we have seen the release of application processor development boards that cost below $10: tiny, gigahertz-level, Linux-ready processors for an amazingly low price. The most well-known is the Raspberry Pi Zero, created by the Raspberry Pi Foundation, which I believe will continue to push this impressive level of development capability into schools, really giving the next generation of engineers (and non-engineers) some hands-on experience. Perhaps a less well-known release is C.H.I.P, the new development platform from Next Thing Co. The hardware is similar to the Pi Zero, but the drive behind the company is quite different. We'll discuss this more later.

While the hobbyist side of embedded computing is not new, the communities and resources that are being built are exciting. Most of you will have heard of Arduino and Raspberry Pi. The Pi is a low-cost, easy-to-use Linux computer. Arduino is an open-source platform consisting of a super-simple IDE, tons of libraries, and a huge range of development boards. These have set a standard for members of the maker community, who expect affordable hardware, open-source designs, and strong community support, and some companies are stepping up to this.

Next Thing Co. has the goal of creating things to inspire creativity. In addition to developing low-cost hardware, they try to remove the pain from the design process, and only open-source, well-documented products will do. This ethos is embodied in their C.H.I.P Pro, which is more than just an open-source Linux system-on-module. It's built around their own GR8 IC, which contains an Allwinner 1-GHz ARM Cortex-A8 as well as 256 MB of DDR3 built in, accompanied by an open datasheet that requires no NDA and a one-unit minimum order quantity. This really eliminates the headaches of high-speed routing between the DDR3 and the processor, and it reduces the manufacturing complexities of creating a custom Linux-ready PCB. Innovation and progress like this provide a lot more value than the many other companies that just produce insufficiently documented breakout boards for existing chips. I think this will be a company to watch, and I can't wait to see what their next ambitious project will be.

 
We've all been witnessing the ever-increasing performance of embedded systems as successive generations of smartphones and tablets are released, but when I talk about high performance I don't mean a measly 2+ GHz octa-core system with a few gigabytes of RAM. I'm talking about embedded supercomputing!

As far as I'm concerned, the one to watch here is NVIDIA. Their recent Tegra series sees them bringing massively parallel GPU processing to affordable embedded devices. The Tegra 4 had a quad-core CPU and 72 GPU cores. The TK1 has a quad-core CPU and 192 GPU cores, and the most recent TX1 has an octa-core CPU and 256 GPU cores that provide over 1 teraflops of processing power. These existing devices are very impressive, but NVIDIA is not slowing down development, with the Xavier expected to appear at the end of 2017. Boasting 512 GPU cores and a custom octa-core CPU architecture, the Xavier claims to deliver 20 trillion operations per second for only 20 W of power consumption.

NVIDIA is developing these systems with the intent of enabling embedded artificial intelligence (AI), with a focus on autonomous vehicles and real-time computer vision. This is an amazing goal, as AI has historically lacked the processing power to make it practical in many applications, and I'm hoping that NVIDIA is putting an end to that. In addition to their extremely capable hardware, they are providing great software resources and support for developing deep learning systems.

We are on the cusp of some exciting advancements in the field of embedded computing. In addition to seeing an ever-growing number of IoT and smart devices, I believe that during the next few years we'll see embedded computing enable great advancements in AI and smart cities. Backyard developers will be able to create more impressive and advanced systems, and technical literacy will become more widespread.

This essay appears in Circuit Cellar 321.

 

Steve Samuels (steve@think-engineer.com) is a Cofounder and Prototype Engineer at Think Engineer LLP, a research, development and prototyping company that specializes in creating full system prototypes and proof-of-concepts for next-generation products and services. Steve has spent most of his career in commercial research and development in domains such as transportation, satellite communications, and space robotics. Having worked in a lot of different technical areas, his main technical interests are embedded systems and machine learning.

Lightweight Systems and the Future of Wireless Technology

Last November, we published engineer Alex Bucknall’s essay “Taking the ‘Hard’ Out of Hardware.” We recently followed up with him to get his thoughts on the future of ‘Net-connected wireless devices and the Internet of Things (IoT).

As we enter an age of connected devices, sensors, and objects (aka the Internet of Things), we're beginning to see a drive for lightweight systems that allow for low-power, low-complexity, and long-distance communication protocols. More of the world is becoming connected, and not all of these embedded devices can afford high-capacity batteries or a connection to mains power. We'll see a collection of protocols that can provide connectivity with just a few milliwatts of power, which can be delivered through means of energy harvesting such as solar power. It'll become essential for embedded sensors to communicate from remote locations where current standards like Wi-Fi and BLE fall behind due to range constraints. Low-Power Wide Area Networks (LPWANs) will strive to fill this gap, with protocols such as Sigfox, LoRa, NB-IoT, and others stepping up to the plate. The next hurdle will be the exciting big data challenge as we start to learn more about our world via the Internet of Things! — Alex Bucknall (Developer Evangelist, Sigfox, France)

The Future of Automation

The robot invasion isn’t coming. It’s already here. One would be hard-pressed to find anything in modern “industrialized” society that doesn’t rely on a substantial level of automation during its life cycle—whether in its production, consumption, use, or (most probably) all of the above. Regardless of the definition du jour, “robots” are indeed well on their way to taking over—and not in the terrifying, apocalyptic, “Skynet” kind of way, but in a way that will universally improve the human condition.

Of course, the success of this r/evolution relies on an almost incomprehensible level of compounding innovations and advancements accompanied by a mountain of associated technical complexities and challenges. The good news is many of these challenges have already been addressed—albeit in a piecemeal manner—by focused professionals in their respective fields. The real obstacle to progress, therefore, ultimately lies in the compilation and integration of a variety of technologies and techniques from heterogeneous industries, with the end goal being a collection of cohesive systems that can be easily and intuitively implemented in industry.

Two of the most promising and critical aspects of robotics and automation today are human-machine collaboration and flexible manufacturing. Interestingly (and, perhaps, fortuitously), their problem sets are virtually identical, as the functionality of both systems inherently revolves around constantly changing and wildly unpredictable environments and tasks. These machines, therefore, have to be heavily adaptable to and continuously “aware” of their surroundings in order to maintain not only a high level of performance, but also to consistently perform safely and reliably.

Not unlike humans, machines rely on their ability to collect, analyze, and act on external data, oftentimes in a deterministic fashion—in other words, certain things must happen in a pre-defined amount of time for the system to perform as intended. These data can range from the very slow and simple (e.g., calculating temperature by reading the voltage of a thermocouple once a second) to the extremely fast and complex (e.g., running control loops for eight brushless electric motors 25,000-plus times a second). Needless to say, giving a machine the ability to perceive—be it through sight, sound, and/or touch—and act on its surroundings in “real time” is no easy task.


Computer vision (sight and perception), speech recognition (sound and language), and precision motion control (touch and motor skills) are things most people take for granted, as they are collectively fundamental to survival. Machines, however, are not "born" with these abilities, nor have they evolved them. Pile on additional layers of complexity, like communication and the ability to acquire knowledge and learn new tasks, and it becomes menacingly apparent how substantial the challenge of creating intelligent and connected automated systems really is.

While the laundry list of requirements might seem nearly impossible to address, the tools used for integrating these exceedingly complex systems have fortunately undergone their own period of hyper-growth in the last several years. In much the same way that developers, researchers, engineers, and entrepreneurs have picked off industry- and application-specific problems related to the aforementioned technical hurdles, so have the people behind the hardware and software that make it possible for these independently developed, otherwise standalone solutions to be combined and interwoven, resulting in truly world-changing innovations.

For developers, only in the last few years has it become practical to leverage the combination of embedded technologies like the power-efficient, developer-friendly mobile application processor with the flexibility and raw “horsepower” of programmable logic (i.e., field-programmable gate arrays, which have historically been reserved for the aerospace/defense and telecommunication industries) at scales never previously imagined. And with rapidly growing developer communities, the platforms built around these technologies are directly facilitating the advancement of automation, and doing it all under a single silicon “roof.” There’s little doubt that increasing access to these new tools will usher in a more nimble, intelligent, safe, and affordable wave of robotics.

Looking forward, automation will undoubtedly continue to play an ever-increasingly central role in day-to-day life. As a result, careful consideration must be given to facilitating human-machine (and machine-machine) collaboration in order to accelerate innovation and overcome the technical and societal impacts bred from the disruption of the status quo. The pieces are already there, now it’s time to assemble them.


This article appears in Circuit Cellar 320.


Ryan Cousins is cofounder and CEO of krtkl, Inc. ("critical"; http://krtkl.com/), a San Francisco-based embedded systems company. The krtkl team created snickerdoodle—an affordable and highly reconfigurable platform for developing mechatronic, audio/video, computer vision, networking, and wireless communication systems. Ryan has a BS in mechanical engineering from UCLA. He has experience in R&D, project management, and business development in the medical and embedded systems industries. Learn more at krtkl.com or snickerdoodle.io.

The Importance of Widely Available Wireless Links for UAV Systems

Readily available, first-rate wireless links are essential for building and running safe UAV systems. David Weight, principal electronics engineer at Wattcircuit, recently shared his thoughts on the importance of developing and maintaining high-quality wireless links as the UAV industry expands.

One of the major challenges that is emerging in the UAV industry is maintaining wireless links with high availability. As UAVs start to share airspace with other vehicles, we need to demonstrate that a control link can be maintained in a wide variety of environments, including interference and non-line-of-sight conditions. We are starting to see software-defined radio used to build radios that are frequency agile and capable of using multiple modulation techniques. For example, a UAV can use direct links in open spaces, where they are most effective, but change to 4G-type signals when entering more built-up areas, which can pose issues for direct links but have good coverage from existing commercial telecoms. Being able to change the frequency and modulation also means that, where interference or poor signal paths are found, frequencies can be changed to avoid interference or, in extreme cases, be reduced to lower bands that allow control links to be maintained. This may mean that not all the data can be transmitted back, but it will keep the link alive and continue to transmit sufficient information to allow the pilot to control the UAV safely. — David Weight (Principal Electronics Engineer, Wattcircuit, UK)

Brain Controlled-Tech and the Future of Wireless

Wireless IoT devices are becoming increasingly common in both private and public spaces. Phil Vreugdenhil, an instructor at Camosun College in Canada, recently shared his thoughts on the future of ‘Net-connected wireless technology and the ways users will interact with it.

I see brain-controlled software and hardware seamlessly interacting with wireless IoT devices. I also foresee people interacting with their enhanced realities through fully integrated NEMS (nano-electromechanical systems) that communicate directly with the brain, bypassing the usual pathways (eyes, ears, nose, touch, taste), much like cochlear implants and bionic eyes. I see wireless health-monitoring systems and AI doctors drastically improving efficiency in the medical system. But I also see the safety and security pitfalls within these future systems. The potential for hacking somebody's personal systems and altering or deleting the data they depend upon for survival makes the future of wireless technology seem scarier than it will probably be. — Phil Vreugdenhil (Instructor, Camosun College, Canada)

The Future of Test-First Embedded Software

The term “test-first” software development comes from the original days of extreme programming (XP). In Kent Beck’s 1999 book, Extreme Programming Explained: Embrace Change (Addison-Wesley), his direction is to create an automated test before making any changes to the code.

Nowadays, test-first development usually means test-driven development (TDD): a well-defined, continuous feedback cycle of code, test, and refactor. You write a test, write some code to make it pass, make improvements, and then repeat. Automation is key though, so you can run the tests easily at any time.

TDD is well regarded as a useful software development technique. The proponents of TDD (including myself) like the way in which the code incrementally evolves from the interface as well as the comprehensive test suite that is created. The test suite is the safety net that allows the code to be refactored freely, without worry of breaking anything. It’s a powerful tool in the battle against code rot.

To date, TDD has had greater adoption in web and application development than with embedded software. Recent advances in unit test tools however are set to make TDD more accessible for embedded development.

In 2011 James Grenning published his book, Test Driven Development for Embedded C (Pragmatic Bookshelf). Six years later, this is still the authoritative reference for embedded test-first development and the entry point to TDD for many embedded software developers. It explains in detail how TDD works for an unfamiliar audience and addresses many of the traditional concerns, like how this will work with custom hardware. Today, the book is still completely relevant, but when it was published, the state-of-the-art tools were simple unit test and mocking frameworks. These frameworks require a lot of boilerplate code to run tests, and any mock objects need to be created manually.

In the rest of the software world though, unit test tools are significantly more mature. In most other languages used for web and application development, it’s easy to create and run many unit tests, as well as to create mock objects automatically.
Since then, the state of TDD tools has advanced considerably with the development of the open-source tool Ceedling. It automates the running of unit tests and the generation of mock objects in C applications, making it a lot easier to do TDD. Today, if you want to test-drive embedded software in C, you don’t need to roll your own test build system or mocks.
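To give a flavor of what that buys you, here is a hedged sketch of a Ceedling-style test that uses a CMock-generated mock to stand in for a hardware driver. The spi.h driver, the temperature module, and TEMP_SENSOR_CHANNEL are hypothetical names; what is real is the pattern of including a generated mock header and setting expectations on it instead of hand-writing mock code.

```c
/* test_temperature.c -- sketch of testing hardware-dependent code with a
 * CMock-generated mock under Ceedling. The spi.h driver, temperature
 * module, and TEMP_SENSOR_CHANNEL are hypothetical examples. */
#include "unity.h"
#include "mock_spi.h"       /* mock generated by CMock from spi.h */
#include "temperature.h"    /* hypothetical module under test */

void setUp(void) {}
void tearDown(void) {}

void test_temperature_read_converts_raw_counts(void)
{
    /* Expect exactly one SPI read and make it return a known raw value... */
    spi_read_ExpectAndReturn(TEMP_SENSOR_CHANNEL, 0x0200);

    /* ...so the conversion logic can be verified without any hardware. */
    TEST_ASSERT_EQUAL_INT(25, temperature_read_celsius());
}
```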

With better tools making unit testing easier, I suspect that test-first development will be more widely adopted by embedded software developers in the future. Previously it was relegated to the few early adopters willing to put in the effort; with tools lowering the barrier to entry, it will be easier for everyone to do TDD.
Besides the tools to make TDD easier, another driving force behind greater adoption of test-first practices will be the simple need to produce better-quality embedded software. As embedded software continues its infiltration into all kinds of devices that run our lives, we’ll need to be able to deliver software that is more reliable and more secure.

Currently, unit tests for embedded software are most popular in regulated industries—like medical or aviation—where the regulators essentially force you to have unit tests. This is one part of a strategy to prevent you from hurting or killing people with your code. The rest of the “unregulated” embedded software world should take note of this approach.

With the rise of the Internet of things (IoT), our society is increasingly dependent on embedded devices connected to the Internet. In the future, the reliability and security of the software that runs these devices is only going to become more critical. There may not be a compelling business case for it now, but customers—and perhaps new regulators—are going to increasingly demand it. Test-first software can be one strategy to help us deal with this challenge.


This article appears in Circuit Cellar 318.


Matt Chernosky wants to help you build better embedded software—test-first with TDD. With years of experience in the automotive, industrial, and medical device fields, he’s excited about improving embedded software development. Learn more from Matt about getting started with embedded TDD at electronvector.com.

Taking the “Hard” Out of Hardware

There’s this belief among my software developer friends that electronics are complicated, hardware is hard, and you need a degree before you can design anything to do with electricity. They honestly believe that building electronics is more complicated than writing intricate software—that is, the software that powers thousands of people’s lives all around the world. It’s this mentality that confuses me. How can you write all of this incredible software, but believe a simple 555 timer circuit is complicated?

I wanted to discover where the idea that “hardware is hard” came from and how I could disprove it. I started with something with which almost everyone is familiar, LEGO. I spent my childhood playing with these tiny plastic bricks, building anything my seven-year-old mind could dream up, creating intricate constructions from seemingly simplistic pieces. Much like the way you build LEGO designs, electronic systems are built upon a foundation of simple components.

When you decide to design or build a system, you want to start by breaking it down into components and functional sections that are easy to understand. You can use this approach for both digital and analog systems. The example I like to use to explain this is a phase-locked loop (PLL) frequency modulation demodulator/detector, a seemingly complicated device used to decode frequency-modulated radio signals. This system sounds like it would be impossible to build, especially for someone who isn’t familiar with electronics. I can recognize that from experience. I remember the first year of my undergraduate studies, when my lecturers would place extremely detailed circuit schematics on a chalkboard and expect us to understand the high-level functionality. I recall the panic this induced in a number of my peers, and it very likely put them off electronics in later years. One of the biggest problems an electronics instructor faces is teaching complexity without scaring away students.

This essay appears in Circuit Cellar 317, December 2016. Subscribe to Circuit Cellar to read project articles, essays, interviews, and tutorials every month!

 

What many people either don’t realize or aren’t taught is that most systems can be broken down into composite pieces. The PLL FM demodulator breaks into three main elements: the phase detector, a voltage-controlled oscillator (VCO), and a loop filter. These smaller pieces, or “building blocks,” can then be separated even further. For example, the loop filter—an element of the circuit used to remove high-frequency content—is constructed from a simple combination of resistors, capacitors, and operational amplifiers (see Figure 1).

I’m going to use a bottom-up approach to explain the loop filter segment of this system using simple resistors (R) and capacitors (C). It is this combination of resistors and capacitors that allows you to create passive RC filters—circuits that work by allowing only specific frequencies to pass to the output. Figure 2 shows a low-pass filter, which is used to remove high-frequency signals from the output of a circuit. Note: I’m avoiding as much math as possible in this explanation, as you don’t need numerical examples to demonstrate behavior. That can come later! The performance of this RC filter can be improved by adding an amplification stage using an op-amp, as we’ll see next.
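For the curious, the one number worth knowing for that RC low-pass is its cutoff frequency, and a few lines of C compute it. The component values below are arbitrary examples chosen for illustration, not values from the article.

```c
/* rc_cutoff.c -- the single number that characterizes the passive RC
 * low-pass filter: its -3 dB cutoff frequency, fc = 1 / (2 * pi * R * C).
 * The component values are arbitrary examples, not from the article. */
#include <stdio.h>

int main(void)
{
    const double PI = 3.14159265358979;
    double r_ohms   = 10e3;    /* 10 kOhm (example) */
    double c_farads = 1.6e-9;  /* 1.6 nF  (example) */

    double fc_hz = 1.0 / (2.0 * PI * r_ohms * c_farads);
    printf("Cutoff frequency: %.0f Hz\n", fc_hz);   /* roughly 10 kHz */
    return 0;
}
```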

Op-amps are a nice example of abstraction in electronics. We don’t normally worry about their internals, much as with a CPU or other ICs; instead, we treat them as functional boxes with inputs and an output. As you can see in Figure 3, the op-amp works in a “differential” mode, trying to equalize the voltages at its negative and positive terminals. It does this by outputting the amplified difference and feeding part of that output back to the negative terminal through the feedback loop created by the potential divider (voltage divider) formed by R2 and R3. The differential effect between the op-amp’s two input terminals produces a “boosted” output whose level is set by the values of R2 and R3. This amplification, in combination with the passive low-pass filter, creates what’s known as a low-pass active filter.
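Staying math-light, the only formula hiding in that feedback network is the closed-loop gain. Assuming the standard non-inverting arrangement, with R2 as the feedback resistor and R3 from the inverting input to ground, the boost works out to 1 + R2/R3; the resistor values below are arbitrary examples.

```c
/* opamp_gain.c -- closed-loop gain of the non-inverting amplifier stage,
 * assuming the standard configuration (R2 = feedback resistor, R3 to
 * ground): gain = 1 + R2 / R3. Component values are arbitrary examples. */
#include <stdio.h>

int main(void)
{
    double r2_ohms = 90e3;   /* feedback resistor (example) */
    double r3_ohms = 10e3;   /* resistor to ground (example) */

    double gain = 1.0 + (r2_ohms / r3_ohms);
    printf("Closed-loop voltage gain: %.1f\n", gain);  /* 10.0 here */
    return 0;
}
```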

The low-pass active filter would be one of a number of filtering elements within the loop filter, and we have already built up one of the circuit’s three main elements! This example starts to show how behavior is cumulative. As you gain knowledge about fundamental components, you’ll start to understand how more complex systems work. Almost all electronic systems have this building-block format. So, yes, there might be a number of behaviors to understand. But as soon as you learn the fundamentals, you can start to design and build complicated systems of your own!

Alex Bucknall earned a Bachelor’s in Electronic Engineering at the University of Warwick, UK. He is particularly interested in FPGAs and communications systems. Alex works as a Developer Evangelist for Sigfox, which offers simple, low-energy communication solutions for the Internet of Things.

The Future of Ultra-Low Power Signal Processing

One of my favorite quotes comes from the IEEE Signal Processing magazine in 2010. They attempted to answer the question: What does ultra-low power consumption mean? And they came to the conclusion that it is where the “power source lasts longer than the useful life of the product.”[1] It’s a great answer because it’s scalable. It applies equally to signal processing circuitry inside an embedded IoT device that can never be accessed or recharged and to signal processing inside a car where the petrol for the engine dominates the operating lifetime, not the signal processing power. It also describes exactly what a lot of science fiction has always envisioned: no changing or recharging of batteries, which people forget to do or never have enough batteries for. Rather, we have devices that simply always work.

My research focuses on healthcare applications and creating “wearable algorithms”—that is, signal processing implementations that fit within the very small power budgets available in wearable devices. Historically, this focused on data reduction to save power. It’s well known that wireless data transmission is very power intensive. By using some power to reduce the amount of data that has to be sent, it’s possible to save lots of power in the wireless transmission stage and so to increase the overall battery lifetime.

This argument has been known for a long time. There are papers dating back to at least the 1990s based on it. It’s also readily achievable. Inevitably, it depends on the precise situation, but we showed in 2014 that the power consumption of a wireless sensor node could be brought down to the level of a node without a wireless transmitter (one that uses local flash memory) using easily available, easy-to-use, off-the-shelf devices.[2]
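As a concrete illustration of the idea, here is a hedged sketch of on-node data reduction: spend a little computation summarizing each window of samples locally, and only key the radio for a few bytes when something interesting happens. The read_sample() and radio_send() calls and the threshold are placeholders for whatever drivers and criteria a real node would use.

```c
/* data_reduction.c -- sketch of trading computation for radio power:
 * summarize each window of samples on the node and transmit only a short
 * message when the summary crosses a threshold. read_sample(), radio_send(),
 * and the threshold are hypothetical placeholders. */
#include <stdint.h>

#define WINDOW_SIZE     256
#define EVENT_THRESHOLD 512   /* example threshold on mean absolute value */

extern int16_t read_sample(void);                        /* placeholder ADC read */
extern void    radio_send(const uint8_t *buf, int len);  /* placeholder radio TX */

void process_window(void)
{
    int32_t sum_abs = 0;

    for (int i = 0; i < WINDOW_SIZE; i++) {
        int16_t s = read_sample();
        sum_abs += (s < 0) ? -s : s;
    }

    int16_t mean_abs = (int16_t)(sum_abs / WINDOW_SIZE);

    /* Only 3 bytes leave the node, and only for "interesting" windows,
     * instead of WINDOW_SIZE * 2 bytes of raw samples every window. */
    if (mean_abs > EVENT_THRESHOLD) {
        uint8_t msg[3] = { 0xA5, (uint8_t)(mean_abs >> 8), (uint8_t)mean_abs };
        radio_send(msg, (int)sizeof msg);
    }
}
```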

This essay appears in Circuit Cellar 316, November 2016. Subscribe to Circuit Cellar to read project articles, essays, interviews, and tutorials every month!

Today, there are many additional benefits that are being enabled by the emerging use of ultra-low power signal processing embedded in the wearable itself, and these new applications are driving the research challenges: increased device functionality; minimized system latency; reliable, robust operation over unreliable wireless links; reduction in the amount of data to be analyzed offline; better quality recordings (e.g., with motion artifact removal to prevent signal saturations); new closed-loop recording—stimulation devices; and real-time data redaction for privacy, ensuring personal data never leaves the wearable.

It’s these last two that are the focus of my research now. They’re really important for enabling new “bioelectronic” medical devices, which apply electrical stimulation as an alternative to classical pharmacological treatments. These “bioelectronics” will be fully data-driven, analyzing physiological measurements in real time and using this analysis to decide when to optimally trigger an intervention. Doing such analysis on a wearable sensor node, though, requires ultra-low power signal processing that has all of the feature extraction and signal classification operating within a power budget of a few hundred microwatts or less.

To achieve this, most works do not use any specific software platform. Instead, they achieve very low power consumption by using only dedicated and highly customized hardware circuits. While there are many different approaches to realizing low-power fully custom electronics, the hardware design trends are reasonably established: very low supply voltages, typically in the 0.5 to 1 V range; highly simplified circuit architectures, where a small reduction in processing accuracy leads to substantial power savings; and the use of extensive analogue processing in the very lowest power consumption circuits.[3]

Less well established are the signal processing functions for ultra-low power. Focusing on feature extractions, our 2015 review highlighted that the majority (more than half) of wearable algorithms created to date are based upon frequency information, with wavelet transforms being particularly popular.[4] This indicates a potential over-reliance on time–frequency decompositions as the best algorithmic starting points. It seems unlikely that time–frequency decompositions would provide the best, or even suitable, feature extraction across all signal types and all potential applications. There is a clear opportunity for creating wearable algorithms that are based on other feature extraction methods, such as the fractal dimension or Empirical Mode Decomposition.
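To make the alternative concrete, here is a hedged sketch of one non-frequency feature, the Katz fractal dimension, computed directly from the waveform. This is a generic textbook formulation shown only as an example of a feature extraction that is not a time-frequency decomposition; it is not taken from the cited works.

```c
/* katz_fd.c -- sketch of a non-frequency feature: the Katz fractal dimension
 * of a waveform, FD = log10(n) / (log10(n) + log10(d / L)), where L is the
 * total curve length, d the maximum distance from the first sample, and n
 * the number of steps. Generic textbook formulation, for illustration only. */
#include <math.h>
#include <stdio.h>

double katz_fd(const double *x, int n_samples)
{
    double length = 0.0;   /* L: summed distance between successive points */
    double max_d  = 0.0;   /* d: farthest point from the first sample      */

    for (int i = 1; i < n_samples; i++) {
        double dy = x[i] - x[i - 1];
        length += sqrt(1.0 + dy * dy);      /* assume unit spacing in time */

        double dy0  = x[i] - x[0];
        double dist = sqrt((double)i * (double)i + dy0 * dy0);
        if (dist > max_d)
            max_d = dist;
    }

    double n_steps = (double)(n_samples - 1);
    return log10(n_steps) / (log10(n_steps) + log10(max_d / length));
}

int main(void)
{
    double toy[] = { 0.0, 1.0, 0.5, 1.5, 0.2, 1.1, 0.4, 1.3 };
    printf("Katz FD of toy signal: %.3f\n", katz_fd(toy, 8));
    return 0;
}
```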

Investigating this requires studying the three-way trade-off between algorithm performance (e.g., correct detections), algorithm cost (e.g., false detections), and power consumption. We know how to design signal processing algorithms, and we know how to design ultra-low power circuitry. However, combining the two opens many new degrees of freedom in the design space, and there are many opportunities and work to do in mapping feature extractions and classifiers into sub-1-V power supply dedicated hardware.


[1] G. Frantz, et al., “Ultra-low power signal processing,” IEEE Signal Processing Magazine, vol. 27, no. 2, 2010.
[2] S. A. Imtiaz, A. J. Casson, and E. Rodriguez-Villegas, “Compression in Wearable Sensor Nodes,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 4, 2014.
[3] A. J. Casson, et al., “Wearable Algorithms,” in E. Sazonov and M. R. Neuman (eds.), Wearable Sensors, Elsevier, 2014.
[4] A. J. Casson, “Opportunities and Challenges for Ultra Low Power Signal Processing in Wearable Healthcare,” 23rd European Signal Processing Conference, Nice, 2015.


Alex Casson is a lecturer in the Sensing, Imaging, and Signal Processing Department at the University of Manchester. His research focuses on creating next-generation human body sensors, developing both the required hardware and software. Dr. Casson earned an undergraduate degree at the University of Oxford and a PhD from Imperial College London.

The Future of Biomedical Signal Analysis Technology

Biomedical signals obtained from the human body can be beneficial in a variety of scenarios in a healthcare setting. For example, physicians can use the noninvasive sensing, recording, and processing of a heart’s electrical activity in the form of electrocardiograms (ECGs) to help make informed decisions about a patient’s cardiovascular health. A typical biomedical signal acquisition system will consist of sensors, preamplifiers, filters, analog-to-digital conversion, processing and analysis using computers, and the visual display of the outputs. Given the digital nature of these signals, intelligent methods and computer algorithms can be developed for analysis of the signals. Such processing and analysis of signals might involve the removal of instrumentation noise, power line interference, and any artifacts that act as interference to the signal of interest. The analysis can be further enhanced into a computer-aided decision-making tool by incorporating digital signal processing methods and algorithms for feature extraction and pattern analysis. In many cases, the pattern analysis module is developed to reveal hidden parameters of clinical interest, and thereby improve the diagnosis and monitoring of clinical events.

The methods used for biomedical signal processing can be categorized into five generations. In the first generation, the techniques developed in the 1970s and 1980s were based on time-domain approaches for event analysis (e.g., using time-domain correlation approaches to detect arrhythmic events from ECGs). In the second generation, with the implementation of the Fast Fourier Transform (FFT) technique, many spectral-domain approaches were developed to get a better representation of the biomedical signals for analysis. For example, the coherence analysis of the spectra of brain waves, also known as electroencephalogram (EEG) signals, has provided an enhanced understanding of certain neurological disorders, such as epilepsy. During the 1980s and 1990s, the third generation of techniques was developed to handle the time-varying dynamical behavior of biomedical signals (e.g., the characteristics of polysomnographic (PSG) signals recorded during sleep possess time-varying properties reflecting the subject’s different sleep stages). In these cases, Fourier-based techniques cannot be optimally used because, by definition, Fourier provides only the spectral information and doesn’t provide a time-varying representation of signals. Therefore, the third-generation algorithms were developed to process the biomedical signals to provide a time-varying representation, so that clinical events can be temporally localized for many practical applications.
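As a toy illustration of why time localization matters, the sketch below computes a feature (here, just signal energy) over short windows, giving a time-varying view that a single whole-record Fourier spectrum cannot provide. It is only a simple stand-in for the true time-frequency methods the essay refers to.

```c
/* short_time_energy.c -- toy illustration of time localization: a feature
 * computed over short windows (here, energy) varies with time, whereas a
 * single whole-record spectrum would hide when an event occurs. This is a
 * simple stand-in, not one of the time-frequency methods discussed. */
#include <stdio.h>

#define N_SAMPLES 32
#define WINDOW     8

int main(void)
{
    /* Toy signal: quiet first half, an "event" in the second half. */
    double x[N_SAMPLES];
    for (int i = 0; i < N_SAMPLES; i++)
        x[i] = (i < N_SAMPLES / 2) ? 0.1 : ((i % 2) ? 1.0 : -1.0);

    for (int w = 0; w < N_SAMPLES / WINDOW; w++) {
        double energy = 0.0;
        for (int i = 0; i < WINDOW; i++) {
            double s = x[w * WINDOW + i];
            energy += s * s;
        }
        /* The event shows up only in the later windows. */
        printf("window %d: energy = %.2f\n", w, energy);
    }
    return 0;
}
```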

This essay appears in Circuit Cellar 315, October 2016. Subscribe to Circuit Cellar to read project articles, essays, interviews, and tutorials every month!

These algorithms were essentially developed for speech signals in telecommunications applications, and they were adapted and modified for biomedical applications. The nearby figure illustrates an example of knee vibration signals obtained from two different knee joints, their spectra, and their joint time-frequency representations. With the advancement in computing technologies over the past 15 years, many algorithms have been developed for machine learning and building intelligent systems. Therefore, the fourth generation of biomedical signal analysis involved the automatic quantification, classification, and recognition of time-varying biomedical signals by using advanced signal-processing concepts from time-frequency theory, neural networks, and nonlinear theory.

During the last five years, we’ve witnessed advancements in sensor technologies, wireless technologies, and material science. The development of wearable and ingestible electronic sensors marks the fifth generation of biomedical signal analysis. And as the Internet of Things (IoT) framework develops further, new opportunities will open up in the healthcare domain. For instance, the continuous and long-term monitoring of biomedical signals will soon become a reality. In addition, Internet-connected health applications will impact healthcare delivery in many positive ways. For example, it will become increasingly effective and advantageous to monitor elderly and chronically ill patients in their homes rather than hospitals.

These technological innovations will provide great opportunities for engineers to design devices from a systems perspective by taking into account patient safety, low power requirements, interoperability, and performance requirements. It will also provide computer and data scientists with a huge amount of data with variable characteristics.

The future of biomedical signal analysis looks very promising. We can expect innovative healthcare solutions that will improve everyone’s quality of life.

Sridhar (Sri) Krishnan earned a BE degree in Electronics and Communication Engineering at Anna University in Madras, India. He earned MSc and PhD degrees in Electrical and Computer Engineering at the University of Calgary. Sri is a Professor of Electrical and Computer Engineering at Ryerson University in Toronto, Ontario, Canada, and he holds the Canada Research Chair position in Biomedical Signal Analysis. Since July 2011, Sri has been an Associate Dean (Research and Development) for the Faculty of Engineering and Architectural Science. He is also the Founding Co-Director of the Institute for Biomedical Engineering, Science and Technology (iBEST). He is an Affiliate Scientist at the Keenan Research Centre at St. Michael’s Hospital in Toronto.

The Hunt for Low-Power Remote Sensing

With the advent of the Internet of Things (IoT), the need for ultra-low power passive remote sensing is on the rise for battery-powered technologies. Digital cameras have come light years from where they were a decade ago, but low power they are not. When low-power technologies need always-on remote sensing, infrared motion sensors are a great option to turn to.

Passive infrared (PIR) sensors and passive infrared detectors (PIDs) are electronic devices that detect infrared light emitted from objects within their field of view. These devices typically don’t measure light levels per se; rather, they respond to changes in the infrared energy reaching them. Such a change generates a very small potential across a crystalline material (gallium nitride or cesium nitrate, among others), which can be amplified to create a usable signal.
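In practice, that amplified signal usually ends up at a comparator or an ADC, and firmware turns it into a motion event. The hedged sketch below samples the amplified output and applies simple hysteresis so small drifts around the resting level are ignored; read_pir_adc() and the threshold values are hypothetical placeholders for a real platform’s driver and calibration.

```c
/* pir_motion.c -- sketch of turning an amplified pyroelectric signal into a
 * motion event: sample the analog output and apply hysteresis so small
 * drifts around the resting level are ignored. read_pir_adc() and the
 * thresholds are hypothetical placeholders. */
#include <stdbool.h>
#include <stdint.h>

#define MOTION_ON_THRESHOLD  600   /* ADC counts above the resting level */
#define MOTION_OFF_THRESHOLD 540   /* must fall below this to re-arm     */

extern uint16_t read_pir_adc(void);   /* placeholder: amplified PIR output */

bool motion_detected(void)
{
    static bool active = false;
    uint16_t level = read_pir_adc();

    if (!active && level > MOTION_ON_THRESHOLD)
        active = true;                  /* clear rise: declare motion */
    else if (active && level < MOTION_OFF_THRESHOLD)
        active = false;                 /* signal settled: re-arm */

    return active;
}
```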

Infrared technology was built on a foundation of older motion-sensing technologies that came before it. Motion sensing was first utilized in the early 1940s, primarily for military purposes near the end of World War II. Radar and ultrasonic detectors were the progenitors of the motion-sensing technologies seen today, relying on reflected radio or sound waves to determine the location of objects in a detection environment. Though effective for their purpose, their use was limited to military applications, and they were not a reasonable option for commercial users.

This essay appears in Circuit Cellar 314 (September 2016).

 
The viability of motion detection tools began to change as infrared-sensing options entered development. Modern PIR sensors were born toward the end of the 1960s, when companies began to seek alternatives to the already available motion-sensing technologies that were fast becoming outdated.

The modern versions of these infrared motion sensors have taken root in many industries due to the affordability and flexibility of their use. The future of motion sensing is the PID, which has several advantages over its counterparts:

  • Saving Energy—PIDs are energy efficient. The electricity required to operate PIDs is minimal, with most units actually reducing the user’s energy consumption when compared to other commercial motion-sensing devices.
  • Inexpensive—Cost isn’t a barrier to entry for those wanting to deploy IR motion sensing technology. This sensor technology makes each individual unit affordable, allowing users to deploy multiple sensors for maximum coverage without breaking the bank.
  • Durability—It’s hard to match the ruggedness of PIDs. Most units don’t employ delicate circuitry that is easily jarred or disrupted; PIDs are routinely used outdoors and in adverse environments that would potentially damage other styles of detectors.
  • Simple and Small—The small size of PIDs works to their advantage. Innocuous sensors are ideal for security solutions that aren’t obtrusive or easily noticeable. This simplicity makes PIDs desirable for commercial security, where businesses want to avoid installing obvious security infrastructure throughout their buildings.
  • Wide Lens Range—The wide field of vision that PIDs have allows for comprehensive coverage of each location in which they are placed. PIDs easily form a “grid” of infrared detection that is ideal for detecting people, animals, or any other type of disruption that falls within the lens range.
  • Easy to Interface With—PIDs are flexible. The compact and simple nature of PIDs lets them easily integrate with other technologies, including public motion detectors for businesses and appliances like remote controls.

With the wealth of advantages PIDs have over other forms of motion-sensing technology, it stands to reason that PIR sensors and PIDs will have a place in the future of motion sensor development. Though other options are available, PIDs operate with a simplicity, energy efficiency, and durability that other technologies can’t match. Though there are some exciting new developments in the field of motion sensing, including peripherals for virtual reality and 3-D motion control, the reliability of infrared motion technology ensures it a definite role in the evolution of motion sensing in the years to come.

As the Head Hardware Engineer at Cyndr (www.cyndr.co), Kyle Engstrom is the company’s lead electron wrangler and firmware designer. He specializes in analog electronics and power systems. Kyle has bachelor’s degrees in electrical engineering and geology. His life as a rock hound lasted all of six months before he found his true calling in engineering. Kyle has worked three years in the aerospace industry designing cutting-edge avionics.

Software-Programmable FPGAs

Modern workloads demand higher computational capabilities at low power consumption and cost. As traditional multi-core machines do not meet the growing computing requirements, architects are exploring alternative approaches. One solution is hardware specialization in the form of application specific integrated circuits (ASICs) to perform tasks at higher performance and lower power than software implementations. The cost of developing custom ASICs, however, remains high. Reconfigurable computing fabrics, such as field-programmable gate arrays (FPGAs), offer a promising alternative to custom ASICs. FPGAs couple the benefits of hardware acceleration with flexibility and lower cost.

FPGA-based reconfigurable computing has recently taken the spotlight in academia and industry, as evidenced by Intel’s high-profile acquisition of Altera and Microsoft’s recent announcement that it will deploy thousands of FPGAs to speed up Bing search. In the coming years, we should expect to see hardware/software co-designed systems supported by reconfigurable computing become common. Conventional RTL design methodologies, however, cannot productively manage the growing complexity of the algorithms we wish to accelerate using FPGAs. Consequently, FPGA programmability is a major challenge that must be addressed both technologically, by leveraging high-level software abstractions (e.g., languages and compilers), run-time analysis tools, and readily available libraries and benchmarks, and academically, through the education of rising hardware/software engineers.

Recent efforts related to software-programmable FPGAs have focused on designing high-level synthesis (HLS) compilers. Inspired by classical C-to-gates tools, HLS compilers automatically transform programs written in traditional untimed software languages to timed hardware descriptions. State-of-the-art HLS tools include Xilinx’s Vivado HLS (C/C++) and SDAccel (OpenCL) as well as Altera’s OpenCL SDK. Although HLS is effective at translating C/C++ or OpenCL programs to RTL hardware, compilers are only a part of the story in realizing truly software-programmable FPGAs.
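To show what “untimed software languages” means in practice, here is a hedged sketch of a small FIR filter written in C in the style of Xilinx Vivado HLS, with pragmas asking the tool to pipeline the loop and flatten the shift register into registers. The exact pragma spellings and the hardware produced vary with the vendor and tool version, so treat it as illustrative rather than definitive.

```c
/* fir_hls.c -- sketch of untimed C for an HLS compiler, in the style of
 * Xilinx Vivado HLS. The pragmas request a fully partitioned shift register
 * and a pipelined loop; exact syntax and results vary by vendor/version. */
#define N_TAPS 8

int fir(const int coeff[N_TAPS], int sample)
{
    static int shift_reg[N_TAPS];
#pragma HLS ARRAY_PARTITION variable=shift_reg complete

    int acc = 0;
    for (int i = N_TAPS - 1; i >= 0; i--) {
#pragma HLS PIPELINE II=1
        shift_reg[i] = (i == 0) ? sample : shift_reg[i - 1];
        acc += shift_reg[i] * coeff[i];
    }
    return acc;
}
```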

 
Efficient memory management is central to software development. Unfortunately, unlike traditional software programming, current FPGA design flows require application-specific memories to sustain high-performance hardware accelerators. Features such as dynamic memory allocation, pointer chasing, complex data structures, and irregular memory access patterns are also ill-supported by FPGAs. In lieu of basic software memory abstraction techniques, experts must design custom hardware memories. More extensible software memory abstractions would instead facilitate the software-programmability of FPGAs.

In addition to high-level programming and memory abstractions, run-time analysis tools such as debuggers and profilers are essential to software programming. Hardware debuggers and profilers in the form of hardware/software co-simulation tools, however, are not ready for tackling exascale systems. In fact, one of the biggest barriers to realizing software-programmable FPGAs is the hours, even days, it takes to generate bitstreams and run hardware/software co-simulators. Lengthy compilation and simulation times cause debugging and profiling to consume the majority of FPGA development cycles and deter agile software development practices. The effect is compounded when FPGAs are integrated into heterogeneous systems with CPUs and GPUs over complex memory hierarchies. New tools, modeled after architectural simulators, may aid in rapidly gathering performance, power, and area utilization statistics for FPGAs in heterogeneous systems. Another solution to long compilation and simulation times is using overlay architectures. Overlay architectures mask the FPGA’s bit-level configurability with a fixed network of simple processing nodes. The fixed hardware in overlay architectures enables faster programmability at the expense of the finer-grained, bit-level parallelism of FPGAs.

Another key facet of software programming is readily available libraries and benchmarks. Current FPGA development is marred by vendor-specific IP cores that span limited domains. As FPGAs become more software-programmable, we should expect to see more domain experts providing vendor-agnostic FPGA-based libraries and benchmarks. Realistic, representative, and reproducible vendor-agnostic libraries and benchmarks will not only make FPGA development more accessible but also serve as reference solutions for developers.

Finally, the future of software-programmable FPGAs lies not only in technological advancements but also in educating the next generation of hardware/software co-designing engineers. Software engineers are rarely concerned with the downstream architecture except when exercising expert optimizations. Higher-level abstractions and run-time analysis tools will improve FPGA programmability, but developers will still need a working knowledge of FPGAs to design competitive hardware accelerators. Following reference libraries and benchmarks, software engineers must become fluent with the notions of pipelining, unrolling, partitioning memory into local SRAM blocks, and hardened IP. Terms like throughput, latency, area utilization, power, and cycle time will enter the software engineering vernacular.

Recent advances in HLS compilers have demonstrated the feasibility of software-programmable FPGAs. Now, a combination of higher-level abstractions, run-time analysis tools, libraries and benchmarks must be pioneered alongside trained hardware/software co-designing engineers to realize a cohesive software engineering infrastructure for FPGAs.
 

Udit Gupta earned a BS in Electrical and Computer Engineering at Cornell University. He is currently studying toward a PhD in Computer Science at Harvard University. Udit’s past research includes exploring software-programmable FPGAs by leveraging intelligent design automation tools and evaluating high-level synthesis compilers with realistic benchmarks. He is especially interested in vertically integrated systems—exploring the computing stack from applications, tools, languages, and compilers to downstream architectures.