FPGA Solutions Evolve to Meet AI Needs

Brainy System ICs

Long gone now are the days when FPGAs were thought of as simple programmable circuitry for interfacing and glue logic. Today, FPGAs are powerful system chips with on-chip processors, DSP functionality and high-speed connectivity.

By Jeff Child, Editor-in-Chief

Today’s FPGAs have evolved to the point that calling them “systems-on-chips” is redundant. It’s now simply a given that the high-end lines of the major FPGA vendors have general-purpose CPU cores on them. Moreover, the flavors of signal processing functionality on today’s FPGA chips are ideally suited to the kind of system-oriented DSP functions used in high-end computing. Better still, they’ve enabled AI (artificial intelligence) and machine learning functionality to be implemented in much smaller embedded systems.

In fact, over the past 12 months, most of the leading FPGA vendors have been rolling out solutions specifically aimed at using FPGA technology to enable AI and machine learning in embedded systems. The two main FPGA market leaders, Xilinx and Intel’s Programmable Solutions Group (formerly Altera), have certainly embraced this trend, as have many of their smaller competitors like Lattice Semiconductor and QuickLogic. Meanwhile, specialists in so-called e-FPGA technology like Achronix and Flex Logix have their own compelling twist on FPGA system computing.

Project Brainwave

Exemplifying the trend toward FPGAs facilitating AI processing, Intel’s high-performance line of FPGAs is its Stratix 10 family. According to Intel, the Stratix 10 FPGAs are capable of 10 TFLOPS, or 10 trillion floating point operations per second (Figure 1). In May, Microsoft debuted its Azure Machine Learning Hardware Accelerated Models, powered by Project Brainwave and integrated with the Microsoft Azure Machine Learning SDK. The Azure architecture is built with Intel FPGAs and Intel Xeon processors.

Figure 1
Stratix 10 FPGAs are capable of 10 TFLOPS or 10 trillion floating point operations per second.

Intel says its FPGA-powered AI achieves extremely high throughput, able to run ResNet-50, an industry-standard deep neural network requiring almost 8 billion calculations, without batching. This is possible with FPGAs because the programmable hardware, including logic, DSP and embedded memory, enables any desired logic function to be easily programmed and optimized for area, performance or power. And because this fabric is implemented in hardware, it can be customized and can perform parallel processing, making it possible to achieve orders-of-magnitude performance improvements over traditional software or GPU design methodologies.

In one application example, Intel cites an effort in which Canada’s National Research Council (NRC) is helping to build the next-generation Square Kilometer Array (SKA) radio telescope, to be deployed in remote regions of South Africa and Australia, where viewing conditions are most ideal for astronomical research. The SKA will be the world’s largest radio telescope, 10,000 times faster and with image resolution 50 times greater than the best radio telescopes in use today. That increased resolution and speed produces an enormous amount of image data: the telescopes will process the equivalent of a year’s worth of Internet data every few months.

NRC’s design embeds Intel Stratix 10 SX FPGAs at the Central Processing Facility located at the SKA telescope site in South Africa to perform real-time processing and analysis of collected data at the edge. High-speed analog transceivers allow signal data to be ingested in real time into the core FPGA fabric. After that, the programmable logic can be parallelized to execute any custom algorithm optimized for power efficiency, performance or both, making FPGAs the ideal choice for processing massive amounts of real-time data at the edge.

ACAP for Next Gen

For its part, Xilinx’s high-performance product line is its Virtex UltraScale+ device family (Figure 2). According to the company, these devices provide the highest performance and integration capabilities in a FinFET node, including the highest signal processing bandwidth at 21.2 TeraMACs of DSP compute performance. They deliver up to 500 Mb of total on-chip integrated memory, plus up to 8 GB of HBM Gen2 integrated in-package for 460 GB/s of memory bandwidth. Virtex UltraScale+ devices also include integrated IP for PCI Express, Interlaken, 100G Ethernet with FEC and Cache Coherent Interconnect for Accelerators (CCIX).

Figure 2
Virtex UltraScale+ FPGAs provide a signal processing bandwidth at 21.2 TeraMACs. They deliver on-chip memory density with up to 500 Mb of total on-chip integrated memory, plus up to 8 GB of HBM Gen2 integrated in-package for 460 GB/s of memory bandwidth.

Looking to the next phase of system performance, Xilinx in March announced its strategy for a new product category it calls the adaptive compute acceleration platform (ACAP). Touted as going beyond the capabilities of an FPGA, an ACAP is a highly integrated, multi-core heterogeneous compute platform that can be changed at the hardware level to adapt to the needs of a wide range of applications and workloads. An ACAP’s adaptability, which can be exercised dynamically during operation, delivers levels of performance and performance-per-watt unmatched by CPUs or GPUs, says Xilinx…

Read the full article in the August issue of Circuit Cellar (#337).


Dev Kit Enables Cars to Express Their Emotions

Renesas Electronics has announced a development kit for its R-Car SoCs that takes advantage of the “emotion engine,” an artificial sensibility and intelligence technology pioneered by cocoro SB Corp. The new development kit gives cars the sensibility to read a driver’s emotions and respond optimally to the driver’s needs based on his or her emotional state.

The development kit includes cocoro SB’s emotion engine, which was developed leveraging its sensibility technology to recognize emotional states such as confidence or uncertainty based on the speech of the driver. The car’s response to the driver’s emotional state is displayed by a new driver-attentive user interface (UI) implemented in the Renesas R-Car system-on-chip (SoC). Since it is possible for the car to understand the driver’s words and emotional state, it can provide the appropriate response that ensures optimal driver safety.


Because this technology is linked to artificial intelligence (AI)-based machine learning, the car can learn from its conversations with the driver, gradually becoming capable of providing the best response to the driver. Renesas plans to release the development kit later this year.

Renesas demonstrated its connected-car simulator incorporating the new development kit based on cocoro SB’s emotion engine at SoftBank World 2017, held by SoftBank earlier this month at the Prince Park Tower Tokyo.

Renesas considers the driver’s emotional state, facial expression and gaze direction to be key information that, combined with the driver’s vital signs, can improve the car-driver interface and bring drivers closer to the era of self-driving cars. For example, if the car recognizes that the driver is experiencing an uneasy emotional state, even if he or she has verbally accepted the switch to hands-free autonomous-driving mode, it can ask the driver, “Would you prefer to continue driving and not switch to autonomous-driving mode for now?” Furthermore, understanding the driver’s emotions enables the car to adjust vehicle speed according to how the driver is feeling while driving at night in autonomous-driving mode.

By providing carmakers and IT companies with a development kit built around this emotion engine, Renesas hopes to extend this service model to the development of new interfaces between cars and drivers, and to other mobility markets that can take advantage of emotional-state information. Based on the newly launched Renesas autonomy, an advanced driver assistance systems (ADAS) and automated driving platform, Renesas aims to enable a safe, secure and convenient driving experience through next-generation solutions for connected cars.

Renesas Electronics America | www.renesas.com

Natural Human-Computer Interaction

Recent innovations in both hardware and software have brought on a new wave of interaction techniques that depart from mice and keyboards. The widespread adoption of smartphones and tablets with capacitive touchscreens shows people’s preference to directly manipulate virtual objects with their hands.

Going beyond touch-only interaction, the Microsoft Kinect sensor enables users to play games with their entire body. More recently, Leap Motion’s new compact sensor, consisting of two cameras and three infrared LEDs, has opened up the possibility of accurate fingertip tracking. With Project Glass, Google is pioneering new technology in the wearable human-computer interface. Other new additions to wearable technology include Samsung’s Galaxy Gear smartwatch and Apple’s rumored iWatch.

Hand tracking results from Kinect data. The red regions are our tracking results and the green lines are the skeleton tracking results from the Kinect SDK (based on data from the ChAirGest corpus: https://project.eia-fr.ch/chairgest/Pages/Overview.aspx).

A natural interface reduces the learning curve, or the amount of time and energy a person requires to complete a particular task. Instead of a user learning to communicate with a machine through a programming language, the machine is now learning to understand the user.

Hardware advancements have led to our clunky computer boxes becoming miniaturized, stylish sci-fi-like phones and watches. Along with these shrinking computers come ever-smaller sensors that enable a once keyboard-constrained computer to listen, see, and feel. These developments pave the way to natural human-computer interfaces.
If sensors are like eyes and ears, software would be analogous to our brains.

Understanding human speech and gestures in real time is a challenging task for natural human-computer interaction. At a higher level, both speech and gesture recognition require similar processing pipelines that include data streaming from sensors, feature extraction, and pattern recognition of a time series of feature vectors. One of the main differences between the two is feature representation because speech involves audio data while gestures involve video data.

For gesture recognition, the first main step is locating the user’s hand. Popular libraries for doing this include Microsoft’s Kinect SDK or PrimeSense’s NITE library. However, these libraries only give the coordinates of the hands as points, so the actual hand shapes cannot be evaluated.

Fingertip tracking using a Kinect sensor. The green dots are the tracked fingertips.

Our team at the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory has developed methods that use a combination of skin-color and motion detection to compute a probability map of gesture salience location. The gesture salience computation takes into consideration the amount of movement and the closeness of movement to the observer (i.e., the sensor).
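
As a rough, hypothetical illustration of this idea (a sketch, not our actual implementation), the Python listing below combines a crude skin-color mask with frame-to-frame motion into a normalized salience map using OpenCV. The HSV thresholds and blur size are illustrative guesses, and the depth-based closeness cue used by the real system is omitted here.

# Hypothetical sketch of a gesture-salience map from skin color and motion.
# Thresholds and weights are illustrative, not the system's actual values.
import cv2
import numpy as np

def salience_map(frame_bgr, prev_gray):
    # Crude skin-color likelihood from an HSV range (illustrative values).
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255)).astype(np.float32) / 255.0

    # Motion cue: absolute difference against the previous gray frame.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    motion = cv2.absdiff(gray, prev_gray).astype(np.float32) / 255.0

    # Regions that are both skin-colored and moving score highest.
    prob = cv2.GaussianBlur(skin * motion, (15, 15), 0)
    return prob / (prob.max() + 1e-8), gray  # normalized map, state for next call

# Toy usage with a synthetic frame; in practice, loop over a video stream.
frame = np.random.default_rng(0).integers(0, 256, (120, 160, 3), dtype=np.uint8)
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
prob, prev = salience_map(frame, prev)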

We can use the probability map to find the most likely area of the gesturing hands. For each time frame, after extracting the depth data for the entire hand, we compute a histogram of oriented gradients to represent the hand shape as a more compact feature descriptor. The final feature vector for a time frame includes 3-D position, velocity, and hand acceleration as well as the hand shape descriptor. We also apply principal component analysis to reduce the feature vector’s final dimension.
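
A minimal sketch of this per-frame feature vector in Python, using standard scikit-image and scikit-learn calls; the patch size, frame rate, and the 95%-variance cutoff for PCA are illustrative assumptions rather than our system’s exact parameters.

# Sketch of the per-frame feature: HOG hand-shape descriptor plus 3-D
# position, velocity, and acceleration, with PCA over a whole sequence.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA

def frame_feature(hand_patch, pos, prev_pos, prev_vel, dt=1.0 / 30):
    # Compact hand-shape descriptor from the depth crop of the hand.
    shape_desc = hog(hand_patch, orientations=9,
                     pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    vel = (pos - prev_pos) / dt          # finite-difference velocity
    acc = (vel - prev_vel) / dt          # finite-difference acceleration
    return np.concatenate([pos, vel, acc, shape_desc]), vel

# Toy usage with stand-in data.
patch = np.random.rand(64, 64)                       # fake depth crop of a hand
pos, prev_pos, prev_vel = np.array([0.1, 0.2, 0.9]), np.zeros(3), np.zeros(3)
f, vel = frame_feature(patch, pos, prev_pos, prev_vel)

X = np.random.rand(200, f.size)                      # stand-in feature sequence
X_reduced = PCA(n_components=0.95).fit_transform(X)  # keep 95% of the variance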

A 3-D model of pointing gestures using a Kinect sensor. The top left video shows background subtraction, arm segmentation, and fingertip tracking. The top right video shows the raw depth-mapped data. The bottom left video shows the 3-D model with the white plane as the tabletop, the green line as the arm, and the small red dot as the fingertip.

The next step in the gesture-recognition pipeline is to classify the feature vector sequence into different gestures. Many machine-learning methods have been used to solve this problem. A popular one is called the hidden Markov model (HMM), which is commonly used to model sequence data. It was earlier used in speech recognition with great success.

There are two steps in gesture classification. First, we need to obtain training data to learn the models for different gestures. Then, during recognition, we find the most likely model that can produce the given observed feature vectors. New developments in the area involve some variations in the HMM, such as using hierarchical HMM for real-time inference or using discriminative training to increase the recognition accuracy.
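
The listing below illustrates that two-step scheme using the open-source hmmlearn package (a stand-in chosen for illustration, not necessarily the library behind our work): one Gaussian HMM is trained per gesture from labeled sequences, and recognition picks the model with the highest log-likelihood for the observed sequence.

# Hedged sketch of per-gesture HMM training and maximum-likelihood
# classification with hmmlearn; state count and data are illustrative.
import numpy as np
from hmmlearn import hmm

def train_gesture_models(training_data, n_states=5):
    # training_data: dict mapping gesture name -> list of (T_i, D) sequences.
    models = {}
    for name, seqs in training_data.items():
        X = np.vstack(seqs)                   # hmmlearn wants stacked sequences
        lengths = [len(s) for s in seqs]      # plus the per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)
        models[name] = m
    return models

def classify(models, seq):
    # Return the gesture whose HMM assigns the sequence the highest score.
    return max(models, key=lambda name: models[name].score(seq))

# Toy usage with random stand-in data (10-D feature vectors per frame).
rng = np.random.default_rng(0)
data = {"wave": [rng.normal(0, 1, (40, 10)) for _ in range(5)],
        "point": [rng.normal(2, 1, (40, 10)) for _ in range(5)]}
models = train_gesture_models(data)
print(classify(models, rng.normal(2, 1, (40, 10))))   # likely "point"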

Ying Yin

Ying Yin is a PhD candidate and a Research Assistant at the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory. Originally from Suzhou, China, Ying received her BASc in Computer Engineering from the University of British Columbia in Vancouver, Canada, in 2008 and an MS in Computer Science from MIT in 2010. Her research focuses on applying machine learning and computer vision methods to multimodal human-computer interaction. Ying is also interested in web and mobile application development. She has won awards in web and mobile programming competitions at MIT.

Currently, the newest industry-scale development in speech recognition is a method called deep learning. Earlier machine-learning methods require careful selection of feature vectors; the goal of deep learning is automatic discovery of powerful features from the raw input data. So far, it has shown promising results in speech recognition, and it could plausibly be applied to gesture recognition as well to see whether it further improves accuracy.
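
For a feel of what “discovering features from raw input” looks like in code, here is a toy PyTorch sketch of a small 1-D convolutional network that classifies sequences of raw per-frame input channels. Every layer size and hyperparameter is an arbitrary choice for the example, not a reference architecture from the speech-recognition literature.

# Toy 1-D CNN that learns its own features from raw input sequences,
# in the spirit of deep learning; all shapes here are illustrative.
import torch
import torch.nn as nn

class GestureNet(nn.Module):
    def __init__(self, in_channels=10, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(            # learned feature extractor
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # pool over the time axis
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                         # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

net = GestureNet()
logits = net(torch.randn(8, 10, 40))              # 8 sequences, 40 frames each
print(logits.shape)                               # torch.Size([8, 2])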

As component form factors shrink, sensor resolutions grow, and recognition algorithms become more accurate, natural human-computer interaction will become more and more ubiquitous in our everyday life.