Boards, Boxes and SoCs
There’s a big difference between AI in the abstract sense and AI implemented in embedded systems. Smoothing the way, processing technologies including FPGAs, GPUs, NPUs and dedicated AI SoCs are being designed into a variety of solutions in board, box, development kit and chip IP formats.
While Artificial Intelligence (AI) is very much a topic of mainstream discussion these days, there’s a whole set of challenges that emerge when AI is implemented in embedded systems—especially in IoT edge applications. Today, there’s no one way to do AI or machine learning (ML) from a processing standpoint. FPGAs, NPUs, GPUs and even dedicated AI/ML systems-on-chips (SoCs) are all part of the landscape when it comes to AI.
Perhaps more significant for embedded system developers is that all those processing approaches are becoming available in embedded-friendly formats, including embedded boards, box-level systems, complete development kits and even chip-level IP (intellectual property). Over the past 12 months, a new crop of those products has emerged, designed to serve both general and application-specific AI requirements.
GPUs DRIVE AI
There’s no question that Nvidia has been a pioneer in using GPU processing technology for AI computing. In terms of compute density, the GPU approach continues to excel in small form factor embedded implementations. The latest solution along those lines from Nvidia is its Jetson Xavier NX, announced in November.
With a form factor smaller than the size of a credit card, the company is touting it as “the world’s smallest, most powerful AI supercomputer for robotic and embedded computing devices at the edge.” Jetson Xavier NX provides up to 14 TOPS (at 10 W) or 21 TOPS (at 15 W), running multiple neural networks in parallel and processing data from multiple high-resolution sensors simultaneously in a Nano form factor (70 mm × 45 mm) (Figure 1). For developers already building embedded machines, Jetson Xavier NX runs on the same CUDA-X AI software architecture as all Jetson offerings.
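As a quick check on those two operating points, the quoted figures can be turned into throughput per watt. The short Python sketch below uses only the TOPS and wattage numbers stated above. The variable and function names are illustrative, and the result is a peak figure based on the module power budget, not a measured efficiency.

```python
# Operating points quoted above for Jetson Xavier NX: (peak TOPS, power budget in watts).
OPERATING_POINTS = {
    "10 W mode": (14.0, 10.0),
    "15 W mode": (21.0, 15.0),
}


def tops_per_watt(tops: float, watts: float) -> float:
    """Return peak throughput per watt of power budget."""
    return tops / watts


efficiency = {
    name: tops_per_watt(tops, watts)
    for name, (tops, watts) in OPERATING_POINTS.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} TOPS/W")
```

Both modes work out to the same peak ratio, which suggests the higher power mode scales throughput roughly in proportion to the power budget.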
The Jetson Xavier NX is based on an Nvidia Volta GPU with 384 Nvidia CUDA cores and 48 Tensor Cores, plus 2x NVDLA. The module’s CPU is a 6-core Carmel Arm 64-bit processor. Video support includes 2x 4K30 encode and 2x 4K60 decode. It supports up to six CSI cameras (36 via virtual channels) over 12 lanes (3×4 or 6×2) of MIPI CSI-2. Rounding out the module’s features are 8 GB of 128-bit LPDDR4x DRAM and Gbit Ethernet.
Jetson Xavier NX is aimed at embedded edge computing devices that demand increased performance but are constrained by size, weight and power budgets or cost. These include small commercial robots, drones, intelligent high-resolution sensors for factory logistics and production lines, optical inspection, network video recorders, portable medical devices and other industrial IoT systems. Jetson Xavier NX is supported by Nvidia’s JetPack software development kit, which is a complete AI software stack that can run modern and complex AI networks, accelerated libraries for deep learning as well as computer vision, computer graphics, multimedia and more.
Nvidia also announced that it topped all five benchmarks measuring the performance of AI inference workloads in data centers and at the edge. The results of MLPerf Inference 0.5—the industry’s first independent AI benchmark for inference—demonstrate the inference capabilities of Nvidia Turing GPUs for data centers and the Nvidia Xavier SoC for edge. The Jetson Xavier NX module is built around a new low-power version of the Xavier SoC used in these benchmarks.
AI FOR VISION SYSTEMS
Vision systems are another application category where AI is providing powerful solutions. As an example, in September Microsoft announced that it was making its “Vision AI Developer Kit” broadly available. The kit comes with an 8MP, 4K camera that runs Linux on Qualcomm’s 10 nm, AI-enabled QCS603 SoC (Figure 2). The kit is aimed at AI edge developers using Azure IoT Edge and Azure Machine Learning.
The Vision AI Developer Kit was originally announced in May 2018 as a collaboration with Qualcomm, which had recently announced its Qualcomm Vision Intelligence Platform. This hardware/software platform for deep learning was deployed with its 10 nm fabricated octa-core QCS605 and quad-core QCS603 SoCs.
There are three development paths available with the kit. The first is “no code”, using Custom Vision, Azure Cognitive Service, custom models with Azure Machine Learning, and the Visual Studio Code IDE. Aimed at novices, Custom Vision walks users through the process of uploading data, training and deploying custom vision models including image tagging. With the Vision AI Developer Kit, users can then use Azure IoT Hub to deploy a custom vision model directly to the kit.
The second path is for advanced users. They can use the Azure Cognitive Service with visual drag and drop tools for AML development. Reference implementations provided in Jupyter notebooks walk you through the steps to upload training data to Azure Blob Storage, run a transfer learning experiment, convert the trained model to be compatible with the developer kit platform and deploy via Azure IoT Edge.
Finally, more advanced developers can use the kit’s extension for Visual Studio Code, which offers sample Python modules, pre-built Azure IoT deployment configurations and Dockerfiles for container creation and deployment. Visual Studio Code can also add business logic to existing Azure solutions based on camera input sent via Azure IoT Hub to transform the image data into normalized datastreams using Azure Stream Analytics.
Open-spec SBCs have become an important resource among embedded systems developers, and AI has found its way into that realm too. As an example, in September, the BeagleBoard.org Foundation made its BeagleBone AI SBC available after announcing it back in February. The BeagleBone AI’s 1.5 GHz, dual-core Cortex-A15 Texas Instruments (TI) Sitara AM5729 is a major performance boost over the 1 GHz, Cortex-A8 Sitara AM3358 found on the earlier BeagleBone Black and its many variants.
BeagleBone AI is designed to make it easy to explore how AI can be used in everyday life via the TI C66x DSP cores and embedded-vision-engine (EVE) cores supported through an optimized TIDL machine learning OpenCL API with pre-installed tools (Figure 3). The EVE cores, which use a Vision AccelerationPac architecture, provide a programmable imaging and vision processing engine for AI acceleration.
The AM5729 processor adds 2x TI C66x DSPs and 2x Cortex-M4 MCUs in addition to the EVE cores. Like the earlier AM3358 SoC, the AM5729 is equipped with 4x programmable, real-time PRU-ICSS (Programmable Real-Time Unit and Industrial Communication SubSystem) cores. Compared to the BeagleBone Black, the BeagleBone AI doubles the RAM to 1 GB and quadruples the eMMC storage to 16 GB. The board provides a GbE port (up from 10/100), as well as 802.11ac WiFi and Bluetooth 4.2. A USB 3.0 Type-C OTG port supports power input, and there’s also a USB 2.0 host port.
AI FOR STREET LIGHTING
The emergence of Smart Cities is bringing with it a variety of demands to enhance many aspects of connected systems that leverage AI. A great example is the fanless, AI-enhanced “Atlas” embedded computer announced by Aaeon and Intel in November. It is designed to be integrated with a variety of different streetlamp designs. The computer is based on an Intel Apollo Lake SoC and an Intel Myriad X VPU.
The Atlas consolidates data streams from smart lighting, environmental monitoring and video sensors. Atlas works at the edge to allow city administrators to create a connected data hub that enables numerous new applications. For example, when the Atlas edge node is deployed in conjunction with a camera, network video recorder (NVR) and management system, it can power a citywide parking application. Car drivers can park on city streets without having to fish for change or pay at a meter—nearby smart cameras on streetlights can capture and classify images such as license plates using computer vision and charge the owner directly for the exact parking time via a gateway and cloud-based management system connected to a department of motor vehicles database.
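The billing step in that parking scenario can be sketched in a few lines of Python. This is only an illustration of charging for the exact time parked. The hourly rate, the plate string, the timestamps and the function name are all hypothetical, and the plate recognition, gateway and motor vehicle database lookup described above are outside the sketch.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical tariff; the article does not quote a rate.
RATE_PER_HOUR = Decimal("2.00")


def parking_fee(arrived: datetime, departed: datetime,
                rate_per_hour: Decimal = RATE_PER_HOUR) -> Decimal:
    """Bill for the exact time parked, pro-rated to the second, rounded to cents."""
    if departed < arrived:
        raise ValueError("departure precedes arrival")
    seconds = Decimal(int((departed - arrived).total_seconds()))
    fee = rate_per_hour * seconds / Decimal(3600)
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# First and last sighting of a plate, as a camera-based classifier might
# report them (illustrative data only).
sightings = {
    "ABC1234": (datetime(2019, 11, 5, 9, 0, 0), datetime(2019, 11, 5, 10, 45, 0)),
}

fees = {plate: parking_fee(first, last) for plate, (first, last) in sightings.items()}
print(fees)
```

Using `Decimal` rather than floating point keeps the currency rounding predictable, which matters when a charge is posted to an owner's account automatically.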
For Intel, the Atlas provides a means for the company to showcase the Intel Vision Products line. This includes its dual Movidius Myriad X VPUs, deployed via an Aaeon AI Core XM 2280 M.2 module. The other key element is Intel’s OpenVINO AI toolkit for running AI inferences on frameworks such as TensorFlow or Caffe.
An online Intel product brief describes how Aaeon’s Atlas can be added onto existing infrastructure, such as street and traffic lights. The node streamlines multiple overlapping efforts by city agencies and saves on costs (Figure 4) by helping agencies consolidate their data and applications. Atlas includes outdoor smart lighting control for streetlight systems and environmental monitoring using multiple gas sensors. The edge node can be used to monitor traffic flow, parking availability, pedestrian crossings, seismic activity, trash receptacle overflow and even atmospheric changes in addition to numerous other use cases.
FROM SOC TO BOARD-LEVEL AI
For its part, MediaTek offers a trio of SoC solutions aimed at smart home systems. The chips are highly integrated, specially designed, extreme low-power solutions for short and medium range connectivity in the home. These AIoT SoCs include its i300, i500 and i700 products. In August, LinuxGizmos.com, Circuit Cellar’s sister website, reported on Innocomm’s preliminary information on two system-on-modules (SoMs) based on MediaTek SoCs. The SB30 SoM is based on MediaTek’s 1.5 GHz, quad-core, Cortex-A35 based MediaTek i300 (MT8362) SoC. And the SB50 SoM is based on the MediaTek i500 (MT8385).
The SB50 SoM (also called SB50 MTK i500 SoM) is designed for AI/AR/VR applications (Figure 5). Its MediaTek i500 SoC was announced with the MediaTek i300 in April. In July, MediaTek followed up with its high-end MediaTek i700 (AI IoT platform i700). More on the i700 below. MediaTek designed all three of these “AIoT” platforms for media-enhanced edge computing, with the i500 and i700 also targeting AI on the edge.
The MediaTek i500 SoC is the mid-range model of the three. The SoC combines 4x Cortex-A73 and 4x Cortex-A53 cores, all clocked at 2.0 GHz. There’s also an 800 MHz Arm Mali-G72 MP3 and a 500 MHz AI processor (APU) for deep learning, neural network acceleration and computer vision applications. The “cost-effective” MediaTek i300 (MT8362) inside the SB30 SoM is a more stripped-down offering. The SoC is built around 4x power-efficient Cortex-A35 cores clocked up to 1.5 GHz.
Innocomm’s SB30 SoM (SB30 MTK i300 SoM) is designed for audio/video, kiosk, digital signage and fitness console applications. It combines the MediaTek i300 with 1 GB or 2 GB LPDDR3, 16 GB eMMC and either dual-band 802.11ac and Bluetooth 5.0 or 2.4 GHz 802.11n with Bluetooth 4.0. The SB30’s media interfaces include MIPI-DSI, LVDS and HDMI 1.4a, as well as I2S for audio. It also has USB 2.0 host and OTG connections plus I2C, SPI and UART.
SOC FOR AI IMAGE RECOGNITION
MediaTek’s most recent AI SoC is its i700 device, which was launched in July. The SoC features high speed edge AI computation for rapid image recognition. According to the company, the i700 platform can be applied in a wide range of scenarios, including smart city, smart building and smart manufacturing. As an integrated platform encompassing CPU, GPU, ISP and a dedicated AI processing unit, it enables embedded system developers to accelerate the development of AI-enabled consumer IoT products.
The i700 platform is powered by an octa-core CPU, which includes two 2.2 GHz Arm Cortex-A75 cores and six 2.0 GHz Arm Cortex-A55 cores, operating alongside an IMG 9XM-HP8 GPU that is clocked at 970 MHz. The AI IoT platform supports CorePilot, which ensures processing resources are allocated efficiently across all cores for maximum performance and battery efficiency.
An improved AI engine comprising a dedicated dual-core AI processor, AI Accelerator and AI Face Detection Engine is embedded in the i700 platform. Combined, the new components enable the platform to perform AI computations five times faster than its predecessor, the i500. In addition, i700 supports MediaTek’s NeuroPilot SDK.
Compatible with Google’s Android Neural Networks API, it provides an open platform for customers and device manufacturers to take full advantage of common industry frameworks including TensorFlow, TF Lite, Caffe and Caffe2 when developing new applications.
The i700 supports camera configurations of up to a single 32MP camera or a 24MP+16MP dual-camera setup. It also supports accurate, zero-latency image recognition on 30 fps videos shot on a 32MP camera, as well as high speed recognition on high resolution 120 fps slow motion videos. Furthermore, the upgraded tri-core ISP supports 14-bit RAW and 10-bit YUV processing, while the AI Face Detection engine enables fast facial recognition. The MediaTek i700 is expected to be globally available starting in 2020.
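To give a feel for the data volumes behind those camera figures, the Python sketch below multiplies out the quoted numbers. It rests on assumptions the article does not state: that 32MP means 32 million pixels, that every frame is delivered at full resolution in 14-bit RAW, and that there is no compression or blanking overhead. Treat the result as a rough upper bound, not a MediaTek specification.

```python
MEGAPIXELS = 32          # single-camera maximum quoted for the i700
FRAMES_PER_SECOND = 30   # frame rate quoted for image recognition
BITS_PER_PIXEL = 14      # RAW bit depth quoted for the ISP (assumed to apply to this stream)

# Assumption: 1 MP = 1,000,000 pixels, full-resolution frames, no compression.
pixels_per_second = MEGAPIXELS * 1_000_000 * FRAMES_PER_SECOND
bits_per_second = pixels_per_second * BITS_PER_PIXEL

print(f"Pixel rate: {pixels_per_second / 1e6:.0f} Mpixel/s")
print(f"Raw data rate: {bits_per_second / 1e9:.2f} Gbit/s")
```

Under those assumptions the sensor stream approaches a billion pixels per second, which helps explain why recognition at this resolution is handled by dedicated ISP and AI hardware.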
MACHINE LEARNING IP FOR ARM
Arm-based processors are in a wide variety of embedded devices. Indeed, a good percentage of today’s microcontrollers and embedded processors are based on Arm CPU cores. It’s a natural progression perhaps that Arm has gotten into the AI game with technology for mainstream consumer devices. Arm says that, while CPUs and GPUs are ML powerhouses in their own right, they can struggle to meet requirements where the most intensive and efficient performance is required. For those tasks, a dedicated neural processing unit (NPU) has advantages in efficiency. Along just those lines, in October Arm announced new IP in the form of two NPUs aimed at AI and ML applications.
The two new processors are the Ethos-N57 and Ethos-N37 NPUs, and they follow the earlier introduction of the Arm ML processor (now referred to as the Ethos-N77). Arm Ethos is a suite of products designed to solve complex AI and ML compute challenges, allowing the creation of more personalized, immersive experiences in everyday devices (Figure 6). As consumer devices become smarter, there is a need for additional AI performance and efficiency via dedicated ML processors, says Arm. The new NPUs are optimized for the most cost- and battery-life-sensitive designs.
The NPUs support Int8 and Int16 datatypes and offer performance enhancement techniques such as Winograd. They also provide advanced data management techniques minimizing data movement and associated power. The Ethos-N57 features 8x compute engines and supports up to 2-TOPS AI performance using 1024 8-bit MACs. It’s designed for smart home hubs, mainstream smartphones and digital TVs.
The Ethos-N37 measures only one square millimeter. It offers 4x compute engines for up to 1-TOPS AI performance using 512 8-bit MACs. The Ethos-N37 is intended for entry-level phones and smart devices such as smart cameras. On the high end, the up to 4-TOPS Ethos-N77 (Arm ML) has a much larger 1 MB to 4 MB memory footprint than the 512 KB-ready Ethos-N57 and -N37. It targets premium smartphones, AR/VR and computational photography applications.
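The TOPS and MAC figures quoted for the Ethos-N57 and Ethos-N37 can be related with simple arithmetic. The Python sketch below back-calculates the clock rate each pairing implies. It assumes that one MAC counts as two operations (a multiply and an accumulate) and that every MAC is busy on every cycle. The article states neither assumption nor a clock frequency, so the result is only indicative, and techniques such as Winograd could shift the effective numbers.

```python
OPS_PER_MAC = 2  # assumption: one multiply plus one accumulate per MAC per cycle

# Peak figures quoted above for the two new NPUs.
NPUS = {
    "Ethos-N57": {"tops": 2.0, "macs": 1024},
    "Ethos-N37": {"tops": 1.0, "macs": 512},
}


def implied_clock_hz(tops: float, macs: int) -> float:
    """Clock rate at which `macs` fully used MAC units would deliver `tops`."""
    return tops * 1e12 / (macs * OPS_PER_MAC)


clocks = {name: implied_clock_hz(**spec) for name, spec in NPUS.items()}

for name, hz in clocks.items():
    print(f"{name}: about {hz / 1e9:.2f} GHz implied")
```

Both parts come out at the same implied rate of a little under 1 GHz, which is consistent with the N37 being a half-width version of the N57 in terms of MAC count.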
AI FOR RAILWAY DESIGNS
Even railway systems are hungry to employ the benefits of AI. Feeding that need, in May ADLINK Technology released its latest rugged, fanless NVIDIA Quadro embedded AIoT (AI and IoT) platform, the AVA-5500, designed for real-time video/graphics analytics applications in the rail industry (Figure 7). According to ADLINK, AI is making railway operations safer, smarter and more reliable, significantly enhancing passenger travel experience and freight logistics services.
AI-driven applications only function with proper data input that is collected by massive numbers of IoT devices installed in stations, on trains and along tracks. A successful implementation of such rail applications requires a seamless integration of AI and IoT technologies, says ADLINK. Powered by an Intel Core i7 processor and integrated NVIDIA Quadro GPGPU module, ADLINK’s EN50155 certified AVA-5500 AIoT Platform is ruggedized for both wayside and onboard deployment with its wide range DC input and isolated I/O design.
The unit also provides an edge solution for real-time video/graphic analysis applications necessary for today’s increasingly complex railroad operations. To meet varying application requirements, the AVA-5500 is also available in variants featuring an additional two USB 2.0 via M12 connectors and two 2.5″ SATA 6Gb/s drive bays, as well as a version supporting +12 VDC power input only.
The AVA-5500 is being tested and deployed commercially by leading rail system integrators worldwide, according to the company. In one application, the intelligent platform is installed on special rail inspection trains to process captured images of key wayside equipment in real-time. With a sophisticated algorithm driven by parallel computing and deep learning, the application can effectively identify potential equipment faults at a train speed of 120 km/h, and raise the alarm to notify maintenance crews. In another application, the AVA-5500 is used in a train station control office to analyze the real-time video stream received from the platform. The application is able to not only detect suspicious behaviors and trigger alerts, but also conduct post-event analysis.
FPGAS AND EMBEDDED AI
FPGAs offer another attractive approach to AI, especially where flexibility is key. Xilinx offers an AI Platform, which it calls “Xilinx/Deephi Core.” The platform was developed partly as a result of Xilinx’s acquisition of DeePhi Technology in July 2018. DeePhi was a Beijing, China-based start-up with expertise in machine learning, deep compression, pruning and system-level optimization for neural networks.
In October, iWave Systems launched its “iW-Rainbow G30D Zynq Ultrascale+ MPSoC Development Kit” for its iW-Rainbow G30M compute module, based on the Arm Cortex-A53/FPGA Xilinx Zynq UltraScale+ MPSoC (Figure 8). In the announcement, iWave focused mostly on the platform’s ability to test the new Xilinx AI Platform, which it calls Xilinx/Deephi core.
The kit’s Zynq UltraScale+ MPSoC SOM features an intelligent blend of MPSoC and FPGA functionality in an Arm plus Xilinx FPGA architecture. The heterogeneous Arm multicore processors handle high-performance non-real-time processing for edge applications (such as system boot, peripheral management and server communication), while critical real-time tasks are offloaded to the FPGA, which executes them using Deephi algorithms.
Deephi core platforms integrate both hardware and software components, presenting a comprehensive framework for AI/ML acceleration in applications such as face recognition, real-time surveillance, image/pose detection and so forth. With its AI/ML capabilities, the Xilinx/Deephi core platform adapts well to various workload characteristics and complements edge applications with ultra-low latency real-time inference. iWave supports an extensive portfolio of Deephi cores based on various application needs.
In contrast with AI used in a generic office environment, embedded AI must be implemented with the right size, volume and power capabilities to be able to operate in the intended application. Clearly, a lot of choices are available in the form of embedded boards, box-level systems, development kits and chip-level IP to meet the wide range of needs for embedded AI. Those choices will only expand as these technology vendors and others roll out new solutions.
For detailed article references and additional resources go to:
Aaeon | www.aaeon.com
ADLINK Technology | www.adlinktech.com
Arm | www.arm.com
BeagleBoard.org Foundation | www.beaglebone.org
Innocomm | www.innocomm.com
Intel | www.intel.com
iWave Systems | www.iwavesystems.com
MediaTek | www.mediatek.com
Microsoft | www.microsoft.com
Nvidia | www.nvidia.com
Qualcomm | www.qualcomm.com
Xilinx | www.xilinx.com
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • DECEMBER 2019 #353