Modular AI Unleashed
Artificial Intelligence is conquering more and more applications and areas of life: image detection and classification, translation and recommendation systems, to name just a few. The number of applications built on Machine Learning technology is large and growing. With a standard System-on-Module (SOM) (shown above) that combines an FPGA with an Arm processor, harnessing the power of AI offline and at the edge has never been easier.
Artificial Intelligence (AI) has a long history and was founded as an academic discipline in 1956. AI is the ability of a computer to mimic human intelligence: to learn from experience, adapt to new information and perform human-like activities. Applications of AI include expert systems, natural language processing (NLP), speech recognition and machine vision.
The revival of AI
After several waves of optimism followed by disappointment, AI is experiencing renewed and growing interest. Over the last 15 years or so, thousands of AI startups have been founded, at an ever-increasing rate. There are several drivers behind this, probably the most important being the huge computing power now available at an affordable price. Not only is the hardware faster, but anyone can now access supercomputer-class resources in the cloud. This has democratized access to the hardware platforms needed to run AI and has enabled a proliferation of startups.
Artificial Neural Networks (Figure 1) now scale to several tens to hundreds of hidden layers (Figure 2), and networks with 10,000 hidden layers have already been implemented. This growing depth increases the abstraction power of neural networks and is enabling new applications. Today, neural networks can be trained on tens of thousands of CPU or GPU cores, massively speeding up the development of generalized learning models.
Another reason for the increased interest in AI is the groundbreaking progress in Machine Learning in recent years. This has attracted technology investment and startup funding, further accelerating the development and improvement of AI.
How machines are learning
An Artificial Neural Network is a computational model inspired by the human brain. It consists of a network of interconnected simple processing units (neurons) that learn from experience by adjusting the strengths of their connections (Figure 1). So-called deep neural networks (DNNs), which are neural networks with many hidden layers, currently provide the best solutions to many large-scale problems, for example in image and speech recognition.
Convolutional Neural Networks (CNNs) are among the most widely used deep learning systems. They use a feedforward network of artificial neurons to map input features to an output. For learning (i.e. training), the error at the output is propagated backwards through the network, a process called backpropagation (Figure 3), which produces the set of weights that calibrates the CNN.
Training the neural network is the most computationally intensive process in Machine Learning. For a state-of-the-art network it can take days to weeks, requiring quadrillions of floating-point operations or more and a huge training dataset (gigabytes to hundreds of gigabytes) before the network reaches the required accuracy. Fortunately, this step is usually not time-critical and can be offloaded to the cloud.
Once the network is trained, it can be fed new, unlabeled data and classify it based on what it has learned. This step is called inference, and it is the actual goal of the application.
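To make the idea of "simple processing units" concrete, the following is a minimal sketch in plain Python (no ML framework assumed) of a forward pass through a tiny network. Each neuron forms a weighted sum of its inputs, adds a bias and applies a nonlinearity. The sigmoid activation, the layer sizes and the weight values are illustrative assumptions, not taken from the article.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of incoming weights per neuron


def sigmoid(x: float) -> float:
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Each neuron computes a weighted sum of its inputs plus a bias,
    then applies the nonlinearity."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    """Feed the input through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# A tiny 2-input network: one hidden layer of 3 neurons, one output neuron.
# The weights are arbitrary illustrative values, not trained ones.
network = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.2, -0.7, 0.5]], [0.2]),                                   # output layer
]

print(forward([1.0, 0.0], network))
```

"Learning" then means changing the numbers in `network` so that the outputs move closer to the desired ones.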
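Backpropagation is easier to grasp on a network small enough to differentiate by hand. The sketch below is an illustration under assumed choices (one hidden layer, sigmoid activations, squared-error loss, plain gradient descent); it is not how any particular framework or the tools mentioned later implement training. It applies the chain rule to push the output error back through the layers, then nudges every weight against its gradient.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Small random weights: W1 is n_hidden x n_in, w2 has n_hidden entries."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Forward pass: returns the hidden activations and the scalar output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"])
    return h, y


def loss(p, x, target):
    _, y = forward(p, x)
    return 0.5 * (y - target) ** 2


def backward(p, x, target):
    """Backpropagation: apply the chain rule from the output back to the
    input layer and return dLoss/dParam for every parameter."""
    h, y = forward(p, x)
    # Output layer: dL/dz2 = (y - t) * sigmoid'(z2), where sigmoid'(z2) = y * (1 - y)
    delta2 = (y - target) * y * (1.0 - y)
    grad_w2 = [delta2 * hi for hi in h]
    # Hidden layer: each neuron's share of the error, scaled by its own sigmoid'
    delta1 = [delta2 * w * hi * (1.0 - hi) for w, hi in zip(p["w2"], h)]
    grad_W1 = [[d * xi for xi in x] for d in delta1]
    return {"W1": grad_W1, "b1": delta1, "w2": grad_w2, "b2": delta2}


def sgd_step(p, grads, lr):
    """Move every weight a small step against its gradient."""
    return {
        "W1": [[w - lr * g for w, g in zip(rw, rg)]
               for rw, rg in zip(p["W1"], grads["W1"])],
        "b1": [b - lr * g for b, g in zip(p["b1"], grads["b1"])],
        "w2": [w - lr * g for w, g in zip(p["w2"], grads["w2"])],
        "b2": p["b2"] - lr * grads["b2"],
    }


params = init_params(n_in=2, n_hidden=3)
x, t = [1.0, 0.5], 1.0
before = loss(params, x, t)
for _ in range(100):
    params = sgd_step(params, backward(params, x, t), lr=0.5)
print(f"loss before: {before:.4f}, after: {loss(params, x, t):.4f}")
```

A convenient sanity check for hand-written backpropagation is to compare `backward` against a finite-difference estimate of each gradient.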
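A back-of-the-envelope estimate gives a sense of the scale involved. All inputs below are assumptions chosen for illustration: roughly 4 billion multiply-accumulates (MACs) per forward pass for a ResNet-50-class image classifier, the roughly 1.28 million images of the ImageNet-1k training set, 90 training epochs, and the common rule of thumb that a training step (forward plus backward pass) costs about three forward passes.

```python
# Rough, illustrative assumptions (not measured figures):
MACS_PER_FORWARD_PASS = 4e9   # ~4 billion MACs per image, ResNet-50-class network
TRAINING_IMAGES = 1.28e6      # approximate size of the ImageNet-1k training set
EPOCHS = 90                   # passes over the training set (assumed schedule)
TRAINING_FACTOR = 3           # forward + backward ~ 3x the cost of a forward pass


def training_macs(macs_per_pass, images, epochs, factor=TRAINING_FACTOR):
    """Back-of-the-envelope MAC count for a complete training run."""
    return macs_per_pass * images * epochs * factor


def inference_macs(macs_per_pass, images):
    """Inference needs only one forward pass per image."""
    return macs_per_pass * images


total = training_macs(MACS_PER_FORWARD_PASS, TRAINING_IMAGES, EPOCHS)
print(f"training:  ~{total:.1e} MACs")
print(f"inference: ~{inference_macs(MACS_PER_FORWARD_PASS, 1):.1e} MACs per image")
```

Under these assumptions training costs on the order of 10^18 operations, while classifying a single image costs a few billion. That asymmetry is why training suits the cloud, while inference can run on an edge device.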
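The last step of inference for a classifier usually turns the network's raw output scores (logits) into a decision. The sketch below assumes a softmax output and a hypothetical three-class label set; the logits stand in for what a trained network might produce for one new image.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw network outputs (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float], labels: List[str],
             top_k: int = 1) -> List[Tuple[str, float]]:
    """Return the top_k (label, probability) pairs, most likely first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return ranked[:top_k]


# Hypothetical output of a trained 3-class image classifier for one new image.
labels = ["cat", "dog", "car"]
logits = [2.0, 0.5, -1.0]
print(classify(logits, labels, top_k=2))
```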
Tell me what you see
The classification of an input can take place either in the cloud or at the edge (mostly offline). While processing data through the neural network often requires a dedicated accelerator (FPGA, GPU, DSP or ASIC), additional tasks are best handled by a CPU, which can be programmed in a traditional programming language. This is where an FPGA with an integrated CPU, a so-called System-on-Chip (SoC), often excels, especially at the edge. An SoC combines the inference accelerator (the FPGA fabric) and the CPU in one chip: the CPU runs the control algorithms and manages the dataflow. At the same time, FPGAs offer many advantages over GPU- or ASIC-based solutions, among them the easy integration of multiple interfaces and sensors, and the flexibility to adapt to new neural network architectures (Figure 4).
The FPGA’s inherent reconfigurability also makes it possible to take advantage of evolving neural network topologies, new sensor types and configurations, and updated software algorithms. An SoC can deliver the low, deterministic latency needed, for example, for real-time object detection, while also being very power-efficient. The main challenge in getting the best performance out of an FPGA is to map the floating-point model efficiently onto a fixed-point FPGA implementation without a significant loss of accuracy (Figure 5). This is where the vendor tools step in.
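The division of labour described above, with the CPU managing the dataflow and an accelerator doing the inference, can be sketched as a two-stage pipeline. This is a conceptual illustration only. `accelerator_infer` is a hypothetical stand-in: a real design would hand the buffer to the FPGA through the vendor's runtime, which is not modelled here. A bounded queue decouples the stages, so the CPU-side producer blocks instead of piling up frames when the accelerator falls behind.

```python
import queue
import threading
from typing import Callable, List


def accelerator_infer(frame: List[float]) -> int:
    """Hypothetical stand-in for the FPGA inference engine:
    here it simply returns the index of the largest value."""
    return max(range(len(frame)), key=frame.__getitem__)


def preprocess(raw: List[int]) -> List[float]:
    """CPU-side work, e.g. scaling 8-bit pixel values into [0, 1]."""
    return [p / 255.0 for p in raw]


def run_pipeline(frames: List[List[int]],
                 infer: Callable[[List[float]], int]) -> List[int]:
    """CPU-managed dataflow: preprocess -> (bounded queue) -> inference -> results."""
    to_accel: "queue.Queue" = queue.Queue(maxsize=4)
    results: List[int] = []
    stop = None  # sentinel marking the end of the stream

    def producer() -> None:
        for raw in frames:
            to_accel.put(preprocess(raw))  # blocks while the queue is full
        to_accel.put(stop)

    def consumer() -> None:
        while (item := to_accel.get()) is not stop:
            results.append(infer(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


print(run_pipeline([[10, 200, 30], [255, 0, 0]], accelerator_infer))
```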
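The float-to-fixed-point mapping can be illustrated with a deliberately simple scheme: signed 8-bit integers with a power-of-two scale, chosen so that the largest weight still fits. This is a generic sketch under those assumptions, not the algorithm of any particular tool; real quantization tools do considerably more to preserve accuracy. It also shows why fixed point suits FPGA logic: the dot product runs entirely on integers, with a single rescale at the end.

```python
from typing import List

TOTAL_BITS = 8  # assumed signed 8-bit fixed-point format


def choose_frac_bits(values: List[float], total_bits: int = TOTAL_BITS) -> int:
    """Largest number of fractional bits for which max |value| still fits
    in a signed integer of total_bits (i.e. a power-of-two scale)."""
    qmax = 2 ** (total_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    frac_bits = 0
    while max_abs * 2 ** (frac_bits + 1) <= qmax:  # room for one more fractional bit
        frac_bits += 1
    while max_abs * 2 ** frac_bits > qmax:  # too large even with 0 fractional bits
        frac_bits -= 1
    return frac_bits


def quantize(values: List[float], frac_bits: int,
             total_bits: int = TOTAL_BITS) -> List[int]:
    """Float -> integer: scale by 2**frac_bits, round, clamp to the valid range.
    Note: Python's round() rounds exact halves to the nearest even integer."""
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    scale = 2 ** frac_bits
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize(q: List[int], frac_bits: int) -> List[float]:
    return [k / 2 ** frac_bits for k in q]


def fixed_point_dot(qa: List[int], fa: int, qb: List[int], fb: int) -> float:
    """Integer-only multiply-accumulate, rescaled once at the end."""
    acc = sum(a * b for a, b in zip(qa, qb))  # pure integer arithmetic
    return acc / 2 ** (fa + fb)


weights = [0.73, -1.21, 0.05, 0.98, -0.33]  # illustrative "trained" weights
f = choose_frac_bits(weights)
q = quantize(weights, f)
restored = dequantize(q, f)
print(f, q)
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

Here the weights map to 6 fractional bits. Each restored value differs from the original by at most half a step (2^-7), and that rounding error is the accuracy loss the tools try to keep small.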
Choose the right tools
Many tools available today lower the barrier to a first AI project. The Vitis AI development tools, for example, let users develop and deploy Machine Learning applications for real-time inference on FPGAs. They support many common Machine Learning frameworks such as Caffe and TensorFlow, with PyTorch support announced as coming soon at the time of writing. They enable state-of-the-art neural networks to be adapted efficiently to FPGAs for embedded artificial intelligence applications (Figure 5).
Combined with a standard System-on-Module (SOM) such as Enclustra's Mars XU3 (see lead article photo), which is based on the Xilinx Zynq UltraScale+ MPSoC and plugs into the Mars ST3 base board, these tools make it possible to implement industrial AI applications faster than ever before (Figure 6).
To showcase the power of this combination and the fast time-to-market, Enclustra has developed an AI-based image recognition system in just days. The images are captured with a standard USB camera that is connected to the Mars ST3 base board. For higher performance, a MIPI interface is also available on the base board.
The neural network, which classifies the images with low latency, runs on the Mars XU3 SOM. The system supports popular neural networks such as ResNet-50 and DenseNet, for tasks including image classification and real-time face detection.
The single FPGA module not only runs the neural network inference, but can also handle numerous other tasks in parallel, such as communicating with a host PC and other peripherals. Controlling all kinds of highly dynamic actuators at the same time is also where FPGA technology plays to its strengths. For example, adding the Enclustra Universal Drive Controller IP Core to control BLDC or stepper motors would be a snap. It has never been easier to harness the power of AI at the edge, so start your project today!
Enclustra is an innovative and successful Swiss FPGA design company. With its FPGA Design Center, Enclustra provides services covering the whole range of FPGA-based system development: from high-speed hardware or HDL firmware to embedded software, and from specification and implementation all the way to prototypes and series production.
By specializing in forward-looking FPGA technology, and with broad application knowledge, Enclustra can offer ideal solutions at minimal expense in many areas. More information can be found at: www.enclustra.com
This article is based on presentations from the seminar “Jump-start your AI based FPGA application”.