Pros at Signal Processing
Because they combine powerful signal processing with system-level integration, FPGAs now rank as a key technology for embedded system developers. FPGA vendors are keeping pace with both chip- and IP-level solutions that meet today’s system design demands.
Today’s FPGAs meet the system-oriented digital signal processing (DSP) requirements of a wide range of applications, including broadcast video, financial processing systems, machine learning, software-defined radio and many others. Meanwhile, it is a given these days that FPGAs have become complete systems-on-chips (SoCs).
While the trend toward FPGAs with embedded general-purpose CPU cores is nothing new, the latest crop of FPGA architectures has moved toward supporting artificial intelligence (AI) and machine learning workloads. Even within the past six months, FPGA vendors have announced new solutions that raise these processing capabilities. Aside from CPU and DSP processing, another big advantage of FPGAs lies in their ample, programmable, high-speed I/O, which is why they are often found close to the analog-to-digital converters (ADCs) in radio frequency (RF) and radar applications.
6 GHZ SPECTRUM SUPPORT
Exemplifying those trends, Xilinx in February announced an upgrade to its Zynq UltraScale+ RFSoC portfolio, adding greater RF performance and scalability. The new generation of these devices can cover the entire sub-6 GHz spectrum, which is a critical need for next-generation 5G deployment, says Xilinx. They support direct RF sampling with 14-bit ADCs running at up to 5 GS/s and 14-bit digital-to-analog converters (DACs) running at up to 10 GS/s, both with up to 6 GHz of analog bandwidth.
The RFSoC portfolio now includes the Xilinx Zynq UltraScale+ RFSoC Gen 2 and Zynq UltraScale+ RFSoC Gen 3. Gen 2 is now in production, meets regional deployment timelines in Asia and supports 5G New Radio. The Gen 3 device provides full sub-6 GHz direct-RF support, an extended millimeter wave interface and up to 20% power reduction in the RF data converter subsystem compared to the base portfolio (Figure 1). Gen 3 will be available in 2H 2019. Thanks to pin-compatibility across the portfolio, system developers can design and deploy their systems now using first-generation devices, with a roadmap to Gen 2 and Gen 3 for greater performance.
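At 5 GS/s the first Nyquist zone ends at 2.5 GHz, so using the full 6 GHz of analog bandwidth means sampling signals that sit in higher Nyquist zones and working with their aliases. The sketch below is ordinary sampling arithmetic, not Xilinx tooling; the function names and test frequencies are hypothetical. It reports which zone an input falls in and where it lands after sampling. In practice a band-pass or anti-alias filter has to confine the input to a single zone, or signals from different zones fold on top of each other.

```python
def nyquist_zone(f_in_hz: float, fs_hz: float) -> int:
    """Return the 1-based Nyquist zone that f_in_hz falls into at sample rate fs_hz."""
    return int(f_in_hz // (fs_hz / 2)) + 1


def alias_frequency(f_in_hz: float, fs_hz: float) -> float:
    """Frequency (0..fs/2) at which f_in_hz appears after sampling at fs_hz."""
    f = f_in_hz % fs_hz                       # fold into 0..fs
    return fs_hz - f if f > fs_hz / 2 else f  # mirror the upper half down


def is_spectrally_inverted(f_in_hz: float, fs_hz: float) -> bool:
    """Signals in even Nyquist zones appear mirrored (spectrum inverted)."""
    return nyquist_zone(f_in_hz, fs_hz) % 2 == 0


FS = 5e9  # 5 GS/s ADC sample rate
for f in (1.8e9, 3.5e9, 5.9e9):  # hypothetical RF inputs within 6 GHz
    print(f"{f / 1e9:.1f} GHz -> zone {nyquist_zone(f, FS)}, "
          f"appears at {alias_frequency(f, FS) / 1e9:.1f} GHz, "
          f"inverted={is_spectrally_inverted(f, FS)}")
```

For example, a 3.5 GHz carrier lies in the second Nyquist zone and shows up at 1.5 GHz with its spectrum inverted, which downstream digital processing must account for.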
The new products monolithically integrate higher-performance RF data converters that deliver the broad-spectrum coverage required for the deployment of 5G wireless communications systems, cable access, advanced phased-array radar solutions and additional applications including test and measurement and satellite communications. By eliminating discrete components, the devices enable up to a 50% power and footprint reduction, making them well suited for the needs of telecommunications operators seeking to enable massive multiple-input, multiple-output base stations for their 5G systems, according to Xilinx.
With an eye toward data-centric systems, Intel’s Programmable Solutions Group (PSG) in April announced Agilex, a new family of FPGAs designed to enable customized solutions for the unique challenges of embedded, network and data center markets.
The Intel Agilex family combines an FPGA fabric built on Intel’s 10 nm process with heterogeneous 3D system-in-package (SiP) technology. This makes it possible to integrate analog, memory, custom computing, custom I/O and Intel eASIC device tiles into a single package with the FPGA fabric. Intel also offers a custom logic continuum with reusable intellectual property (IP) and a migration path from FPGA to structured ASIC. Intel’s oneAPI provides a software-friendly heterogeneous programming environment, giving software developers easier access to FPGA acceleration.
According to Intel PSG, system developers need solutions that can aggregate and process increasing amounts of data traffic to enable transformative applications in emerging, data-driven industries like edge computing, networking and cloud computing. This includes edge analytics for low-latency processing, virtualized network functions to improve performance and data center acceleration for greater efficiency.
The Intel Agilex FPGA is the first FPGA to support Compute Express Link, a cache- and memory-coherent interconnect to future Intel Xeon Scalable processors. Agilex’s 2nd-gen HyperFlex architecture provides up to 40% higher performance or up to 40% lower total power compared with Intel Stratix 10 FPGAs. Agilex also supports PCI Express Gen 5, which offers higher bandwidth than PCIe Gen 4, as well as transceiver data rates of 112 Gbps. Advanced memory support includes DDR5, HBM and Intel Optane DC persistent memory.
Intel also claims that the device is the only FPGA supporting hardened BFLOAT16, with up to 40 TFLOPS of DSP performance. Each Intel Agilex DSP block can perform two FP16 floating-point operations (FLOPs) per clock cycle (Figure 2). Total FLOPs for the FP16 configuration are derived by multiplying 2 by the maximum number of DSP blocks to be offered in a single Intel Agilex FPGA and by the maximum clock frequency that will be specified for that block.
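To put the PCIe Gen 4 versus Gen 5 comparison in numbers, the sketch below converts per-lane signaling rates into link bandwidth. The rates (8, 16 and 32 GT/s for Gen 3, 4 and 5) and the 128b/130b line encoding they share come from the PCIe specifications; the x16 link width is an assumption for illustration, since lane counts for Agilex aren’t given here. The result ignores packet headers and flow-control overhead, so real payload throughput is somewhat lower.

```python
# Per-lane signaling rates in GT/s; Gen3, Gen4 and Gen5 all use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s after 128b/130b encoding.

    Ignores TLP headers, flow control and other protocol overhead.
    """
    raw_gbits_per_lane = GT_PER_S[gen]           # one bit per transfer per lane
    encoded = raw_gbits_per_lane * 128 / 130     # remove line-encoding overhead
    return encoded * lanes / 8                   # all lanes, bits -> bytes


for gen in (4, 5):
    # x16 is a hypothetical link width chosen for illustration
    print(f"PCIe Gen{gen} x16: {pcie_bandwidth_gbytes(gen, 16):.1f} GB/s per direction")
# PCIe Gen4 x16: 31.5 GB/s per direction
# PCIe Gen5 x16: 63.0 GB/s per direction
```

Because the encoding is unchanged between generations, Gen 5 simply doubles Gen 4’s bandwidth at the same lane count.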
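Intel’s peak-throughput formula is simple enough to write out: peak FP16 FLOPS = 2 × (number of DSP blocks) × (maximum clock frequency). The sketch below evaluates it and also inverts it to show how many DSP blocks a given throughput would require at a given clock. The DSP counts and clock rates used here are hypothetical round numbers, not Agilex datasheet figures.

```python
FLOPS_PER_DSP_PER_CYCLE = 2  # FP16 operations per DSP block per clock, per Intel's description


def peak_fp16_tflops(num_dsp_blocks: int, fmax_hz: float) -> float:
    """Peak FP16 throughput in TFLOPS: 2 x DSP blocks x maximum clock frequency."""
    return FLOPS_PER_DSP_PER_CYCLE * num_dsp_blocks * fmax_hz / 1e12


def dsp_blocks_needed(target_tflops: float, fmax_hz: float) -> float:
    """Invert the formula: DSP blocks required to reach target_tflops at fmax_hz."""
    return target_tflops * 1e12 / (FLOPS_PER_DSP_PER_CYCLE * fmax_hz)


# Hypothetical values for illustration only, not Agilex specifications.
print(peak_fp16_tflops(10_000, 750e6))  # 15.0
print(dsp_blocks_needed(40, 1e9))       # 20000.0
```

Because the formula uses the maximum DSP count and maximum specified clock, it gives an upper bound; designs that use fewer blocks or close timing at a lower frequency scale down proportionally.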
NEURAL NETS AND IOT
Last year Lattice Semiconductor unveiled its Lattice sensAI solution, a technology stack that combines modular hardware kits, neural network IP cores, software tools, reference designs and custom design services. In May of this year, the company followed that up with major performance and design flow enhancements for the Lattice sensAI solutions stack.
The new enhancements to the Lattice sensAI solution stack include a 10x performance boost over the previous version. This boost is driven by an updated Convolutional Neural Network (CNN) IP and neural network compiler with features like 8-bit activation quantization, smart layer merging and a dual-DSP engine. The new version expands neural network and machine learning framework support to include Keras. It also supports quantization and fraction-setting schemes for neural network training, which eliminate iterative post-processing. Simple neural network debugging can be done via USB. New customizable reference designs in Lattice sensAI accelerate time to market for popular use cases like object counting and presence detection.
Among the customers using the new version of Lattice sensAI is Pixcellence, a developer of image processing and computer vision solutions with advanced features like color night vision. According to Pixcellence, interest in the IoT is fueling demand for smart cameras that support AI applications like presence detection or facial recognition. The problem is that smart cameras have strict power consumption and cost requirements that make it a challenge to use off-the-shelf ASSPs. By using the Lattice sensAI solutions stack, Pixcellence said it was able to easily add low-power, flexible AI inference support to its existing and new camera designs.
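The article doesn’t describe Lattice’s exact quantization scheme, so the following is only a generic illustration of what 8-bit activation quantization with a chosen fraction setting can look like. Each activation is stored as a signed 8-bit fixed-point value with `frac_bits` fractional bits; the function names and sample values are hypothetical. The fraction setting is the trade-off knob: more fractional bits give finer resolution but a smaller representable range.

```python
def quantize_activations(values, frac_bits):
    """Quantize floats to signed 8-bit fixed point with `frac_bits` fractional bits.

    Each value is scaled by 2**frac_bits, rounded to the nearest integer and
    saturated to the int8 range [-128, 127] instead of wrapping around.
    """
    scale = 1 << frac_bits
    codes = []
    for v in values:
        q = round(v * scale)                  # nearest integer (Python rounds ties to even)
        codes.append(max(-128, min(127, q)))  # saturate out-of-range activations
    return codes


def dequantize(codes, frac_bits):
    """Map int8 codes back to real values for inspection."""
    return [c / (1 << frac_bits) for c in codes]


acts = [0.1, -0.75, 1.3, 3.9, -5.0]          # hypothetical activations
codes = quantize_activations(acts, frac_bits=5)  # step 1/32, range -4.0 .. 3.96875
print(codes)                  # [3, -24, 42, 125, -128]
print(dequantize(codes, 5))   # [0.09375, -0.75, 1.3125, 3.90625, -4.0]
```

Here -5.0 saturates to -4.0 because five fractional bits leave only three bits for the integer part; choosing `frac_bits=4` would cover it at half the resolution.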
INFERENCE ENGINE IC
Advances in processing performance aren’t only happening among the leading FPGA vendors. Vendors of embedded FPGAs (eFPGAs) are also innovating. An eFPGA is an IP block that allows an FPGA to be incorporated into an SoC, MCU or any kind of IC. Among these eFPGA companies are Flex Logix and Achronix Semiconductor. Flex Logix, for its part, has always been an IP company, offering both embedded FPGAs and IP such as an inferencing IP solution it announced last fall. According to Flex Logix, the reception of that inferencing IP was so good that the company decided to develop and manufacture its own edge inferencing chip.
Announced in April, the InferX X1 Edge Inference Co-Processor is optimized for what the edge needs, in particular support for large models, says Flex Logix. The chip offers throughput close to that of data center boards that sell for thousands of dollars, but at single-digit watts of power and a fraction of the price. InferX X1 is programmed using TensorFlow Lite and ONNX. The device is based on Flex Logix’s nnMAX architecture, integrating 4 tiles for 4K MACs and 8 MB of L2 SRAM. The chip connects to a single x32 LPDDR4 DRAM. Four lanes of PCIe Gen3 connect to the host processor, and a GPIO link is available for hosts without PCIe. Two X1s can work together to double throughput.
MACHINE LEARNING eFPGA
The latest offering from Achronix Semiconductor follows the trend toward machine learning (ML) and AI processing. In May, the company introduced its new Speedster7t family. Built on a new, highly optimized architecture, the family goes beyond traditional FPGA solutions, according to Achronix, by combining ASIC-like performance, FPGA adaptability and enhanced functionality that streamlines design. Designed specifically for AI/ML and high-bandwidth workloads, the Speedster7t FPGA family features a new 2D network-on-chip (NoC) and a high-density array of new machine learning processors (MLPs). Blending FPGA programmability with ASIC routing structures and compute engines, the Speedster7t family creates what Achronix dubs a new “FPGA+” class of technology.
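The article doesn’t say how traffic is routed on the Speedster7t NoC. To illustrate what a 2D network-on-chip does in general, here is a sketch of dimension-ordered (XY) routing, a common textbook scheme for mesh networks in which a packet first travels along its row and then along its column. The mesh, coordinate convention and function are hypothetical and are not Achronix’s design.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (column, row) of a router in a hypothetical 2D mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-ordered (XY) route: move along the row first, then the column.

    Returns every router visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:   # horizontal leg
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:   # vertical leg
        y += step
        path.append((x, y))
    return path


print(xy_route((0, 0), (3, 2)))
# [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
```

Fixing the order of the two legs keeps routing logic trivial in hardware and avoids cyclic channel dependencies on a mesh, which is why XY routing is a frequent starting point for NoC designs.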
In developing the Speedster7t family of FPGAs, Achronix’s engineering team redesigned the entire FPGA architecture to balance on-chip processing, interconnect and external I/O, to maximize the throughput of data-intensive workloads such as those found in edge- and server-based AI/ML applications, networking and storage.
Speedster7t devices are designed to accept massive amounts of data from multiple high-speed sources, distribute that data to programmable on-chip algorithmic and processing units and then deliver those results with the lowest possible latency. Speedster7t devices include high-bandwidth GDDR6 interfaces, 400G Ethernet ports, and PCI Express Gen5. These are all interconnected to deliver ASIC-level bandwidth while retaining the full programmability of FPGAs.
At the heart of Speedster7t FPGAs is a massively parallel array of programmable compute elements within the new MLPs that deliver high FPGA-based compute density (Figure 3). The MLPs are highly configurable, compute-intensive blocks that support integer formats from 4 to 24 bits and efficient floating-point modes, including direct support for TensorFlow’s 16-bit format (bfloat16) as well as a supercharged block floating-point format that doubles the compute engines per MLP.
The MLPs are tightly coupled with embedded memory blocks, eliminating the traditional delays associated with FPGA routing and ensuring that data is delivered to the MLPs at their maximum operating frequency of 750 MHz. This combination of high-density compute and high-performance data delivery results in a processor fabric that delivers the highest usable FPGA-based tera-operations per second (TOPS).
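Achronix doesn’t detail its block floating-point format here, so the sketch below shows the general idea: a block of values shares one exponent, and each element keeps only a small signed integer mantissa. A dot product between two blocks then needs only integer multiply-accumulates plus a single exponent adjustment, which is the general reason block floating point is cheap in hardware. The 8-bit mantissa width, function names and sample data are hypothetical.

```python
import math
from typing import List, Tuple


def to_block_fp(values: List[float], mantissa_bits: int = 8) -> Tuple[int, List[int]]:
    """Encode a block of floats as one shared exponent plus signed integer mantissas.

    The exponent is chosen from the largest magnitude so that every mantissa
    fits in `mantissa_bits` signed bits.
    """
    max_mag = max(abs(v) for v in values)
    if max_mag == 0:
        return 0, [0] * len(values)
    limit = (1 << (mantissa_bits - 1)) - 1           # 127 for 8-bit mantissas
    exp = math.ceil(math.log2(max_mag / limit))      # smallest exponent that fits max_mag
    scale = 2.0 ** exp
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return exp, mantissas


def bfp_dot(exp_a: int, man_a: List[int], exp_b: int, man_b: List[int]) -> float:
    """Dot product of two blocks: integer MACs, then one shared-exponent adjustment."""
    acc = sum(x * y for x, y in zip(man_a, man_b))   # pure integer multiply-accumulate
    return acc * 2.0 ** (exp_a + exp_b)


ea, ma = to_block_fp([0.5, -1.25, 3.0, 0.01])
eb, mb = to_block_fp([1.0, 2.0, 0.5, 4.0])
print(ea, ma)                   # -5 [16, -40, 96, 0]
print(bfp_dot(ea, ma, eb, mb))  # -0.5 (the float reference is about -0.46)
```

The example also shows the format’s trade-off: 0.01 is much smaller than the block’s largest value, so its mantissa rounds to zero and the dot product loses that contribution.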
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JULY 2019 #348