Mouser Provides Microsemi PolarFire FPGA Evaluation Kit

Mouser Electronics is now offering the PolarFire Evaluation Kit from Microsemi, which allows designers to evaluate the highly regarded PolarFire FPGA product family. The flash-based PolarFire FPGAs deliver 100K to 500K logic elements at up to 50 percent lower power consumption than equivalent SRAM-based FPGAs, as well as best-in-class security and reliability.

The Microsemi PolarFire Evaluation Kit, available to order from Mouser Electronics, provides a robust hardware design platform based on a 300K logic element PolarFire FPGA with DDR4, DDR3 and SPI-flash memory. The onboard FPGA integrates reliable non-volatile FPGA fabric, 12.7 Gbps transceivers, 1.6 Gbps inputs and outputs (I/Os), best-in-class-performance, hardened security IP, and crypto processors. The silicon features power optimization with the lowest static power for mid-range FPGAs, while the Flash Freeze mode yields best-in-class standby power.

The evaluation kit includes SMA connectors for testing the transceiver channel, high pin count FPGA mezzanine card, x4 PCIe edge connector, dual Gigabit Ethernet connectors, and programming using an on-board embedded FlashPro5 programmer. The kit provides high-performance evaluation for a variety of applications, such as industrial automation, cellular infrastructure, security, imaging and video, and USB.

The kit also ships with a one-year Libero Gold Software License, which includes the Libero SoC PolarFire Design Suite of comprehensive, easy-to-learn, easy-to-adopt development tools. The suite integrates industry-standard Synopsys Synplify Pro synthesis and Mentor Graphics ModelSim simulation with best-in-class constraints management and debug capabilities.

Mouser Electronics | www.mouser.com

Xilinx Provides Design Platform for Scalable Storage

At the Flash Memory Summit earlier this month in Santa Clara, CA, leading FPGA vendor Xilinx rolled out the Xilinx NVMe-over-Fabrics reference design. It provides designers a flexible platform to enable scalable storage solutions and integrate custom acceleration functions into their storage arrays. The reference design eliminates the need for a dedicated x86 processor or an external NIC, thus creating a highly integrated, reliable and cost-effective solution. The NVMe-over-Fabrics (NVMe-oF) reference platform is implemented on the Fidus Sidewinder card, which supports up to four NVMe SSDs and has a Xilinx ZU19EG UltraScale+ MPSoC device. The reference platform is delivered with the required software drivers.

The Xilinx NVMe-over-Fabric Platform is a single-chip storage solution that integrates NVMe-over-Fabric and target RDMA offloads with a processing subsystem to provide a very power-efficient and low-latency solution compared to existing products that require both an external host chip and a Network Interface Card (NIC). This 2x100Gb Ethernet platform enables customers to implement value-added storage workload acceleration, such as compression and erasure code.

Xilinx | www.xilinx.com

Kintex UltraScale FPGA-Based Cards Target Radar, Comms

Pentek has introduced the newest member of the Jade family of high-performance data converter XMC modules based on the Xilinx Kintex UltraScale FPGA. The Model 71141 is a 6.4 GHz dual channel analog-to-digital and digital-to-analog converter with programmable DDCs (digital downconverters) and DUCs (digital upconverters). The Model 71141 is suitable for connection to IF or RF signals for very wideband communications or radar system applications including:

  • Satellite communications (SATCOM)
  • Phased array radar, SIGINT and ELINT
  • Synthetic aperture radar (SAR)
  • Time-of-flight and LIDAR distance measurement
  • RF sampling software defined radio (SDR)

For applications that require unique functions, users can install custom IP for specialized data processing tasks. Pentek’s Navigator FPGA Design Kit includes source code for all factory-installed IP modules. Developers can integrate their own IP with the Pentek functions or use the Navigator kit to completely replace the Pentek IP with their own.

The Pentek Navigator tools reduce the development time and cost associated with complex designs. Users can also select the size of the FPGA they would like installed so they are getting exactly what they need performance-wise without paying for a larger FPGA they may not need. Unlike others in the industry, Pentek still provides application support to customers at no cost.

The Model 71141 is the first of the Pentek Jade products to use the Texas Instruments ADC12DJ3200 12-bit A/D. The front end accepts analog RF inputs on a pair of front panel SSMC connectors. The converter operates in single-channel interleaved mode with a sampling rate of 6.4 GS/sec and an input bandwidth of 7.9 GHz; or, in dual-channel mode with a sampling rate of 3.2 GS/sec and input bandwidth of 8.1 GHz.

The A/D built-in digital down converters support 2x decimation in real output mode and 4x, 8x or 16x decimation in complex output mode. The A/D digital outputs are delivered into the FPGA for signal processing, data capture or for routing to other module resources.
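The sample rates and decimation factors above translate into data rates that are easy to check by hand. The sketch below is illustrative arithmetic only: the function names are hypothetical (not a Pentek or TI API), and it assumes 12-bit samples packed with no padding, which may not match how the module actually frames data.

```python
# Illustrative arithmetic only; names are hypothetical, not a Pentek or TI API.

ADC_BITS = 12  # ADC12DJ3200 resolution quoted in the article


def raw_rate_gbytes(sample_rate_gsps: float, bits: int = ADC_BITS) -> float:
    """Raw payload rate in GB/s, assuming tightly packed samples."""
    return sample_rate_gsps * bits / 8.0


def decimated_rate_gsps(sample_rate_gsps: float, decimation: int) -> float:
    """Output sample rate after decimating by an integer factor."""
    if decimation < 1:
        raise ValueError("decimation must be >= 1")
    return sample_rate_gsps / decimation


if __name__ == "__main__":
    for mode, fs in (("single-channel", 6.4), ("dual-channel, per channel", 3.2)):
        print(f"{mode}: {raw_rate_gbytes(fs):.1f} GB/s raw")
        # 2x applies to real output mode; 4x, 8x and 16x to complex output mode.
        for d in (2, 4, 8, 16):
            print(f"  decimate by {d}: {decimated_rate_gsps(fs, d):.2f} GS/s")
```

Under these assumptions the undecimated single-channel stream comes to roughly 9.6 GB/s, which suggests why on-chip decimation and on-board capture memory matter before data leaves the module.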

A Texas Instruments DAC38RF82 D/A with DUC accepts a baseband real or complex data stream from the FPGA and provides that input to the upconversion, interpolation and dual D/A stages. When operating as a DUC, it interpolates and translates real or complex baseband input signals. It delivers real or quadrature (I+Q) analog outputs to the dual 14-bit D/A converter. The two 6.4 GS/sec 14-bit D/As pair well with the dual input channels while delivering more than twice the output performance of previous generations of Pentek products.

The 71141 factory-installed functions include two A/D acquisition and two D/A waveform generation IP modules. In addition, IP modules for DDR4 SDRAM memories, a controller for all data clocking and synchronization functions, a test signal generator and a PCIe Gen.3 interface complete the factory-installed functions. System integrators get to market with less time and risk, because the 71141 delivers a complete turnkey solution without the need to develop any FPGA IP.

The Pentek Jade Architecture is based on the Xilinx Kintex UltraScale FPGA, which raises the digital signal processing (DSP) performance by over 50% with equally impressive reductions in cost, power dissipation and weight. As the central feature of the Jade Architecture, the FPGA has access to all data and control paths, enabling factory-installed functions including data multiplexing, channel selection, data packing, gating, triggering and memory control. A 5 GB bank of DDR4 SDRAM is available to the FPGA for custom applications. The x8 PCIe Gen 3 link can sustain 6.4 GB/s data transfers to system memory. Eight additional gigabit serial lanes and LVDS general-purpose I/O lines are available for custom solutions.
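To put the 6.4 GB/s figure in context, it helps to compare it with the nominal rate of a Gen 3 x8 link. The sketch below uses the standard PCIe Gen 3 parameters (8 GT/s per lane, 128b/130b encoding); the function names are hypothetical, and the calculation ignores packet and protocol overhead, so it is an upper bound rather than a prediction.

```python
GEN3_GT_PER_S = 8.0        # PCIe Gen 3 signalling rate per lane, in GT/s
GEN3_ENCODING = 128 / 130  # 128b/130b line-code efficiency


def pcie_gen3_raw_gbytes(lanes: int) -> float:
    """Per-direction link rate in GB/s after line encoding, before packet overhead."""
    return GEN3_GT_PER_S * GEN3_ENCODING * lanes / 8.0


def link_efficiency(sustained_gbytes: float, lanes: int) -> float:
    """Fraction of the encoded link rate that a sustained transfer rate represents."""
    return sustained_gbytes / pcie_gen3_raw_gbytes(lanes)


if __name__ == "__main__":
    raw = pcie_gen3_raw_gbytes(8)
    print(f"Gen 3 x8 encoded rate: {raw:.2f} GB/s")
    print(f"6.4 GB/s sustained is {link_efficiency(6.4, 8):.1%} of that")
```

By this estimate, 6.4 GB/s is a little over 80 percent of the encoded x8 link rate, a plausible sustained figure once packet headers and flow control are accounted for.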

The Model 71141 XMC module is designed to operate with a wide range of carrier boards in PCIe, 3U and 6U VPX, AMC, and 3U and 6U CompactPCI form factors, with versions for both commercial and rugged environments. Designed for air-cooled, conduction-cooled and rugged operating environments, the Model 71141 XMC module with 5 GB of DDR4 SDRAM starts at $18,795. Additional FPGA options are available. The Navigator Design Suite consists of two packages. The Navigator BSP is $2,500 and the Navigator FDK is $3,500.

Pentek | www.pentek.com

Reliability and Failure Prediction: A New Take

HALT methodology has been a popular way to test harsh environment reliability. A new approach involves PCB design simulation for vibration and acceleration for deeper yet faster analyses.

By Craig Armenti & Dave Wiens—Mentor Board Systems Division

Many electronic products today are required to operate under significant environmental stress for countless hours. The need to design a reliable product is not a new concept; however, the days of depending on a product’s “made in” label as an indicator of reliability are long gone. PCB designers now realize the importance of capturing the physical constraints and fatigue issues for a design prior to manufacturing to reduce board failure and improve product quality.

Simulation results should be available in a two-phase post-processor for each simulation, providing broad input on the PCB’s behavior under the defined conditions.

Every product is expected to fail at some point. That’s inevitable. But premature failures can be mitigated through proper design when attention is paid to potential issues due to vibration and acceleration. ….

Read this article in the August 2017 issue (325) of Circuit Cellar.

Power Analysis of a Software DES Encryption Routine

This article continues the foray into breaking software security routines, now targeting a software implementation of DES. This builds on a previous example of breaking a hardware AES implementation.

By Colin O’Flynn

In the previous column, I broke a simple XOR password check using side-channel power analysis. How can we apply this to more complex algorithms though? In my Circuit Cellar 313 (August 2016) story, I demonstrated how to break the AES encryption standard running on an FPGA.

The EFF’s “Deep Crack” board could brute force a DES key in a matter of days. (Photo courtesy of Electronic Frontier Foundation)

While I originally considered breaking a software implementation of AES in this column, there was just too much overlap between those columns. So instead I decided to pick on something new. This time, I’ll cover how we can break a software implementation of DES. The actual process ends up being very similar. But by using a different algorithm, it might help give you a bit of perspective on how the underlying attack works. ….

Read this article in the August 2017 issue (325) of Circuit Cellar.

The Most Technical

Input Voltage

–Jeff Child, Editor-in-Chief

It is truly a thrill and an honor for me to be joining the Circuit Cellar team as the magazine’s new Editor-in-Chief. And in this, my first editorial in my new role, I want to seize the opportunity to talk about Circuit Cellar. A lot of factors attracted me to this publication. But in a nutshell its position in the marketplace is compelling. It intersects with two converging trends happening in technology today.

First, there’s the phenomenon of the rich set of tools, chips, and information resources available today. They put more power into the hands of makers and electronics DIY experts than ever before. You’ve got hardware such as Arduino and Raspberry Pi. Open source software ranging from Linux to Eclipse makes integrating and developing software easier than ever. And porting back and forth between open source software and commercial embedded software is no longer prohibitive now that commercial software vendors are in a “join them, not beat them” phase of their thinking. Easy access has even reached processors thanks to the emergence of RISC-V, for example. Meanwhile, powerful FPGA chips enable developers to use one chip where an entire board or box was previously required.

The second big trend is how system-level chip technologies, like SoC-style processors and the FPGAs I just mentioned, are enabling some of the most game-changing applications driving today’s markets, including commercial drones, driverless cars, Internet-of-Things (IoT), robotics, mobile devices and more. This means that exciting and interesting new markets are attracting not just big corporations looking for high-volume plays, but also small start-up vendors looking to find their own niche within those market areas. And there are a lot of compelling opportunities in those spaces. Ideas that start as small embedded systems projects can blossom, and are blossoming, into lucrative new enterprises.

What’s so exciting is that Circuit Cellar readers are at the center of both of those trends. There’s a particular character this magazine has that separates it from other technology magazines. There are a variety of long-established publications that cover electronics and whose stated missions are to serve engineers. I’ve worked for some of them, and they all have their strengths. But you can tell just by looking at the features and columns of Circuit Cellar that we don’t hold back or curtail our stories when it comes to technical depth. We get right down to the bits and bytes and lines of code. Our readers are engineers and academics who want to know not only the rich details of a microcontroller’s on-board peripherals, but also how other like-minded geeks applied that technology to their DIY or commercial project. They want to know if the DC-DC converter they are considering has a wide enough input voltage to serve their needs.

Another cool thing for me about Circuit Cellar is the magazine’s origin story. Back when I was in high school and in my early days studying Computer Science in college, Steve Ciarcia had a popular column called Circuit Cellar in BYTE magazine. I was a huge fan of BYTE. I would take my issue and bring it to a coffee shop and read it intently. (Mind you this was pre-Internet. Coffee shops didn’t have Wi-Fi.) What I appreciated most about BYTE was that it had far more technical depth than the likes of PC World and PC Computing. I felt like it was aimed at a person with a technical bent like myself. When Steve later went on to found this magazine—nearly 30 years ago—he gave it the Circuit Cellar name but he also maintained that unique level of technical depth that entices engineers.

With all that in mind, I plan to uphold the stature and legacy in the electronics industry that I and all of you have long admired about Circuit Cellar. We will work to continue being the Most Technical information resource for professional engineers, academics, and other electronics specialists world-wide. Meanwhile, you can look forward to expanded coverage of those exciting market-spaces I discussed earlier. Those new applications really exemplify how embedded computing technology is changing the world. Let’s have some fun.

TeraFire Hard Cryptographic Microprocessor

Microsemi Corp. recently added Athena’s TeraFire cryptographic microprocessor to its new PolarFire field programmable gate array (FPGA) “S class” family. The TeraFire hard core provides Microsemi customers access to advanced security capabilities with high performance and low power consumption.

Features, benefits, and specs:

  • Supports additional algorithms and key sizes commonly used in commercial Internet communications protocols such as TLS, IPSec, MACSec and KeySec.
  • The Athena TeraFire EXP-5200B DPA-resistant cryptographic microprocessor is capable of nearly 200 MHz operation.
  • Enables high-speed DPA-resistant cryptographic protocols at speeds well over 100 Mbps
  • Integrated true random number generator for generating keys on-chip and for protecting cryptographic protocols
  • The TeraFire crypto microprocessor is extensible with additional object code licensed from Athena or with accelerators attached via the PolarFire FPGA fabric

Microsemi’s PolarFire “S class” FPGAs with Athena’s TeraFire cryptographic microprocessor will be available in Q2 2017. A soft version of the core is available for Microsemi’s SmartFusion2 SoC FPGAs.

Microsemi | www.microsemi.com

The Future of Embedded FPGAs

The embedded FPGA is not new, but only recently has it started becoming a mainstream solution for designing chips, SoCs, and MCUs. A key driver is today’s high mask costs for advanced ICs. For a chip company designing in advanced nodes, a change in RTL could cost millions of dollars and set the design schedule back by months. Another driver is constantly changing standards. The embedded FPGA is so compelling because it provides designers with the flexibility to update RTL at any time after fabrication, even in-system. Chip designers, management, and even the CFO like it.

Given these benefits, the embedded FPGA is here to stay. However, like any technology, it will evolve to become better and more widespread. Looking back to the 1990s when ARM and others offered embedded processor IP, the technology evolved to where embedded processors appear widely on most logic chips today. This same trend will happen with embedded FPGAs. In the last few years, the number of embedded FPGA suppliers has increased dramatically: Achronix, Adicsys, Efinix, Flex Logix, Menta, NanoXplore, and QuickLogic. The first sign of market adoption was DARPA’s agreement with Flex Logix to provide TSMC 16FFC embedded FPGA for a wide range of US government applications. This first customer was critical as it validated the technology and paved the way for others to adopt.

There are a number of things driving the adoption of the embedded FPGA:

  • Mask costs are increasing rapidly: approximately $1 million for 40 nm, $2 million for 28 nm, and $4 million for 16 nm.
  • The size of the design teams required to design at advanced nodes is increasing. Fewer chips are being designed, but customers want the same functions as in the past.
  • Standards are constantly changing.
  • Data centers require programmable protocols.
  • AI and machine learning algorithms

Surprisingly, embedded FPGAs don’t compete with FPGA chips. FPGA chips are used for rapid prototyping and lower-volume products that can’t justify the increasing cost of ASIC development. When systems with FPGAs hit high volume, FPGAs are generally converted to ASICs for cost reduction.

In contrast, embedded FPGAs don’t use external FPGAs and they can do things external FPGAs can’t, such as:

  • They are lower power because SERDES aren’t needed. Standard CMOS interfaces can run at 1 GHz+ in 16 nm for embedded FPGA, with hundreds to thousands of interconnects available.
  • Embedded FPGA is lower cost per LUT. There is no expensive packaging, and the roughly one-third of an FPGA chip’s die area taken up by SERDES, PLLs, DDR PHYs, etc. is no longer needed.
  • 1-GHz operations in the control path
  • Embedded FPGAs can be optimized: lots of MACs (Multiplier-Accumulators) for DSP or none; exactly the kind of RAM needed or none.
  • Tiny embedded FPGAs of just 100 LUTs up to very large embedded FPGAs of greater than 100K LUTs
  • Embedded FPGAs can be optimized for very low power operation or very high performance.

The following markets are likely to see widespread utilization of embedded FPGAs: the Internet of Things (IoT); MCUs and customizable programmable blocks on the processor bus; defense electronics; networking chips; reconfigurable wireless base stations; flexible, reconfigurable ASICs and SoCs; and AI and deep learning accelerators.

To integrate embedded FPGAs, chip designers need them to have the following characteristics: silicon-proven IP; density in LUTs per square millimeter similar to FPGA chips; a wide range of array sizes from hundreds of LUTs to hundreds of thousands of LUTs; options for a lot of DSP support and the kind of RAM a customer needs; IP proven in the process node a company wants with support of their chosen VT options and metal stack; an IP implementation optimized for power or performance; and proven software tools.

Over time, embedded FPGA IP will be available on every significant foundry from 180 to 7 nm supporting a wide range of applications. This means embedded FPGA suppliers must be capable of cost-effectively “porting” their architecture to new process nodes in a short time (around six months). This is especially true because process nodes keep getting updated over time and each major step requires an IP redesign.

Early adopters of embedded FPGA will have chips with wider market potential, longer life, and higher ROI, giving designers a competitive edge over late adopters. Similar benefits will accrue to systems designers. Clearly, this technology is changing the way chips are designed, and companies will soon learn that they can’t afford to “not” adopt embedded FPGA.

This article appears in Circuit Cellar 323.

Geoff Tate is CEO/Cofounder of Flex Logix Technologies. He earned a BSc in Computer Science from the University of Alberta and an MBA from Harvard University. Prior to cofounding Rambus in 1990, Geoff served as Senior Vice President of Microprocessors and Logic at AMD.

New Cyclone 10 FPGA Family

Intel recently launched the Intel Cyclone 10 family of FPGAs. Well suited for IoT applications, the new FPGAs are designed to deliver fast and power-efficient processing. They can collect and send data, and make real-time decisions based on the input from IoT devices. You can program the FPGAs to deliver the specific level of computing and functions required by different IoT applications.

Cyclone 10 GX supports 10G transceivers and hard floating-point digital signal processing (DSP). Furthermore, it offers 2× the performance of the previous Cyclone generation. The architectural innovation in the implementation of IEEE 754 single-precision hardened floating-point DSP blocks can enable processing rates up to 134 giga floating-point operations per second (GFLOPS) for applications such as motion or motor control systems.

The Intel Cyclone 10 LP is the perfect solution for applications where cost and power are key factors in the design decision. These systems typically use FPGA densities below 75K LEs and implement chip-to-chip bridging functions between electronic components or I/O expansion for microprocessors. Cyclone 10 LP can also be used for automotive video processing in rear-view cameras and in sensor fusion, where data gathered while the car is on the road is combined from multiple sensors in the car to provide a more complete view of what is happening.

The Cyclone 10 FPGA family will be available in the second half of 2017, along with evaluation kits, boards, and the latest version of Intel’s Quartus FPGA programming software.

Source: Intel

New Embedded Solution for Debugging FPGAs

Exostiv Labs recently announced that its EXOSTIV solution for Intel FPGAs will be available in December 2016. Providing up to 200,000 times more visibility on an FPGA than other solutions, EXOSTIV enables the debugging and verification of FPGA board prototypes at operating speed. It provides extended visibility on internal nodes over long periods of time with minimal impact on the FPGA resources. Thus, you can discover issues related to complex interactions between numerous IPs when simulation is impractical.

EXOSTIV for Intel FPGAs will be released in December 2016 with support for Arria 10 devices first. Pricing starts at $5,100.

Source: Exostiv Labs 

Low Latency 48-Port FPGA Networking Appliance

BittWare and LDA Technologies are collaborating on a low-latency 48-port FPGA networking appliance. The LDA e4 is a 10/25-Gbps-capable FPGA board enclosure that repurposes the serial links on BittWare’s PCIe FPGA boards into high-speed Ethernet ports.

Features, benefits, and specs:

  • 6″ FPGA-to-port trace lengths
  • Layer 1 replication, support for various CPUs and operating systems
  • A high-accuracy clock source enables accurate timestamping
  • Enables out-of-band management and a zero configuration option

Source: BittWare

FPGA Board Support Packages Simplify App Dev

BittWare recently announced the availability of Arria 10 FPGA Board Support Packages (BSPs) for Altera’s OpenCL SDK 16.0.2. With BittWare’s OpenCL BSPs, you can start developing applications for Altera’s Arria 10 1150GX FPGA using OpenCL.

Using OpenCL, you can code your systems and algorithms in a high-level C-based framework and directly create FPGA programming files from a pure software development flow. The applications are endless, from use in data centers to defense/aerospace systems.

BittWare’s Arria 10 BSPs are well suited for acceleration applications such as machine learning. The High Performance Computing (HPC) BSP is the traditional OpenCL model, using a host that moves data to the accelerator system over PCI Express (PCIe). The BSP platform is the standard platform for OpenCL accelerators. In addition, BittWare can provide custom BSPs specifically tailored to your requirements.

BittWare offers an OpenCL Developer’s Bundle comprising a low-profile Arria 10 1150GX FPGA-based PCIe board, BittWorks Lite II software tools, Altera’s OpenCL SDK, and Altera’s Quartus II. You can also get the Developer’s Bundle with a Stratix V board.

The Arria 10 OpenCL Bundle and BSP are currently available. Contact BittWare for pricing.

Source: BittWare

Software-Programmable FPGAs

Modern workloads demand higher computational capabilities at low power consumption and cost. As traditional multi-core machines do not meet the growing computing requirements, architects are exploring alternative approaches. One solution is hardware specialization in the form of application specific integrated circuits (ASICs) to perform tasks at higher performance and lower power than software implementations. The cost of developing custom ASICs, however, remains high. Reconfigurable computing fabrics, such as field-programmable gate arrays (FPGAs), offer a promising alternative to custom ASICs. FPGAs couple the benefits of hardware acceleration with flexibility and lower cost.

FPGA-based reconfigurable computing has recently taken the spotlight in academia and industry as evidenced by Intel’s high-profile acquisition of Altera and Microsoft’s recent announcement to deploy thousands of FPGAs to speed up Bing search. In the coming years, we should expect to see hardware/software co-designed systems supported by reconfigurable computing to become common. Conventional RTL design methodologies, however, cannot productively manage the growing complexity of algorithms we wish to accelerate using FPGAs. Consequently, FPGA programmability is a major challenge that must be addressed both technologically by leveraging high-level software abstractions (e.g., language and compilers), run-time analysis tools, and readily available libraries and benchmarks, as well as scholastically through the education of rising hardware/software engineers.

Recent efforts related to software-programmable FPGAs have focused on designing high-level synthesis (HLS) compilers. Inspired by classical C-to-gates tools, HLS compilers automatically transform programs written in traditional untimed software languages to timed hardware descriptions. State-of-the-art HLS tools include Xilinx’s Vivado HLS (C/C++) and SDAccel (OpenCL) as well as Altera’s OpenCL SDK. Although HLS is effective at translating C/C++ or OpenCL programs to RTL hardware, compilers are only a part of the story in realizing truly software-programmable FPGAs.

 
Efficient memory management is central to software development. Unfortunately, unlike traditional software programming, current FPGA design flows require application-specific memories to sustain high-performance hardware accelerators. Features such as dynamic memory allocation, pointer chasing, complex data structures, and irregular memory access patterns are also ill-supported by FPGAs. In lieu of basic software memory abstraction techniques, experts must design custom hardware memories. More extensible software memory abstractions would instead facilitate software-programmability of FPGAs.

In addition to high-level programming and memory abstractions, run-time analysis tools such as debuggers and profilers are essential to software programming. Hardware debuggers and profilers in the form of hardware/software co-simulation tools, however, are not ready for tackling exascale systems. In fact, one of the biggest barriers to realizing software-programmable FPGAs is the hours, even days, it takes to generate bitstreams and run hardware/software co-simulators. Lengthy compilation and simulation times cause debugging and profiling to consume the majority of FPGA development cycles and deter agile software development practices. The effect is compounded when FPGAs are integrated into heterogeneous systems with CPUs and GPUs over complex memory hierarchies. New tools, following architectural simulators, may aid in rapidly gathering performance, power, and area utilization statistics for FPGAs in heterogeneous systems. Another solution to long compilation and simulation times is using overlay architectures. Overlay architectures mask the FPGA’s bit-level configurability with a fixed network of simple processing nodes. The fixed hardware in overlay architectures enables faster programmability at the expense of the finer-grained, bit-level parallelism of FPGAs.

Another key facet of software programming is readily available libraries and benchmarks. Current FPGA development is marred by vendor-specific IP cores that span limited domains. As FPGAs become more software-programmable, we should expect to see more domain experts providing vendor-agnostic FPGA-based libraries and benchmarks. Realistic, representative, and reproducible vendor-agnostic libraries and benchmarks will not only make FPGA development more accessible but also serve as reference solutions for developers.

Finally, the future of software-programmable FPGAs lies not only in technological advancements but also in educating the next generation of hardware/software co-designing engineers. Software engineers are rarely concerned with the downstream architecture except when exercising expert optimizations. Higher-level abstractions and run-time analysis tools will improve FPGA programmability but developers will still need a working knowledge of FPGAs to design competitive hardware accelerators. Following reference libraries and benchmarks, software engineers must become fluent with the notions of pipelining, unrolling, and partitioning memory into local SRAM blocks and hardened IPs. Terms like throughput, latency, area utilization, power and cycle time will enter software engineering vernacular.
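For readers coming from software, the relationship between pipelining, unrolling, latency and throughput can be made concrete with a first-order cycle count. The sketch below is a simplified textbook model, not the output of any HLS tool: the function names and the example loop are hypothetical, and it assumes that unrolling leaves the pipeline depth unchanged and that memories are partitioned so every unrolled copy can be fed in parallel.

```python
import math


def sequential_cycles(n_iters: int, depth: int) -> int:
    """Cycles when each iteration must finish before the next one starts."""
    return n_iters * depth


def pipelined_cycles(n_iters: int, depth: int, ii: int = 1) -> int:
    """Cycles for a pipelined loop: fill the pipe once, then start one iteration every ii cycles."""
    if n_iters <= 0:
        return 0
    return depth + (n_iters - 1) * ii


def unrolled_pipelined_cycles(n_iters: int, depth: int, unroll: int, ii: int = 1) -> int:
    """Pipelined loop whose body is replicated `unroll` times.

    Assumes memories are partitioned so all copies can be fed in parallel.
    """
    trips = math.ceil(n_iters / unroll)
    return pipelined_cycles(trips, depth, ii)


if __name__ == "__main__":
    n, depth, cycle_ns = 1024, 5, 4.0  # hypothetical loop and a 250 MHz clock
    for label, cycles in (
        ("sequential", sequential_cycles(n, depth)),
        ("pipelined, II=1", pipelined_cycles(n, depth)),
        ("pipelined, unrolled x4", unrolled_pipelined_cycles(n, depth, 4)),
    ):
        print(f"{label}: {cycles} cycles, {cycles * cycle_ns / 1000:.2f} us")
```

In this model, the latency of a single iteration stays at five cycles throughout. What changes is throughput, which is why the two terms need to be kept distinct when reasoning about accelerators.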

Recent advances in HLS compilers have demonstrated the feasibility of software-programmable FPGAs. Now, a combination of higher-level abstractions, run-time analysis tools, libraries and benchmarks must be pioneered alongside trained hardware/software co-designing engineers to realize a cohesive software engineering infrastructure for FPGAs.
 

Udit Gupta earned a BS in Electrical and Computer Engineering at Cornell University. He is currently studying toward a PhD in Computer Science at Harvard University. Udit’s past research includes exploring software-programmable FPGAs by leveraging intelligent design automation tools and evaluating high-level synthesis compilers with realistic benchmarks. He is especially interested in vertically integrated systems, exploring the computing stack from applications, tools, languages, and compilers to downstream architectures.

New FPGA Board Based on the Xilinx UltraScale VU190 Device

BittWare recently released a new COTS PCIe board based on Xilinx’s 20-nm UltraScale VU190 FPGA. The XUSP3R is a 3/4-length PCIe board that offers up to four Gen3 x8 PCIe interfaces, along with four front panel QSFP28 cages, supporting 16 lanes of 25 Gbps or 4 lanes of 100 Gbps, including 100 GbE. Four DIMM sockets support massive memory configurations including up to 256 GB of DDR4 memory across four 72-bit wide banks.

Alternatively, each DIMM socket can be populated with BittWare’s dual bank QDR DIMMs, each providing 576 Mb of QDR-II+. An optional Hybrid Memory Cube (HMC) module with up to 4 GB is also available that can be populated in addition to, and independent of, the DIMMs. Together, these features make the XUSP3R well suited for a variety of data center and networking applications, including compute acceleration, network processing, cybersecurity, and storage.

The board also offers features and tools for simplified development and integration. A comprehensive Board Management Controller (BMC) with host software support for advanced system monitoring simplifies platform management. A complete software tool suite and FPGA development/project examples are also available.

The XUSP3R’s features and specs:

  • High-performance Xilinx Virtex UltraScale 190/160/125
  • Up to four independent PCIe Gen3 x8 interfaces
  • Four QSFP28 cages for 4x 100GbE, 16x 25GbE, 4x 40GbE, or 16x 10GbE (or combinations thereof)
  • Four DIMM sites that support DDR4-2133 SDRAM, QDR-IV, and QDR-II+
  • Optional HMC Module (in addition to, and independent of, the DIMM sites)
  • Board Management Controller for Intelligent Platform Management
  • USB 2.0 for programming, debug, or control with optional integrated Platform Cable USB functionality
  • Timestamping and synchronization support
  • Complete software support with BittWare’s BittWorks II Toolkit
  • FPGA development kit for FPGA board support IP and integration
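The headline figures above can be combined into rough aggregate bandwidth numbers. The following sketch is back-of-the-envelope arithmetic with hypothetical function names. It assumes four lanes per QSFP28 cage, and that each 72-bit DDR4 bank carries 64 data bits plus 8 ECC bits, which is the usual arrangement but is not stated in the announcement. The results are theoretical peaks, not measured throughput.

```python
def qsfp28_total_gbps(cages: int = 4, lanes_per_cage: int = 4, lane_gbps: float = 25.0) -> float:
    """Aggregate front-panel line rate in Gb/s (16 lanes of 25 Gb/s by default)."""
    return cages * lanes_per_cage * lane_gbps


def ddr4_peak_gbytes(mt_per_s: int = 2133, data_bits: int = 64, banks: int = 4) -> float:
    """Theoretical peak DDR4 bandwidth in GB/s across all banks.

    Assumes 64 of the 72 bits per bank are data and the rest are ECC.
    """
    return mt_per_s * 1e6 * (data_bits / 8) * banks / 1e9


if __name__ == "__main__":
    net = qsfp28_total_gbps()
    print(f"Front panel: {net:.0f} Gb/s ({net / 8:.0f} GB/s)")
    print(f"DDR4-2133 x4 banks: {ddr4_peak_gbytes():.1f} GB/s peak")
```

On these assumptions, the four DDR4 banks offer somewhat more peak bandwidth than the front-panel ports can deliver at full line rate.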

The XUSP3R board is in production and shipping now. Contact BittWare for more details and pricing.

Source: BittWare