Software-Programmable FPGAs

Modern workloads demand higher computational capabilities at low power consumption and cost. As traditional multi-core machines do not meet the growing computing requirements, architects are exploring alternative approaches. One solution is hardware specialization in the form of application-specific integrated circuits (ASICs) to perform tasks at higher performance and lower power than software implementations. The cost of developing custom ASICs, however, remains high. Reconfigurable computing fabrics, such as field-programmable gate arrays (FPGAs), offer a promising alternative to custom ASICs. FPGAs couple the benefits of hardware acceleration with flexibility and lower cost.

FPGA-based reconfigurable computing has recently taken the spotlight in academia and industry, as evidenced by Intel’s high-profile acquisition of Altera and Microsoft’s recent announcement that it will deploy thousands of FPGAs to speed up Bing search. In the coming years, we should expect hardware/software co-designed systems built on reconfigurable computing to become common. Conventional RTL design methodologies, however, cannot productively manage the growing complexity of the algorithms we wish to accelerate using FPGAs. Consequently, FPGA programmability is a major challenge that must be addressed both technologically, by leveraging high-level software abstractions (e.g., languages and compilers), run-time analysis tools, and readily available libraries and benchmarks, and educationally, through the training of rising hardware/software engineers.

Recent efforts related to software-programmable FPGAs have focused on designing high-level synthesis (HLS) compilers. Inspired by classical C-to-gates tools, HLS compilers automatically transform programs written in traditional untimed software languages to timed hardware descriptions. State-of-the-art HLS tools include Xilinx’s Vivado HLS (C/C++) and SDAccel (OpenCL) as well as Altera’s OpenCL SDK. Although HLS is effective at translating C/C++ or OpenCL programs to RTL hardware, compilers are only a part of the story in realizing truly software-programmable FPGAs.

Efficient memory management is central to software development. Unfortunately, unlike traditional software programming, current FPGA design flows require application-specific memories to sustain high-performance hardware accelerators. FPGAs also support features such as dynamic memory allocation, pointer chasing, complex data structures, and irregular memory access patterns poorly. In the absence of basic software memory abstractions, experts must design custom hardware memories. More extensible software memory abstractions would instead facilitate the software programmability of FPGAs.

In addition to high-level programming and memory abstractions, run-time analysis tools such as debuggers and profilers are essential to software programming. Hardware debuggers and profilers in the form of hardware/software co-simulation tools, however, are not ready to tackle exascale systems. In fact, one of the biggest barriers to realizing software-programmable FPGAs is the hours, even days, it takes to generate bitstreams and run hardware/software co-simulators. Lengthy compilation and simulation times cause debugging and profiling to consume the majority of FPGA development cycles and deter agile software development practices. The effect is compounded when FPGAs are integrated into heterogeneous systems with CPUs and GPUs over complex memory hierarchies. New tools modeled on architectural simulators may help rapidly gather performance, power, and area utilization statistics for FPGAs in heterogeneous systems. Another solution to long compilation and simulation times is to use overlay architectures, which mask the FPGA’s bit-level configurability with a fixed network of simple processing nodes. This fixed hardware enables faster programming at the expense of the finer-grained, bit-level parallelism of FPGAs.

Another key facet of software programming is readily available libraries and benchmarks. Current FPGA development is marred by vendor-specific IP cores that span limited domains. As FPGAs become more software-programmable, we should expect more domain experts to provide vendor-agnostic FPGA-based libraries and benchmarks. Realistic, representative, and reproducible vendor-agnostic libraries and benchmarks will not only make FPGA development more accessible but also serve as reference solutions for developers.

Finally, the future of software-programmable FPGAs lies not only in technological advancements but also in educating the next generation of hardware/software co-designing engineers. Software engineers are rarely concerned with the downstream architecture except when performing expert optimizations. Higher-level abstractions and run-time analysis tools will improve FPGA programmability, but developers will still need a working knowledge of FPGAs to design competitive hardware accelerators. Guided by reference libraries and benchmarks, software engineers must become fluent in pipelining, loop unrolling, partitioning memory into local SRAM blocks, and using hardened IP. Terms like throughput, latency, area utilization, power, and cycle time will enter the software engineering vernacular.

Recent advances in HLS compilers have demonstrated the feasibility of software-programmable FPGAs. Now, a combination of higher-level abstractions, run-time analysis tools, libraries and benchmarks must be pioneered alongside trained hardware/software co-designing engineers to realize a cohesive software engineering infrastructure for FPGAs.

Udit Gupta earned a BS in Electrical and Computer Engineering at Cornell University. He is currently studying toward a PhD in Computer Science at Harvard University. Udit’s past research includes exploring software-programmable FPGAs by leveraging intelligent design automation tools and evaluating high-level synthesis compilers with realistic benchmarks. He is especially interested in vertically integrated systems, exploring the computing stack from applications, tools, languages, and compilers to downstream architectures.

New FPGA Board Based on the Xilinx UltraScale VU190 Device

BittWare recently released a new COTS PCIe board based on Xilinx’s 20-nm UltraScale VU190 FPGA. The XUSP3R is a 3/4-length PCIe board that offers up to four Gen3 x8 PCIe interfaces, along with four front-panel QSFP28 cages supporting 16 lanes of 25 Gbps or 4 lanes of 100 Gbps, including 100 GbE. Four DIMM sockets support massive memory configurations, including up to 256 GB of DDR4 memory across four 72-bit-wide banks.

Alternatively, each DIMM socket can be populated with BittWare’s dual bank QDR DIMMs, each providing 576 Mb of QDR-II+. An optional Hybrid Memory Cube (HMC) module with up to 4 GB is also available that can be populated in addition to, and independent of, the DIMMs. Together, these features make the XUSP3R well suited for a variety of data center and networking applications, including compute acceleration, network processing, cybersecurity, and storage.

The board also offers features and tools for simplified development and integration. A comprehensive Board Management Controller (BMC) with host software support for advanced system monitoring simplifies platform management. A complete software tool suite and FPGA development/project examples are also available.

The XUSP3R’s features and specs:

  • High-performance Xilinx Virtex UltraScale 190/160/125
  • Up to four independent PCIe Gen3 x8 interfaces
  • Four QSFP28 cages for 4x 100GbE, 16x 25GbE, 4x 40GbE, or 16x 10GbE (or combinations thereof)
  • Four DIMM sites that support DDR4-2133 SDRAM, QDR-IV, and QDR-II+
  • Optional HMC Module (in addition to, and independent of, the DIMM sites)
  • Board Management Controller for Intelligent Platform Management
  • USB 2.0 for programming, debug, or control with optional integrated Platform Cable USB functionality
  • Timestamping and synchronization support
  • Complete software support with BittWare’s BittWorks II Toolkit
  • FPGA development kit for FPGA board support IP and integration

The XUSP3R board is in production and shipping now. Contact BittWare for more details and pricing.

Source: BittWare

New Dev Kit for Xilinx FPGA-Enabled Accelerator Cards

BittWare recently announced upcoming availability of an OpenPOWER CAPI Developer’s Kit for its Xilinx FPGA-enabled accelerator cards. The kit is intended to give you a fast way to connect the Xilinx All Programmable FPGA to a CAPI-enabled IBM POWER8 system.

The kit includes:

  • BittWare XUSP3S FPGA accelerator card, which is a ¾-length PCIe board featuring the Xilinx Virtex UltraScale VU095, four QSFPs for 4× 100 GbE, and flexible memory configurations with up to 64 GB of memory and support for Hybrid Memory Cube (HMC)
  • IBM Power Service Layer (PSL) IP to provide the connection to the POWER8 chip
  • CAPI host support library
  • An example CAPI design


BittWare’s OpenPOWER CAPI Developer’s Kit is scheduled to be available in Q2 2016.

Source: BittWare

An Introduction to Verilog

If you are new to programming FPGAs and CPLDs or looking for a new design language, Kareem Matariyeh has the solution for you. In this article, he introduces you to Verilog. Although the hardware description language has been used in the ASIC industry for years, it has all the tools to help you implement complex designs, such as creating a VGA interface or writing to an Ethernet controller. Matariyeh writes:

Programmable logic has been around for well over two decades. Today, due to larger and cheaper devices on the market, FPGAs and CPLDs are finding their way into a wide array of projects, and there is a plethora of languages to choose from. VHDL is the popular choice outside of the U.S. It is preferred if you need a strongly typed language. However, the focus of this article will be on another popular language called Verilog, which is a hardware description language that is similar to the C language.

Typically, Verilog is used in the ASIC design industry. Companies such as Sun Microsystems, Advanced Micro Devices, and NVIDIA use Verilog to verify and test new processor architectures before committing to physical silicon and post-fab verification. However, Verilog can also be used to implement complex designs, such as a VGA interface or an Ethernet controller, in a programmable device.

This article is mostly tailored to engineers who need to learn Verilog but know little or nothing about the language. Those who know VHDL will benefit from reading this article as well and should be able to pick up Verilog fairly quickly after reviewing the example listings and referring to the Resources at the end of the article. This article does not cover hardware, but at the end I have included some links that will help you learn more about how the hardware interacts with this language.


First, it is best to know what variable types are available in Verilog. The basic types are binary (reg vectors), integer, and real. Other types are available, but they are not used as often as these three. Keep everything in the binary number system as much as possible, because type casting can cause post-implementation issues, although coding styles differ. Binary and integer types can also hold the values “z” (high impedance) and “x” (unknown). Both are useful when you want a shared bus between designs or a bus to the outside world. Binary types can be assigned an integer value. However, there are times when you want to spell out a value bit by bit, and some of the listings use this notation. In case you are curious, it looks like this: X'wY, where X is the word size in bits, w is the number base (b for binary, o for octal, d for decimal, h for hex), and Y is the value. Any value without this notation is treated as a decimal integer by default. Keeping everything in binary, however, can become a pain in the neck, especially when dealing with numbers larger than 8 bits.
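The following minimal simulation sketch makes the X'wY notation concrete. It is not one of the article's listings, and the module and signal names are hypothetical. It writes one value in several bases and drives a bus to high impedance.

```verilog
// Hypothetical sketch: one value written as sized literals in several bases.
module literal_demo;
  reg [7:0] a, b, c;
  reg [3:0] bus;
  integer   n;

  initial begin
    a   = 8'b1010_0101; // 8-bit binary; underscores are ignored and only aid reading
    b   = 8'hA5;        // the same bit pattern in hex
    c   = 8'd165;       // and in decimal
    n   = 165;          // no size or base: treated as a plain decimal integer
    bus = 4'bzzzz;      // every bit high impedance, e.g., a released shared bus
    $display("a=%b b=%h c=%0d n=%0d bus=%b", a, b, c, n, bus);
  end
endmodule
```

In a simulator such as Icarus Verilog, the $display line should print a=10100101 b=a5 c=165 n=165 bus=zzzz.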

Table 1 shows some of the variable types that are available in Verilog. Integer is probably the most useful one to have around because it is 32 bits long and helps you keep track of numbers easily. Note that integer is a signed type but can also be set to all “z” or all “x.” Real is not used much and is mainly a simulation type. When a real value is assigned to an integer or binary variable, it is rounded to the nearest integer, so keep this conversion in mind when using the real type. When a design is initialized in a simulator, binary and integer variables start as all “x.” Real, on the other hand, starts at 0.0 because it cannot hold “x.” Other types are used for interconnections within and outside of a design. They are included in the table but won’t be introduced until later.
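You can check these default values and conversions in a simulator. The sketch below is hedged and uses hypothetical names. It relies on the IEEE 1364 rule that implicit real-to-integer conversion rounds. For comparison, it also calls the standard $rtoi system function, which truncates instead.

```verilog
// Hypothetical sketch of default values and type conversions (simulation only).
module type_demo;
  reg [7:0] r;        // binary (reg) vector: starts as 8'bxxxxxxxx
  integer   i;        // signed, 32 bits: starts as all x
  real      f;        // starts as 0.0; a real cannot hold x or z
  integer   rounded;
  integer   trunc;

  initial begin
    $display("initial: r=%b f=%f", r, f);
    f       = 2.7;
    rounded = f;        // implicit real -> integer conversion rounds: 3
    trunc   = $rtoi(f); // $rtoi truncates toward zero: 2
    i       = -5;       // integer is a signed type
    r       = i;        // only the low 8 bits survive: 8'b1111_1011
    $display("rounded=%0d trunc=%0d i=%0d r=%b", rounded, trunc, i, r);
  end
endmodule
```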

Some, but not all, operators from C are in Verilog. Some of the operators available in Verilog are in Table 2. It isn’t a complete list, but it contains most of the more commonly used operators. Like C, Verilog performs implicit casting when operand types differ (e.g., adding an integer to a 4-bit word and storing the result in a binary register or even a real). This is typically frowned on, mostly because implicit casting in Verilog can open a new can of worms and cause issues when the code runs in hardware. As long as casting does not give erroneous results during an operation, there should be no show-stoppers in a design. In the original Verilog-1995 language, signed arithmetic (add, subtract, multiply) happens only when integer and real types are used. Verilog-2001 added the signed keyword for reg and wire declarations, and an expression is evaluated as signed only if all of its operands are signed.
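The sketch below uses hypothetical names and shows two of those cans of worms. In the first, a sum silently loses its carry in a register that is too narrow. In the second, a comparison becomes unsigned because one operand is unsigned. The signed declaration requires a Verilog-2001 tool.

```verilog
// Hypothetical sketch of implicit sizing and signedness rules (Verilog-2001).
module cast_demo;
  reg        [3:0] nibble;
  reg        [3:0] sum4;   // too narrow: the carry is silently dropped
  reg        [7:0] sum8;   // wide enough to keep the carry
  reg signed [3:0] s;      // Verilog-2001 signed vector
  integer          k;
  reg              lt_mixed, lt_signed;

  initial begin
    nibble = 4'd12;
    k      = 7;
    sum4   = nibble + k;   // 19 does not fit in 4 bits: sum4 = 3
    sum8   = nibble + k;   // sum8 = 19
    s      = -2;           // stored as 4'b1110, read back as -2
    lt_signed = (s < 0);        // both operands signed: signed compare, true
    lt_mixed  = (nibble < -1);  // nibble is unsigned, so -1 is treated as
                                // 32'hFFFF_FFFF and the result is true
    $display("sum4=%0d sum8=%0d lt_signed=%b lt_mixed=%b",
             sum4, sum8, lt_signed, lt_mixed);
  end
endmodule
```

Note that lt_mixed comes out true even though 12 is not less than -1. This is exactly the kind of surprise that makes implicit casting risky.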


In Verilog, designs are called modules. A module defines its ports and contains the implementation code. If you think of the design as a black box, Verilog code typically looks like that black box with its top removed, so you can see what is inside. Languages like Verilog and VHDL encourage black-box usage because it makes code more readable, makes debugging easier, and encourages code reuse. In Verilog, two implementations cannot share the same module name. This is in stark contrast to VHDL, where several architectures can share the same entity. The simplest workaround in Verilog is to copy a module and rename it.

In Listing 1, a fairly standard shift register inserts a binary value at the end of a byte every clock cycle. If you’re experienced with VHDL, you’ll notice that there aren’t any library declarations. This is mainly because Verilog originated as an interpreted language. However, there are include directives (`include) that can be used to add external modules and features. The first lines after the module statement define the module’s port directions and types with the reserved words input and output. There is also a bidirectional declaration, inout, which is not used in the listing. A module’s ports can be declared with types such as integer (standard Verilog does not allow real ports), but binary is recommended, especially for a top-level module.
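Listing 1 itself is not reproduced here. What follows is a hypothetical reconstruction of a shift register like the one described. It uses the Verilog-1995 port style the text walks through: port directions after the module statement, plus a reg with the same name as the output. There is no reset, so in simulation data_out holds x until eight bits have been shifted in.

```verilog
// Hypothetical reconstruction in the spirit of Listing 1 (not the original listing):
// every rising clock edge shifts the byte left and inserts serial_in at bit 0.
module shift_reg (clk, serial_in, data_out);
  input        clk;        // port directions follow the module statement
  input        serial_in;
  output [7:0] data_out;
  reg    [7:0] data_out;   // same name as the output port, so the two act as one item

  always @(posedge clk) begin
    data_out <= (data_out << 1) | serial_in;  // logical left shift; new bit enters the LSB
  end
endmodule
```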

The reg statement essentially acts as a storage unit. Because it uses the same name as the output port, the two act as one item. Using reg this way is helpful because its storage allows the output to remain constant while system inputs change between clock cycles. Another kind of declaration, wire, is used to tie modules together or to drive combinational logic. It will appear in later listings.

The next line of code is the always statement, or block, which should be wrapped in begin and end statements. If you know VHDL, this is the same as the process statement and works in the same fashion. If you are completely new to programmable logic, it works like this: “Whenever action X happens on the signals in the sensitivity list, follow these instructions.” The begin and end statements are the equivalent of the curly braces in C/C++. It’s best to use them with always blocks and decision structures (e.g., if and case) as much as possible.

Finally, the last statement is a logical left shift operation. Note that outside an always block, driving a wire with an expression requires a continuous assignment using the keyword assign. The compiler will tell you if an assign statement is missing. After the shift, the code waits for the next positive clock edge and performs the insertion again. This was a pretty straightforward example; unfortunately, it doesn’t do much. The best way to get around that is to add more features using functions, tying in more modules, or using parameters to increase flexibility.
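Wire and assign only appear in later listings. This small hypothetical module, with illustrative names, puts the two styles side by side. One output is a wire driven by a continuous assign. The other is a reg updated inside an always block with a nonblocking procedural assignment.

```verilog
// Hypothetical example contrasting a wire driven by a continuous assignment
// with a reg updated inside an always block.
module reg_vs_wire (clk, a, b, x_now, x_q);
  input        clk;
  input  [3:0] a, b;
  output [3:0] x_now;   // wire: follows a ^ b immediately (combinational)
  output [3:0] x_q;     // reg: holds the value captured at the last rising edge
  reg    [3:0] x_q;

  // Outside an always block, a net is driven with the assign keyword.
  assign x_now = a ^ b;

  // Inside an always block, a reg takes a procedural assignment (no assign keyword).
  always @(posedge clk) begin
    x_q <= x_now;
  end
endmodule
```

Here x_now changes as soon as a or b changes, while x_q changes only on a rising clock edge.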


Tasks and functions make module implementation clearer. Both are best used when redundant code or complex actions need to be split out of the main source. There are some differences between tasks and functions.

A task can call other tasks and functions, while a function can call only other functions. A task does not return a value; instead, it modifies variables passed to it as outputs. Passing arguments to a task is also optional. Functions, on the other hand, must return one and only one value and must have at least one input to be valid. Tasks are well suited to test benches because they can contain delay and event-control statements. Functions, however, execute in zero simulation time and cannot contain delays (#), event controls (@), or wait statements. This means functions should not be used for test bench code or simulations that depend on timing. Both constructs are helpful, so it pays to experiment with them.

There is one cardinal rule to follow when using a function or task: it has to be defined within the module. This is unlike VHDL, where functions can be defined in a package for maximum flexibility. Tasks and functions can, however, be written in a separate file and then pulled into a module with an include directive. This enables you to reuse code within a project or across multiple projects. Both tasks and functions can use types other than binary for their inputs and outputs, giving you even more flexibility.

Listing 2 contains a function that acts like a basic ALU. Depending on what is passed to the function, it processes the information and returns the calculated integer value. Tasks work in the same way, but their structure is a little different when dealing with inputs and outputs. As mentioned earlier, one of the major differences between a task and a function is that a task can have multiple outputs rather than just one. This gives you the ability to make a task more complicated internally, if need be.
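The original Listing 2 is not included here, so the following is a hedged sketch of a function used as a basic ALU. The opcodes and names are hypothetical. The function returns a 9-bit vector instead of an integer so that an addition's carry is kept.

```verilog
// Hypothetical sketch of a function acting as a basic ALU (not the original Listing 2).
module alu_demo (op, a, b, result);
  input  [1:0] op;
  input  [7:0] a, b;
  output [8:0] result;
  reg    [8:0] result;

  // A function returns exactly one value, needs at least one input,
  // and runs in zero simulation time (no # delays or @ events allowed).
  function [8:0] alu;
    input [1:0] f_op;
    input [7:0] x, y;
    begin
      case (f_op)
        2'b00:   alu = x + y;  // add; bit 8 holds the carry
        2'b01:   alu = x - y;  // subtract (wraps modulo 2^9)
        2'b10:   alu = x & y;  // bitwise AND
        default: alu = x | y;  // bitwise OR
      endcase
    end
  endfunction

  always @(op or a or b) begin
    result = alu(op, a, b);
  end
endmodule
```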

Listing 3 is an example of a task in action with more than one output. Note that it is implemented the same way as a function: it has to be defined and called within the module in order to work. Rather than define the task explicitly within the module, however, Listing 3 defines it in a separate file and adds an include directive to the module code. This shows how functions and tasks can be defined outside of a module and made available for other modules to use.
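As a hypothetical stand-in for Listing 3, this task produces three outputs from a single call. It is defined inline so the block stays self-contained. A comment shows where an include directive would go instead; the file name in that comment is made up.

```verilog
// Hypothetical sketch of a task with three outputs (not the original Listing 3).
module task_demo (a, b, sum, carry, zero);
  input  [7:0] a, b;
  output [7:0] sum;
  output       carry, zero;
  reg    [7:0] sum;
  reg          carry, zero;

  // This task could instead live in its own file and be pulled into the module
  // body with an include directive, e.g. `include "add_flags.vh" (hypothetical name).
  task add_flags;
    input  [7:0] x, y;
    output [7:0] s;
    output       c, z;
    begin
      {c, s} = x + y;      // 9-bit result split into carry and sum
      z      = (s == 8'd0);
    end
  endtask

  always @(a or b) begin
    add_flags(a, b, sum, carry, zero);
  end
endmodule
```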


If too much is added to a module, it can become so large that debugging and editing become a chore. Doing this also reduces code reuse, to the point where new counters and state machines are recreated when small modules or functions from a previous project would be more than adequate. A good way to get around these issues is to write multiple modules, in the same file or across multiple files, and instantiate them within an upper-level module to use their abilities. Multiple modules are good to have in a pipelined system because they let you use the same kind of module in multiple areas of a system. Older modules can also be reused this way, so less time is spent on constant recreation.

That is the idea of code reuse in a nutshell. Now I will discuss an example of code reuse and multiple modules. In Listing 4, the shift register from Listing 1 feeds its data into an even parity generator, and the results from both modules are output through the top-level module. The modules would normally live in separate files but are shown in one listing for easier reading. Every modular design has a top-level module (the top-level entity, in VHDL terms), where all of the inputs and outputs of a system connect to the physical world. It is also where lower-level modules are instantiated. Subordinates can instantiate modules below themselves as well (see Figure 1).
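Here is a hedged sketch of the Listing 4 idea. The module names are hypothetical, and all modules sit in one block for easier reading. The shift register is repeated in compact form so the block compiles on its own.

```verilog
// Hypothetical sketch in the spirit of Listing 4 (not the original listing).
// Lower-level module 1: the 8-bit shift register, written compactly.
module shift_reg8 (clk, serial_in, q);
  input        clk, serial_in;
  output [7:0] q;
  reg    [7:0] q;
  always @(posedge clk) q <= (q << 1) | serial_in;
endmodule

// Lower-level module 2: even parity generator.
// ^d XORs all bits: it is 1 when d holds an odd number of 1s,
// so the 9 bits {d, p} always contain an even number of 1s.
module even_parity (d, p);
  input  [7:0] d;
  output       p;
  assign p = ^d;
endmodule

// Top-level module: its ports connect to the outside world,
// and the lower-level modules are instantiated here.
module parity_top (clk, serial_in, data_out, parity_out);
  input        clk, serial_in;
  output [7:0] data_out;
  output       parity_out;
  wire   [7:0] shifted;   // wire connecting the two instances

  shift_reg8  u_shift  (.clk(clk), .serial_in(serial_in), .q(shifted));
  even_parity u_parity (.d(shifted), .p(parity_out));

  assign data_out = shifted;
endmodule
```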

Think of it as a large black box with smaller black boxes connected by wires, where each of those small black boxes holds either logic or even smaller black boxes. Pretty neat, but it can get annoying. Imagine that a memory controller is created for 10-bit addressing and the address length then needs to be extended to 16 bits. That can mean going through a lot of files to change 10 to 16. With parameters, however, only one value in one file needs to change, and it’s all done.


Parameters are great to have around in Verilog and can make code reuse even more attractive. Parameters let names take the place of numerical values, like #define in C, but with some extra features such as overriding. Parameters can be used in width declarations, making it easy to change the size of an output, input, or variable. For example, suppose a VGA generator with an 8-bit color depth needed to change to 32-bit color depth. Instead of changing every location where the value occurs, only the parameter's value would change, and once the module was recompiled it would display 32-bit color. The same can be done for memory controllers and other modules with multi-bit ports, wires, or registers. Parameters can also be overridden, just before or when a module is instantiated. This is helpful when separate projects share the same module source but one project needs the module to be a little different. Parameters can also be used in functions and tasks, as long as the parameter is in the same file as the implementation code. Parameters combined with functions and tasks give Verilog some of the flexibility of a VHDL package. It isn’t really a package, though, because the implementation is located in a module and not in a separate construct.

There are several ways to override parameters. One way is the defparam keyword, which explicitly changes the value of a parameter in an instantiated module. Another way is to override the parameter as the module is instantiated. Verilog-2001 added the named form #(.NAME(value)) for this. Listing 5 shows how both are done with dummy modules that already have defined parameters. The defparam method comes from an older version of the language and is discouraged in newer code, so make sure to pick the right method for the version of Verilog being used.
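As a stand-in for Listing 5, this hypothetical sketch instantiates one parameterized register three ways. The first uses a named override at instantiation (Verilog-2001 syntax), the second uses defparam, and the third has no override at all. The module names are made up for illustration.

```verilog
// Hypothetical parameterized register; WIDTH defaults to 8.
module param_reg (clk, d, q);
  parameter WIDTH = 8;           // default width; can be overridden per instance
  input              clk;
  input  [WIDTH-1:0] d;
  output [WIDTH-1:0] q;
  reg    [WIDTH-1:0] q;
  always @(posedge clk) q <= d;
endmodule

module param_top (clk, d16, d10, q16, q10, q8);
  input         clk;
  input  [15:0] d16;
  input  [9:0]  d10;
  output [15:0] q16;
  output [9:0]  q10;
  output [7:0]  q8;

  // 1) Override while instantiating, named form (the ordered form #(16) also works).
  param_reg #(.WIDTH(16)) u16 (.clk(clk), .d(d16), .q(q16));

  // 2) Override with defparam (older style, discouraged in new code).
  param_reg u10 (.clk(clk), .d(d10), .q(q10));
  defparam u10.WIDTH = 10;

  // 3) No override: the default WIDTH of 8 applies.
  param_reg u8 (.clk(clk), .d(d10[7:0]), .q(q8));
endmodule
```

If the defparam line were missing, u10 would keep its default 8-bit width and the upper two bits of the 10-bit q10 would be left undriven, so the override is easy to verify in simulation.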


Evaluation Boards for SuperSpeed USB-to-FIFO Bridge ICs

FTDI recently launched a new family of evaluation/development modules to encourage the implementation of its next-generation USB interfacing technology. Its FT600/1Q USB 3.0 SuperSpeed ICs are in volume production and backed up by the UMFT60XX offering. The family comprises four models that provide different FIFO bus interfaces and data bit widths. With these modules, the operational parameters of FT600/1Q devices can be fully assessed and interfacing with external hardware undertaken, such as FPGA platforms.

At 78.7 mm × 60 mm, the UMFT600A and UMFT601A each have a high-speed mezzanine card (HSMC) interface with 16-bit-wide and 32-bit-wide FIFO buses, respectively. The UMFT600X and UMFT601X measure 70 mm × 60 mm and incorporate FPGA mezzanine card (FMC) connectors with 16-bit-wide and 32-bit-wide FIFO buses, respectively.

The HSMC interface is compatible with most Altera FPGA reference design boards, while the FMC connector delivers the same functionality for Xilinx boards. Fully compatible with USB 3.0 SuperSpeed (5 Gbps), USB 2.0 High Speed (480 Mbps), and USB 2.0 Full Speed (12 Mbps) data transfer, the UMFT60xx modules support two parallel slave FIFO bus protocols with an achievable data burst rate of around 400 MBps. The multi-channel FIFO mode can handle up to four logical channels. It is complemented by the 245 synchronous FIFO mode, which is optimized for more straightforward operation.

Source: FTDI

Encapsulated 80-A Digital Power Module for FPGAs, Processors, & Memory

Intersil Corp. recently announced the industry’s first 80-A fully encapsulated digital DC/DC PMBus power module that provides point-of-load (POL) conversions for advanced FPGAs, DSPs, ASICs, processors, and memory. The ISL8273M is a complete step-down regulated power supply that delivers up to 80-A output current and operates from industry-standard 5- or 12-V input power rails. Multiphase current sharing of up to four ISL8273M power modules enables you to create a 320-A solution with output voltages as low as 0.6 V. The compact (18 mm × 23 mm) ISL8273M provides high power density and performance for increasingly space-constrained data center equipment and wireless communications infrastructure systems.

The ISL8273M digital power module leverages a patented ChargeMode control architecture that delivers superior efficiencies, with up to 94% peak efficiency and better than 90% efficiency on most conversions. It also provides a fast, single-clock-cycle transient response to the output current load steps that are common when FPGAs and DSPs process bursts of work.

The 80-A ISL8273M further distances itself from competing digital power modules by delivering 2× higher output current. Its proprietary High Density Array (HDA) package offers unmatched electrical and thermal performance through a single-layer conductive package substrate that reduces lead inductance and dissipates heat primarily through the system board.

Key specs and features:

  • 80-A digital switch mode power supply with current sharing, multiphase and multi-modules support for up to 320-A power rails
  • Wide input voltage range from 4.5 to 14 V and programmable Vout from 0.6 to 2.5 V
  • PMBus-enabled solution for full system configuration, telemetry, and monitoring of all conversions and operating parameters
  • Up to 94% peak conversion efficiency with 1% output voltage accuracy
  • Single clock cycle transient response
  • Programmable Vout, soft-start, soft-stop, sequencing, margining, and under-voltage, over-voltage, under-current, over-current, under-temperature, and over-temperature
  • Monitors Vin, Vout, Iout, temperature, duty cycle, switching frequency, power good and faults
  • Internal nonvolatile memory saves module configuration parameters and fault logging
  • Compact, thermally-enhanced high density array (HDA) package simplifies thermal management, solution positioning and PCB routing

The ISL8273M, available now in a thermally enhanced 18 mm × 23 mm × 7.5 mm HDA package, costs $69 in 1,000-piece quantities. The ISL8273MEVAL1Z 80-A digital module evaluation board, priced at $89, is also available to speed time to market.

Source: Intersil Corp.

New Arria 10 Boards Target Cyber/Security, SigInt, & Acceleration

BittWare recently announced two new boards in its Altera Arria 10 FPGA product roadmap to complement its existing Arria 10 3U VPX and PCIe offerings: the A10PED and the A10XM4.

The A10PED is a dual Arria 10, full-length PCIe Gen3 x16 card supporting either the 660- or 1,150-KLE GX FPGAs, one of which can optionally be an SX SoC with dual ARM cores. Primarily targeting signal and network packet processing applications, the board provides 28 lanes of serial I/O at up to 10.325 Gbps each, with support for high-accuracy time stamping. Featuring four 260-pin DDR4 SODIMMs and a Hybrid Memory Cube (HMC), the A10PED will support up to 68 GB of memory with a peak aggregate memory bandwidth of over 175 GB/s (not including I/O or PCIe). For latency-sensitive applications, some or all of the DDR4 SODIMMs can be replaced with proprietary QDR-II/IV SRAM SODIMMs. These memory options, coupled with full support for Altera’s OpenCL tools, also make this board compelling for acceleration and co-processing applications.

The A10XM4 Arria 10 XMC (VITA 42) Module provides network interface (NIC) and cyber/security capabilities in addition to host/carrier acceleration for applications in radar, EW, networking, and SigInt. It will also support full conduction cooling. Compatible with any standard XMC carrier, the A10XM4 features an Arria 10 GX FPGA with two lanes of 10 GigE, along with up to 16 GB of memory and a PCIe Gen3 x8 interface to the host. BittWare’s NIC application example and OpenCL BSP will greatly simplify the integration and development of cyber/security additions to, and off-loading of, standard host applications.

The A10PED full length PCIe board will be available Q4 2015 and the A10XM4 XMC board will be available Q1 2016.  Contact BittWare for configurations, pricing, and details.

Source: BittWare

Radiation-Tolerant FPGA Kit

Microsemi recently announced the availability of the RTG4 FPGA Development Kit for high-bandwidth space applications. The innovative kit provides space designers an evaluation and development platform for applications such as data transmission, serial connectivity, and more.

The development kit provides all the reference material needed to evaluate and adopt RTG4 technology quickly. You don’t need to build a test board and assemble the device onto the board. The RTG4 Development Kit is ideal for evaluating and designing for remote sensing space payloads, radar and imaging, and spectrometry. Other applications include mobile satellite services (MSS) communication satellites, high-altitude aviation, medical electronics, and civilian nuclear power plant control.

RTG4 FPGAs feature reprogrammable flash configuration, which makes prototyping easier. Reprogrammable flash technology offers complete immunity to radiation-induced configuration upsets in the harshest radiation environments, without the configuration scrubbing required with SRAM FPGA technology. RTG4 supports space applications requiring up to 150,000 logic elements and up to 300 MHz of system performance.

The RTG4 Development Kit’s features and specs:

  • One RT4G150 device in a ceramic package with 1,657 pins
  • Two 1GB DDR3 synchronous dynamic random access memory (SDRAM)
  • 2GB SPI flash memory
  • PCI Express Gen 1 interface
  • One pair of SMA connectors for testing the full-duplex SERDES channel
  • Two FMC connectors with HPC/LPC pinout for expansion
  • RJ45 interface for 10/100/1000 Ethernet
  • USB micro-AB connector
  • Embedded Flashpro5 programmer and external programming header
  • Current measurement test points

The RTG4 Development Kit features an RT4G150 device offering more than 150,000 logic elements in a ceramic package with 1,657 pins. Kits are available now for purchase.

Source: Microsemi

FPGA-Based Storage Reference Design Doubles NAND Flash Life

Altera Corp. recently developed a storage reference design based on its Arria 10 SoCs that doubles the life of NAND flash. In addition, it can increase the number of program-erase cycles by up to 7×. The design features an Arria 10 SoC with an integrated dual-core ARM Cortex-A9 processor in an optimized, single-chip solution. It uses a Mobiveil SSD controller and NVMdurance NAND optimization software. This reference design provides improved performance and flexibility in NAND utilization while reducing the cost of the NAND array by increasing the lifetime of data center equipment.

Mobiveil’s controller supports multi-core architectures, enabling threads to run on each core with their own queue and interrupt without any locks required. NVMdurance’s NAND flash optimization software monitors the NAND flash’s condition and automatically adjusts the control parameters in real time. The reference design also features end-to-end data protection, encryption, and compression, and it optimizes throughput and power consumption, all in a small silicon footprint.

Altera’s NAND storage reference design is available today.

Source: Altera Corp.

ZestET2-NJ Gigabit Ethernet FPGA Module

Orange Tree Technologies recently launched the ZestET2-NJ high-performance Gigabit Ethernet FPGA module, which comprises a Gigabit Ethernet processing engine, a Xilinx Artix-7 FPGA, DDR3 memory, and general-purpose I/O. Delivering the maximum sustained Ethernet bandwidth of over 100 MBps in both directions simultaneously, it is aimed at data acquisition and control applications in markets such as industrial vision, radar, sonar, and medical imaging.

The Xilinx Artix-7 XC7A35T FPGA, which has more than 33,000 logic cells, 1.8 Mb of block RAM, and 90 DSP slices, is tightly coupled with 512 MB of 400-MHz DDR3 SDRAM, giving it ample memory bandwidth (1.6 GBps) for high-speed processing and formatting of streaming data. With ease of integration in mind, 105 FPGA I/O pins are available for connection to the user’s equipment.
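The 1.6-GBps figure quoted above is consistent with simple DDR arithmetic if we assume a 16-bit memory data bus (the bus width is our assumption, not a datasheet value): DDR3 transfers data on both clock edges, so a 400-MHz clock yields 800 MT/s, and 800 MT/s × 2 bytes = 1.6 GBps. A minimal Python sketch of that calculation:

```python
def ddr_peak_bandwidth_gbytes(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak theoretical DDR bandwidth in GB/s (1 GB = 1e9 bytes).

    DDR memory transfers data on both clock edges, so the transfer
    rate in transfers/s is twice the I/O clock frequency.
    """
    transfers_per_s = 2 * clock_mhz * 1e6
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_s * bytes_per_transfer / 1e9


# 400-MHz DDR3 clock; the 16-bit bus width is an assumption for illustration.
print(ddr_peak_bandwidth_gbytes(400, 16))  # prints 1.6
```

This is a peak number; refresh cycles and read/write turnaround mean sustained bandwidth in practice is lower.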

Orange Tree’s proprietary GigEx chip handles the entire TCP/IP stack at over 100 MBps in each direction simultaneously. It enables the user FPGA to be dedicated entirely to the application for maximum efficiency. The module measures just 40 × 50 mm, making it ideal for integration into your products.

Source: Orange Tree Technologies

USB-to-FPGA Communications: A Case Study of the ChipWhisperer-Lite

Sending data from a computer to an FPGA is often required. This might be FPGA configuration data, register settings, or streaming data. An easy solution is to use a USB-connected microcontroller instead of a dedicated interface chip, which allows you to offload certain tasks into the microcontroller.

In Circuit Cellar 299 (June 2015), Colin O’Flynn writes:

Often your FPGA-based project will require computer communication and some housekeeping tasks. A popular solution is the use of a dedicated USB interface chip, and a soft-core processor in the FPGA for housekeeping tasks.

For an open-source hardware project I recently launched, I decided to use an external USB microcontroller instead of a dedicated interface chip. I suspect you’ll find a lot of useful design tidbits you can use for yourself—and, because it’s open source, getting details of my designs doesn’t involve industrial espionage!

The design is called the ChipWhisperer-Lite (see Photo 1). This device is a training aid for learning about side-channel power analysis of cryptographic implementations. Side-channel power analysis uses measurements of small power variations during execution of the cryptographic algorithms to break the implementation of the algorithm.

Photo 1: This shows the ChipWhisperer-Lite, which contains a Xilinx Spartan 6 LX9 FPGA and Atmel SAM3U2C microcontroller. The remaining circuitry involves the power supplies, ADC, analog processing, and a development device which the user programs with some cryptographic algorithm they are analyzing.
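To make the idea of side-channel power analysis concrete, here is a toy correlation power analysis (CPA) in Python. It is a simulation under stated assumptions, not ChipWhisperer software: we assume each "trace" is a single sample that leaks the Hamming weight of plaintext XOR key plus Gaussian noise, and all function names are hypothetical. Practical attacks usually target a nonlinear intermediate such as the AES S-box output; plain XOR keeps the sketch short. For every possible key byte, the attacker predicts the leakage and keeps the guess whose prediction correlates best with the measured power.

```python
import random


def hamming_weight(x: int) -> int:
    """Number of set bits in x (the leakage model assumed here)."""
    return bin(x).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_traces(key: int, n: int, noise: float, seed: int = 1):
    """Toy power samples: one per encryption, HW(plaintext ^ key) + noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n)]
    power = [hamming_weight(p ^ key) + rng.gauss(0.0, noise) for p in plaintexts]
    return plaintexts, power


def recover_key_byte(plaintexts, power) -> int:
    """Return the key guess whose predicted leakage correlates best with power."""
    best_guess, best_corr = 0, float("-inf")
    for guess in range(256):
        predicted = [hamming_weight(p ^ guess) for p in plaintexts]
        corr = pearson(predicted, power)
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess


plaintexts, power = simulate_traces(key=0x3C, n=1000, noise=1.0)
recovered = recover_key_byte(plaintexts, power)
print(f"recovered key byte: {recovered:#04x}")
```

In this toy model, raising the noise level means more traces are needed before the correct guess stands out, which is why real measurements collect many traces.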

In a previous article, “Build a SoC Over Lunch” (Circuit Cellar 289, 2014), I made the case for using a soft-core processor in an FPGA. In this article I’ll play devil’s advocate by arguing that using an external microcontroller is a better choice. Of course, the truth lies somewhere in between: in this example, the requirement of having a high-speed USB interface makes an external microcontroller more cost-effective, but this won’t always be the case.

This article assumes you require computer communication as part of your design. There are many options for this. The easiest from a hardware perspective is to use a USB-serial converter, and many projects use such a system. The downsides are a fairly slow interface and the need to design a serial protocol.

A more advanced option is to use a USB adapter with a parallel interface, such as the FTDI FT2232H. These can achieve very high data rates, basically up to the limit of the USB 2.0 interface. The downside is that, for many applications, you still need to implement some protocol on your FPGA, and these adapters offer limited extra features (for example, if you need housekeeping tasks).

The solution I came to is the use of a USB microcontroller. They are widely available from most vendors with USB 2.0 high-speed (full 480-Mbps data rate) interfaces, and they allow you to handle not only the USB interface but also the various housekeeping tasks your system will require. The USB microcontroller will also likely cost about the same as (or possibly less than) the equivalent specialized interface chip.

When selecting a microcontroller, I recommend finding one with an external memory bus interface. This external memory bus is normally designed to allow you to map devices such as SRAM or DRAM into the memory space of the microcontroller. In our case we’ll actually be mapping FPGA registers into the microcontroller memory space, which means we don’t need any protocol for communication with the FPGA.


Figure 1: This figure shows the basic connections used for memory-mapping the FPGA into the microcontroller memory space. Depending on your requirements, you can add some additional custom lines, such as a flag to indicate different FPGA register banks to use, as only a 9-bit address bus is used in this example.

I selected an Atmel SAM3U2C microcontroller, which has a USB 2.0 high-speed interface. This microcontroller is low-cost and available in a TQFP package, which is convenient if you plan on hand-assembling prototype boards. The connections between the FPGA and microcontroller are shown in Figure 1.

On the FPGA, it is easy to map this data bus into registers. This means that to configure some feature in the FPGA, you can just directly write into a register. Or if you are transferring data, you can read from or write to a block-RAM (BRAM) implemented in the FPGA.
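To illustrate the FPGA side of this scheme, here is a behavioral Python model of the address decoding. It assumes an 8-bit data bus and a hypothetical address map in which the lower half of the 9-bit address space holds control/status registers and the upper half is a window into a 256-byte BRAM; the register names, addresses, and status value are invented for illustration and are not taken from the ChipWhisperer-Lite design. On the microcontroller, each access below corresponds to an ordinary load or store to an address inside the external memory bus region, which is why no communication protocol is needed.

```python
ADDR_BITS = 9                      # 9-bit address bus, as in Figure 1
ADDR_SPACE = 1 << ADDR_BITS        # 512 addressable locations
BRAM_BASE = 0x100                  # hypothetical: upper half is a BRAM window
REG_CTRL = 0x000                   # hypothetical read/write control register
REG_STATUS = 0x001                 # hypothetical read-only status register
STATUS_ALIVE = 0xA5                # hypothetical "FPGA configured" value


class FpgaBusModel:
    """Behavioral model of the FPGA-side decoder on the external memory bus."""

    def __init__(self) -> None:
        self.regs = [0] * BRAM_BASE                    # 256 byte-wide registers
        self.bram = bytearray(ADDR_SPACE - BRAM_BASE)  # 256-byte BRAM window
        self.regs[REG_STATUS] = STATUS_ALIVE

    def write(self, addr: int, data: int) -> None:
        addr &= ADDR_SPACE - 1     # only 9 address lines exist, so higher bits alias
        data &= 0xFF               # 8-bit data bus (assumption)
        if addr >= BRAM_BASE:
            self.bram[addr - BRAM_BASE] = data
        elif addr != REG_STATUS:   # writes to the status register are ignored
            self.regs[addr] = data

    def read(self, addr: int) -> int:
        addr &= ADDR_SPACE - 1
        if addr >= BRAM_BASE:
            return self.bram[addr - BRAM_BASE]
        return self.regs[addr]


bus = FpgaBusModel()
bus.write(REG_CTRL, 0x01)                  # configure a feature with one bus write
for offset, byte in enumerate(b"hello"):
    bus.write(BRAM_BASE + offset, byte)    # stream data into block RAM
print(hex(bus.read(REG_STATUS)), bytes(bus.read(BRAM_BASE + i) for i in range(5)))
```

Keeping registers and BRAM in separate halves of the address space makes the decode a single comparison on the top address bit, which keeps the FPGA logic small.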

New High-Performance VC Z Series Cameras

Vision Components recently announced the availability of its new VC Z series of intelligent cameras. These embedded systems offer real-time image processing suitable for demanding high-speed and line-scan applications. All models are equipped with a Xilinx Zynq module, which combines a dual-core ARM Cortex-A9 running at 866 MHz with an integrated FPGA.

The new camera is based on the VCSBC nano Z board camera series. With a footprint of 40 × 65 mm, these compact systems are especially easy to integrate into machines and plants. They are optionally available with one or two remote sensor heads and are thus suitable for stereo applications. You can choose between two enclosed camera types: the VC nano Z, which has housing dimensions of 80 × 45 × 20 mm, and the VC pro Z, which measures 90 × 58 × 36 mm and can be fitted with a lens and integrated LED illumination. The new VC Linux operating system ensures optimal interaction between hardware and software.

Source: Vision Components

Engineering “Moonshot” Projects

In 2009, Andrew Meyer, an MIT-trained engineer and entrepreneur, co-founded LeafLabs, a Cambridge, MA-based R&D firm that designs “powerful physical computing devices for control and communication among smart machines (including humans).” We recently asked Andrew to tell us about his background, detail some of his most intriguing projects, describe his contributions to Project Ara, and share his thoughts on the future of electrical engineering.

CIRCUIT CELLAR: How did you become interested in electronics? Did you start at a young age?

ANDREW: Yes, actually, but I am not sure I really got anywhere fooling around as a kid. I had a deep love of remote control cars and airplanes in middle school. I was totally obsessed with figuring out how to build my own control radio. This was right before the rise of Google, and I scoured the net for info on circuits. In the end, I achieved a reasonable grasp on really simple RC type circuits but completely failed in figuring out the radio. Later in high school I took some courses at the local community college and built an AM radio and got into the math for the first time – j and omega and all that.

CIRCUIT CELLAR: What is Leaflabs? How did it start? Who comprises your team today?

ANDREW: LeafLabs is an R&D firm specializing in embedded and distributed systems. Projects start as solving specific problems for a client, but the idea is to turn those relationships into product opportunities. To me, that’s what separates R&D from consulting.


The LeafLabs Office (Source: LeafLabs)

I started LeafLabs with a handful of friends in 2009. It was an all-MIT cast of engineers, and it took four or five years before I understood how much we were holding ourselves back by not embracing some marketing and sales talent. The original concept was to try to design ICs optimized for running certain machine learning algorithms at low power. The idea was that smartphones might want to do speech-to-text some day without sending the audio off to the cloud. This was way too ambitious for a group of 22-year-olds with no money.

Our second overly ambitious idea was to try to solve the “FPGA problem.” I’m still really passionate about this, but it too was too big a bite for four kids in a basement. The problem is that FPGA vendors like Xilinx and Altera have loads of expertise in silicon, but great software is just not in their DNA. Imagine if Intel had never published the x86 instruction set. What if Intel insisted on owning not just the processors, but the languages, compilers, libraries, IDEs, debuggers, operating systems, and the rest of it? Would we ever have gotten to Linux? What about Python? FPGAs have enormous potential to surpass even the GPU as a completely standard technology in computer systems. There should be some gate fabric in my phone. The development tools just suck, suck, suck. If any FPGA executives are reading this: Please open up your bitstream formats. The FSF and the rest of the community will get the ball rolling on an open toolchain that will far exceed what you guys are doing internally. You will change the world.

CIRCUIT CELLAR: How did the Maple microcontroller board come about?

ANDREW: Arduino was really starting to come up at the time. I had just left Analog, where we had been using the 32-bit Cortex-M3. We started asking, “Chips like the STM32 are clearly the way of the future, so why on earth is Arduino using a chip from the ’90s?” Perry, another LeafLabs founder, was really passionate about this. ARM is taking over the world, and the community deserves a product that is as easy to use as Arduino but built on top of modern technology.

CIRCUIT CELLAR: Can you give a general overview of your involvement with Project Ara?

ANDREW: We got into Ara at the beginning as subcontractors to NK Labs, the company that was leading a lot of the engineering. Since then our role has expanded quite a bit, but we are still focused on software and firmware development. Everyone understood that Ara was going to require a lot of firmware and FPGA work, so we were a natural choice to get involved. One of the first Ara prototypes actually used the Maple software library, libmaple, and had eight FPGAs in it!

LeafLabs is focused on firmware development. What’s really exciting to me about the project is the technology under the hood. Basically, what we have done is build a network on a PCB. The first big problem with embedded Linux devices is that they are completely centered around the SoC. Change the SoC and you are in for a ton of software development, for instance, to bring your display driver back to life. Similarly, changes to the design, such as incorporating a faster Wi-Fi chip, might force you to change the SoC. This severe coupling between everything keeps designers from iterating. You have this attitude of “OK, no one touch this design for the next 5 years, we finally got it working.” If we have learned anything from SaaS and app companies, it’s that quick iteration and continuous deployment are key to great products. If your platform inhibits iteration, you have a big problem.

The other problem with embedded systems is that there are so many protocols! SDIO, USB, DSI, I2C, SPI, CSI, blah blah blah. Do we really need so many!? Think how much mileage we get out of TCP/IP. The protocol explosion just adds impedance to the entire design process and forces engineers to worry about bits toggling on traces rather than customer-facing features.

The technology being developed for Ara, called Greybus, solves both these problems. The centerpiece of our phone is a switch, and the display, Wi-Fi, audio, baseband, etc., all hang off the switch as network devices. Even the processor is just another module hanging off this network. All modules speak the same “good enough” protocol called UniPro (Unified Protocol). The possibilities here are absolutely tantalizing.
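The decoupling Andrew describes can be illustrated with a toy Python model. This is not Greybus or UniPro code: `Switch`, `Module`, and the device IDs are hypothetical, and real UniPro links involve much more (protocol layers, flow control, connection management). The sketch shows only the architectural point that every module, including the processor, reaches every other module through the switch's routing table, so swapping a module changes one table entry while the sender's code stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Any device on the network; the processor is just another module."""
    name: str
    inbox: list = field(default_factory=list)


class Switch:
    """Toy star-topology switch that forwards messages by destination device ID."""

    def __init__(self) -> None:
        self.routes: dict[int, Module] = {}

    def attach(self, device_id: int, module: Module) -> None:
        # Swapping a module only changes this one routing entry.
        self.routes[device_id] = module

    def send(self, src_id: int, dst_id: int, payload: bytes) -> None:
        if dst_id not in self.routes:
            raise KeyError(f"no module attached at device ID {dst_id}")
        self.routes[dst_id].inbox.append((src_id, payload))


CPU_ID, DISPLAY_ID = 1, 2                    # hypothetical device IDs
switch = Switch()
switch.attach(CPU_ID, Module("processor"))
old_display = Module("display v1")
switch.attach(DISPLAY_ID, old_display)
switch.send(CPU_ID, DISPLAY_ID, b"frame 1")

new_display = Module("display v2")           # upgrade the display module
switch.attach(DISPLAY_ID, new_display)
switch.send(CPU_ID, DISPLAY_ID, b"frame 2")  # sender code is unchanged
```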

CIRCUIT CELLAR: Can you define “minimalist data acquisition” for our readers? What is it and why does it interest you?

ANDREW: More and more fields, particularly neuroscience, are having to deal with outrageously huge real-time data sets. There are 100 billion neurons in the human brain. If we want to listen to just 1,000 of them, we are already talking about ~1 Gbps. Ed Boyden, a professor at MIT, asked us if we could build some hardware to help handle the torrent. Could we scale to 1 Tbps? Could we build something that researchers on a budget could actually afford and that mere mortals could use?

The Willow (Source: LeafLabs)

Willow is a hardware platform for capturing, storing, and processing neuroscience data at this scale. We had to be “minimalist” to keep costs down, and ensure our system is easy to use. Since we need to use an FPGA anyway to interface with a data source (like a bank of ADCs, or an array of image sensors), we thought, “Why not use the same chip for interfacing to storage?” With a single $150 FPGA and a couple of $200 SSD drives, we can record at 12 Gbps, put guarantees on throughput, and record for a couple of hours!

CIRCUIT CELLAR: What are your goals for LeafLabs for the next 6 to 12 months?

ANDREW: Including our superb remote contractors, our team is pushing 20. A year from now, it could be double that. This is a really tricky transition, where company culture really starts to solidify, where project management becomes a first-order problem, and where people’s careers are on the line. My first goal for LeafLabs is to make sure we nail this transition and build off of a really solid foundation. Besides that, we are always looking for compelling new problems to work on and new markets to play in. Getting into neuroscience has been an absolute blast.

The complete interview appears in Circuit Cellar 298 (May 2015).

Low-Profile PCIe Board Platform

BittWare recently announced its second low-profile PCIe board, the A5-PCIe-S (A5PS). The new board is based on Altera’s Arria V GZ FPGA, which provides a high level of system integration and flexibility for I/O, routing, and processing. This makes the A5PS a reliable platform for a variety of applications (e.g., network processing, security, broadcast, and signals intelligence).

Featuring dual SFP+ cages that run at up to 12.5 Gbps, the A5PS provides dual 10GigE ports using optical transceivers as well as passive copper cabling up to 7 m. These ports are serviced by the advanced 28-nm Arria V GZ FPGA, which also supports a Gen3 x8 PCIe interface and either 8-GB DDR3 or 36-MB QDRII+. Sophisticated time-stamping and synchronization options are supported by dual SMA connectors for interfacing to 1-PPS or 10-MHz reference clocks, in addition to the tunable, on-board, high-accuracy temperature-compensated crystal oscillator (TCXO). A comprehensive Board Management Controller (BMC) with host software support for advanced system monitoring is also provided.

The A5PS features and specifications include:

  • Altera Arria V GZ FPGA
  • PCIe x8 interface supporting Gen1, Gen2, or Gen3
  • Dual SFP+ cages for 2x 10GigE: Support for a wide range of optical transceivers; built-in low-latency active drivers/receivers for passive copper cables up to 7 m
  • Memory options (pick one): DDR3 (single 72-bit bank of up to 8 GBytes DDR3-1600 with ECC); QDRII+ (two 18-bit banks of up to 144 Mb each—288 Mb or 36 MB total)
  • Board Management Controller for Intelligent Platform Management
  • USB 2.0 for programming, debug, or control
  • Timestamping and synchronization support
    • Dual SMA for reference clock/synchronization inputs
    • Tunable high-accuracy TCXO
    • Programmable clock synthesizer (Si5338)
  • Complete software support with BittWare’s BittWorks II Toolkit
  • Broad range of IP offerings
    • 10 GigE MAC
    • TCP/IP Offload Engines (TOE), UDP Offload Engines
    • PTP/IEEE-1588
    • PCIe DMA
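The memory and host-interface figures above can be sanity-checked with a little arithmetic. This Python sketch computes peak theoretical numbers only. It assumes the 72-bit DDR3 bank carries 64 data bits plus 8 ECC check bits, and it uses the PCIe 3.0 signaling rate of 8 GT/s per lane with 128b/130b line encoding. Sustained throughput will be lower once DRAM refresh, PCIe packet headers, and other protocol overhead are included.

```python
def peak_gbytes_per_s(transfers_per_s: float, data_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * data_bits / 8 / 1e9


# DDR3-1600 bank, 72 bits wide: assumed 64 data bits + 8 ECC check bits.
ddr3_gbytes = peak_gbytes_per_s(1600e6, 72 - 8)

# QDRII+: two banks of 144 Mb each.
qdr_total_mbit = 2 * 144
qdr_total_mbyte = qdr_total_mbit / 8

# PCIe Gen3 x8: 8 GT/s per lane, 128b/130b encoding, 8 lanes, per direction.
pcie_gbytes = 8e9 * (128 / 130) * 8 / 8 / 1e9

print(f"DDR3-1600, 64 data bits: {ddr3_gbytes:.1f} GB/s")
print(f"QDRII+ capacity: {qdr_total_mbit} Mb = {qdr_total_mbyte:.0f} MB")
print(f"PCIe Gen3 x8: {pcie_gbytes:.2f} GB/s per direction")
```

The QDRII+ line reproduces the 288-Mb (36-MB) total stated in the feature list.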

The A5PS with an Arria V GZ E1 FPGA and no external memory currently costs $1,500 in quantities of 1,000. Contact BittWare for additional configurations, pricing, and details.

Source: BittWare


RTG4 Radiation-Tolerant FPGAs for High-speed Signal Processing Applications

Microsemi Corp. today announced availability of its RTG4 high-speed, signal-processing radiation-tolerant FPGA family. The RTG4’s reprogrammable flash technology offers complete immunity to radiation-induced configuration upsets in the harshest radiation environments and requires no configuration scrubbing, unlike SRAM FPGA technology. RTG4 supports space applications requiring up to 150,000 logic elements and up to 300 MHz of system performance.

Typical uses for RTG4 include remote sensing space payloads, such as radar, imaging, and spectrometry in civilian, scientific, and commercial applications. These applications span weather forecasting and climate research, land use, astronomy and astrophysics, planetary exploration, and earth sciences. Other applications include mobile satellite services (MSS) communication satellites, as well as high-altitude aviation, medical electronics, and civilian nuclear power plant control. Such applications have historically used expensive radiation-hardened ASICs, which force development programs to incur substantial cost and schedule risk. RTG4 allows programs to access the ease of use and flexibility of FPGAs without sacrificing reliability or performance.

The flexibility, reliability, and performance of RTG4 FPGAs make it much easier to meet these requirements. RTG4 is Microsemi’s latest development in a long history of radiation-tolerant FPGAs found in many NASA and international space programs.

Key product features include:

  • Up to 150,000 logic elements; each includes a four-input combinatorial look-up table (LUT4) and a flip-flop with built-in single event upset (SEU) and single event transient (SET) mitigation
  • High system performance, up to 300 MHz
  • 24 serial transceivers, with operation from 1 Gbps to 3.125 Gbps
  • 16 SEU- and SET-protected SpaceWire clock and data recovery circuits
  • 462 SEU- and SET-protected multiply-accumulate mathblocks
  • More than 5 Mb of on-board SEU-protected SRAM
  • Single event latch-up (SEL) and configuration memory upset immunity
  • Total ionizing dose (TID) beyond 100 krad
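Each RTG4 logic element pairs a LUT4 with a flip-flop. A four-input look-up table is a 16-entry truth table indexed by its inputs, so a single LUT4 can implement any Boolean function of four variables. The Python model below illustrates that idea; the bit ordering (input `a` as the least significant index bit) and the `init` naming are our assumptions, and vendor tools use their own conventions for encoding LUT contents.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT.

    `init` is the 16-bit truth table; the inputs form a 4-bit index
    (d is the most significant bit) that selects one output bit.
    """
    index = (d << 3) | (c << 2) | (b << 1) | a
    return (init >> index) & 1


def truth_table(fn) -> int:
    """Build the 16-bit init value that makes a LUT4 implement fn(a, b, c, d)."""
    init = 0
    for index in range(16):
        a, b, c, d = (index >> 0) & 1, (index >> 1) & 1, (index >> 2) & 1, (index >> 3) & 1
        if fn(a, b, c, d):
            init |= 1 << index
    return init


# Any 4-input Boolean function fits in one LUT4, e.g. 4-input XOR (parity):
XOR4 = truth_table(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(XOR4))  # prints 0x6996
```

Because the function lives entirely in the 16 configuration bits, protecting those bits from upsets (which RTG4 addresses with flash-based configuration) protects the logic function itself.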

Engineering silicon, Libero SoC development software, and RTG4 development kits are available now. RTG4 FPGAs and development kits have already shipped to some of the 120+ customers engaged in the RTG4 lead customer program. Flight units qualified to MIL-STD-883 Class B are expected to be available in early 2016.

Microsemi will present more information on RTG4 FPGAs in a live webinar on May 6 and will also be hosting Microsemi Space Forum events in the U.S., India and Europe starting in June, presenting information on RTG4 FPGAs and the extensive range of Microsemi space products.

Source: Microsemi Corp.