From Flip-Flops to Applications
FPGAs are popular with designers around the world for their flexibility and reprogrammability. In this article, I will cover some of the basics of FPGAs—from their architecture to some aspects of FPGA-based design.
Field-programmable gate arrays (FPGAs) are semiconductor devices that contain an array of configurable logic blocks (CLBs) connected via programmable interconnects. They were developed in the late ’80s to address processing, industrial, automotive, and even aerospace applications. Since then, the number of applications centered around FPGAs has grown enormously. In this article, we will explore the architecture of FPGAs, as well as some aspects of creating an FPGA design.
FPGAs are categorized based on their different end-user applications, such as those in automotive, defense and space. Table 1 shows some examples of devices for each of those industries.
FPGA resources are categorized by the following parameters:
- Number of available I/Os (differential and single-ended)
- Amount of internal memory (RAM)
- DSP slices (complex multipliers)
- Connectivity ports (serial interfaces, memory interfaces, etc.)
When choosing an FPGA for a new design, consider pin-compatible devices with more resources than are currently needed, so that the design has enough margin.
As a basic rule, 60% to 75% utilization of an FPGA’s logic resources, such as slices and LUTs, can be considered an upper limit, beyond which congestion and timing issues can arise.
FPGAs are used in a broad range of applications in technologies like 5G wireless, embedded vision, industrial Internet-of-Things (IoT), and cloud computing. This is made possible by the availability of ARM processor cores and C-based compilers for a given FPGA platform.
Another important application of FPGAs is in the field of platform engineering, an increasingly in-demand discipline in the industry. In platform engineering, SoCs and ASICs can be prototyped on FPGAs and validated even before the tape-out of the final device. Let’s now try to understand FPGA architecture and its workflows.
Modern FPGAs integrate their programmable logic with a processor system core on the same chip, turning the FPGA into a programmable SoC. Figure 1 shows the various logic components inside a Zynq UltraScale+ FPGA. Typically, these consist of configurable logic blocks (CLBs), digital signal processing (DSP) slices, transceivers, input/output (I/O) blocks, memories, and flip-flop (FDCE) blocks. FDCE is the name of the Xilinx primitive for a D-type flip-flop with clock enable and asynchronous clear.
CLBs: A CLB’s logic function is defined and configured by the FPGA user. A typical CLB contains a set of lookup tables (LUTs) and D-type flip-flops with clock enable and asynchronous clear (FDCE). When the logic is programmed into the FPGA, each CLB takes a part of the logic and configures itself to perform that function. Figure 2 shows a typical CLB contained in an FPGA.
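To make the idea of a lookup table concrete, here is a minimal behavioral sketch of a 4-input LUT. It is an illustration only: the module name `lut4_sketch`, its ports, and the 4-input size are assumptions made for this example, not a vendor primitive, and real devices may use wider LUTs.

```verilog
// Illustrative model of a 4-input LUT (hypothetical module, not a vendor primitive).
// The INIT parameter holds the truth table: one output bit per input combination.
module lut4_sketch #(
    parameter [15:0] INIT = 16'h8000   // default: 4-input AND (only bit 15 is set)
) (
    input  wire [3:0] in,              // the four inputs act as an address into the table
    output wire       out
);
    // Copy the table into a wire so it can be indexed by a variable.
    wire [15:0] table_bits = INIT;

    // The "logic" is just a lookup: pick the table bit addressed by the inputs.
    assign out = table_bits[in];
endmodule
```

Changing only the table changes the function. For example, instantiating the module with `INIT = 16'h6996` would produce a 4-input XOR, because bit i of that constant is the parity of i.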
DSP Slices: Many algorithms, such as those used in AI, require a lot of math and signal processing. DSP slices broaden the scope of the overall FPGA structure, so that complex operations such as filtering or matrix multiplication are performed with significantly greater efficiency than they would be using many CLBs.
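The core operation behind filtering and matrix multiplication is multiply-accumulate. The sketch below shows one way to describe it in Verilog; synthesis tools commonly map this kind of description onto a DSP slice, but whether they do depends on the tool, the device, and the operand widths. The module name, port names, and the 8 guard bits are assumptions chosen for illustration.

```verilog
// Illustrative multiply-accumulate (MAC) unit (hypothetical module).
module mac_sketch #(
    parameter WIDTH = 16
) (
    input  wire                      clk,
    input  wire                      rst,  // synchronous reset of the accumulator
    input  wire                      en,   // accumulate only when enabled
    input  wire signed [WIDTH-1:0]   a,
    input  wire signed [WIDTH-1:0]   b,
    output reg  signed [2*WIDTH+7:0] acc   // product width plus 8 guard bits
);
    always @(posedge clk) begin
        if (rst)
            acc <= 0;
        else if (en)
            acc <= acc + a * b;            // signed multiply, then add to the running sum
    end
endmodule
```

The guard bits allow a number of products to be summed before the accumulator can overflow; how many are needed depends on the application.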
Transceivers: Complex FPGAs contain several transceivers that can transmit and receive data at high rates via a Serializer/Deserializer (SERDES), a pair of functional blocks that convert parallel data into a serial stream and back, so that wide data words can be sent rapidly through a single transceiver path. A set of high-speed transceiver blocks can be connected using a GT cable to transmit and receive data at a rate of tens of Gbps. Figure 3 shows a block diagram of a SERDES.
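The serializing half of a SERDES can be pictured as a shift register. The following sketch shows only that principle. Real transceivers are dedicated hard blocks that also handle clocking, encoding, and clock recovery, none of which is modeled here. The module and port names are hypothetical.

```verilog
// Illustrative parallel-to-serial converter (hypothetical module).
module serializer_sketch #(
    parameter WIDTH = 8
) (
    input  wire             clk,      // serial bit clock
    input  wire             load,     // capture a new parallel word
    input  wire [WIDTH-1:0] par_in,
    output wire             ser_out   // serial output, least significant bit first
);
    reg [WIDTH-1:0] shift_reg = {WIDTH{1'b0}};

    always @(posedge clk) begin
        if (load)
            shift_reg <= par_in;                        // load the whole word at once
        else
            shift_reg <= {1'b0, shift_reg[WIDTH-1:1]};  // shift one bit toward the output
    end

    assign ser_out = shift_reg[0];
endmodule
```

After a load, the word leaves on `ser_out` one bit per clock cycle, so a WIDTH-bit word needs WIDTH cycles. A deserializer does the reverse, shifting bits in and presenting the full word once it is complete.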
I/O Blocks: I/O blocks are a vital part of an FPGA, because they are where the FPGA’s data connects to external circuitry. I/O ports are defined in a Xilinx Design Constraints (XDC) file with the extension .xdc. In an XDC file, one needs to provide a pin number along with an I/O standard (such as LVCMOS or LVTTL) based on the external device to which the pin is connected. It’s important to check the I/O bank voltages of the I/O ports to select the correct port.
Consider the Zynq UltraScale+ device as an example. Each I/O bank contains 52 SelectIO interface pins. In some devices, there are high-range (HR) I/O mini-banks containing 26 SelectIO pins, each with its own independent power supply and VREF pin. The SelectIO pins can be configured to various I/O standards, either single-ended or differential. Examples of single-ended I/O standards are LVCMOS, LVTTL, and POD. Examples of differential I/O standards are LVDS, SLVS, and LVPECL.
Certain rules must be obeyed while combining different input, output, and bidirectional standards in the same bank:
- Output standards with the same output VCCO requirement can be combined in the same bank.
- Input standards with the same VCCO and VREF requirements can be combined in the same bank.
- Input standards and output standards with the same VCCO requirement can be combined in the same bank.
Block Random Access Memory: There are various types of memory available which can be interfaced with an ASIC. But in FPGAs with limited die area, memory is more constrained. The dedicated memory on the chip itself is referred to as block RAM (BRAM). There are other types of on-chip RAM, like UltraRAM (URAM) and distributed RAM, which is built from the LUTs of a SLICEM. While each individual BRAM block is a fixed size (36Kb for Xilinx 7 series chips), these blocks can be subdivided or cascaded to make smaller or larger memories as needed. They can also be configured to support special functionality such as error correction. BRAMs are a major component of FPGAs, and a high percentage of utilization can result in congestion and non-routable scenarios. So, BRAM usage must be planned efficiently to prevent this.
A typical FPGA BRAM instance is somewhat different from the hard memories found in an ASIC. When writing code for a BRAM to be inferred, the port mapping and instantiation must be correctly implemented in the HDL code to ensure it’s synthesized properly. Figure 4 shows a typical FPGA-inferred BRAM.
BRAM can be either synchronous or asynchronous. When we say that BRAM is synchronous, we mean that reads and writes are synchronous with the clock. Listing 1 is a small snippet of Verilog code defining synchronous BRAM.
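Listing 1: FPGA BRAM code
module bram_sync #(
    parameter DATA_WIDTH    = 8,
    parameter ADDRESS_WIDTH = 8
) (
    input                      clk, cs, we, oe,  // clock, chip select, write enable, output enable
    input  [DATA_WIDTH-1:0]    din,
    input  [ADDRESS_WIDTH-1:0] address,
    output [DATA_WIDTH-1:0]    dout
);
    // 2**ADDRESS_WIDTH words, each DATA_WIDTH bits wide
    reg [DATA_WIDTH-1:0] memoryelement [0:(1<<ADDRESS_WIDTH)-1];
    reg [DATA_WIDTH-1:0] d_out;                   // registered read data
    always @(posedge clk) begin
        if (cs && we && !oe)
            memoryelement[address] <= din;        // synchronous write
        else if (cs && !we && oe)
            d_out <= memoryelement[address];      // synchronous read
    end
    assign dout = d_out;
endmodule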
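For contrast with the synchronous memory in Listing 1, the sketch below describes a memory whose read is asynchronous: the output follows the address without waiting for a clock edge, while writes remain clocked. On Xilinx devices, a description like this is typically implemented in LUT-based distributed RAM rather than in the dedicated block RAM, although the exact mapping depends on the tool and device. The module and port names are hypothetical.

```verilog
// Illustrative memory with synchronous write and asynchronous read (hypothetical module).
module ram_async_read_sketch #(
    parameter DATA_WIDTH    = 8,
    parameter ADDRESS_WIDTH = 4
) (
    input  wire                     clk,
    input  wire                     we,       // write enable
    input  wire [ADDRESS_WIDTH-1:0] address,
    input  wire [DATA_WIDTH-1:0]    din,
    output wire [DATA_WIDTH-1:0]    dout
);
    // 2**ADDRESS_WIDTH words, each DATA_WIDTH bits wide
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDRESS_WIDTH)-1];

    // Writes are still clocked.
    always @(posedge clk)
        if (we)
            mem[address] <= din;

    // The read is purely combinational: no clock edge is involved.
    assign dout = mem[address];
endmodule
```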
FDCE: FDCEs are the flip-flop blocks present in an FPGA. These are limited in number, and they play a major part in the overall FPGA utilization. FDCE blocks play a critical role in logic design, as well as in timing constraints and placement. Efficient FDCE placement makes it easier for the design to meet timing.
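The behavior of an FDCE, a D-type flip-flop with clock enable and asynchronous clear, can be sketched in a few lines of Verilog. This is a behavioral illustration, not the vendor's simulation model. The port names follow the usual naming of the primitive, while the module name `fdce_sketch` is hypothetical.

```verilog
// Illustrative behavioral model of a D flip-flop with clock enable
// and asynchronous clear (hypothetical module name).
module fdce_sketch (
    input  wire c,     // clock
    input  wire ce,    // clock enable
    input  wire clr,   // asynchronous clear, active high
    input  wire d,     // data input
    output reg  q      // data output
);
    always @(posedge c or posedge clr) begin
        if (clr)
            q <= 1'b0;     // clear takes effect immediately, without a clock edge
        else if (ce)
            q <= d;        // capture data only when the clock is enabled
        // otherwise q holds its previous value
    end
endmodule
```

Writing registers in this style generally lets synthesis use the flip-flop's built-in enable and clear pins instead of spending extra LUTs on that logic.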
FPGA Design Flow
FPGA design flow is typically spread across four major stages: elaboration, synthesis, routing, and device programming. In the elaboration stage, the design is compiled, checked for any syntax errors, and converted into circuitry. During elaboration, behavioral simulations can be performed to assess if the design meets the logic requirements. A test bench can be added, and the design assessed.
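As an illustration of a behavioral simulation, the sketch below pairs a small design with a self-checking test bench. Both modules are hypothetical examples written for this purpose: a 4-bit counter stands in for the design under test, and the test bench releases reset, waits five clock cycles, and checks the count.

```verilog
`timescale 1ns/1ps

// Hypothetical design under test: a 4-bit counter with synchronous reset.
module counter4 (
    input  wire       clk,
    input  wire       rst,
    output reg  [3:0] count
);
    always @(posedge clk) begin
        if (rst) count <= 4'd0;
        else     count <= count + 4'd1;
    end
endmodule

// Self-checking test bench for the counter.
module counter4_tb;
    reg        clk = 1'b0;
    reg        rst = 1'b1;
    wire [3:0] count;

    counter4 dut (.clk(clk), .rst(rst), .count(count));

    always #5 clk = ~clk;               // 10 ns clock period

    initial begin
        repeat (2) @(posedge clk);      // hold reset for two rising edges
        #1 rst = 1'b0;                  // release reset shortly after the edge
        repeat (5) @(posedge clk);      // let the counter run for five cycles
        #1;                             // allow the last update to settle
        if (count !== 4'd5) $display("FAIL: count = %0d", count);
        else                $display("PASS: count = %0d", count);
        $finish;
    end
endmodule
```

A test bench like this is simulation-only code. It is not synthesized, so it can freely use delays and system tasks such as `$display` and `$finish`.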
Once the design meets the logic requirements, it’s parsed in the synthesis stage, where it’s converted into a flattened netlist. This netlist is then translated, mapped, and finally placed inside the FPGA. If the design over-utilizes the FPGA’s resources in the synthesis stage, it remains unplaced and the design tool reports an error.
In this case, one needs to clean up the design and restart all the stages. When the design is placed properly, it passes through the routing stage, in which all of the paths are routed. The impact of timing constraints shows up during the routing stage. There may be issues like super logic region (SLR) crossings, super long line (SLL) issues, or congested nets due to high slack. This can result in heavily negative slack or a design exhibiting level six (or higher) congestion, which will never converge. In this stage, timing must be analyzed carefully, and the necessary actions need to be taken on individual paths. Figure 5 shows the FPGA design flow diagram.
All the necessary timing constraints are written in the XDC file, which is added to the project in Vivado or any other FPGA tool. Note that the FPGA has a Debug Hub feature which allows users to debug signals using an interface between the FPGA’s JTAG Boundary Scan and the Vivado Debug core. The user can probe the signals that were marked for debug during the compile. All those signals will appear in the hardware manager once the bit file and the ltx file are loaded.
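One common way to select signals for probing in Vivado is to tag them in the HDL with the `mark_debug` synthesis attribute. The sketch below shows the idea on a hypothetical counter. The module and signal names are assumptions made for illustration, and the debug core itself still has to be set up in the tool.

```verilog
// Illustrative use of the mark_debug attribute (hypothetical module and signals).
module debug_probe_sketch (
    input  wire       clk,
    input  wire       rst,
    output wire [7:0] count_out
);
    // mark_debug asks Vivado synthesis to preserve this signal so that it can
    // later be connected to a debug core and observed in the hardware manager.
    (* mark_debug = "true" *) reg [7:0] count;

    always @(posedge clk) begin
        if (rst) count <= 8'd0;
        else     count <= count + 8'd1;
    end

    assign count_out = count;
endmodule
```

Without such a tag, synthesis may rename or optimize away an internal signal, which makes it hard to find when setting up the probes.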
So, we’ve discussed various areas of FPGAs, from architecture to design flow. We also covered different issues which arise while creating an FPGA design. Needless to say, this is only the tip of the iceberg when it comes to working with FPGAs. But I hope this article is a satisfactory primer on the topic.
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JANUARY 2023 #390