Designing Accelerators: Hardware vs. Software

Written by Nishant Mittal

Speed and Synthesis

The choice of implementing speed optimizations in hardware versus software has always been a moving target. More recently, acceleration has tipped to the hardware side. In this article, Nishant looks at how accelerators are designed on FPGAs, focusing mostly on Xilinx-specific accelerator design techniques using high-level synthesis tools.

  • How accelerators are designed on FPGAs: software vs. hardware

  • What is the accelerator design process?

  • What are the design approaches to accelerator design?

  • Vivado HLS tool from Xilinx

  • Xilinx’s MicroBlaze FPGA softcore

  • Xilinx Vitis software

Since the early days of Silicon Valley, hardware and software have kept trading roles. Early on, hardware was simply the medium on which software implemented algorithms to make things work. Later, sensors added some intelligence to the hardware and took some responsibility off the software. With the emergence of ASICs and SoCs, software found it increasingly hard to keep up with speed requirements. Many optimization algorithms were introduced to tackle the speed problem, but because the entire burden stayed on software, the speed bottleneck stayed there too.

For quite some time now, with exponential growth in FPGAs and ASICs, plus neural networks adding their own intelligence, things have turned around and hardware has begun to share part of the algorithm processing. Hardware used this way came to be called a hardware accelerator. In this article, we will discuss how accelerators are designed on FPGAs.

SOFTWARE TO HARDWARE EXTRACTION

In this article, I will mostly discuss Xilinx accelerator design techniques using an HLS (high-level synthesis) tool. An HLS tool takes C/C++ input, analyzes it and converts it into HDL (hardware description language) code. Once the tool has extracted the logic into HDL, that design can be loaded onto an FPGA. One advantage of this method is that there is little or no software-to-hardware interaction, because the entire logic set exists in the form of hardware. The other advantage is that the software designer who writes this code does not need to know hardware-related specifics beyond the basics.

Although this approach looks quite impressive on paper, implementing a huge software algorithm in hardware brings its own set of challenges. First, hardware is fixed. Once developed, it cannot be expanded and modified like software. We need to live with it. The same applies to the resources of an FPGA: a given device comes with a fixed amount of BRAM, URAM, LUTs, FIFOs and other elements. That means we need to optimize the software so that it fits into the hardware.
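One practical consequence of fixed resources is that storage is usually sized at compile time, so the tool can map it onto a known amount of on-chip memory. The following is a minimal sketch under that assumption. The kernel `moving_sum` and the constant `WINDOW` are hypothetical names chosen for illustration and do not come from any Xilinx library.

```c
#include <stdint.h>

/* Hypothetical kernel: the buffer size is a compile-time constant, so a
 * synthesis tool can map it onto a fixed amount of on-chip memory.
 * Dynamic allocation (malloc) is avoided for the same reason. */
#define WINDOW 8

int32_t moving_sum(int16_t sample)
{
    static int16_t history[WINDOW]; /* fixed-size storage, zero-initialized */
    static uint8_t head = 0;        /* index of the oldest sample */
    static int32_t sum = 0;         /* running total of the window */

    sum -= history[head];           /* drop the oldest sample */
    history[head] = sample;         /* store the newest sample */
    sum += sample;
    head = (uint8_t)((head + 1) % WINDOW);
    return sum;
}
```

If the design does not fit, shrinking `WINDOW` or narrowing the data types is the kind of software-side change that reduces the memory the hardware needs.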

Latency on hardware cannot be neglected. Let’s say one piece of hardware is placed in one corner of the FPGA and another in the opposite corner. The signals between them then have a long path to travel. This can create timing violations and reduce the achievable clock frequency. To meet timing, the tools add components such as buffers, which affects the performance of the algorithm.

We all know that hardware doesn’t have its own “brain.” It gets all its brain power from the instructions we feed it. The same goes for a hardware accelerator. In other words, whether the hardware can work in parallel depends on how we have written the code.
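To illustrate how code structure affects parallelism, here is a small sketch with two versions of the same summation. The function names and the length `N` are hypothetical. Whether a given tool actually builds parallel adders from the second form depends on the tool and its directives, so treat this as an illustration of the idea rather than a guaranteed result.

```c
#include <stdint.h>

#define N 16  /* hypothetical vector length, chosen as a multiple of 4 */

/* Serial form: every addition depends on the previous one. */
int32_t sum_serial(const int16_t v[N])
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++)
        acc += v[i];
    return acc;
}

/* Restructured form: four partial sums with no dependency between them,
 * which gives a synthesis tool the option of building four adders that
 * work side by side. */
int32_t sum_parallel(const int16_t v[N])
{
    int32_t part[4] = {0, 0, 0, 0};
    for (int i = 0; i < N; i += 4) {
        part[0] += v[i];
        part[1] += v[i + 1];
        part[2] += v[i + 2];
        part[3] += v[i + 3];
    }
    return (part[0] + part[1]) + (part[2] + part[3]);
}
```

Both functions return the same value. The difference is only in how much independent work the code exposes.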

THE PROCESS

Now let’s examine the entire design thought process involved before an SoC turns into an accelerator. Figure 1 shows the accelerator design process. First, the requirements need to be converted to C/C++-based code. The logic has to be optimized so that the software runs as fast as possible. Next, a test bench is designed to test the functionality of the software.

Figure 1 Accelerator design process

Once the software is verified, it is converted to HLS-based code that supplies “pragmas” (directives) to guide the conversion to a hardware description. We will discuss this more in the next section. Once the hardware is generated, a hardware test bench is designed and the hardware is verified. If the hardware is not optimized and doesn’t fit on the FPGA, the software needs another iteration so that it uses less memory. After the accelerator design is verified, it is integrated with a processor (or an FPGA softcore like Xilinx’s MicroBlaze) to put the accelerator to work in an application.

Figure 2 shows the block diagram of an implementation of accelerators using the Vivado HLS tool from Xilinx. Vivado HLS provides a unique environment for software developers to design accelerators of their choice. You just need to write a piece of code and a test bench and click a button. The rest is handled by the Vivado HLS tool inside the Xilinx Vitis software. The only constraint is that the C code needs to be formatted per Xilinx guidelines.
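A C test bench in this kind of flow is ordinary software that calls the function and compares its results with known-good values. The sketch below assumes a hypothetical function under test, `scale_offset`, and a hypothetical helper, `run_testbench`. In a real project the test bench's `main()` would typically return this mismatch count, so that a nonzero value signals failure.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical function under test (stands in for the accelerator code). */
int32_t scale_offset(int16_t x, int16_t gain, int16_t offset)
{
    return (int32_t)x * gain + offset;
}

/* Test bench body: compares results with hand-computed golden values and
 * returns the number of mismatches (0 means every vector passed). */
int run_testbench(void)
{
    static const int16_t in[4]     = {0, 1, -3, 100};
    static const int32_t golden[4] = {5, 7, -1, 205};  /* x * 2 + 5 */
    int errors = 0;

    for (int i = 0; i < 4; i++) {
        int32_t got = scale_offset(in[i], 2, 5);
        if (got != golden[i]) {
            printf("vector %d: got %ld, expected %ld\n",
                   i, (long)got, (long)golden[i]);
            errors++;
        }
    }
    return errors;
}
```

The same vectors can be reused after hardware generation to check that the generated design matches the software behavior.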

Figure 2 HLS design flow (Image source: Xilinx Vivado HLS datasheet)

The difference between Vivado and Vitis may seem confusing. Vivado is intended for a hardware-centric approach to designing hardware. In contrast, Vitis provides a software-centric approach to developing both hardware and software.

DESIGN APPROACHES

Now let’s examine in more detail the various ways in which an accelerator can be designed. We need to understand that the thought process behind making an accelerator for an algorithm should always revolve around optimization and parallelism. Although the developer need not know hardware specs, it’s still important to think from a hardware point of view. To understand this thought process, let’s analyze a simple calculator problem.

Let’s say you need to implement y = m/n + x - p. Written as code, it would look as follows:

/* Computes y = m/n + x - p; n must be nonzero. */
int calci(char m, char n, char x, char p) {
    char y;
    y = m / n + x - p;
    return y;
}

High-level synthesis works in three steps:

  1. Scheduling, which determines which operations happen in each clock cycle
  2. Binding, which assigns a hardware resource to each operation
  3. Control logic extraction, which extracts the control logic to create a finite state machine-based HDL design

If we analyze the calculation code, we see that the division and the addition take one clock cycle, and the final subtraction takes another. That means the entire calculation takes two clock cycles once it is implemented in hardware. Notice that the first cycle performed two operations at once. This was possible because the hardware that was inferred was a DSP (digital signal processor). To make software infer specific hardware, we need to provide directives to the tool in the form of “pragmas.” For example, if you want a particular level of parallelism in your design, you can specify it in a pragma, and the tool will schedule the operations accordingly.
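The two-cycle schedule described above can be pictured as a small finite state machine. The following is a software model only, written to illustrate scheduling and control logic extraction. The type `calci_fsm` and the function `calci_step` are hypothetical, and the schedule a real tool produces depends on the target device and clock constraint.

```c
#include <stdint.h>

/* Software model of a two-cycle schedule for y = m/n + x - p.
 * Each call to calci_step() stands for one clock edge. */
typedef struct {
    int state;   /* 0: first cycle, 1: second cycle */
    int tmp;     /* register holding m / n + x between cycles */
    int result;  /* valid only when done is 1 */
    int done;    /* set to 1 in the cycle that produces the result */
} calci_fsm;

void calci_step(calci_fsm *f, char m, char n, char x, char p)
{
    if (f->state == 0) {
        /* Cycle 1: division and addition (n must be nonzero). */
        f->tmp = m / n + x;
        f->done = 0;
        f->state = 1;
    } else {
        /* Cycle 2: subtraction, then return to the first state. */
        f->result = (char)(f->tmp - p);
        f->done = 1;
        f->state = 0;
    }
}
```

Starting from a zero-initialized `calci_fsm`, two calls with m = 8, n = 2, x = 3 and p = 1 leave `done` set and `result` equal to 6. The `tmp` field plays the role of the register that binding would allocate between the two cycles.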

The HLS tool not only helps developers infer hardware, but also helps infer various bus protocols, such as AXI, as and when necessary. For more details about the kinds of pragmas that tell the Vivado HLS tool to infer various hardware components, check out the Vivado High Level Synthesis guide [1] from Xilinx.
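As a rough sketch of what such directives look like in source code, the example below annotates a hypothetical function, `vec_scale`, with interface and pipelining pragmas. The pragma spellings follow my reading of the guide [1]. The exact options vary between tool versions and should be checked there before use.

```c
#include <stdint.h>

#define LEN 64  /* hypothetical buffer length */

/* Hypothetical top-level function. A standard C compiler ignores the
 * "#pragma HLS" lines, so the same source still runs as plain software. */
void vec_scale(const int32_t in[LEN], int32_t out[LEN], int32_t gain)
{
/* Assumed interface choices: AXI master ports for the arrays and an
 * AXI4-Lite slave for the scalar argument and block-level control. */
#pragma HLS INTERFACE m_axi port=in depth=64
#pragma HLS INTERFACE m_axi port=out depth=64
#pragma HLS INTERFACE s_axilite port=gain
#pragma HLS INTERFACE s_axilite port=return

    for (int i = 0; i < LEN; i++) {
/* Ask the tool to start a new loop iteration every clock cycle. */
#pragma HLS PIPELINE II=1
        out[i] = in[i] * gain;
    }
}
```

Because the pragmas are ignored outside the HLS tool, the function can be exercised by the same C test bench used for the pure software version.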

Now that the design is completed, we can create an IP block out of that accelerator. Figure 3 shows the IP design of a typical HLS accelerator. Vivado HLS has a few control signals to keep the data transactions intact. These are named in the format: ap_XX. In this example, we have ap_start, ap_done, ap_idle and ap_ready. These signals are register controlled as well as algorithm controlled. They are based on pragmas that we provide.
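When the block is exported with an AXI4-Lite control interface, a processor can drive these signals through a control register. The sketch below assumes a bit layout (ap_start in bit 0, ap_done in bit 1, ap_idle in bit 2, ap_ready in bit 3) that should be verified against the driver header the tool generates for your design. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions in the block's AXI4-Lite control register.
 * Check them against the driver header generated for your design. */
#define AP_START (1u << 0)
#define AP_DONE  (1u << 1)
#define AP_IDLE  (1u << 2)
#define AP_READY (1u << 3)

/* Returns nonzero when the accelerator reports that it is idle. */
int accel_is_idle(const volatile uint32_t *ctrl)
{
    return (*ctrl & AP_IDLE) != 0;
}

/* Returns nonzero when the accelerator reports a finished run. */
int accel_is_done(const volatile uint32_t *ctrl)
{
    return (*ctrl & AP_DONE) != 0;
}

/* Requests a run by setting the start bit. */
void accel_start(volatile uint32_t *ctrl)
{
    *ctrl |= AP_START;
}
```

On a real system, `ctrl` would point at the memory-mapped base address of the IP block. A typical sequence is to wait for idle, call `accel_start`, then poll `accel_is_done` before reading results.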

Figure 3 HLS IP block

CONCLUSION

In this article, we discussed how algorithms implemented in hardware can speed up the overall system. We examined how the Vivado HLS tool helps developers implement accelerators without needing much information about the hardware. We also looked at how a C/C++ implementation relates to the hardware that the tool generates from it in a hardware description language.

RESOURCES

Reference:
[1] Vivado HLS User’s Manual
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_3/ug902-vivado-high-level-synthesis.pdf

Xilinx | www.xilinx.com

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • NOVEMBER 2021 #376

Nishant Mittal is a Hardware Systems Engineer in Hyderabad, India.

Copyright © KCK Media Corp.
All Rights Reserved