Modern workloads demand higher computational capabilities at low power consumption and cost. As traditional multi-core machines do not meet the growing computing requirements, architects are exploring alternative approaches. One solution is hardware specialization in the form of application specific integrated circuits (ASICs) to perform tasks at higher performance and lower power than software implementations. The cost of developing custom ASICs, however, remains high. Reconfigurable computing fabrics, such as field-programmable gate arrays (FPGAs), offer a promising alternative to custom ASICs. FPGAs couple the benefits of hardware acceleration with flexibility and lower cost.
FPGA-based reconfigurable computing has recently taken the spotlight in academia and industry as evidenced by Intel’s high-profile acquisition of Altera and Microsoft’s recent announcement to deploy thousands of FPGAs to speed up Bing search. In the coming years, we should expect to see hardware/software co-designed systems supported by reconfigurable computing to become common. Conventional RTL design methodologies, however, cannot productively manage the growing complexity of algorithms we wish to accelerate using FPGAs. Consequently, FPGA programmability is a major challenge that must be addressed both technologically by leveraging high-level software abstractions (e.g., language and compilers), run-time analysis tools, and readily available libraries and benchmarks, as well as scholastically through the education of rising hardware/software engineers.
Recent efforts related to software-programmable FPGAs have focused on designing high-level synthesis (HLS) compilers. Inspired by classical C-to-gates tools, HLS compilers automatically transform programs written in traditional untimed software languages to timed hardware descriptions. State-of-the-art HLS tools include Xilinx’s Vivado HLS (C/C++) and SDAccel (OpenCL) as well as Altera’s OpenCL SDK. Although HLS is effective at translating C/C++ or OpenCL programs to RTL hardware, compilers are only a part of the story in realizing truly software-programmable FPGAs.
Efficient memory management is central to software development. Unfortunately, unlike traditional software programming, current FPGA design flows require application-specific memories to sustain high performance hardware accelerators. Features such as dynamic memory allocation, pointer chasing, complex data structures, and irregular memory access patterns are also ill-supported by FPGAs. In lieu of basic software memory abstractions techniques, experts must design custom hardware memories. Instead, more extensible software memory abstractions would facilitate software-programmability of FPGAs.
In addition to high-level programming and memory abstractions, run-time analysis tools such as debuggers and profilers are essential to software programming. Hardware debuggers and profilers in the form of hardware/co-simulation tools, however, are not ready for tackling exascale systems. In fact, one of the biggest barriers to realizing software-programmable FPGAs are the hours, even days, it takes to generate bitstreams and run hardware/software co-simulators. Lengthy compilation and simulation times cause debugging and profiling to consume the majority of FPGA development cycles and deter agile software development practices. The effect is compounded when FPGAs are integrated into heterogeneous systems with CPUs and GPUs over complex memory hierarchies. New tools, following architectural simulators, may aid in rapidly gathering performance, power, and area utilization statistics for FPGAs in heterogeneous systems. Another solution to long compilation and simulation times is using overlay architectures. Overlay architectures mask the FPGA’s bit-level configurability with a fixed network of simple processing nodes. The fixed hardware in overlay architectures enables faster programmability at the expense of finer grained, bit-level parallelism of FPGAs.
Another key facet of software programming is readily available libraries and benchmarks. Current FPGA development is marred with vendor specific IPs cores that span limited domains. As FPGAs become more software-programmable, we should expect to see more domain experts providing vendor agnostic FPGA-based libraries and benchmarks. Realistic, representative, and reproducible vendor-agnostic libraries and benchmarks will not only make FPGA development more accessible but also serve as reference solutions for developers.
Finally, the future of software-programmable FPGAs lies not only in technological advancements but also in educating the next generation of hardware/software co-designing engineers. Software engineers are rarely concerned with the downstream architecture except when exercising expert optimizations. Higher-level abstractions and run-time analysis tools will improve FPGA programmability but developers will still need a working knowledge of FPGAs to design competitive hardware accelerators. Following reference libraries and benchmarks, software engineers must become fluent with the notion of pipelining, unrolling, partitioning memory into local SRAM blocks and hardened IPs. Terms like throughout, latency, area utilization, power and cycle time will enter software engineering vernacular.
Recent advances in HLS compilers have demonstrated the feasibility of software-programmable FPGAs. Now, a combination of higher-level abstractions, run-time analysis tools, libraries and benchmarks must be pioneered alongside trained hardware/software co-designing engineers to realize a cohesive software engineering infrastructure for FPGAs.
Udit Gupta earned a BS in Electrical and Computer Engineering at Cornell University. He is currently studying toward a PhD in Computer Science at Harvard University. Udit’s past research includes exploring software-programmable FPGAs by leveraging intelligent design automation tools and evaluating high-level synthesis compilers with realistic benchmarks. He is especially interested in vertically integrated systems—exploring the computing stack from applications, tools, languages, and compilers to downstream architectures