First True Hybrid x86 Architecture Offers Quantum Core Count Leap
The 12th generation Intel Core mobile and desktop processors have recently arrived in the embedded computing space, with implementations on the high-performance Computer-on-Module standards COM-HPC and COM Express. Featuring an innovative performance hybrid architecture, the new high-end embedded processors, formerly codenamed Alder Lake, are more than just another generation of Intel Core i9, i7, i5, and i3 processors. So, what makes the difference? And what can engineers of next-generation high-end embedded and edge appliances expect from these new modules?
It is not just the usual performance boost that makes the new modules with 12th generation Intel Core processor technology attractive. Most impressive is the fact that engineers can now leverage up to 14 cores/20 threads on mobile (ball grid array, or BGA, mounted) and up to 16 cores/24 threads on desktop (land grid array, or LGA, mounted) processor variants. This truly offers a quantum leap in multitasking, parallel processing, virtualization, and scalability options when compared to previous COM Express Type 6 and COM-HPC Client size A modules with 11th Gen Intel Core and Xeon processors, which featured only up to eight cores. But a doubled core count does not mean that engineers get twice the performance: the hybrid architecture retains the same number of high-performance cores—the so-called performance cores, or P-cores—as the previous generation. These are supplemented by low-power cores, the so-called efficient cores, or E-cores for short. This hybrid architecture provides many advantages for various embedded application areas.
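To see how the two core types show up in software, the following minimal sketch reads the hybrid CPU topology. It assumes a Linux kernel recent enough to expose the hybrid performance-monitoring units as `/sys/devices/cpu_core` and `/sys/devices/cpu_atom` (the case for Alder Lake on recent kernels); on non-hybrid or non-Linux systems both entries simply come back empty.

```python
from pathlib import Path

def hybrid_cpu_lists():
    """Report which logical CPUs are P-cores vs. E-cores.

    Assumption: the kernel exposes the hybrid PMUs under
    /sys/devices/cpu_core and /sys/devices/cpu_atom; on other
    systems both strings are returned empty.
    """
    def read(dev):
        path = Path(f"/sys/devices/{dev}/cpus")
        return path.read_text().strip() if path.exists() else ""
    return {"P-cores": read("cpu_core"), "E-cores": read("cpu_atom")}

print(hybrid_cpu_lists())
```

On a 14-core mobile part this would typically report six P-cores (with hyper-threaded siblings) and eight E-cores; the exact CPU ID ranges depend on the platform.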
WORKSTATION-CLASS INDUSTRIAL-GRADE EQUIPMENT
Since this new embedded processor generation is, as usual, a derivative of standard business IT processors, workstation-class industrial-grade equipment is the first application area to benefit. Such equipment is required in a variety of different vertical markets, from medical backend systems for image processing in ultrasound devices to professional broadcasting and entertainment equipment for video and sound processing. Similar ruggedness and performance demands come from stationary smart video surveillance systems for public safety in cities, in railway stations, and on highways, or from equipment used in control room systems where workloads are typically high and various heterogeneous tasks must run in parallel. Some of these tasks have to be executed on performance cores, but many others can run on the low-power E-cores, which increases overall system efficiency.
EDGE APPLIANCES AND MOBILE MACHINES
Beyond these industrial, standard IT-like applications, there are many other areas that can benefit from the new performance balancing options offered by Intel’s new hybrid architecture with massive core count. Good examples with even higher hybrid demands are the mega-trending edge computers and IoT gateways used in autonomous logistics vehicles, mobile machinery for agriculture, and construction and commercial machines. Other major markets are smart factory and process automation applications, including AI-based quality inspection, and collaborative robotics incorporating industrial vision and real-time control. All these systems integrate multiple virtual machines for tasks such as AI-based vision, real-time control and secure connectivity that must run in parallel. However, not all these tasks require the highest performance. That is exactly why hardware-based—and thus most efficient—performance balancing of threads for optimum resource allocation is one of the game changers that this next generation of Computer-on-Modules offers.
REAL-TIME APPLICATIONS INCLUDING TSN
The Intel Thread Director natively delivers the thread performance balancing intelligence so that engineers don’t necessarily need to manually allocate threads to certain cores. They can use the embedded logic of this Intel technology, which instructs the operating systems’ schedulers every 30 ms to allocate threads to the most suitable cores for highest efficiency. This results in significant improvements for real-life applications that combine integer, vector and AI tasks plus background operations in parallel. In real-time applications, the allocation of resources is particularly challenging because the OS needs to be able to override this automatic core allocation to enable deterministic calculation and response times. This is where hypervisor technology, such as the RTS Hypervisor from Real-Time Systems, comes into play (Figure 1). Designed to allocate dedicated resources to real-time applications, suitable solutions can quickly and easily be implemented with hypervisor technology—but an in-depth understanding of the behavior of the new processor cores is a must. Therefore, embedded computing platforms that are prequalified for such technologies are the perfect starting point as they simplify the qualification of the new x86 hybrid architecture. With virtualization, real-time machines can run on all 8 E-cores, leaving the P-cores to handle performance-hungry secondary tasks such as AI-driven awareness on the basis of embedded vision technologies. A single processor can then power entire automotive manufacturing cells with various situational awareness robots (aka cobots)—including hard real-time—and with Intel TCC and TSN support, not only for the processor itself but also natively over standard Ethernet wiring.
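At the application level, the manual counterpart to this automatic balancing is CPU affinity. The sketch below pins the current process to a preferred core set using the standard Linux affinity syscalls; the E-core IDs used in the example are hypothetical, since the actual P-/E-core enumeration must be read from the platform, and a hypervisor rather than an affinity mask would be used for hard real-time partitioning.

```python
import os

def pin_to_cores(preferred):
    """Pin the current process to a preferred CPU set.

    Falls back to whatever CPUs are actually available so the sketch
    runs on any Linux machine (os.sched_setaffinity is Linux-only).
    """
    available = os.sched_getaffinity(0)      # CPUs we may legally use
    target = set(preferred) & available or available
    os.sched_setaffinity(0, target)          # apply the affinity mask
    return target

# e.g. reserve this process for (assumed, hypothetical) E-core IDs 6..13:
pinned = pin_to_cores(range(6, 14))
```

This is the coarse, software-side equivalent of what the hypervisor does with hardware-enforced partitioning: the real-time partition gets its cores exclusively, instead of merely preferring them.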
WORKLOAD CONSOLIDATION ON ONE SINGLE PLATFORM
Once you start to think about such a “one for all tasks” concept of processor utilization, the idea of system consolidation for heterogeneous jobs is not far behind. Previously separate manufacturing machine computers for the HMI, different controllers, IoT gateways and Industry 4.0 implementations can all reside on a single platform, too—with all the cost savings and reliability improvements this entails. And this can be done with faster and more energy-efficient thread execution than ever before, all automatically managed by the Intel Thread Director—except for real-time applications.
Besides the hybrid core setup, the new Intel Core processors—available in a soldered BGA (Table 1) and a socketed LGA (Table 2) variant—provide many additional advantages. The mobile BGA processors, for example, come with up to 96 execution units of the integrated Intel Iris Xe GPU and are estimated to deliver extraordinary improvements of up to 129% in graphics performance as compared to the 11th Gen Intel Core processors. This not only enables a truly immersive user experience but also speeds the processing of highly parallelized workloads such as AI algorithms.
FAST AND HIGH THROUGHPUT
Optimized for highest embedded client performance, the graphics of the LGA processor-based modules now deliver up to 94% faster performance. Another stunning fact, which is of particular relevance for vision systems with AI, is that the image classification inference performance has nearly tripled, with up to 181% higher throughput. In addition, the modules offer massive bandwidth to connect discrete GPUs for maximum graphics and GPGPU-based AI performance. Compared to the BGA versions, these and all other peripherals benefit from yet another doubling of lane speed, as they come with ultra-fast PCIe 5.0 interface technology in addition to PCIe 4.0 off the processor. Furthermore, the desktop chipsets provide up to 8x PCIe 3.0 lanes for additional connectivity, while the mobile BGA variants offer up to 16x PCIe 4.0 lanes off the CPU and up to 8x PCIe 3.0 lanes off the chipset. Besides these massive performance increases, there is support for up to 128 GByte system memory so that designers of next-gen edge workstations consolidating many different machines on one single processor won’t ever have to struggle with memory bottlenecks again.
AI, AI, AI
Last but not least, the new Intel Core generation impresses with dedicated AI engines that support Windows ML, Intel Distribution of OpenVINO toolkit and Chrome Cross ML. The different AI workloads can seamlessly be delegated to the P-cores, E-cores and the GPU execution units to process even the most intensive edge AI tasks. The built-in Intel Deep Learning Boost technology leverages different cores via Vector Neural Network Instructions (VNNI), and the integrated graphics support AI-accelerated DP4a GPU instructions that can be scaled to dedicated GPUs. What is more, Intel’s lowest-power built-in AI accelerator, the Intel Gaussian & Neural Accelerator 3.0 (Intel GNA 3.0), enables dynamic noise suppression and speech recognition and can even respond to wake-up voice commands while the processor is in a low-power state. Combining these features with support for Real-Time Systems’ hypervisor technology, as well as OS support for Real-Time Linux and Wind River VxWorks, yields a truly rounded ecosystem package to facilitate and accelerate the development of edge computing applications.
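To illustrate why VNNI matters for inference, the sketch below shows the arithmetic it accelerates: an int8 multiply-accumulate into a wide accumulator, followed by dequantization. This is a pure-Python illustration of the pattern that VNNI fuses into a single instruction, not actual intrinsics, and the scale values are made-up examples.

```python
def quantized_dot(a_q, b_q, scale_a, scale_b):
    """Int8 dot product with a wide accumulator, then dequantization.

    Illustrative sketch of the multiply-accumulate pattern that VNNI
    collapses into one instruction on the CPU cores (not intrinsics).
    """
    acc = sum(x * y for x, y in zip(a_q, b_q))   # int32-style accumulator
    return acc * scale_a * scale_b               # back to real-valued range

# e.g. two int8 vectors with hypothetical per-tensor scales of 0.1:
y = quantized_dot([1, 2, 3], [4, 5, 6], 0.1, 0.1)
```

In a quantized neural network this loop runs billions of times per inference, which is why fusing it in hardware nearly triples image classification throughput on these parts.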
COMPUTER-ON-MODULES IN DIFFERENT FLAVORS
A new generation of Computer-on-Modules from embedded computing vendors such as congatec makes the high bandwidth and performance of the new Intel Core processors available in different flavors (Figure 2). Designers can choose from the new flagship COM-HPC Client module standard, which is designed for highest bandwidth, or the well-established COM Express Type 6 standard. As both standards are hosted by the PICMG, OEMs will find the same reliability and community support for either form factor. So, which one should they choose? For existing applications, it is recommended to use COM Express Type 6. New designs, especially those where it is foreseeable that bandwidth and performance demands will increase even further, are best realized with COM-HPC as it is expected to become the predominant successor of COM Express within the next five to 10 years. However, in this context it is good to know that COM Express designs will be supported much longer; after all, even predecessor ETX is still up and running and purchasable today, 23 years after its introduction at the end of the last century. So, designers don’t need to worry about picking the wrong form factor as the embedded computing industry is as stable as the business of its industrial OEM customers. They invariably need long-term availability, which is in fact an inherent feature of standardized Computer-on-Modules, as the standardization of feature sets and footprints enables engineers to use the next generation of processors on the same carrier board without any NRE for the PCB design. Another benefit is the high scalability for product families—as one look at what the two different LGA and BGA versions of the new Intel Core generation offer proves.
congatec is a rapidly growing technology company focusing on embedded and edge computing products and services. The high-performance computer modules are used in a wide range of applications and devices in industrial automation, medical technology, transportation, telecommunications and many other verticals. Backed by controlling shareholder DBAG Fund VIII, a German midmarket fund focusing on growing industrial businesses, congatec has the financing and M&A experience to take advantage of these expanding market opportunities. congatec is the global market leader in the computer-on-modules segment with an excellent customer base from start-ups to international blue chip companies. Founded in 2004 and headquartered in Deggendorf, Germany, the company reached sales of $127.5 million US in 2020. More information is available on its website at www.congatec.com or via LinkedIn, Twitter, and YouTube.
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • OCTOBER 2022 #387