The Future of Network-on-Chip (NoC) Architectures
Adding multiple processing cores on the same chip has become the de facto design choice as we continue extracting increasing performance per watt from our chips. Chips powering smartphones and laptops comprise four to eight cores. Those powering servers comprise tens of cores. And those in supercomputers have hundreds of cores. As transistor sizes decrease, the number of cores on-chip that can fit in the same area continues to increase, providing more processing capability each generation. But to use this capability, the interconnect fabric connecting the cores is of paramount importance to enable sharing or distributing data. It must provide low latency (for high-quality user experience), high throughput (to maintain a rate of output), and low power (so the chip doesn't overheat).
Ideally, each core should have a dedicated connection to a core with which it's intended to communicate. However, having dedicated point-to-point wires between all cores wouldn't be feasible due to area, power, and wire layout constraints. Instead, for scalability, cores are connected by a shared network-on-chip (NoC). For small core counts (eight to 16), NoCs are simple buses, rings, or crossbars. However, these topologies aren't too scalable: buses require a centralized arbiter and offer limited bandwidth; rings perform distributed arbitration but the maximum latency increases linearly with the number of cores; and crossbars offer tremendous bandwidth but are area and power limited. For large core counts, meshes are the most scalable. A mesh is formed by laying out a grid of wires and adding routers at the intersections which decide which message gets to use each wire segment each cycle, thus transmitting messages hop by hop. Each router has four ports (one in each direction) and one or more ports connecting to a core. Optimized mesh NoCs today take one to two cycles at every hop.
Today's commercial many-core chips are fairly homogeneous, and thus the NoCs within them are also homogen