Problem 1: A certain CPU has three blocks of combinatorial logic that are used in each instruction cycle, and these have delays of 7 ns, 3 ns, and 10 ns, respectively. A register in this technology has a total of 2 ns of setup time and propagation delay. What is the throughput of this CPU if no pipelining is used?
Answer 1: The minimum clock period in a non-pipelined implementation is set by the total propagation delay through all of the logic, plus the overhead (setup time plus propagation delay) of one register. For this example, these times add up to 7 + 3 + 10 + 2 = 22 ns, which corresponds to a clock frequency of 1/(22 ns) ≈ 45.45 MHz. One instruction is completed per clock cycle, so the throughput is 45.45 MIPS.
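The arithmetic in Answer 1 can be checked with a short Python sketch. The names LOGIC_DELAYS_NS, REGISTER_OVERHEAD_NS, and non_pipelined_period_ns are illustrative only; the sketch assumes, as the answer does, that all three blocks sit between a single pair of register edges.

```python
# Non-pipelined timing for the example CPU (delays from Problem 1).
LOGIC_DELAYS_NS = [7, 3, 10]   # the three combinatorial blocks, in order
REGISTER_OVERHEAD_NS = 2       # setup time + propagation delay of one register


def non_pipelined_period_ns(delays, overhead):
    """All logic sits between one pair of register edges, so the delays add."""
    return sum(delays) + overhead


period = non_pipelined_period_ns(LOGIC_DELAYS_NS, REGISTER_OVERHEAD_NS)
freq_mhz = 1000 / period   # 1 / (period in ns) is in GHz; x1000 converts to MHz
mips = freq_mhz            # one instruction completes per clock cycle

print(f"period = {period} ns, f = {freq_mhz:.2f} MHz, throughput = {mips:.2f} MIPS")
```

Running it prints period = 22 ns, f = 45.45 MHz, throughput = 45.45 MIPS, matching the hand calculation.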
Problem 2: Instruction latency is the time required for one instruction to complete. What is the latency for this CPU?
Answer 2: The instruction latency for the non-pipelined CPU is the same as the clock period, 22 ns.
Problem 3: Pipelining is the process of inserting additional registers into a sequence of combinatorial blocks so that different blocks can be processing different instructions at the same time. What is the maximum clock frequency if we introduce pipelining to this CPU, and how many pipeline registers are required?
Answer 3: The longest single block of combinatorial logic requires 10 ns, so no pipeline stage can be shorter than that; this puts a lower limit of 10 ns + 2 ns (register overhead) = 12 ns on the clock period, which is a clock frequency of 83.33 MHz. Since the other two combinatorial blocks also happen to add up to exactly 10 ns (7 ns + 3 ns), only one pipeline register is required: placed between the 3 ns block and the 10 ns block, it splits the logic into two stages of 10 ns each.
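The stage split in Answer 3 can be sketched as a greedy pass that groups consecutive blocks without letting any stage exceed the delay of the longest block. This is a minimal sketch under stated assumptions: the blocks must keep their order, a block cannot itself be subdivided, and partition_stages is a hypothetical helper, not something from the original text.

```python
def partition_stages(delays, max_stage_ns):
    """Greedily group consecutive blocks into stages whose total logic delay
    does not exceed max_stage_ns. Blocks keep their original order."""
    stages, current, current_sum = [], [], 0
    for d in delays:
        if d > max_stage_ns:
            raise ValueError(f"a {d} ns block cannot fit in a {max_stage_ns} ns stage")
        if current and current_sum + d > max_stage_ns:
            stages.append(current)          # close the stage; a register goes here
            current, current_sum = [], 0
        current.append(d)
        current_sum += d
    if current:
        stages.append(current)
    return stages


LOGIC_DELAYS_NS = [7, 3, 10]
REGISTER_OVERHEAD_NS = 2

stage_budget = max(LOGIC_DELAYS_NS)   # the longest block sets the floor
stages = partition_stages(LOGIC_DELAYS_NS, stage_budget)
period = max(sum(s) for s in stages) + REGISTER_OVERHEAD_NS
extra_registers = len(stages) - 1     # one pipeline register between adjacent stages

print(stages, period, f"{1000 / period:.2f} MHz", extra_registers)
```

For this example it prints [[7, 3], [10]] 12 83.33 MHz 1: two 10 ns stages, a 12 ns clock period, and one added pipeline register, as in the answer above.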
Problem 4: What is the throughput and instruction latency for the pipelined CPU?
Answer 4: An instruction is still completed on every clock cycle, so the throughput is now 83.33 MIPS, a speedup of 22/12 ≈ 1.833×. However, each instruction now requires two clock cycles to pass through both stages, so the instruction latency is 2 × 12 ns = 24 ns, about 9.1% longer than the 22 ns of the non-pipelined CPU.
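The throughput, speedup, and latency figures from Answer 4 can be reproduced with a few lines of Python. Like the answer, this sketch assumes a full pipeline with no stalls, so one instruction completes per cycle; the constant names are illustrative.

```python
NON_PIPELINED_PERIOD_NS = 22   # from Answer 1
PIPELINED_PERIOD_NS = 12       # from Answer 3
NUM_STAGES = 2                 # stages after inserting one pipeline register

throughput_mips = 1000 / PIPELINED_PERIOD_NS            # one instruction per cycle
speedup = NON_PIPELINED_PERIOD_NS / PIPELINED_PERIOD_NS
latency_ns = NUM_STAGES * PIPELINED_PERIOD_NS           # each instruction visits every stage
latency_increase = latency_ns / NON_PIPELINED_PERIOD_NS - 1

print(f"{throughput_mips:.2f} MIPS, speedup {speedup:.3f}x, "
      f"latency {latency_ns} ns (+{latency_increase:.1%})")
```

This prints 83.33 MIPS, speedup 1.833x, latency 24 ns (+9.1%), showing the usual pipelining trade: higher throughput in exchange for slightly longer per-instruction latency.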
Contributor: David Tweed