The Microprogram Control Unit
In this two-part series, Wolfgang takes a deep dive into the concepts and design approaches in microprogramming. In Part 1, he looks at the basics of implementing a microprogram control unit on an FPGA and how it contrasts with a purchased MCU or IP core.
It’s always a good idea to have a well-stocked toolbox and box of tricks and not just a single hammer. With that in mind, this article is intended to fill up the toolbox for solving application problems by circuit design and programming.
Our topic may be of practical use if your project demands design work of your own. If a readily available piece of hardware, such as a microcontroller (MCU), fits perfectly into your application environment (as shown in Figure 1a), you will certainly not bother designing your own circuits. If not, you will choose the principal solution according to the requirements of function, speed and cost. Sometimes, it may suffice to complement the MCU with some application-specific circuitry (Figure 1b). More demanding requirements may be fulfilled by FPGAs and hardware-software co-design (Figure 1c).
Here, the idea of microprogramming comes into play. It could be another design alternative worth considering. Microprogramming is a blend of hardware and software, with which you can often combine advantages and mitigate disadvantages. Microprogrammed control units are easier to design than complex application-specific circuits. Microprograms consist of microinstructions that are adapted to the functional units to be controlled.
A microprogrammed control unit cannot compute. It can only energize control signals, sense conditions and branch accordingly. If the application problem needs to be solved by control activities—in other words, by emulating some kind of state machine—the microprogram control unit will clearly be the superior choice over the ubiquitous general-purpose processor (GPP). It requires fewer resources (silicon real estate, logic cells) than an otherwise comparable RISC core of a GPP and reacts faster.
A microprogram control unit thoroughly designed for performance can excite all control signals at once, evaluate all conditions simultaneously and branch to one of multiple targets. Subroutine calls and switching between microprograms (as an alternative to the usual interrupt) can be implemented so that additional machine cycles are not required (in other words, without overhead). Replacing machine programs with microprograms may yield a speedup of around 2:1 up to 40:1, depending on the kind of application problem. Therefore, it should be worthwhile to revive these principles and to implement them with state-of-the-art technologies. It makes sense to begin by designing such control units as soft cores.
In this article we will discuss the rationale of our undertaking and give an introduction to the principles of operation. In Part 2, we will present proposals for small microprogrammed machines as well as an outlook on principles well-suited for more advanced projects. More details may be found in the accompanying material provided on Circuit Cellar’s article materials webpage.
TWO FLAVORS OF PROGRAMMABILITY
It goes without saying that today’s engineers prefer programmable hardware. One key reason is that they can alter and update their solutions again and again without cutting traces, soldering wires or even designing new PCBs. Any MCU may serve as an example. You write a program, compile it and load it into the target machine. If errors appear, you will change the program and try again.
Nowadays, however, hardware design need not be a totally different art, associated with PCBs, components, schematics, timing diagrams and even soldering irons. CPLDs and FPGAs are programmable too. With all that in mind, we can speak of two flavors of programmability. Design changes can be made in the application environment, in other words by the customers. Integrated development environments support circuit design based on a behavioral description, which looks almost like a common program. The languages Verilog and C, VHDL and Ada, and the like are not that far apart. Thus, programmers need not shy away from developing their own hardware. A piece of hardware as a functional unit is nothing more than a subroutine or program module. Synthesizing the circuits is left to the development system. To make this approach viable, however, we need an underlying architecture, a hardware platform. Usually, it is based on industry-standard RISC processors. The alternative to be discussed here is the microprogram control unit.
Basically, programmability is about mastering complexity and keeping cost low. The complexity of the application solution lies in the program, which is only memory content. Most of the design errors are not hardware design flaws but programming bugs. Most changes are not circuit modifications but program patches. Once the hardware is up and running, you may hack and tinker until your solution works reasonably. If everything is soft, you can make changes at the customer’s site as long as the customer accepts that.
Our second flavor, the programmable CPLD or FPGA, however, has a severe drawback. The circuitry on those devices, while designed so easily, is then synthesized to program an integrated circuit (IC). Here, highly complex Boolean algorithms come into play. The synthesis could run in a short amount of time. The runtime and storage requirements could, however, also grow beyond acceptable limits. The behavioral description is easy to write, takes time to synthesize and is difficult to debug. You may alter a FOR statement or a BRANCH instruction in a snap, but it is in no way as easy to insert a NAND gate into an FPGA design.
Experience shows that programming in the spirit of our first flavor can solve very complex problems. It has been experimentally proven that one can make it to the moon and back this way, with far less than 100KB of storage capacity. The memory content of the stored-program machine results from the compilation of source programs created by natural intelligence, experience and cunning. In contrast, the programming data of CPLDs or FPGAs result from Boolean synthesis.
Boolean algorithms have their peculiarities, affecting the runtime of the synthesis, the consumption of resources (cells, connection paths and the like) and the clock frequency at which the circuit can be operated. Small changes could cause the synthesis to take too long, the circuit to miss the intended clock frequency, or the device to draw too much current. It may even happen that your design no longer fits into the selected IC.
With all that in mind, circuit synthesis is a critical part of the development process. So, it makes sense to look for ways to avoid it or limit it to circuits of manageable complexity. This is the rationale behind our reviving well-proven principles of microprogramming. The intention is to bring our first flavor of programmability nearer to the hardware, especially at the register transfer level (RTL).
ORIGINS OF MICROPROGRAMMING
The principle of microprogramming was developed to simplify the design of the control sections of general-purpose processors (Figure 2). In every machine cycle, the control section energizes control signals and senses condition or state signals, thus controlling the steps of instruction execution (Figure 3). A legacy control unit is a kind of state machine, built from gates and flip-flops. To get an idea of how such a control unit works, imagine counters or shift registers stepping through the machine cycles, and combinational circuits energizing control signals depending on the instruction step, the opcode, and the conditions fed back from the units to be controlled.
The decisive idea of microprogramming is to generate the control signals not by flip-flops and gates, but by bit patterns stored in a read-only memory. Additional stored bit patterns control the addressing of this memory by selecting conditions and providing address bits (Figure 4). This way the sequence of control signals stepping the machine through instruction execution becomes programmable. A microprogrammable control unit may be seen as some kind of computer, as a small, simple computer inside of a bigger and more complex one.
The decoder 1 addresses the read-only memory 2, which is made up of two matrices: A and B. Think of diode matrices, for example. Matrix A contains the bit patterns of the control signals and matrix B the subsequent addresses. If an instruction is to be executed, the operation code is first applied to decoder 1. Therefore, the opcode addresses the first microinstruction. When the clock pulse of the machine cycle passes through the decoder 1, it energizes the associated row of the read-only memory 2.
Matrix A thus supplies the control signals. The address of the next microinstruction, which is fed back to decoder 1, comes from matrix B. It is possible to have conditions act upon the address generation. Figure 4 shows an example of this. Here, two rows of matrix B are arranged downstream of a row of matrix A via a selection stage 3. This way, two alternative follow-up addresses are stored. In the example, the next microinstruction address depends on the sign flip-flop of the accumulator. If the sign is positive, the next microinstruction is read from the first of the two following addresses. If the sign is negative, it’s read from the second.
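The scheme of Figure 4 can be imitated in a few lines of software. The following Python sketch is only an illustration under our own assumptions: the addresses, the signal names and the convention that None ends the microprogram are invented, and the sign flip-flop is simply held constant for the whole run.

```python
# Illustrative model of the scheme in Figure 4. The addresses and signal
# names below are invented for this sketch; they are not from the article.

# address: (control signals from matrix A,
#           next address from matrix B if the sign is positive,
#           next address from matrix B if the sign is negative)
CONTROL_STORE = {
    0: (("load_accumulator",), 1, 1),
    1: (("subtract",), 2, 3),            # conditional: two follow-up addresses
    2: (("store_result",), None, None),  # None marks the end (our assumption)
    3: (("complement",), 2, 2),
}


def run(start_address, sign_negative):
    """Return the sequence of control-signal groups issued for one instruction."""
    issued = []
    address = start_address
    while address is not None:
        signals, next_if_positive, next_if_negative = CONTROL_STORE[address]
        issued.append(signals)  # matrix A drives the control lines
        # Selection stage: the sign flip-flop picks one of the two rows of matrix B.
        address = next_if_negative if sign_negative else next_if_positive
    return issued
```

Calling run(0, True) takes the detour through address 3, whereas run(0, False) goes straight from address 1 to address 2.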
The typical microprogram control unit comprises a microprogram memory or control storage, a microinstruction register or Control Storage Data Register (CSDR), a microinstruction address register or Control Storage Address Register (CSAR), and circuits for microinstruction addressing and sequence control. The microinstructions to be executed are loaded out of the control storage into the microinstruction register. This register acts directly upon the circuits to be controlled. Figure 5 shows a principal block diagram.
A note about technical terms: Microprogramming is a venerable principle, invented and implemented decades ago. The details and the terms designating them were mostly confined to the particular manufacturers. I have chosen to adopt mainly the terms coined around IBM’s legendary System /360 (for example, take a look at reference  and references  through ). So, we speak of CSDR and CSAR, of emit-fields, break-ins, staticizers (stats) and so on. In no way, however, is it intended to revive the nitty-gritty details. To the engineering work of the past, we should devote our respect, but we should concentrate on 16nm (or even smaller geometry) FPGAs.
COMPUTER IN THE COMPUTER
The microprogram control unit can be thought of as a simple computer within a more complex computer (Figure 6). It is primarily intended to control the execution of the instructions. Every machine instruction is controlled by a microprogram. There are also microprograms for fetching the next instruction, handling interrupts or machine checks, and so on.
We want to implement the basic idea of the simple computer with contemporary means. Such a computer can be fit in a more complicated machine as a control section. But it also makes sense to use it from the outset as a platform to solve application problems. As an autonomous platform, such a machine is faster and requires fewer resources. As a control section, it offers the advantages that have been known for decades. Above all, it makes the complexity manageable.
Concerning GPPs, microprogramming is no enticing selling point. Aside from hobbyist, educational or research endeavors, surely no one would design such machines as counterparts to the well-known processors. I imagine, however, opportunities would exist in FPGA-based design, especially on both extreme ends of cost and performance.
When the application requires only a somewhat more advanced sequencer controlling application-specific circuitry, then a microprogrammable branch sequencer or algorithmic state machine (ASM) could be a viable alternative to an MCU, an industry-standard soft core or a finite state machine (FSM), the latter being designed by behavioral description. Together with the application-specific circuits, it could fit in a small, low-cost FPGA. In contrast to a synthesized FSM offering the same advantage, it is programmable by natural intelligence instead of by Boolean synthesis, making debugging, changing and adapting to new requirements easier.
At the other end of the spectrum, FPGAs contain numerous complex application-specific functional units. In order to initialize them, to supply them with parameters, to coordinate their operation, to take care of the administrative work and to provide the comfort that one expects nowadays, everything is connected to industry-standard processor cores. This is the well-known state of the art. We, however, could consider complementing or even replacing the ready-made processor core with a high-performance microprogram control unit. What is to be controlled is attached directly, thus it can be operated upon and queried without any particular overhead.
For example, if a branch is to be taken according to a condition occurring in the interior of a particular functional unit, the corresponding signal levels have to be read into the processor core, where they will be evaluated. Then the branch will be taken accordingly. In contrast, a microprogram could sense the condition immediately and branch in the same machine cycle.
The microprogram control unit acts as some kind of conductor of the application-specific functional units (Figure 7). The application circuits are not overly complex, so they can be designed by describing their behavior. Their interaction is organized by the microprogram control unit, which could be, if necessary, implemented as a hard IP core. Our inner computer should be straightforward in its principles of operation, but dimensioned generously. For example, should we need a 27-bit counter, we simply implement one. Today, there is no need to piffle around with each and every bit and flip-flop.
In particular, the machine consists of application-specific functional units. Nevertheless, some functions may be more complicated. Then some of the complexity could be handed over to microprogramming, leaving less complex circuitry to be designed. This could be done easily by describing the desired behavior. It is not hardware design in the proper sense but programming in a high-level language. The functional units are a kind of subroutine or program module cast in silicon. Advanced development systems can synthesize functional units separately and generate netlists to interconnect them on the FPGA. So, our machine will be fitted into the FPGA in a comparatively short time. With functional units properly designed and simulated, most of the ECOs (Engineering Change Orders) should concern the microprogram and the netlists, thereby avoiding the time-consuming Boolean synthesis.
ALL ABOUT MICROINSTRUCTIONS
Microinstruction formats: A microinstruction contains bits and bitfields, energizing control signals and selecting the next microinstruction (Figure 8a). In addition, the microinstruction can contain immediate values that are used as addresses, constants and the like (emit-fields). The basic microinstruction formats are described using the illustrative terms “horizontal” and “vertical” (Figure 8b, Figure 8c).
Horizontal microinstructions: A microinstruction format is called horizontal if all the control activities the machine can carry out in one cycle are embraced in a single microinstruction (Figure 8b). Such formats result when all control bits, bitfields, immediate values, addresses, and so on are strung together. Horizontal microinstructions can be more than one hundred bits long. For examples, refer to references  through .
Vertical microinstructions: The individual microinstruction is shorter (for example, from 12 bits to somewhat more than 32 bits) and can encode only a few control activities (Figure 8c). There are microinstructions for data path control, for arithmetical and logical operations, and for sensing conditions and branching. The extreme implementation is as follows: each microinstruction controls only one activity, like moving data, fetching operands, performing arithmetic operations, storing results or branching conditionally. Such microinstructions are similar to the machine instructions of simple MCUs and RISC processors. Theory has shown that two types of microinstructions are sufficient—operation microinstructions having only one successor and branch microinstructions that select one of two successors (Figure 9). References  through  illustrate a somewhat more advanced vintage machine (the IBM System 360 Model 25) featuring vertical 16-bit microinstructions.
Diagonal microinstructions: This is just a “pun” denoting formats of medium length (for example, between 16 and 48 bits). Basically, they are shortened horizontal microinstructions with control bits, bitfields, addresses, emit-fields and so on. Each format is dedicated to specific functions, like control operations, arithmetical and logical operations or branching (Figure 10).
Horizontal or vertical? The format design should correspond to the functional units to be controlled. A microinstruction format should be as useful as possible—a collection of control activities the hardware can execute in one machine cycle.
The problem has two principal aspects. The first is the capabilities of the circuitry to be controlled. A fully horizontal design is only worthwhile if all the activities encoded in the microinstruction can actually take place in a single machine cycle. The second concerns the compromise between word length and decoding. If the word is longer, then decoding will need less circuitry. Decode time is also shorter. In the most extreme implementation, the microinstruction would contain only independent control bits with no decoding at all. If you choose a shorter word length, the microinstructions are to be encoded more densely. Consequently, decoding will require more circuitry, and decode time will be longer.
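The compromise between word length and decoding can be made tangible with a small Python sketch. It is only an illustration: the eight signal names are invented, and a real format would mix both styles. The horizontal variant spends one bit per control signal and needs no decoder; the vertical variant packs the same eight signals into a 3-bit field, which then requires a 1-of-8 decoder and can activate only one signal per microinstruction.

```python
# Hypothetical control signals of some data path (names invented for this sketch).
SIGNALS = [
    "reg_a_load", "reg_b_load", "alu_add", "alu_sub",
    "mem_read", "mem_write", "bus_enable", "counter_inc",
]

HORIZONTAL_WIDTH = len(SIGNALS)                   # one bit per signal
VERTICAL_WIDTH = (len(SIGNALS) - 1).bit_length()  # densely encoded field


def decode_horizontal(word):
    """No decoding needed: every set bit is a control signal; any combination is allowed."""
    return [name for bit, name in enumerate(SIGNALS) if (word >> bit) & 1]


def decode_vertical(code):
    """1-of-8 decoder: the encoded field selects exactly one control signal."""
    return [SIGNALS[code & (len(SIGNALS) - 1)]]
```

Here the horizontal word 0b00010101 energizes three signals in a single microinstruction, while the vertical format would need three consecutive microinstructions for the same work.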
Basics of microinstruction addressing: When designing a microprogram control unit, the most important decisions are those concerned with microinstruction addressing. Providing microinstruction bits and bitfields for control signals is comparatively easy. The addressing principles and provisions, however, decide how fast the control unit can react to conditions.
The microinstruction is addressed in the control store and loaded into the microinstruction register CSDR. The next microinstruction may be addressed by incrementing the address, by loading an address, or by feeding in condition signals (multiway or functional branching). The address obtained this way is loaded into the microinstruction address register CSAR. Only now can the control store access begin fetching the following microinstruction. The microinstruction cycle described here requires two clocks—one to load the microinstruction into the CSDR and one to load the microinstruction address into the CSAR. Clocking is a somewhat intricate problem. It should already be well thought out before beginning a project.
Addressing the next consecutive microinstruction: When the microinstruction has only one successor, its address could be generated by incrementing (Figure 11) or by loading it as an immediate value (Figure 12). When incrementing, consecutive microinstructions will be read from consecutive addresses. The microinstruction address register (CSAR) acts as an address counter. Thus, the CSAR is similar to the instruction counter (IC) of a run-of-the-mill processor.
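As a rough behavioral sketch (not a hardware description), the two-clock microinstruction cycle and the two ways of addressing a single successor might be modeled in Python as follows. The class, the field names "signals" and "next", and the convention that next = None means incrementing are assumptions made for this illustration only.

```python
class ControlUnit:
    """Two-clock microinstruction cycle with CSAR and CSDR (illustrative only)."""

    def __init__(self, control_store):
        self.control_store = control_store
        self.csar = 0     # microinstruction address register
        self.csdr = None  # microinstruction register

    def clock_csdr(self):
        # First clock: the addressed microinstruction is loaded into the CSDR.
        self.csdr = self.control_store[self.csar]

    def clock_csar(self):
        # Second clock: the successor address is loaded into the CSAR.
        next_address = self.csdr["next"]
        if next_address is None:
            self.csar += 1            # incrementing: CSAR acts as address counter
        else:
            self.csar = next_address  # immediate value taken from the microinstruction

    def cycle(self):
        self.clock_csdr()
        self.clock_csar()
        return self.csdr["signals"]


# A tiny hypothetical control store.
store = [
    {"signals": "A", "next": None},       # 0 -> 1 by incrementing
    {"signals": "B", "next": 3},          # 1 -> 3 by loading an immediate address
    {"signals": "unused", "next": None},
    {"signals": "C", "next": 0},          # 3 -> 0
]
```

Stepping a ControlUnit(store) through four cycles issues A, B, C and A again, with the location at address 2 never being read.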
ALL ABOUT BRANCHING
Conditional branching: The key to the flexibility, and sometimes superiority, of microprogramming is conditional branching. Branching means selecting the successor of the current microinstruction out of two or more alternatives. The most straightforward kind of conditional branching is choosing one of two successors based on a selected condition. Typical general-purpose processors can select one of only a few conditions, like the bits in a flag register or a condition code. A microprogram control unit, however, could choose between all of the conditions occurring in the functional units and in the application environment. In our figures, the selection of conditions has been outlined by multiplexer or data selector symbols. Notice, however, that these illustrations are simplified, and condition signals are to be synchronized before they may affect microinstruction addressing.
Sources of the branch address: The most common solution is a branch address field in the microinstruction. When branching is combined with incrementing the address, only a single address field is required. When the address is always part of the microinstruction, there are two principal solutions to provide alternative addresses: two address fields, or one address field and provisions to feed conditions into particular address bits. Additionally, branch addresses could be register contents (indirect branching). Occasionally, some branch addresses are even hard-wired.
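The most common case, a single branch address field combined with incrementing, might be sketched in Python as shown here. The field names condition_select and branch_address, as well as the condition names, are hypothetical; the dictionary of conditions stands for the already synchronized condition signals at the inputs of the condition multiplexer.

```python
def next_address(csar, microinstruction, conditions):
    """Branch to the address field if the selected condition holds, else CSAR + 1."""
    select = microinstruction["condition_select"]  # multiplexer select field
    if select is not None and conditions[select]:
        return microinstruction["branch_address"]
    return csar + 1


# Hypothetical branch microinstruction and condition signals.
branch = {"condition_select": "fifo_empty", "branch_address": 0x20}
sensed = {"fifo_empty": True, "timer_expired": False}
```

With these values, next_address(5, branch, sensed) yields 0x20; if fifo_empty were False, the control unit would simply continue at address 6.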
Two addresses in the microinstruction require many bits. The basic alternative: The microinstructions, which are to be executed alternatively, are placed next to one another in the control storage. The selection is made by feeding condition signals directly into the CSAR. The most straightforward principle is making the condition signal the least significant address bit (Figure 13). Accordingly, both microinstructions follow one another in the control storage.
Multiway branching: If several condition signals are fed into address bit positions (Figure 14), one of more than two branch targets may be reached. Two or more (generally: n) condition signals included in the address of the next microinstruction yield a multiway branch in 2^n directions, thus replacing several individual branches.
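Feeding condition signals into address bits is easy to express in software terms. The following Python sketch assumes, purely for illustration, that the n condition signals occupy the n least significant address bits; with a single condition it reduces to the two-way branch in which the condition becomes the least significant address bit.

```python
def multiway_address(base_address, conditions):
    """Insert n condition signals into the n least significant address bits."""
    n = len(conditions)
    index = 0
    for bit, condition in enumerate(conditions):
        if condition:
            index |= 1 << bit
    mask = (1 << n) - 1
    # The address field supplies the upper bits; the conditions supply the lower ones.
    return (base_address & ~mask) | index
```

Three condition signals thus select one of 2^3 = 8 microinstructions stored at consecutive addresses, for example 0x40 through 0x47 for a base address of 0x40.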
Functional branching: Signals from the circuits to be controlled are fed into address bit positions. Functional branching can be used to decode bit patterns. For example, the microprogram’s main control loop fetches the instruction to be executed into the instruction register and inserts its operation code into the microinstruction address, thus branching to the microprogram controlling the execution of this instruction.
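The opcode dispatch mentioned above might look as follows in a Python sketch. The layout is an assumption made for this example: a 4-bit operation code is inserted above two low-order address bits, so that each machine instruction gets a slot of four microinstructions, and the table base is aligned so that the inserted bits do not collide with it.

```python
def dispatch_address(table_base, opcode, opcode_bits=4, slot_bits=2):
    """Insert the operation code into the microinstruction address (functional branch)."""
    opcode_mask = (1 << opcode_bits) - 1
    # table_base is assumed to be aligned to 2**(opcode_bits + slot_bits).
    return table_base | ((opcode & opcode_mask) << slot_bits)
```

With a table base of 0x100, the operation code 0x3 leads to the microprogram entry at 0x10C, without any chain of compare-and-branch steps.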
We think particularly of humble projects, for example, of a small FPGA accommodating all the digital circuitry instead of an MCU connected to a CPLD or surrounded by discrete ICs. In MCU projects of this scale (think, for example, of the tiny Microchip PIC MCUs, AVR MCUs and the like), we would often prefer assembler programming, especially if stringent cost and timing requirements must be met. Microprograms can be written in a similar way.
Each halfway sophisticated macro-assembler may be turned into a microprogram assembler, where the microinstructions are entered as macros. Concerning more advanced projects, however, support of high-level language programming becomes indispensable. Here we must leave this task as a challenge to ambitious programmers. It goes without saying that people in search of a demanding topic for an academic thesis are also invited.
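To give an impression of how little is needed, here is a toy microprogram assembler sketched in Python instead of a macro-assembler. Everything in it is hypothetical: the 16-bit format (eight control bits above an 8-bit next-address field), the control-bit names and the statement layout (label, control signals, branch target) are assumptions chosen only to keep the example short.

```python
# Hypothetical control bits of an invented 16-bit microinstruction format.
CONTROL_BITS = {"LOAD_A": 0x01, "LOAD_B": 0x02, "ADD": 0x04, "STORE": 0x08}


def assemble(source):
    """Two-pass assembly: collect the labels, then emit 16-bit microinstructions."""
    labels = {}
    statements = []
    for label, signals, target in source:  # pass 1: assign addresses to labels
        if label:
            labels[label] = len(statements)
        statements.append((signals, target))
    words = []
    for address, (signals, target) in enumerate(statements):  # pass 2: encode
        control = 0
        for name in signals:
            control |= CONTROL_BITS[name]
        # Without an explicit target, the successor is the next consecutive address.
        next_address = labels[target] if target else (address + 1) & 0xFF
        words.append((control << 8) | next_address)
    return words


program = [
    ("start", ["LOAD_A", "LOAD_B"], None),
    (None,    ["ADD"],              None),
    (None,    ["STORE"],            "start"),
]
```

The resulting list of words could be written out as the initialization file of the control storage in the FPGA.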
Let’s summarize what makes a microprogram control unit superior. The principles of operation, and hence the circuitry, are straightforward. Short signal paths allow for short clock cycles. All control signals can be energized at once, all condition signals queried immediately. In contrast to a purchased MCU or IP core, speed is not constrained by architectural limits, like the width of the I/O ports or the need to use I/O instructions to connect to the application environment and the outside world. Immediate values, like constants and addresses, could be stored and delivered as many bits wide as required.
To put it all in a nutshell, microprogramming brings programming in the proper sense (that is, based not on Boolean synthesis, but on natural intelligence and tinkering) closer to the register-transfer level, making debugging, altering and updating much easier. State-of-the-art development systems enable users to implement application-specific machines on FPGAs. To a great extent, the development of such application-specific hardware is similar to programming in a high-level language. Thus, the prerequisites are in place to make good use of this well-proven technique.
References  through 
Our list begins with the pioneering paper of Maurice V. Wilkes. Then we mention two vintage textbooks. Typical of the literature of the past is that it is concerned with details of the technology of the time. College textbooks address the topic merely superficially. The most authoritative and inspiring sources are the handbooks and manuals of machines really built.  to  are a small selection, here restricted to vintage mainframe machines, above all IBM’s S/360. Devouring such sources, you may skip the machine-specific details and concentrate on the principles. In , I have tried to cover the subject comprehensively, of course with state-of-the-art implementations and applications in mind.
 Wilkes, Maurice V.: The Best Way to Design an Automatic Calculating Machine. Report of Manchester University Computer Inaugural Conference, July, 1951, p. 16–18.
 Wilkes, Maurice V.; Stringer, J. B.: Microprogramming and the Design of the Control Circuits in an Electronic Digital Computer. Proceedings Cambridge Philosophical Society, Vol. 49, No. 2, 1953, p. 230–238.
 Wilkes, Maurice V.: The Growth of Interest in Microprogramming: A Literature Survey. Computing Surveys, Vol. 1, No. 3, September 1969, p. 139–145.
 Husson, Samir S.: Microprogramming. Principles and Practices. Prentice-Hall, 1970.
 Agrawala, Ashok; Rauscher, Tomlinson G.: Foundations of Microprogramming: Architecture, Software, and Applications. Academic Press, 1975.
 Matthes, Wolfgang: Mikroprogrammierung. Prinzipien, Architekturen, Maschinen. ISBN 978-3-8325-5234-3. Logos, 2021.
 Matthes, Wolfgang: Resource Algebra and the Future of FPGA Technology. Circuit Cellar, Issue 317, December 2016, p. 18-27.
 PICmicro Mid-Range MCU Family. Microchip Technology Inc., 1997.
 PIC16(L)F1508/9 20-Pin Flash, 8-Bit Microcontrollers with XLP Technology. Microchip Technology Inc., 2011–2014.
 PIC17C7XX High Performance 8-bit CMOS EPROM Microcontrollers with 10-bit A/D. Microchip Technology Inc., 1998–2013.
 PIC18(L)F67K40 64-Pin, Low Power, High Performance Microcontrollers with XLP Technology. Microchip Technology Inc., 2016–2017.
Synchronization and metastability:
 Johnson, Howard; Graham, Martin: High-Speed Digital Design. A Handbook of Black Magic. Prentice-Hall, 1993.
 Dally, William J.; Poulton, John W.: Digital Systems Engineering. Cambridge University Press, 1998.
 Becke, Georg; Haseloff, Eilhard: Das TTL-Kochbuch. Digitaler Schaltungsentwurf in Theorie und Praxis. Texas Instruments, 1996.
 Metastable Response in 5-V Logic Circuits. SDYA006. Texas Instruments, 1997.
 Alfke, Peter; Philofsky, Brian: Metastable Recovery. XAPP094. Xilinx, 1997.
 Metastability in Altera Devices. AN-042-04. Altera, 1999.
 An Introduction to Microprogramming. IBM Corporation, 1971.
 IBM System/360 Model 25 Functional Characteristics. IBM Corporation, 1972.
 IBM System/360 Model 25 Microprogram Listing System/360 Emulator. IBM Field Engineering Education Supplementary Course Material. IBM Corporation, 1970.
 2025 Processing Unit. IBM Field Engineering Education Student Self-Study Course. IBM Corporation, 1969.
 2025 Processing Unit. IBM Field Engineering Theory of Operation. IBM Corporation, 1968.
 2030 Processing Unit IBM Field Engineering Manual of Instruction. IBM Corporation, 1965.
 System /360 Model 30 IBM Field Engineering Handbook. IBM Corporation, n. d.
 System /360 Model 30 2030 Processing Unit. IBM Field Engineering Theory of Operation. IBM Corporation, 1967.
 System /360 Model 40 Functional Units. IBM Field Engineering Manual of Instruction. IBM Corporation, 1970.
 System /360 Model 40 CPU and Channels. IBM Field Engineering Supplementary Course Material. IBM Corporation, 1970.
 System /360 Model 40 2040 Processing Unit. IBM Field Engineering Diagram Manual. IBM Corporation, 1970.
 System /360 Model 40 Comprehensive Introduction. IBM Field Engineering Theory of Operation. IBM Corporation, 1970.
 System /360 Model 40 IBM Field Engineering Handbook. IBM Corporation, n. d.
 System /360 Model 50 Multiplexor Channel Field Engineering Theory of Operation. IBM Corporation, 1966.
 System /360 Model 50 2050 Processing Unit. IBM Field Engineering Diagram Manual. IBM Corporation, 1966.
 Spectra 70 System 7045 Processor EO Flow Charts. RCA Corporation, 1966.
The ultimate archive concerning computer architecture and vintage computers:
– Addendum –
Microprogramming Choices Explained (Part 1): The Microprogram Control Unit, By Wolfgang Matthes
Two flavors of programmability – it’s about how to make changes
Programmability allows for altering and updating again and again, even at the customer’s site. You can achieve this in two ways: by programming microcontrollers or RISC processors, or by programming CPLDs or FPGAs.
The latter is a choice even for people not that accustomed to gates and flip-flops. What we are talking about here is programming by describing the behavior. Hardware description languages are similar to familiar programming languages. Some development environments support even synthesizing circuitry out of source programs written in one of the popular high-level languages.
There is, however, an essential difference. To run a usual program, it must be compiled and loaded into the memory. A circuit, however, has to be synthesized. This process relies on highly complex Boolean algorithms. Therefore, we speak of the two flavors of programmability.
What we want to emphasize here is the problem of making changes.
Figure A1 Two examples of changing minor details. a) software; b) hardware.
Changing a program is straightforward. Enter the statements or instructions, let the machine program be built again, load it and check whether the change is effective. Sometimes, it is even feasible to alter the machine instructions residing in the memory. (There was a reason for the operating panels of the vintage computers. Veterans accustomed to the PDP-8 and the like will wistfully remember …)
However, implementing an ECO (Engineering Change Order) in the hardware is much different.
On a vintage PCB, you had to identify available gates or IC sockets. Then traces had to be cut, wires soldered on, and the like. A small ECO would require, say, half an hour if you were allowed to tinker. If it was required to be approved by your superiors and done in a centralized repair facility, the turnaround time was unpredictable.
Within an FPGA, however, you cannot simply change a gate or flip-flop. Each ECO, even the slightest, requires running the Boolean synthesis once again. When the hardware is implemented with FPGAs, it can be modified over and over. It is tempting to write the solution of the application problem as a program and to leave it to the FPGA development system to synthesize the hardware. Here, however, the intricacies of the circuit synthesis can become noticeable, especially concerning the depth of the combinational circuitry and the associated clock slowdown, not to speak of the turnaround time.
Figure A2 An application problem has been solved by hardware-software co-design. The engineer has written a program; the development system has synthesized a circuit to program an FPGA.
Figure A3 Now, the program has been changed. Therefore, the circuit must be synthesized again. However, this can yield deeper combinational networks. Consequently, the cycle time has to be increased accordingly.
As a remedy, we try to rely less on the Boolean synthesis and to increase the share of usual programming, in other words, solving the problems by natural intelligence and cunning.
We see microprogramming as a fundamental principle to achieve this objective. This way, we may build simple hardware platforms and solve the application tasks mainly by usual programming. (For a principal alternative, I refer to my CC article  and my web page realcomputerarchitecture.com.)
It is comparatively easy to design and debug such circuitry. The complexity of the solution of the application problem is not in the hardware but in a memory content. Microprogramming brings usual programming down to the register transfer level. Thus we may be able to eliminate most design flaws and bring in most updates by programming instead of Boolean synthesis.
Opportunities where microprogramming could step in
We expect opportunities on both ends of the performance and complexity spectrum.
The low end
It is about emulating state machines, even somewhat more complex ones. This is, so to speak, a natural domain of the small microcontrollers. Think, for example, about PICs, AVRs, 8051s, and the like.
Problem-solving may be easy. Simply select a well-suited device and write a program. Sometimes, however, this will not fly.
In bygone times, the microcontroller I/O ports were plain registers and open-drain or tri-state driver stages. Nowadays, microcontroller manufacturers devote a considerable share of silicon real estate to complex programmable peripheral circuitry where each pin has its own register file.
Nevertheless, occasionally there may be no way to fit the microcontroller’s peripherals to the intricacies of the application environment.
Therefore, you will have to choose a considerably more expensive platform (for example, a 32-bit RISC) or develop some application-specific circuitry.
Sometimes, the perfect IC would be a microcontroller core surrounded by arrays of FPGA-like logic cells, allowing you to design the peripherals yourself. If you need a 19-bit counter, then do not piffle around with 16-bit counter/timer units, interrupts, and so on, but don’t hesitate to simply design one.
Figure A4 How an ideal microcontroller could look. A programmable array of logic cells is the most versatile I/O interface. Application-specific I/O devices are synthesized as required.
Making good use of the principles of microprogramming
This approach could lead to programmable peripherals or I/O processors or even replace the industry-standard microcontroller with a programmable core adapted to the requirements of the application.
Imagine a simple microcontroller core, as shown below. It is a single-address Harvard machine centered around an accumulator (or working register, respectively), somewhat similar to a renowned microcontroller family ( to ).
We may, however, confidently state that the principal idea of an accumulator-based single-address machine is free. It goes back to the pioneering work of John von Neumann and others.
The microinstruction is the generously dimensioned single-address instruction enhanced by additional functions. What we want to stress is that we can tailor all details and dimensions to our needs. Our design is, for example, not confined to a word length of 8 bits but may be synthesized for 19 bits if adequate to the application task. Furthermore, the microinstructions could be as long as appropriate, combining, for example, arithmetic functions, I/O accesses, and branching.
Figure A5 A small microprogrammable platform. It is a matter of opinion to call it a microcontroller and to liken it to a well-known architecture. Here, however, we emphasize the freedom to tailor the hardware to our needs. Typical examples are the word length, the format of the microinstructions, and application-specific peripherals.
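The freedom to choose the word length can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name, the mnemonics (LDI, LDA, ADD, STA, JNZ), and the instruction set itself are hypothetical and are not taken from the platform in Figure A5. It merely shows an accumulator-based single-address machine with separate program and data memories whose word length is a parameter.

```python
from dataclasses import dataclass, field


@dataclass
class AccumulatorCore:
    """Toy single-address Harvard machine; mnemonics are hypothetical."""
    width: int = 19                              # word length in bits, freely chosen
    acc: int = 0                                 # accumulator (working register)
    pc: int = 0                                  # index into the program memory
    data: dict = field(default_factory=dict)     # separate data memory

    def run(self, program, max_steps=1000):
        mask = (1 << self.width) - 1             # confines results to the word length
        for _ in range(max_steps):
            if self.pc >= len(program):
                break
            op, operand = program[self.pc]
            self.pc += 1
            if op == "LDI":                      # load immediate value
                self.acc = operand & mask
            elif op == "LDA":                    # load from data memory
                self.acc = self.data.get(operand, 0)
            elif op == "ADD":                    # add a data memory word
                self.acc = (self.acc + self.data.get(operand, 0)) & mask
            elif op == "STA":                    # store the accumulator
                self.data[operand] = self.acc
            elif op == "JNZ":                    # branch if accumulator is not zero
                if self.acc != 0:
                    self.pc = operand
            else:
                raise ValueError(f"unknown mnemonic {op}")
        return self.acc


# Adding 1 to the largest 19-bit value wraps around in a 19-bit machine
# but not in a 20-bit one.
PROGRAM = [("LDI", 0x7FFFF), ("STA", 0), ("LDI", 1), ("ADD", 0)]
result_19 = AccumulatorCore(width=19).run(PROGRAM)
result_20 = AccumulatorCore(width=20).run(PROGRAM)
```

In hardware, the same parameter would simply be a generic of the synthesized design; the model only makes the effect of the chosen width visible.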
The high end
Large FPGAs are populated by complex application-specific functional units, accelerators, and the like. Conventionally, RISC IP cores provide for initialization, parameter passing, communication, diagnostics, and other housekeeping work.
A microprogram control unit, appropriately and generously dimensioned, would show less overhead in dealing with the functional units, shorter latencies, and so on. It could be a companion to the RISC core or even replace it.
Within a functional unit, the complexity is not in the data paths but in the control section. The more straightforward the principles of operation are, the fewer opportunities may occur to commit design errors. To make the control sections of the functional units as straightforward as possible, the more complex functions may be assigned to the microprogram. It has extremely low latencies and can be tailored to particular requirements. So we can expect the decline of performance to be low, often even negligible. This is illustrated here by an example from the past.
Figure A6 A microprogram control unit acting as some kind of conductor.
Figure A7 An example from the past. The I/O channels of the smaller models of IBM’s System/360 and System/370 are not completely independent functional units. Some channel functions have been assigned to the operation section (CPU, ALU) and the microprogram control section (some details may be found in  to ).
Our block diagram shows the basic functional units or sections of such a machine. The architecture provides for autonomous operation of the I/O channels. Once activated, for example, by a Start I/O (SIO) instruction, they transfer data to and from the peripheral devices without programmed intervention. Both read and write data are buffered.
When a read buffer is full or a write buffer is empty, memory accesses are required. The channels control the I/O interface autonomously. Memory accesses, however, are executed via the ALU and are controlled by microprograms.
Emptying or filling a buffer is not that simple. The memory address must be supplied and, after the access, incremented; the bytes transferred must be counted; and so on. All of this is done via the ALU and controlled by microprograms. Thus the channels need no main storage interface, address registers, byte counters, and so on.
To call a supporting microprogram, a particular interrupt mechanism has been implemented. Such microprogram interrupts are called break-ins. They have nothing to do with the interrupts specified in the architecture. In our block diagram, the microprogram control unit has two control storage address registers (CSARs), one for the CPU and one for handling the break-ins (BRK).
The main storage is extended by an auxiliary storage area. There the CPU registers are saved, and the channels’ memory addresses, byte counts, and so on are stored.
If a buffer is to be emptied or filled, the channel issues a break-in request, causing the second CSAR to address the control storage. The break-in microprogram saves the CPU registers, loads the channel address, the byte count, and the channel status, writes or reads the buffer content, does the housekeeping, and swaps the register contents again. Then it resumes CPU operation by switching back to the first CSAR.
The principal idea could be applied to FPGA-based machines, too. The control sections within the functional units are limited to comparatively straightforward sequential control tasks. The more complex control activities are assigned to microprograms called via break-in requests. The break-in mechanism may be a viable design idea to keep functional units from becoming overly complicated. More details will be described in the second article on this subject.
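To make the switching between the two CSARs more concrete, here is a minimal behavioral sketch. It is an assumption-laden toy, not the IBM implementation: the class name, the microinstruction names, and the BRK_RETURN marker are hypothetical, and saving and restoring the CPU registers is reduced to a named step.

```python
class BreakInSequencer:
    """Toy control unit with two control storage address registers (CSARs)."""

    def __init__(self, cpu_start, brk_start):
        self.csar_cpu = cpu_start     # addresses the normal CPU microprogram
        self.csar_brk = brk_start     # addresses the break-in microprogram
        self.brk_start = brk_start
        self.in_break = False         # selects which CSAR addresses control storage
        self.request = False          # break-in request raised by a channel

    def step(self, control_storage):
        # A pending request is honored between two microinstructions.
        if self.request and not self.in_break:
            self.in_break, self.request = True, False
            self.csar_brk = self.brk_start
        if self.in_break:
            addr = self.csar_brk
            self.csar_brk += 1
        else:
            addr = self.csar_cpu
            self.csar_cpu += 1
        name = control_storage[addr]
        if name == "BRK_RETURN":      # hypothetical last break-in microinstruction
            self.in_break = False     # switch back to the CPU's CSAR
        return name


def demo_trace():
    control_storage = {
        0: "CPU0", 1: "CPU1", 2: "CPU2",
        100: "SAVE_REGS", 101: "MOVE_BUFFER", 102: "BRK_RETURN",
    }
    seq = BreakInSequencer(cpu_start=0, brk_start=100)
    trace = [seq.step(control_storage)]
    seq.request = True                # a channel buffer needs service
    trace += [seq.step(control_storage) for _ in range(4)]
    return trace
```

The CPU's CSAR is never overwritten by the break-in, so the interrupted microprogram continues at CPU1 as if nothing had happened.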
The microinstruction delivers only a part of the address of its successors. The remaining bits are contributed by condition signals.
Figure A8 Inserting immediate values 0 and 1 in the microinstruction address. Two alternatives.
Here a single condition signal is inserted into the lowest-order address bit position. These block diagrams supplement Figure 13 in the printed article.
Besides the conditions, one needs the immediate values 0 and 1 to access the next microinstruction unconditionally. These values can be included in the set of selectable conditions (a), or their insertion can be controlled by a particular field in the microinstruction, as shown in (b) or in Figure 13. A separate encoding has the advantage that the COND SEL field may be freely available if the next microinstruction is to be addressed unconditionally.
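The two alternatives can be modeled in a few lines. This is a sketch only; the function names, the layout of the condition list, and the mode names ZERO, ONE, and COND are assumptions made for illustration and do not reproduce the encodings of Figure A8 or Figure 13.

```python
def next_address_a(base, cond_sel, conditions):
    """Alternative (a): constants 0 and 1 are part of the selectable conditions.

    base       -- high-order successor address bits from the microinstruction
    cond_sel   -- index of the condition to insert
    conditions -- list of 0/1 values; by assumption, entry 0 is the constant 0
                  and entry 1 is the constant 1
    """
    bit = conditions[cond_sel] & 1
    return (base << 1) | bit          # condition goes into the lowest-order bit


def next_address_b(base, mode, cond_sel, conditions):
    """Alternative (b): a separate field controls the insertion of 0 or 1."""
    if mode == "ZERO":
        bit = 0
    elif mode == "ONE":
        bit = 1
    elif mode == "COND":
        bit = conditions[cond_sel] & 1
    else:
        raise ValueError(f"unknown mode {mode}")
    return (base << 1) | bit


# Hypothetical condition set: constant 0, constant 1, carry = 1, zero = 0.
CONDITIONS = [0, 1, 1, 0]
taken = next_address_a(0b1011, 2, CONDITIONS)          # branch on carry
unconditional = next_address_b(0b1011, "ZERO", 3, CONDITIONS)
```

In alternative (b), the value of `cond_sel` is ignored for the modes ZERO and ONE, which is exactly why the field is then free for other uses.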
Figure A9 Inserting two condition signals into the microinstruction address allows branching in four directions.
Figure A10 An example from the past. Details, for example, in  and .
This diagram depicts 4-way and functional (multiway) branching as implemented in IBM’s S/360 model 50. Four-way branching works as described above. The higher-order address bits may be immediates out of the microinstruction or taken from registers, staticizers, or other condition signals from various parts of the machine.
Inserting bits into an address means that an entire subspace of the address space may be occupied by microinstructions being potential successors. Hence microinstructions cannot be placed simply one behind the other.
When n bits are inserted into the lowest-order (rightmost) address bit positions, one of 2^n potential successors may be addressed. To avoid squandering address space, addresses are to be generated selectively.
The principal solution is to split up the total address space into segments, partitions, or the like. The address format depends on the number of potential successors (for example, whether the successor is to be selected within a block of 4, 16, or 64 microinstructions or within the whole address space).
Figure A11 How the address space is split up in segments, zones, and microinstructions.
In our example, the microinstruction address space is divided into 64 segments of 16 zones of four microinstructions, allowing for selectively placing microinstruction blocks of different sizes.
The segment, zone, and microinstruction addresses may be immediate values or put together from various signals and register contents.
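With 64 segments, 16 zones per segment, and four microinstructions per zone, the address is 6 + 4 + 2 = 12 bits wide and covers 4,096 microinstructions. A minimal sketch of composing and decomposing such an address follows; the function names are assumptions, and the field order (segment in the highest-order bits) is inferred from the description rather than taken from Figure A11.

```python
SEGMENT_BITS, ZONE_BITS, MI_BITS = 6, 4, 2    # 64 segments, 16 zones, 4 microinstructions


def pack_address(segment, zone, mi):
    """Compose a 12-bit control storage address from its three parts."""
    if not 0 <= segment < (1 << SEGMENT_BITS):
        raise ValueError("segment out of range")
    if not 0 <= zone < (1 << ZONE_BITS):
        raise ValueError("zone out of range")
    if not 0 <= mi < (1 << MI_BITS):
        raise ValueError("microinstruction index out of range")
    return (segment << (ZONE_BITS + MI_BITS)) | (zone << MI_BITS) | mi


def unpack_address(addr):
    """Split a 12-bit address into (segment, zone, microinstruction)."""
    mi = addr & ((1 << MI_BITS) - 1)
    zone = (addr >> MI_BITS) & ((1 << ZONE_BITS) - 1)
    segment = addr >> (ZONE_BITS + MI_BITS)
    return segment, zone, mi


def four_way_branch(segment, zone, cond_hi, cond_lo):
    """Select one of the four microinstructions of a zone by two condition bits."""
    return pack_address(segment, zone, ((cond_hi & 1) << 1) | (cond_lo & 1))
```

A four-way branch then occupies exactly one zone, while a 16-way or 64-way selection would vary the zone bits or the zone and microinstruction bits together.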
Clocking is a fundamental task in circuit design. The microprogram-controlled machines of the past had several clock phases and clock pulses committed to particular functions. They were generated even with monostable multivibrators or delay lines. This cannot be applied to the clock systems of the FPGAs. When logic was built on printed circuit boards, clock pulses could be generated in whatever way was deemed appropriate. You also had the choice between latches and flip-flops. In the FPGA, on the other hand, we have to use what is prefabricated, the logic cells with edge-controlled flip-flops, the clock signal paths, and the clock generation and management. But we can work with extremely high clock frequencies.
Our circuits are operated by clocks running continuously. The registers, counters, and flip-flops are controlled with enable signals (Clock Enable CE, Load Enable LD, and so on). A CLR signal is not an erase pulse but a signal allowing the register to be cleared; an LD signal is not a load pulse but a signal allowing the register to be loaded. What these signals enable or allow, respectively, becomes effective with the next clock edge. Most of our block diagrams do not show the clock signals and the clock inputs of the components.
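The difference between an enable signal and a pulse can be shown with a small behavioral model. This is a sketch under assumptions: the class and method names are invented for illustration, and giving CLR priority over LD is an arbitrary choice, not something the article specifies.

```python
class EnabledRegister:
    """Register driven by a continuously running clock and enable signals."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.q = 0                    # current register content

    def clock_edge(self, d=0, ld=False, clr=False):
        """Model one active clock edge.

        CLR and LD only permit an action; the action takes effect at this edge.
        Assumption: CLR has priority over LD.
        """
        if clr:
            self.q = 0
        elif ld:
            self.q = d & self.mask
        # With neither enable active, the register keeps its content.
        return self.q


reg = EnabledRegister(width=8)
after_load = reg.clock_edge(d=0x1FF, ld=True)    # loaded, truncated to 8 bits
after_idle = reg.clock_edge(d=0x05)              # LD inactive: content is kept
after_clear = reg.clock_edge(clr=True)           # cleared at this edge
```

Every call represents one edge of the same clock; only the enable signals decide what that edge does.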
Figure A12 Single-phase and multi-phase microinstruction cycles.
a) The single-phase microinstruction cycle. A single clock signal is applied to all flip-flops. The clock cycle is the microinstruction cycle. When flip-flops switch with a particular clock edge, all enable signals must be valid before this edge occurs. The successor to the current microinstruction must be selected before loading the microinstruction register CSDR.
b) The multi-phase microinstruction cycle. It makes sense to divide the microinstruction cycle into at least two phases. Then the control signals may be connected to the microinstruction register CSDR. At the beginning of the first phase (P1), the microinstruction register is loaded. Then the microinstruction fields are decoded. The control signals pass through the signal paths and combinational circuits. This way, the address of the next microinstruction has been obtained too. In the second phase (P2), it is loaded into the microinstruction address register CSAR.
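The ordering of events in a two-phase cycle can be sketched as a simple loop. The control storage content, the field names `base` and `cond`, and the single condition called `ready` are hypothetical; the sketch only shows that CSDR is loaded in P1, the successor address is formed while the signals settle, and CSAR is loaded in P2.

```python
# Hypothetical microprogram: wait until "ready" is 1, then stay in DONE.
CONTROL_STORAGE = {
    0: {"name": "INIT", "base": 2, "cond": None},
    2: {"name": "TEST", "base": 4, "cond": "ready"},   # successor is 4 or 5
    4: {"name": "WAIT", "base": 2, "cond": None},
    5: {"name": "DONE", "base": 5, "cond": None},
}


def run_two_phase(control_storage, start, ready_by_cycle):
    """Run one microinstruction cycle per entry of ready_by_cycle."""
    csar = start
    trace = []
    for ready in ready_by_cycle:
        # Phase P1: load the microinstruction register CSDR.
        csdr = control_storage[csar]
        trace.append(csdr["name"])
        # Between P1 and P2: decode the fields; the control signals settle and
        # the successor address is formed (condition inserted as lowest bit).
        bit = (ready & 1) if csdr["cond"] == "ready" else 0
        next_addr = csdr["base"] | bit
        # Phase P2: load the microinstruction address register CSAR.
        csar = next_addr
    return trace


trace = run_two_phase(CONTROL_STORAGE, 0, [0, 0, 0, 1, 1, 1])
```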
Figure A13 The simplest multiphase clock is the two-phase clock, which results when both edges of a single clock signal are used.
It must be possible to branch on conditions. The microprogram control unit is a clock-synchronous state machine. Therefore, condition signals that are not generated internally must be synchronized. The essential question is when the condition signals that decide on the next microinstruction must be valid. There are three basic alternatives.
Figure A14 When must the condition signals be valid? Three basic alternatives.
a) At the beginning of the microinstruction. The microinstruction evaluates conditions that have been selected before. Fetching the next microinstruction can then begin immediately. Many microprogram control units work this way. Processing or I/O microinstructions select the conditions; branch microinstructions decide about the next microinstruction.
b) In the same microinstruction. The fields of the current microinstruction determine how the conditions are obtained. Therefore, the successor’s address will be available only later in the cycle. If the cycle time is given, the control storage must have a correspondingly short access time. Otherwise, the microinstruction cycle must be longer. Since the microinstruction selects or generates the branch conditions itself, we will often get by, however, with a single microinstruction where otherwise we would have needed two or more microinstructions.
c) At the end of the microinstruction cycle (late branching). When the current cycle begins, the microinstruction causes all successors to be read in parallel. In the meantime, the conditions are queried. Both activities take place at the same time. Calculations, comparisons, querying of conditions, and the like overlap the fetching of the next microinstructions. At the end of the microinstruction cycle, the conditions have become valid too. Accordingly, at the beginning of the new microinstruction cycle, the successor is selected from the microinstructions that have been read in advance.
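Late branching (alternative c) can be sketched as follows. The function and parameter names are assumptions; in hardware, the parallel read would be done by a wide control storage or several storage banks feeding a multiplexer, which the list indexing below merely imitates.

```python
def late_branch_cycle(control_storage, block_base, n_bits, evaluate_conditions):
    """Model one cycle with late branching.

    control_storage     -- sequence of microinstructions
    block_base          -- address of the first potential successor
                           (assumed to be aligned to a block of 2**n_bits)
    n_bits              -- number of condition bits deciding the branch
    evaluate_conditions -- callable returning the condition bits as an integer
    """
    # Start of the cycle: all potential successors are read in parallel.
    candidates = [control_storage[block_base + i] for i in range(1 << n_bits)]
    # Meanwhile, the current microinstruction computes and queries conditions.
    selector = evaluate_conditions() & ((1 << n_bits) - 1)
    # End of the cycle: the conditions are valid and select the successor.
    return candidates[selector]


storage = [f"MI{addr}" for addr in range(16)]
successor = late_branch_cycle(storage, 8, 2, lambda: 0b10)
```

The price of the overlap is that 2^n microinstructions must be fetched in every cycle although only one of them is used.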
Synchronization means to sample signals coming from outside, making them fit the timing requirements of a clocked (in other words, synchronous) circuitry ( to ).
Synchronization circuits (synchronizers)
Concerning the I/O ports of the microcontrollers, synchronizers are usually considered self-evident. In many datasheets and manuals, they are not even mentioned and not shown in the block diagrams. When designing circuitry, however, you will have to solve the problem yourself. (Here, you should heed a particular pitfall: I/O ports have built-in synchronizers, whereas the bus systems of the microprocessors typically do not.)
The most straightforward synchronizer is a D-type flip-flop connected to the asynchronous input signal and a clock.
Figure A15 The most straightforward synchronizer.
Clock and Data – the Setup-Hold Interval
There are two data sheet values relating to an interval in which the data must have settled, the setup time and the hold time.
The setup time is the minimum interval in which the signals must be valid and settled before the clock edge triggers the flip-flop.
The hold time is the minimum interval the signal must be kept valid and settled after the clock edge. For many flip-flops, a hold time of zero is specified. Then the input signal may change once the clock edge has passed the threshold voltage.
The synchronization is a sampling process, the signal being sampled, for example, by the low-to-high edge of the synchronization clock. But what happens when the input signal of a flip-flop changes in the setup-hold interval surrounding a clock edge? Sometimes nothing special will happen; the flip-flop will either change its state or keep the previous one. However, there is a critical time interval within the setup-hold interval. Its width depends on the circuit technology and the structure of the flip-flop (we speak of picoseconds here; approx. 1 to 150 ps are typical). If the input signal changes within this interval, the flip-flop may enter an intermediate state, called the metastable state. In such a state, the flip-flop emits output signals that do not correspond to either of the two logic levels.
Figure A16 Output signals of a synchronization flip-flop. a) correct output signal; b), c) output signals typical of metastable states. After a while, the metastable state fades away; the signal then enters one of the two logic levels.
Such metastable states are unavoidable. One can only wait a particular time (settling time), hoping the metastable state has then vanished.
Occasionally, it may happen that it does not vanish. This is deemed a failure. How often will it occur? The relevant characteristic parameter is the mean time between failures (MTBF). If it exceeds the typical lifetime of the hardware considerably or meets the customer’s requirements, then we may be content.
Figure A17 Metastable states will settle if we wait long enough. a) The settling time Δt is obviously too short. b) The settling time is long enough; the downstream flip-flop sees a valid logic level at its input. 1 – invalid, 2 – valid logic levels. The settling time is usually not implemented by a delay line but by a clock cycle.
It is essential to synchronize all signals from outside with separate flip-flops and to provide enough settling time between synchronization and the clocks of the downstream flip-flops. Often, a clock cycle will suffice. FPGA manufacturers quote MTBF values of millions of years for a settling time of 5 ns, for example.
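How strongly the MTBF depends on the settling time can be estimated with a commonly used first-order model, MTBF = exp(t_settle / tau) / (T_w · f_clk · f_data). Here tau (the resolution time constant) and T_w (the width of the critical window) are device-specific constants that must be taken from the manufacturer's data. The sketch below uses placeholder numbers chosen only for illustration; they do not describe any real device, and the function name is an assumption.

```python
import math


def synchronizer_mtbf(t_settle, tau, t_window, f_clk, f_data):
    """Estimate the mean time between synchronization failures in seconds.

    t_settle -- settling time granted to the synchronizer flip-flop (s)
    tau      -- resolution time constant of the flip-flop (s), device-specific
    t_window -- width of the critical (metastability) window (s), device-specific
    f_clk    -- frequency of the synchronization clock (Hz)
    f_data   -- average rate of transitions of the asynchronous signal (Hz)
    """
    return math.exp(t_settle / tau) / (t_window * f_clk * f_data)


# Placeholder values, not taken from any datasheet.
TAU = 100e-12          # 100 ps
T_WINDOW = 100e-12     # 100 ps
F_CLK = 100e6          # 100 MHz
F_DATA = 1e6           # 1 MHz

mtbf_short = synchronizer_mtbf(1e-9, TAU, T_WINDOW, F_CLK, F_DATA)
mtbf_long = synchronizer_mtbf(5e-9, TAU, T_WINDOW, F_CLK, F_DATA)
```

Because the settling time appears in the exponent, every additional tau of waiting multiplies the MTBF by e. This is why a full clock cycle of settling time is usually sufficient.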
Figure A18 A typical synchronization problem in a microprogram control unit.
a) All condition signals are synchronized. Synchronization does not depend on condition selection. SYNC CLK could be the clock pulse at the beginning of the cycle.
b) The selected condition signal is synchronized. Three clock phases are needed to make this circuit work. The first loads the microinstruction into the microinstruction register. The second synchronizes the selected condition. The third causes the microinstruction address to be loaded or incremented (depending on whether the branch is to be taken or not).
Typical design flaws:
Figure A19 Synchronize all the asynchronous signals. Do not connect them unsynchronized to downstream flip-flops.
Figure A20 Synchronize each asynchronous signal with only one flip-flop. Otherwise, SYNC_1 might show a different signal level than SYNC_2. Add driver stages if you need a higher fan-out.
We presuppose that you are familiar with this basic tenet of digital technology and consider it when designing. For the sake of clarity, we have therefore omitted the synchronization circuits in most of the figures.
Pursuing a small project, we cannot expect a fully-fledged compiler. Instead, we must be content with an assembler. In principle, this is a program that converts symbolic names into bit patterns. There are several ways to provide such a development tool: the meta-assembler, the macro-assembler, and the homemade assembler.
Meta-assemblers are designed to generate assemblers for any machine code. You have only to set up the corresponding tables.
Any somewhat advanced macro-assembler can generate any bit pattern from any number of parameters. It is thus possible to define the microinstructions as macros. The parameters of the macros can be numerical values, symbolic addresses (labels), or symbolic identifiers (the latter are to be defined using EQ statements).
A homemade assembler is not that difficult to write. At its core, it’s just a program searching in tables. If the tables can get large, searching should be programmed adequately, for example, by hashing algorithms. However, if the tables are not too large (at most a few thousand entries), they can also be scanned item by item (linear search).
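A minimal sketch of such a table-driven assembler is shown below. Everything specific in it is hypothetical: the mnemonics, the opcode values, and the 16-bit format with a 6-bit opcode and a 10-bit operand are invented for illustration. Python dictionaries are hash tables, so the lookups correspond to the hashing approach mentioned above.

```python
# Hypothetical opcode table: mnemonic -> 6-bit opcode.
OPCODES = {"NOP": 0x00, "LOAD": 0x01, "OUT": 0x02, "JUMP": 0x03}


def assemble(source):
    """Translate source text into a list of 16-bit words (two passes)."""
    labels, statements = {}, []
    # Pass 1: strip comments, record label addresses, collect statements.
    for line in source.splitlines():
        line = line.split(";")[0].strip()        # ';' starts a comment
        if not line:
            continue
        if ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip()] = len(statements)
            line = line.strip()
            if not line:
                continue
        statements.append(line.split())
    # Pass 2: look up mnemonics and operands and build the bit patterns.
    words = []
    for parts in statements:
        mnemonic = parts[0].upper()
        operand = parts[1] if len(parts) > 1 else "0"
        value = labels[operand] if operand in labels else int(operand, 0)
        words.append((OPCODES[mnemonic] << 10) | (value & 0x3FF))
    return words


SOURCE = """
start:  LOAD 5
        OUT 0x10     ; write to a port
        JUMP start
"""
words = assemble(SOURCE)
```

Unknown mnemonics raise a `KeyError` in this sketch; a usable tool would report the offending line instead.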
Figure A21 Declaring a microinstruction as a macro. The 6-bit opcode is 2AH. Being the 6 leftmost bits in a byte, it is to be captured as A8H.
In our example, we use a comparatively basic macro-assembler. It is part of the AVR Studio (Atmel / Microchip). a) shows a fictitious microinstruction format, b) the bit pattern of the call example given under d). The microinstruction is 32 bits long. It is captured as a macro called special (c). The contents of the microinstruction fields are the macro parameters @0 to @5. The 32 bits are split up into two 16-bit words. The bit positions to be inserted are cut out of the transferred parameters (by AND-ing; &), shifted appropriately (<< or >>), and inserted into the respective word (by OR-ing; |). d) shows an example of a call. Instead of the numerical values, symbolic addresses (labels) or identifiers (mnemonics) can be entered too.
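The masking, shifting, and OR-ing performed by such a macro can be mirrored in a short function. Since the format of Figure A21 is not reproduced here, the field widths after the 6-bit opcode are purely hypothetical (4, 6, 4, 4, and 8 bits); only the opcode 2AH in the six leftmost bits is taken from the text.

```python
def special(opcode, f1, f2, f3, f4, f5):
    """Pack six fields into two 16-bit words, as the macro parameters would be.

    Hypothetical layout:
      high word: opcode (6 bits) | f1 (4 bits) | f2 (6 bits)
      low word:  f3 (4 bits)     | f4 (4 bits) | f5 (8 bits)
    """
    # Each parameter is cut to its field width (AND), moved to its position
    # (shift), and merged into the word (OR).
    high = ((opcode & 0x3F) << 10) | ((f1 & 0x0F) << 6) | (f2 & 0x3F)
    low = ((f3 & 0x0F) << 12) | ((f4 & 0x0F) << 8) | (f5 & 0xFF)
    return high, low


high, low = special(0x2A, 0, 0, 0, 0, 0)
top_byte = high >> 8        # the opcode as it appears in the leftmost byte
```

With all other fields zero, the leftmost byte is A8H, which agrees with the remark in the caption of Figure A21 that the opcode 2AH is captured as A8H.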
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JANUARY 2021 #378
Wolfgang Matthes has developed peripheral subsystems for mainframe computers and conducted research related to special-purpose and universal computer architectures for more than 20 years. He has also taught Microcontroller Design, Computer Architecture and Electronics (both digital and analog) at the University of Applied Sciences in Dortmund, Germany, since 1992. Wolfgang’s research interests include advanced computer architecture and embedded systems design. He has filed over 50 patent applications and written seven books. (www.realcomputerprojects.dev and