Retro Design Reloaded
Since he was a kid, Scott had thought about building a COSMAC “ELF” microcomputer himself. Now, armed with powerful, free PCB tools, he set out to build the system using available electronics and applying his 35 years of engineering savvy.
In 1976, Popular Electronics published the instructions to make the COSMAC “ELF” microcomputer (Figure 1). It was based on the RCA CDP1802 microprocessor. As a kid, my dream was to build one to play with. Unlike the Altair or other 8080/8085 devices, the COSMAC was a lower-cost CPU, and the project used a wire wrap board instead of a PCB. At that time, PCB design was out of reach for a young teenager.
I’ve long since lost my copy of the magazine, but through the years, the reference manual has survived (Figure 2). I had attempted to create a wire wrap version, based more on the COSMAC manual than the original Popular Electronics article. I still have a collection of wire wrap boards that I never managed to get working the way I had hoped. When I read Todd Wade’s article, “Retro-Computing Takes to the Cloud” (Circuit Cellar 359, June 2020) [1], it woke those old memories of my early attempt to fabricate this microcomputer.
Now that I’ve grown up, PCBs are no longer out of reach. I already had experience with a few small Eagle PCB projects. However, the larger board I needed to build exceeded the limits of my hobby Eagle license. I also discovered that Eagle had moved to a subscription-based licensing model, which was not something I wanted to sign up for as a hobbyist. I took the opportunity to learn a newer piece of design software: KiCad, a free, open-source schematic and PCB layout package. Instructions on using KiCad could comprise a whole separate article, so I won’t go into any detail here. There are plenty of tutorials on the web.
Although most of the chips in the original article are CMOS 4000 series, and many are no longer available, I was able to select alternate parts from the 7400 series. Most of the components are 74HCxx parts, which fit nicely with the COSMAC device.
WHAT IS THE COSMAC?
The heart of the computer is the RCA CDP1802 COSMAC; “ELF” was the nickname given to the microcomputer in the Popular Electronics article. The COSMAC is an 8-bit data, 16-bit address CPU, which uses a von Neumann bus design. It runs on just 5V, unlike the Intel 8080 series, which required a bipolar supply and 12V. Like the 8080A, the CPU clock can run up to 3MHz, though for my project, I used a simple 2MHz oscillator. Each instruction consists of a fetch cycle and an execution cycle, with each cycle taking 8 clocks. Therefore, at 2MHz the CPU executes a measly 125,000 instructions per second, and slower if branches occur. The COSMAC is no longer manufactured, but units are available from surplus houses and on eBay. Readers could easily adapt the project to any of the CPUs that used similar designs, such as the Intel 8085 or the MOS Technology 6502.
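The instruction-rate arithmetic above can be written out as a small sketch. The function name is hypothetical, and the figure of three machine cycles for a long branch is an assumption taken from the CDP1802 documentation rather than from this article.

```c
#include <assert.h>

/* Clocks per machine cycle on the CDP1802, as stated in the text. */
#define CLOCKS_PER_MACHINE_CYCLE 8u

/* Instructions per second for a given clock frequency.
 * machine_cycles is 2 for an ordinary instruction (fetch + execute);
 * 3 is assumed for the long-branch instructions. */
unsigned long instructions_per_second(unsigned long clock_hz,
                                      unsigned machine_cycles)
{
    assert(machine_cycles > 0);
    return clock_hz / (CLOCKS_PER_MACHINE_CYCLE * machine_cycles);
}
```

With a 2MHz oscillator, `instructions_per_second(2000000UL, 2)` gives the 125,000 figure quoted above.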
My first attempt at the microcomputer was simply to breadboard an entire system, as shown in Figure 3. I did not expect it to be robust, but I wanted to confirm that the CPU still worked after travelling for many decades on a wire-wrap board in various boxes. I was able to build up a basic system and load it with a simple loop that would toggle an output line. I attached a signal generator to the CPU clock line, so I could clock the device as slowly as I wanted and watch the machine state change. This also gave me a chance to put my PicoScope to use, so I could verify my understanding of the CPU.
Next, it was time to bring up the KiCad design tool. KiCad is a robust, open-source design package. The keyboard shortcuts are substantially different from Eagle, which took some getting used to. I treated my first attempt as a version 0.1, so I created a large board and spread out all the components, giving me space to troubleshoot. I also wanted to use through-hole components, so I could socket them and attach test leads from my Pico Technology PicoScope. The board was a basic double-sided board. The main challenge was finding ways to route the power supply traces from one end of the board to the other.
When the board was designed, ordered and stuffed, I found a dozen novice mistakes. These ranged from incorrectly tying to ground the test and blanking inputs for the 74LS47 seven-segment driver when they should have been left floating, to actually having the power rails backward because I designed the footprint with the pins reversed. Fortunately, I did a continuity test on every VCC and ground IC socket pin before installing any devices. It was time to get the X-Acto knife, cut some traces, and add some KYNAR wire to repair the incorrect signals.
The first board is shown in Figure 4. It used eight slide switches to program the memory. You set the instruction, and press a spring-loaded toggle to trigger a DMA IN load. There is a row of LEDs on the left, which display the current memory address as the program is manually being loaded. When loading is complete, you reset the CPU to zero-out the registers, and press Run to begin execution from location 0x0000.
For the next version, I fixed the flaws in the original design, added DMA integration using a PIC 18F4550 microcontroller (MCU) from Microchip Technology, and removed the tedious eight data switches and load switch. About that same time, Robert Lacoste had published Part 1 in a series of articles discussing multilayer PCB design and other PCB design topics (Circuit Cellar 367, February 2021) [2]. The concept of multilayer boards had always intimidated me. However, that article explained it well, and I decided to take my new-found knowledge and put it to the test. The next-generation board would use four layers, simplifying the distribution of power. This time, reusing the original design, I had only a single flaw: one of the NAND gates did not have its VCC and GND pins tied to the supply lines.
THE COSMAC MACHINE
The internal architecture of the CPU is shown in Figure 5, which I scanned in from my decades-old manual. It has 16 scratchpad registers, each 16 bits wide. There are also a few 4-bit registers, which tell the CPU how to use the scratchpad registers. For example, there is no dedicated program counter or stack pointer. Instead, the program counter is designated by a 4-bit register referred to as “P.” Upon reset, P and scratchpad register 0 are cleared to zero, meaning that the P register points to scratchpad register 0, which becomes the program counter. Execution begins at address 0x0000. When an interrupt occurs, the P register is set to 1, so the value in scratchpad register 1 becomes the interrupt vector. Therefore, before any interrupt is allowed to occur, register 1 must be set correctly.
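A minimal C model may make the indirection through P easier to follow. This is only a sketch of the behavior described above: the type and function names are hypothetical, and the interrupt handling is simplified (the real CPU also saves the old selector values, which is omitted here).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t r[16];  /* scratchpad registers R0..RF, 16 bits each */
    uint8_t  p;      /* 4-bit selector: which R acts as the program counter */
} cosmac_regs;

/* Reset leaves P = 0 and R0 = 0, so execution starts at 0x0000. */
void cosmac_reset(cosmac_regs *c)
{
    assert(c != 0);
    c->p = 0;
    c->r[0] = 0;
}

/* The program counter is whichever scratchpad register P selects. */
uint16_t cosmac_pc(const cosmac_regs *c)
{
    return c->r[c->p & 0x0F];
}

/* On an interrupt, P becomes 1, so R1 must already hold the handler address. */
void cosmac_interrupt(cosmac_regs *c)
{
    c->p = 1;
}
```

Loading R1 with a handler address such as 0x0100 before enabling interrupts means `cosmac_pc()` returns 0x0100 immediately after `cosmac_interrupt()`.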
The COSMAC CPU uses multiplexed address lines, an approach similar in spirit to the one used by the Intel 8085. However, with the COSMAC, the upper and lower halves of the address are multiplexed together, rather than multiplexing the lower address byte with the data bus. The upper 8 bits of the 16-bit address are placed on the address bus and latched by timing pulse A (TPA). After that, the lower 8 bits appear, and then the read or write signal is asserted. The interface from the CPU to the memory is shown in Figure 6.
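The latching scheme can be sketched in a few lines of C. The names are hypothetical; the model simply captures the eight address pins on timing pulse A and later combines them with the low byte that appears on the same pins.

```c
#include <assert.h>
#include <stdint.h>

/* Models the external 8-bit latch that holds the upper address byte. */
typedef struct {
    uint8_t high;
} addr_latch;

/* Timing pulse A: the address pins currently carry the upper 8 bits. */
void latch_on_tpa(addr_latch *l, uint8_t addr_pins)
{
    assert(l != 0);
    l->high = addr_pins;
}

/* Later in the cycle the same pins carry the lower 8 bits, and the
 * memory sees the latched upper byte together with the live lower byte. */
uint16_t full_address(const addr_latch *l, uint8_t addr_pins)
{
    return (uint16_t)(((uint16_t)l->high << 8) | addr_pins);
}
```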
The original design from the Popular Electronics article used two 4-bit static RAMs (SRAMs) with 8-bit addressing, giving the system a whopping 256 bytes of memory. I could easily improve on that with a single 32KB SRAM. To allow nonvolatile programming, I also added an Atmel (now Microchip Technology) 8KB EEPROM. I could control the boot process with a single switch that uses the MSB address line as the chip select signal. This bank switches the two memory devices between 0x8000 and 0x0000, as also shown in Figure 6.
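The bank switching amounts to a one-bit decode, which the following sketch illustrates. The switch polarity, the names, and the assumption that the 8KB EEPROM simply ignores address bits above A12 are all illustrative guesses, not details taken from the schematic.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { CHIP_SRAM, CHIP_EEPROM } chip;

typedef struct {
    chip     device;  /* which memory device is selected */
    uint16_t offset;  /* address presented to that device */
} mem_select;

/* boot_from_eeprom models the bank switch (polarity assumed):
 * 0 puts the SRAM at 0x0000 and the EEPROM at 0x8000; 1 swaps them. */
mem_select decode_address(uint16_t addr, int boot_from_eeprom)
{
    int a15 = (addr >> 15) & 1;
    mem_select s;

    assert(boot_from_eeprom == 0 || boot_from_eeprom == 1);
    if (a15 ^ boot_from_eeprom) {
        s.device = CHIP_EEPROM;
        s.offset = addr & 0x1FFF;  /* 8KB part: A0..A12 */
    } else {
        s.device = CHIP_SRAM;
        s.offset = addr & 0x7FFF;  /* 32KB part: A0..A14 */
    }
    return s;
}
```

Because the reset address is 0x0000, the switch position decides whether the CPU starts executing from SRAM or from EEPROM.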
THE STATE OF THE MACHINE
The CPU mode is controlled by two input lines: Wait and Clear. The truth table for these signals is shown in Table 1. Four pushbuttons are used with debounce and latching NAND gates to place the CPU in one of the selected modes (Figure 7). The Wait signal is also switched to the single-step circuit, described later. A single-digit display is located on the board to show the user the current mode of the CPU. A simple power-on reset circuit places the CPU in reset mode. Note that if the bank selection is set to the SRAM, the SRAM contents will be invalid after power-on, so it is necessary to place the CPU into load mode and begin programming it. However, if bank selection is set to the EEPROM, the operator can simply press the Run button to begin operation.
Figure 7: The debounced latches that select the mode of the CPU. The Schmitt trigger has a simple delay to force the CPU to enter Clear on power-up.
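For readers without the truth table at hand, the mode decode can be sketched as follows. The mapping is the one given in the CDP1802 datasheet as best it can be recalled here (both inputs are active low), and the names are hypothetical.

```c
#include <assert.h>

typedef enum { MODE_LOAD, MODE_RESET, MODE_PAUSE, MODE_RUN } cosmac_mode;

/* clear_n and wait_n are the logic levels on the active-low CLEAR and
 * WAIT pins: 0 means asserted, 1 means released. */
cosmac_mode decode_mode(int clear_n, int wait_n)
{
    assert((clear_n == 0 || clear_n == 1) && (wait_n == 0 || wait_n == 1));
    if (!clear_n)
        return wait_n ? MODE_RESET : MODE_LOAD;
    return wait_n ? MODE_RUN : MODE_PAUSE;
}
```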
Using a selector switch located near the mode pushbuttons, the CPU can be placed in a Wait mode as well. This allows the CPU to be single stepped with another spring-loaded toggle switch. With each toggle, a latch is set to place the CPU into a Run state, and a timing pulse then clears it and puts the CPU back into a Wait state. The CPU advances one machine cycle with every toggle.
In addition to the modes, the CPU has four individual states, which are indicated by two state code signals: SC0 and SC1 (Table 2). The most common states are Fetch and Execute, though S2 is used to acknowledge the DMA signal when programming the microcomputer. A single-digit display is present to show the current state of the CPU as it executes. This allows the user to see the state when clocking the CPU with a slower external source, or even manually. When running at full speed, the indicator changes too rapidly to read; at that point, it is really just for entertainment purposes.
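The state codes can be decoded the same way. The sketch below assumes the usual CDP1802 assignment, with SC1 as the high bit, so that the pair [1,0] corresponds to the DMA state S2; the names are hypothetical.

```c
#include <assert.h>

typedef enum {
    STATE_S0_FETCH     = 0,  /* SC1 = 0, SC0 = 0 */
    STATE_S1_EXECUTE   = 1,  /* SC1 = 0, SC0 = 1 */
    STATE_S2_DMA       = 2,  /* SC1 = 1, SC0 = 0 */
    STATE_S3_INTERRUPT = 3   /* SC1 = 1, SC0 = 1 */
} cosmac_state;

/* Combine the two state code lines into a state number 0..3. */
cosmac_state decode_state(int sc1, int sc0)
{
    assert((sc1 == 0 || sc1 == 1) && (sc0 == 0 || sc0 == 1));
    return (cosmac_state)((sc1 << 1) | sc0);
}
```

The resulting value 0 to 3 is also a natural thing to show on a single-digit state display.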
I also included a jumper that allows the user to select between the onboard 2MHz clock and an external clock input. When using the external clock driven from a signal generator, the user can control the clock speed manually, all the way down to stopped. The address lines are also buffered and connected to an array of LEDs, allowing the user to watch the memory addressing count up, or branch to new locations. Like the state display, it’s useful when the CPU is being clocked at a low speed. Otherwise, it’s just entertaining.
IMPLEMENTING I/O
The microcomputer doesn’t have much use if there is no way to provide input or output. The CPU solves this with three output selector lines (N0, N1 and N2) that can be triggered by an instruction. There are also four external flag inputs that can be read by instructions.
I didn’t have any specific input requirements in mind when I designed it, and the possibilities are endless, so I solved that simply by punting. I included a 40-pin connector, which allows a user to connect to the data bus, 15 of the 16 address bits, output and input flags. This enables a user to jumper the signals to a breadboard and add any desired I/O.
When an I/O instruction is executed, it transfers a byte between the data bus and the memory location addressed by a scratchpad register, and places a value of 1 to 7 on the N lines. I used a 3-to-8 decoder to decode the N bits and latch the data bus when N = 1 (b001). These latches are attached to a pair of seven-segment displays. The first test program I wrote was simply to increment a value and output the new value to these displays.
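The decode-and-latch behavior can be modeled roughly as shown here. The sketch assumes the standard CDP1802 encoding, in which the I/O instructions are the 0x6N opcodes and the low three bits drive the N lines; it does not distinguish input from output cycles, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 3-to-8 decoder: returns a one-hot value with bit n set. */
uint8_t decode_3to8(uint8_t n)
{
    return (uint8_t)(1u << (n & 0x07));
}

/* Called once per I/O machine cycle. The display latch captures the
 * data bus only when the decoder output for N = 1 is active. */
void on_io_cycle(uint8_t *display_latch, uint8_t opcode, uint8_t data_bus)
{
    uint8_t n = opcode & 0x07;  /* value driven on N2..N0 */

    assert(display_latch != 0);
    if ((opcode & 0xF0) == 0x60 && (decode_3to8(n) & 0x02))
        *display_latch = data_bus;
}
```

An `out 1` instruction (opcode 0x61) updates the latch, while `out 2` (0x62) leaves it untouched and is free for some other peripheral.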
The CPU has four external flag inputs (EF1, EF2, EF3, EF4), which can be tested in code and used for branching. The instructions allow testing a single flag at a time, so treating these flags as a 4-bit value with 16 possible codes would require some external latching to ensure that all four signals persist while they are read. The 40-pin connector also allows connection to the state codes, the clock input, the three N output lines, the four external flag inputs, and the interrupt input. This provides ample connections to prototype a variety of interfaces.
LOADING A PROGRAM
As described earlier, initially the process of loading a program was extremely tedious. It was necessary to place the CPU into Load mode, manually set the eight input switches to contain the 8-bit instruction, then trigger the DMA IN line with the toggle switch. The timing diagram for the DMA load is shown in Figure 8. The timing sequence is:
- While the CPU is executing or halted, the DMA IN line is asserted.
- When ready, the CPU enters state 2, as determined by the SC lines.
- Timing Pulse A latches the upper 8 bits of the address.
- The memory write line is asserted to store the data in RAM.
- Upon completion of the memory write, the CPU continues operations.
Figure 8: The timing diagram for DMA access. When the State Code lines are [1,0], the CPU is in DMA mode, and the data is written by the memory write (MWR) pulse.
These events cause the CPU to load the value on the data bus into the memory location addressed by the current program counter, then advance the program counter one step. If the DMA line continues to be asserted, the process repeats steps 3 and 4, incrementing the address with each loop.
When state 2 is detected, a tristate buffer places the contents of the input switches on the data bus. Once the state code indicates that the CPU is no longer in DMA mode, the buffer is disabled. The address pointer for DMA in or out is always scratchpad register 0. The address LEDs allow observing the address lines, to ensure the correct address while loading. When loading is complete, it is necessary to press the button and place the CPU into Reset mode 1, which clears P and scratchpad register 0 back to 0. Then Run is pressed to execute.
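From the software point of view, each DMA IN cycle is a store through scratchpad register 0 followed by an increment. The following sketch models only that effect, not the bus timing; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *mem;       /* memory seen by the CPU */
    size_t   mem_size;  /* number of bytes in mem */
    uint16_t r0;        /* scratchpad register 0, the DMA pointer */
} dma_target;

/* One DMA IN cycle: the byte on the data bus is written to M(R0),
 * then R0 advances to the next location. */
void dma_in_cycle(dma_target *t, uint8_t data_bus)
{
    assert(t != 0 && t->mem != 0 && t->mem_size > 0);
    t->mem[t->r0 % t->mem_size] = data_bus;
    t->r0++;
}

/* Holding the request asserted repeats the cycle for each byte. */
void dma_load(dma_target *t, const uint8_t *bytes, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dma_in_cycle(t, bytes[i]);
}
```

Starting with `r0` at 0, as it is after a reset, loading three bytes leaves them at addresses 0 through 2 and `r0` pointing at address 3.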
DMAing FROM A PIC MCU
For the next version of the board, I decided to add something a little more modern. Most people would think of SMD components, but that wasn’t what I had in mind. Turns out, it’s extremely tedious to load the SRAM manually using eight 1-bit switches. And when you make a mistake, it’s even more tedious to start loading the code from address zero again. The next board is shown in Figure 9.
My solution was to integrate a modern PIC 18F4550 MCU that I had lying around. This PIC supports USB, so I could attach it to a PC, and integrate it into the CPU DMA IN loading functionality. Then I could send the program from a PC using the USB connection. The integration of the PIC is shown in Figure 10. The Microchip library supports the USB interface, so the amount of code I had to write for the PIC was minimal. All I needed was a simple program for the PIC to transfer the bytes from its memory to the SRAM or EEPROM on the board. The loop that pushes the bytes to the DMA of the CPU is shown in Listing 1. It places the output value on port B, asserts the DMA signal, and then waits for the CPU to enter the DMA state. Once the DMA state is entered, it releases the DMA signal and waits for the state codes to no longer indicate DMA mode.
Listing 1
Shown here is the loop that pushes the bytes to the DMA of the CPU.
// ToSendDataBuffer is the USB buffer declared elsewhere in the firmware.
void WriteMemory(int len, unsigned char *data) {
    int x;                        // int, so a len above 255 cannot wrap the counter
    unsigned char dma;

    for (x = 0; x < len; x++) {
        PORTD = (unsigned char)x; // Low byte of the index goes out on PORT D
        PORTB = data[x];          // Place the byte on PORT B
        Delay10TCYx(1);           // let it settle...
        LATAbits.LATA3 = 1;       // Assert the DMA request
        do {
            dma = PORTAbits.RA2;  // Wait for the machine state to enter DMA
        } while (dma == 1);
        LATAbits.LATA3 = 0;       // Now, we can release the DMA request
        do {
            dma = PORTAbits.RA2;  // Wait for the machine state to exit DMA
        } while (dma == 0);
    }
    PORTD = 0;
    for (x = 0; x < 64; x++) {    // Now erase the buffer
        ToSendDataBuffer[x] = 0;
    }
}
Next, I wrote a simple program on the PC that would send the machine code HEX file to the PIC. Since the USB Class I was using was a simple HID type, there was no special driver needed to use the interface in Windows. The program simply enumerates the USB devices looking for the VID/PID being used. The program prompts the user for the name of the compiled HEX file to load, and transfers it to the PIC.
Finally, the pièce de résistance was to write an assembler that would read an assembly file and generate the machine code. This was new territory for me. There are 79 mnemonics in the instruction set for the COSMAC. Some are simple 1-byte instructions that operate on the accumulator or a register. In many cases, the 4-bit value in the low nibble specifies the scratchpad register. Other instructions take 1- or 2-byte operands, used as immediate arguments, indexes or branch targets. Also, the assembler needed to track labels and perform a second pass to replace them with their addresses. The entire project is two small C++ source code files. Once the assembler has processed the program, the result is ready to be downloaded to the board, using the USB transfer program I described previously. Obviously, the two programs could be combined in the future.
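The two-pass idea can be illustrated with a heavily reduced sketch. This is not the author's assembler: it is written in C rather than C++, takes pre-split fields instead of parsing text, knows only a handful of mnemonics, and every name in it is hypothetical. The opcode values are the standard CDP1802 encodings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINES 64

/* One source line, already split into fields. */
typedef struct {
    const char *label;     /* label defined on this line, or NULL */
    const char *mnemonic;
    const char *target;    /* label operand for branches, or NULL */
    uint8_t     value;     /* register number or immediate byte */
} src_line;

enum { K_PLAIN, K_REG, K_IMM, K_BRANCH };
typedef struct { const char *name; uint8_t opcode; int kind; } op_info;

static const op_info OPS[] = {
    {"idl", 0x00, K_PLAIN}, {"ldn", 0x00, K_REG},    {"dec", 0x20, K_REG},
    {"str", 0x50, K_REG},   {"out", 0x60, K_REG},    {"plo", 0xA0, K_REG},
    {"phi", 0xB0, K_REG},   {"sex", 0xE0, K_REG},    {"ldi", 0xF8, K_IMM},
    {"adi", 0xFC, K_IMM},   {"smi", 0xFF, K_IMM},    {"br",  0x30, K_BRANCH},
    {"bnz", 0x3A, K_BRANCH},
};

static const op_info *find_op(const char *mnemonic)
{
    size_t i;
    for (i = 0; i < sizeof OPS / sizeof OPS[0]; i++)
        if (strcmp(OPS[i].name, mnemonic) == 0)
            return &OPS[i];
    return NULL;
}

static int size_of(const op_info *op)
{
    return (op->kind == K_IMM || op->kind == K_BRANCH) ? 2 : 1;
}

/* Returns the number of bytes written to out, or -1 on any error. */
int assemble(const src_line *src, int n, uint8_t *out, int out_max)
{
    uint16_t addr[MAX_LINES];
    int i, j, pc = 0;

    assert(src != NULL && out != NULL);
    if (n > MAX_LINES) return -1;

    /* Pass 1: walk the program to learn the address of every line. */
    for (i = 0; i < n; i++) {
        const op_info *op = find_op(src[i].mnemonic);
        if (!op) return -1;
        addr[i] = (uint16_t)pc;
        pc += size_of(op);
    }
    if (pc > out_max) return -1;

    /* Pass 2: emit the bytes, replacing labels with addresses. */
    pc = 0;
    for (i = 0; i < n; i++) {
        const op_info *op = find_op(src[i].mnemonic);
        out[pc++] = (op->kind == K_REG)
                  ? (uint8_t)(op->opcode | (src[i].value & 0x0F))
                  : op->opcode;
        if (op->kind == K_IMM) {
            out[pc++] = src[i].value;
        } else if (op->kind == K_BRANCH) {
            if (!src[i].target) return -1;
            for (j = 0; j < n; j++)
                if (src[j].label && strcmp(src[j].label, src[i].target) == 0)
                    break;
            if (j == n) return -1;                   /* undefined label */
            out[pc++] = (uint8_t)(addr[j] & 0xFF);   /* short branch: low byte */
        }
    }
    return pc;
}
```

The first pass is what allows a forward reference, a branch to a label defined later in the file, to resolve. The sketch does not check that a short branch stays within its 256-byte page.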
THE FIRST PROGRAM
The first program I put together was a simple loop to increment a value stored in a memory location, and output the value to the display. The assembly code is shown in Listing 2. The first four instructions set the value of scratchpad register 3 to 0x0080, which is the location of the value to increment and display. The next two store a value of 1 in that location. Memory location 0x0080 was chosen because I knew this small program would not reach over 100 bytes of memory, so it was safe from any collision. Next, it loads a value of 0 into the D register. Now, the program enters a loop where it increments the D register until it wraps around to zero. This loop is just a simple delay.
Listing 2
The first program, shown here in assembly code, is a simple loop to increment a value stored in a memory location, and output the value to the display.
            ldi 0x80        ; load 0x80
            plo 3           ; put in R3 low
            ldi 0x00        ; load 0x00
            phi 3           ; put in R3 high; R3 is now 0x0080
            ldi 01          ; load 0x01
            str 3           ; store it at M(R3)
RESTART:    ldi 0           ; load 0 into D
LOOP:       adi 01          ; increment it
            bnz LOOP        ; keep looping until D wraps around to 0
            sex 3           ; set X to 3 (for register 3)
            out 1           ; output M(R(X)) with N=1
            dec 3           ; decrement R3 (because out increments it)
            ldn 3           ; load M(R3) -> D
            smi 0x09        ; subtract 9 from D -> D
            bnz INCREMENT   ; not 9 yet? skip the wrap-around
            ldi 0xff        ; value was 9: set it to the end
            str 3           ; store it back to M(R3)
INCREMENT:
            ldn 3           ; load M(R3) -> D
            adi 01          ; increment it
            str 3           ; store it back to M(R3)
            br RESTART      ; unconditional branch to the outer loop
            idl             ; if we got here... stop
After the delay expires, the X register is set to 3. Then the program executes an OUT. The CPU does not support outputting data directly from a scratchpad register, so the register must hold the address of the memory byte to output. The X register selects which scratchpad register to use. The OUT places that byte on the bus, while setting the N output bits to 1. This causes the data bus to be latched and passed to the BCD decoders to appear on the seven-segment displays. The OUT also increments the scratchpad register; therefore, after outputting the value, scratchpad register 3 is decremented, so it continues to refer to the same memory location. Following that, the value is loaded into the D register and compared to 9. If it equals 9, the program sets the memory location to 0xff, so the next increment wraps around to 0. Otherwise, it jumps straight to INCREMENT. There, the memory location specified by scratchpad register 3 is loaded, incremented and stored back. Finally, the program branches unconditionally back to RESTART to start the process over.
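Stripped of the delay and the register juggling, one pass through the outer loop of Listing 2 behaves like the C sketch here. The function name is hypothetical, and `cell` stands in for the byte at memory location 0x0080.

```c
#include <assert.h>
#include <stdint.h>

/* One pass through the outer loop. Returns the value sent to the
 * display and updates the memory cell for the next pass. */
uint8_t display_step(uint8_t *cell)
{
    uint8_t shown;

    assert(cell != 0);
    shown = *cell;                 /* out 1: M(R3) goes to the display */
    if (*cell == 0x09)             /* smi 0x09 / bnz INCREMENT */
        *cell = 0xFF;              /* at 9, preload 0xff ... */
    *cell = (uint8_t)(*cell + 1);  /* ... so the increment wraps to 0 */
    return shown;
}
```

Starting from a cell value of 1, successive calls return 1 through 9, then 0, then 1 again.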
There is plenty of room for learning and improvement in the assembler. This includes supporting long branches beyond the current 256-byte page, defining memory segments to ensure there is no overlap between text and data, and declaring variables to make symbol references much easier to monitor.
SINGLE STEPPING
As mentioned earlier, it is possible to single step the CPU by alternately placing it in Wait and Run modes. The circuit in Figure 7, which shows the mode select logic, also allows the user to switch the ~WAIT line between the 4 input NAND gates, or the single step circuit shown in Figure 11.
The single step process is accomplished with a spring-loaded switch that is debounced through the NAND gates. The action clocks a “1” into flip-flop IC15A, which is used to issue a preset to flip-flop IC21A. The inverse output is used, because the preset is active low. When IC21A becomes preset, the “StepWait” line becomes “1,” releasing the CPU to run. When the CPU issues a TPA pulse, flip-flop IC15A is cleared. The CPU then issues a TPB pulse, which clocks a “0” into flip-flop IC21A, placing the CPU back into Wait mode and concluding the execution of exactly one cycle. The timing diagram of a single step event is shown in Figure 12, captured from a PicoScope trace.
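The sequence of events can be traced with a small behavioral model. It captures only the logic described above, not the real gate timing, and the structure and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int ic15a_q;    /* set by the debounced toggle, cleared by TPA */
    int step_wait;  /* IC21A output: 1 lets the CPU run, 0 holds it in Wait */
} step_logic;

/* The toggle clocks a 1 into IC15A, which presets IC21A. */
void step_on_toggle(step_logic *s)
{
    assert(s != 0);
    s->ic15a_q = 1;
    s->step_wait = 1;
}

/* TPA clears IC15A, releasing the preset on IC21A. */
void step_on_tpa(step_logic *s)
{
    s->ic15a_q = 0;
}

/* TPB clocks a 0 into IC21A, unless the preset is still being held. */
void step_on_tpb(step_logic *s)
{
    if (!s->ic15a_q)
        s->step_wait = 0;
}
```

Calling the three functions in order (toggle, TPA, TPB) leaves `step_wait` at 0 again, which corresponds to the CPU returning to Wait after one cycle.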
CONCLUSION
Bringing back old memories of the COSMAC CPU was simply a fun project. This was just one project that I never had the time or resources to complete until now. Having gone this far, I can see plenty of things that can be improved. For example, the address display lines are only buffered, so when the CPU is single stepped, the lines do not hold their value. Latching the values would have allowed the user to see each address being executed. Also, the DMA access could be streamed to operate faster, and it was not tested for larger block transfers, so there are likely bugs to be found. I found the need for many test points while I was troubleshooting. But hindsight is always 20/20, and since I completed the troubleshooting, I don’t see a need for them in the future. And as mentioned earlier, the assembler is a first attempt at something new to me, and lacks many things.
Looking at my pile of old wire wrap projects, there are many more I hope to accomplish. Besides letting me complete a childhood project that I had never managed to get working, this one gave me valuable experience in multilayer boards, in integrating an MCU with another CPU, and in writing machine code. And so far, it was the most fun.
RESOURCES
References:
[1] Todd Wade, “Retro-Computing Takes to the Cloud,” Circuit Cellar 359, June 2020
[2] Robert Lacoste, “Understanding Proper PCB Design (Part 1): 4-Layer Board Design,” Circuit Cellar 367, February 2021
Be sure to check out the code for this article on Circuit Cellar’s article code & files webpage.
KiCad | www.kicad.org
Microchip Technology | www.microchip.com
Pico Technology | www.picotech.com
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • OCTOBER 2021 #375
Scott Weber is a software engineer for a scanner manufacturer in the US. He earned his BSEE late in life from the University of Texas in Arlington. Scott has been working with embedded controllers for both work and fun during the past 20 years, and has been developing PC software for more than 35 years. He lives in Texas with his beautiful wife and her equally beautiful garden.