Just recently I had the opportunity to use one of the new RP2040 Microcontrollers from the Raspberry Pi Foundation. I’ll be honest in saying I initially chose this part mostly based on its availability. It is a low-cost dual-core ARM Cortex M0+ part with a basic set of peripherals including DMA, timers, SPI and I2C interfaces. This may seem a bit limiting compared to some of the peripheral-rich M0+ available (or should I say unavailable?) from more established vendors. However, these chips have an ace up their sleeve in the form of “Programmable IO” or PIO.
My application required the MCU to read data from an SD card. I could have accomplished this by putting the SD card into SPI mode and using an SPI interface, but I wanted to use native SD mode and read data four bits at a time if at all possible. Could I configure the PIO to do this for me?
The RP2040 PIO is effectively a set of separate processors specifically designed to manipulate IO and transfer data. The PIO instructions are highly deterministic so it’s easy to design synchronous interfaces with precise and repeatable timing. There are two PIO modules in the device, each consisting of four state machines with dedicated RX and TX FIFOs. The four state machines share a 32-word program memory. This might not sound like much, but you can get a lot done in just a few instructions as we will see. The state machines also share a set of interrupt flags which can be used for inter-task synchronisation and/or to interrupt the MCU.
Figure 1 shows the block diagram for one of the PIO state machines, extracted from the RP2040 Data Sheet. Each state machine consists of an input and output shift register (called the ISR and OSR respectively), and two scratch registers (X and Y) which can be used as loop counters among other things. There is a program counter to keep track of the current program step and a clock divider and control logic. The shift registers can be programmed to shift in either direction and to optionally automatically push or pull data to or from their respective FIFOs once a specified number of bits is clocked in or out.

PIO programs are written in assembly language and assembled into a C header file by the assembler (pioasm) included in the SDK provided by the vendor. There are only nine instructions, shown in the table below, but they are quite flexible as you can see. Additionally, each instruction can also set or clear a small number of I/O pins (called side-set pins) and optionally wait a specific number of cycles after execution. One really powerful feature is that the state machine can execute instructions directly from the ISR. I have not used this feature here, but it would allow you to create some really complex behaviour. The RP2040 documentation provides lots more details and plenty of examples.
Instruction | Description |
JMP | Jump to a label in the program. Can be conditional on the state of a pin or scratch register. Can also test a scratch register, jump if it is non-zero, and decrement it afterwards. |
WAIT | Stall execution until a pin or IRQ flag is high or low. |
IN | Shifts a number of bits from an input source into the ISR and increments the input shift count accordingly. Sources can be pins, scratch registers or shift registers. If auto-push is enabled, the contents of the ISR are pushed into the RX FIFO when the bit count reaches a programmed threshold. |
OUT | Shifts a number of bits from the OSR to a destination, which may be I/O pins, I/O pin direction registers, scratch registers or the ISR. Optionally, the bits shifted out of the OSR can be executed as an instruction. |
PUSH | Push the contents of the ISR to the RX FIFO as a 32-bit word. May optionally stall if the FIFO is full. |
PULL | Load a 32-bit word from the TX FIFO to the OSR. May optionally stall if the FIFO is empty. A non-blocking pull on an empty FIFO loads the X scratch register into the OSR instead. |
MOV | Copy data from source to destination. Sources may be I/O pins, scratch registers, shift registers, the status register or null. Destinations may be I/O pins, scratch registers, shift registers or the program counter. Alternatively, the source data may be executed as an instruction. |
IRQ | Set or clear an IRQ flag, or wait for a flag to be cleared. |
SET | Set the value of the destination immediately. Destinations may include I/O pins, scratch registers, or the I/O pin direction registers. |
To interface to an SD card in native mode we need a clock line (CLK) which is always driven by the host, a command line (CMD) used both to send commands to the card and to receive responses of varying length from it, and four data lines (D0-D3) which are used to send and receive data. I don’t need to send data to the card, so the data lines are configured to be input only in my implementation.
I used three state machines to implement the interface. The first drives the clock pin and an interrupt flag which allows the other state machines to sync up to the clock when necessary. Listing 1 shows the PIO assembly code. You can see that the program consists of just two instructions: the first sets the IRQ flag on one system clock cycle and the second clears it on the next. The clock pin is driven by the side-set value associated with each instruction. PIO programs automatically wrap from bottom to top so no loop instruction is needed to make these two instructions execute endlessly. Figure 2 shows how this works.
.define sd_irq 7
;-------------------------------------------------------------------------------
.program clock
.side_set 1 ; one side-set pin: CLK
irq set sd_irq side 1 ; set the flag, drive CLK high
irq clear sd_irq side 0 ; clear the flag, drive CLK low
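
Because the clock program is two instructions long and each instruction takes one state machine cycle, the SD clock runs at half the state machine clock rate. The following is a rough C sketch of that arithmetic, not code from the project. The helper names `sd_clk_hz` and `sd_clkdiv_for` are hypothetical, and it assumes the state machine clock is simply the system clock divided by the PIO clock divider.

```c
#include <assert.h>

/* Hypothetical helper: SD clock produced by the two-instruction clock
 * program for a given system clock and PIO clock divider. */
double sd_clk_hz(double sys_clk_hz, double pio_clkdiv) {
    assert(pio_clkdiv >= 1.0);            /* the PIO divider cannot go below 1 */
    return sys_clk_hz / pio_clkdiv / 2.0; /* two PIO cycles per SD clock period */
}

/* Hypothetical helper: smallest divider that keeps the SD clock at or
 * below target_hz. */
double sd_clkdiv_for(double sys_clk_hz, double target_hz) {
    assert(target_hz > 0.0);
    double div = sys_clk_hz / (2.0 * target_hz);
    return div < 1.0 ? 1.0 : div;         /* clamp to the minimum divider */
}
```

With a 125 MHz system clock, for example, a divider of 2.5 gives a 25 MHz card clock, and a divider of 156.25 gives the 400 kHz typically used during card identification.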

Listing 2 shows that the command line state machine code is a bit more complex. On entry the first two instructions set the command pin high and set its direction to output, ready to write a command. The state machine then stalls on the OUT instruction since the TX FIFO is empty. It might help to follow along with the top half of the diagram in Figure 3.
.program cmd_rw
.origin 0
set pins, 0b1 ; set pin high
set pindirs, 0b1 ; and set to output
public wait_tx:
; Transmit
out x, 8 ; first byte is tx count -> x
out y, 8 ; second byte is rx count -> y
wait 0 irq sd_irq ; sync on clock
cmd_tx_loop:
out pins, 1 ; send out bits
jmp x-- cmd_tx_loop ; until done
; Receive
set pindirs, 0b0 ; set pin to input
public wait_rx:
wait 0 pin, 0 ; wait for a zero
cmd_rx_loop:
in pins, 1 ; read in each bit
jmp y-- cmd_rx_loop ; until done

To start the command-response sequence the user writes the command string into the TX FIFO a byte at a time (via DMA in my case). At the same time, we set up a second DMA channel ready to read the response data from the RX FIFO. The first two bytes of the command sequence are the bit counts of the command data and of the expected response (actually one less than each bit count, because of the way the loops work). These counts are stored in the X and Y scratch registers respectively. Commands are always 48 bits long, but responses vary in length.
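
As an illustration of the byte stream described above, here is a minimal C sketch that assembles the two count bytes followed by a 48-bit command. The names `sd_crc7` and `sd_build_cmd` are hypothetical, chosen for illustration rather than taken from the project. The frame layout (start bit, transmission bit, 6-bit command index, 32-bit argument, CRC7, end bit) follows the SD specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SD_CMD_BITS 48u

/* CRC7 used by SD commands (polynomial x^7 + x^3 + 1), MSB first. */
uint8_t sd_crc7(const uint8_t *data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((d ^ crc) & 0x80) crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Hypothetical helper: fills out[8] with tx count, rx count and the
 * six command bytes, in the order they are written to the TX FIFO. */
void sd_build_cmd(uint8_t out[8], uint8_t index, uint32_t arg, unsigned rx_bits) {
    assert(index < 64 && rx_bits >= 1 && rx_bits <= 256);
    out[0] = (uint8_t)(SD_CMD_BITS - 1u); /* tx count: one less than 48 */
    out[1] = (uint8_t)(rx_bits - 1u);     /* rx count: one less than expected */
    out[2] = (uint8_t)(0x40 | index);     /* start bit 0, transmission bit 1, index */
    out[3] = (uint8_t)(arg >> 24);
    out[4] = (uint8_t)(arg >> 16);
    out[5] = (uint8_t)(arg >> 8);
    out[6] = (uint8_t)arg;
    out[7] = (uint8_t)((sd_crc7(&out[2], 5) << 1) | 1); /* CRC7 plus end bit */
}
```

For CMD8 with argument 0x1AA and a 48-bit expected response, this produces the bytes 47, 47, 0x48, 0x00, 0x00, 0x01, 0xAA, 0x87.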
Since data is written into the FIFO asynchronously with respect to the SD card clock, we now have to wait until the IRQ flag indicates the clock has gone low before shifting out the data one bit at a time in a loop, until all bits of the command are sent.
The program then switches the command pin to be an input and waits for the card to send the response as shown in the lower part of Figure 3. The response always starts with a zero bit. Once this occurs, we begin reading in the response data one bit at a time, counting down until all bits are read in. The program then wraps back to the start, resetting the CMD pin as an output, then stalls, ready for the next command.
You will note that the command state machine code is forced to the origin of the instruction memory and that there are a couple of “public” labels in the code. This is done so that the user code can check the program counter and reset the state machine if it gets hung up waiting for something that will never happen (for example, after commands that do not elicit any response).
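
The decision logic behind such a check might look something like the C sketch below. It is a guess at one reasonable policy, not the project's actual code: the function name and the timeout parameters are hypothetical. The label addresses are counted from Listing 2 with the program at origin 0. On real hardware the program counter could be read with the Pico SDK's `pio_sm_get_pc()`, and the state machine could be forced back to the start with `pio_sm_exec()` and `pio_encode_jmp()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Addresses of the public labels in cmd_rw when loaded at origin 0,
 * counted from Listing 2. */
enum { CMD_RW_WAIT_TX = 2, CMD_RW_WAIT_RX = 8 };

/* Hypothetical policy: the state machine needs a reset if it has not
 * returned to wait_tx by the time the timeout expires. */
bool cmd_sm_needs_reset(uint8_t pc, uint32_t elapsed_us, uint32_t timeout_us) {
    assert(pc < 32);                         /* program memory has 32 slots */
    if (pc == CMD_RW_WAIT_TX) return false;  /* idle, ready for the next command */
    return elapsed_us >= timeout_us;         /* still busy or stuck, e.g. at wait_rx */
}
```

A command with no response leaves the program counter parked at `wait_rx`, so this check reports true once the timeout has passed.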
The data read state machine works in a similar way, but it only has to receive data, so the pins are always inputs. Data is transmitted 4 bits at a time preceded by a zero Start of Packet (SOP) nibble and finishes with an End of Packet (EOP) nibble of all ones. Listing 3 shows the code and Figure 4 shows how it works. Initially the State Machine stalls on the OUT instruction, waiting for the user to initiate the read by writing to the TX FIFO. In this case the user writes the number of nibbles (less one) to be read in. We don’t read in the SOP or EOP nibbles. The state machine then loads this value into the X scratch register then waits for the D0 pin to go low indicating the start of data transmission. Once this occurs, the state machine is again synchronised with the clock and the data is read in one nibble at a time until we are done, before wrapping back to the start to wait for the next read.
.program dat_r
public start:
out x, 32 ; first word is nibble count -> x
wait 0 pin, 0 ; wait for a zero
wait 0 irq sd_irq ; sync on clock
dat_r_loop:
in pins, 4 ; read in each nibble
jmp x-- dat_r_loop ; until done
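
On the host side, the read described above comes down to writing one count word and then unpacking the words that arrive in the RX FIFO. The C sketch below shows one way this could be done; the function names are hypothetical. The unpacking order rests on assumptions the article does not state: that the ISR shifts left and auto-pushes every 32 bits, and that D0 is the lowest of four consecutive input pins. Under those assumptions the first nibble received (the high nibble of the first byte) lands in bits 31 to 28 of each word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word to write to the TX FIFO to read `bytes` bytes of data:
 * two nibbles per byte, less one because of the way the loop counts. */
uint32_t dat_r_count_word(uint32_t bytes) {
    assert(bytes > 0);
    return bytes * 2u - 1u;
}

/* Unpack RX FIFO words into bytes, most significant byte first.
 * Assumes a left-shifting ISR with auto-push at 32 bits (see above). */
void dat_r_unpack(const uint32_t *words, size_t n_words, uint8_t *out) {
    for (size_t i = 0; i < n_words; i++) {
        out[4 * i + 0] = (uint8_t)(words[i] >> 24);
        out[4 * i + 1] = (uint8_t)(words[i] >> 16);
        out[4 * i + 2] = (uint8_t)(words[i] >> 8);
        out[4 * i + 3] = (uint8_t)(words[i]);
    }
}
```

If the ISR were configured to shift right instead, the nibble order within each word would be reversed and the unpacking would need to change accordingly.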

The code for all three state machines takes just 18 instructions, leaving a spare state machine and plenty of instruction memory in this PIO controller (not to mention another identical one that is unused). In the same project I used the remaining PIO resources to implement an 8-bit parallel bus interface for a TFT display and an I2S interface to a CODEC, showing just how flexible and versatile the PIO can be. I will definitely be using this MCU again.
Andrew Levido (andrew.levido@gmail.com) earned a bachelor’s degree in Electrical Engineering in Sydney, Australia, in 1986. He worked for several years in R&D for power electronics and telecommunication companies before moving into management roles. Andrew has maintained a hands-on interest in electronics, particularly embedded systems, power electronics, and control theory in his free time. Over the years he has written a number of articles for various electronics publications and occasionally provides consulting services as time allows.