Asynchronous High-Bandwidth, Low-Latency Communication

CPU-Intensive Devices

Three students of Cornell University implement high-bandwidth and low-latency SPI communication between a PIC32 and an OV7670 camera module to better utilize the performance and processing speed of the microcontroller by reducing unnecessary computational loads.

  • What is asynchronous high-bandwidth, low-latency communication?

  • How to improve the performance of a microcontroller?

  • Why should you reduce unnecessary computational loads?

  • How to communicate between a PIC32 and an OV7670 camera module?

  • How to feed the data from the target straight into the SRAMs without CPU intervention?

  • PIC32 microcontroller

  • OV7670 camera module

  • 23LC1024 SRAM chips

  • Function generator

  • Oscilloscope

Microcontrollers are often used to solve problems in real-time applications with strict timing deadlines. The greater the computational load on the CPU, the harder it is to guarantee that processes meet their deadlines. Our project reduces the computational load on the CPU by offloading processes to dedicated hardware. More specifically, we consider a scenario where a microcontroller must read data from a “target” module that produces it faster than the microcontroller can read it. Although the system is designed for a PIC32 microcontroller and an OV7670 camera module, it will become clear that our solution can be used to tackle a wide range of problems.

The purpose of this project was to implement high-bandwidth, low-latency SPI communication between a PIC32 and a target module, which in this case was the OV7670 camera [1], a simple VGA camera and image processor used with microcontrollers. The OV7670 uses eight parallel data pins to output image data at 10-48MHz. The PIC32 used in this project, however, only runs at 40MHz. Thus, we would need to read each of the eight data pins and store the new data in, at most, three cycles. This is essentially impossible due to both clock frequency and memory constraints. But fear not, we found another way.

HIGH-LEVEL DESIGN

The workaround that we devised consists of eight 23LC1024 SRAM chips [3] and a few logic integrated circuits (ICs). The camera outputs data at a much higher rate than the PIC32 can read it. Our circuit serves as a buffer that reads picture frame data into the SRAM modules without CPU intervention. This allows the PIC32 to read the frame data at its own pace.

The implementation of our solution can be split into three sections: 1) setting up our pins and configuring the target device (Initialization); 2) reading data from the target and storing it in the SRAM (Target to SRAM); and 3) reading the data from the SRAM and sending it to a connected PC (SRAM to PC). At a high level, Target to SRAM works by sending a WRITE instruction to the SRAMs, telling the target to transmit data, and then connecting each of the eight data lines from the target to one of the eight SRAMs in parallel. Once a full payload (in this case an image) has been received from the target, we begin SRAM to PC by giving the SRAMs a READ instruction and then serially transmitting the data to the PC byte by byte. This solution enables us to read data into the SRAMs at the target frequency with little software intervention, and then retrieve the data at our leisure.

THE SRAM (23LC1024)

The SRAM module we chose communicates using the SPI protocol, which requires four communication lines: chip select, clock line, data input, and data output. Figure 1 is a snippet from the SRAM Datasheet [3].

Figure 1 Instructions and timing diagrams of these instructions are taken from the datasheet of the SRAM module.

The SRAM accepts several instructions, but the only two that are relevant to us are READ and WRITE. Both begin by driving the active-low chip select (CS) LOW, where it stays for the entire transaction. During the transaction, the clock line (SCK) pulses 32 + n times, where n is the number of data bits being written or read. The first 32 bits comprise the 8-bit instruction followed by a 24-bit address. The following n bits correspond to the payload. In WRITE, the data input (SI) is read at the rising edge of SCK.

Similarly, new data is written to the data output (SO) at the rising edge of SCK in READ. Note that when n > 8 (i.e., SCK is pulsed more than eight times following the 32-bit instruction and address), the 23LC1024 enters sequential read or write mode (SEQREAD, SEQWRITE), which continues the respective operation by incrementing the memory address by one every eight pulses until CS is raised. These operations allow camera data to be streamed into or out of the SRAM.
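
The 32-bit preamble described above can be sketched in C as follows. This is a minimal illustration rather than the project's code: the helper names sram_header and sram_clock_pulses are hypothetical, and the opcode values (0x03 for READ, 0x02 for WRITE) are taken from the 23LC1024 datasheet [3] and should be checked against it.

```c
#include <stdint.h>

/* Instruction opcodes as listed in the 23LC1024 datasheet. */
#define RAM_READ_CMD  0x03u
#define RAM_WRITE_CMD 0x02u

/* Pack the 8-bit instruction and the 24-bit address into the 32-bit
 * word that is shifted into the SRAM at the start of a transaction.
 * (Hypothetical helper: the article does not show how the word is built.) */
uint32_t sram_header(uint8_t instruction, uint32_t address)
{
    return ((uint32_t)instruction << 24) | (address & 0x00FFFFFFu);
}

/* Total number of SCK pulses for a transaction that carries
 * payload_bits bits of data after the instruction and address. */
uint32_t sram_clock_pulses(uint32_t payload_bits)
{
    return 32u + payload_bits;
}
```

For example, a WRITE that starts at address 0 and carries a single byte would use the header 0x02000000 and 40 clock pulses.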

THE TARGET (OV7670 CAMERA)

The OV7670 (subsequently called “the target”) has a pinout in accordance with Table 1.

The camera has several registers that can be configured to change the functionality of the device. (A full list can be perused in the OV7670 Datasheet [1], though you may find more enjoyment in watching paint dry.) We can write to these registers using the Serial Camera Control Bus (SCCB) [2], OmniVision’s I2C-compatible protocol, over the SDIOC and SDIOD lines. An SCCB WRITE transaction is performed in accordance with the snapshot from a logic analyzer shown in Figure 2, where S and P refer to the start and stop sequences, respectively, as shown in Figure 3.
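
As a rough sketch of what one SCCB WRITE carries between the start and stop sequences, the three transmitted bytes can be assembled as below. The struct and function names are hypothetical, and the constants (write address 0x42, COM7 register 0x12, reset bit 0x80) are the values commonly documented for the OV7670; they are assumptions here and should be confirmed against the datasheet [1].

```c
#include <stdint.h>

/* Assumed OV7670 constants (verify against the datasheet). */
#define OV7670_SCCB_WRITE_ADDR 0x42u  /* device ID byte for writes   */
#define OV7670_REG_COM7        0x12u  /* common control register 7   */
#define OV7670_COM7_RESET      0x80u  /* resets registers to default */

/* The three bytes of a 3-phase SCCB WRITE, in transmission order. */
typedef struct {
    uint8_t id;     /* phase 1: device address  */
    uint8_t reg;    /* phase 2: register address */
    uint8_t value;  /* phase 3: data to write    */
} sccb_write_t;

/* Build the byte sequence for writing value to register reg. */
sccb_write_t sccb_make_write(uint8_t reg, uint8_t value)
{
    sccb_write_t t;
    t.id = OV7670_SCCB_WRITE_ADDR;
    t.reg = reg;
    t.value = value;
    return t;
}
```

Under these assumptions, a register reset would be sccb_make_write(OV7670_REG_COM7, OV7670_COM7_RESET), which yields the bytes 0x42, 0x12, 0x80.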

After configuring the camera mode registers over SCCB, we can begin streaming frame data into the SRAM modules. To do this, we first need to connect a clock of at least 10MHz to XCLK, at which point the target immediately starts providing data across the eight parallel data pins D[0:7]. The new data is synchronized with three digital signals: VSYNC, HREF, and PCLK. The timing diagram in Figure 4 shows the effect of the first two.

According to the datasheet, the falling edge of VSYNC indicates the start of a new image, and the rising edge indicates its end. In contrast, the rising edge of HREF indicates the beginning of a new row of data, and the falling edge indicates that a row has ended. Note that the values of D[0:7] are meaningless whenever VSYNC is HIGH or HREF is LOW.

When it comes to actually extracting each byte, the datasheet provides a separate diagram (Figure 5). The last signal used to synchronize the data is PCLK, a clock output provided by the target that is tied directly to XCLK. As with the SRAM, new data becomes available at the rising edge of PCLK.
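
The synchronization rules above can be summarized in a small software model. This is only an illustration of the timing rules (in the actual design the logic circuit does this work, not the CPU), and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One snapshot of the target's synchronization and data pins. */
typedef struct {
    bool vsync;
    bool href;
    bool pclk;
    uint8_t data;  /* value on the eight data pins */
} cam_sample_t;

/* Data is meaningful only while VSYNC is LOW and HREF is HIGH. */
bool cam_data_valid(const cam_sample_t *s)
{
    return !s->vsync && s->href;
}

/* Walk a recorded trace and copy every byte that is valid on a rising
 * edge of PCLK. Returns the number of bytes written to out. */
size_t cam_capture(const cam_sample_t *trace, size_t n,
                   uint8_t *out, size_t out_cap)
{
    size_t count = 0;
    for (size_t i = 1; i < n && count < out_cap; i++) {
        bool rising = !trace[i - 1].pclk && trace[i].pclk;
        if (rising && cam_data_valid(&trace[i])) {
            out[count++] = trace[i].data;
        }
    }
    return count;
}
```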

Figure 2 Logic analyzer view of an example I2C/SCCB write sequence, which features the data values and bits of each packet.
Figure 3 Timing diagrams for start and stop signals are outlined in the SCCB protocol.
Figure 4 Timing diagram describing the sequence of signals to expect from the OV7670 camera when it outputs new data across pins D[7:0]. Note that the values of D[7:0] represent valid data only when VSYNC is LOW, HREF is HIGH, and HSYNC is HIGH.
Figure 5 Timing diagram describing the sequence of signals to expect from the OV7670 camera when an image is being written to its data output pins. This must be used in conjunction with Figure 4 to understand how the output data is synchronized with the OV7670’s output clock PCLK.
PROGRAM/HARDWARE DESIGN

The completed circuit is shown in Figure 6. Our program and hardware design are explained in Table 1, which contains descriptions of the I/O pin header to our circuit (outlined in green in Figure 6).

Table 1 Contextualizing each connection in the circuit I/O pin header. In this table, we provide the pin’s ID in the header (ordered as in the top left of Figure 6 from left to right), the corresponding ID of the pin in the PIC32 datasheet, the net label that we use to refer to the pin in later explanations, and a brief description of the connection formed by the pin.
Figure 6 The completed circuit. The I/O pin header is outlined in green.

Initialization: We begin by initializing each of the I/O pins described in Table 1. Notably, we drive the XCLK input of the target with CCLK, a 10MHz square wave generated by an output-compare module.
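
To give a feel for the numbers involved, the sketch below computes the timer period and half-period values that an output-compare module would need for a 10MHz square wave. It assumes a 40MHz peripheral clock, a 1:1 timer prescaler, and the usual convention that a timer period spans the period register value plus one ticks. The function names are hypothetical, and the project's actual register setup may differ.

```c
#include <stdint.h>

/* Period register value so that one output period spans
 * (value + 1) timer ticks. Assumes a 1:1 prescaler and that
 * clock_hz is an integer multiple of out_hz. */
uint32_t timer_period_reg(uint32_t clock_hz, uint32_t out_hz)
{
    return clock_hz / out_hz - 1u;
}

/* Number of ticks the output stays HIGH for a 50% duty cycle. */
uint32_t half_period_ticks(uint32_t period_reg)
{
    return (period_reg + 1u) / 2u;
}
```

Under these assumptions, timer_period_reg(40000000, 10000000) gives 3 (four ticks per period) and the output is HIGH for two of those ticks.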

Next, we configure the target by writing to the camera’s internal registers using the SCCB protocol. Most notably, we configure the camera to reset each register to its default value on initialization and to output frame data at CIF resolution with YUV encoding.

Our exact configuration is given in the file “ov7670.c”, which is available for download on the Circuit Cellar Article Code and Files webpage.

Target to SRAM: As we mentioned before, the goal is to feed the data from the target straight into the SRAMs, without CPU intervention. To accomplish our goal, we need to hack the SPI transaction to allow the PIC32 to initialize a WRITE transaction (PHASE1), and then give control to the target device so it can finish the data write (PHASE2). More specifically:

PHASE1
  • Starts when we send the WRITE instruction to the SRAM.
  • Finishes when WRITE and the 24-bit address have been transmitted to the SRAM.
PHASE2
  • Starts when PHASE1 is over and VSYNC drops from HIGH to LOW, indicating the start of a new image.
  • Finishes when VSYNC rises from LOW to HIGH, indicating the end of the image.

Pseudocode showing each phase and the order of operations is given in Listing 1. These software changes result in phase-dependent hardware connections, shown in Table 2.

Listing 1

Pseudocode, which is intended to add clarity to the discussion of phases.


// ===== PHASE1 START ===== //

// drop chip select 
setPinLow(SCS);

// send write instruction and address to SRAM
SRAM_Send(RAM_WRITE_CMD | addr);

// ===== PHASE1   END ===== //

// turn on the camera
setPinLow(CAM_POW);

// wait for current image to end (VSYNC to go HIGH)
while (~readPin(VSYNC) & VSYNC);

// when VSYNC is HIGH, set clock select HIGH to 
// indicate data is coming
setPinHigh(CLK_SEL);

// wait for start of a new image (VSYNC to go back LOW)
while (readPin(VSYNC) & VSYNC);

// ===== PHASE2 START ===== //

// raise chip select (VSYNC CS control takes over)
setPinHigh(SCS);

// wait for the current image to end (VSYNC to go back HIGH)
while (~readPin(VSYNC) & VSYNC);

// ===== PHASE2  END ===== //

// turn off the camera
setPinHigh(CAM_POW);

// drop clock select
setPinLow(CLK_SEL);

Table 2 Pin connections are based on a specific phase. The first column comprises each pin of the SRAM module. The second column indicates the pins of the circuit I/O header (as in Table 1) to which the SRAM pins are connected during PHASE1. The third column indicates the pins of the circuit I/O header to which the SRAM pins are connected during PHASE2.

To understand why we made these choices and why this works, we discuss each of the four connections (except for MISO, since we don’t READ in this section) in its own subsection.

Chip Select: A schematic of the logic we implemented for the chip select portion of the circuit is given in Figure 7.

In closed form, we can express the value of CS as:

When we start PHASE1, we drop SCS, which causes CS to be LOW regardless of the values of CLK_SEL and VSYNC. At the moment PHASE2 starts, CLK_SEL is HIGH, VSYNC is LOW, and SCS is set HIGH (as in Listing 1). This causes A to be LOW, and thus, CS remains LOW. As soon as PHASE2 finishes, VSYNC goes HIGH, which causes A to go HIGH, and thus, CS to go HIGH. Therefore, CS remains LOW for the entire transaction and performs as expected.
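
The chip select behavior can be checked with a small truth-table model. The gate arrangement below is an assumption: it is one expression that is consistent with the description and with Listing 1 (where SCS is raised at the start of PHASE2), not necessarily the exact circuit in Figure 7. The function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed intermediate node A: LOW only while the target owns the
 * transaction, i.e. CLK_SEL is HIGH and VSYNC is LOW. */
bool cs_node_a(bool clk_sel, bool vsync)
{
    return !(clk_sel && !vsync);
}

/* Assumed SRAM chip select: LOW whenever either SCS or A is LOW. */
bool sram_cs(bool scs, bool clk_sel, bool vsync)
{
    return scs && cs_node_a(clk_sel, vsync);
}
```

With this model, CS is LOW throughout PHASE1 (SCS LOW), stays LOW during PHASE2 (SCS HIGH, CLK_SEL HIGH, VSYNC LOW), and rises only when VSYNC goes HIGH at the end of the image.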

Clock Line: A schematic of the logic we implemented for the clock line is shown in Figure 8.

In closed form, we can express the value of SCK as:

On the one hand, during PHASE1, we want the SRAM to be in phase with FCLK, which is a bit-banged output signal that we use as the clock while writing the instruction and address. On the other hand, we want the SRAM to be in phase with PCLK during PHASE2 because the data is output from the target at the rising edge of PCLK.

From the pseudocode, we can see that CLK_SEL is LOW during PHASE1, which forces B LOW and lets the output of the OR gate pulse in phase with FCLK. Thus, SCK is set correctly for PHASE1. For PHASE2, CLK_SEL is HIGH, which forces A LOW and lets B follow PCLK while HREF is HIGH. (Note that this is the only time we want data loaded into the SRAM.) Therefore, SCK is set correctly for both phases.
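
The clock selection can be modeled the same way. The expressions for A and B below are assumptions inferred from the description (an OR of an FCLK branch gated by CLK_SEL LOW and a PCLK branch gated by CLK_SEL HIGH and HREF HIGH); the circuit in Figure 8 may realize this with different gates. The function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed node A: passes FCLK only while CLK_SEL is LOW (PHASE1). */
bool sck_node_a(bool fclk, bool clk_sel)
{
    return fclk && !clk_sel;
}

/* Assumed node B: passes PCLK only while CLK_SEL and HREF are HIGH. */
bool sck_node_b(bool clk_sel, bool href, bool pclk)
{
    return clk_sel && href && pclk;
}

/* SRAM clock line: the OR of the two branches. */
bool sram_sck(bool fclk, bool clk_sel, bool href, bool pclk)
{
    return sck_node_a(fclk, clk_sel) || sck_node_b(clk_sel, href, pclk);
}
```

In PHASE1 the output tracks FCLK regardless of HREF and PCLK, and in PHASE2 it tracks PCLK only during active rows.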

MOSI Lines: Figure 9 is a schematic of the logic that determines the values of MOSI[0:7], the eight data-in lines of the SRAM module. In closed form, we can represent MOSI as:

where MOSIi represents the i-th SRAM and Di represents the i-th data line from the target.

Figure 7 Circuit implementation of the logic that determines the value of CS, the chip select pin of each SRAM module.
Figure 8 Circuit implementation of the logic that determines the value of SCK, the clock line of each SRAM module.
Figure 9 Circuit implementation of the logic that determines the values of MOSI[0:7], the eight data-in lines of the SRAM module.

In this configuration, each of the SRAMs has a shared line from the PIC32, which is used to transmit the instruction. During PHASE1, we set the PWDN pin of the target HIGH, which causes the camera to go into standby mode, during which PCLK does not pulse and D[0:7] is LOW. Thus, MOSIi=SRAM_CMD whenever FCLK pulses during PHASE1. During PHASE2, however, FCLK is always LOW, and thus, MOSIi=Di whenever PCLK pulses. Note also that the diodes protect each of the outputs, and the pull-down resistor ensures that MOSI never floats. Therefore, the MOSI lines are set correctly for both phases.
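
Since the diodes and pull-down resistor behave like a wired OR, the data-in lines can be modeled as below. This is an assumed logical equivalent of Figure 9, with hypothetical function names; it ignores analog details such as diode voltage drop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of one data-in line: the shared command line from the
 * PIC32 and the target's data line D[i] are diode-ORed, and the
 * pull-down holds the line LOW when neither source drives it. */
bool sram_mosi(bool sram_cmd, bool d_i)
{
    return sram_cmd || d_i;
}

/* The same model applied to all eight lines at once: bit i of the
 * result is the level seen by the i-th SRAM. */
uint8_t sram_mosi_bus(bool sram_cmd, uint8_t d)
{
    return sram_cmd ? (uint8_t)0xFFu : d;
}
```

In PHASE1 the target is in standby, so d is 0 and every SRAM sees the command bit; in PHASE2 the command line idles LOW and each SRAM sees its own data line.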

SRAM to PC: Once the camera data is stored in the SRAM modules, it is streamed through the PIC32 to the PC. This begins with an SPI READ transaction between the SRAMs and the PIC32. The SRAMs remain in phase with FCLK for this entire transaction. The first few rows of the camera data saved to the SRAMs do not contain useful image data and are discarded. The actual image data is then read by setting FCLK HIGH, reading a byte, and setting FCLK LOW again.

Each byte read from the SRAM is sent to the PC over UART between the clock pulses. This data can then be picked up by a Python script and displayed. The output of the process is discussed in the next section.
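
The reason a single clock pulse yields a whole byte is that each SRAM stores the bit stream of one data line. The following simulation sketches that layout under stated assumptions: the i-th SRAM holds bit i of every camera byte, bits fill each SRAM byte most significant bit first, and the tiny 64-byte capacity is chosen only for the example. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SRAMS 8
#define SIM_BYTES 64  /* small simulated capacity per SRAM */

/* Simulated contents of the eight SRAMs. */
typedef struct {
    uint8_t sram[NUM_SRAMS][SIM_BYTES];
} sram_bank_t;

void bank_clear(sram_bank_t *bank)
{
    memset(bank, 0, sizeof *bank);
}

/* Target to SRAM: on clock pulse k, SRAM i latches bit i of value.
 * Assumes the bank was cleared and k < SIM_BYTES * 8. */
void bank_store(sram_bank_t *bank, size_t k, uint8_t value)
{
    for (int i = 0; i < NUM_SRAMS; i++) {
        if ((value >> i) & 1u) {
            bank->sram[i][k / 8] |= (uint8_t)(0x80u >> (k % 8));
        }
    }
}

/* SRAM to PC: on clock pulse k, the eight data-out lines together
 * form the k-th camera byte. */
uint8_t bank_load(const sram_bank_t *bank, size_t k)
{
    uint8_t value = 0;
    for (int i = 0; i < NUM_SRAMS; i++) {
        if (bank->sram[i][k / 8] & (0x80u >> (k % 8))) {
            value |= (uint8_t)(1u << i);
        }
    }
    return value;
}
```

Storing a sequence of bytes with bank_store and reading them back with bank_load at the same indices returns the original sequence, which is the property the Results section checks through pixel alignment.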

RESULTS

We want to verify three major points from our image output:

  • We receive some sort of data.
  • The pixels are aligned, indicating that SRAMs are being written to and read from correctly.
  • The data stream is as we expect, meaning that the camera is configured properly.

Image Analysis: As stated before, the OV7670 can output several color formats; we chose YUV. To best explain the image encoding/decoding, we introduce some notation. Based on the YUV and CIF register configuration, the camera should output 352×240×2=168,960 bytes of data that constitute 352×240=84,480 pixels. Let Pi denote the i-th pixel. Each byte of data can be one of three values:

  • Yi: The luminance value of Pi.
  • Ui: The first chroma value, shared between Pi and Pi+1.
  • Vi: The second chroma value, shared between Pi and Pi+1.

Given this notation, we would expect the camera to output frame data in the order given in Table 3.

Each of the pixels P0, …, Pi, …, P84479 has its own luminance Yi, but pixels Pi and Pi+1 share the chroma components Ui and Vi for each i such that i mod 2 = 0. In other words, each pixel can be stored using an average of 2 bytes by arranging them as shown in Table 4.
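
The pixel sharing described above can be sketched as an unpacking routine. The byte order within each 4-byte group is an assumption (Y, U, Y, V, with luminance in the even positions); the camera's configured order may differ and should be taken from Table 3. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_WIDTH  352
#define FRAME_HEIGHT 240
#define FRAME_PIXELS (FRAME_WIDTH * FRAME_HEIGHT)  /* 84,480  */
#define FRAME_BYTES  (FRAME_PIXELS * 2)            /* 168,960 */

typedef struct {
    uint8_t y, u, v;
} yuv_pixel_t;

/* Expand each 4-byte group [Y0, U, Y1, V] (assumed order) into two
 * pixels that share U and V. Returns the number of pixels written. */
size_t yuv422_unpack(const uint8_t *raw, size_t raw_len,
                     yuv_pixel_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i + 3 < raw_len && n + 1 < out_cap; i += 4) {
        out[n].y = raw[i];
        out[n].u = raw[i + 1];
        out[n].v = raw[i + 3];
        n++;
        out[n].y = raw[i + 2];
        out[n].u = raw[i + 1];
        out[n].v = raw[i + 3];
        n++;
    }
    return n;
}
```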

Table 3 The byte sequence expected from the data output pins of the OV7670 during the transmission of a new image. Includes notation introduced under “Image Analysis.”
Table 4 How the expected output bytes in Table 3 correspond to individual pixels in the OV7670’s output image [4]. Includes notation introduced under “Image Analysis.”

To convert a YUV image to RGB, we create a pixel [Ri, Gi, Bi] for each Pi, where Ri, Gi, and Bi are obtained by the transformation given in Figure 10.
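
For readers without access to Figure 10, one widely used YUV-to-RGB conversion is sketched below. It uses the full-range BT.601 coefficients found in JFIF, which is an assumption: the transformation in Figure 10 may use different constants or offsets. The names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t r, g, b;
} rgb_pixel_t;

/* Round to the nearest integer without relying on the math library. */
static int round_to_int(double x)
{
    return (int)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Saturate to the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Full-range BT.601 conversion (assumed), with chroma centered at 128. */
rgb_pixel_t yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v)
{
    double cb = (double)u - 128.0;
    double cr = (double)v - 128.0;
    rgb_pixel_t p;
    p.r = clamp_u8(round_to_int(y + 1.402 * cr));
    p.g = clamp_u8(round_to_int(y - 0.344136 * cb - 0.714136 * cr));
    p.b = clamp_u8(round_to_int(y + 1.772 * cb));
    return p;
}
```

With neutral chroma (U = V = 128), the result is a gray pixel whose three channels equal Y, which matches the observation that the Y channel carries the grayscale image.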

If we were to compare the RGB output with images encoded with only Y values, only U values, and only V values, we would observe what is shown in Figure 11. Clearly, the Y channel encodes the grayscale portion of the image, whereas the U and V channels encode the colors.

After reading data from the camera and storing it across eight parallel SRAMs, the PIC32 serially transmits the data to a Python script, which converts the data to images. Figure 12 contains six windows, all showing the same image, but each window constructs the image using a different YUV-to-RGB conversion. The same subject, captured on an iPhone, is shown in Figure 13.

Figure 10 Conversion from the YUV image encoding to the RGB image encoding. Includes notation introduced under “Image Analysis.”
Figure 11 The same image was constructed using the YUV components of each pixel (first from the top), only the Y component of each pixel (second), only the U component of each pixel (third), and only the V component of each pixel (fourth) [4].
Figure 12 Screenshot representing the outputs of six different methods used to decode the raw frame data received from the picture taken by the OV7670. It is clear that the subject displayed in the bottom left window is the same subject in Figure 13.
Figure 13 Picture of individual wearing a hoodie taken on an iPhone. The subject of this photo is the same subject in the photo taken by the OV7670, which is shown in the bottom left window of Figure 12.
Figure 14 QR code for accessing the video from our final demo, explaining how everything works

We constructed the bottom two images of Figure 12 by taking each even byte A and odd byte B in our data stream, and adding pixels [A,A,A] and [B,B,B] to images bw1 and bw2, respectively. Since bw1 is the grayscale representation of the image, we can confirm that the camera is sending the bytes in the correct order. The other four windows are each different attempts to convert the data to an RGB image. Although you can clearly make out the same shapes in the colored images as in the B/W images, something is not correct. We suspect that the camera configuration is the most likely cause and that, given more time to experiment with it, a proper color image could be obtained.
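
The even/odd split used for bw1 and bw2 is simple enough to sketch directly. The authors did this step in a Python script on the PC; the C version below is only an equivalent illustration with a hypothetical function name, where each output value g stands for the gray pixel [g, g, g].

```c
#include <stddef.h>
#include <stdint.h>

/* Split a raw byte stream into its even-indexed bytes (bw1) and
 * odd-indexed bytes (bw2). Both output buffers must hold at least
 * raw_len / 2 values. Returns the number of values written to each. */
size_t split_even_odd(const uint8_t *raw, size_t raw_len,
                      uint8_t *bw1, uint8_t *bw2)
{
    size_t pairs = raw_len / 2;
    for (size_t i = 0; i < pairs; i++) {
        bw1[i] = raw[2 * i];
        bw2[i] = raw[2 * i + 1];
    }
    return pairs;
}
```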

The important thing to note is that the pixels are aligned, and we can definitively state that we received an image from the camera. This also shows that the data streamed into the SRAMs from the camera is read back out in the correct order. By these metrics, we have shown that we can read data from a target device asynchronously using our parallel SRAM configuration. The approach applies beyond cameras because the circuit effectively acts as a buffer that lets high-speed peripherals interact with low-speed devices.

The video from our final demo explaining how everything works can be accessed by using the QR code in Figure 14.

CONCLUSIONS

The high-bandwidth, low-latency communication with a target device was a success. Despite the poor image quality, we were able to get an image from the camera to the PIC through the SRAM modules with little software intervention.

This project involved a large amount of troubleshooting with tools such as a function generator and an oscilloscope. We used these tools mainly to meet the timing requirements of our bit-banged communication protocols and to ensure that the data output from the target matched what we were reading from the SRAMs. If we were to continue this project, we would improve the image quality and then add wireless communication, so that the camera does not have to be connected to the computer to take a picture. If we were to change anything, we would likely use a different camera. We were expecting higher image quality when we started. In fact, we expected the camera to be the easier part of the project. In reality, the camera lacked documentation, which made figuring out how to get a good image from it more difficult. Ultimately, we showed that high-bandwidth, high-speed devices can be used without a high-frequency CPU.

REFERENCES
[1] OV7670 Datasheet
https://people.ece.cornell.edu/land/courses/ece4760/FinalProjects/f2021/jfw225_aei23_dsb298/jfw225_aei23_dsb298/OV7670_2006.pdf
[2] SCCB Functional Specification
https://people.ece.cornell.edu/land/courses/ece4760/FinalProjects/f2021/jfw225_aei23_dsb298/jfw225_aei23_dsb298/SCCBSpec_AN.pdf
[3] SRAM Datasheet
https://people.ece.cornell.edu/land/courses/ece4760/FinalProjects/f2021/jfw225_aei23_dsb298/jfw225_aei23_dsb298/SRAM.pdf
[4] Hacking the OV7670 Camera Module
http://embeddedprogrammer.blogspot.com/2012/07/hacking-ov7670-camera-module-sccb-cheat.html

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JULY 2022 #384

Joseph Whelan is expected to graduate in May of 2023 from Cornell University’s College of Engineering with the following degrees: BS Computer Science and BS Electrical & Computer Engineering. Joseph is currently an engineer at the Johns Hopkins University Applied Physics Lab and is the Co-founder/CTO of a healthcare computer-vision startup. He is interested in hardware, software, and the integration of the two. Joseph can be reached at jfw225@cornell.edu.

Akugbe Imudia is expected to graduate in May of 2022 from Cornell University’s College of Engineering with a BS in Electrical & Computer Engineering. He is interested in Computer Architecture, hardware design, and Embedded Systems. Akugbe is also an E-board member of the Cornell Maker Club. He can be reached at aei23@cornell.edu.

Devlin Babcock is expected to graduate in December of 2022 with a BS in Electrical and Computer Engineering. Devlin is interested in embedded systems and hardware design with FPGAs. He is currently on the software team for Alpha CubeSat at Cornell. He can be reached at dsb298@cornell.edu.

Copyright © KCK Media Corp.
All Rights Reserved

by Joseph Whelan, Akugbe Imudia, and Devlin Babcock