How it Works
DDR4 SDRAM provides a lower operating voltage and a higher transfer rate than its predecessors. It can also process more data within a single clock cycle, which improves efficiency. In this article, Nishant looks at DDR4 from the system design level, the physical structure level and the protocol level.
DDR4 (double data rate fourth-generation SDRAM) provides a low operating voltage (1.2V) and a high transfer rate. DDR4 introduces bank groups, up to four of them, each of which can operate independently of the others. This lets DDR4 work on four banks at once, one per bank group, which increases efficiency compared to older DDR formats. DDR4 has been widely adopted worldwide, and it requires an interface typically implemented with a complex ASIC or FPGA. In this article, we examine DDR4 at the system level, the physical structure level and finally the protocol level. This article represents my collective understanding from the DDR4 JEDEC specifications and various other white papers. Links to references for detailed study are provided in RESOURCES at the end of the article.
TOP LEVEL ABSTRACTION
A typical DRAM has several signal lines, mainly Clock, Reset, Data, Address, RAS, CAS, Write Enable and Data Control. The complete set of major DRAM I/O signals is not limited to those, but they are some of the most important signal lines responsible for data movement. CKE is the Clock Enable signal, which enables internal clocks, buffers and drivers. CK_t and CK_c are the differential clock inputs. CS_n is Chip Select, which is used to select DIMMs or for memory cascading. DQ is the data bus and DQS is the data strobe, which acts as a flag for data movement. RAS_n, CAS_n and WE_n are the major control signals, which are used to select the data location, interpret data movement and enable or disable Write and Read. ACT_n is the Activate function. We will discuss these in more detail later. BG0-1, BA0-1 and A0-13 are the Bank Group, Bank Address and Address Inputs respectively. Figure 1 shows a top-level representation of a typical DRAM.
Now let’s try to visualize the internal structure of a DRAM, where the data gets distributed across the memory. At a high level, it all starts with a Bank Group, which contains Bank 0 through Bank 3. There are four such Bank Groups. All of these are gated using I/O buffers and controlled via the CMD (command) and Address registers. Figure 2 shows a visual representation of this.
To understand the data flow from the external world to the Rows and Columns inside the memory, let’s step back to the system-level view of a DDR connection and then come back inside the memory. At the top level, the DRAM is connected to an ASIC or FPGA via a physical interface called a DDR PHY. The DDR PHY connects to the core through a DDR controller over the DFI (DDR PHY interface). The controller is responsible for initialization, data movement, conversion and bandwidth management.
In any system, user programmable logic is generally nonstandard and depends upon drivers from different system designers. The “user” sends something called a logical address, which the controller converts to a physical address before it crosses the PHY interface. The DRAM only sees the physical address. This physical address has various fields: Bank Group, Bank, Row and Column.
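The field split can be sketched in a few lines of Python. This is only an illustration: the field widths are those of a 4Gb ×8 device, while the bit ordering (column in the low bits, then bank, bank group and row) and the names `DramAddress` and `decode_address` are assumptions made for this sketch. Real controllers choose their own mapping.

```python
from typing import NamedTuple

# Field widths for a 4Gb x8 device. The ordering of the fields inside
# the flat address is a hypothetical mapping chosen for illustration.
COL_BITS, BA_BITS, BG_BITS, ROW_BITS = 10, 2, 2, 15


class DramAddress(NamedTuple):
    bank_group: int
    bank: int
    row: int
    column: int


def decode_address(addr: int) -> DramAddress:
    """Split a flat device address into DRAM fields (column in the low bits)."""
    column = addr & ((1 << COL_BITS) - 1)
    addr >>= COL_BITS
    bank = addr & ((1 << BA_BITS) - 1)
    addr >>= BA_BITS
    bank_group = addr & ((1 << BG_BITS) - 1)
    addr >>= BG_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    return DramAddress(bank_group, bank, row, column)
```

With this layout, consecutive addresses walk along the columns of one row before moving to another bank, which is one common reason to place the column field in the low bits.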
When we say “Row and Column,” the location is identified using a row decoder and a column decoder. The Row address activates a line in the memory array called a word line, and the contents of that row are loaded into the sense amplifiers. The Column address then reads out part of the word held in the sense amplifiers. The lines that carry each column’s data to the sense amplifiers are called bit lines. Figure 3 illustrates how the data and control flow is arranged. The width of a column is standard, that is 4 bits, 8 bits or 16 bits, which is the same as the DQ bus width. A ×16 device has two Bank Groups, while a ×4 or ×8 device has four.
When we talk about bits in DRAM, a bit is physically a capacitor that holds charge, with a transistor acting as the switch through which data flows. A capacitor is a passive element and cannot store a charge forever. To retain the information, the capacitor must be refreshed periodically. To visualize the transistor-level diagram, look at Figure 4. When a row is activated, a whole page gets loaded into the sense amplifiers. The ACT signals control this data movement and the sense amplifiers.
DRAM comes in fixed sizes. All organizations need to comply with the sizes detailed in the JEDEC specifications. Now let’s look at how to calculate the size of a DRAM. Here, we’ll calculate the capacity of a 4Gb ×8 device.
For a ×8 device, the number of Row address bits:
A0 to A14 = 15 bits.
So, the total number of rows = 32k (32,768).
For a ×8 device, the number of Column address bits:
A0 to A9 = 10 bits.
So, the total number of columns = 1k (1,024).
Width of each column = 8 bits
Number of Bank Groups = 4
Number of Banks per Bank Group = 4
Total DRAM capacity:
Num. Rows × Num. Columns × Width of Column × Num. Bank Groups × Num. Banks
= 32,768 × 1,024 × 8 × 4 × 4
= 4,294,967,296 bits
= 4 Gbits
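The same calculation can be checked in Python using exact powers of two. The function name `dram_capacity_bits` is introduced here only for illustration.

```python
def dram_capacity_bits(row_bits, col_bits, width, bank_groups, banks_per_group):
    """Capacity in bits from the address widths and the bank organization."""
    rows = 1 << row_bits   # 15 row address bits    -> 32,768 rows
    cols = 1 << col_bits   # 10 column address bits -> 1,024 columns
    return rows * cols * width * bank_groups * banks_per_group


bits = dram_capacity_bits(row_bits=15, col_bits=10, width=8,
                          bank_groups=4, banks_per_group=4)
print(bits, "bits =", bits // 2**30, "Gbit")
```

Here a gigabit is taken as 2^30 bits, which is why the 4Gb device comes out at exactly 4.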
Now, let’s look at the concept of cascading, which is an essential method for increasing the memory capacity without much increase in total subsystem cost. There are two types of cascading: depth cascading and width cascading.
Depth cascading is typically defined by “rank.” DRAMs can be single rank, dual rank or quad rank. Rank is used to increase the memory capacity of a system. Normally, a single 16GB DRAM can be costly compared to two 8GB DRAMs. Here is where rank comes into the picture. Rank can be “inter-die” or “intra-die.” Inter-die means two different memory die soldered on a board, while intra-die means two die stacked in a 3D fashion in a dual-die package. In either case there are two Chip Select lines (CS_n), which select between the memory die, while the address and data lines are shared between them.
The next type of cascading is width cascading, where there remains just one Chip Select line, but the data lines are distributed across different die. For example, a single 8Gb ×8 device can be replaced by two 4Gb ×4 devices connected in a width cascading format.
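To make the two kinds of cascading concrete, here is a small Python sketch that sizes a memory subsystem. The 64-bit data bus and the function name `cascade_layout` are assumptions for illustration and do not come from the JEDEC spec.

```python
def cascade_layout(bus_width_bits, device_width_bits, device_capacity_gbit, ranks):
    """Return (devices per rank, chip-select lines, total capacity in GB)."""
    # Width cascading: devices sit side by side to fill the data bus
    # and share a single chip select.
    devices_per_rank = bus_width_bits // device_width_bits
    rank_capacity_gbit = devices_per_rank * device_capacity_gbit
    # Depth cascading: every rank adds capacity and needs its own CS_n.
    total_gbyte = rank_capacity_gbit * ranks / 8
    return devices_per_rank, ranks, total_gbyte


# Eight 4Gb x8 devices fill a 64-bit bus; two ranks double the capacity.
print(cascade_layout(64, 8, 4, ranks=2))
```

Switching to ×4 devices of the same capacity doubles the number of devices per rank, which is the width cascading trade-off described above.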
DDR memory works on the principle of burst operation, with a burst length of 8 or a chopped burst of 4. Implementing a read or write operation involves a long list of signals, all working together. But from the 30,000-foot view, there are two main steps for a general read or write: the ACTIVATE (ACT) command and the READ/WRITE command. The ACT command starts with the ACT_n and CS_n signals set low.
The address bits, along with a Read or Write command, are then used to select the column for the burst operation. This step is CAS, or Column Address Strobe. Since each bank has only one set of sense amplifiers, it is necessary to deactivate the first row before moving to a second. This is done using the PRECHARGE command. There are also convenient substitutes, the RDA (Read with Auto Precharge) and WRA (Write with Auto Precharge) commands, which take care of the deactivation automatically once the burst completes. The A10 bit is overloaded to indicate auto precharge.
We talked about the Activate, Precharge, Read and Write commands. These are selected using a truth table, which takes its inputs from the CS_n, ACT_n, RAS_n, CAS_n, WE_n and A10 signals. The table is given in JEDEC spec JESD79-4C.
As mentioned earlier, the first step for either a read or a write is to send the ACT command. The value on the address bus indicates the row address. Next, the RDA command is issued. The value on the address bus at this moment indicates the column address. The difference between the read and the write is that the write issues two writes. The first one is to a column address and the second one is to that column address + 8. Because we are already in the row, we don’t need to re-issue the ACT command. Finally, a WRA command is issued. Figure 5 shows a summary of this high-level view of a read/write. For more detailed waveforms, refer to the JEDEC spec.
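The rule that a bank holds only one open row at a time can be illustrated with a toy Python model. The class `Bank` and its methods are hypothetical and ignore all timing parameters. They only track which commands a controller would have to issue.

```python
class Bank:
    """Toy model of one DRAM bank: at most one row is open at a time."""

    def __init__(self):
        self.open_row = None   # None means the bank is precharged (idle)
        self.log = []          # commands issued so far, in order

    def activate(self, row):
        if self.open_row is not None:
            raise RuntimeError("PRECHARGE required before opening another row")
        self.open_row = row
        self.log.append(("ACT", row))

    def precharge(self):
        self.open_row = None
        self.log.append(("PRE", None))

    def read(self, row, col, auto_precharge=False):
        if self.open_row != row:
            if self.open_row is not None:
                self.precharge()   # close the row that is currently open
            self.activate(row)
        self.log.append(("RDA" if auto_precharge else "RD", col))
        if auto_precharge:
            self.open_row = None   # the row closes by itself after the burst


bank = Bank()
bank.read(row=3, col=0)                       # ACT 3, RD 0
bank.read(row=3, col=8)                       # same row: RD 8 only
bank.read(row=5, col=0, auto_precharge=True)  # PRE, ACT 5, RDA 0
print(bank.log)
```

The second read costs a single command because the row is already open, while the third needs a PRECHARGE and a new ACT first.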
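A simplified decoder for that truth table might look like the following Python sketch. It covers only the main commands and deliberately ignores CKE, burst chop and the other address pins, so treat it as a reading aid rather than a complete rendering of the JEDEC table. The function name `decode_command` is made up for this example, and signals are passed as 0 (low) or 1 (high).

```python
def decode_command(cs_n, act_n, ras_n, cas_n, we_n, a10):
    """Decode a simplified subset of the DDR4 command truth table."""
    if cs_n:
        return "DES"    # chip not selected: device deselect
    if not act_n:
        return "ACT"    # RAS_n/CAS_n/WE_n carry row address bits here
    table = {
        (0, 0, 0): "MRS",                        # mode register set
        (0, 0, 1): "REF",                        # refresh
        (0, 1, 0): "PREA" if a10 else "PRE",     # precharge all / one bank
        (1, 0, 0): "WRA" if a10 else "WR",       # write, A10 = auto precharge
        (1, 0, 1): "RDA" if a10 else "RD",       # read, A10 = auto precharge
        (1, 1, 0): "ZQCL" if a10 else "ZQCS",    # ZQ calibration long / short
        (1, 1, 1): "NOP",
    }
    return table.get((ras_n, cas_n, we_n), "RFU")  # reserved encoding


print(decode_command(cs_n=0, act_n=1, ras_n=1, cas_n=0, we_n=1, a10=1))
```

The same three control pins mean different things depending on ACT_n, and A10 turns a plain Read or Write into its auto precharge variant.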
DRAM INITIALIZATION AND CALIBRATION
When a board containing DDR4 DRAM is powered up, the power ramp-up step involves multiple operations to initialize and calibrate the DRAM so that it can handle external conditions like board delays, temperature and so on. To cope with these, a DRAM needs to undergo initialization, training and calibration. The calibration of DDR4 involves a state machine, which can be referenced from the JEDEC spec. To simplify things here, we’ll look at how this works conceptually. In summary, initialization consists of four phases.
1) Power up
2) ZQ calibration
3) Vref DQ calibration
4) Memory training
Figure 6 shows a summary flow chart of how the complete initialization and calibration happens in a DDR4 DRAM. Although there are hundreds more minute steps that we won’t cover in this article, the flow chart provides the big picture.
ZQ calibration is related to the DQ pins (the data pins). DQ pins are bidirectional and responsible for handling complete data transactions. Every DQ pin has a DQ calibration block, which connects to the external world with a pull-down resistor that is externally programmable. This resistor value is nominally 240Ω. However, because of material tolerance and external factors like temperature, the actual value varies. ZQ calibration makes sure that this resistor is programmed taking all those external factors into consideration. Having parallel 240Ω resistors also enables users to tune the drive strength and the termination resistance. This helps signal integrity on different PCBs.
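The effect of switching 240Ω legs in parallel is simple arithmetic, sketched below in Python. The leg counts shown correspond to commonly quoted DDR4 settings (RZQ/7 and RZQ/5 for drive strength, RZQ/4 and others for termination). The helper name `parallel_legs` is an assumption for this example.

```python
RZQ_OHMS = 240.0   # nominal value of one calibrated resistor leg


def parallel_legs(n_legs):
    """Effective resistance of n identical 240-ohm legs enabled in parallel."""
    if n_legs < 1:
        raise ValueError("at least one leg must be enabled")
    return RZQ_OHMS / n_legs


for n in (1, 2, 4, 5, 7):
    print(f"RZQ/{n} = {parallel_legs(n):.1f} ohm")
```

Because every leg is calibrated against the same reference, each of these derived values tracks temperature and process variation together.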
DDR4 uses POD (pseudo open drain) termination rather than the SSTL (stub series terminated logic) used by DDR3. This improves signal integrity at high speeds and saves power. In DDR4, the receiver is terminated to VDDQ and compares the incoming signal against an internally generated voltage reference. The receiver therefore decides the logic threshold based on the voltage reference value. This voltage reference is called VrefDQ, and it can be set using mode register MR6.
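As a rough illustration of how a mode-register setting might map to a reference voltage, the Python sketch below assumes the two commonly cited MR6 ranges (range 1 starting at 60% of VDDQ, range 2 starting at 45%, each in 0.65% steps with codes 0 to 50). Check these numbers against the JEDEC spec before relying on them. The function name `vref_dq_volts` is hypothetical.

```python
VDDQ_VOLTS = 1.2
RANGE_BASE_PERCENT = {1: 60.0, 2: 45.0}   # assumed start of each MR6 range
STEP_PERCENT = 0.65                       # assumed step size per code


def vref_dq_volts(range_sel, code):
    """Reference voltage for a given range selection and step code."""
    if range_sel not in RANGE_BASE_PERCENT or not 0 <= code <= 50:
        raise ValueError("unsupported range or code")
    percent = RANGE_BASE_PERCENT[range_sel] + code * STEP_PERCENT
    return VDDQ_VOLTS * percent / 100.0


print(round(vref_dq_volts(1, 0), 3), round(vref_dq_volts(2, 50), 3))
```

During Vref DQ calibration, the controller would sweep codes like these and keep the one that gives the widest data eye.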
Although the initial calibration is complete, the alignment of the clock, delays and so forth still needs to be done. This step is Read/Write Training. Data and Data Strobe signals can be connected over traces of different lengths on the board to different memory elements in DIMMs. It is necessary to train DDR DRAMs so that these length delays are taken into account. Not only the Data signals but also the clock delays need to be aligned so that the data eye is centered. The DRAM controller sends a series of tDQS pulses to delay the signal and center the data. Figure 7 illustrates this concept with a waveform from Micron Technology’s DDR4 DRAM.
A write-followed-by-read transaction can be observed in Figure 8. This waveform is referenced from Micron’s MT40A4G4 series DDR4 DRAM datasheet. There are many control signals that we haven’t discussed in this article, but you can reference them from the JEDEC spec or from various DDR4 datasheets like those from Micron Technology.
In this article, we looked at the various DDR4 specifications, at both the system and the protocol level. We examined how the data and control flow happens, which in turn results in a successful write and read. We also discussed the ways in which a DDR DRAM calibrates itself to cope with the delays and jitter arising from external factors.
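The idea of centering the data eye can be sketched in Python as a sweep over delay taps. The pass/fail list below is made-up data, and `find_eye_center` is a hypothetical helper. A real controller performs this search in hardware or firmware against a known training pattern.

```python
def find_eye_center(passes):
    """Return the middle tap of the longest run of passing taps, or None."""
    best_start, best_len = None, 0
    start = None
    # A trailing False acts as a sentinel that closes a run ending at the last tap.
    for i, ok in enumerate(list(passes) + [False]):
        if ok and start is None:
            start = i                      # a passing window opens here
        elif not ok and start is not None:
            if i - start > best_len:       # keep the widest window seen so far
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


# Hypothetical sweep: taps 2 through 6 read the training pattern correctly.
sweep = [False, False, True, True, True, True, True, False, False, True]
print(find_eye_center(sweep))
```

Choosing the middle of the widest passing window leaves equal margin on both sides, which is what centering the eye means in practice.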
RESOURCES
JEDEC spec: https://www.jedec.org/category/technology-focus-area/main-memory-ddr3-ddr4-sdram
Micron datasheet: https://www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr4/16gb_ddr4_sdram.pdf
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • FEBRUARY 2021 #367