The Backbone Protocol
AMBA is an open standard for the connection and management of functional blocks in an SoC. In this article, Nishant reviews the history of AMBA, and then focuses on one of AMBA’s protocol specs, AXI4, and how it improves performance and bandwidth.
An open standard for on-chip interconnect specifications, the Arm Advanced Microcontroller Bus Architecture (AMBA) defines how the functional blocks in an SoC are connected and managed. AMBA enables efficient IP (intellectual property) reuse and faster design turnaround with fewer human errors. It offers the flexibility to connect blocks built to various sets of specifications, thereby enabling compatibility between different vendors.
Various AMBA standards are defined based on the performance that they offer. That performance is measured in terms of the bandwidth and latency improvements they bring to the system. In this article, we will examine these protocol features. This is collective information that I have gathered from various white papers and Arm datasheets. Note that, for this article, I’m focusing on the AXI4 AMBA specification. Other versions of AXI may offer more or fewer features than AXI4, but the overall concepts remain the same.
AMBA was introduced to the world in the late 1990s, starting with the low-speed Advanced Peripheral Bus (APB) (Figure 1) and the Advanced System Bus (ASB) (not shown). APB is widely used today for low-bandwidth interfaces such as register accesses and I2C. Around 1999, Arm introduced the Advanced High-Performance Bus (AHB). This is a clock-edge protocol containing address and data phases with bus muxes and a larger bus width. Around 2003, Advanced Extensible Interface-3 (AXI3) was introduced along with the Advanced Trace Bus (ATB). In 2010 came AXI4, AXI4 Lite and AXI4 Stream, along with ACE Lite. (ACE stands for AXI Coherency Extensions.) Those were much more suitable as the backbone protocol interfaces for FPGAs and networking. ACE brought the concept of cache coherency to AXI. In 2014, CHI (Coherent Hub Interface) was introduced for cache coherency and improved congestion handling. The development of CHI continues even today, with CHI.C as the most recent version to date.
Terminology Note: Although the terms “master” and “slave” have long been used in the electronics industry, those terms are discouraged these days for obvious and valid social reasons. The industry as a whole has not yet come to any widespread agreement on replacement terms. However, for this article we’ve decided to follow the updated documentation on developer.arm.com and use the term “manager” to replace “master,” and use the term “subordinate” to replace “slave.”
AXI defines the protocol for the interface but not the interconnect. All AXI connections are between manager and subordinate interfaces. The manager and subordinate sides of the protocol are similar in nature, which makes the protocol easier to validate. The interconnect handles protocol conversions and other features such as bit mapping, width mapping and so forth. As a result, it’s easy to create a system that combines an AXI3 subordinate with an AXI4 manager by using an interconnect. As shown in Figure 2, the AXI protocol is built on five basic channels that carry the control and data signals: Write Address, Write Data, Write Response, Read Address and Read Data.
Write Address tells the subordinate which memory address is to be written. Write Data carries the data to be written there. Similarly, Read Address and Read Data carry the address of the location to be read and its data content, respectively. The read response travels along with Read Data, whereas the write response is returned on a separate Write Response channel. Since writes and reads use independent channels, bandwidth is improved because reads and writes can happen simultaneously.
Now let’s look at the AXI4 protocol in a little more detail. A write or read can transfer a single data item or multiple data items in a burst. Figure 3 shows a single-data write transaction. The manager initiates the transaction by asserting AWVALID, and the address is consumed when the subordinate asserts AWREADY. There are multiple ways in which the handshake can happen. Ready can be always high, or Ready can come before Valid, or it can come after Valid. In every case the transfer takes place in the clock cycle where both signals are high. Which pattern is used depends on how the system has been designed and how the other IP expects the handshake to happen.
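The three handshake patterns described above can be checked with a short Python sketch. It is only an illustration: the function name handshake_cycles and the idea of listing one 0/1 sample per clock cycle are assumptions made for this example, not part of the AXI specification.

```python
def handshake_cycles(valid, ready):
    """Return the clock cycles in which a VALID/READY transfer completes.

    valid, ready: per-cycle samples (0 or 1) of a signal pair such as
    AWVALID/AWREADY, as seen at each rising clock edge (hypothetical input).
    """
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


# READY always high: the transfer completes as soon as VALID is asserted.
print(handshake_cycles([0, 1, 0, 0], [1, 1, 1, 1]))  # [1]

# READY asserted before VALID.
print(handshake_cycles([0, 0, 1, 0], [0, 1, 1, 0]))  # [2]

# VALID asserted before READY: VALID is held until READY arrives.
print(handshake_cycles([1, 1, 1, 0], [0, 0, 1, 0]))  # [2]
```

In all three cases the information moves only in the cycle where both signals are sampled high, which is why the order in which they rise does not matter to the protocol.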
Once the address handshake is completed, the WVALID and WREADY handshake happens in a similar fashion, after which the WDATA is consumed by the subordinate. Because this is a single data transfer, the manager asserts WLAST along with it. Finally, the subordinate responds to the write on the response channel (B channel) by asserting BVALID with an OKAY response. BREADY is already asserted by the manager. For multiple data, WLAST is an important signal. There is one WVALID/WREADY handshake for each data transfer, and the manager asserts WLAST with the last one to indicate the end of the data to be transferred.
Now let’s examine the Read Transaction behavior for single data. As mentioned earlier, the write and read transactions are largely symmetric. The only difference is that the read response travels with the read data as RRESP, rather than on a channel of its own. Figure 4 shows the Read Transaction for a single data. As seen in the waveform, ARADDR contains the address that the manager wants to read. When ARVALID is asserted by the manager and ARREADY is asserted by the subordinate, the subordinate accepts the request and starts the signaling process.
Now the subordinate transfers the data to the manager on RDATA by asserting RVALID. Since RREADY is already asserted by the manager, the data is received. Because this is a single data transfer, RLAST is asserted with it. In a burst of multiple transfers, RLAST is asserted only with the final one. RRESP is the response signal on the read data channel, which indicates whether the transaction was OKAY.
MORE CHANNEL ATTRIBUTES
Apart from the generic signals, there are other important attributes of the protocol. To identify the burst length, transfer size and burst type, we have AxLEN[7:0], AxSIZE[2:0] and AxBURST[1:0] (x = R or W). AxLEN defines the number of data transfers in each burst transaction, encoded as the number of transfers minus one. For AXI4, the number of data transfers varies from 1 to 256. AxSIZE defines the number of bytes in each transfer as a power of two, which varies from 1 to 128 bytes per transfer.
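As a rough illustration of how these two fields are interpreted, the following Python sketch decodes them into a transfer count and a byte count. The function name burst_shape is made up for this example. The encodings assumed are the usual ones: the number of transfers is AxLEN + 1, and the bytes per transfer is 2 raised to the power AxSIZE.

```python
def burst_shape(axlen, axsize):
    """Decode AxLEN[7:0] and AxSIZE[2:0].

    Returns (transfers, bytes_per_transfer, total_bytes).
    """
    if not 0 <= axlen <= 0xFF:
        raise ValueError("AxLEN is an 8-bit field")
    if not 0 <= axsize <= 0b111:
        raise ValueError("AxSIZE is a 3-bit field")
    transfers = axlen + 1            # 0 means one transfer, 255 means 256
    bytes_per_transfer = 1 << axsize  # 0 means 1 byte, 7 means 128 bytes
    return transfers, bytes_per_transfer, transfers * bytes_per_transfer


print(burst_shape(0, 0))   # (1, 1, 1)
print(burst_shape(15, 3))  # (16, 8, 128)
```

The sketch only decodes the fields. It does not check the other constraints a real burst has to meet, such as the width of the data bus.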
The AXI4 protocol defines three burst types: FIXED (00), INCR (01) and WRAP (10). In FIXED mode, the address is the same for every transfer of the burst. This is used for loading and emptying FIFOs, for example. The length of the burst varies from 1 to 16 transfers. In INCR mode, the address of each transfer is an increment of the previous one, and the length varies from 1 to 256 for AXI4. Unaligned transfers are supported in this mode. Finally, WRAP mode increments the address in the same way as INCR except that, once the upper wrap boundary is reached, the address wraps around to a lower address. The length of the burst is limited to 2, 4, 8 or 16 transfers, and the start address must be aligned to the transfer size. WRAP is mostly used for cache operations.
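To make the three burst types concrete, here is a minimal Python sketch that lists the address of every transfer in a burst. The function name burst_addresses and its argument order are assumptions for illustration. The wrap boundary is taken to be the start address rounded down to a multiple of the total burst size (bytes per transfer times number of transfers).

```python
FIXED, INCR, WRAP = 0b00, 0b01, 0b10


def burst_addresses(start, axlen, axsize, axburst):
    """Return the list of transfer addresses for one burst."""
    n_bytes = 1 << axsize   # bytes per transfer
    length = axlen + 1      # number of transfers
    if axburst == FIXED:
        if length > 16:
            raise ValueError("FIXED bursts are limited to 16 transfers")
        return [start] * length
    aligned = (start // n_bytes) * n_bytes
    if axburst == INCR:
        # The first transfer may be unaligned; the rest are aligned.
        return [start] + [aligned + i * n_bytes for i in range(1, length)]
    if axburst == WRAP:
        if length not in (2, 4, 8, 16):
            raise ValueError("WRAP bursts must be 2, 4, 8 or 16 transfers")
        if start != aligned:
            raise ValueError("WRAP bursts must start aligned")
        span = n_bytes * length          # total bytes covered by the burst
        lower = (start // span) * span   # lower wrap boundary
        return [lower + (start - lower + i * n_bytes) % span
                for i in range(length)]
    raise ValueError("reserved AxBURST encoding")


# Four 4-byte transfers starting at 0x34, wrapping inside 0x30..0x3F.
print([hex(a) for a in burst_addresses(0x34, 3, 2, WRAP)])
# ['0x34', '0x38', '0x3c', '0x30']
```

The WRAP example shows why this mode suits cache line fills: the burst starts at the word the processor needs first and still covers the whole line.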
Another important attribute of AXI4 is AxPROT[2:0], which is responsible for protecting against illegal accesses to data. Bit 0 identifies privileged or unprivileged access. Bit 1 indicates secure or non-secure access. The last bit, bit 2, indicates whether the access is an instruction or a data access. There is also a provision for cache management in AXI4 using the AxCACHE[3:0] attribute. Bit 0 of the attribute is the “bufferable” bit, which defines whether the response must come from the final destination or may come from somewhere in between. Bit 1 identifies whether the transaction is modifiable (called cacheable in AXI3). Bit 2 is the read-allocate hint and bit 3 is the write-allocate hint. They indicate whether a cache line should be allocated when a read or a write, respectively, misses in the cache.
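The AxPROT bit assignments can be summarized in a few lines of Python. This is only a sketch: the function name decode_axprot and the dictionary keys are invented for the example, and the polarity assumed is that a 1 means privileged, non-secure and instruction, respectively.

```python
def decode_axprot(axprot):
    """Split AxPROT[2:0] into its three access attributes."""
    return {
        "privileged": bool(axprot & 0b001),   # bit 0
        "non_secure": bool(axprot & 0b010),   # bit 1
        "instruction": bool(axprot & 0b100),  # bit 2
    }


# 0b010: an unprivileged, non-secure data access.
print(decode_axprot(0b010))
# {'privileged': False, 'non_secure': True, 'instruction': False}
```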
The write data strobe signal tells the subordinate which byte lanes of the data bus hold valid data. It is indicated as WSTRB[x:0], with one bit per byte of WDATA. Let’s consider the example of a 64-bit WDATA, which has an 8-bit WSTRB, where only the lowest three bytes carry valid data. In that case, WSTRB should be 0x07.
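The strobe example above can be reproduced with a small Python sketch. Both helper names, wstrb_for and apply_write, are hypothetical. The second one models, in a simplified way, how a subordinate might merge only the strobed byte lanes into the value it already holds.

```python
def wstrb_for(first_byte, num_bytes, bus_bytes=8):
    """Build a WSTRB value with num_bytes lanes set, starting at first_byte."""
    if first_byte < 0 or num_bytes < 0 or first_byte + num_bytes > bus_bytes:
        raise ValueError("byte range does not fit on the data bus")
    return ((1 << num_bytes) - 1) << first_byte


def apply_write(old, wdata, wstrb, bus_bytes=8):
    """Merge wdata into old, one byte lane at a time, wherever WSTRB is 1."""
    result = old
    for lane in range(bus_bytes):
        if (wstrb >> lane) & 1:
            mask = 0xFF << (8 * lane)
            result = (result & ~mask) | (wdata & mask)
    return result


strobe = wstrb_for(0, 3)  # lowest three bytes of a 64-bit bus
print(hex(strobe))        # 0x7
print(hex(apply_write(0xFFFFFFFFFFFFFFFF, 0x1122334455667788, strobe)))
# 0xffffffffff667788
```

Only the three low bytes of the stored value change. The upper five bytes of WDATA are ignored because their strobe bits are 0.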
AxQOS, AxREGION and AxUSER are other attributes. They are used to set QoS priority, to set up regions in the system address map and to transfer user-defined control information, respectively.
AXI provides an ID for its channels, namely AWID, BID, ARID and RID. (AXI3 also has a WID signal on the write data channel, which AXI4 removed.) These IDs make it possible to issue independent transactions that complete out of order, thus improving performance. A Transaction Ordering mechanism helps maintain the data flow and prevent congestion. There are three main ordering rules: 1) All write data transfers must be issued in the same order as their addresses; 2) Transactions with different IDs can complete in any order; and 3) A manager can have multiple outstanding transactions, but those that share the same ID must complete in order.
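The second and third ordering rules can be expressed as a simple check, sketched below in Python. The function name ordering_ok and the use of (ID, tag) tuples are assumptions for illustration. The tag is just a label that lets the example tell transactions apart, not an AXI signal.

```python
from collections import defaultdict, deque


def ordering_ok(issued, completed):
    """Check that transactions sharing an ID complete in issue order.

    issued, completed: lists of (axi_id, tag) tuples in time order.
    Transactions with different IDs may complete in any order.
    """
    pending = defaultdict(deque)
    for axi_id, tag in issued:
        pending[axi_id].append(tag)
    for axi_id, tag in completed:
        # The oldest outstanding transaction for this ID must finish first.
        if not pending[axi_id] or pending[axi_id].popleft() != tag:
            return False
    return True


issued = [(0, "a"), (1, "b"), (0, "c")]
print(ordering_ok(issued, [(1, "b"), (0, "a"), (0, "c")]))  # True
print(ordering_ok(issued, [(0, "c"), (0, "a"), (1, "b")]))  # False
```

In the first case the ID 1 transaction overtakes the others, which is allowed. In the second case two transactions with ID 0 swap places, which is not.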
Atomic Access is defined in AMBA as a provision by which a particular memory region can be accessed without being corrupted by other write operations. Atomic Access has two types: Locked access and Exclusive access. In Locked access, while one manager is performing an atomic access, requests from all other managers are held off. In Exclusive access, other managers are not blocked from the subordinate. Instead, the subordinate monitors the memory region claimed by the manager that requested atomic access, and that manager’s exclusive write succeeds only if the region has not been written in the meantime. Note that Locked access is not supported in AXI4.
Next, let’s discuss how an Exclusive transaction happens in atomic access. Consider a scenario where a manager accesses address 0xABCD, and the Exclusive Access Monitoring hardware in the subordinate can track IDs 0 and 1. First, the manager sends an exclusive read with ID 0. The monitor records address 0xABCD in its table against ID 0, and the read returns EXOKAY. The manager then issues another exclusive read of 0xABCD, this time with ID 1. This is also recorded, and it too returns EXOKAY. Next, the manager issues an exclusive write of 0x3 to address 0xABCD with ID 0. Since 0xABCD is present in the table for that ID, the write updates memory and the monitor responds with EXOKAY. Because the content of the location has now changed, the remaining table entry for 0xABCD is no longer valid. When another exclusive write to 0xABCD arrives with ID 1, the subordinate responds with OKAY instead of EXOKAY because there is no valid entry in the table. That write does not update memory, so the whole sequence has to be restarted with a new exclusive read. AxLOCK is the signal that indicates whether an access is Exclusive (1) or normal (0).
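The exclusive access sequence can be modeled with a few lines of Python. This is a deliberately simplified sketch, not a description of any real monitor. The class name ExclusiveMonitor, its method names and the choice to track one address per ID are all assumptions made for illustration.

```python
class ExclusiveMonitor:
    """Toy model of a subordinate with exclusive access monitoring."""

    def __init__(self):
        self.memory = {}        # address -> stored value
        self.reservations = {}  # AXI ID -> monitored address

    def read(self, axi_id, addr, exclusive=False):
        if exclusive:
            self.reservations[axi_id] = addr  # start monitoring
        resp = "EXOKAY" if exclusive else "OKAY"
        return self.memory.get(addr, 0), resp

    def write(self, axi_id, addr, data, exclusive=False):
        if exclusive and self.reservations.get(axi_id) != addr:
            return "OKAY"  # exclusive write failed, memory is not updated
        self.memory[addr] = data
        # Any write that updates addr invalidates every reservation on it.
        self.reservations = {i: a for i, a in self.reservations.items()
                             if a != addr}
        return "EXOKAY" if exclusive else "OKAY"


mon = ExclusiveMonitor()
print(mon.read(0, 0xABCD, exclusive=True)[1])        # EXOKAY
print(mon.read(1, 0xABCD, exclusive=True)[1])        # EXOKAY
print(mon.write(0, 0xABCD, 0x3, exclusive=True))     # EXOKAY
print(mon.write(1, 0xABCD, 0x4, exclusive=True))     # OKAY
print(hex(mon.memory[0xABCD]))                       # 0x3
```

The last two lines show the failure case: the second exclusive write gets OKAY rather than EXOKAY, and memory still holds the value written by the first one.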
The AXI4 protocol is subdivided into AXI4 Lite, full AXI4 and AXI4 Stream (AXIS). AXIS contains only the basic Valid, Ready and Data signals, with the other attributes treated as sideband signals. AXI4 Stream is not memory mapped. Therefore, it is used mostly for sending continuous data in computations. AXI4 Lite doesn’t support burst transfers, and it doesn’t support Exclusive accesses either.
In this article, we examined the various protocol attributes of AMBA AXI4. We learned how performance and bandwidth are improved in AXI4 and various ways in which data congestion is prevented.
Arm Developer | https://developer.arm.com
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • MAY 2021 #370