Recently I needed to code a fast data logger that could log the state of an embedded system into RAM in real-time, for later dumping and analysis off-line. The log messages were of varying length and contained a mixture of ASCII text and raw byte data. To use the memory efficiently, I wanted to packetize each message by inserting an “end of packet” (EoP) delimiter. This way I could tell where one record ended and the next one started when dumping them out for analysis.
Herein lies a problem. Since the bytes in the message data could take any value between 0x00 and 0xFF, what can we use as an EoP token that we can unambiguously distinguish from a valid data byte? Let’s use a concrete example – say we choose 0x00 to be our EoP token. How do we distinguish between a zero byte in the data payload and the zero byte signifying the end of the packet?
One common way to do this is to arbitrarily select an escape value (say as 0x55) and use this to signal that the following character has special significance. In our example, whenever we encounter an 0x00 in the payload we change it to the sequence 0x55 0x00 and whenever we encounter an 0x55 in the payload we encode it as 0x55 0x55. Now when we decode the packet and find an 0x55 we know the value immediately following it is a part of the data and not the EoP token. Figure 1 shows how this works.
Encoding a message using escape sequences (shown in red) allows us to distinguish between a zero byte in the body of the message and the zero byte “end of packet” message (in the grey box). Any zero byte not preceded by an escape character must be the EoP byte. We also need to escape the escape character so we can use it in the body of the message. This encoding method creates a variable overhead which could double the length of the message in the worst case.
We have eliminated the ambiguity by “stuffing” additional escape bytes, shown in red, into the payload. Note that the number of bytes added (the overhead) is variable and is dependent on the message contents. In the worst case the overhead could be equal to the message size – for an all zero message for example. This is obviously not very efficient.
Consistent overhead byte stuffing (COBS) provides a better approach. As its name suggests the amount of overhead required is consistent regardless of the message content. In fact, it requires a maximum of one overhead byte for each 254 characters or part thereof. Figure 2 shows how it works.
Constant Overhead Byte Stuffing (COBS) encoding adds a maximum of one additional character per 254 bytes of data and guarantees the message will not contain any zero bytes, so the end of packet is easily identified. The overhead byte at the beginning of the message indicates the count of bytes to the next zero byte position (position 3 in this example). That zero byte is replaced the count of bytes to the next zero byte and so on. The last stuffed byte points to the position at the end of the message where the end of packet character will be appended.
An overhead byte, shown in red text in the figure, is positioned at the start of the payload. If there are no zero bytes in the message, the value of this byte is one greater than the message length (effectively the index of the EoP zero byte). If there are zero bytes, as for these examples, the overhead byte contains the position of the first one counting from its location. This byte is then replaced by the position of the next zero byte, or the EoP zero if there are no more in the payload. The arrows in the diagram show how this works.
— ADVERTISMENT—
—Advertise Here—
If the message contains more than 254 consecutive bytes of non-zero characters like that in Figure 3, the initial overhead byte is set to 0xFF and a new overhead byte is inserted after the 254th character. In this case the overhead byte is additional and does not replace a value in the unencoded message. After this the encoding continues as before.
If the message has more than 254 consecutive non-zero bytes an addition overhead byte is required as shown here. The initial overhead byte is set to 0xFF which indicates the position count to the next stuffed byte (0x06 in red text in this case). In this case the stuffed byte does not replace the message data as we would do for a zero byte. Encoding continues as before with each stuffed byte pointing to the next zero byte and so on.
The table below shows the encoded values for several messages to give you an ideal how they will look. Red text indicates additional overhead bytes, blue text indicates bytes replacing zero bytes and the shaded text is the EoP token.
You will see that COBS encoding guarantees that the body of the encoded message never contains a zero byte. This means that a zero byte can be inserted at the end of each packet, and this will unambiguously identify the end of one packet and the start of the next.
Implementing COBS encoding in code is relatively simple as shown in the Code Snippet. To encode the data, the program first creates a pointer to the location of the initial “stuffed” byte, then iterates through the input buffer copying non-zero bytes and counting them to create the value to “stuff”. If a zero byte is encountered, or the stuff count reaches 255, the stuff count is stored in the stuff byte location and the stuff counter reset. The stuff pointer is set to the current location and the process continues until the input buffer is exhausted. Finally, the end of packet token is added.
Decoding is similar. First, we set the stuff count value to zero since the first byte will always be a stuffed byte and we set the original count to 255 since the first position is an overhead byte not a zero in the original data. The program then iterates through the encoded data copying it into the output buffer and decrementing the stuff count until it reaches zero. If the original overhead byte was less than 255 a zero byte is inserted in the output buffer, otherwise we do nothing since this must be an overhead byte. In either case, the new value of stuff count is stored, and the iteration process continues until the end of packet byte is reached.
References
Cheshire, Stuart, and Mary Baker. “Consistent Overhead Byte Stuffing,” no. 2 (1999): 15. http://www.stuartcheshire.org/papers/cobsforton.pdf
“Consistent Overhead Byte Stuffing.” In Wikipedia, May 12, 2022. https://en.wikipedia.org/w/index.php?title=Consistent_Overhead_Byte_Stuffing&oldid=1087394483.
GDB online Debugger. “Online C Compiler – Online Editor.” Accessed September 27, 2022. https://www.onlinegdb.com/online_c_compiler
Sponsor this ArticleAndrew Levido (andrew.levido@gmail.com) earned a bachelor’s degree in Electrical Engineering in Sydney, Australia, in 1986. He worked for several years in R&D for power electronics and telecommunication companies before moving into management roles. Andrew has maintained a hands-on interest in electronics, particularly embedded systems, power electronics, and control theory in his free time. Over the years he has written a number of articles for various electronics publications and occasionally provides consulting services as time allows.