Consistent Overhead Byte Stuffing

Stuart Cheshire and Mary Baker, Member, IEEE


Manuscript received July 29, 1998; approved by IEEE/ACM Transactions on Networking Editor J. Crowcroft. This work was supported in part by Rockwell International Science Center, FX Palo Alto Laboratory, AT&T/Lucent, under NSF Faculty Career Grant CCR-9501799, and by a grant from the Keio Research Institute at SFC, Keio University & the Information-Technology Promotion Agency of Japan, and by an Alfred P. Sloan Faculty Research Fellowship. This paper was presented in part at SIGCOMM ’97, Cannes, France, September 1997.

Stuart Cheshire is with Apple Computer, Cupertino, CA 95014 USA.

Mary Baker is with the Computer Science Department, Stanford University, Stanford, CA 94305 USA.

Publisher Item Identifier S 1063-6692(99)03826-1

 

 

1063–6692/99$10.00  ©  1999 IEEE


Abstract—Byte stuffing is a process that encodes a sequence of data bytes that may contain ‘illegal’ or ‘reserved’ values, using a potentially longer sequence that contains no occurrences of these values. The extra length is referred to here as the overhead of the encoding. To date, byte stuffing algorithms, such as those used by SLIP, PPP and AX.25, have been designed to incur low average overhead, but little effort has been made to minimize their worst-case overhead. However, there are some increasingly popular network devices whose performance is determined more by the worst case than by the average case. For example, the transmission time for ISM-band packet radio transmitters is strictly limited by FCC regulation. To adhere to this regulation, the current practice is to set the maximum packet size artificially low so that no packet, even after worst-case overhead, can exceed the transmission time limit.

This paper presents a new byte stuffing algorithm, called Consistent Overhead Byte Stuffing (COBS), which tightly bounds the worst-case overhead. It guarantees in the worst case to add no more than one byte in 254 to any packet. For large packets this means that their encoded size is no more than 100.4% of their pre-encoding size. This is much better than the 200% worst-case bound that is common for many byte stuffing algorithms, and is close to the information-theoretic limit of about 100.07%. Furthermore, the COBS algorithm is computationally cheap, and its average overhead is very competitive with that of existing algorithms.

Index Terms—Packet, Serial, Transmission, Framing, Byte stuffing

I.   Introduction

T

HE PURPOSE of byte stuffing is to convert data packets into a form suitable for transmission over a serial medium like a telephone line. When packets are sent over a serial medium there needs to be some way to tell where one packet ends and the next begins, particularly after errors, and this is typically done by using a special reserved value to indicate packet boundaries. Byte stuffing ensures, at the cost of a potential increase in packet size, that this reserved value does not inadvertently appear in the body of any transmitted packet. In general, some overhead (additional bytes transmitted over the serial medium) is inevitable if we are to perform byte stuffing without loss of information.

Current byte stuffing algorithms, such as those used by Serial Line IP (SLIP) [RFC1055], the Point-to-Point Protocol (PPP) [RFC1662] and AX.25 (Amateur Packet Radio) [ARRL84], have highly variable per-packet overhead. For example, using conventional PPP byte stuffing, the total overhead (averaged over a large number of packets) is typically 1% or less, but this overhead is not consistent for all packets. Some packets incur no overhead at all, while others incur much more; in principle some could increase in size by as much as 100%. High-level Data Link Control (HDLC) uses a bit stuffing technique [ECMA-40] that has a worst-case expansion of only 20%, but bit-oriented algorithms are often harder to implement efficiently in software than byte-oriented algorithms.

While large variability in overhead may add jitter and unpredictability to network behavior, this problem is not fatal for PPP running over a telephone modem. An IP host [RFC791] using standard PPP encapsulation [RFC1662] need only make its transmit and receive buffers twice as large as the largest IP packet it expects to send or receive. The modem itself is a connection-oriented device and is only concerned with transmitting an unstructured stream of byte values. It is unaware of the concept of packet boundaries, so unpredictability of packet size does not directly affect modem design.

In contrast, new devices are now becoming available, particularly portable wireless devices, that are packet-oriented, not circuit-oriented. Unlike telephone modems these devices are aware of packets as distinct entities and consequently impose some finite limit on the maximum packet size they can support. Channel-hopping packet radios that operate in the unlicensed ISM (Industrial/Scientific/Medical) bands under the FCC Part 15 rules [US94-15] are constrained by a maximum transmission time that they may not exceed. If the overhead of byte stuffing unexpectedly doubles the size of a packet, it could result in a packet that is too large to transmit legally. Using conventional PPP encoding [RFC1662], the only way to be certain that no packets will exceed the legal limit is to set the IP maximum transmission unit (MTU) to half the device’s true MTU, despite the fact that it is exceedingly rare to encounter a packet that doubles in size when encoded.

Halving the MTU can significantly degrade the performance seen by the end-user. For example, assuming one 40-byte TCP acknowledgement for every two data packets [RFC1122], a Metricom packet radio with an MTU of 1000 bytes achieves a maximum end-user throughput of 5424 bytes per second [Che96]. Halving the MTU to 500 bytes reduces the maximum end-user throughput to 3622 bytes per second, a reduction of roughly 33%, which is a significant performance penalty.

Although, as seen from experiments presented in this paper, packets that actually double in size rarely occur naturally, it is not acceptable to ignore the possibility of their occurrence. Without a factor-of-two safety margin, the network device would be open to malicious attack through artificially constructed pathological packets. An attacker could use such packets to exploit the device’s inability to send and/or receive worst-case packets, causing mischief ranging from simple denial of service attacks to the much more serious potential for exploiting array-bounds errors such as the one in the Unix ‘finger’ dæmon exploited by the infamous Internet Worm of 1989 [RFC1135].

One way to avoid the performance degradation of halving the MTU might be to set the IP MTU closer to the underlying device MTU, and then use some kind of packet fragmentation and reassembly to handle those packets that, when encoded, become too large to send. Packet fragmentation is not without cost though, as Chris Bennett reported in [Ben82] where he wrote, “As well as the global overheads such as endpoint delay, fragmentation generates overheads for other components of the system — notably the intermediate gateways.”

One kind of packet fragmentation is link-layer fragmentation (‘transparent fragmentation’). While this could work, requiring link-layer software to perform fragmentation and reassembly is a substantial burden to impose on device driver writers. In [Kent87], Kent and Mogul wrote “… transparent fragmentation has many drawbacks … gateway implementations become more complex and require much more buffer memory …” Fragmentation and reassembly also provides more opportunities for software bugs [CA-96.26] and adds protocol overhead, because the link-layer packet headers have to contain additional fields to support the detection and reassembly of fragments at the receiving end.

A second way to use fragmentation would be to avoid adding a new layer of fragmentation and reassembly at the device level, and instead to use IP’s own existing fragmentation and reassembly mechanisms, but this also has problems. One problem is that in current networking software implementations IP fragmentation occurs before the packet is handed to the device driver software for transmission. If, after encoding by the device driver, a particular packet becomes too large to send, there is no mechanism for the driver to hand the packet back to IP with a message saying, “Sorry, I failed, can you please refragment this packet into smaller pieces and try again?” Also, some software goes to great lengths to select an optimal packet size. It would be difficult for such software to cope with an IP implementation where the MTU of a device varies for every packet. Finally, it is unclear, if we were to create such a mechanism, how it would handle the IP header’s ‘Don’t Fragment’ bit without risking the creation of path MTU discovery [RFC1191] ‘black holes’.

All these problems could be avoided if we used a byte stuffing algorithm that did not have such inconsistent behavior. This paper presents a new algorithm, called Consistent Overhead Byte Stuffing (COBS), which can be relied upon to encode all packets efficiently, regardless of their contents. It is computationally cheap, easy to implement in software, and has a worst-case performance bound that is better even than HDLC’s bit stuffing scheme. All packets up to 254 bytes in length are encoded with an overhead of exactly one byte. For packets over 254 bytes in length the overhead is at most one byte for every 254 bytes of packet data. The maximum overhead is therefore roughly 0.4% of the packet size, rounded up to a whole number of bytes.

Using Consistent Overhead Byte Stuffing, the IP MTU may be set as high as 99.6% of the underlying device’s maximum packet size without fear of any packet inadvertently exceeding that limit. Thus COBS provides better end-user performance than existing mechanisms, because COBS enables a given piece of hardware to send larger IP packet payloads.

All the data stuffing algorithms discussed in this paper, including COBS, have execution times that are O(n) with respect to the size of the data set being stuffed. The constant factor varies depending on the algorithm and the implementation, but all the algorithms execute in linear time.

The remainder of this paper proceeds as follows: The next section describes two widely used methods for performing data stuffing, HDLC bit stuffing and PPP byte stuffing.

Section III describes our new framing protocol, called Consistent Overhead Byte Stuffing (COBS). COBS guarantees that for all packets, no matter what their contents, the overhead will be small. In addition to the basic algorithm, Section III also presents a simple variant called Zero Pair Elimination (COBS/ZPE) which improves on straight COBS’s average performance at the expense of a fractionally higher worst-case bound.

Sections IV and V compare the costs of HDLC, conventional PPP, and COBS for encoding the same data packets. In the least favorable case for COBS, network traffic consisting predominantly of small packets, COBS is found to add on average less than 0.5% additional overhead compared to conventional PPP. Although this is a small price for the performance benefit of being able to use much larger packets, it is possible to eliminate even this cost. COBS/ZPE achieves an average overhead lower than PPP’s, even for small-packet traffic, without giving up COBS’s advantage of also guaranteeing a very low bound on worst-case overhead. Section IV considers the expected overhead from a theoretical point of view, for data consisting of uniformly distributed random eight-bit values. Real network traffic often does not have a uniform distribution of byte values, so Section V presents experimental results comparing COBS with conventional PPP for real network traffic.

Section VI presents our conclusions.

II.   Current Data Stuffing Algorithms

When packet data is sent over any serial medium, a protocol is needed by which to demarcate packet boundaries. This is done by using a special bit-sequence or character value to indicate where the boundaries between packets fall. Data stuffing is the process that transforms the packet data before transmission to eliminate any accidental occurrences of that special framing marker, so that when the receiver detects the marker, it knows, without any ambiguity, that it does indeed indicate a boundary between packets. Some communications protocols use bit stuffing algorithms (which operate on a bit-by-bit basis), and some use byte stuffing algorithms (which operate on a byte at a time).

Both bit stuffing and byte stuffing in general increase the size of the data being sent. The amount of increase depends on the patterns of values that appear in the original data and can vary from no overhead at all to, in the worst case for PPP, doubling the packet size. Bit stuffing and byte stuffing are discussed in more detail below.

HDLC [ECMA-40] uses a bit stuffing scheme. It uses the binary sequence 01111110, called the Flag Sequence, to mark boundaries between packets. To eliminate this pattern from the data, the following transformation is used: whenever the transmitter observes five ones in a row, it inserts a zero immediately following. This eliminates the possibility of six consecutive ones ever occurring inadvertently in the data. The receiver performs the reverse process: After observing five ones in a row, if the next binary digit is a zero it is deleted, and if it is a one then the receiver recognizes it as one of the special framing patterns. This process of inserting extra zeroes (‘bit stuffing’ or ‘zero insertion’) increases the transmitted size of the data. In the worse case, for data that consists entirely of binary ones, HDLC framing can add 20% to the transmitted size of the data.

This kind of bit-level insertion and deletion is easy to implement in the hardware of a serial transmitter, but is not easy to implement efficiently in software. Software gets efficiency from working in units of 8, 32, or more, bits at a time, using on-chip registers and wide data buses. Algorithms that are specified in terms of individual bits can be hard to implement efficiently in software because it can be difficult to take good advantage of the processor’s ability to operate on bytes or words at a time. For this reason it is more common for software algorithms to use byte stuffing.

PPP uses a byte stuffing scheme [RFC1662]. It operates in terms of bytes instead of bits and uses a byte with value 0x7E (the same as the HDLC Flag Sequence) to mark boundaries between packets. To eliminate this value from the data payload, the following transformation is used: everywhere that 0x7E appears in the data it is replaced with the two-character sequence 0x7D,0x5E. 0x7D is called the Control Escape byte. Everywhere that 0x7D appears in the data it is replaced with the two-character sequence 0x7D,0x5D. The receiver performs the reverse process: whenever it sees the Control Escape value (0x7D) it discards that byte and XORs the following byte with 0x20 to recreate the original input.

On average this byte stuffing does reasonably well, increasing the size of purely random data by a little under 1%, but in the worst case it can double the size of the data being encoded. This is much worse than HDLC’s worst case, but PPP byte stuffing has the advantage that it can be implemented reasonably efficiently in software.

PPP’s byte stuffing mechanism, in which the offending byte is prefixed with 0x7D and XORed with 0x20, allows multiple byte values to be eliminated from a packet. For example, PPP byte stuffing can facilitate communication over a non-transparent network by eliminating all ASCII control characters (0x00-0x1F) from the transmitted packet data. It is equally possible to do this using a COBS-type algorithm [Che98], but the subject of this paper is the minimal byte stuffing necessary to facilitate reliable unambiguous packet framing, not extensive byte stuffing to compensate for non-transparency of the underlying network.

A more exhaustive treatment of framing and data stuffing can be found in [Che98].

III.   Consistent Overhead Byte Stuffing Algorithms

This section begins by describing the data encoding format used by COBS and the procedure for converting packet data to and from that encoded format. It then describes some of the properties and implications of using COBS encoding, and describes a variant of COBS called COBS/ZPE which has slightly different properties. COBS is described in terms of detecting packet boundaries in a stream of bytes, but it can also be used equally well to decode a raw bit-stream, and this section concludes by describing how this is achieved.

A.  COBS Syntax and Semantics

COBS performs a reversible transformation on a data packet to eliminate a single byte value from it. Once eliminated from the data, that byte value can then safely be used as the framing marker without risk of ambiguity.

For the description that follows, zero was chosen as the framing value to be eliminated. In practice zero is a good value to choose for a real-world implementation because zeroes are common in binary computer data, and COBS performs marginally better when it has many bytes to eliminate. However, elimination of some value other than zero can be achieved with only trivial changes to the algorithm, such as the addition of a simple post-processing step (like XORing all the output bytes with the value to be eliminated) [Che97] [Che98].

COBS first takes its input data and logically appends a single zero byte. (It is not necessary actually to add this zero byte to the end of the packet in memory; the encoding routine simply has to behave as if the added zero were there.)

COBS then locates all the zero bytes in the packet (including the added one), and divides the packet at these boundaries into one or more zero-terminated chunks. Every zero-terminated chunk contains exactly one zero byte, and that zero is always the last byte of the chunk. A chunk may be as short as one byte (i.e. a chunk containing just a solitary zero byte) or as long as an entire packet.

COBS encodes each zero-terminated chunk using one or more variable length COBS code blocks. Chunks of 254 bytes or fewer are encoded as a single COBS code block. Chunks longer than 254 bytes are encoded using multiple code blocks, as described later in this section. After a packet’s constituent chunks have all been encoded using COBS code blocks, the entire resulting aggregate block of data is completely free of zero bytes, so zeroes can then be placed around the encoded packet to mark clearly where it begins and ends.

A COBS code block consists of a single code byte, followed by zero or more data bytes. The number of data bytes is determined by the code byte. Fig. 1 shows some examples of valid COBS code blocks and the corresponding zero-terminated data chunks they represent.

For codes 0x01 to 0xFE, the meaning of each code block is that it represents the sequence of data bytes contained within the code block, followed by an implicit zero byte. The zero byte is implicit — it is not actually contained within the sequence of data bytes in the code block.

These code blocks encode data without adding any overhead. Each code block begins with one code byte followed by n data bytes, and represents n data bytes followed by one trailing zero byte. Thus the code block adds no overhead to the data: a chunk (n+1) bytes long is encoded using a code block (1+n) bytes long.

Fig. 1.   Example code blocks and the data chunks they represent. Each COBS code block begins with a single code byte (shown shaded), followed by zero or more data bytes.


These basic codes are suitable for encoding zero-terminated chunks up to 254 bytes in length, but some of the zero-terminated chunks that make up a packet may be longer that that. To cope with these chunks, code 0xFF is defined slightly differently. Code 0xFF represents the sequence of 254 data bytes contained within the code block, without any implicit zero. Defining the maximum length code (0xFF) as an exception this way allows us to encode chunks that are too long to be encoded as a single code block. Chunks longer than 254 bytes are encoded using one or more of these special maximum length code blocks, followed by a single normal code block. Each special maximum length code block has no implied trailing zero, but the final (normal) code block does include an implied trailing zero at the end, so this aggregate sequence of code blocks correctly encodes the required long chunk of non-zero data with a single zero byte at the end. Unlike the other code blocks, this maximum length code block does add some overhead to the encoded data. 254 bytes of packet data are encoded using 255 bytes, an overhead of just under 0.4%.

Fig. 2 shows an example of a 680-byte zero-terminated chunk (679 non-zero bytes and one zero) which is encoded using three COBS code blocks. The first two are special maximum-length (254 data bytes, no implied zero) COBS code blocks, and the final code block is a standard 172-byte (171 data bytes and an implied zero) COBS code block. Thus in this case 680 bytes of user data is encoded using 682 bytes, an overhead of just under 0.3%.

Fig. 2.   Encoding oversized chunks.


The meanings of the various COBS code values are summarized in Table 1.

COBS has the property that the byte value zero is never used as a code byte, nor does it ever appear in the data section of any code block. This means that COBS takes an input consisting of characters in the range [0,0xFF] and produces an output consisting of characters only in the range [1,0xFF]. Having eliminated all zero bytes from the data, a zero byte can now be used unambiguously to mark boundaries between packets. This allows the receiver to synchronize reliably with the beginning of the next packet, even after an error. It also allows new listeners to join a broadcast stream at any time and without fail receive and decode the very next error free packet.

B.  COBS Encoding Procedure

The job of the COBS encoder is to translate the raw packet data into a series of COBS code blocks. Breaking the packet into zero-terminated chunks as described above is a useful way to understand the logic of COBS encoding, but in practice it is more efficient to write an encoder that translates from packet data directly to code blocks, without going through an intermediate data format.

Code

Followed by

Meaning

0x00

(not applicable)

(not allowed)

0x01

no data bytes

A single zero byte

n

(n–1) data bytes

The (n–1) data bytes,
followed by a single zero

0xFF

254 data bytes

The 254 data bytes,
not followed by a zero

Table 1.   Code values used by consistent overhead byte stuffing.


The encoding routine searches through the first 254 bytes of the packet looking for the first occurrence of a zero byte. If no zero is found, then a code of 0xFF is output, followed by the 254 non-zero bytes. If a zero is found, then the number of bytes examined, n, is output as the code byte, followed by the actual values of the (n–1) non-zero bytes up to, but not including, the zero byte. The zero is then skipped and this process is repeated until all the bytes of the packet (including the final zero which is logically appended to every packet) have been encoded. Fig. 3 shows an example of packet encoding.

Fig. 3.   Example COBS encoding showing input (with a phantom zero logically appended) and corresponding zero-free output, with code bytes shown shaded for clarity.


The implementation of COBS is very simple. The Appendix gives complete C source code listings to perform both COBS encoding and decoding.

There is a small optimization that may be performed in one particular situation. The reason for logically appending a zero to every packet before encoding is that all code blocks except code 0xFF represent a block of data that ends with an implied zero. If the data to be encoded did not actually end with a zero there might be no valid COBS encoding of that data. Adding a zero to the end of every packet circumvents this problem. However, if by chance the packet happens naturally to end with a maximum length 0xFF code block then there is no need to logically append a zero in order to make it encodable. This optimization can be performed without ambiguity, because the receiver will observe that the decoded packet does not end with a zero, and hence will realize that in this case there is no trailing zero to be discarded in order to recreate the original input data. In practice, the gain achieved by this optimization is very slight, but it is useful from a theoretical standpoint. It simplifies analysis of the algorithm’s behavior by allowing us to say that COBS encoding adds “no more than one byte of overhead for every 254 bytes of packet data.” Without this optimization, that statement would not be true for packets containing an exact multiple of 254 bytes and no zeroes; these packets would incur one extra byte of overhead.

C.  Behavior of COBS

COBS encoding has low overhead (on average 0.23% of the packet size, rounded up to a whole number of bytes) and furthermore, for packets of any given length, the amount of overhead is virtually constant, regardless of the packet contents. This section offers an intuitive understanding of why the overhead of COBS is so uniform, in marked contrast to the overhead of PPP and other two-for-one byte stuffing schemes, which are extremely variable.

COBS encoding has low overhead because, in most cases, the size of each code block is exactly the same as the size of the data sequence it encodes. For all of the code blocks 0x01 to 0xFE, the code block contains one code byte and (n–1) non-zero data bytes. The data represented by that code block is the (n–1) data bytes followed by a single zero byte at the end. Hence n input bytes are encoded using exactly n output bytes, so the output size is the same as the input size. Even for code block 0xFF where there is no implied zero and 255 output bytes are used to encode 254 input bytes, the overhead is just 0.4%.

COBS encoding overhead has very low variability compared to other byte stuffing algorithms like PPP. To see why this is so we need to understand how overhead is generated in byte stuffing algorithms. For any byte stuffing algorithm like PPP or COBS, each byte in a packet either generates no overhead, or it generates some fixed amount x. In a given packet of length n bytes, if the proportion of bytes that are overhead-generating is p, then the total overhead for the packet will be npx bytes. For all of today’s byte stuffing algorithms, x has some fixed value, but the values of n and p are determined by the size of the packet being encoded and the nature of the data in that packet.

For PPP, overhead is incurred whenever a byte has one of the reserved values 0x7D or 0x7E. On average for uniformly distributed data, only one byte out of every 128 is a reserved value. This occurrence is relatively rare — less than 0.8% of the bytes — but when it does occur the cost is relatively high — an encoding overhead of 100% for that particular byte. All the other byte values — the vast majority — incur no overhead. Thus for PPP x=1 and, for the average packet, p=1/128.

For COBS, the behavior has the opposite characteristics. Overhead is not the rare case; it is the common case. Overhead is not generated by occurrences of the reserved value, it is generated by the large majority of bytes that are not the reserved value zero: If more than 254 bytes are encountered without finding a single zero, then one byte of overhead is generated. Fortunately, although overhead is the common case, the one byte of overhead generated is amortized over 254 bytes, making the cost per byte very small. The rare event of encountering a zero can be viewed as an occasional lucky bonus: both the zero and some number of non-zero bytes that preceded it are then encoded with no overhead at all. Thus for COBS the proportion of bytes that are overhead-generating is high — usually the majority of the bytes in the packet — but the amount of overhead each one generates, x, is only 1/254.

Because of these two opposite kinds of behavior, PPP and COBS generate overhead in opposite ways. The encoding overhead of PPP can be characterized as the sum of a series of high-cost low-probability events, whereas the overhead of COBS can be characterized as a series of high-probability low-cost events. For PPP to encode a packet with low overhead, it requires that the proportion of overhead-generating bytes in the packet be small. In the pathological case where every byte in the packet generates overhead, p=1 and the size of the packet doubles. In contrast, COBS already assumes that most bytes generate overhead, so the worst possible case where every byte generates overhead is hardly any different from the average case. For the same reason, the best case for COBS is also hardly any better than the average case. Because x is so small, the final encoded size of the data is very insensitive to the proportion p of bytes that are overhead-generating.

To summarize: The best case for COBS is one byte of overhead, and the worst case is one byte for every 254 bytes of packet data. This is good for large packets, but has an unfortunate side effect for small packets. Every packet 254 bytes or smaller always incurs exactly one byte of overhead, no more, no less.

We regard one byte of overhead a small price to pay in exchange for vastly more predictable overhead and the consequent significant performance gains we get from the ability to send much larger IP packets. However, there could be circumstances where any cost, however small, is unacceptable. To address this concern the next section presents a minor modification to basic COBS called Zero Pair Elimination (ZPE), which exhibits better performance for small packets.

D.  Zero Pair Elimination

In experiments on real-world data (see Section V) we observed not only that zero is a common value in Internet traffic, but furthermore that adjacent pairs of zeros are also very common, especially in the headers of small TCP/IP packets. To take advantage of this property of real-world traffic, we created a variant of COBS where the maximum encodable sequence length is slightly shortened, freeing some of the high-numbered codes for other uses. These codes are reassigned to indicate sequences ending with a pair of implicit zeroes. Any reduction in the maximum sequence length increases the worst-case overhead, so the maximum length should not be reduced by too much. With our packet traces we found that reassigning 31 codes gives good performance without sacrificing too much of the good worst-case bound that makes COBS useful [Che98]. For other protocols and traffic mixes the ‘optimum’ cut-off point may be different, but it should be remembered that the benefit of COBS (and COBS/ZPE) encoding is that it is extremely insensitive to the nature of the data being encoded. The penalty for picking the ‘wrong’ cut-off point is very slight.

Code

Followed by

Meaning

0x00

(not applicable)

(not allowed)

0x01-
0xDF

(n–1) data bytes

The (n–1) data bytes,
followed by a single zero

0xE0

223 data bytes

The 223 data bytes,
not followed by a zero

0xE1-
0xFF

(n–225) data bytes

The (n–225) data bytes,
followed by a pair
of zero bytes

Table 2.   Code values used by consistent overhead byte stuffing with zero-pair elimination. Codes 0x01 to 0xDF have the same meaning as in basic COBS, but codes 0xE0 to 0xFF are reassigned to have new meanings.


In COBS/ZPE codes 0x00 to 0xDF have the same meaning as in basic COBS, and code 0xE0 encodes the new maximum length sequence of 223 bytes without an implicit zero on the end. This change in code assignments gives COBS/ZPE a worst-case overhead of one byte in 223, or about 0.45%. Codes 0xE1 to 0xFF encode sequences that end with an implicit pair of zeroes, containing, respectively, 0 to 30 non-zero data bytes. These new code meanings are summarized in Table 2.

Reassigning some of the codes to indicate sequences ending with a pair of implicit zeroes has the good property that now some of the code blocks are smaller than the data they encode, which helps mitigate the one-byte overhead that COBS adds. Fig. 4 shows an example of a small packet (actually the beginning of a real IPv4 packet header) that gets one byte smaller as a result of encoding using COBS/ZPE.

The disadvantage of using COBS/ZPE is a slightly poorer worst-case overhead — about 0.45% instead of 0.40% — but this is still a very small worst-case overhead. In fact, as described in Section V, when a typical mix of real-world network traffic is encoded using COBS/ZPE, it actually gets smaller by about 1%. PPP byte stuffing cannot compete with this since PPP never makes any packet smaller.

COBS/ZPE is useful because pairs of zeroes are common in packet headers. Also, the trend towards aligning packet fields on 64-bit boundaries in high-performance protocols sometimes results in padding zeroes between fields. These padding zeroes waste precious bandwidth on slow wireless links. Using COBS/ZPE as the link-layer encoding for these slower links can help to mitigate this effect by encoding these patterns more efficiently. This increased efficiency for zero pairs makes it more attractive to use a single packet format on all networks, instead of designing different application-layer protocol formats for different speeds of network.

Although for certain packets COBS/ZPE does result in some reduction in size, COBS/ZPE should not be thought of as a general purpose compression algorithm, and it is not intended to compete with more sophisticated (and more computationally expensive) compression algorithms such as Huffman encoding [Huff52] [Knu85] and Lempel Ziv [LZ77] [Wel84]. Although, like PPP, these compression algorithms may have good average performance, for some data they can make the packet bigger instead of smaller [Hum81], and it can be hard to predict how much bigger they may make a packet in the worst case, which is contrary to our goal of ensuring a tight bound on worst-case performance.

It is also not clear that there is a great benefit to applying heavyweight compression at the link layer, since the majority of compressible data is much more effectively compressed before it even reaches the IP layer using data-specific algorithms such as JPEG [ISO10918] for images and MPEG [ISO11172] for video. In the case of data that has been encrypted, it is not possible to apply heavyweight compression at the link layer because data that has been properly encrypted is not compressible [Pra97].

ZPE is a lightweight technique that works well for small packets that contain zero pairs, without sacrificing the primary benefit of COBS: an aggressive bound on worst-case overhead, even for packets that may be large and contain no zero pairs.

Fig. 4.   Example COBS/ZPE encoding showing input (with a phantom zero logically appended) and corresponding zero-free output. For clarity, COBS code bytes are shown shaded.


E.  Lower-Level Framing

COBS is defined in terms of byte operations. This means that there also needs to be some underlying mechanism to detect where the byte boundaries fall in the bit-stream. When used over an RS-232 serial port [RS-232-C], the start/stop bits perform this function (not very reliably [Che98]), at a cost of 20% extra overhead. HDLC framing is usually more efficient, but it too, in the worst case, can add as much as 20% overhead. COBS’s twin benefits of guaranteeing to always resynchronize immediately after error and guaranteeing to do so without adding excessive overhead are somewhat diminished if the bit-level framing is not similarly well behaved.

Fortunately, with the choice of an appropriate bit-level framing pattern, COBS-encoded data becomes self-framing, and no separate lower-layer bit stuffing is needed. All that is required to make this work is a bit-level framing pattern chosen so that it can never occur anywhere in COBS-encoded data. If the framing pattern cannot inadvertently appear, there is no need for any bit stuffing mechanism to eliminate it.

Finding an appropriate framing pattern is easy. Since COBS-encoded data contains no zero bytes, we know that there is at least one binary ‘1’ bit somewhere in every byte. This means that in a bit-stream of COBS-encoded data there can be no contiguous run of more than fourteen zero-bits, which suggests a candidate for the COBS end-of-packet marker: a run of fifteen (or more) zero-bits.

The work of Barker [Bar53] and Artom [Art72] gives further guidance about the choice of an appropriate framing pattern. The framing pattern should not only be one that does not appear in the data stream, but also one that cannot appear in any overlap between the marker and the data stream. A framing marker consisting of just a run of zero-bits would not meet this requirement. It could never appear entirely within the encoded packet data, but appending this framing marker to the end of an encoded packet could result in an ambiguous situation. This is because nothing prevents an encoded packet from ending with as many as seven consecutive zero-bits. Upon seeing a long run of zero-bits, the receiver does not have any simple reliable way to tell which, if any, of those zero-bits actually belong to the tail of the previous packet, and which bits belong to the framing marker. This ambiguity can be solved by refining the definition of the framing marker to be a one-bit followed by fifteen zero-bits:

1000000000000000

Unintended occurrences of the framing marker are now eliminated, because any sequence made by concatenating bits from the end of a packet with bits from the start of the marker will result in a sequence that has at least one one-bit somewhere in the middle. This cannot be mistaken for a framing marker because the framing marker has no one-bits in the middle.

COBS bit-level framing can be defined to use a synchronous idle sequence, or an asynchronous idle sequence, as appropriate to the application requirements. Using a synchronous idle sequence, the end-of-packet marker is a one-bit followed by exactly fifteen zero-bits. If the link is idle and there are no packets to transmit, then these 16-bit end-of-packet markers are transmitted back-to-back continuously until there is a packet to transmit. Using an asynchronous idle sequence, the end-of-packet marker is a one-bit followed by fifteen or more zero-bits. In this case, after the basic end-of-packet marker has been sent, an idle line can simply be filled with as many zero-bits as are needed to bridge the gap until there is another packet to send. So that the receiver knows when the next packet begins, the asynchronous framing marker is defined to be a one-bit, followed by fifteen or more zero-bits, and then another one-bit, as shown below:

100000000000000 ... 01

For a line that is mostly busy, the synchronous idle specification is marginally better, because the framing marker is only 16 bits long, compared to the 17-bit minimum length of the asynchronous idle framing marker, so there is slightly less overhead.

For a line that is mostly idle, the asynchronous idle specification is marginally better, because packets can be sent with slightly less delay. Using synchronous idle markers, if an idle line has just begun transmitting its next idle end-of-packet marker when an outgoing packet arrives at the interface, the packet will have to wait for 15 bit-times before it can be sent. Using asynchronous idle, the interface could immediately send the start bit and begin sending the packet after a delay of only one bit-time.

Used with either of these framing marker definitions, COBS guarantees a very tight bound, not only on the number of bytes, but also on the total number of bits required to transmit a data packet.

IV.   Theoretical Analysis

This section presents a theoretical analysis of the behavior of COBS and COBS/ZPE in comparison to two other techniques, HDLC bit stuffing and PPP byte stuffing. It compares the best-case, worst-case, and average case-encoding overhead, given uniformly distributed random data.

HDLC and PPP are chosen for comparison because both work reliably and are reasonably efficient, and as a result both are widely used in today’s software. They are arguably the canonical examples respectively of bit and byte stuffing.

It is useful to calculate the expected performance for uniform random data, because data that is properly compressed and/or encrypted has a uniform distribution of byte values. Data that is well-compressed must in general have a uniform distribution of byte values, because if it did not then Huffman encoding [Huff52] could be used trivially to compress the data further, contradicting the description of the data as being ‘well compressed’. For similar reasons encryption algorithms seek to maximize the entropy of the data so as to maximize the strength of the encryption [Pra97]. Compressed data already makes up a large part of Internet traffic: For example, World-Wide Web traffic has grown to exceed all other Internet traffic sources combined, whether measured by number of packets, number of bytes, or number of flows [Braun94] [Thom97], and furthermore, 65% of World-Wide Web traffic is compressed GIF or JPEG image data [Gwert96]. We expect to see an increasing awareness of security issues on the Internet in the future, resulting in a similar increase in encrypted data. However, at the present time not all traffic is compressed and/or encrypted, so actual performance on today’s real packets is shown in Section V.

The section concludes with a discussion of the trade-off between minimizing the encoding overhead and minimizing the delay imposed on the stream of data being encoded.

A.  Best Case Overhead

The best case for HDLC bit stuffing is a packet that contains no occurrences of five binary ones in a row. In this case, HDLC encoding adds no overhead to the packet at all.

The best case for PPP byte stuffing is a packet that contains no occurrences of the reserved (0x7D or 0x7E) characters. In this case, PPP encoding adds no overhead to the packet at all.

The best case for COBS is a packet with plenty of zeroes, so that nowhere in the packet is there any contiguous sequence of more than 254 non-zero bytes. In this case, each block is encoded with no overhead at all. Counting the phantom zero that has to be added to every packet before encoding, this results in a best-case overhead of a single byte, for any size of packet.

The best case for COBS/ZPE is a packet composed entirely of zeroes. In this case, each pair of zeroes is encoded as a one-byte code, 0xE1, resulting in an approximate halving of packet size. As with basic COBS, a phantom zero is still appended to the tail of every packet, so the best-case encoded size is half of the length n of the packet plus one byte, rounded up to an integer number of bytes, meaning packet size is reduced by:

B.  Worst-Case Overhead

The worst case for HDLC bit stuffing is a packet that contains nothing but binary ones. In this case, HDLC encoding adds one bit for every five, giving an overhead of 20%.

The worst case for PPP byte stuffing is a packet that consists entirely of reserved (0x7D or 0x7E) characters. In this case, encoding doubles the size of the packet, giving an overhead of 100%.

The worst case for both COBS and COBS/ZPE is a packet that contains no zeroes at all. In the case of basic COBS, each sequence of 254 bytes of packet data is encoded using 255 bytes of output data, giving one byte of overhead for every 254 bytes of packet data. The maximum overhead is 1/254 (roughly 0.4%) of the packet size, rounded up to a whole number of bytes. For example, a maximum size IP packet over Ethernet is 1500 bytes long, and in the worst possible case COBS would add an overhead of six bytes to a packet of this size. In the case of COBS/ZPE, each sequence of 223 bytes of packet data is encoded using 224 bytes of output data, giving one byte of overhead for every 223 bytes of packet data. The maximum overhead is 1/223 (roughly 0.45%) of the packet size, rounded up to a whole number of bytes. For example, in the worst possible case COBS/ZPE would add an overhead of seven bytes to a maximum size Ethernet packet.

C.  Expected Overhead

Knowing each algorithm’s extremes is useful, but it is also useful to know the average overhead because that tells us how efficiently the algorithm uses the underlying medium, in the sense of what overall proportion of its intrinsic capacity we expect to be available for carrying data and what proportion we expect to be consumed by encoding overhead. Since byte stuffing is a process that takes as input a packet composed of characters from an alphabet of 256 possible symbols and gives as output a packet composed of characters from an alphabet of only 255 possible symbols, in general there must be some overhead. Exactly how much longer a particular packet becomes may or may not depend on the contents of that packet, depending on the algorithm being used. With some algorithms there may be fortunate packets that incur no overhead at all, but information theory tells us that for random data the average overhead must be at least:

This theoretical bound gives us a metric against which to judge different byte stuffing schemes. Some algorithms may be able to beat this bound for some packets, but there is no algorithm whose average performance over all packets can beat this bound. Perhaps more significantly, this average performance bound tells us that no algorithm can ever have a worst-case bound better than 0.07063%. If any algorithm did have a worst-case bound better than this, that would necessarily mean that its average performance over all inputs would also be better than 0.07063%, and we know that cannot be the case.

In practice we can take advantage of the fact that our network traffic is not purely random data, by recognizing that certain patterns of data occur more frequently than others and devising an algorithm that encodes this particular mix of data more efficiently. This allows us to improve our average performance for typical network traffic, but it does not change the fact that the worst case can never be improved beyond the theoretical limit.

C.i.  Expected Overhead for HDLC

In [Pap87] Papastavridis shows that in a prefix-synch­ronized code such as HDLC, if the data is n bits long, the prefix is k+1 bits long, and n»k, the mean number of stuffing bits is approximately:

For HDLC, the prefix referred to is the sequence 0111111, so k=6. Thus Papastavridis shows that the mean number of stuffing bits in HDLC-encoded data is roughly one bit in 63. Note that Papastavridis shows his result in terms of the proportion of stuffing bits expected to be observed in the final output data, not in terms of the number of stuffing bits the encoding algorithm is expected to add to the input data. If in the output one bit in 63 is a stuffing bit, that is equivalent to saying that the encoding algorithm added one bit for every 62 bits of input data. Hence, in terms consistent with those used throughout the rest of this paper, HDLC bit stuffing adds an average overhead of one bit in 62, or a little over 1.6%.

C.ii.  Expected Overhead for PPP

For PPP the expected overhead is easy to calculate. PPP has only two distinct behavior patterns: it either reads a single byte and writes a single byte, or it reads a single byte and writes a pair of bytes. In uniformly distributed data, the probability that any given byte will be one of PPP’s two reserved values, causing PPP to output two bytes instead of one, is 2/256. In a packet of length n, there will be on average n ´ 2/256 occurrences of reserved values and n ´ 254/256 occurrences of other values, giving an expected output length of:

An expected output length 1.0078125 times the input length gives an expected overhead of 0.78125%, about 11 times worse than the theoretical optimum.

C.iii.  Expected Overhead for COBS

For COBS the average overhead is a little harder to calculate than for PPP, since COBS has 255 different behaviors, rather than just the two that PPP has. In addition, each behavior not only writes a different number of output bytes, each behavior also reads a different number of input bytes. Since the number of input bytes read is not always one as it is for PPP, we must also calculate the average number of bytes read per code block, and divide the average output by the average input to determine the overall average overhead.

First we determine the average input per code block by multiplying the number of bytes each code block consumes by the probability of that code block occurring, and summing the results. Likewise we determine the average output per code block by doing the same calculation using the number of bytes each code block generates. The ratio of average output divided by average input gives us the average encoding overhead.

If the first byte the algorithm encounters is a zero, then that single byte is encoded as a code block. The input in this case is one byte and the output is one byte and the probability p(0x01) of this happening is 1/256. If the first byte is not a zero, but the second byte is, then the algorithm reads two bytes and outputs two bytes and the probability p(0x02) of this happening is 255/256 ´ 1/256. The probability p(n) that the encoder reads n–1 non-zero bytes (n ≤ 254) followed by a zero is:

The longest code block, code 0xFF, occurs when the algorithm encounters 254 non-zero bytes without seeing a single zero. The probability p(255) of this happening is (255/256)254, and in this case the algorithm reads 254 bytes and writes out a block of 255 bytes.

The ratio of average output divided by average input is:

The theoretical average overhead for COBS on random data is therefore a little under 0.23%. This is about 31/4 times worse than the theoretical optimum, more than three times better than PPP’s average.

C.iv.  Expected Overhead for COBS/ZPE

As with basic COBS, we calculate the probability of each code block, and the number of bytes of input and output for each code block. The probability p(0x01) that the first byte is a zero and the second is not, resulting in a code 0x01 block, is 1/256 ´ 255/256. The probability p(0xE1) that both the first and the second bytes are zero, resulting in a code 0xE1 block, is 1/256 ´ 1/256. Table 3 enumerates, for each code of the form k+n, the probability of code block k+n occurring, and the number of input and output bytes for that code block.

Base Code k

Offset
n

Probability
p(k+n)

Input
in(k
+n)

Output
out(k
+n)

k = 0x00

1≤n<0x20

n

n

k = 0x00

0x20≤n<0xE0