Error detecting and correcting codes

(Notes for EECS 373, Winter 2005)

Data can be corrupted in transmission or storage by a variety of undesirable phenomena, such as radio interference, electrical noise, power surges, bad spots on disks or tapes, or scratches or dirt on CD or DVD media. It is useful to have a way to detect (and sometimes correct) such data corruption.

Errors come in several forms. The most common situation is that a bit in a stream of data gets flipped (a 0 becomes a 1 or a 1 becomes a 0). It is also possible for a bit to get deleted, or for an extra bit to be inserted. In some situations, burst errors occur, where several successive bits are affected.

Parity bit

We can detect single errors with a parity bit. The parity bit is computed as the exclusive-OR (even parity) or exclusive-NOR (odd parity) of all of the other bits in the word. Thus, the resulting word with a parity bit will always have an even (for even parity) or odd (for odd parity) number of 1 bits in it. If a single bit is flipped in transmission or storage, the received data will have the wrong parity, so we will know something bad has happened.
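As a concrete illustration, here is a minimal Python sketch of even parity (the function names are illustrative, not part of any standard API):

```python
def even_parity_bit(bits):
    """Even parity: XOR of all data bits, so the full word has an even number of 1s."""
    p = 0
    for b in bits:
        p ^= b
    return p

def check_even_parity(codeword):
    """True if the word (data bits plus parity bit) has even parity."""
    total = 0
    for b in codeword:
        total ^= b
    return total == 0

data = [1, 0, 1, 1, 0, 0, 1, 0]            # 8 data bits, four 1s
word = data + [even_parity_bit(data)]      # append the parity bit
assert check_even_parity(word)

corrupted = word[:]
corrupted[3] ^= 1                          # flip any single bit
assert not check_even_parity(corrupted)    # the error is detected
```

Flipping any one bit (data or parity) changes the parity of the whole word, which is exactly what the check catches.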

Note that we can't tell which bit was corrupted (or whether it was just the parity bit that was corrupted). In general, any odd number of errors is detected and any even number goes undetected: double errors are missed, triple errors are caught, quadruple errors are missed, and so on. Random garbage has a 50% probability of being accepted as valid.

Overhead is small: putting a parity bit on each byte adds 1 bit for every 8, so the data transmitted or stored grows by 12.5%. Larger words reduce the overhead: 16-bit words, 6.25%; 32-bit words, 3.125%; 64-bit words, 1.5625%.
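The overhead figures are just 1/n expressed as a percentage; a two-line Python check:

```python
# Overhead of a single parity bit on an n-bit data word, as a percentage.
def parity_overhead(n):
    return 100.0 / n

for n in (8, 16, 32, 64):
    print(n, parity_overhead(n))   # 12.5, 6.25, 3.125, 1.5625
```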

The original data plus the correction bits form a codeword. The codeword, generally larger than the original data, is used as the representation of that data for transmission or storage purposes. An ordered-pair notation is often used: (c,d) represents a codeword of c bits encoding a data word of d bits.

Error correcting codes

What if just detecting errors isn't enough? What if we want to find and fix the bad data?

Brute force repetition

We can repeat each bit three times: 00011011 becomes 000 000 000 111 111 000 111 111. Any single-bit error can be corrected; just take a majority vote on each group of three. Double errors within a group will still corrupt the data. Overhead is large: 8 bits became 24, a 200% increase in data size.

We can extend this to correct even more errors: repeating each bit 5 times corrects up to 2 errors per group, but with even more overhead.
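The repetition scheme can be sketched in a few lines of Python, with the repetition count r as a parameter (names are illustrative):

```python
def repeat_encode(bits, r=3):
    """Repeat each bit r times (r should be odd)."""
    out = []
    for b in bits:
        out.extend([b] * r)
    return out

def repeat_decode(coded, r=3):
    """Majority vote within each group of r; corrects up to (r - 1) // 2 errors per group."""
    out = []
    for i in range(0, len(coded), r):
        group = coded[i:i + r]
        out.append(1 if sum(group) > r // 2 else 0)
    return out

data = [0, 0, 0, 1, 1, 0, 1, 1]    # 00011011
coded = repeat_encode(data)        # 24 bits
coded[4] ^= 1                      # single-bit error in the second group
assert repeat_decode(coded) == data
```

With r = 5, the same decoder survives two errors in one group, matching the claim above.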

More efficient approaches to single error correction

Just repeating the bits is fairly inefficient. We could do better if we could have a compact way to figure out which bit got flipped (if any). As the number of bits in a word gets large, things are going to get very complicated very fast. We need some systematic way to handle things.

Hamming distance

A key issue in designing any error correcting code is making sure that any two valid codewords are sufficiently dissimilar so that corruption of a single bit (or possibly a small number of bits) does not turn one valid code word into another. To measure the distance between two codewords, we just count the number of bits that differ between them. If we are doing this in hardware or software, we can just XOR the two codewords and count the number of 1 bits in the result. This count is called the Hamming distance (Hamming, 1950).

The key significance of the Hamming distance is that if two codewords have a Hamming distance of d between them, then it would take d single-bit errors to turn one of them into the other.

For a set of multiple codewords, the Hamming distance of the set is the minimum distance between any pair of its members.
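Both definitions translate directly into Python, treating codewords as integers so that XOR-and-count-ones works as described (a sketch; names are illustrative):

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of differing bit positions: XOR the words and count the 1 bits."""
    return bin(a ^ b).count("1")

def min_distance(codewords):
    """Hamming distance of a set: the minimum over all pairs of members."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

assert hamming_distance(0b1010, 0b0101) == 4
# The 3-bit even-parity codewords are pairwise distance 2 apart:
assert min_distance([0b000, 0b011, 0b101, 0b110]) == 2
```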

Minimum Hamming distance for error detection

To design a code that can detect d single-bit errors, the minimum Hamming distance for the set of codewords must be d + 1 (or more). That way, no combination of d single-bit errors can turn one valid codeword into another valid codeword.
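For example, the 3-bit even-parity code has minimum distance 2 = 1 + 1, so it detects any single error. A short Python check of that property (a sketch):

```python
# Even-parity codewords of length 3: minimum distance 2, so single (d = 1)
# errors are always detected.
codewords = {0b000, 0b011, 0b101, 0b110}

def detects_single_errors(codes, nbits=3):
    # A single-bit flip of a valid codeword must never land on another valid codeword.
    return all(c ^ (1 << i) not in codes for c in codes for i in range(nbits))

assert detects_single_errors(codewords)
```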

Minimum Hamming distance for error correction

To design a code that can correct d single-bit errors, a minimum distance of 2d + 1 is required. That puts the valid codewords so far apart that even after errors in d of the bits, the received word is still closer to the original codeword than to any other valid codeword, so the receiver can determine what the correct starting codeword was.
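Nearest-codeword decoding makes this concrete. A sketch using the triple-repetition code for one bit, {000, 111}, which has minimum distance 3 = 2(1) + 1 and so corrects d = 1 error:

```python
def hamming_distance(a, b):
    return bin(a ^ b).count("1")

codewords = [0b000, 0b111]   # triple-repetition code: minimum distance 3

def correct(received):
    """Decode to the valid codeword nearest the received word."""
    return min(codewords, key=lambda c: hamming_distance(c, received))

assert correct(0b010) == 0b000   # one flipped bit: still nearer 000 than 111
assert correct(0b110) == 0b111   # one flipped bit: still nearer 111 than 000
```

With two flipped bits the received word crosses the halfway point and decodes to the wrong codeword, which is exactly the double-error failure noted for the repetition code above.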

Practical Hamming implementation

Here are a few useful references (all PDF format):