Learn you Galois Fields for Great Good (08)
Navigation | first | previous | next (soon)
Cyclic Redundancy Check (CRC)
This is part 8 of a series on Abstract Algebra. In this part, we'll explain and implement Cyclic Redundancy Check (CRC) codes.
This is a very exciting post in this series as it represents a phase change. We're now shifting our focus away from the foundational material and towards applications. We'll start with a fairly straightforward application: CRC codes. If you've followed and understood the series so far, you'll be in excellent shape to deeply understand these.
Let's get started!
Motivation for Coding Theory
It's easy to take modern telecommunication systems for granted: they are quite reliable these days, and failure is more the exception than the rule. But this wasn't always the case. In fact, a lot of work has gone into making them reliable.
The core problem is that physics conspires against us by relentlessly corrupting signals. Much work has gone into preventing noise and corruption in communication systems, but it's not possible to build a completely noiseless system. At best, we can hope to drive the probability of "noise" low, accepting that we still have to use a noisy/corrupting/imperfect medium for all communications.
So, if signal corruptions can still happen, how can we make these systems reliable?
The answer comes from Information Theory, initiated by Claude Shannon in the late 1940s (wiki). For our purposes, the theory implies that if we use an encoding with enough redundancy, we can overcome any corruption or noise issues. This leads us to an area of study called Coding Theory.
This is where we will find rich applications of abstract algebra to encode message data in ways that will be resilient to the capricious whims of physics!
Codes for Detecting Corruption
Suppose we wish to send the binary message:
0100010101011010
We believe our communications medium may corrupt any bit, flipping it.
Parity bit
One very simple scheme is to add a parity bit. Here we sum all bits in our message and add an additional bit that makes the message have an even number of bits set to 1.
Using our above example we have 7 bits set, so we add a parity bit set to 1 such that 8 bits are set in total:
01000101010110101
Now, if any bit is flipped during transmission, we'll count an odd number of bits and know that some corruption happened. This does not help us correct the error, so we'd need to request re-transmission of the message.
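To make this concrete, here's a tiny sketch (using the example message above) of computing an even-parity bit with the standard library's count_ones:

```rust
fn main() {
    let msg: u16 = 0b0100010101011010; // the example message

    // Even parity: the parity bit is 1 exactly when the count of set bits is odd
    let parity = msg.count_ones() % 2;
    assert_eq!(msg.count_ones(), 7); // 7 bits set, as in the text
    assert_eq!(parity, 1);           // so the appended parity bit is 1

    // The receiver checks that message + parity has an even number of set bits
    assert_eq!((msg.count_ones() + parity) % 2, 0);
    println!("parity bit = {}", parity);
}
```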
But, if two bits are corrupted (or any even number of them), the count of set bits stays even and we can be fooled into thinking the message is valid! Not good! How can we improve the "parity" to make it more resilient to corruptions?
Checksum bits
An intuitive answer is to just expand the parity to many bits. This is called a checksum. The structure of a message with a checksum looks like this:
<n-bits of message data> <p-bits of checksum>
As long as n is much bigger than p, the overhead of the checksum is insignificant.
But, how do we construct a good checksum? We want a "bit-mixing" property where any one bit flip should alter many checksum bits. The checksum should ideally behave like a random number assigned to a message. Even if a message changes slightly, it is still a different message with a different assigned number.
How can we do this?
Using Abstract Algebra to compute a Checksum
A clever approach is to treat the entire n-bit message as an element in GF(2^n) and then reduce it to an element in GF(2^p) using some polynomial Q (like in the previous section). This idea leads immediately to CRC codes!
Let's use an example to drive home the idea:
n = 8
p = 3
message = 01001011
Let's consider the message bits as an element of GF(2^n). In this context, our message is the polynomial:
x^6 + x^3 + x + 1
Maybe it's a little weird to think of message data as a polynomial, but by this point in the series we should be excited by the power that representation gives us!
Now, we need a reducing polynomial for GF(2^3). How about the one we used in the last article:
Q = x^3 + x + 1 (decimal: 11, binary: 1011)
Exercise: Reduce the message polynomial by Q to get an element in GF(2^3). Do this before continuing.
We'll show our work below. The easiest way to do this is by "eliminating" bits from left to right with XOR. As discussed last time, in GF(2^k) this is equivalent to subtracting some "multiple" of Q.
01001011
^ 1011
------------
00010011
^ 1011
------------
00000101
So, our resulting polynomial in GF(2^3) is x^2 + 1 (decimal: 5, binary: 101).
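The hand computation above can also be sketched in a few lines of code (a minimal sketch specific to this 8-bit example; the loop scans for set bits from the left and XORs in an aligned copy of Q):

```rust
fn main() {
    let mut msg: u32 = 0b0100_1011; // the example message
    let q: u32 = 0b1011;            // Q = x^3 + x + 1 (4 bits)

    // Eliminate bits from left to right until only a degree < 3 remainder is left
    for bit in (4..8).rev() {
        if msg & (1 << bit) != 0 {
            // Subtract (XOR) Q shifted so its leading term cancels this bit
            msg ^= q << (bit - 3);
        }
    }
    assert_eq!(msg, 0b101); // x^2 + 1, decimal 5, matching the work above
    println!("checksum = {:03b}", msg);
}
```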
Using this checksum, we'd transmit the following sequence of bits:
01001011101
And the receiver can then compute the checksum with the same algorithm and compare it to the value it received.
Ubiquity of CRC codes
These CRC codes can be implemented very cheaply in hardware using a shift-register of p-bits and some xor gates. This helped them gain popularity.
A CRC code depends on the checksum size (parameter: p) and the polynomial used for reducing (parameter: Q).
Below are a few CRC code configurations:

| Name | Uses | Size (p) | Polynomial (Q) |
|---|---|---|---|
| CRC-3-GSM | Mobile Networks | 3 | x^3 + x + 1 |
| CRC-8-GSM-B | Mobile Networks | 8 | x^8 + x^6 + x^3 + 1 |
| CRC-16-CCITT | Bluetooth and others | 16 | x^16 + x^12 + x^5 + 1 |
| CRC-32 | Ethernet and others | 32 | x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1 |
Wikipedia maintains a larger list, here.
Implementing CRC-32 in Software
Source Code: src/crc32.rs
Implementing CRC-32 Checksum
CRC-32 is a checksum that reduces n bits into 32 bits. Viewed as an operation in an abstract algebra, we are reducing a polynomial in GF(2^n) to a polynomial in GF(2^32).
The polynomial used for this reduction is:
x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
In hexadecimal this polynomial is 0x104c11db7, but it's typical to drop the leading term (x^32) so the polynomial fits in a 32-bit unsigned integer. After applying that adjustment, we get:
const Q: u32 = 0x04c11db7;
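As a quick sanity check (a small sketch, not part of the implementation itself), we can rebuild this constant from the polynomial's exponent list with the x^32 term dropped:

```rust
fn main() {
    // Exponents of the CRC-32 polynomial, excluding the dropped x^32 term
    let exponents = [26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0];
    // OR together one bit per term to recover the 32-bit constant
    let q = exponents.iter().fold(0u32, |acc, &e| acc | (1u32 << e));
    assert_eq!(q, 0x04c11db7);
    println!("Q = {:#010x}", q);
}
```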
Our approach to implementing CRC-32 will be similar to a hardware implementation. This is not the fastest approach in software, but it is the simplest and easiest to understand. It also helps us understand why CRC codes are so nice to implement in hardware.
To start, we'll define a type for the current checksum state, which should be exactly 32 bits:
#[derive(Debug, Clone, Copy, PartialEq)]
struct CRC(u32);

impl CRC {
    fn new() -> CRC {
        CRC(0)
    }
}
Decomposing into shifts
Consider the polynomial:
x^4 + x^3 + x + 1
We can decompose this polynomial as follows:
(x^3 + x^2 + 1)*x + 1
As we have discussed in the previous section, multiplying by x in GF(2^k) is the same as shifting bits to the left by one. Rewriting, we have:
(x^3 + x^2 + 1)<<1 + 1
Decomposing repeatedly:
((x^2 + x)<<1 + 1)<<1 + 1
(((x + 1)<<1 + 0)<<1 + 1)<<1 + 1
((((1)<<1 + 1)<<1 + 0)<<1 + 1)<<1 + 1
Thinking about this as raw bit operations, notice that we can build up any polynomial by left-shifting and adding (xor-ing) new bits one-at-a-time. In this particular case, we perform:
xor 1
left-shift
xor 1
left-shift
xor 0
left-shift
xor 1
left-shift
xor 1
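We can check this bit-building procedure directly. The following small sketch folds the bits of x^4 + x^3 + x + 1 (binary 11011) in MSB-first order:

```rust
fn main() {
    // Bits of x^4 + x^3 + x + 1, most-significant first
    let bits: [u32; 5] = [1, 1, 0, 1, 1];
    let mut v: u32 = 0;
    for b in bits {
        v = (v << 1) ^ b; // left-shift, then xor in the next bit
    }
    assert_eq!(v, 0b11011); // we rebuilt the original polynomial
    println!("{:05b}", v);
}
```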
Exercise: Decompose x^5 + x^2 + x into shifts and bit insertions.
Reducing the left-shift (a << 1)
In the implementation of GF(2^k) we found that we needed to reduce a << 1 from a (k+1)-bit number to a k-bit number. For CRC, we need to do the same thing. Whenever the kth bit is set, we need to subtract (XOR) away Q.
Let's modify the sequence above to include these reduce operations:
xor 1
left-shift
if most-significant-bit set, xor Q
xor 1
left-shift
if most-significant-bit set, xor Q
xor 0
left-shift
if most-significant-bit set, xor Q
xor 1
Exercise: Decompose x^5 + x^2 + x into shifts and bit insertions and reductions in GF(2^3).
As you can see, a CRC code can be implemented very simply by just shifting the bits and applying xors. By reducing
every time the MSB is set, we ensure that we always have a number in the checksum field GF(2^p)
. When the input
stream runs out of bits, we have our checksum!
Let's implement it:
impl CRC {
    fn append_bit(&mut self, bit: u8) {
        // Extract the most-significant-bit (MSB); we will need to check it after shifting
        let msb = (self.0 >> 31) & 1;
        // Shift the bits (notice: this overwrites the old MSB)
        let mut shifted = self.0 << 1;
        // Check if we need to reduce. If the MSB was set, then we have a 33-bit number
        // and we need to reduce it back to 32-bits using Q (notice: Q doesn't include
        // the MSB. Since the MSB was shifted off, it has been implicitly cleared already)
        if msb == 1 { shifted ^= Q; }
        // Add the next bit and save the result
        self.0 = shifted ^ (bit as u32);
    }
}
And that's the core operation of CRC-32, the rest is just supporting details!
Exercise: Really make sure you follow this algorithm. It can be helpful to draw diagrams
of bits shifting, new bits entering, and Q getting XOR'd. You can use the following parameters to construct
a simpler example: p = 3, Q = x^3 + x + 1 (decimal: 11, binary: 1011)
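For intuition, here is a minimal sketch of the same state machine at the suggested small size (p = 3, Q = x^3 + x + 1). Feeding in the example message 01001011 reproduces the checksum 101 that we computed by hand earlier:

```rust
// A 3-bit version of the CRC state machine (p = 3, Q = x^3 + x + 1).
// Q_LOW drops Q's leading x^3 term, just as the 32-bit version drops x^32.
const Q_LOW: u8 = 0b011;

fn append_bit(state: u8, bit: u8) -> u8 {
    let msb = (state >> 2) & 1;             // bit about to shift out
    let mut shifted = (state << 1) & 0b111; // shift, keeping only 3 bits
    if msb == 1 {
        shifted ^= Q_LOW; // reduce: subtract (XOR) away Q
    }
    shifted ^ bit // insert the next message bit
}

fn main() {
    // Feed the example message 01001011 through, MSB first
    let mut state = 0u8;
    for bit in [0, 1, 0, 0, 1, 0, 1, 1] {
        state = append_bit(state, bit);
    }
    assert_eq!(state, 0b101); // matches the hand-computed checksum x^2 + 1
    println!("{:03b}", state);
}
```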
Next up, we need a method to append an entire byte, rather than one bit. This is pretty simple. We will just iterate each bit from MSB to LSB, inserting each bit:
impl CRC {
    fn append_byte(&mut self, byte: u8) {
        // For each byte, just append each bit from MSB to LSB
        for i in (0..8).rev() {
            let bit = (byte >> i) & 1;
            self.append_bit(bit);
        }
    }
}
A routine for the cksum tool
We'll now use the above routines to construct the same checksum as the Unix cksum tool. This tool is specified by the POSIX Standard, IEEE Std 1003.1-2017 (here).
For the most part, we'll simply be calling our append_byte() routine, but there are a few additional quirks that we'll explain as we go:
pub fn cksum(dat: &[u8]) -> u32 {
    // Initialize our CRC state
    let mut crc = CRC::new();

    // Insert all of the data bytes in order
    for b in dat {
        crc.append_byte(*b);
    }

    // Quirk #1: The `cksum` tool appends the data length also. There is no fixed size
    // for the length field. Instead, we append length bytes from LSB to MSB, stopping
    // when the remaining value becomes zero.
    let mut len: u64 = dat.len() as u64;
    loop {
        // Append the lowest byte
        crc.append_byte(len as u8);
        // Shift the byte off
        len >>= 8;
        // When the remaining bytes are zero, we're done!
        if len == 0 {
            break;
        }
    }

    // Quirk #2: The `cksum` tool always appends an empty 32-bits after everything.
    // The main reason is to force each bit to cause a reduction (if required).
    // Each reduction causes bit-mixing/scrambling throughout the checksum.
    for _ in 0..32 {
        crc.append_bit(0);
    }

    // Quirk #3: The `cksum` tool inverts all the bits in the final result. The author
    // doesn't have an immediate justification for this operation. (Notice: In Rust
    // code, '!crc' is bitwise inversion. In C code and many related languages, this
    // would be '~crc' instead)
    !crc.0
}
Testing Time
Here are some tests that verify our implementation matches the values computed by the cksum tool. We also have a test here demonstrating a desired property of a good checksum. Notably, if one bit changes, we want many checksum bits to change.
#[cfg(test)]
mod test {
    use super::*;

    // Check some well-known strings for the right CRC-32 value
    #[test]
    fn test_strings() {
        assert_eq!(cksum(b""), 0xffffffff);
        assert_eq!(cksum(b"a"), 0x48c279fe);
        assert_eq!(cksum(b"123456789"), 0x377a6011);
        assert_eq!(cksum(b"finite fields are super fun when you really understand them!"), 0xa695ef0f);
    }

    // Test that a single flipped bit causes the checksum to flip many bits!
    #[test]
    fn test_one_flipped_bit() {
        let a: u64 = 0x42d5151330d94a84;
        let b = a ^ (1u64 << 30); // flip the 30th bit
        // One bit flip causes many bits to flip in the checksum, very good!
        assert_eq!(cksum(&a.to_le_bytes()), 0xd146283d);
        assert_eq!(cksum(&b.to_le_bytes()), 0x01c33b8f);
    }
}
Exercise: Why does cksum(b"") result in 0xffffffff when no set bits were appended?
Exercise: Why does cksum(b"a") result in many set bits even though "a" is only one byte?
Exercise: Can you explain why altering a single input bit can significantly change the checksum?
Building and Testing
The above code can be built and tested in the usual Rust ways:
cargo test
cargo build
Command-line cksum tool
Finally, we'll implement a command-line cksum tool.
Source Code: src/bin/crc32_cksum.rs
use std::io::{self, Read};
use learn_you_galois_fields_for_great_good::crc32;

fn main() -> io::Result<()> {
    let mut buffer = String::new();
    io::stdin().read_to_string(&mut buffer)?;
    let cksum = crc32::cksum(buffer.as_bytes());
    println!("{} {}", cksum, buffer.len());
    Ok(())
}
Comparing to the standard cksum tool
Let's compare our tool's output to a standard Unix cksum tool:
$ echo -n 'a' | ./target/debug/crc32_cksum
1220704766 1
$ echo -n 'a' | cksum
1220704766 1
$ echo -n 'I Love Abstract Algebra' | ./target/debug/crc32_cksum
1470057247 23
$ echo -n 'I Love Abstract Algebra' | cksum
1470057247 23
Discussion
It’s actually a Ring
Throughout this article, we've referred to the checksum as belonging to the field GF(2^p). But we never actually needed a multiplicative inverse to construct a CRC. A structure with addition and multiplication but no requirement of multiplicative inverses is known in Abstract Algebra as a Ring.
However, we did use a reducing polynomial. So, what gives? The answer is that this polynomial does not have to be irreducible! We can construct a perfectly acceptable CRC with a polynomial that can be factored. In fact, the CRC-40-GSM code (see here) uses x^40 + x^26 + x^23 + x^17 + x^3 + 1 = (x^23 + 1)(x^17 + x^3 + 1) for its polynomial. In practice, however, irreducible polynomials are common because they tend to yield better error-detecting properties.
Quirks in practice
The algebraic structure we discussed here applies to all CRC codes, but many implementations have additional weird quirks in practice. Here is a short but incomplete list of some quirks you might encounter in common implementations:
- Initialization values: We used 0 as an initialization value. This is the most mathematically justified choice, but you'll see other values, such as 0xffffffff. A motivation is that this causes more reduction operations and more bit-scrambling.
- Push-through bits: In our cksum implementation, we append 32 zero bits at the very end. This isn't required by the mathematical structure, but it's important to make sure that all bits end up getting scrambled by the reducing polynomial.
- Bit reversal: Different algorithms insert bits in reversed order. This is likely mostly a quirk of the original application (e.g. how bits were serialized in some hardware device).
- Output XOR: Some codes will XOR the final value with some hardcoded constant. In the cksum implementation, the final inversion can be viewed as an XOR with 0xffffffff. In practice, these can be arbitrary values. Generally, these output XORs don't enhance the code's error-detecting properties.
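To see several of these quirks together, here is a sketch of the widely used "reflected" CRC-32 variant (the zlib/gzip/PNG parameterization): a nonzero initialization value, bits processed LSB-first with the bit-reversed polynomial 0xedb88320, and a final output inversion. Its traditional check value for the input "123456789" is 0xcbf43926.

```rust
// The common "reflected" CRC-32 (zlib/gzip/PNG style): LSB-first bit order,
// bit-reversed polynomial, 0xffffffff initialization, and a final inversion.
fn crc32_reflected(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xffff_ffff; // quirk: nonzero initialization value
    for &byte in data {
        crc ^= byte as u32; // bits enter at the LSB (the bit-reversal quirk)
        for _ in 0..8 {
            if crc & 1 != 0 {
                crc = (crc >> 1) ^ 0xedb8_8320; // bit-reversed 0x04c11db7
            } else {
                crc >>= 1;
            }
        }
    }
    !crc // quirk: output XOR with 0xffffffff
}

fn main() {
    // "123456789" is the traditional check input for CRC parameterizations
    assert_eq!(crc32_reflected(b"123456789"), 0xcbf43926);
    println!("{:#010x}", crc32_reflected(b"123456789"));
}
```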
This is just a sampling of some common quirks. Many more exist. When implementing a CRC code, you should be careful to pay attention to subtle details and quirks.
Optimizing
Much can be done to optimize CRC codes in software. Our version here is just a simple reference implementation. We've focused on understandability over everything else. It's out of scope for us to dive deep into optimizations. But, we'll mention at least a few common ones here:
- Lookup-tables: Computing these codes bit-by-bit is very slow in both software and hardware. Often we can instead precompute a lookup-table for the most-significant byte. This allows us to shift out 8 bits at a time using the lookup-table. Many variants of this idea exist for even more speedup.
- Special CPU Instructions: Many computer architectures include native instructions for computing common CRC codes. It's worth noting that these instructions alone may not yield the fastest implementation, and performance can vary between CPU generations.
- Vectorization and CPU Architecture: Fast implementations will want to consider heavy use of vectorization. Operating on a large vector of bits in a single instruction is going to perform much better than bit-by-bit. But these implementations can get complicated fast.
Generally, implementing a very efficient CRC code requires a lot of work. The bit-by-bit model is quite simple computationally, but it takes a lot of work to map it onto a modern out-of-order superscalar vectorized machine with a complicated memory hierarchy.
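As an illustration of the lookup-table idea, here is a sketch derived for this article's bit-at-the-LSB scheme (not the usual reflected variant). Because append_bit is linear, eight bit-steps collapse into a shift, one table lookup on the top byte, and an XOR of the incoming byte:

```rust
const Q: u32 = 0x04c11db7;

// One bit-by-bit step, matching the article's implementation
fn append_bit(crc: u32, bit: u8) -> u32 {
    let msb = (crc >> 31) & 1;
    let mut shifted = crc << 1;
    if msb == 1 { shifted ^= Q; }
    shifted ^ (bit as u32)
}

// Precompute the effect of eight shift-reduce steps for each possible top byte
fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    for i in 0..256u32 {
        let mut crc = i << 24;
        for _ in 0..8 {
            crc = append_bit(crc, 0);
        }
        table[i as usize] = crc;
    }
    table
}

// Process a whole byte with one table lookup instead of eight bit steps
fn append_byte_table(crc: u32, byte: u8, table: &[u32; 256]) -> u32 {
    (crc << 8) ^ table[(crc >> 24) as usize] ^ (byte as u32)
}

fn main() {
    let table = make_table();
    // Check equivalence with the bit-by-bit version on a few sample states
    for &(crc, byte) in &[(0u32, b'a'), (0xdead_beef, 0x55), (0xffff_ffff, 0x00)] {
        let mut slow = crc;
        for i in (0..8).rev() {
            slow = append_bit(slow, (byte >> i) & 1);
        }
        assert_eq!(slow, append_byte_table(crc, byte, &table));
    }
    println!("table-driven version matches bit-by-bit");
}
```

The key observation is that the table row for index i precomputes the effect of eight shift-reduce steps on a state whose top byte is i; everything else is just a shift and two XORs.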
More Resources
A lot of ink has been spilled discussing CRC codes and I'm far from an authority on the subject. If you'd like to dive in further, here are some good starting points:
- A Painless Guide to CRC Error Detection Algorithms by Ross N. Williams (original) (xorvoid.com)
- A Systematic Approach to Building High Performance, Software-based, CRC Generators by Michael E. Kounavis and Frank L. Berry (archive.org)
- Reverse-Engineering a CRC Algorithm by Gregory Ewing (archive.org)
Conclusion
And that brings us to the end of our first application in this series. It's pretty exciting to see all that hard work starting to pay off!
Hopefully this article helps demonstrate how straightforward CRC codes can be from both a theory and implementation perspective. Without the background in abstract algebra, CRC codes can seem unmotivated and mysterious. But, with the background, they should feel almost trivial. To me it feels almost like cheating: just treat data as a gigantic polynomial and long-divide it to a smaller polynomial. That's it?!
So, is this the end? Definitely not, this is the series that doesn't end...
See you next time for more cool stuff (soon)!