On Correcting Bit Errors with CRCs

And correcting misconceptions along the way.

Stefan Filipek
6 min readApr 8, 2023

I have found that there is a prevalent misconception out there: Cyclic redundancy checks (CRCs) are only for error detection and cannot be used for error correction. This has been touted by many, many bright engineers I’ve worked with, which even led to some debating. I can’t necessarily blame anyone — it’s not widely taught, and having “check” in the name instead of “correct” (like FEC or ECC) simply reinforces the notion.

I learned in an undergrad digital communications course that CRCs can be used to both detect and correct even multi-bit errors. I’m here to share that knowledge… and preemptively defend myself in some inevitable future discussion.

Note: If you’re currently talking with me about CRC’s and I dropped this link on you: Hello there! I wish you well!

To that end, I’ll be explaining and demonstrating two different methods to correct errors with CRCs.

Background

I won’t repeat the theory of CRCs, the Wikipedia page does a good job and even covers mathematical limitations, but I still need to mention a couple basic concepts here.

CRC is basically long division. We can think if it as:

  • Dividend: The message.
  • Divisor: The CRC polynomial.
  • Remainder: The check value.
  • Quotient: Not used.

Given CRC’s operate in GF(2), addition (and subtraction) is simply a bit-wise XOR operation. This makes life easier for us computer nerds.

The check value is sent along with the message so the receiver can verify that the message has not been corrupted (with some probability). The receiver performs the same long division over the message and compares its calculated check value against the one sent. If they differ, we know that the message has been corrupted in some way.

We’ll refer to the difference (XOR) between the correct and calculated check value as the syndrome. This will be important later.

Below is a simple 8-bit CRC algorithm I wrote in Python for reference:

def crc8(data: bytes, poly: int, rem: int=0):
for byte in data:
# Add the data and remainder from the last division operation
rem ^= byte

# Perform long division over this byte
for _ in range(8):
rem <<= 1

# Should we subtract the divisor (polynomial)?
if rem & 0x100:
rem ^= poly | 0x100

return rem

Note: The most significant bit of the polynomial is always assumed and should not be given as part of the poly argument.

I’ll use the above 8-bit algorithm for demonstration, but everything mentioned can be trivially extended to any size CRC. There’s also more nuance in practice, such as bit ordering, initial values, final XOR values, etc., which I’m glossing over for simplicity.

Method 1: Syndrome Lookup

A Bit More Background

There’s an interesting property with CRCs: If you XOR (⊕) two messages together, the resulting check value is the same as XOR-ing the check values for the individual messages. In other words:

CRC(A B, poly) == CRC(A, poly) CRC(B, poly)

Mathematically, this makes sense as well:

remainder(A + B, divisor) == remainder(A, divisor) + remainder(B, divisor)

Modulo the divisor, of course.

With this in mind, let’s think of bit errors during transmission as an XOR’d message:

received_msg = actual_msg error_bits

And then:

CRC(received_msg, poly) = CRC(actual_msg, poly) CRC(error_bits, poly)

The CRC of the error bits is therefore the syndrome.

The above is simplified for CRCs that result in zero for an all-zero message. If that is not the case for the algorithm in use, then the CRC of an all-zero message of a the same length must also be XOR’d in:

CRC(received_msg, poly) = CRC(actual_msg, poly) CRC(error_bits, poly) CRC(00…, poly)

The Actual Algorithm

Let’s assume we have a fixed message length. We can preemptively generate all possible single bit error messages and their syndrome check values. If we get a corrupted message, we can simply look up the syndrome in the table to find where the error was, and then correct for it.

def crc8_error_table(num_bytes: int, poly: int, rem: int=0):
table = {}
error = bytearray(num_bytes)

for bit in range(num_bytes * 8):
error[bit // 8] = 0x80 >> (bit % 8)
syndrome = crc8(error, poly, rem)

assert syndrome not in table
table[syndrome] = bit

error[bit // 8] = 0

return table

Let’s demonstrate with some bit flips in the 6-byte message “foobar” and see if we can detect the errors. We’ll use 0x31 as the polynomial.

>>> poly = 0x31
>>> message = b'foobar'
>>> bad_1 = b'\xe6oobar' # Bit 0 flipped
>>> bad_2 = b'foobas' # Bit 47 flipped
>>> bad_3 = b'fonbar' # Bit 23 flipped

>>> check = crc.crc8(message, poly)
>>> bad_check_1 = crc.crc8(bad_1, poly)
>>> bad_check_2 = crc.crc8(bad_2, poly)
>>> bad_check_3 = crc.crc8(bad_3, poly)
>>> check, bad_check_1, bad_check_2, bad_check_3
(240, 28, 193, 107)

>>> table = crc.crc8_error_table(len(message), poly)
>>> table[check ^ bad_check_1]
0
>>> table[check ^ bad_check_2]
47
>>> table[check ^ bad_check_3]
23
>>> 0 in table
False
>>>

If the syndrome is not found in the table, then the error isn’t correctable.

This method also works for two error bits. The table generation step just becomes larger to cover all combinations, and the table would then return a tuple of bit positions.

Method 2: Backward Calculation

Given the syndrome, we can actually work backwards to find out where it all went wrong. To my knowledge, this only works for single bit errors.

The short version: Reverse the CRC operation starting with the syndrome. When the remainder turns to zero, we know where the bit flip occurred.

Even More Background

Hold up… the quotient in the division operation is thrown away. How can we possibly have enough information with just the remainder and dividend to do anything backwards?

Instead of trying to explain it mathematically, I’m going to focus on the bit-twiddling properties of the CRC algorithm. There are some key properties that will help here:

  1. CRC polynomials always have the most and least significant bits set. Mathematically, an 8-bit CRC polynomial has the form: X⁸ + … + 1.
  2. For each bit in the message, the check value is left-shifted and the least significant bit is set to zero.
  3. The least significant bit of the remainder can therefore only be set if we just XOR’d in the polynomial. Ah-HA!

As we reverse the CRC operation, we right-shift the remainder along the way (the reverse of left-shifts, obviously). If the least significant bit of the remainder is ever set, we know that the forward operation must have just XOR’d the polynomial to get it that way. We perform that operation ourselves, and continue.

The Backwards Algorithm

So, let’s first look at a generic, reversed CRC algorithm. Instead of examining the most significant bit of the remainder, we examine the least significant bit:

def crc8_rev(data: bytes, poly: int, rem: int=0):
for byte in reversed(data):
for _ in range(8):
if rem & 0x1:
rem ^= poly | 0x100
rem >>= 1
rem ^= byte
return rem

This works perfectly to reverse out the intermediate CRC states over the message. Add some printouts to both the forward and reverse operations, or yield intermediate values, to compare and verify for yourself.

However, when examining the check value syndrome, we are operating over the syndrome’s message and not the actual message (as discussed in the background of the first method).

To find the one and only errant bit flip, we work backwards from the syndrome and until remainder goes away. This is where division over the error message began, and where the errant bit flip occurred.

The following is my single-bit error finder:

def crc8_error_finder(num_bytes: int, poly: int, rem: int=0):
for bit in reversed(range(num_bytes * 8)):
if rem == poly:
return bit
if rem & 0x1:
rem ^= poly | 0x100
rem >>= 1
return None

Let’s test the algorithm over a few different single-bit error scenarios, like before:

>>> poly = 0x31
>>> message = b'foobar'
>>> bad_1 = b'\xe6oobar' # Bit 0 flipped
>>> bad_2 = b'foobas' # Bit 47 flipped
>>> bad_3 = b'fonbar' # Bit 23 flipped

>>> check = crc.crc8(message, poly)
>>> bad_check_1 = crc.crc8(bad_1, poly)
>>> bad_check_2 = crc.crc8(bad_2, poly)
>>> bad_check_3 = crc.crc8(bad_3, poly)
>>> check, bad_check_1, bad_check_2, bad_check_3
(240, 28, 193, 107)

>>> crc.crc8_error_finder(len(message), poly, check ^ bad_check_1)
0
>>> crc.crc8_error_finder(len(message), poly, check ^ bad_check_2)
47
>>> crc.crc8_error_finder(len(message), poly, check ^ bad_check_3)
23
>>> crc.crc8_error_finder(len(message), poly, 0)
>>>

Closing Remarks

I hope this proves that you can, in fact, correct bit errors with CRCs. Not only that, but there are multiple methods for doing so!

I have source code with unit tests here: https://gitlab.com/srfilipek/crc

There are always caveats to all error detection and correction, including CRCs. Not all messages are correctable, though. Error detection and correction can become ambiguous and produce incorrect results when there are multiple bit errors. If you’re interested in the limits of error detection and correction, the Error Detection Strength section here is a good place to start.

--

--

Stefan Filipek
Stefan Filipek

No responses yet