Or how to detect and correct errors
Last time we took a quick tour through the main theorems of Claude Shannon, which essentially solved the following two problems about communicating over a digital channel.
- What is the best encoding for information when you are guaranteed that your communication channel is error free?
- Are there any encoding schemes that can recover from random noise introduced during transmission?
The answers to these questions were purely mathematical theorems, of course. But the interesting shortcoming of Shannon’s accomplishment was that his solution for the noisy coding problem (2) was nonconstructive. The question remains: can we actually come up with efficiently computable encoding schemes? The answer is yes! Marcel Golay was the first to discover such a code in 1949 (just a year after Shannon’s landmark paper), and Golay’s construction was published on a single page! We’re not going to define Golay’s code in this post, but we will mention its interesting status in coding theory later. The next year Richard Hamming discovered a simpler and larger family of codes, and went on to do some of the major founding work in coding theory. For his efforts he won a Turing Award and played a major part in bringing about the modern digital age. So we’ll start with Hamming’s codes.
We will assume some basic linear algebra knowledge, as detailed in our first linear algebra primer. We will also use some basic facts about polynomials and finite fields, though the lazy reader can just imagine everything as binary.

What is a code?
The formal definition of a code is simple: a code $C$ is just a subset of $\{0,1\}^n$ for some $n$. Elements of $C$ are called codewords.
This is deceptively simple, but here’s the intuition. Say we know we want to send messages of length $k$, so that our messages are the strings in $\{0,1\}^k$. Then we can think of a code as the image of an injective encoding function $\textup{Enc}: \{0,1\}^k \to \{0,1\}^n$. Injectivity guarantees that no two messages are encoded to the same codeword, so $|C| = 2^k$, and we can call $k = \log |C|$ the message length of $C$ even if no explicit encoding function is given.
Moreover, while in this post we’ll always work with the binary alphabet $\{0,1\}$, in general the symbols of your encoded messages can come from an arbitrary finite alphabet $\Sigma$, so that a code is a subset of $\Sigma^n$. In that case we write $q = |\Sigma|$ for the size of the alphabet.
So we have the parameters $n, k, q$, and we need one more: the minimum distance $d$ of the code, defined as the minimum Hamming distance between any pair of distinct codewords. Here the Hamming distance $d(x, y)$ between two strings is simply the number of coordinates in which they differ.
So coding theorists turn this mess of parameters into notation.
Definition: A code $C$ is called an $(n, k, d)_q$-code if
- $C \subset \Sigma^n$ for some alphabet $\Sigma$,
- $k = \log |C|$,
- $C$ has minimum distance $d$, and
- the alphabet $\Sigma$ has size $q$.
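To make these parameters concrete, here’s a minimal sketch in plain Python (my own illustration, not part of the formal definitions) that computes $(n, k, d)_q$ by brute force for a code given as a set of strings:

```python
from itertools import combinations
from math import log2

def hamming_distance(x, y):
    """Number of coordinates where two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))

def parameters(code):
    n = len(next(iter(code)))      # block length
    k = log2(len(code))            # message length, k = log |C|
    d = min(hamming_distance(x, y) for x, y in combinations(code, 2))
    q = len({c for word in code for c in word})  # alphabet size, as observed
    return n, k, d, q

# The 3-fold binary repetition code is a (3, 1, 3)_2-code:
print(parameters({"000", "111"}))  # (3, 1.0, 3, 2)
```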
The basic goals of coding theory are:
- For which values of these four parameters do codes exist?
- Fixing any three parameters, how can we optimize the other one?
In this post we’ll see how simple linear-algebraic constructions can give optima for one of these problems: maximizing $k$ when the distance is fixed at $d = 3$. We’ll also state a theorem characterizing the codes that are optimal in this sense for more general distances.
Linear codes and the Hamming code
A code is called linear if it can be identified with a linear subspace of some finite-dimensional vector space. In this post all of our vector spaces will be $\{0,1\}^n$, that is, tuples of bits under coordinatewise addition mod 2, but the same constructions work over any finite field $\mathbb{F}_q$. Concretely, to say a code is linear means:
- The zero vector is a codeword.
- The sum of any two codewords is a codeword.
- Any scalar multiple of a codeword is a codeword.
Linear codes are the simplest kinds of codes, but already they give a rich variety of things to study. The benefit of linear codes is that you can describe them in a lot of different and useful ways besides just describing the encoding function. We’ll use two that we define here. The idea is simple: you can describe everything about a linear subspace by giving a basis for the space.
Definition: A generator matrix of an $(n, k, d)_q$-code $C$ is a $k \times n$ matrix $G$ whose rows form a basis for $C$.
There are a lot of equivalent generator matrices for a linear code (we’ll come back to this later), but the main benefit is that having a generator matrix allows one to encode a message $x \in \{0,1\}^k$ as the matrix product $xG$. Intuitively, the bits of $x$ are the coefficients of a linear combination of the rows of $G$, which uniquely determines an element of the subspace. And since a $k$-dimensional subspace of $\{0,1\}^n$ has exactly $2^k$ elements, we aren’t abusing notation by calling $k = \log |C|$ both the message length and the dimension.
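As a sketch of how this encoding works in practice, here’s the multiplication $xG$ in numpy for a small hypothetical $(4, 2, 2)_2$-code I made up for illustration; the only subtlety is reducing the product mod 2:

```python
import numpy as np

G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1]])  # rows form a basis for the code

def encode(x, G):
    """Encode a length-k message x as the codeword xG over GF(2)."""
    return np.mod(np.array(x) @ G, 2)

# All 2^k = 4 messages and their codewords:
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", encode(x, G))
```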
For the second description of $C$, recall that every linear subspace $C$ has an orthogonal complement $C^\perp$, the subspace of all vectors orthogonal to every vector of $C$.
Definition: Let $H^T$ be a generator matrix for $C^\perp$. Then the $n \times (n-k)$ matrix $H$ is called a parity check matrix for $C$.
Note that $x \in C$ if and only if $xH = 0$. Indeed, $xH = 0$ says exactly that $x$ is orthogonal to every column of $H$, and since the columns of $H$ span $C^\perp$, this is the same as $x \in (C^\perp)^\perp = C$.
The benefit of having a parity check matrix is that you can do efficient error detection: just compute $yH$ for the received word $y$ (this product is called the syndrome of $y$), and if it’s nonzero then an error occurred. As long as fewer than $d$ errors occur, the corrupted word cannot be another codeword, so the errors are never silently missed.
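Here’s the corresponding sketch of syndrome computation, using the convention above that $H$ is $n \times (n-k)$ and $x \in C$ if and only if $xH = 0$. The example is the 3-fold repetition code, whose dual is spanned by $110$ and $011$:

```python
import numpy as np

H = np.array([[1, 0],
              [1, 1],
              [0, 1]])  # columns form a basis for the dual code

def syndrome(y, H):
    """The syndrome yH over GF(2); zero exactly when y is a codeword."""
    return np.mod(np.array(y) @ H, 2)

print(syndrome([1, 1, 1], H))  # [0 0], a valid codeword
print(syndrome([1, 0, 1], H))  # [1 1], an error is detected
```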
So that raises an obvious question: if you give me the generator matrix of a linear code, can I compute its minimum distance? It turns out that this problem is NP-hard in general. In fact, it’s equivalent to finding the smallest linearly dependent set of rows of the parity check matrix, and it’s easier to see why such a problem might be hard. But if you construct your codes cleverly enough, you can compute their distance properties with ease.
Before we do that, one more definition and a simple proposition about linear codes. The Hamming weight of a vector $x$, denoted $\textup{wt}(x)$, is the number of nonzero entries of $x$.
Proposition: The minimum distance of a linear code $C$ is the minimum Hamming weight over all nonzero codewords.
Proof. Consider a nonzero codeword $x$. Since the zero vector is also a codeword and $\textup{wt}(x) = d(x, 0)$, the minimum distance is at most the minimum nonzero weight. Conversely, if $x \neq y$ are codewords then $x - y$ is a nonzero codeword with $\textup{wt}(x - y) = d(x, y)$, so the minimum nonzero weight is at most the minimum distance. $\square$
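The proposition is easy to sanity-check by brute force. This sketch enumerates the span of the hypothetical generator matrix from earlier and compares the two minima:

```python
import numpy as np
from itertools import combinations, product

G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1]])

# Enumerate all codewords xG for x in {0,1}^k.
codewords = [tuple(np.mod(np.array(x) @ G, 2)) for x in product([0, 1], repeat=2)]

min_distance = min(sum(a != b for a, b in zip(x, y))
                   for x, y in combinations(codewords, 2))
min_weight = min(sum(w) for w in codewords if any(w))
assert min_distance == min_weight == 2
```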
So now we can define our first code, the Hamming code. It will be an $(n, k, 3)_2$-code: we fix the distance $d = 3$ and the binary alphabet, write $l = n - k$ for the number of check bits, and try to make $n$ as large as possible.
We’ll construct the Hamming code by describing a parity check matrix $H$ with the properties we want. Specifically, we’ll see what the distance requirement imposes on $H$. To get $d \geq 2$, no vector of Hamming weight 1 may be a codeword. If $e_i$ denotes the vector that is zero except for a one in position $i$, then $e_i H$ is exactly the $i$-th row of $H$, so requiring $e_i H \neq 0$ for every $i$ means no row of $H$ may be zero. Conversely, if no row of $H$ is zero then no weight-1 vector is a codeword.
Likewise, for $d \geq 3$ no vector of weight 2 may be a codeword. If $y = e_i + e_j$ for positions $i \neq j$, then $yH$ is the sum of the $i$-th and $j$-th rows of $H$, and since we’re working mod 2, saying this sum is nonzero is the same as saying the two rows are distinct. So we have two conditions.
- No row of $H$ may be zero.
- All rows of $H$ must be distinct.
That is, any parity check matrix with those two properties defines a distance 3 linear code. The only question that remains is how large $n$ can be. The rows of $H$ are binary strings of length $l = n - k$, and there are exactly $2^l - 1$ distinct nonzero such strings, so $n \leq 2^l - 1$. Taking all of them as the rows of $H$ achieves the maximum.
Theorem: For every $l \geq 2$ there is a $(2^l - 1, 2^l - l - 1, 3)_2$-code, called the $l$-th Hamming code.
Since the Hamming code has distance 3, we can always detect if at most a single error occurs. Moreover, we can correct a single error using the Hamming code. If $x$ is the sent codeword and $y = x + e_i$ is received with an error in position $i$, then the syndrome $yH = xH + e_i H = e_i H$ is exactly the $i$-th row of $H$. Since the rows of $H$ are distinct, this identifies $i$, and flipping the $i$-th bit of $y$ recovers $x$. Better yet, if you order the rows of $H$ so that the $i$-th row is the binary representation of $i$, the syndrome literally spells out the position of the error.
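Here’s a sketch of the whole scheme: build the parity check matrix whose rows are the binary representations of $1, \dots, 2^l - 1$ (the bit ordering is a choice of mine), and decode by reading the syndrome as a number:

```python
import numpy as np

def hamming_parity_check(l):
    """The n-by-l matrix whose i-th row is the binary representation of i+1."""
    n = 2**l - 1
    return np.array([[(i >> j) & 1 for j in range(l)] for i in range(1, n + 1)])

def correct_single_error(y, H):
    """If the syndrome yH is nonzero, flip the bit at the position it names."""
    s = np.mod(y @ H, 2)
    if s.any():
        position = int(sum(int(bit) << j for j, bit in enumerate(s))) - 1
        y = y.copy()
        y[position] ^= 1
    return y

l = 3                                # the (7, 4, 3)_2 Hamming code
H = hamming_parity_check(l)
x = np.zeros(2**l - 1, dtype=int)    # the zero codeword, for simplicity
y = x.copy(); y[4] ^= 1              # corrupt position 4
print(correct_single_error(y, H))    # prints the zero codeword again
```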
Before we move on, we should note one interesting feature of linear codes.
Definition: A code is called systematic if it can be realized by an encoding function that appends some number $m$ of “check bits” to the end of each message.
The interesting feature is that all linear codes are systematic. The reason is as follows. The generator matrix $G$ of a linear code can be put into reduced row echelon form without changing the code, since row operations only change the choice of basis. Possibly after permuting coordinates (which yields an equivalent code), the result has the form $G = (I \mid A)$ for the $k \times k$ identity matrix $I$ and some $k \times (n - k)$ matrix $A$. Encoding with such a $G$ sends $x \mapsto xG = (x, xA)$, which simply appends $m = n - k$ check bits to the message.
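Here’s a sketch of that row reduction over GF(2), assuming for simplicity that the first $k$ columns can be made into pivots without permuting coordinates:

```python
import numpy as np

def systematic_form(G):
    """Row-reduce a full-rank generator matrix over GF(2) toward (I | A)."""
    G = np.mod(G.copy(), 2)
    k = G.shape[0]
    for col in range(k):
        pivot = col + int(np.argmax(G[col:, col]))  # a row with a 1 in this column
        G[[col, pivot]] = G[[pivot, col]]           # swap it into place
        for row in range(k):                        # clear the column elsewhere
            if row != col and G[row, col]:
                G[row] = (G[row] + G[col]) % 2
    return G

G = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1]])
print(systematic_form(G))
# [[1 0 1 1]
#  [0 1 1 0]]
```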
If you work out the parameters of the Hamming code, you’ll see that it is a systematic code which adds only $l = \log(n + 1)$ check bits to a message of length $2^l - l - 1$. In other words, correcting a single error costs only logarithmically many extra bits.
The Hamming bound and perfect codes
One nice fact about Hamming codes is that they optimize a natural problem: the problem of maximizing the number of codewords (equivalently, $k$) for a fixed block length $n$ and fixed distance $d = 3$. Equivalently, this is the problem of packing as many disjoint Hamming balls of radius 1 into $\{0,1\}^n$ as possible.
There is a theorem called the Hamming bound, which describes a limit to how much you can pack disjoint balls of a fixed radius inside $\{0,1\}^n$.
Theorem: If an $(n, k, d)_2$-code exists, then $2^k \sum_{i=0}^{\lfloor (d-1)/2 \rfloor} \binom{n}{i} \leq 2^n$.
Proof. The proof is quite simple. To say a code $C$ has minimum distance $d$ means that the Hamming balls of radius $e = \lfloor (d-1)/2 \rfloor$ centered at the codewords are pairwise disjoint: if two such balls intersected, their centers would be codewords at distance at most $2e < d$. Each ball contains exactly $\sum_{i=0}^{e} \binom{n}{i}$ strings, since a string at Hamming distance $i$ from the center is determined by choosing which $i$ coordinates to flip. There are $2^k$ codewords, so the balls together contain $2^k \sum_{i=0}^{e} \binom{n}{i}$ distinct strings, and this cannot exceed the total number of strings, $2^n$. $\square$
Now a code is called perfect if it actually meets the Hamming bound exactly. As you probably guessed, the Hamming codes are perfect codes. It’s not hard to prove this, and I’m leaving it as an exercise to the reader.
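The equality itself is a one-line computation; here’s a quick numeric check (not a proof, just a sanity check) that the Hamming code parameters meet the bound exactly:

```python
from math import comb

for l in range(2, 8):
    n, k = 2**l - 1, 2**l - l - 1
    ball = comb(n, 0) + comb(n, 1)  # ball of radius floor((3-1)/2) = 1
    assert 2**k * ball == 2**n      # equality: the balls tile {0,1}^n exactly
    print(f"l={l}: 2^{k} * {ball} == 2^{n}")
```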
The obvious follow-up question is whether there are any other perfect codes. The answer is yes, and some of them are nonlinear. But some of them are “trivial.” For example, when $d = 1$ the entire space $\{0,1\}^n$ is a perfect code, and when $n$ is odd the repetition code consisting of the all-zeros and all-ones strings is a perfect $(n, 1, n)_2$-code. Excluding these, the perfect codes were completely characterized in the early 1970s.
Theorem [van Lint ’71, Tietäväinen ’73]: Let $C$ be a nontrivial perfect $(n, k, d)_q$-code, where $q$ is a prime power. Then the parameters of $C$ must be those of a Hamming code, or one of the following:
- A $(23, 12, 7)_2$-code
- An $(11, 6, 5)_3$-code
The last two examples are known as the binary and ternary Golay codes, respectively, which are also linear. In other words, every possible set of parameters for a perfect code can be realized as one of these three linear codes.
So this theorem was a big deal in coding theory. The Hamming and Golay codes were both discovered within a year of each other, in 1949 and 1950, but the nonexistence of other perfect linear codes was open for twenty more years. This wrapped up a very neat package.
Next time we’ll discuss the Singleton bound, which optimizes for a different quantity and is incomparable with the Hamming bound. We’ll define the Reed-Solomon codes and show they optimize this bound as well. These codes are particularly famous for being the error correcting codes used in DVDs. We’ll then discuss the algorithmic issues surrounding decoding, and more recent connections to complexity theory.
Until then!
Posts in this series:
- A Proofless Introduction to Coding Theory
- Hamming’s Code
- The Codes of Solomon, Reed, and Muller
- The Welch-Berlekamp Algorithm for Correcting Errors in Data