The Welch-Berlekamp Algorithm for Correcting Errors in Data

In this post we’ll implement Reed-Solomon error-correcting codes and play around with them. In our last post we defined Reed-Solomon codes rigorously, but in this post we’ll focus on intuition and code. As usual, the code and data used in this post are available on this blog’s Github page.

The main intuition behind Reed-Solomon codes (and basically all the historically major codes) is

Error correction is about adding redundancy, and polynomials are a really efficient way to do that.

Here’s an example of what we’ll do in the post. Say you have a space probe flying past Mars taking photographs like this one

mars
Courtesy of NASA’s Viking Orbiter.

Unfortunately you know that if you send the images back to Earth via radio waves, the signal will get corrupted by cosmic something-or-other and you’ll end up with an image like this.

corrupted-mars

How can you recover from errors like this? You could do something like repeat each pixel twice in the message so that if one copy is corrupted the other will get through. But still, every now and then both copies of a pixel will be corrupted, and you’ve doubled the size of the message to boot.

The idea of error-correcting codes is to find a way to encode a message so that it adds a lot of redundancy without adding too much extra information to the message. The name of the game is to optimize the tradeoff between how much redundancy you get and how much longer the message needs to be, while still being able to efficiently decode the encoded message.

A solid technique turns out to be: use polynomials. Even though you’d think polynomials are too simple (we teach them starting in the 7th grade these days!) they turn out to have remarkable properties. The most important of which is:

if you give me a bunch of points in the plane with different $ x$ coordinates, they uniquely define a polynomial of a certain degree.

This fact is called polynomial interpolation. We used it in a previous post to share secrets, if you’re interested.

What makes polynomials great for error correction is that you can take a fixed polynomial (think, the message) and “encode” it as a list of points on that polynomial. If you include enough, then you can get back the original polynomial from the points alone. And the best part, for each two additional points you include above the minimum, you get resilience to one additional error no matter where it happens in the message. Another way to say this is, even if some of the points in your encoded message are wrong (the numbers are modified by an adversary or random noise), as long as there aren’t too many errors there is an algorithm that can recover the errors.

That’s what makes polynomials so much better than the naive idea of repeating every pixel twice: once you allow for three errors you run the risk of losing a pixel, but you had to double your communication costs. With a polynomial-based approach you’d only need to store around six extra pixels worth of data to get resilience to three errors that can happen anywhere. What a bargain!

Here’s the official theorem about Reed-Solomon codes:

Theorem: There is an efficient algorithm which, when given points $ (a_1, b_1), \dots, (a_n, b_n)$ with distinct $ a_i$ has the following property. If there is a polynomial of degree $ d$ that passes through at least $ n/2 + d/2$ of the given points, then the algorithm will output the polynomial.

So let’s implement the encoder, decoder, and turn the theorem into code!

Implementing the encoder

The way you write a message of length $ k$ as a polynomial is easy. Pick a large prime integer $ p$ and from now on we’ll do all our arithmetic modulo $ p$. Then encode each character $ c_0, c_1, \dots, c_{k-1}$ in the message as an integer between 0 and $ p-1$ (this is why $ p$ needs to be large enough), and the polynomial representing the message is

$ m(x) = c_0 + c_1x + c_2x^2 + \dots + c_{k-1}x^{k-1}$

If the message has length $ k$ then the polynomial will have degree $ k-1$.

Now to encode the message we just pick a bunch of $ x$ values and plug them into the polynomial, and record the (input, output) pairs as the encoded message. If we want to make things simple we can just require that you always pick the $ x$ values $ 0, 1, \dots, n-1$ for some choice of $ n \leq p$.

A quick skippable side-note: we need $ p$ to be prime so that our arithmetic happens in a field. Otherwise, we won’t necessarily get unique decoded messages.

Back when we discussed elliptic curve cryptography (ironically sharing an acronym with error correcting codes), we actually wrote a little library that lets us seamlessly represent polynomials with “modular arithmetic coefficients” in Python, which in math jargon is a “finite field.” Rather than reinvent the wheel we’ll just use that code as a black box (full source in the Github repo). Here are some examples of using it.

>>> from finitefield.finitefield import FiniteField
>>> F13 = FiniteField(p=13)
>>> a = F13(7)
>>> a+9
3 (mod 13)
>>> a*a
10 (mod 13)
>>> 1/a
2 (mod 13)

A programming aside: once you construct an instance of your finite field, all arithmetic operations involving instances of that type will automatically lift integers to the appropriate type. Now to make some polynomials:

>>> from finitefield.polynomial import polynomialsOver
>>> F = FiniteField(p=13)
>>> P = polynomialsOver(F)
>>> g = P([1,3,5])
>>> g
1 + 3 t^1 + 5 t^2
>>> g*g
1 + 6 t^1 + 6 t^2 + 4 t^3 + 12 t^4
>>> g(100)
4 (mod 13)

Now to fix an encoding/decoding scheme we’ll call $ k$ the size of the unencoded message, $ n$ the size of the encoded message, and $ p$ the modulus, and we’ll fix these programmatically when the encoder and decoder are defined so we don’t have to keep carrying these data around.

def makeEncoderDecoder(n, k, p):
   Fp = FiniteField(p)
   Poly = polynomialsOver(Fp)

   def encode(message):
      ...

   def solveSystem(encodedMessage):
      ...

   def decode(encodedMessage):
      ...

   return encode, decode, solveSystem

Encode is the easier of the two.

def encode(message):
   thePoly = Poly(message)
   return [(Fp(i), thePoly(Fp(i))) for i in range(n)]

Technically we could remove the leading Fp(i) from each tuple, since the decoder algorithm can assume we’re using the first $ n$ integers in order. But we’ll leave it in and define the decode function more generically.

After we define how the decoder should work in theory we’ll run through a simple example step by step. Now on to the decoder.

The decoding algorithm, Berlekamp-Welch

There are a lot of different decoding algorithms for various error correcting codes. The one we’ll implement is called the Berlekamp-Welch algorithm, but before we get to it we should mention a much simpler algorithm that will work when there are only a few errors.

To remind us of notation, call $ k$ the length of the message, so that $ k-1$ is the degree of the polynomial we used to encode it. And $ n$ is the number of points we used in the encoding. Call the encoded message $ M$ as it’s received (as a list of points, possibly with errors).

In the simple method what you do is just randomly pick $ k$ points from $ M$, do polynomial interpolation on the chosen points to get some polynomial $ g$, and see if $ g$ agrees with most of the points in $ M$. If there really are few errors, then there’s a good chance the randomly chosen points won’t have any errors in them and you’ll win. If you get unlucky and pick some points with errors, then the $ g$ you get won’t agree with most of $ M$ and you can throw it out and try again. If you get really unlucky and a bad $ g$ does agree with most of $ M$, then you just run this procedure a few hundred times and take the $ g$ you get most often. But again, this only works with a small number of errors and while it could be good enough for many applications, don’t bet your first-born child’s life on it working. Or even your favorite pencil, for that matter. We’re going to implement Berlekamp-Welch so you can win someone else’s favorite pencil. You’re welcome.

Exercise: Implement the simple decoding algorithm and test it on some data.
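
If you want something to check your solution against, here’s one possible sketch of that randomized decoder. It assumes we’re inside makeEncoderDecoder (so k and the Poly type are in scope), and it also assumes an interpolate(points) helper that returns the unique polynomial of degree at most $ k-1$ through $ k$ given points (Lagrange interpolation over the finite field); that helper isn’t part of the code in this post, so treat this as a sketch rather than the official solution.

import random

def simpleDecode(encodedMessage, numTrials=500):
   # Repeatedly interpolate a random size-k subset of the received points and
   # keep whichever candidate polynomial agrees with the most points overall.
   bestPoly, bestAgreement = None, -1
   for _ in range(numTrials):
      sample = random.sample(encodedMessage, k)
      g = interpolate(sample)  # hypothetical helper, see the note above
      agreement = sum(1 for (a, b) in encodedMessage if g(a) == b)
      if agreement > bestAgreement:
         bestPoly, bestAgreement = g, agreement
   return bestPoly.coefficients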

Suppose we are guaranteed that there are exactly $ e < \frac{n-k+1}{2}$ errors in our received message $ M = (a_1, b_1, \dots, a_n, b_n)$. Call the polynomial that represents the original message $ P$. In other words, we have that $ P(a_i) = b_i$ for all but $ e$ of the points in $ M$.

There are two key ingredients in the algorithm. The first is called the error locator polynomial. We’ll call this polynomial $ E(x)$, and it’s just defined by being zero wherever the errors occurred. In symbols, $ E(a_i) = 0$ whenever $ P(a_i) \neq b_i$. If we knew where the errors occurred, we could write out $ E(x)$ explicitly as a product of terms like $ (x-a_i)$. And if we knew $ E$ we’d also be done, because it would tell us where the errors were and we could do interpolation on all the non-error points in $ M$.

So we’re going to have to study $ E$ indirectly and use it to get $ P$. One nice property of $ E(x)$ is the following

$ \displaystyle b_i E(a_i) = P(a_i)E(a_i),$

which is true for every pair $ (a_i, b_i) \in M$. Indeed, by definition when $ P(a_i) \neq b_i$ then $ E(a_i) = 0$ so both sides are zero. Now we can use a technique called linearization. It goes like this. The product $ P(x) E(x)$, i.e. the right-hand-side of the above equation, is a polynomial, say $ Q(x)$, of larger degree ($ e + k - 1$). We get the equation for all $ i$:

$ \displaystyle b_i E(a_i) = Q(a_i)$

Now $ E$, $ Q$, and $ P$ are all unknown, but it turns out that we can actually find $ E$ and $ Q$ efficiently. Or rather, we can’t guarantee we’ll find $ E$ and $ Q$ exactly, instead we’ll find two polynomials that have the same quotient as $ Q(x)/E(x) = P(x)$. Here’s how that works.

Say we wrote out $ E(x)$ as a generic polynomial of degree $ e$ and $ Q(x)$ as a generic polynomial of degree $ e+k-1$. So their coefficients are unspecified variables. Now we can plug all the points $ a_i, b_i$ into the equations $ b_i E(a_i) = Q(a_i)$, and this will form a linear system of $ n$ equations in $ 2e + k$ unknowns ($ e$ unknowns come from $ E(x)$, once we fix its leading coefficient to 1, and $ e+k$ come from $ Q(x)$).

Now we know that this system has a good solution, because if we take the true error locator polynomial and $ Q = E(x)P(x)$ with the true $ P(x)$ we win. The worry is that we’ll solve this system and get two different polynomials $ Q’, E’$ whose quotient will be something crazy and unrelated to $ P$. But as it turns out this will never happen, and any solution will give the quotient $ P$. Here’s a proof you can skip if you hate proofs.

Proof. Say you have two pairs of solutions to the system, $ (Q_1, E_1)$ and $ (Q_2, E_2)$, and you want to show that $ Q_1/E_1 = Q_2/E_2$. The polynomials might not even be divisible, but it’s enough to show the cross-multiplied equation $ Q_1E_2 = Q_2E_1$. Now we show two polynomials are equal in the same way as always: subtract and show there are too many roots. Define $ R(x) = Q_1E_2 - Q_2E_1$. The claim is that $ R(x)$ has $ n$ roots, one for every point $ (a_i, b_i)$. Indeed,

$ \displaystyle R(a_i) = (b_i E_1(a_i))E_2(a_i) - (b_iE_2(a_i)) E_1(a_i) = 0$

But the degree of $ R(x)$ is $ 2e + k - 1$ which is less than $ n$ by the assumption that $ e < \frac{n-k+1}{2}$. So $ R(x)$ has too many roots and must be the zero polynomial, and the two quotients are equal.

$ \square$

So the core Python routine is just two steps: solve the linear system, and then divide two polynomials. However, it turns out that no Python module has any decent support for solving linear systems of equations over finite fields. Luckily, I wrote a linear solver way back when and so we’ll adapt it to our purposes. I’ll leave out the gory details of the solver itself, but you can see them in the source for this post. Here is the code that sets up the system

   def solveSystem(encodedMessage):
      # maxE, set in the enclosing makeEncoderDecoder, is the most errors we
      # can hope to correct: (n - k) // 2. We try the largest e first.
      for e in range(maxE, 0, -1):
         ENumVars = e+1
         QNumVars = e+k
         def row(i, a, b):
            return ([b * a**j for j in range(ENumVars)] +
                    [-1 * a**j for j in range(QNumVars)] +
                    [0]) # the "extended" part of the linear system

         system = ([row(i, a, b) for (i, (a,b)) in enumerate(encodedMessage)] +
                   [[0] * (ENumVars-1) + [1] + [0] * (QNumVars) + [1]])
                     # ensure coefficient of x^e in E(x) is 1

         solution = someSolution(system, freeVariableValue=1)
         E = Poly([solution[j] for j in range(e + 1)])
         Q = Poly([solution[j] for j in range(e + 1, len(solution))])

         P, remainder = Q.__divmod__(E)
         if remainder == 0:
            return Q, E

      raise Exception("found no divisors!")

   def decode(encodedMessage):
      Q,E = solveSystem(encodedMessage)

      P, remainder = Q.__divmod__(E)
      if remainder != 0:
         raise Exception("Q is not divisibly by E!")

      return P.coefficients

A simple example

Now let’s go through an extended example with small numbers. Let’s work modulo 7 and say that our message is

2, 3, 2 (mod 7)

In particular, $ k=3$ is the length of the message. We’ll encode it as a polynomial in the way we described:

$ \displaystyle m(x) = 2 + 3x + 2x^2 (\mod 7)$

If we pick $ n = 5$, then we will encode the message as a sequence of five points on $ m(x)$, namely $ m(0)$ through $ m(4)$.

[[0, 2], [1, 0], [2, 2], [3, 1], [4, 4]] (mod 7)

Now let’s add a single error. First remember that our theoretical guarantee says that we can correct up to $ \lfloor (n-k)/2 \rfloor$ errors, which in this case is $ \lfloor (5-3)/2 \rfloor = 1$, so we can definitely correct one error. We’ll add 1 to the third point, giving the received corrupted message as

[[0, 2], [1, 0], [2, 3], [3, 1], [4, 4]] (mod 7)

Now we set up the system of equations $ b_i E(a_i) = Q(a_i)$ for all $ (a_i, b_i)$ above. We rewrite each equation as $ b_iE(a_i) - Q(a_i) = 0$, and add as the last equation the constraint that the coefficient of $ x^e$ is $ 1$, so that we get a “generic” error locator polynomial of the right degree. The columns represent the variables, with the last column being the right-hand side of the equality, as is standard for Gaussian elimination.

# e0 e1 q0 q1 q2 q3
[
  [2, 0, 6, 0, 0, 0, 0],
  [0, 0, 6, 6, 6, 6, 0],
  [3, 6, 6, 5, 3, 6, 0],
  [1, 3, 6, 4, 5, 1, 0],
  [4, 2, 6, 3, 5, 6, 0],
  [0, 1, 0, 0, 0, 0, 1],
]

Then we do row-reduction to get

[
  [1, 0, 0, 0, 0, 0, 5],
  [0, 1, 0, 0, 0, 0, 1],
  [0, 0, 1, 0, 0, 0, 3],
  [0, 0, 0, 1, 0, 0, 3],
  [0, 0, 0, 0, 1, 0, 6],
  [0, 0, 0, 0, 0, 1, 2]
]

And reading off the solution gives $ E(x) = 5 + x$ and $ Q(x) = 3 + 3x + 6x^2 + 2x^3$. Note in particular that the $ E(x)$ given in this solution is the true error locator polynomial, but it is not guaranteed to be so! Either way, the quotient of the two polynomials is exactly $ m(x) = 2 + 3x + 2x^2$ which gives back the original message.
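
As a quick sanity check you can run in plain Python (no finite field library needed), multiplying $ E(x)$ by the message polynomial modulo 7 gives back exactly the $ Q(x)$ we read off above:

def polymulMod(a, b, p):
   # Multiply two coefficient lists (lowest-degree coefficient first) mod p.
   out = [0] * (len(a) + len(b) - 1)
   for i, ai in enumerate(a):
      for j, bj in enumerate(b):
         out[i + j] = (out[i + j] + ai * bj) % p
   return out

print(polymulMod([5, 1], [2, 3, 2], 7))   # [3, 3, 6, 2], i.e. Q(x) = 3 + 3x + 6x^2 + 2x^3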

There is one catch here: how does one determine the value of $ e$ to use in setting up the system of linear equations? It turns out that an upper bound on $ e$ will work just fine, so long as the upper bound you use agrees with the theoretical maximum number of errors allowed (see the Singleton bound from last time). The effect of doing this is that the linear system ends up with some number of free variables that you can set to arbitrary values, and these will correspond to additional shared roots of $ E(x)$ and $ Q(x)$ that cancel out upon dividing.

A larger example

Now it’s time for a sad fact. I tried running Welch-Berlekamp on an encoded version of the following tiny image:

tiny-mars

And it didn’t finish after running all night.

Berlekamp-Welch is a slow algorithm for decoding Reed-Solomon codes because it requires one to solve a large system of equations. There’s at least one equation for each pixel in a black and white image! To get around this one typically encodes blocks of pixels together into one message character (since $ p$ is larger than $ n > k$ there is lots of space), and apparently one can balance it to minimize the number of equations. And finally, a nontrivial inefficiency comes from my implementation of everything in Python without optimizations. If we rewrote everything in C++ or Go and fixed the prime modulus, we would likely see reasonable running times. There are also asymptotically much faster methods based on the fast Fourier transform, and in the future we’ll try implementing some of these. For the dedicated reader, these are all good follow-up projects.

For now we’ll just demonstrate that it works by running it on a larger sample of text, the introductory paragraphs of To Kill a Mockingbird:

def tkamTest():
   message = '''When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow.  When it healed, and Jem's fears of never being able to play football were assuaged, he was seldom   self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. He   couldn't have cared less, so long as he could pass and punt.'''

   k = len(message)
   n = len(message) * 2
   p = 2087
   integerMessage = [ord(x) for x in message]

   enc, dec, solveSystem = makeEncoderDecoder(n, k, p)
   print("encoding...")
   encoded = enc(integerMessage)

   e = int(k/2)
   print("corrupting...")
   corrupted = corrupt(encoded[:], e, 0, p)

   print("decoding...")
   Q,E = solveSystem(corrupted)
   P, remainder = (Q.__divmod__(E))

   recovered = ''.join([chr(x) for x in P.coefficients])
   print(recovered)
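
One helper not shown here is corrupt; the real one lives in the Github repo with the rest of the code. A minimal stand-in with the same signature might look like the following sketch, which picks numErrors distinct positions and adds a nonzero random offset to each of their values.

import random

def corrupt(encodedMessage, numErrors, minVal, maxVal):
   # A guessed stand-in for the repo's corrupt helper: each chosen point gets
   # a nonzero offset mod p, so it is guaranteed to end up wrong.
   for i in random.sample(range(len(encodedMessage)), numErrors):
      a, b = encodedMessage[i]
      encodedMessage[i] = (a, b + random.randint(minVal + 1, maxVal - 1))
   return encodedMessage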

Running this with unix time produces the following:

encoding...
corrupting...
decoding...
When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow. When it healed, and Jem's fears of never being able to play football were assuaged, he was seldom self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. He couldn't have cared less, so long as he could pass and punt.

real	82m9.813s
user	81m18.891s
sys	0m27.404s

So it finishes in “only” an hour or so.

In any case, the decoding algorithm is an interesting one. In future posts we’ll explore more efficient algorithms and faster implementations.

Until then!

Learning to Love Complex Numbers

This post is intended for people with a little bit of programming experience and no prior mathematical background.

So let’s talk about numbers.

Numbers are curious things. On one hand, they represent one of the most natural things known to humans, which is quantity. It’s so natural to humans that even newborn babies are in tune with the difference between small quantities of objects, say 1 versus 3, in that they notice when quantity changes much more vividly than other features like color or shape.

But our familiarity with quantity doesn’t change the fact that numbers themselves (as an idea) are a human invention. And they’re not like most human inventions, the kinds where you have to tinker with gears or circuits to get a machine that makes your cappuccino. No, these are mathematical inventions. These inventions exist only in our minds.

Numbers didn’t always exist. A long time ago, back when the Greek philosophers were doing their philosophizing, negative numbers didn’t exist! In fact, it wasn’t until 1200 AD that the number zero was first considered in Europe. Zero, along with negative numbers and fractions and square roots and all the rest, were invented primarily to help people solve more problems than they could with the numbers they had available. That is, numbers were invented primarily as a way for people to describe their ideas in a useful way. People simply wondered “is there a number whose square gives you 2?” And after a while they just decided there was and called it $ \sqrt{2}$ because they didn’t have a better name for it.

But with these new solutions came a host of new problems. You see, although I said mathematical inventions only exist in our minds, once they’re invented they gain a life of their own. You start to notice patterns in your mathematical objects and you have to figure out why they do the things they do. And numbers are a perfectly good example of this: once I notice that I can multiply a number by itself, I can ask how often these “perfect squares” occur. That is, what’s the pattern in the numbers $ 1^2, 2^2, 3^2, 4^2, \dots$? If you think about it for a while, you’ll find that square numbers have a very special relationship with odd numbers.

Other times, however, the things you invent turn out to make no sense at all, and you can prove they never existed in the first place! It’s an odd state of affairs, but we’re going to approach the subject of complex numbers from this mindset. We’re going to come up with a simple idea, the idea that negative numbers can be perfect squares, and explore the world of patterns it opens up. Along the way we’ll do a little bit of programming to help explore, give some simple proofs to solidify our intuition, and by the end we’ll see how these ideas can cause wonderful patterns like this one:

mandelbrot

The number i

Let’s bring the story back around to squares. One fact we all remember about numbers is that squaring a number gives you something non-negative. $ 7^2 = 49, (-2)^2 = 4, 0^2 = 0$, and so on. But it certainly doesn’t have to be this way. What if we got sick of that stupid fact and decided to invent a new number whose square was negative? Which negative, you ask? Well it doesn’t really matter, because I can always stretch it larger or smaller so that its square is -1.

Let’s see how: if you say that your made-up number $ x$ makes $ x^2 = -7$, then I can just use $ \frac{x}{\sqrt{7}}$ to get a number whose square is -1. If you’re going to invent a number that’s supposed to interact with our usual numbers, then you have to be allowed to add, subtract, and multiply $ x$ with regular old real numbers, and the usual properties would have to still work. So it would have to be true that $ (x / \sqrt{7})^2 = x^2 / \sqrt{7}^2 = -7/7 = -1$.

So because it makes no difference (this is what mathematicians mean by, “without loss of generality”) we can assume that the number we’re inventing will have a square of negative one. Just to line up with history, let’s call the new number $ i$. So there it is: $ i$ exists and $ i^2 = -1$. And now that we are “asserting” that $ i$ plays nicely with real numbers, we get these natural rules for adding and subtracting and multiplying and dividing. For example

  • $ 1 + i$ is a new number, which we’ll just call $ 1+i$. And if we added two of these together, $ (1+ i) + (1+i)$, we can combine the real parts and the $ i$ parts to get $ 2 + 2i$. Same goes for subtraction. In general a complex number looks like $ a + bi$, because as we’ll see in the other points you can simplify every simple arithmetic expression down to just one “real number” part and one “real number times $ i$” part.
  • We can multiply $ 3 \cdot i$, and we’ll just call it $ 3i$, and we require that multiplication distributes across addition (that the FOIL rule works). So that, for example, $ (2 - i)(1 + 3i) = (2 + 6i - i - 3i^2) = (2 + 3) + (6i - i) = (5 + 5i)$.
  • Dividing is significantly more annoying. Say we want to figure out what $ 1 / (1+i)$ is (in fact, it’s not even obvious that this should look like a regular number! But it does). The $ 1 / a$ notation just means we’re looking for a number which, when we multiply by the denominator $ a$, we get back to 1. So we’re looking to find out when $ (a + bi)(1 + i) = 1 + 0i$ where $ a$ and $ b$ are variables we’re trying to solve for. If we multiply it out we get $ (a-b) + (a + b)i = 1 + 0i$, and since the real part and the $ i$ part have to match up, we know that $ a - b = 1$ and $ a + b = 0$. If we solve these two equations, we find that $ a = 1/2, b = -1/2$ works great. If we want to figure out something like $ (2 + 3i) / (1 - i)$, we just find out what $ 1 / (1 - i)$ is first, and then multiply the result by $ (2+3i)$.

So that was tedious and extremely boring, and we imagine you didn’t even read it (that’s okay, it really is boring!). All we’re doing is establishing ground rules for the game, so if you come across some arithmetic that doesn’t make sense, you can refer back to this list to see what’s going on. And once again, for the purpose of this post, we’re asserting that all these laws hold. Maybe some laws follow from others, but as long as we don’t come up with any nasty self-contradictions we’ll be fine.

And now we turn to the real questions: is $ i$ the only square root of -1? Does $ i$ itself have a square root? If it didn’t, we’d be back to where we started, with some numbers (the non-$ i$ numbers) having square roots while others don’t. And so we’d feel the need to make all the $ i$ numbers happy by making up more numbers to be their square roots, and then worrying what if these new numbers don’t have square roots and…gah!

I’ll just let you in on the secret to save us from this crisis. It turns out that $ i$ does have a square root in terms of other $ i$ numbers, but in order to find it we’ll need to understand $ i$ from a different angle, and that angle turns out to be geometry.

Geometry? How is geometry going to help me understand numbers!?

It’s a valid question and part of why complex numbers are so fascinating. And I don’t mean geometry like triangles and circles and parallel lines (though there will be much talk of angles), I mean transformations in the sense that we’ll be “stretching,” “squishing,” and “rotating” numbers. Maybe another time I can tell you why for me “geometry” means stretching and rotating; it’s a long but very fun story.

The clever insight is that you can represent complex numbers as geometric objects in the first place. To do it, you just think of $ a + bi$ as a pair of numbers $ (a,b)$, (the pair of real part and $ i$ part), and then plot that point on a plane. For us, the $ x$-axis will be the “real” axis, and the $ y$-axis will be the $ i$-axis. So the number $ (3 – 4i)$ is plotted 3 units in the positive $ x$ direction and 4 units in the negative $ y$ direction. Like this:

single-complex-number

The “j” instead of “i” is not a typo, but a disappointing fact about the programming language we used to make this image. We’ll talk more about why later.

We draw it as an arrow for a good reason. Stretching, squishing, rotating, and reflecting will all be applied to the arrow, keeping its tail fixed at the center of the axes. Sometimes the arrow is called a “vector,” but we won’t use that word because here it’s synonymous with “complex number.”

So let’s get started squishing stuff.

Stretching, Squishing, Rotating

Before we continue I should clear up some names. We call a number that has an $ i$ in it a complex number, and we call the part without the $ i$ the real part (like 2 in $ 2-i$) and the part with $ i$ the complex part (more commonly called the imaginary part).

Python is going to be a great asset for us in exploring complex numbers, so let’s jump right into it. It turns out that Python natively supports complex numbers, and I wrote a program for drawing complex numbers. I used it to make the plot above. The program depends on a library I hate called matplotlib, and so the point of the program is to shield you from as much pain as possible and focus on complex numbers. You can use the program by downloading it from this blog’s Github page, along with everything else I made in writing this post. All you need to know how to do is call a function, and I’ve done a bit of window dressing removal to simplify things (I really hate matplotlib).

Here’s the function header:

# plotComplexNumbers : [complex] -> None
# display a plot of the given list of complex numbers
def plotComplexNumbers(numbers):
   ...

Before we show some examples of how to use it, we have to understand how to use complex numbers in Python. It’s pretty simple, except that Python was written by people who hate math, and so they decided the complex number would be represented by $ j$ instead of $ i$ (people who hate math are sometimes called “engineers,” and they use $ j$ out of spite. Not really, though).

So in Python it’s just like any other computation. For example:

>>> (1 + 1j)*(4 - 2j) == (6+2j)
True
>>> 1 / (1+1j)
(0.5-0.5j)

And so calling the plotting function with a given list of complex numbers is as simple as importing the module and calling the function

from plotcomplex import plot
plot.plotComplexNumbers([(-1+1j), (1+2j), (-1.5 - 0.5j), (.6 - 1.8j)])

Here’s the result

example-complex-plot

So let’s use plots like this one to explore what “multiplication by $ i$” does to a complex number. It might not seem exciting at first, but I promise there’s a neat punchline.

Even without plotting it’s pretty easy to tell what multiplying by $ i$ does to some numbers. It takes 1 to $ i$, moves $ i$ to $ i^2 = -1$, it takes -1 to $ -i$, and $ -i$ to $ -i \cdot i = 1$.

What’s the pattern in these? Well if we plot all these numbers, they’re all at right angles in counter-clockwise order. So this might suggest that multiplication by $ i$ does some kind of rotation. Is that always the case? Well, let’s try it with some other more complicated numbers.

Well, it looks close but it’s hard to tell. Some of the axes are squished and stretched, so it might be that our images don’t accurately represent the numbers (the real world can be such a pain). Well when visual techniques fail, we can attempt to prove it.

Clearly multiplying by $ i$ does some kind of rotation, maybe with other stuff too, and it shouldn’t be so hard to see that multiplying by $ i$ does the same thing no matter which number you use (okay, the skeptical readers will say that’s totally hard to see, but we’ll prove it super rigorously in a minute). So if we take any number and multiply it by $ i$ once, then twice, then three times, then four, and if we only get back to where we started at four multiplications, then each rotation had to be a quarter turn.

Indeed,

$ \displaystyle (a + bi) i^4 = (ai - b) i^3 = (-a - bi) i^2 = (-ai + b) i = a + bi$

This still isn’t all that convincing, and we want to be 100% sure we’re right. What we really need is a way to arithmetically compute the angle between two complex numbers in their plotted forms. What we’ll do is find a way to measure the angle of one complex number with the $ x$-axis, and then by subtraction we can get angles between arbitrary points. For example, in the figure below $ \theta = \theta_1 - \theta_2$.

angle-example

One way to do this is with trigonometry: the geometric drawing of $ a + bi$ is the hypotenuse of a right triangle with the $ x$-axis.

triangle-example

And so if $ r$ is the length of the arrow, then by the definition of sine and cosine, $ \cos(\theta) = a/r, \sin(\theta) = b/r$. If we have $ r, \theta$, and $ r > 0$, we can solve for a unique $ a$ and $ b$, so instead of representing a complex number in terms of the pair of numbers $ (a,b)$, we can represent it with the pair of numbers $ (r, \theta)$. And the conversion between the two is just

$ a + bi = r \cos(\theta) + (r \sin(\theta)) i$

The $ (r, \theta)$ representation is called the polar representation, while the $ (a,b)$ representation is called the rectangular representation or the Cartesian representation. Converting between polar and Cartesian coordinates fills the pages of many awful pre-calculus textbooks (despite the fact that complex numbers don’t exist in classical calculus). Luckily for us Python has built-in functions to convert between the two representations for us.

>>> import cmath
>>> cmath.polar(1 + 1j)
(1.4142135623730951, 0.7853981633974483)
>>> z = cmath.polar(1 + 1j)
>>> cmath.rect(z[0], z[1])
(1.0000000000000002+1j)

It’s a little bit inaccurate on the rounding, but it’s fine for our purposes.

So how do we compute the angle between two complex numbers? Just convert each to the polar form, and subtract the second coordinates. So if we get back to our true goal, to figure out what multiplication by $ i$ does, we can just do everything in polar form. Here’s a program that computes the angle between two complex numbers.

import cmath

def angleBetween(z, w):
   zPolar, wPolar = cmath.polar(z), cmath.polar(w)
   return wPolar[1] - zPolar[1]

print(angleBetween(1 + 1j, (1 + 1j) * 1j))
print(angleBetween(2 - 3j, (2 - 3j) * 1j))
print(angleBetween(-0.5 + 7j, (-0.5 + 7j) * 1j))

Running it gives

1.5707963267948966
1.5707963267948966
-4.71238898038469

Note that the decimal form of $ \pi/2$ is 1.57079…, and that the negative angle is equivalent to $ \pi/2$ if you add a full turn of $ 2\pi$ to it. So programmatically we can see that for every input we try multiplying by $ i$ rotates 90 degrees.

But we still haven’t proved it works. So let’s do that now. To say what the angle is between $ r \cos (\theta) + ri \sin (\theta)$ and $ i \cdot [r \cos (\theta) + ri \sin(\theta)] = -r \sin (\theta) + ri \cos(\theta)$, we need to transform the second number into the usual polar form (where the $ i$ is on the sine part and not the cosine part). But we know, or I’m telling you now, this nice fact about sine and cosine:

$ \displaystyle \sin(\theta + \pi/2) = \cos(\theta)$
$ \displaystyle \cos(\theta + \pi / 2) = -\sin(\theta)$

This fact is maybe awkward to write out algebraically, but it’s just saying that if you shift the whole sine curve a little bit you get the cosine curve, and if you keep shifting it you get the opposite of the sine curve (and if you kept shifting it even more you’d eventually get back to the sine curve; they’re called periodic for this reason).

So immediately we can rewrite the second number as $ r \cos(\theta + \pi/2) + i r \sin (\theta + \pi/2)$. The angle is the same as the original angle plus a right angle of $ \pi/2$. Neat!

Applying this same idea to $ (a + bi) \cdot (c + di)$, it’s not much harder to prove that multiplying two complex numbers in general multiplies their lengths and adds their angles. So if a complex number $ z$ has its magnitude $ r$ smaller than 1, multiplying by $ z$ squishes and rotates whatever is being multiplied. And if the magnitude is greater than 1, it stretches and rotates. So we have a super simple geometric understanding of how arithmetic with complex numbers works. And as we’re about to see, all this stretching and rotating results in some really weird (and beautifully mysterious!) mathematics and programs.
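
We can at least spot-check that claim numerically with cmath before moving on:

import cmath

z, w = 2 - 3j, -0.5 + 7j
(zr, za), (wr, wa) = cmath.polar(z), cmath.polar(w)
pr, pa = cmath.polar(z * w)

print(pr, zr * wr)    # equal up to rounding: the lengths multiply
print(pa, za + wa)    # equal up to a multiple of 2*pi: the angles add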

But before we do that we still have one question to address, the question that started this whole geometric train of thought: does $ i$ have a square root? Indeed, I’m just looking for a number such that, when I square its length and double its angle, I get $ i = \cos(\pi/2) + i \sin(\pi/2)$. Indeed, the angle we want is $ \pi/4$, and the length we want is $ r = 1$, which means $ \sqrt{i} = \cos(\pi/4) + i \sin(\pi/4)$. Sweet! There is another root if you play with the signs, see if you can figure it out.
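
And again we can check the claim numerically:

import cmath

w = cmath.rect(1, cmath.pi / 4)   # cos(pi/4) + i sin(pi/4)
print(w * w)                      # approximately 1j: squaring doubles the angle to pi/2
print(cmath.sqrt(1j))             # Python's answer agrees, up to rounding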

In fact this is part of a deeper and more beautiful theorem (“theorem” means “really important fact”) called the fundamental theorem of algebra. Essentially it says that the complex numbers are complete. That is, we can always find square roots, cube roots, or any other roots of numbers involving $ i$. It actually says a lot more, but it’s easier to appreciate the rest of it after you do more math than we’re going to do in this post.

On to pretty patterns!

The Fractal

So here’s a little experiment. Since every point in the plane is the end of some arrow representing a complex number, we can imagine transforming the entire complex plane by transforming each number by the same rule. The most interesting simple rule we can think of: squaring! So though it might strain your capacity for imagination, try to visualize the idea like this. Squaring a complex number is the same as squaring its length and doubling its angle. So imagine: any numbers whose arrows are longer than 1 will grow much bigger, arrows shorter than 1 will shrink, and arrows of length exactly one will stay the same length (arrows close to length 1 will grow/shrink much more slowly than those far away from 1). And complex numbers with small positive angles will increase their angle, but only a bit, while larger angles will grow faster.

Here’s an animation made by Douglas Arnold showing what happens to the set of complex numbers $ a + bi$ with $ 0 \leq a, b \leq 1$ or $ -1 < a,b < 0$. Again, imagine every point is the end of a different arrow for the corresponding complex number. The animation is for a single squaring, and the points move along the arc they would travel if one rotated/stretched them smoothly.

complex-squaring

So that’s pretty, but this is by all accounts a well-behaved transformation. It’s “predictable,” because for example we can always tell which complex numbers will get bigger and bigger (in length) and which will get smaller.

What if, just for the sake of tinkering, we changed the transformation a little bit? That is, instead of sending $ z = a+bi$ to $ z^2$ (I’ll often write this $ z \mapsto z^2$), what if we sent

$ \displaystyle z \mapsto z^2 + 1$

Now it’s not so obvious: which numbers will grow and which will shrink? Notice that it’s odd because adding 1 only changes the real part of the number. So a number whose length is greater than 1 can become small under this transformation. For example, $ i$ is sent to $ 0$, so something slightly larger would also be close to zero. Indeed, $ 5i/4 \mapsto -9/16$.

So here’s an interesting question: are there any complex numbers that will stay small even if I keep transforming like this forever? Specifically, if I call $ f(z) = z^2 + 1$, and I call $ f^2(z) = f(f(z))$, and likewise call $ f^k(z)$ for $ k$ repeated transformations of $ z$, is there a number $ z$ so that for every $ k$, the length $ |f^k(z)| < 2$? “Obvious” choices like $ z=0$ don’t work, and neither do random guesses like $ z=i$ or $ z=1$. So should we guess the answer is no?

Before we jump to conclusions let’s write a program to see what happens for more than our random guesses. The program is simple: we’ll define the “square plus one” function, and then repeatedly apply that function to a number for some long number of times (say, 250 times). If the length of the number stays under 2 after so many tries, we’ll call it “small forever,” and otherwise we’ll call it “not small forever.”

def squarePlusOne(z):
   return z*z + 1

def isSmallForever(z, f):
   k = 0

   while abs(z) < 2:
      z = f(z)
      k += 1

      if k > 250:
         return True

   return False

This isSmallForever function is generic: you can give it any function $ f$ and it will repeatedly call $ f$ on $ z$ until the result grows bigger than 2 in length. Note that the abs function is a built-in Python function for computing the length of a complex number.

Then I wrote a classify function, which you can give a window and a small increment, and it will produce a grid of zeros and ones marking the results of isSmallForever. The details of the function are not that important (a rough sketch appears after the example below). I also wrote a function that turns the grid into a picture. So here’s an example of how we’d use it:

from plotcomplex.plot import gridToImage

def classifySquarePlusOne(z):
   return isSmallForever(z, squarePlusOne)

grid = classify(classifySquarePlusOne) # the other arguments are defaulted to [-2,2], [-2,2], 0.1
gridToImage(grid)
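
Here’s the promised rough sketch of what a classify function with this signature might look like (the real one, along with gridToImage, is in this blog’s Github repo); it just walks the window in steps of the given size and records a 1 wherever the test function returns True.

def classify(testFunction, xRange=(-2, 2), yRange=(-2, 2), step=0.1):
   # Sketch only: sample the window on a grid and evaluate the test at each point.
   grid = []
   y = yRange[0]
   while y <= yRange[1]:
      row = []
      x = xRange[0]
      while x <= xRange[1]:
         row.append(1 if testFunction(x + y * 1j) else 0)
         x += step
      grid.append(row)
      y += step
   return grid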

And here’s the result of running the example above. Points colored black grow beyond 2, and white points stay small for the whole test.

Looks like they’ll always grow big.

So it looks like repeated squaring plus one will always make complex numbers grow big. That’s not too exciting, but we can always make it more exciting. What happens if we replace the 1 in $ z^2 + 1$ with a different complex number? For example, if we do $ z^2 - 1$ then will things always grow big?

You can randomly guess and see that 0 will never grow big, because $ 0^2 - 1 = -1$ and $ (-1)^2 - 1 = 0$. It will just oscillate forever. So with -1 some numbers will grow and some will not! Let’s use the same routine above to see which:

def squareMinusOne(z):
   return z*z - 1

def classifySquareMinusOne(z):
   return isSmallForever(z, squareMinusOne)

grid = classify(classifySquareMinusOne)
gridToImage(grid)

And the result:

second-attempt

Now that’s a more interesting picture! Let’s ramp up the resolution

grid = classify(classifySquareMinusOne, step=0.001)
gridToImage(grid)

second-attempt-zoomed

Gorgeous. If you try this at home you’ll notice, however, that this took a hell of a long time to run. Speeding up our programs is very possible, but it’s a long story for another time. For now we can just be patient.

Indeed, this image has a ton of interesting details! It looks almost circular in the middle, but if we zoom in we can see that it’s more like a rippling wave

second-attempt-zoomed2

It’s pretty incredible, and a huge question is jumping out at me: what the heck is causing this pattern to occur? What secret does -1 know that +1 doesn’t that makes the resulting pattern so intricate?

But an even bigger question is this. We just discovered that some values of $ c$ make $ z \mapsto z^2 + c$ result in interesting patterns, and some do not! So the question is which ones make interesting patterns? Even if we just, say, fix the starting point to zero: what is the pattern in the complex numbers that would tell me when this transformation makes zero blow up, and when it keeps zero small?

Sounds like a job for another program. This time we’ll use a nice little Python feature called a closure: we define a function that saves the information in scope when it’s created, for later use. It will let us write a function that takes in $ c$ and produces a function that transforms according to $ z \mapsto z^2+c$.

def squarePlusC(c):
   def f(z):
      return z*z + c

   return f

And we can use the very same classification/graphing function from before to do this.

def classifySquarePlusC(c):
   return isSmallForever(0, squarePlusC(c))

grid = classify(classifySquarePlusC, xRange=(-2, 1), yRange=(-1, 1), step=0.005)
gridToImage(grid)

And the result:

mandelbrot

Stunning. This wonderful pattern, which is still largely not understood today, is known as the Mandelbrot set. That is, the white points are the points in the Mandelbrot set, and the black points are not in it. The detail on the border of this thing is infinitely intricate. For example, we can change the window in our little program to zoom in on a particular region.

mandelbrot-zoomed

And if you keep zooming in you keep getting more and more detail. This was true of the specific case of $ z^2 - 1$, but somehow the patterns in the Mandelbrot set are much more varied and interesting. And if you keep going down eventually you’ll see patterns that look like the original Mandelbrot set. We can already kind of see that happening above. The name for this idea is a fractal, and the $ z^2 - 1$ image has it too. Fractals are a fascinating and mysterious subject studied in a field called discrete dynamical systems. Many people dedicate their entire lives to studying these things, and it’s for good reason. There’s a lot to learn and even more that’s unknown!

So this is the end of our journey for now. I’ve posted all of the code we used in the making of this post so you can continue to play, but here are some interesting ideas.

  • The Mandelbrot set (and most fractals) are usually colored. The way they’re colored is as follows. Rather than just say true or false when zero blows up beyond 2 in length, you return the number of iterations $ k$ that happened. Then you pick a color based on how big $ k$ is (see the sketch just after this list). There’s a link below that lets you play with this. In fact, adding colors shows that there is even more intricate detail happening outside the Mandelbrot set that’s too faint to see in our pictures above. Such as this.
  • Some very simple questions about fractals are very hard to answer. For example, is the Mandelbrot set connected? That is, is it possible to “walk” from every point in the Mandelbrot set to every other point without leaving the set? Despite the scattering of points in the zoomed in picture above that suggest the answer is no, the answer is actually yes! This is a really difficult thing to prove, however.
  • The patterns in many fractals are often used to generate realistic-looking landscapes and pseudorandomness. So fractals are not just mathematical curiosities.
  • You should definitely be experimenting with this stuff! What happens if you change the length threshold from 2 to some bigger number? What about a smaller number? What if you do powers different than $ 2$? There’s so much to explore!
  • The big picture thing to take away from this is that it’s not the numbers themselves that are particularly interesting, it’s the transformations of the numbers that generate these patterns! The interesting questions are what kinds of things are the same under these transformations, and what things are different. This is a very general idea in mathematics, and the more math you do the more you’ll find yourself wondering about useful and bizarre transformations.
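
Here’s the sketch promised in the first bullet: instead of returning a boolean, return the number of iterations it takes for zero to escape (or a cap if it never does), and then map that count to a color however you like.

def escapeTime(c, maxIterations=250):
   # Iterate z -> z^2 + c starting from zero; return how many steps it takes
   # for |z| to exceed 2, or maxIterations if it stays small the whole time.
   z = 0
   for k in range(maxIterations):
      if abs(z) > 2:
         return k
      z = z * z + c
   return maxIterations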

For the chance to keep playing with the Mandelbrot set, check out this Mandelbrot grapher that works in your browser. It lets you drag rectangles to zoom further in on regions of interest. It’s really fun.

Until next time!

The Two-Dimensional Fourier Transform and Digital Watermarking

We’ve studied the Fourier transform quite a bit on this blog: with four primers and the Fast Fourier Transform algorithm under our belt, it’s about time we opened up our eyes to higher dimensions.

Indeed, in the decades since Cooley & Tukey’s landmark paper, the most interesting applications of the discrete Fourier transform have occurred in dimensions greater than 1. But for all our work we haven’t yet discussed what it means to take an “n-dimensional” Fourier transform. Our past toiling and troubling will pay off, though, because the higher-dimensional Fourier transform and its 1-dimensional cousin are quite similar. Indeed, the shortest way to describe the $ n$-dimensional transform is as the 1-dimensional transform with inner products of vector variables replacing regular products of variables.

In this post we’ll flesh out these details. We’ll define the multivariable Fourier transform and its discrete partner, implement an algorithm to compute it (FFT-style), and then apply the transform to the problem of digitally watermarking images.

As usual, all the code, images, and examples used in this post are available on this blog’s Github page.

Sweeping Some Details Under the Rug

We spent our first and second primers on Fourier analysis describing the Fourier series in one variable, and taking a limit of the period to get the Fourier transform in one variable. By all accounts, it was a downright mess of notation and symbol manipulation that culminated in the realization that the Fourier series looks a lot like a Riemann sum. As it was in one dimension, so it is in arbitrary dimension, but to save our stamina for the applications we’re going to treat the $ n$-dimensional transform differently. We’ll use the 1-dimensional transform as a model, and magically generalize it to operate on a vector-valued variable. Then the reader will take it on faith that we could achieve the same end as a limit of some kind of multidimensional Fourier series (and all that nonsense with Schwartz functions and tempered distributions is left to the analysts), or if not we’ll provide external notes with the full details.

So we start with a real-valued (or complex-valued) function $ f : \mathbb{R}^n \to \mathbb{R}$, and we write the variable as $ x = (x_1, \dots, x_n)$, so that we can stick to using the notation $ f(x)$. Rather than think of the components of $ x$ as “time variables” as we did in the one-dimensional case, we’ll usually think of $ x$ as representing physical space. And so the periodic behavior of the function $ f$ represents periodicity in space. On the other hand our transformed variables will be “frequency” in space, and this will correspond to a vector variable $ \xi = (\xi_1, \dots, \xi_n)$. We’ll come back to what the heck “periodicity in space” means momentarily.

Remember that in one dimension the Fourier transform was defined by

$ \displaystyle \mathscr{F}f(s) = \int_{-\infty}^\infty e^{-2\pi ist}f(t) dt$.

And its inverse transform was

$ \displaystyle \mathscr{F}^{-1}g(t) = \int_{-\infty}^\infty e^{2\pi ist}g(s) ds$.

Indeed, with the vector $ x$ replacing $ t$ and $ \xi$ replacing $ s$, we have to figure out how to make an analogous definition. The obvious thing to do is to take the place where $ st$ is multiplied and replace it with the inner product of $ x$ and $ \xi$, which for this post I’ll write $ x \cdot \xi$ (usually I write $ \left \langle x, \xi \right \rangle$). This gives us the $ n$-dimensional transform

$ \displaystyle \mathscr{F}f(\xi) = \int_{\mathbb{R}^n} e^{-2\pi i x \cdot \xi}f(x) dx$,

and its inverse

$ \displaystyle \mathscr{F}^{-1}g(x) = \int_{\mathbb{R}^n} e^{2\pi i x \cdot \xi}g( \xi ) d \xi$

Note that the integral is over all of $ \mathbb{R}^n$. To give a clarifying example, if we are in two dimensions we can write everything out in coordinates: $ x = (x_1, x_2), \xi = (\xi_1, \xi_2)$, and the formula for the transform becomes

$ \displaystyle \mathscr{F}f(\xi_1, \xi_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-2 \pi i (x_1 \xi_1 + x_2 \xi_2)} f(x_1, x_2) dx_1 dx_2$.

Now that’s a nasty integral if I’ve ever seen one. But for our purposes in this post, this will be as nasty as it gets, for we’re primarily concerned with image analysis. So representing things as vectors of arbitrary dimension is more compact, and we don’t lose anything for it.

Periodicity in Space? It’s All Mostly the Same

Because arithmetic with vectors and arithmetic with numbers is so similar, it turns out that most of the properties of the 1-dimensional Fourier transform hold in arbitrary dimension. For example, the duality of the Fourier transform and its inverse holds, because for vectors $ e^{-2 \pi i x \cdot (-\xi)} = e^{2 \pi i x \cdot \xi}$. So just like in one dimension, we have

$ \mathscr{F}f(-\xi) = \mathscr{F}^{-1}f(\xi)$

And again we have correspondences between algebraic operations: convolution in the spatial domain corresponds to multiplication in the frequency domain, the spectrum of a real-valued function is symmetric about the origin, etc.

At a more geometric level, though, the Fourier transform does the same sort of thing as it did in the one-dimensional case. Again the complex exponentials form the building blocks of any function we want, and performing a Fourier transform on an $ n$-dimensional function decomposes that function into its frequency components. So a function that is perfectly periodic corresponds to a Fourier spectrum that’s perfectly concentrated at a point.

But what the hell, the reader might ask, is ‘periodicity in space’? Since we’re talking about images anyway, the variables we care about (the coordinates of a pixel) are spatial variables. You could, if you were so inclined, have a function of multiple time variables, and to mathematicians a physical interpretation of dimension is just that, an interpretation. But as confusing as it might sound, it’s actually not so hard to understand the Fourier transform when it’s specialized to image analysis. The idea is that complex exponentials $ e^{\pm 2 \pi i x \cdot \xi}$ oscillate in the $ x$ variable for a fixed $ \xi$ (and since $ \mathscr{F}$ has $ \xi$ as its input, we do want to fix $ \xi$). The brief mathematical analysis goes like this: if we fix $ \xi$ then the complex exponential is periodic with magnitudinal peaks along parallel lines spaced out at a distance of $ 1/ \left \| \xi \right \|$ apart. In particular any image is a sum of a bunch of these “complex exponential with a fixed $ \xi$” images that look like stripes with varying widths and orientations (what you see here is just the real part of a particular complex exponential).

Any image can be made from a sum of a whole lot of images like this one. This corresponds to a single point in the Fourier spectrum.

Any image can be made from a sum of a whole lot of images like the ones on top. They correspond to single points in the Fourier spectrum (and their symmetries), as on bottom.

What you see on top is an image, and on bottom its Fourier spectrum. That is, each brightly colored pixel corresponds to a point $ [x_1, x_2]$ with a large magnitude for that frequency component $ |\mathscr{F}f[x_1, x_2]|$.

It might be a bit surprising that every image can be constructed as a sum of stripey things, but so was it that any sound can be constructed as a sum of sines and cosines. It’s really just a statement about a basis of some vector space of functions. The long version of this story is laid out beautifully in pages 4 – 7 of these notes. The whole set of notes is wonderful, but this section is mathematically tidy and needs no background; the remainder of the notes outline the details about multidimensional Fourier series mentioned earlier, as well as a lot of other things. In higher dimensions the “parallel lines” idea is much the same, but with lines replaced by hyperplanes normal to the given vector.
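
If you’d like to generate one of these “stripe” images yourself, here’s a tiny sketch (not code from the original post) that samples the real part of $ e^{2 \pi i x \cdot \xi}$ on a grid for a fixed frequency vector $ \xi$; feeding the resulting grid to any grayscale plotter gives parallel bands perpendicular to $ \xi$.

import math

def stripeImage(xi, width=200, height=200):
   # Real part of e^{2 pi i (x, y) . xi}, sampled at integer grid points.
   xi1, xi2 = xi
   return [[math.cos(2 * math.pi * (x * xi1 + y * xi2)) for x in range(width)]
           for y in range(height)]

bands = stripeImage((0.03, 0.01))   # wide stripes, tilted away from vertical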

Discretizing the Transform

Recall that for a continuous function $ f$ of one variable, we spent a bit of time figuring out how to find a good discrete approximation of $ f$, how to find a good discrete approximation of the Fourier transform $ \mathscr{F}f$, and how to find a quick way to transition between the two. In brief: we approximated $ f$ by a vector of samples $ (f[0], f[1], \dots, f[N-1])$, reconstructed the original function (which was only correct at the sampled points), and computed the Fourier transform of that, calling it the discrete Fourier transform, or DFT. We got to this definition, using square brackets to denote list indexing (or vector indexing, whatever):

Definition: Let $ f = (f[0], \dots, f[N-1])$ be a vector in $ \mathbb{R}^N$. Then the discrete Fourier transform of $ f$ is defined by the vector $ (\mathscr{F}f[0], \dots, \mathscr{F}f[N-1])$, where

$ \displaystyle \mathscr{F}f[j] = \sum_{k=0}^{N-1} f[k]e^{-2 \pi i jk/N}$
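
As a quick illustration (this is not code from the original posts), the definition transcribes directly into Python; note that this is the naive $ O(N^2)$ sum, not the FFT.

import cmath

def dft(f):
   # Naive discrete Fourier transform, straight from the definition above.
   N = len(f)
   return [sum(f[k] * cmath.exp(-2 * cmath.pi * 1j * j * k / N) for k in range(N))
           for j in range(N)]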

Just as with the one-dimensional case, we can do the same analysis and arrive at a discrete approximation of an $ n$-dimensional function. Instead of a vector it would be an $ N \times N \times \dots \times N$ matrix, where there are $ n$ terms in the matrix, one for each variable. In two dimensions, this means the discrete approximation of a function is a matrix of samples taken at evenly-spaced intervals in both directions.

Sticking with two dimensions, the Fourier transform is then a linear operator taking matrices to matrices (which is called a tensor if you want to scare people). It has its own representation like the one above, where each term is a double sum. In terms of image analysis, we can imagine that each term in the sum requires us to look at every pixel of the original image

Definition: Let $ f = (f[s,t])$ be a matrix in $ \mathbb{R}^{N \times M}$, where $ s$ ranges from $ 0, \dots, N-1$ and $ t$ from $ 0, \dots, M-1$. Then the discrete Fourier transform of $ f$ is defined by the matrix $ (\mathscr{F}f[x_1, x_2])$, where each entry is given by

$ \displaystyle \mathscr{F}f[x_1, x_2] = \sum_{s=0}^{N-1} \sum_{t=0}^{M-1} f[s, t] e^{-2 \pi i (s x_1 / N + t x_2 / M)}$

In the one-dimensional case the inverse transform had a sign change in the exponent and an extra $ 1/N$ normalization factor. Similarly, in two dimensions the inverse transform has a normalization factor of $ 1/NM$ (1 over the total number of samples). Again we use a capital $ F$ to denote the transformed version of $ f$. The higher dimensional transforms are analogous: you get $ n$ sums, one for each component, and the normalization factor is the inverse of the total number of samples.

$ \displaystyle \mathscr{F}^{-1}F[x_1, x_2] = \frac{1}{NM} \sum_{s=0}^{N-1} \sum_{t=0}^{M-1} F[s,t] e^{2 \pi i (sx_1 / N + tx_2 / M)}$

Unfortunately, the world of the DFT disagrees a lot on the choice of normalization factor. It turns out that all that really matters is that the exponent is negated in the inverse, and that the product of the constant terms on both the transform and its inverse is $ 1/NM$. So some people will normalize both the Fourier transform and its inverse by $ 1/ \sqrt{NM}$. The reason for this is that it makes the transform and its inverse more similar-looking (it’s just that, cosmetic). The choice of normalization isn’t particularly important for us, but beware: non-canonical choices are out there, and they do affect formulas by adding multiplicative constants.

The Fast Fourier Transform, Revisited

Now one might expect that there is another clever algorithm to drastically reduce the runtime of the 2-dimensional DFT, akin to the fast Fourier transform algorithm (FFT). But actually there is almost no additional insight required to understand the “fast” higher dimensional Fourier transform algorithm, because all the work was done for us in the one dimensional case.

All that we do is realize that each of the inner summations is a 1-dimensional DFT. That is, if we write the inner-most sum as a function of two parameters

$ \displaystyle g(s, x_2) = \sum_{t=0}^{M-1} f[s,t] e^{-2 \pi i (tx_2 / M)}$

then the 2-dimensional FFT is simply

$ \displaystyle \mathscr{F}f[x_1, x_2] = \sum_{s=0}^{N-1} g(s, x_2) e^{-2 \pi i (sx_1/N)}$

But now notice that we can forget $ g(s,x_2)$ was ever a separate, two-dimensional function. Indeed, once we fix a value of $ s$, the inner sum depends only on $ x_2$, and it is precisely the formula for a 1-dimensional DFT! And so if we want to compute the 2-dimensional DFT using the 1-dimensional FFT algorithm, we can compute the matrix of 1-dimensional DFT entries for all choices of $ s, x_2$ by fixing each value of $ s$ in turn and running FFT on the resulting “row” of values. If you followed the program from our last FFT post, then the only difficulty is in understanding how the data is shuffled around and which variables are fixed during the computation of the sub-DFT’s.

To remedy the confusion, we give an example. Say we have the following 3×3 matrix whose DFT we want to compute. Remember, these values are the sampled values of a 2-variable function.

$ \displaystyle \begin{pmatrix} f[0,0] & f[0,1] & f[0,2] \\ f[1,0] & f[1,1] & f[1,2] \\ f[2,0] & f[2,1] & f[2,2] \end{pmatrix}$

The first step in the algorithm is to fix a choice of row, $ s$, and compute the DFT of the resulting row. So let’s fix $ s = 0$, and then we have the resulting row

$ \displaystyle f_0 = (f[0,0], f[0,1], f[0,2])$

Its DFT is computed (intentionally using the same notation as the inner summation above) as

$ \displaystyle g[0,x_2] = (\mathscr{F}f_0)[x_2] = \sum_{t=0}^{M-1} f_0[t] e^{- 2 \pi i (t x_2 / M)}$

Note that $ f_0[t] = f[s,t]$ for our fixed choice of $ s=0$. And so if we do this for all $ N$ rows (all 3 rows, in this example), we’ll have performed $ N$ FFT’s of size $ M$ to get a matrix of values

$ \displaystyle \begin{pmatrix} g[0,0] & g[0,1] & g[0,2] \\ g[1,0] & g[1,1] & g[1,2] \\ g[2,0] & g[2,1] & g[2,2] \end{pmatrix}$

Now we want to finish computing the 2-dimensional DFT, and it’s easy: each column of this new matrix consists of the terms in the outermost sum above (since $ s$ is the iterating variable). So if we fix a value of $ x_2$, say $ x_2 = 1$, we get the resulting column

$ \displaystyle g_1 = (g[0, 1], g[1,1], g[2,1])$

and computing a DFT on this column gives

$ \displaystyle \mathscr{F}f[x_1, 1] = \sum_{s=0}^{N-1} g_1[s] e^{-2 \pi i sx_1 / N}$.

Expanding the definition of $ g$ as a DFT gets us back to the original formula for the 2-dimensional DFT, so we know we did it right. In the end we get a matrix of the computed DFT values for all $ x_1, x_2$.

Let’s analyze the runtime of this algorithm: in the first round of DFT’s we computed $ N$ DFT’s of size $ M$, requiring a total of $ O(N M \log M)$, since we know FFT takes time $ O(M \log M)$ for a list of length $ M$. In the second round we did it the other way around, computing $ M$ DFT’s of size $ N$ each, giving a total of

$ O(NM \log M + NM \log N) = O(NM (\log N + \log M)) = O(NM \log (NM))$

In other words, if the size of the image is $ n = NM$, then we are achieving an $ O(n \log n)$-time algorithm, which was precisely the speedup that the FFT algorithm gave us for one-dimension. We also know a lower bound on this problem: we can’t do better than $ NM$ since we have to look at every pixel at least once. So we know that we’re only a logarithmic factor away from a trivial lower bound. And indeed, all other known DFT algorithms have the same runtime. Without any assumptions on the input data (or any parallelization), nobody knows of a faster algorithm.

Now let’s turn to the code. If we use our FFT algorithm from last time, the pure Python one (read: very slow), then we can implement the 2D Fourier transform in just two lines of Python code. Full disclosure: we left out some numpy stuff in this code for readability. You can view the entire source file on this blog’s Github page.

def fft2d(matrix):
   # FFT each row, then FFT each column (transpose, FFT the rows, transpose back)
   fftRows = [fft(row) for row in matrix]
   return transpose([fft(row) for row in transpose(fftRows)])

And we can test it on a simple matrix with one nonzero value in it:

A = [[0,0,0,0], [0,1,0,0], [0,0,0,0], [0,0,0,0]]
for row in fft2d(A):
   print(', '.join(['%.3f + %.3fi' % (x.real, x.imag) for x in row]))

The output is (reformatted in LaTeX, obviously):

$ \displaystyle \begin{pmatrix} 1 & -i & -1 & i \\ -i & -1 & i & 1 \\ -1 & i & 1 & -i \\ i & 1 & -i & -1 \end{pmatrix}$

The reader can verify by hand that this is correct (there’s only one nonzero term in the double sum, so it just boils down to figuring out the complex exponential $ e^{-2 \pi i (x_1 + x_2) / 4}$). We leave it as an additional exercise to the reader to implement the inverse transform, as well as to generalize this algorithm to higher dimensional DFTs.
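
For the impatient, here is one possible sketch of the inverse (the by-hand exercise is still worth doing). It assumes the fft and transpose functions used above are in scope, and it uses the standard conjugation trick to build an inverse FFT out of a forward FFT.

def ifft(signal):
   # inverse 1D FFT via the forward FFT: conjugate, transform, conjugate, divide by N
   N = len(signal)
   conjugated = [x.conjugate() for x in signal]
   return [x.conjugate() / N for x in fft(conjugated)]

def ifft2d(matrix):
   # same row/column decomposition as fft2d, but with the inverse 1D transform
   ifftRows = [ifft(row) for row in matrix]
   return transpose([ifft(row) for row in transpose(ifftRows)])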

Some Experiments and Animations

As we did with the 1-dimensional FFT, we’re now going to switch to using an industry-strength FFT algorithm for the applications. We’ll be using the numpy library and its “fft2” function, along with scipy’s ndimage module for image manipulation. Getting all of this set up was a nightmare (thank goodness for people who guide users like me through this stuff, but even then the headache seemed unending!). As usual, all of the code and images used in the making of this post is available on this blog’s Github page.

And so we can start playing with a sample image, a still from one of my favorite television shows:

sherlock

The Fourier transform of this image (after we convert it to grayscale) can be computed in python:

from scipy import ndimage, misc
import numpy

def fourierSpectrumExample(filename):
   A = ndimage.imread(filename, flatten=True)
   unshiftedfft = numpy.fft.fft2(A)
   spectrum = numpy.log10(numpy.absolute(unshiftedfft) + numpy.ones(A.shape))
   misc.imsave("%s-spectrum-unshifted.png" % (filename.split('.')[0]), spectrum)

With the result:

sherlock-spectrum-unshifted

The Fourier spectrum of Sherlock and Watson (and London).

A few notes: we use the ndimage library to load the image and flatten the colors to grayscale. Then, after we compute the spectrum, we add one to every entry (so we never take the logarithm of zero) and take a logarithm. The logarithm is there because the raw spectrum values vary too wildly; plotting them without modification makes the image contrast too high.

Something is odd, though, because the brightest regions are on the edges of the image, where we might expect the highest-frequency elements to be. Actually, it turns out that a raw DFT (as computed by numpy, anyhow) is “shifted.” That is, the indices are much like they were in our original FFT post, so that the “center” of the spectrum (the lowest frequency component) is actually in the corner of the image array.

The numpy folks have a special function designed to alleviate this called fftshift. Applying it before we plot the image gives the following spectrum:

sherlock-spectrum

Now that’s more like it. For more details on what’s going on with shifting and how to use the shifting functions, see this matlab thread. (As a side note, the “smudges” in this image are interesting. We wonder what property of the original image contributes to the smudges.)
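
Concretely, the shifted picture comes from a function nearly identical to the one above; the only change is one extra call to numpy.fft.fftshift. The function name and output file name in this sketch are just placeholders.

def fourierSpectrumShiftedExample(filename):
   # identical to fourierSpectrumExample, but with the spectrum recentered first
   A = ndimage.imread(filename, flatten=True)
   shiftedfft = numpy.fft.fftshift(numpy.fft.fft2(A))
   spectrum = numpy.log10(numpy.absolute(shiftedfft) + numpy.ones(A.shape))
   misc.imsave("%s-spectrum.png" % (filename.split('.')[0]), spectrum)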

Shifted or unshifted, this image represents the frequency spectrum of the image. In other words, we could take the inverse DFT of each pixel (and its symmetric partner) of this image separately, add them all together, and get back to our original image! We did just that using a different image (one of size 266 x 189, requiring a mere 25137 frequency components), to produce this video of the process:

Many thanks to James Hance for his relentlessly cheerful art (I have a reddish version of this particular masterpiece on my bedroom wall).

For the interested reader, I followed this youtube video’s recommended workflow to make the time-lapsed movie, along with some additional steps to make the videos play side by side. It took quite a while to generate and process the images, and the frames take up a lot of space. So instead of storing all the frames, the interested reader may find the script used to generate the frames on this blog’s Github page (along with all of the rest of the code used in this blog post).
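
For those who want to play with the reconstruction idea without the whole video pipeline, here is a rough (and wildly inefficient) sketch of the underlying computation; the function name is made up for this post, and the frame-generating script on Github is more frugal. By linearity of the transform, summing the inverse transform of every frequency component one at a time recovers the original image, and stopping the loop early gives the partial reconstructions seen in the video.

def reconstructFromComponents(imageMatrix):
   # add up the inverse transform of one frequency component at a time
   spectrum = numpy.fft.fft2(imageMatrix)
   partialSum = numpy.zeros(imageMatrix.shape, dtype=complex)

   for (i, j), value in numpy.ndenumerate(spectrum):
      oneComponent = numpy.zeros(spectrum.shape, dtype=complex)
      oneComponent[i, j] = value
      partialSum += numpy.fft.ifft2(oneComponent)

   return partialSum.real   # equal to the original image up to rounding error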

Digital Watermarking

Now we turn to the main application of Fourier transforms for this post: adding an invisible digital watermark to an image. Just in case the reader lives in a cave, a watermark is a security device used to protect the ownership or authenticity of a particular good. Usually they’re used on money to prevent counterfeits, but they’re often applied to high-resolution images on the web to protect copyrights. But as they’re used today, watermarks do more than just protect existing copyrights: they’re ugly, and they mostly prevent people from taking the image (paid for or not) in the first place. Here’s an example from a big proponent of ugly watermarks, Shutterstock.com.

stock-photo

Now if you were in the business of copyright litigation, you’d make a lot of money by suing people who took your clients’ images without permission. So rather than prevent people from stealing in the first place, you could put in an invisible watermark into all of your images and then crawl the web looking for stolen images with your watermark. It would be easy enough to automate (Google already did most of the work for you, if you just want to use Google’s search by image feature).

Now I’m more on the side of Fair Use For All, so I wouldn’t hope for a company to actually implement this and make using the internet that much scarier of a place. But the idea makes for an interesting thought experiment and blog post. The idea is simply to modify the spectrum of an image by adding in small, artificial frequency components. That is, the watermarked image will look identical to the original image to a human, but the Fourier spectrum will contain suspicious entries that we can extract if we know where to look.

Implementing the watermarking feature is quite easy, so let’s do that first. Let’s work again with James Hance’s fine artwork.

hance-up-sw-bw

Let’s call our image’s pixel matrix $ A$ and say we’re working with grayscale images for simplicity (for color, we just do the same thing to all three color channels). Then we can define a watermark matrix $ W$ by the following procedure:

  1. Pick a radius $ r$, a length $ L$, a watermark strength $ \alpha$, and a secret key $ k$.
  2. Using $ k$ as a seed to a random number generator, define a random binary vector $ v$ of length $ L$.
  3. Take $ L$ evenly spaced points on the circle of radius $ r$ centered at the image’s center, and let $ S$ be the subset of those points whose corresponding entry of $ v$ is 1.
  4. Let $ W$ be the matrix of all zeros (of the same dimensions as $ A$) with 1’s in the entries of $ S$.
  5. Compute the watermarked image as $ \mathscr{F}^{-1}(\mathscr{F}(A) + \alpha W)$. That is, compute the DFT of $ A$, add $ \alpha W$ to it, and then compute the inverse Fourier transform of the result.

The code for this is simple enough. To create a random vector:

import random
def randomVector(seed, length):
   random.seed(seed)
   return [random.choice([0,1]) for _ in range(length)]

To make the watermark (and flesh out all of the technical details of how it’s done):

import math
import numpy

def makeWatermark(imageShape, radius, secretKey, vectorLength=50):
   watermark = numpy.zeros(imageShape)
   center = (int(imageShape[0] / 2) + 1, int(imageShape[1] / 2) + 1)

   vector = randomVector(secretKey, vectorLength)

   x = lambda t: center[0] + int(radius * math.cos(t * 2 * math.pi / vectorLength))
   y = lambda t: center[1] + int(radius * math.sin(t * 2 * math.pi / vectorLength))
   indices = [(x(t), y(t)) for t in range(vectorLength)]

   for i,location in enumerate(indices):
      watermark[location] = vector[i]

   return watermark

We use the usual parameterization of the circle as $ t \mapsto (\cos(2 \pi t / n), \sin(2 \pi t / n))$ scaled to the appropriate radius. Here’s what the watermark looks like as a spectrum:

watermark-spectrum

It’s hard to see the individual pixels, so click it to enlarge.

And then applying a given watermark to an image is super simple.

from numpy.fft import fft2, ifft2, fftshift, ifftshift

def applyWatermark(imageMatrix, watermarkMatrix, alpha):
   shiftedDFT = fftshift(fft2(imageMatrix))
   watermarkedDFT = shiftedDFT + alpha * watermarkMatrix
   watermarkedImage = ifft2(ifftshift(watermarkedDFT))

   return watermarkedImage
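
Putting the pieces together, a hypothetical end-to-end run might look like the following. The file names, radius, key, and alpha are placeholders, and the ndimage/misc and numpy.fft imports from earlier are assumed to be in scope.

A = ndimage.imread('hance-up-sw-bw.png', flatten=True)
watermark = makeWatermark(A.shape, radius=50, secretKey=12345, vectorLength=50)
watermarked = applyWatermark(A, watermark, alpha=5000)
misc.imsave('hance-watermarked.png', watermarked.real)   # drop the tiny imaginary parts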

And that’s all there is to it! One might wonder how the choice of $ \alpha$ affects the intensity of the watermark, and indeed here we show a few example values of this method applied to Hance’s piece:

Click to enlarge.

Click to enlarge. The effects are most visible in the rightmost image where alpha = 1,000,000

It appears that it’s not until $ \alpha$ becomes egregiously large (over 10,000) that we visibly notice the effects. This could be in part due to the fact that this is an image of a canvas (which has lots of small textures in the background). But it’s good to keep in mind the range of acceptable values when designing a decoding mechanism.

Indeed, a decoding mechanism is conceptually much messier; it’s the art to the encoding mechanism’s science. This paper details one possible way to do it, which is essentially to scale everything up or down to 512×512 pixels and try circles of every possible radius until you find one (or don’t) which is statistically similar to your random vector. And note that since we have the secret key we can generate the exact same random vector. So what the author of that paper suggests is to extract each circle of pixels from the Fourier spectrum, treating it as a single vector with first entry at angle 0. Then you do some statistical magic (compute cross-correlation or some other similarity measure) between the extracted pixels and your secret-key-generated random vector. If they’re sufficiently similar, then you’ve found your watermark, and otherwise there’s no watermark present.
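
To make that description concrete, here is a rough Python sketch of the extraction-and-comparison step. The function name and the correlation threshold are made up for this post, it assumes the fft2/fftshift imports and the randomVector function from above are in scope, and unlike the paper’s method it doesn’t rescale the image or search over radii.

def detectWatermark(imageMatrix, radius, secretKey, vectorLength=50, threshold=0.5):
   # regenerate the secret vector, read off the same circle of spectrum entries,
   # and compare the two with a normalized correlation
   spectrum = fftshift(fft2(imageMatrix))
   center = (int(imageMatrix.shape[0] / 2) + 1, int(imageMatrix.shape[1] / 2) + 1)

   vector = numpy.array(randomVector(secretKey, vectorLength), dtype=float)

   extracted = []
   for t in range(vectorLength):
      x = center[0] + int(radius * math.cos(t * 2 * math.pi / vectorLength))
      y = center[1] + int(radius * math.sin(t * 2 * math.pi / vectorLength))
      extracted.append(abs(spectrum[x, y]))

   correlation = numpy.corrcoef(vector, numpy.array(extracted))[0, 1]
   return correlation > threshold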

Filling in the rest requires only a few extra lines beyond what we’ve already presented in this article (numpy does cross-correlation for you), so we leave it as an exercise to the reader: write a program that determines if an image contains our watermark, and test the algorithm on various $ \alpha$ and with modifications of the image like rotation, scaling, cropping, and jpeg compression. Part of the benefit of Fourier-based techniques is the resilience of the spectrum to mild applications of these transformations.

Next time we’ll use the Fourier transform to do other cool things to images, like designing filters and combining images in interesting ways.

Until then!

Bezier Curves and Picasso

Pablo Picasso

Pablo Picasso in front of The Kitchen, photo by Herbert List.

Simplicity and the Artist

Some of my favorite of Pablo Picasso’s works are his line drawings. He did a number of them about animals: an owl, a camel, a butterfly, etc. This piece called “Dog” is on my wall:

Dachshund-Picasso-Sketch

(Jump to interactive demo where we recreate “Dog” using the math in this post)

These paintings are extremely simple but somehow strike the viewer as deeply profound. They give the impression of being quite simple to design and draw. A single stroke of the hand and a scribbled signature, but what a masterpiece! It simultaneously feels like a hasty afterthought and a carefully tuned overture to a symphony of elegance. In fact, we know that Picasso’s process was deep. For example, in 1945-1946, Picasso made a series of eleven drawings (lithographs, actually) showing the progression of his rendition of a bull. The first few are more or less lifelike, but as the series progresses we see the bull boiled down to its essence, the final painting requiring a mere ten lines. Along the way we see drawings of a bull that resemble some of Picasso’s other works (number 9 reminding me of the sculpture at Daley Center Plaza in Chicago). Read more about the series of lithographs here.

the bull

Picasso’s, “The Bull.” Photo taken by Jeremy Kun at the Art Institute of Chicago in 2013. Click to enlarge.

Now I don’t pretend to be a qualified artist (I couldn’t draw a bull to save my life), but I can recognize the mathematical aspects of his paintings, and I can write a damn fine program. There is one obvious way to consider Picasso-style line drawings as a mathematical object, and it is essentially the Bezier curve. Let’s study the theory behind Bezier curves, and then write a program to draw them. The mathematics involved requires no background knowledge beyond basic algebra with polynomials, and we’ll do our best to keep the discussion low-tech. Then we’ll explore a very simple algorithm for drawing Bezier curves, implement it in Javascript, and recreate one of Picasso’s line drawings as a sequence of Bezier curves.

The Bezier Curve and Parameterizations

When asked to conjure a “curve” most people (perhaps plagued by their elementary mathematics education) will either convulse in fear or draw part of the graph of a polynomial. While these are fine and dandy curves, they only represent a small fraction of the world of curves. We are particularly interested in curves which are not part of the graphs of any functions.

Three French curves.

For instance, a French curve is a physical template used in (manual) sketching to aid the hand in drawing smooth curves. Tracing the edges of any part of these curves will usually give you something that is not the graph of a function. It’s obvious that we need to generalize our idea of what a curve is a bit. The problem is that many fields of mathematics define a curve to mean different things.  The curves we’ll be looking at, called Bezier curves, are a special case of  single-parameter polynomial plane curves. This sounds like a mouthful, but what it means is that the entire curve can be evaluated with two polynomials: one for the $ x$ values and one for the $ y$ values. Both polynomials share the same variable, which we’ll call $ t$, and $ t$ is evaluated at real numbers.

An example should make this clear. Let’s pick two simple polynomials in $ t$, say $ x(t) = t^2$ and $ y(t) = t^3$. If we want to find points on this curve, we can just choose values of $ t$ and plug them into both equations. For instance, plugging in $ t = 2$ gives the point $ (4, 8)$ on our curve. Plotting all such values gives a curve that is definitely not the graph of a function:

example-curve

But it’s clear that we can write any single-variable function $ f(x)$ in this parametric form: just choose $ x(t) = t$ and $ y(t) = f(t)$. So these are really more general objects than regular old functions (although we’ll only be working with polynomials in this post).

Quickly recapping, a single-parameter polynomial plane curve is defined as a pair of polynomials $ x(t), y(t)$ in the same variable $ t$. Sometimes, if we want to express the whole gadget in one piece, we can take the coefficients of common powers of $ t$ and write them as vectors in the $ x$ and $ y$ parts. Using the $ x(t) = t^2, y(t) = t^3$ example above, we can rewrite it as

$ \mathbf{f}(t) = (0,1) t^3 + (1,0) t^2$

Here the coefficients are points (which are the same as vectors) in the plane, and we represent the function $ f$ in boldface to emphasize that the output is a point. The linear-algebraist might recognize that pairs of polynomials form a vector space, and further combine them as $ (0, t^3) + (t^2, 0) = (t^2, t^3)$. But for us, thinking of points as coefficients of a single polynomial is actually better.

We will also restrict our attention to single-parameter polynomial plane curves for which the variable $ t$ is allowed to range from zero to one. This might seem like an awkward restriction, but in fact every finite single-parameter polynomial plane curve can be written this way (we won’t bother too much with the details of how this is done). For the purpose of brevity, we will henceforth call a “single-parameter polynomial plane curve where $ t$ ranges from zero to one” simply a “curve.”

Now there are some very nice things we can do with curves. For instance, given any two points in the plane $ P = (p_1, p_2), Q = (q_1, q_2)$ we can describe the straight line between them as a curve: $ \mathbf{L}(t) = (1-t)P + tQ$. Indeed, at $ t=0$ the value $ \mathbf{L}(t)$ is exactly $ P$, at $ t=1$ it’s exactly $ Q$, and the equation is a linear polynomial in $ t$. Moreover (without getting too much into the calculus details), the line $ \mathbf{L}$ travels at “unit speed” from $ P$ to $ Q$. In other words, we can think of $ \mathbf{L}$ as describing the motion of a particle from $ P$ to $ Q$ over time, and at time $ 1/4$ the particle is a quarter of the way there, at time $ 1/2$ it’s halfway, etc. (An example of a straight line which doesn’t have unit speed is, e.g. $ (1-t^2) P + t^2 Q$.)

More generally, let’s add a third point $ R$. We can describe a path which goes from $ P$ to $ R$, and is “guided” by $ Q$ in the middle. This idea of a “guiding” point is a bit abstract, but computationally no more difficult. Instead of travelling from one point to another at constant speed, we want to travel from one line to another at constant speed. That is, call the two curves describing lines from $ P \to Q$ and $ Q \to R$ $ \mathbf{L}_1, \mathbf{L}_2$, respectively. Then the curve “guided” by $ Q$ can be written as a curve

$ \displaystyle \mathbf{F}(t) = (1-t)\mathbf{L}_1(t) + t \mathbf{L}_2(t)$

Multiplying this all out gives the formula

$ \displaystyle \mathbf{F}(t) = (1-t)^2 P + 2t(1-t)Q + t^2 R$

We can interpret this again in terms of a particle moving. At the beginning of our curve the value of $ t$ is small, and so we’re sticking quite close to the line $ \mathbf{L}_1$. As time goes on the point $ \mathbf{F}(t)$ moves along the line between the points $ \mathbf{L}_1(t)$ and $ \mathbf{L}_2(t)$, which are themselves moving. This traces out a curve which looks like this

Screen Shot 2013-05-07 at 9.38.42 PM

This screenshot was taken from a wonderful demo by data visualization consultant Jason Davies. It expresses the mathematical idea quite superbly, and one can drag the three points around to see how it changes the resulting curve. One should play with it for at least five minutes.

The entire idea of a Bezier curve is a generalization of this principle: given a list $ P_0, \dots, P_n$ of points in the plane, we want to describe a curve which travels from the first point to the last, and is “guided” in between by the remaining points. A Bezier curve is a realization of such a curve (a single-parameter polynomial plane curve) which is the inductive continuation of what we described above: we travel at unit speed from a Bezier curve defined by the first $ n-1$ points in the list to the curve defined by the last $ n-1$ points. The base case is the straight-line segment (or the single point, if you wish). Formally,

Definition: Given a list of points in the plane $ P_0, \dots, P_n$ we define the degree $ n$ Bezier curve recursively as

$ \begin{aligned} \mathbf{B}_{P_0}(t) &= P_0 \\ \mathbf{B}_{P_0 P_1 \dots P_n}(t) &= (1-t)\mathbf{B}_{P_0 P_1 \dots P_{n-1}}(t) + t \mathbf{B}_{P_1P_2 \dots P_n}(t) \end{aligned}$

We call $ P_0, \dots, P_n$ the control points of $ \mathbf{B}$.

While the concept of travelling at unit speed between two lower-order Bezier curves is the real heart of the matter (and allows us true computational insight), one can multiply all of this out (using the formula for binomial coefficients) and get an explicit formula. It is:

$ \displaystyle \mathbf{B}_{P_0 \dots P_n}(t) = \sum_{k=0}^n \binom{n}{k}(1-t)^{n-k}t^k P_k$

And for example, a cubic Bezier curve with control points $ P_0, P_1, P_2, P_3$ would have equation

$ \displaystyle (1-t)^3 P_0 + 3(1-t)^2t P_1 + 3(1-t)t^2 P_2 + t^3 P_3$
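
As a quick sanity check that the recursive definition and the explicit formula agree, here is a small Python sketch (the drawing code later in this post is Javascript; Python is just convenient for a throwaway numerical check, and math.comb requires Python 3.8 or newer). The control points are an arbitrary example.

from math import comb

def bezierRecursive(points, t):
   # the recursive definition: interpolate between the two lower-degree curves
   if len(points) == 1:
      return points[0]
   first = bezierRecursive(points[:-1], t)
   second = bezierRecursive(points[1:], t)
   return ((1 - t) * first[0] + t * second[0], (1 - t) * first[1] + t * second[1])

def bezierExplicit(points, t):
   # the explicit Bernstein-polynomial formula
   n = len(points) - 1
   x = sum(comb(n, k) * (1 - t)**(n - k) * t**k * p[0] for k, p in enumerate(points))
   y = sum(comb(n, k) * (1 - t)**(n - k) * t**k * p[1] for k, p in enumerate(points))
   return (x, y)

controlPoints = [(0, 0), (1, 2), (3, 3), (4, 0)]
print(bezierRecursive(controlPoints, 0.5))   # (2.0, 1.875)
print(bezierExplicit(controlPoints, 0.5))    # (2.0, 1.875)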

Higher-degree Bezier curves can be quite complicated to picture geometrically. For instance, the following is a fifth-degree Bezier curve (with six control points).

BezierCurve

A degree five Bezier curve, credit Wikipedia.

The additional line segments drawn show the recursive nature of the curve. The simplest are the green points, which travel from control point to control point. Then the blue points travel on the line segments between green points, the pink travel along the line segments between blue, the orange between pink, and finally the red point travels along the line segment between the orange points.

Without the recursive structure of the problem (just seeing the curve) it would be a wonder how one could actually compute with these things. But as we’ll see, the algorithm for drawing a Bezier curve is very natural.

Bezier Curves as Data, and de Casteljau’s Algorithm

Let’s derive and implement the algorithm for painting a Bezier curve to a screen using only the ability to draw straight lines. For simplicity, we’ll restrict our attention to degree-three (cubic) Bezier curves. In practice cubic curves balance computational efficiency and expressiveness, more intricate shapes are typically built by chaining cubic pieces end to end, and the algorithm below extends to higher degrees with little change. All of the code we present in this post will be in Javascript, and is available on this blog’s Github page.

So then a cubic Bezier curve is represented in a program by a list of four points. For example,

var curve = [[1,2], [5,5], [4,0], [9,3]];

Most graphics libraries (including the HTML5 canvas standard) provide a drawing primitive that can output Bezier curves given a list of four points. But suppose we aren’t given such a function. Suppose that we only have the ability to draw straight lines. How would one go about drawing an approximation to a Bezier curve? If such an algorithm exists (it does, and we’re about to see it) then we could make the approximation so fine that it is visually indistinguishable from a true Bezier curve.

The key property of Bezier curves that allows us to come up with such an algorithm is the following:

Any cubic Bezier curve $ \mathbf{B}$ can be split into two, end to end,
which together trace out the same curve as $ \mathbf{B}$.

Let’s see exactly how this is done. Let $ \mathbf{B}(t)$ be a cubic Bezier curve with control points $ P_0, P_1, P_2, P_3$, and let’s say we want to split it exactly in half. First notice what the formula for the curve gives when we plug in $ 1/2$:

$ \displaystyle \mathbf{B}(1/2) = \frac{1}{2^3}(P_0 + 3P_1 + 3P_2 + P_3)$

Moreover, our recursive definition gave us a way to evaluate the point in terms of smaller-degree curves. But when these are evaluated at 1/2 their formulae are similarly easy to write down. The picture looks like this:

subdivision

The green points are the degree one curves, the pink points are the degree two curves, and the blue point is the cubic curve. We notice that, since each of the curves are evaluated at $ t=1/2$, each of these points can be described as the midpoints of points we already know. So $ m_0 = (P_0 + P_1) / 2, q_0 = (m_0 + m_1)/2$, etc.

In fact, the splitting of the two curves we want is precisely given by these points. That is, the “left” half of the curve is given by the curve $ \mathbf{L}(t)$ with control points $ P_0, m_0, q_0, \mathbf{B}(1/2)$, while the “right” half $ \mathbf{R}(t)$ has control points $ \mathbf{B}(1/2), q_1, m_2, P_3$.

How can we be completely sure these are the same Bezier curves? Well, they’re just polynomials. We can compare them for equality by doing a bunch of messy algebra. But note, since $ \mathbf{L}(t)$ only travels halfway along $ \mathbf{B}(t)$, to check they are the same is to equate $ \mathbf{L}(t)$ with $ \mathbf{B}(t/2)$, since as $ t$ ranges from zero to one, $ t/2$ ranges from zero to one half. Likewise, we can compare $ \mathbf{B}((t+1)/2)$ with $ \mathbf{R}(t)$.
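
Before wading into that algebra, a quick numerical spot-check is reassuring (it’s evidence, not a proof). This Python sketch reuses the bezierRecursive function from the sketch a few paragraphs back, along with the midpoint construction just described.

def midpointsOf(points):
   # sequential midpoints of a list of points
   return [((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0) for p, q in zip(points, points[1:])]

P = [(0, 0), (1, 2), (3, 3), (4, 0)]
m = midpointsOf(P)   # m0, m1, m2
q = midpointsOf(m)   # q0, q1
b = midpointsOf(q)   # the single point B(1/2)

left = [P[0], m[0], q[0], b[0]]
right = [b[0], q[1], m[2], P[3]]

for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
   lx, ly = bezierRecursive(left, t)
   bx, by = bezierRecursive(P, t / 2)
   assert abs(lx - bx) < 1e-9 and abs(ly - by) < 1e-9

   rx, ry = bezierRecursive(right, t)
   cx, cy = bezierRecursive(P, (t + 1) / 2)
   assert abs(rx - cx) < 1e-9 and abs(ry - cy) < 1e-9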

The algebra is very messy, but doable. As a test of this blog’s newest tools, here’s a screen cast of me doing the algebra involved in proving the two curves are identical.


Now that that’s settled, we have a nice algorithm for splitting a cubic Bezier (or any Bezier) into two pieces. In Javascript,

function subdivide(curve) {
   // one round of midpoints per degree: 4 control points -> 3 -> 2 -> 1
   var firstMidpoints = midpoints(curve);
   var secondMidpoints = midpoints(firstMidpoints);
   var thirdMidpoints = midpoints(secondMidpoints);

   // left half: P0, m0, q0, B(1/2); right half: B(1/2), q1, m2, P3
   return [[curve[0], firstMidpoints[0], secondMidpoints[0], thirdMidpoints[0]],
          [thirdMidpoints[0], secondMidpoints[1], firstMidpoints[2], curve[3]]];
}

Here “curve” is a list of four points, as described at the beginning of this section, and the output is a list of two curves with the correct control points. The “midpoints” function used is quite simple, and we include it here for completeness:

function midpoints(pointList) {
   var midpoint = function(p, q) {
      return [(p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0];
   };

   var midpointList = new Array(pointList.length - 1);
   for (var i = 0; i < midpointList.length; i++) {
      midpointList[i] = midpoint(pointList[i], pointList[i+1]);
   }

   return midpointList;
}

It just accepts as input a list of points and computes their sequential midpoints. So a list of $ n$ points is turned into a list of $ n-1$ points. As we saw, we need to call this function $ d$ times to compute the segmentation of a degree $ d$ Bezier curve.

As explained earlier, we can keep subdividing our curve over and over until each of the tiny pieces are basically lines. That is, our function to draw a Bezier curve from the beginning will be as follows:

function drawCurve(curve, context) {
   if (isFlat(curve)) {
      drawSegments(curve, context);
   } else {
      var pieces = subdivide(curve);
      drawCurve(pieces[0], context);
      drawCurve(pieces[1], context);
   }
}

In words, as long as the curve isn’t “flat,” we want to subdivide and draw each piece recursively. If it is flat, then we can simply draw the three line segments of the curve and be reasonably sure that it will be a good approximation. The context variable sitting there represents the canvas to be painted to; it must be passed through to the “drawSegments” function, which simply paints a straight line to the canvas.

Of course this raises the obvious question: how can we tell if a Bezier curve is flat? There are many ways to do so. One could compute the angles of deviation (from a straight line) at each interior control point and add them up. Or one could compute the area of the enclosed quadrilateral. However, computing angles and areas is usually not very nice: angles take a long time to compute and area computations have stability issues, and the algorithms which are stable are not very simple. We want a measurement which requires only basic arithmetic and perhaps a few logical conditions to check.

It turns out there is such a measurement. It’s originally attributed to Roger Willcocks, but it’s quite simple to derive by hand.

Essentially, we want to measure the “flatness” of a cubic Bezier curve by computing the distance of the actual curve at time $ t$ from where the curve would be at time $ t$ if the curve were a straight line.

Formally, given $ \mathbf{B}(t)$ with control points $ P_0, P_1, P_2, P_3$ as usual, we can define the straight-line Bezier cubic as the colossal sum

$ \displaystyle \mathbf{S}(t) = (1-t)^3P_0 + 3(1-t)^2t \left ( \frac{2}{3}P_0 + \frac{1}{3}P_3 \right ) + 3(1-t)t^2 \left ( \frac{1}{3}P_0 + \frac{2}{3}P_3 \right ) + t^3 P_3 $

There’s nothing magical going on here. We’re simply giving the Bezier curve with control points $ P_0, \frac{2}{3}P_0 + \frac{1}{3}P_3, \frac{1}{3}P_0 + \frac{2}{3}P_3, P_3$. One should think about this as points which are a 0, 1/3, 2/3, and 1 fraction of the way from $ P_0$ to $ P_3$ on a straight line.

Then we define the function $ d(t) = \left \| \mathbf{B}(t) - \mathbf{S}(t) \right \|$ to be the distance between the two curves at the same time $ t$. The flatness value of $ \mathbf{B}$ is the maximum of $ d$ over all values of $ t$. If this flatness value is below a certain tolerance level, then we call the curve flat.

With a bit of algebra we can simplify this expression. First, the value of $ t$ for which the distance is maximized is the same as when its square is maximized, so we can omit the square root computation at the end and take that into account when choosing a flatness tolerance.

Now let’s actually write out the difference as a single polynomial. First, we can cancel the 3’s in $ \mathbf{S}(t)$ and write the polynomial as

$ \displaystyle \mathbf{S}(t) = (1-t)^3 P_0 + (1-t)^2t (2P_0 + P_3) + (1-t)t^2 (P_0 + 2P_3) + t^3 P_3$

and so $ \mathbf{B}(t) - \mathbf{S}(t)$ is (by collecting coefficients of the like terms $ (1-t)^it^j$)

$ \displaystyle (1-t)^2t (3 P_1 - 2P_0 - P_3) + (1-t)t^2 (3P_2 - P_0 - 2P_3) $

Factoring out the $ (1-t)t$ from both terms and setting $ a = 3P_1 - 2P_0 - P_3$, $ b = 3P_2 - P_0 - 2P_3$, we get

$ \displaystyle d^2(t) = \left \| (1-t)t ((1-t)a + tb) \right \|^2 = (1-t)^2t^2 \left \| (1-t)a + tb \right \|^2$

Since the maximum of a product is at most the product of the maxima, we can bound the above quantity by the product of the two maxes. The reason we want to do this is because we can easily compute the two maxes separately. It wouldn’t be hard to compute the maximum without splitting things up, but this way ends up with fewer computational steps for our final algorithm, and the visual result is equally good.

Using some elementary single-variable calculus, the maximum value of $ (1-t)^2t^2$ for $ 0 \leq t \leq 1$ turns out to be $ 1/16$. And the square of the norm of a vector is just the sum of squares of its components. If $ a = (a_x, a_y)$ and $ b = (b_x, b_y)$, then the squared norm above is exactly

$ \displaystyle ((1-t)a_x + tb_x)^2 + ((1-t)a_y + tb_y)^2$

And notice: for any real numbers $ z, w$ the quantity $ (1-t)z + tw$ is exactly the straight line from $ z$ to $ w$ we know so well. The maximum over all $ t$ between zero and one is obviously the maximum of the endpoints $ z, w$. So the max of our distance function $ d^2(t)$ is bounded by

$ \displaystyle \frac{1}{16} (\textup{max}(a_x^2, b_x^2) + \textup{max}(a_y^2, b_y^2))$

And so our condition for being flat is that this bound is smaller than some allowable tolerance. We may safely factor the 1/16 into this tolerance bound, and so this is enough to write a function.

function isFlat(curve) {
   var tol = 10; // anything below 50 is roughly good-looking

   var ax = 3.0*curve[1][0] - 2.0*curve[0][0] - curve[3][0]; ax *= ax;
   var ay = 3.0*curve[1][1] - 2.0*curve[0][1] - curve[3][1]; ay *= ay;
   var bx = 3.0*curve[2][0] - curve[0][0] - 2.0*curve[3][0]; bx *= bx;
   var by = 3.0*curve[2][1] - curve[0][1] - 2.0*curve[3][1]; by *= by;

   return (Math.max(ax, bx) + Math.max(ay, by) <= tol);
}

And there we have it. We write a simple HTML page to access a canvas element and a few extra helper functions to draw the line segments when the curve is flat enough, and present the final result in this interactive demonstration (you can perturb the control points).

The picture you see on that page (given below) is my rendition of Picasso’s “Dog” drawing as a sequence of nine Bezier curves. I think the resemblance is uncanny 🙂

Picasso's "Dog," redesigned as a sequence of eight bezier curves.

Picasso’s “Dog,” redesigned as a sequence of nine bezier curves.

While we didn’t invent the drawing itself (and hence shouldn’t attach our signature to it), we did come up with the representation as a sequence of Bezier curves. It only seems fitting to present that as the work of art. Here we’ve distilled the representation down to a single file: the first line is the dimension of the canvas, and each subsequent line represents a cubic Bezier curve. Comments are included for readability.

bezier-dog-for-web

“Dog” Jeremy Kun, 2013. Click to enlarge.

Because standardizing things seems important, we define a new filetype “.bezier”, which has the format given above:

int int
(int) curve 
(int) curve
...

Where the first two ints specify the size of the canvas, the first (optional) int on each line specifies the width of the stroke, and a “curve” has the form

[int,int] [int,int] ... [int,int]

If an int is omitted at the beginning of a line, this specifies a width of three pixels.
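
To make the format concrete, here is a hedged Python sketch of a parser for it (again in Python for brevity, even though the drawing code above is Javascript). The comment syntax isn’t pinned down above, so the sketch assumes comment lines begin with a '#'; adjust as needed.

def parseBezierFile(text):
   # returns (width, height, curves), where each curve is a pair
   # (strokeWidth, listOfControlPoints)
   lines = [line.strip() for line in text.splitlines()]
   lines = [line for line in lines if line and not line.startswith('#')]

   width, height = map(int, lines[0].split())

   curves = []
   for line in lines[1:]:
      tokens = line.split()
      strokeWidth = 3                        # default stroke width is three pixels
      if not tokens[0].startswith('['):      # the leading width int is optional
         strokeWidth = int(tokens[0])
         tokens = tokens[1:]
      points = [tuple(map(int, token.strip('[]').split(','))) for token in tokens]
      curves.append((strokeWidth, points))

   return width, height, curves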

In a general .bezier file we allow a curve to have arbitrarily many control points, though the code we gave above does not draw them that generally. As an exercise, write a program which accepts as input a .bezier file and produces as output an image of the drawing. This will require an extension of the algorithm above for drawing arbitrary Bezier curves, which loops its computation of the midpoints and keeps track of which end up in the resulting subdivision. Alternatively, one could write a program which accepts as input a .bezier file with only cubic Bezier curves, and produces as output an SVG file of the drawing (SVG only supports cubic Bezier curves). So a .bezier file is a simplification (fewer features) and an extension (Bezier curves of arbitrary degree) of an SVG file.

We didn’t go as deep into the theory of Bezier curves as we could have. If the reader is itching for more (and a more calculus-based approach), see this lengthy primer. It contains practically everything one could want to know about Bezier curves, with nice interactive demos written in Processing.

Low-Complexity Art

There are some philosophical implications of what we’ve done today with Picasso’s “Dog.” Previously on this blog we’ve investigated the idea of low-complexity art, and it’s quite relevant here. The thesis is that “beautiful” art has a small description length, and more formally the “complexity” of some object (represented by text) is the length of the shortest program that outputs that object given no inputs. More on that in our primer on Kolmogorov complexity. The fact that we can describe Picasso’s line drawings with a small number of Bezier curves (and a relatively short program to output the bezier curves) is supposed to be a deep statement about the beauty of the art itself. Obviously this is very subjective, but not without its proponents.

There has been a bit of recent interest in computers generating art. For instance, this recent programming competition (in Dutch) gave the task of generating art similar to the work of Piet Mondrian. The idea is that the more elegant the algorithm, the higher it would be scored. The winner used MD5 hashes to generate Mondrian pieces, and there were many many other impressive examples (the link above has a gallery of submissions).

In our earlier post on low-complexity art, we explored the possibility of representing all images within a coordinate system involving circles with shaded interiors. But it’s obvious that such a coordinate system wouldn’t be able to represent “Dog” with very low complexity. It seems that Bezier curves are a much more natural system of coordinates. Some of the advantages: the length of a line and slight perturbations of a curve don’t affect the resulting complexity. A cubic Bezier curve can be described by any set of four points, and more “intricate” (higher complexity) descriptions of curves require a larger number of points. Bezier curves can be scaled up arbitrarily, and this doesn’t significantly change the complexity of the curve (scaling by many orders of magnitude only introduces a small logarithmic increase). Curves with larger stroke are slightly more complex than those with smaller stroke, and representing many small sharp bends requires more curves than long, smooth arcs.

On the downside, it’s not so easy to represent a circle as a Bezier curve. In fact, it is impossible to do so exactly. Despite the simplicity of this object (it’s even defined as a single polynomial, albeit in two variables), the best one can do is approximate it. The same goes for ellipses. There are actually ways to overcome this (the concept of rational Bezier curves which are quotients of polynomials), but they add to the inherent complexity of the drawing algorithm and the approximations using regular Bezier curves are good enough.

And so we define the complexity of a drawing to be the number of bits in its .bezier file representation. Comments are ignored in this calculation.

The real prize, and what we’ll explore next time, is to find a way to generate art automatically. That is to do one of two things:

  1. Given some sort of “seed,” write a program that produces a pseudo-random line drawing.
  2. Given an image, produce a .bezier image which accurately depicts the image as a line drawing.

We will attempt to explore these possibilities in the follow-up to this post. Depending on how things go, this may involve some local search algorithms, genetic algorithms, or other methods.

Until then!

Addendum: want to buy a framed print of the source code for “Dog”? Head over to our page on Society6.