# The Welch-Berlekamp Algorithm for Correcting Errors in Data

In this post we’ll implement Reed-Solomon error-correcting codes and use them to play with codes. In our last post we defined Reed-Solomon codes rigorously, but in this post we’ll focus on intuition and code. As usual the code and data used in this post is available on this blog’s Github page.

The main intuition behind Reed-Solomon codes (and basically all the historically major codes) is

Error correction is about adding redundancy, and polynomials are a really efficient way to do that.

Here’s an example of what we’ll do in the post. Say you have a space probe flying past Mars taking photographs like this one

Courtesy of NASA’s Viking Orbiter.

Unfortunately you know that if you send the images back to Earth via radio waves, the signal will get corrupted by cosmic something-or-other and you’ll end up with an image like this.

How can you recover from errors like this? You could do something like repeat each pixel twice in the message so that if one is corrupted the other will get through. But still, every now and then both pixels in a row will be corrupted and it’s twice as inefficient.

The idea of error-correcting codes is to find a way to encode a message so that it adds a lot of redundancy without adding too much extra information to the message. The name of the game is to optimize the tradeoff between how much redundancy you get and how much longer the message needs to be, while still being able to efficiently decode the encoded message.

A solid technique turns out to be: use polynomials. Even though you’d think polynomials are too simple (we teach them starting in the 7th grade these days!) they turn out to have remarkable properties. The most important of which is:

if you give me a bunch of points in the plane with different $x$ coordinates, they uniquely define a polynomial of a certain degree.

This fact is called polynomial interpolation. We used it in a previous post to share secrets, if you’re interested.

What makes polynomials great for error correction is that you can take a fixed polynomial (think, the message) and “encode” it as a list of points on that polynomial. If you include enough, then you can get back the original polynomial from the points alone. And the best part, for each two additional points you include above the minimum, you get resilience to one additional error no matter where it happens in the message. Another way to say this is, even if some of the points in your encoded message are wrong (the numbers are modified by an adversary or random noise), as long as there aren’t too many errors there is an algorithm that can recover the errors.

That’s what makes polynomials so much better than the naive idea of repeating every pixel twice: once you allow for three errors you run the risk of losing a pixel, but you had to double your communication costs. With a polynomial-based approach you’d only need to store around six extra pixels worth of data to get resilience to three errors that can happen anywhere. What a bargain!

Here’s the official theorem about Reed-Solomon codes:

Theorem: There is an efficient algorithm which, when given points $(a_1, b_1), \dots, (a_n, b_n)$ with distinct $a_i$ has the following property. If there is a polynomial of degree $d$ that passes through at least $n/2 + d/2$ of the given points, then the algorithm will output the polynomial.

So let’s implement the encoder, decoder, and turn the theorem into code!

## Implementing the encoder

The way you write a message of length $k$ as a polynomial is easy. Pick a large prime integer $p$ and from now on we’ll do all our arithmetic modulo $p$. Then encode each character $c_0, c_1, \dots, c_{k-1}$ in the message as an integer between 0 and $p-1$ (this is why $p$ needs to be large enough), and the polynomial representing the message is

$m(x) = c_0 + c_1x + c_2x^2 + \dots + c_{k-1}x^{k-1}$

If the message has length $k$ then the polynomial will have degree $k-1$.

Now to encode the message we just pick a bunch of $x$ values and plug them into the polynomial, and record the (input, output) pairs as the encoded message. If we want to make things simple we can just require that you always pick the $x$ values $0, 1, \dots, n$ for some choice of $n \leq p$.

A quick skippable side-note: we need $p$ to be prime so that our arithmetic happens in a field. Otherwise, we won’t necessarily get unique decoded messages.

Back when we discussed elliptic curve cryptography (ironically sharing an acronym with error correcting codes), we actually wrote a little library that lets us seamlessly represent polynomials with “modular arithmetic coefficients” in Python, which in math jargon is a “finite field.” Rather than reinvent the wheel we’ll just use that code as a black box (full source in the Github repo). Here are some examples of using it.

>>> from finitefield.finitefield import FiniteField
>>> F13 = FiniteField(p=13)
>>> a = F13(7)
>>> a+9
3 (mod 13)
>>> a*a
10 (mod 13)
>>> 1/a
2 (mod 13)


A programming aside: once you construct an instance of your finite field, all arithmetic operations involving instances of that type will automatically lift integers to the appropriate type. Now to make some polynomials:

>>> from finitefield.polynomial import polynomialsOver
>>> F = FiniteField(p=13)
>>> P = polynomialsOver(F)
>>> g = P([1,3,5])
>>> g
1 + 3 t^1 + 5 t^2
>>> g*g
1 + 6 t^1 + 6 t^2 + 4 t^3 + 12 t^4
>>> g(100)
4 (mod 13)


Now to fix an encoding/decoding scheme we’ll call $k$ the size of the unencoded message, $n$ the size of the encoded message, and $p$ the modulus, and we’ll fix these programmatically when the encoder and decoder are defined so we don’t have to keep carrying these data around.

def makeEncoderDecoder(n, k, p):
Fp = FiniteField(p)
Poly = polynomialsOver(Fp)

def encode(message):
...

def decode(encodedMessage):
...

return encode, decode


Encode is the easier of the two.

def encode(message):
thePoly = Poly(message)
return [(Fp(i), thePoly(Fp(i))) for i in range(n)]


Technically we could remove the leading Fp(i) from each tuple, since the decoder algorithm can assume we’re using the first $n$ integers in order. But we’ll leave it in and define the decode function more generically.

After we define how the decoder should work in theory we’ll run through a simple example step by step. Now on to the decoder.

## The decoding algorithm, Berlekamp-Welch

There are a lot of different decoding algorithms for various error correcting codes. The one we’ll implement is called the Berlekamp-Welch algorithm, but before we get to it we should mention a much simpler algorithm that will work when there are only a few errors.

To remind us of notation, call $k$ the length of the message, so that $k-1$ is the degree of the polynomial we used to encode it. And $n$ is the number of points we used in the encoding. Call the encoded message $M$ as it’s received (as a list of points, possibly with errors).

In the simple method what you do is just randomly pick $k$ points from $M$, do polynomial interpolation on the chosen points to get some polynomial $g$, and see if $g$ agrees with most of the points in $M$. If there really are few errors, then there’s a good chance the randomly chosen points won’t have any errors in them and you’ll win. If you get unlucky and pick some points with errors, then the $g$ you get won’t agree with most of $M$ and you can throw it out and try again. If you get really unlucky and a bad $g$ does agree with most of $M$, then you just run this procedure a few hundred times and take the $g$ you get most often. But again, this only works with a small number of errors and while it could be good enough for many applications, don’t bet your first-born child’s life on it working. Or even your favorite pencil, for that matter. We’re going to implement Berlekamp-Welch so you can win someone else’s favorite pencil. You’re welcome.

Exercise: Implement the simple decoding algorithm and test it on some data.

Suppose we are guaranteed that there are exactly $e < \frac{n-k+1}{2}$ errors in our received message $M = (a_1, b_1, \dots, a_n, b_n)$. Call the polynomial that represents the original message $P$. In other words, we have that $P(a_i) = b_i$ for all but $e$ of the points in $M$.

There are two key ingredients in the algorithm. The first is called the error locator polynomial. We’ll call this polynomial $E(x)$, and it’s just defined by being zero wherever the errors occurred. In symbols, $E(a_i) = 0$ whenever $P(a_i) \neq b_i$. If we knew where the errors occurred, we could write out $E(x)$ explicitly as a product of terms like $(x-a_i)$. And if we knew $E$ we’d also be done, because it would tell us where the errors were and we could do interpolation on all the non-error points in $M$.

So we’re going to have to study $E$ indirectly and use it to get $P$. One nice property of $E(x)$ is the following

$\displaystyle b_i E(a_i) = P(a_i)E(a_i),$

which is true for every pair $(a_i, b_i) \in M$. Indeed, by definition when $P(a_i) \neq b_i$ then $E(a_i) = 0$ so both sides are zero. Now we can use a technique called linearization. It goes like this. The product $P(x) E(x)$, i.e. the right-hand-side of the above equation, is a polynomial, say $Q(x)$, of larger degree ($e + k - 1$). We get the equation for all $i$:

$\displaystyle b_i E(a_i) = Q(a_i)$

Now $E$, $Q$, and $P$ are all unknown, but it turns out that we can actually find $E$ and $Q$ efficiently. Or rather, we can’t guarantee we’ll find $E$ and $Q$ exactly, instead we’ll find two polynomials that have the same quotient as $Q(x)/E(x) = P(x)$. Here’s how that works.

Say we wrote out $E(x)$ as a generic polynomial of degree $e$ and $Q(x)$ as a generic polynomial of degree $e+k-1$. So their coefficients are unspecified variables. Now we can plug in all the points $a_i, b_i$ to the equations $b_i E(a_i) = Q(a_i)$, and this will form a linear system of $2e + k-1$ unknowns ($e$ unknowns come from $E(x)$ and $e+k-1$ come from $Q(x)$).

Now we know that this system has good solution, because if we take the true error locator polynomial and $Q = E(x)P(x)$ with the true $P(x)$ we win. The worry is that we’ll solve this system and get two different polynomials $Q', E'$ whose quotient will be something crazy and unrelated to $P$. But as it turns out this will never happen, and any solution will give the quotient $P$. Here’s a proof you can skip if you hate proofs.

Proof. Say you have two pairs of solutions to the system, $(Q_1, E_1)$ and $(Q_2, E_2)$, and you want to show that $Q_1/E_1 = Q_2/E_2$. Well, they might not be divisible, but we can multiply the previous equation through to get $Q_1E_2 = Q_2E_1$. Now we show two polynomials are equal in the same way as always: subtract and show there are too many roots. Define $R(x) = Q_1E_2 - Q_2E_1$. The claim is that $R(x)$ has $n$ roots, one for every point $(a_i, b_i)$. Indeed,

$\displaystyle R(a_i) = (b_i E_1(a_i))E_2(a_i) - (b_iE_2(a_i)) E_1(a_i) = 0$

But the degree of $R(x)$ is $2e + k - 1$ which is less than $n$ by the assumption that $e < \frac{n-k+1}{2}$. So $R(x)$ has too many roots and must be the zero polynomial, and the two quotients are equal.

$\square$

So the core python routine is just two steps: solve the linear equation, and then divide two polynomials. However, it turns out that no python module has any decent support for solving linear systems of equations over finite fields.  Luckily, I wrote a linear solver way back when and so we’ll adapt it to our purposes. I’ll leave out the gory details of the solver itself, but you can see them in the source for this post. Here is the code that sets up the system

   def solveSystem(encodedMessage):
for e in range(maxE, 0, -1):
ENumVars = e+1
QNumVars = e+k
def row(i, a, b):
return ([b * a**j for j in range(ENumVars)] +
[-1 * a**j for j in range(QNumVars)] +
[0]) # the "extended" part of the linear system

system = ([row(i, a, b) for (i, (a,b)) in enumerate(encodedMessage)] +
[[0] * (ENumVars-1) + [1] + [0] * (QNumVars) + [1]])
# ensure coefficient of x^e in E(x) is 1

solution = someSolution(system, freeVariableValue=1)
E = Poly([solution[j] for j in range(e + 1)])
Q = Poly([solution[j] for j in range(e + 1, len(solution))])

P, remainder = Q.__divmod__(E)
if remainder == 0:
return Q, E

raise Exception("found no divisors!")

def decode(encodedMessage):
Q,E = solveSystem(encodedMessage)

P, remainder = Q.__divmod__(E)
if remainder != 0:
raise Exception("Q is not divisibly by E!")

return P.coefficients


## A simple example

Now let’s go through an extended example with small numbers. Let’s work modulo 7 and say that our message is

2, 3, 2 (mod 7)


In particular, $k=3$ is the length of the message. We’ll encode it as a polynomial in the way we described:

$\displaystyle m(x) = 2 + 3x + 2x^2 (\mod 7)$

If we pick $n = 5$, then we will encode the message as a sequence of five points on $m(x)$, namely $m(0)$ through $m(4)$.

[[0, 2], [1, 0], [2, 2], [3, 1], [4, 4]] (mod 7)


Now let’s add a single error. First remember that our theoretical guarantee says that we can correct any number of errors up to $\frac{n+k-1}{2} - 1$, which in this case is $[(5+3-1) / 2] - 1 = 2$, so we can definitely correct one error. We’ll add 1 to the third point, giving the received corrupted message as

[[0, 2], [1, 0], [2, 3], [3, 1], [4, 4]] (mod 7)


Now we set up the system of equations $b_i E(a_i) = Q(a_i)$ for all $(a_i, b_i)$ above. Rewriting the equations as $b_iE(a_i) - Q(a_i) = 0$, and adding as the last equation the constraint that $x^e = 1$. The columns represent the variables, with the last column being the right-hand-side of the equality as is the standard for Gaussian elimination.

# e0 e1 q0 q1 q2 q3
[
[2, 0, 6, 0, 0, 0, 0],
[0, 0, 6, 6, 6, 6, 0],
[3, 6, 6, 5, 3, 6, 0],
[1, 3, 6, 4, 5, 1, 0],
[4, 2, 6, 3, 5, 6, 0],
[0, 1, 0, 0, 0, 0, 1],
]


Then we do row-reduction to get

[
[1, 0, 0, 0, 0, 0, 5],
[0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0, 3],
[0, 0, 0, 1, 0, 0, 3],
[0, 0, 0, 0, 1, 0, 6],
[0, 0, 0, 0, 0, 1, 2]
]


And reading off the solution gives $E(x) = 5 + x$ and $Q(x) = 2 + 2x + 6x^2 + 2x^3$. Note in particular that the $E(x)$ given in this solution is not the error locator polynomial! Nevertheless, the quotient of the two polynomials is exactly $m(x) = 2 + 3x + 2x^2$ which gives back the original message.

There is one catch here: how does one determine the value of $e$ to use in setting up the system of linear equations? It turns out that an upper bound on $e$ will work just fine, so long as the upper bound you use agrees with the theoretical maximum number of errors allowed (see the Singleton bound from last time). The effect of doing this is that the linear system ends up with some number of free variables that you can set to arbitrary values, and these will correspond to additional shared roots of $E(x)$ and $Q(x)$ that cancel out upon dividing.

## A larger example

Now it’s time for a sad fact. I tried running Welch-Berlekamp on an encoded version of the following tiny image:

And it didn’t finish after running all night.

Berlekamp-Welch is a slow algorithm for decoding Reed-Solomon codes because it requires one to solve a large system of equations. There’s at least one equation for each pixel in a black and white image! To get around this one typically encodes blocks of pixels together into one message character (since $p$ is larger than $n > k$ there is lots of space), and apparently one can balance it to minimize the number of equations. And finally, a nontrivial inefficiency comes from our implementation of everything in Python without optimizations. If we rewrote everything in C++ or Go and fixed the prime modulus, we would likely see reasonable running times. There are also asymptotically much faster methods based on the fast Fourier transform, and in the future we’ll try implementing some of these. For the dedicated reader, these are all good follow-up projects.

For now we’ll just demonstrate that it works by running it on a larger sample of text, the introductory paragraphs of To Kill a Mockingbird:

def tkamTest():
message = '''When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow.  When it healed, and Jem's fears of never being able to play football were assuaged, he was seldom   self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. He   couldn't have cared less, so long as he could pass and punt.'''

k = len(message)
n = len(message) * 2
p = 2087
integerMessage = [ord(x) for x in message]

enc, dec, solveSystem = makeEncoderDecoder(n, k, p)
print("encoding...")
encoded = enc(integerMessage)

e = int(k/2)
print("corrupting...")
corrupted = corrupt(encoded[:], e, 0, p)

print("decoding...")
Q,E = solveSystem(corrupted)
P, remainder = (Q.__divmod__(E))

recovered = ''.join([chr(x) for x in P.coefficients])
print(recovered)


Running this with unix time produces the following:

encoding...
corrupting...
decoding...
When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow. When it healed, and Jem's fears of never being able to play football were assuaged, he was seldom self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. He couldn't have cared less, so long as he could pass and punt.

real	82m9.813s
user	81m18.891s
sys	0m27.404s


So it finishes in “only” an hour or so.

In any case, the decoding algorithm is an interesting one. In future posts we’ll explore more efficient algorithms and faster implementations.

Until then!

# Finding the majority element of a stream

Problem: Given a massive data stream of $n$ values in $\{ 1, 2, \dots, m \}$ and the guarantee that one value occurs more than $n/2$ times in the stream, determine exactly which value does so.

Solution: (in Python)

def majority(stream):
held = next(stream)
counter = 1

for item in stream:
if item == held:
counter += 1
elif counter == 0:
held = item
counter = 1
else:
counter -= 1

return held


Discussion: Let’s prove correctness. Say that $s$ is the unknown value that occurs more than $n/2$ times. The idea of the algorithm is that if you could pair up elements of your stream so that distinct values are paired up, and then you “kill” these pairs, then $s$ will always survive. The way this algorithm pairs up the values is by holding onto the most recent value that has no pair (implicitly, by keeping a count how many copies of that value you saw). Then when you come across a new element, you decrement the counter and implicitly account for one new pair.

Let’s analyze the complexity of the algorithm. Clearly the algorithm only uses a single pass through the data. Next, if the stream has size $n$, then this algorithm uses $O(\log(n) + \log(m))$ space. Indeed, if the stream entirely consists of a single value (say, a stream of all 1’s) then the counter will be $n$ at the end, which takes $\log(n)$ bits to store. On the other hand, if there are $m$ possible values then storing the largest requires $\log(m)$ bits.

Finally, the guarantee that one value occurs more than $n/2$ times is necessary. If it is not the case the algorithm could output anything (including the most infrequent element!). And moreover, if we don’t have this guarantee then every algorithm that solves the problem must use at least $\Omega(n)$ space in the worst case. In particular, say that $m=n$, and the first $n/2$ items are all distinct and the last $n/2$ items are all the same one, the majority value $s$. If you do not know $s$ in advance, then you must keep at least one bit of information to know which symbols occurred in the first half of the stream because any of them could be $s$. So the guarantee allows us to bypass that barrier.

This algorithm can be generalized to detect $k$ items with frequency above some threshold $n/(k+1)$ using space $O(k \log n)$. The idea is to keep $k$ counters instead of one, adding new elements when any counter is zero. When you see an element not being tracked by your $k$ counters (which are all positive), you decrement all the counters by 1. This is like a $k$-to-one matching rather than a pairing.

# Making Hybrid Images

The Mona Lisa

Leonardo da Vinci’s Mona Lisa is one of the most famous paintings of all time. And there has always been a discussion around her enigmatic smile. He used a trademark Renaissance technique called sfumato, which involves many thin layers of glaze mixed with subtle pigments. The striking result is that when you look directly at Mona Lisa’s smile, it seems to disappear. But when you look at the background your peripherals see a smiling face.

One could spend decades studying the works of these masters from various perspectives, but if we want to hone in on the disappearing nature of that smile, mathematics can provide valuable insights. Indeed, though he may not have known the relationship between his work and da Vinci’s, hundreds of years later Salvador Dali did the artist’s equivalent of mathematically isolating the problem with his painting, “Gala Contemplating the Mediterranean Sea.”

Gala Contemplating the Mediterranean Sea (Salvador Dali, 1976)

Here you see a woman in the foreground, but step back quite far from the picture and there is a (more or less) clear image of Abraham Lincoln. Here the question of gaze is the blaring focus of the work. Now of course Dali and da Vinci weren’t scribbling down equations and computing integrals; their artistic expression was much less well-defined. But we the artistically challenged have tools of our own: mathematics, science, and programming.

In 2006 Aude Oliva, Antonio Torralba, and Philippe. G. Schyns used those tools to merge the distance of Dali and the faded smiles of da Vinci into one cohesive idea. In their 2006 paper they presented the notion of a “hybrid image,” presented below.

The Mona Lisas of Science

If you look closely, you’ll see three women, each of which looks the teensiest bit strange, like they might be trying to suppress a smile, but none of them are smiling. Blur your eyes or step back a few meters, and they clearly look happy. The effect is quite dramatic. At the risk of being overly dramatic, these three women are literally modern day versions of Mona Lisa, the “Mona Lisas of Science,” if you will.

Another, perhaps more famous version of their technique, since it was more widely publicized, is their “Marilyn Einstein,” which up close is Albert Einstein and from far away is Marilyn Monroe.

Marilyn Einstein

This one gets to the heart of the question of what the eye sees at close range versus long range. And it turns out that you can address this question (and create brilliant works of art like the ones above) with some basic Fourier analysis.

## Intuitive Fourier analysis (and references)

The basic idea of Fourier analysis is the idea that smooth functions are hard to understand, and realization of how great it would be if we could decompose them into simpler pieces. Decomposing complex things into simpler parts is one of the main tools in all of mathematics, and Fourier analysis is one of the clearest examples of its application.

In particular, the things we care about are functions $f(x)$ with specific properties I won’t detail here like “smoothness” and “finiteness.” And the building blocks are the complex exponential functions

$\displaystyle e^{2 \pi i kx}$

where $k$ can be any integer. If you have done some linear algebra (and ignore this if you haven’t), then I can summarize the idea succinctly by saying the complex exponentials form an orthonormal basis for the vector space of square-integrable functions.

Back in colloquial language, what the Fourier theorem says is that any function of the kind we care about can be broken down into (perhaps infinitely many) pieces of this form called Fourier coefficients (I’m abusing the word “coefficient” here). The way it’s breaking down is also pleasingly simple: it’s a linear combination. Informally that means you’re just adding up all the complex exponentials with specific weights for each one. Mathematically, the conversion from the function to its Fourier coefficients is called the Fourier transform, and the set of all Fourier coefficients together is called the Fourier spectrum. So if you want to learn about your function $f$, or more importantly modify it in some way, you can inspect and modify its spectrum instead. The reason this is useful is that Fourier coefficients have very natural interpretations in sound and images, as we’ll see for the latter.

We wrote $f(x)$ and the complex exponential as a function of one real variable, but you can do the same thing for two variables (or a hundred!). And, if you’re willing to do some abusing and ignore the complexness of complex numbers, then you can visualize “complex exponentials in two variables” as images of stripes whose orientation and thickness correspond to two parameters (i.e., the $k$ in the offset equation becomes two coefficients). The video below shows how such complex exponentials can be used to build up an image of striking detail. The left frame shows which complex exponential is currently being added, and the right frame shows the layers all put together. I think the result is quite beautiful.

This just goes to show how powerful da Vinci’s idea of fine layering is: it’s as powerful as possible because it can create any image!

Now for digital images like the one above, everything is finite. So rather than have an infinitely precise function and a corresponding infinite set of Fourier coefficients, you get a finite list of sampled values (pixels) and a corresponding grid of Fourier coefficients. But the important and beautiful theorem is, and I want to emphasize how groundbreakingly important this is:

If you give me an image (or any function!) I can compute the decomposition very efficiently.

And the same theorem lets you go the other way: if you give me the decomposition, I can compute the original function’s samples quite easily. The algorithm to do this is called the Fast Fourier transform, and if any piece of mathematics or computer science has a legitimate claim to changing the world, it’s the Fast Fourier transform. It’s hard to pinpoint specific applications, because the transform is so ubiquitous across science and engineering, but we definitely would not have cell phones, satellites, internet, or electronics anywhere near as small as we do without the Fourier transform and the ability to compute it quickly.

Constructing hybrid images is one particularly nice example of manipulating the Fourier spectrum of two images, and then combining them back into a single image. That’s what we’ll do now.

As a side note, by the nature of brevity, the discussion above is a big disservice to the mathematics involved. I summarized and abused in ways that mathematicians would object to. If you want to see a much better treatment of the material, this blog has a long series of posts developing Fourier transforms and their discrete analogues from scratch. See our four primers, which lead into the main content posts where we implement the Fast Fourier transform in Python and use it to apply digital watermarks to an image. Note that in those posts, as in this one, all of the materials and code used are posted on this blog’s Github page.

## High and low frequencies

For images, interpreting ranges of Fourier coefficients is easy to do. You can imagine the coefficients lying on a grid in the plane like so:

Each dot in this grid corresponds to how “intense” the Fourier coefficient is. That is, it’s the magnitude of the (complex) coefficient of the corresponding complex exponential. Now the points that are closer to the origin correspond informally to the broad, smooth changes in the image. These are called “low frequency” coefficients. And points that are further away correspond to sharp changes and edges, and are likewise called “high frequency” components. So the if you wanted to “hybridize” two images, you’d pick ones with complementary intensities in these regions. That’s why Einstein (with all his wiry hair and wrinkles) and Monroe (with smooth features) are such good candidates. That’s also why, when we layered the Fourier components one by one in the video from earlier, we see the fuzzy shapes emerge before the fine details.

Moreover, we can “extract” the high frequency Fourier components by simply removing the low frequency ones. It’s a bit more complicated than that, since you want the transition from “something” to “nothing” to be smooth in sone sense. A proper discussion of this would go into sampling and the Nyquist frequency, but that’s beyond the scope of this post. Rather, we’ll just define a family of “filtering functions” without motivation and observe that they work well.

Definition: The Gaussian filter function with variance $\sigma$ and center $(a, b)$ is the function

$\displaystyle g(x,y) = e^{-\frac{(x - a)^2 + (y - b)^2}{2 \sigma^2}}$

It looks like this

image credit Wikipedia

In particular, at zero the function is 1 and it gradually drops to zero as you get farther away. The parameter $\sigma$ controls the rate at which it vanishes, and in the picture above the center is set to $(0,0)$.

Now what we’ll do is take our image, compute its spectrum, and multiply coordinatewise with a certain Gaussian function. If we’re trying to get rid of high-frequency components (called a “low-pass filter” because it lets the low frequencies through), we can just multiply the Fourier coefficients directly by the filter values $g(x,y)$, and if we’re doing a “high-pass filter” we multiply by $1 - g(x,y)$.

Before we get to the code, here’s an example of a low-pass filter. First, take this image of Marilyn Monroe

Now compute its Fourier transform

Apply the low-pass filter

And reverse the Fourier transform to get an image

In fact, this is a common operation in programs like photoshop for blurring an image (it’s called a Gaussian blur for obvious reasons). Here’s the python code to do this. You can download it along with all of the other resources used in making this post on this blog’s Github page.

import numpy
from numpy.fft import fft2, ifft2, fftshift, ifftshift
from scipy import misc
from scipy import ndimage
import math

def makeGaussianFilter(numRows, numCols, sigma, highPass=True):
centerI = int(numRows/2) + 1 if numRows % 2 == 1 else int(numRows/2)
centerJ = int(numCols/2) + 1 if numCols % 2 == 1 else int(numCols/2)

def gaussian(i,j):
coefficient = math.exp(-1.0 * ((i - centerI)**2 + (j - centerJ)**2) / (2 * sigma**2))
return 1 - coefficient if highPass else coefficient

return numpy.array([[gaussian(i,j) for j in range(numCols)] for i in range(numRows)])

def filterDFT(imageMatrix, filterMatrix):
shiftedDFT = fftshift(fft2(imageMatrix))
filteredDFT = shiftedDFT * filterMatrix
return ifft2(ifftshift(filteredDFT))

def lowPass(imageMatrix, sigma):
n,m = imageMatrix.shape
return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=False))

def highPass(imageMatrix, sigma):
n,m = imageMatrix.shape
return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=True))

if __name__ == "__main__":
marilyn = ndimage.imread("marilyn.png", flatten=True)
lowPassedMarilyn = lowPass(marilyn, 20)
misc.imsave("low-passed-marilyn.png", numpy.real(lowPassedMarilyn))


The first function samples the values from a Gaussian function with the specified parameters, discretizing the function and storing the values in a matrix. Then the filterDFT function applies the filter by doing coordinatewise multiplication (note these are all numpy arrays). We can do the same thing with a high-pass filter, producing the edgy image below

And if we compute the average of these two images, we basically get back to the original.

So the only difference between this and a hybrid image is that you take the low-passed part of one image and the high-passed part of another. Then the art is in balancing the parameters so as to make the averaged image look right. Indeed, with the following picture of Einstein and the above shot of Monroe, we can get a pretty good recreation of the Oliva-Torralba-Schyns piece. I think with more tinkering it could be even better (I did barely any centering/aligning/resizing to the original images).

Albert Einstein, Marilyn Monroe, and their hybridization.

And here’s the code for it

def hybridImage(highFreqImg, lowFreqImg, sigmaHigh, sigmaLow):
highPassed = highPass(highFreqImg, sigmaHigh)
lowPassed = lowPass(lowFreqImg, sigmaLow)

return highPassed + lowPassed


Interestingly enough, doing it in reverse doesn’t give quite as pleasing results, but it still technically works. So there’s something particularly important that the high-passed image does have a lot of high-frequency components, and vice versa for the low pass.

You can see some of the other hybrid images Oliva et al constructed over at their web gallery.

## Next Steps

How can we take this idea further? There are a few avenues I can think of. The most obvious one would be to see how this extends to video. Could one come up with generic parameters so that when two videos are hybridized (frame by frame, using this technique) it is only easy to see one at close distance? Or else, could we apply a three-dimensional transform to a video and modify that in some principled way? I think one would not likely find anything astounding, but who knows?

Second would be to look at the many other transforms we have at our disposal. How does manipulating the spectra of these transforms affect the original image, and can you make images that are hybridized in senses other than this one?

And finally, can we bring this idea down in dimension to work with one-dimensional signals? In particular, can we hybridize music? It could usher in a new generation of mashup songs that sound different depending on whether you wear earmuffs :)

Until next time!

# Learning to Love Complex Numbers

This post is intended for people with a little bit of programming experience and no prior mathematical background.

So let’s talk about numbers.

Numbers are curious things. On one hand, they represent one of the most natural things known to humans, which is quantity. It’s so natural to humans that even newborn babies are in tune with the difference between quantities of objects between 1 and 3, in that they notice when quantity changes much more vividly than other features like color or shape.

But our familiarity with quantity doesn’t change the fact that numbers themselves (as an idea) are a human invention. And they’re not like most human inventions, the kinds where you have to tinker with gears or circuits to get a machine that makes your cappuccino. No, these are mathematical inventions. These inventions exist only in our minds.

Numbers didn’t always exist. A long time ago, back when the Greeks philosophers were doing their philosophizing, negative numbers didn’t exist! In fact, it wasn’t until 1200 AD that the number zero was first considered in Europe. Zero, along with negative numbers and fractions and square roots and all the rest, were invented primarily to help people solve more problems than they could with the numbers they had available. That is, numbers were invented primarily as a way for people to describe their ideas in a useful way. People simply  wondered “is there a number whose square gives you 2?” And after a while they just decided there was and called it $\sqrt{2}$ because they didn’t have a better name for it.

But with these new solutions came a host of new problems. You see, although I said mathematical inventions only exist in our minds, once they’re invented they gain a life of their own. You start to notice patterns in your mathematical objects and you have to figure out why they do the things they do. And numbers are a perfectly good example of this: once I notice that I can multiply a number by itself, I can ask how often these “perfect squares” occur. That is, what’s the pattern in the numbers $1^2, 2^2, 3^2, 4^2, \dots$? If you think about it for a while, you’ll find that square numbers have a very special relationship with odd numbers.

Other times, however, the things you invent turn out to make no sense at all, and you can prove they never existed in the first place! It’s an odd state of affairs, but we’re going to approach the subject of complex numbers from this mindset. We’re going to come up with a simple idea, the idea that negative numbers can be perfect squares, and explore the world of patterns it opens up. Along the way we’ll do a little bit of programming to help explore, give some simple proofs to solidify our intuition, and by the end we’ll see how these ideas can cause wonderful patterns like this one:

## The number i

Let’s bring the story back around to squares. One fact we all remember about numbers is that squaring a number gives you something non-negative. $7^2 = 49, (-2)^2 = 4, 0^2 = 0$, and so on. But it certainly doesn’t have to be this way. What if we got sick of that stupid fact and decided to invent a new number whose square was negative? Which negative, you ask? Well it doesn’t really matter, because I can always stretch it larger or smaller so that it’s square is -1.

Let’s see how: if you say that your made-up number $x$ makes $x^2 = -7$, then I can just use $\frac{x}{\sqrt{7}}$ to get a number whose square is -1. If you’re going to invent a number that’s supposed to interact with our usual numbers, then you have to be allowed to add, subtract, and multiply $x$ with regular old real numbers, and the usual properties would have to still work. So it would have to be true that $(x / \sqrt{7})^2 = x^2 / \sqrt{7}^2 = -7/7 = -1$.

So because it makes no difference (this is what mathematicians mean by, “without loss of generality”) we can assume that the number we’re inventing will have a square of negative one. Just to line up with history, let’s call the new number $i$. So there it is: $i$ exists and $i^2 = -1$. And now that we are “asserting” that $i$ plays nicely with real numbers, we get these natural rules for adding and subtracting and multiplying and dividing. For example

• $1 + i$ is a new number, which we’ll just call $1+i$. And if we added two of these together, $(1+ i) + (1+i)$, we can combine the real parts and the $i$ parts to get $2 + 2i$. Same goes for subtraction. In general a complex number looks like $a + bi$, because as we’ll see in the other points you can simplify every simple arithmetic expression down to just one “real number” part and one “real number times $i$” part.
• We can multiply $3 \cdot i$, and we’ll just call it $3i$, and we require that multiplication distributes across addition (that the FOIL rule works). So that, for example, $(2 - i)(1 + 3i) = (2 + 6i - i - 3i^2) = (2 + 3) + (6i - i) = (5 + 5i)$.
• Dividing is a significantly more annoying. Say we want to figure out what $1 / (1+i)$ is (in fact, it’s not even obvious that this should look like a regular number! But it does). The $1 / a$ notation just means we’re looking for a number which, when we multiply by the denominator $a$, we get back to 1. So we’re looking to find out when $(a + bi)(1 + i) = 1 + 0i$ where $a$ and $b$ are variables we’re trying to solve for. If we multiply it out we get $(a-b) + (a + b)i = 1 + 0i$, and since the real part and the $i$ part have to match up, we know that $a - b = 1$ and $a + b = 0$. If we solve these two equations, we find that $a = 1/2, b = -1/2$ works great. If we want to figure out something like $(2 + 3i) / (1 - i)$, we just find out what $1 / (1- i)$ is first, and then multiply the result by $(2+3i)$.

So that was tedious and extremely boring, and we imagine you didn’t even read it (that’s okay, it really is boring!). All we’re doing is establishing ground rules for the game, so if you come across some arithmetic that doesn’t make sense, you can refer back to this list to see what’s going on. And once again, for the purpose of this post, we’re asserting that all these laws hold. Maybe some laws follow from others, but as long as we don’t come up with any nasty self-contradictions we’ll be fine.

And now we turn to the real questions: is $i$ the only square root of -1? Does $i$ itself have a square root? If it didn’t, we’d be back to where we started, with some numbers (the non-$i$ numbers) having square roots while others don’t. And so we’d feel the need to make all the $i$ numbers happy by making up more numbers to be their square roots, and then worrying what if these new numbers don’t have square roots and…gah!

I’ll just let you in on the secret to save us from this crisis. It turns out that $i$ does have a square root in terms of other $i$ numbers, but in order to find it we’ll need to understand $i$ from a different angle, and that angle turns out to be geometry.

Geometry? How is geometry going to help me understand numbers!?

It’s a valid question and part of why complex numbers are so fascinating. And I don’t mean geometry like triangles and circles and parallel lines (though there will be much talk of angles), I mean transformations in the sense that we’ll be “stretching,” “squishing,” and “rotating” numbers. Maybe another time I can tell you why for me “geometry” means stretching and rotating; it’s a long but very fun story.

The clever insight is that you can represent complex numbers as geometric objects in the first place. To do it, you just think of $a + bi$ as a pair of numbers $(a,b)$, (the pair of real part and $i$ part), and then plot that point on a plane. For us, the $x$-axis will be the “real” axis, and the $y$-axis will be the $i$-axis. So the number $(3 - 4i)$ is plotted 3 units in the positive $x$ direction and 4 units in the negative $y$ direction. Like this:

The “j” instead of “i” is not a typo, but a disappointing fact about the programming language we used to make this image. We’ll talk more about why later.

We draw it as an arrow for a good reason. Stretching, squishing, rotating, and reflecting will all be applied to the arrow, keeping its tail fixed at the center of the axes. Sometimes the arrow is called a “vector,” but we won’t use that word because here it’s synonymous with “complex number.”

So let’s get started squishing stuff.

## Stretching, Squishing, Rotating

Before we continue I should clear up some names. We call a number that has an $i$ in it a complex number, and we call the part without the $i$ the real part (like 2 in $2-i$) and the part with $i$ the complex part.

Python is going to be a great asset for us in exploring complex numbers, so let’s jump right into it. It turns out that Python natively supports complex numbers, and I wrote a program for drawing complex numbers. I used it to make the plot above. The program depends on a library I hate called matplotlib, and so the point of the program is to shield you from as much pain as possible and focus on complex numbers. You can use the program by downloading it from this blog’s Github page, along with everything else I made in writing this post. All you need to know how to do is call a function, and I’ve done a bit of window dressing removal to simplify things (I really hate matplotlib).

Here’s the function header:

# plotComplexNumbers : [complex] -> None
# display a plot of the given list of complex numbers
def plotComplexNumbers(numbers):
...


Before we show some examples of how to use it, we have to understand how to use complex numbers in Python. It’s pretty simple, except that Python was written by people who hate math, and so they decided the complex number would be represented by $j$ instead of $i$ (people who hate math are sometimes called “engineers,” and they use $j$ out of spite. Not really, though).

So in Python it’s just like any other computation. For example:

>>> (1 + 1j)*(4 - 2j) == (6+2j)
True
>>> 1 / (1+1j)
(0.5-0.5j)

And so calling the plotting function with a given list of complex numbers is as simple as importing the module and calling the function

from plotcomplex import plot
plot.plotComplexNumbers([(-1+1j), (1+2j), (-1.5 - 0.5j), (.6 - 1.8j)])


Here’s the result

So let’s use plots like this one to explore what “multiplication by $i$” does to a complex number. It might not seem exciting at first, but I promise there’s a neat punchline.

Even without plotting it’s pretty easy to tell what multiplying by $i$ does to some numbers. It takes 1 to $i$, moves $i$ to $i^2 = -1$, it takes -1 to $-i$, and $-i$ to $-i \cdot i = 1$.

What’s the pattern in these? well if we plot all these numbers, they’re all at right angles in counter-clockwise order. So this might suggest that multiplication by $i$ does some kind of rotation. Is that always the case? Well lets try it with some other more complicated numbers. Click the plots below to enlarge.

Well, it looks close but it’s hard to tell. Some of the axes are squished and stretched, so it might be that our images don’t accurately represent the numbers (the real world can be such a pain). Well when visual techniques fail, we can attempt to prove it.

Clearly multiplying by $i$ does some kind of rotation, maybe with other stuff too, and it shouldn’t be so hard to see that multiplying by $i$ does the same thing no matter which number you use (okay, the skeptical readers will say that’s totally hard to see, but we’ll prove it super rigorously in a minute). So if we take any number and multiply it by $i$ once, then twice, then three times, then four, and if we only get back to where we started at four multiplications, then each rotation had to be a quarter turn.

Indeed,

$\displaystyle (a + bi) i^4 = (ai - b) i^3 = (-a - bi) i^2 = (-ai + b) i = a + bi$

This still isn’t all that convincing, and we want to be 100% sure we’re right. What we really need is a way to arithmetically compute the angle between two complex numbers in their plotted forms. What we’ll do is find a way to measure the angle of one complex number with the $x$-axis, and then by subtraction we can get angles between arbitrary points. For example, in the figure below $\theta = \theta_1 - \theta_2$.

One way to do this is with trigonometry: the geometric drawing of $a + bi$ is the hypotenuse of a right triangle with the $x$-axis.

And so if $r$ is the length of the arrow, then by the definition of sine and cosine, $\cos(\theta) = a/r, \sin(\theta) = b/r$. If we have $r, \theta$, and $r > 0$, we can solve for a unique $a$ and $b$, so instead of representing a complex number in terms of the pair of numbers $(a,b)$, we can represent it with the pair of numbers $(r, \theta)$. And the conversion between the two is just

$a + bi = r \cos(\theta) + (r \sin(\theta)) i$

The $(r, \theta)$ representation is called the polar representation, while the $(a,b)$ representation is called the rectangular representation or the Cartesian representation. Converting between polar and Cartesian coordinates fills the pages of many awful pre-calculus textbooks (despite the fact that complex numbers don’t exist in classical calculus). Luckily for us Python has built-in functions to convert between the two representations for us.

>>> import cmath
>>> cmath.polar(1 + 1j)
(1.4142135623730951, 0.7853981633974483)
>>> z = cmath.polar(1 + 1j)
>>> cmath.rect(z[0], z[1])
(1.0000000000000002+1j)


It’s a little bit inaccurate on the rounding, but it’s fine for our purposes.

So how do we compute the angle between two complex numbers? Just convert each to the polar form, and subtract the second coordinates. So if we get back to our true goal, to figure out what multiplication by $i$ does, we can just do everything in polar form. Here’s a program that computes the angle between two complex numbers.

def angleBetween(z, w):
zPolar, wPolar = cmath.polar(z), cmath.polar(w)
return wPolar[1] - zPolar[1]

print(angleBetween(1 + 1j, (1 + 1j) * 1j))
print(angleBetween(2 - 3j, (2 - 3j) * 1j))
print(angleBetween(-0.5 + 7j, (-0.5 + 7j) * 1j))


Running it gives

1.5707963267948966
1.5707963267948966
-4.71238898038469


Note that the decimal form of $\pi/2$ is 1.57079…, and that the negative angle is equivalent to $\pi/2$ if you add a full turn of $2\pi$ to it. So programmatically we can see that for every input we try multiplying by $i$ rotates 90 degrees.

But we still haven’t proved it works. So let’s do that now. To say what the angle is between $r \cos (\theta) + ri \sin (\theta)$ and $i \cdot [r \cos (\theta) + ri \sin(\theta)] = -r \sin (\theta) + ri \cos(\theta)$, we need to transform the second number into the usual polar form (where the $i$ is on the sine part and not the cosine part). But we know, or I’m telling you now, this nice fact about sine and cosine:

$\displaystyle \sin(\theta + \pi/2) = cos(\theta)$
$\displaystyle \cos(\theta + \pi / 2) = -\sin(\theta)$

This fact is maybe awkward to write out algebraically, but it’s just saying that if you shift the whole sine curve a little bit you get the cosine curve, and if you keep shifting it you get the opposite of the sine curve (and if you kept shifting it even more you’d eventually get back to the sine curve; they’re called periodic for this reason).

So immediately we can rewrite the second number as $r \cos(\theta + \pi/2) + i r \sin (\theta + \pi/2)$. The angle is the same as the original angle plus a right angle of $\pi/2$. Neat!

Applying this same idea to $(a + bi) \cdot (c + di)$, it’s not much harder to prove that multiplying two complex numbers in general multiplies their lengths and adds their angles. So if a complex number $z$ has its magnitude $r$ smaller than 1, multiplying by $z$ squishes and rotates whatever is being multiplied. And if the magnitude is greater than 1, it stretches and rotates. So we have a super simple geometric understanding of how arithmetic with complex numbers works. And as we’re about to see, all this stretching and rotating results in some really weird (and beautifully mysterious!) mathematics and programs.

But before we do that we still have one question to address, the question that started this whole geometric train of thought: does $i$ have a square root? Indeed, I’m just looking for a number such that, when I square its length and double its angle, I get $i = \cos(\pi/2) + i \sin(\pi/2)$. Indeed, the angle we want is $\pi/4$, and the length we want is $r = 1$, which means $\sqrt{i} = \cos(\pi/4) + i \sin(\pi/4)$. Sweet! There is another root if you play with the signs, see if you can figure it out.

In fact it’s a very deeper and more beautiful theorem (“theorem” means “really important fact”) called the fundamental theorem of algebra. And essentially it says that the complex numbers are complete. That is, we can always find square roots, cube roots, or anything roots of numbers involving $i$. It actually says a lot more, but it’s easier to appreciate the rest of it after you do more math than we’re going to do in this post.

On to pretty patterns!

## The Fractal

So here’s a little experiment. Since every point in the plane is the end of some arrow representing a complex number, we can imagine transforming the entire complex plane by transforming each number by the same rule. The most interesting simple rule we can think of: squaring! So though it might strain your capacity for imagination, try to visualize the idea like this. Squaring a complex number is the same as squaring it’s length and doubling its angle. So imagine: any numbers whose arrows are longer than 1 will grow much bigger, arrows shorter than 1 will shrink, and arrows of length exactly one will stay the same length (arrows close to length 1 will grow/shrink much more slowly than those far away from 1). And complex numbers with small positive angles will increase their angle, but only a bit, while larger angles will grow faster.

Here’s an animation made by Douglas Arnold showing what happens to the set of complex numbers $a + bi$ with $0 \leq a, b \leq 1$ or $-1 < a,b < 0$. Again, imagine every point is the end of a different arrow for the corresponding complex number. The animation is for a single squaring, and the points move along the arc they would travel if one rotated/stretched them smoothly.

So that’s pretty, but this is by all accounts a well-behaved transformation. It’s “predictable,” because for example we can always tell which complex numbers will get bigger and bigger (in length) and which will get smaller.

What if, just for the sake of tinkering, we changed the transformation a little bit? That is, instead of sending $z = a+bi$ to $z^2$ (I’ll often write this $z \mapsto z^2$), what if we sent

$\displaystyle z \mapsto z^2 + 1$

Now it’s not so obvious: which numbers will grow and which will shrink? Notice that it’s odd because adding 1 only changes the real part of the number. So a number whose length is greater than 1 can become small under this transformation. For example, $i$ is sent to $0$, so something slightly larger would also be close to zero. Indeed, $5i/4 \mapsto -9/16$.

So here’s an interesting question: are there any complex numbers that will stay small even if I keep transforming like this forever? Specifically, if I call $f(z) = z^2$, and I call $f^2(z) = f(f(z))$, and likewise call $f^k(z)$ for $k$ repeated transformations of $z$, is there a number $z$ so that for every $k$, the value $f^k(z) < 2$? “Obvious” choices like $z=0$ don’t work, and neither do random guesses like $z=i$ or $z=1$. So should we guess the answer is no?

Before we jump to conclusions let’s write a program to see what happens for more than our random guesses. The program is simple: we’ll define the “square plus one” function, and then repeatedly apply that function to a number for some long number of times (say, 250 times). If the length of the number stays under 2 after so many tries, we’ll call it “small forever,” and otherwise we’ll call it “not small forever.”

def squarePlusOne(z):
return z*z + 1

def isSmallForever(z, f):
k = 0

while abs(z) < 2: z = f(z) k += 1 if k > 250:
return True

return False


This isSmallForever function is generic: you can give it any function $f$ and it will repeatedly call $f$ on $z$ until the result grows bigger than 2 in length. Note that the abs function is a built-in Python function for computing the length of a complex number.

Then I wrote a classify function, which you can give a window and a small increment, and it will produce a grid of zeros and ones marking the results of isSmallForever. The details of the function are not that important. I also wrote a function that turns the grid into a picture. So here’s an example of how we’d use it:

from plotcomplex.plot import gridToImage

def classifySquarePlusOne(z):
return isSmallForever(z, squarePlusOne)

grid = classify(classifySquarePlusOne) # the other arguments are defaulted to [-2,2], [-2,2], 0.1
gridToImage(grid)


And here’s the result. Points colored black grow beyond 2, and white points stay small for the whole test.

Looks like they’ll always grow big.

So it looks like repeated squaring plus one will always make complex numbers grow big. That’s not too exciting, but we can always make it more exciting. What happens if we replace the 1 in $z^2 + 1$ with a different complex number? For example, if we do $z^2 - 1$ then will things always grow big?

You can randomly guess and see that 0 will never grow big, because $0^2 - 1 = -1$ and $(-1)^2 - 1 = 0$. It will just oscillate forever. So with -1 some numbers will grow and some will not! Let’s use the same routine above to see which:

def classifySquareMinusOne(z):
return isSmallForever(z, squareMinusOne)

grid = classify(classifySquareMinusOne)
gridToImage(grid)


And the result:

Now that’s a more interesting picture! Let’s ramp up the resolution

grid = classify(classifySquareMinusOne, step=0.001)
gridToImage(grid)


Gorgeous. If you try this at home you’ll notice, however, that this took a hell of a long time to run. Speeding up our programs is very possible, but it’s a long story for another time. For now we can just be patient.

Indeed, this image has a ton of interesting details! It looks almost circular in the middle, but if we zoom in we can see that it’s more like a rippling wave

It’s pretty incredible, and a huge question is jumping out at me: what the heck is causing this pattern to occur? What secret does -1 know that +1 doesn’t that makes the resulting pattern so intricate?

But an even bigger question is this. We just discovered that some values of $c$ make $z \mapsto z^2 + c$ result in interesting patterns, and some do not! So the question is which ones make interesting patterns? Even if we just, say, fix the starting point to zero: what is the pattern in the complex numbers that would tell me when this transformation makes zero blow up, and when it keeps zero small?

Sounds like a job for another program. This time we’ll use a nice little Python feature called a closure, which we define a function that saves the information that exists when it’s created for later. It will let us write a function that takes in $c$ and produces a function that transforms according to $z \mapsto z^2+c$.

def squarePlusC(c):
def f(z):
return z*z + c

return f


And we can use the very same classification/graphing function from before to do this.

def classifySquarePlusC(c):
return isSmallForever(0, squarePlusC(c))

grid = classify(classifySquarePlusC, xRange=(-2, 1), yRange=(-1, 1), step=0.005)
gridToImage(grid)


And the result:

Stunning. This wonderful pattern, which is still largely not understood today, is known as the Mandelbrot set. That is, the white points are the points in the Mandlebrot set, and the black points are not in it. The detail on the border of this thing is infinitely intricate. For example, we can change the window in our little program to zoom in on a particular region.

And if you keep zooming in you keep getting more and more detail. This was true of the specific case of $z^2 - 1$, but somehow the patterns in the Mandelbrot set are much more varied and interesting. And if you keep going down eventually you’ll see patterns that look like the original Mandelbrot set. We can already kind of see that happening above. The name for this idea is a fractal, and the $z^2 - 1$ image has it too. Fractals are a fascinating and mysterious subject studied in a field called discrete dynamical systems. Many people dedicate their entire lives to studying these things, and it’s for good reason. There’s a lot to learn and even more that’s unknown!

So this is the end of our journey for now. I’ve posted all of the code we used in the making of this post so you can continue to play, but here are some interesting ideas.

• The Mandelbrot set (and most fractals) are usually colored. The way they’re colored is as follows. Rather than just say true or false when zero blows up beyond 2 in length, you return the number of iterations $k$ that happened. Then you pick a color based on how big $k$ is. There’s a link below that lets you play with this. In fact, adding colors shows that there is even more intricate detail happening outside the Mandelbrot set that’s too faint to see in our pictures above. Such as this.
• Some very simple questions about fractals are very hard to answer. For example, is the Mandelbrot set connected? That is, is it possible to “walk” from every point in the Mandelbrot set to every other point without leaving the set? Despite the scattering of points in the zoomed in picture above that suggest the answer is no, the answer is actually yes! This is a really difficult thing to prove, however.
• The patterns in many fractals are often used to generate realistic looking landscapes and generate pseudo randomness. So fractals are not just mathematical curiosities.
• You should definitely be experimenting with this stuff! What happens if you change the length threshold from 2 to some bigger number? What about a smaller number? What if you do powers different than $2$? There’s so much to explore!
• The big picture thing to take away from this is that it’s not the numbers themselves that are particularly interesting, it’s the transformations of the numbers that generate these patterns! The interesting questions are what kinds of things are the same under these transformations, and what things are different. This is a very general idea in mathematics, and the more math you do the more you’ll find yourself wondering about useful and bizarre transformations.

For the chance to keep playing with the Mandelbrot set, check out this Mandelbrot grapher that works in your browser. It lets you drag rectangles to zoom further in on regions of interest. It’s really fun.

Until next time!