Making Hybrid Images

monalisa

The Mona Lisa

Leonardo da Vinci’s Mona Lisa is one of the most famous paintings of all time, and there has always been discussion around her enigmatic smile. Da Vinci used a trademark Renaissance technique called sfumato, which involves many thin layers of glaze mixed with subtle pigments. The striking result is that when you look directly at Mona Lisa’s smile, it seems to disappear. But when you look at the background, your peripheral vision sees a smiling face.

One could spend decades studying the works of these masters from various perspectives, but if we want to home in on the disappearing nature of that smile, mathematics can provide valuable insights. Indeed, though he may not have known the relationship between his work and da Vinci’s, hundreds of years later Salvador Dali did the artist’s equivalent of mathematically isolating the problem with his painting, “Gala Contemplating the Mediterranean Sea.”

gala-dali

Gala Contemplating the Mediterranean Sea (Salvador Dali, 1976)

Here you see a woman in the foreground, but step back quite far from the picture and there is a (more or less) clear image of Abraham Lincoln. Here the question of gaze is the blaring focus of the work. Now of course Dali and da Vinci weren’t scribbling down equations and computing integrals; their artistic expression was much less well-defined. But we the artistically challenged have tools of our own: mathematics, science, and programming.

In 2006 Aude Oliva, Antonio Torralba, and Philippe G. Schyns used those tools to merge the distance of Dali and the faded smiles of da Vinci into one cohesive idea. In their paper they introduced the notion of a “hybrid image,” shown below.

monalisas

The Mona Lisas of Science

If you look closely, you’ll see three women, each of whom looks the teensiest bit strange, like they might be trying to suppress a smile, but none of them are smiling. Blur your eyes or step back a few meters, and they clearly look happy. The effect is striking. At the risk of being overly dramatic, these three women are modern day versions of Mona Lisa, the “Mona Lisas of Science,” if you will.

Another, perhaps more famous version of their technique, since it was more widely publicized, is their “Marilyn Einstein,” which up close is Albert Einstein and from far away is Marilyn Monroe.

marilyn-einstein

Marilyn Einstein

This one gets to the heart of the question of what the eye sees at close range versus long range. And it turns out that you can address this question (and create brilliant works of art like the ones above) with some basic Fourier analysis.

Intuitive Fourier analysis (and references)

The basic idea of Fourier analysis is the realization that complicated functions are hard to understand directly, and that it would be great if we could decompose them into simpler pieces. Decomposing complex things into simpler parts is one of the main tools in all of mathematics, and Fourier analysis is one of the clearest examples of its application.

In particular, the things we care about are functions $ f(x)$ with specific properties I won’t detail here like “smoothness” and “finiteness.” And the building blocks are the complex exponential functions

$ \displaystyle e^{2 \pi i kx}$

where $ k$ can be any integer. If you have done some linear algebra (and ignore this if you haven’t), then I can summarize the idea succinctly by saying the complex exponentials form an orthonormal basis for the vector space of square-integrable functions.

Back in colloquial language, what the Fourier theorem says is that any function of the kind we care about can be broken down into (perhaps infinitely many) pieces of this form called Fourier coefficients (I’m abusing the word “coefficient” here). The way it breaks down is also pleasingly simple: it’s a linear combination. Informally that means you’re just adding up all the complex exponentials with specific weights for each one. Mathematically, the conversion from the function to its Fourier coefficients is called the Fourier transform, and the set of all Fourier coefficients together is called the Fourier spectrum. So if you want to learn about your function $ f$, or more importantly modify it in some way, you can inspect and modify its spectrum instead. The reason this is useful is that Fourier coefficients have very natural interpretations in sound and images, as we’ll see for the latter.

We wrote $ f(x)$ and the complex exponential as a function of one real variable, but you can do the same thing for two variables (or a hundred!). And, if you’re willing to do some abusing and ignore the complexness of complex numbers, then you can visualize “complex exponentials in two variables” as images of stripes whose orientation and thickness correspond to two parameters (i.e., the $ k$ in the displayed equation above becomes a pair of integers). The video below shows how such complex exponentials can be used to build up an image of striking detail. The left frame shows which complex exponential is currently being added, and the right frame shows the layers all put together. I think the result is quite beautiful.
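
To make the stripe interpretation concrete, here’s a small numpy sketch (our own illustration, not code from the original post) that renders the real part of one such “complex exponential in two variables”; the pair of integers plays the role of $ k$.

import numpy as np

def stripes(k1, k2, size=256):
   # sample the real part of e^(2 pi i (k1*x + k2*y)) on a size-by-size grid;
   # the result is a striped pattern whose orientation and spacing depend on (k1, k2)
   ys, xs = np.mgrid[0:size, 0:size] / float(size)
   wave = np.exp(2j * np.pi * (k1 * xs + k2 * ys))
   return np.real(wave)  # values in [-1, 1]; rescale to 0-255 to view as a grayscale image

pattern = stripes(3, 8)  # three oscillations across, eight down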

This just goes to show how powerful da Vinci’s idea of fine layering is: it’s as powerful as possible because it can create any image! 

Now for digital images like the one above, everything is finite. So rather than have an infinitely precise function and a corresponding infinite set of Fourier coefficients, you get a finite list of sampled values (pixels) and a corresponding grid of Fourier coefficients. But the important and beautiful theorem is, and I want to emphasize how groundbreakingly important this is:

If you give me an image (or any function!) I can compute the decomposition very efficiently.

And the same theorem lets you go the other way: if you give me the decomposition, I can compute the original function’s samples quite easily. The algorithm to do this is called the Fast Fourier transform, and if any piece of mathematics or computer science has a legitimate claim to changing the world, it’s the Fast Fourier transform. It’s hard to pinpoint specific applications, because the transform is so ubiquitous across science and engineering, but we definitely would not have cell phones, satellites, internet, or electronics anywhere near as small as we do without the Fourier transform and the ability to compute it quickly.
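As a tiny sanity check of this back-and-forth, here’s a sketch using numpy (a random array stands in for a grayscale image):

import numpy as np
from numpy.fft import fft2, ifft2

image = np.random.rand(256, 256)      # a stand-in for a grayscale image
spectrum = fft2(image)                # the grid of Fourier coefficients
recovered = np.real(ifft2(spectrum))  # back to the pixel samples

print(np.allclose(image, recovered))  # True, up to floating point error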

Constructing hybrid images is one particularly nice example of manipulating the Fourier spectrum of two images, and then combining them back into a single image. That’s what we’ll do now.

As a side note, by the nature of brevity, the discussion above is a big disservice to the mathematics involved. I summarized and abused in ways that mathematicians would object to. If you want to see a much better treatment of the material, this blog has a long series of posts developing Fourier transforms and their discrete analogues from scratch. See our four primers, which lead into the main content posts where we implement the Fast Fourier transform in Python and use it to apply digital watermarks to an image. Note that in those posts, as in this one, all of the materials and code used are posted on this blog’s Github page.

High and low frequencies

For images, interpreting ranges of Fourier coefficients is easy to do. You can imagine the coefficients lying on a grid in the plane like so:

sherlock-spectrum

Each dot in this grid corresponds to how “intense” the Fourier coefficient is. That is, it’s the magnitude of the (complex) coefficient of the corresponding complex exponential. Now the points that are closer to the origin correspond informally to the broad, smooth changes in the image. These are called “low frequency” coefficients. And points that are further away correspond to sharp changes and edges, and are likewise called “high frequency” components. So if you wanted to “hybridize” two images, you’d pick ones with complementary intensities in these regions. That’s why Einstein (with all his wiry hair and wrinkles) and Monroe (with smooth features) are such good candidates. That’s also why, when we layered the Fourier components one by one in the video from earlier, we saw the fuzzy shapes emerge before the fine details.
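
If you want to produce spectrum pictures like the one above yourself, a common recipe (a sketch, not code from the original post) is to shift the zero-frequency coefficient to the center of the grid and view the logarithm of the magnitudes:

import numpy as np
from numpy.fft import fft2, fftshift

def logMagnitudeSpectrum(imageMatrix):
   # take log(1 + |coefficient|) so the huge low-frequency magnitudes
   # don't drown out everything else when the grid is viewed as an image
   return np.log(1 + np.abs(fftshift(fft2(imageMatrix))))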

Moreover, we can “extract” the high frequency Fourier components by simply removing the low frequency ones. It’s a bit more complicated than that, since you want the transition from “something” to “nothing” to be smooth in some sense. A proper discussion of this would go into sampling and the Nyquist frequency, but that’s beyond the scope of this post. Rather, we’ll just define a family of “filtering functions” without motivation and observe that they work well.

Definition: The Gaussian filter function with variance $ \sigma$ and center $ (a, b)$ is the function

$ \displaystyle g(x,y) = e^{-\frac{(x - a)^2 + (y - b)^2}{2 \sigma^2}}$

It looks like this

image credit Wikipedia

In particular, at the center $ (a,b)$ the function takes the value 1, and it gradually drops to zero as you get farther away. The parameter $ \sigma$ controls the rate at which it vanishes, and in the picture above the center is set to $ (0,0)$.

Now what we’ll do is take our image, compute its spectrum, and multiply coordinatewise with a certain Gaussian function. If we’re trying to get rid of high-frequency components (called a “low-pass filter” because it lets the low frequencies through), we can just multiply the Fourier coefficients directly by the filter values $ g(x,y)$, and if we’re doing a “high-pass filter” we multiply by $ 1 – g(x,y)$.

Before we get to the code, here’s an example of a low-pass filter. First, take this image of Marilyn Monroe

marilyn

Now compute its Fourier transform

dft

Apply the low-pass filter

filtered-dft

And reverse the Fourier transform to get an image

low-passed-marilyn

In fact, this is a common operation in programs like Photoshop for blurring an image (it’s called a Gaussian blur for obvious reasons). Here’s the Python code to do this. You can download it along with all of the other resources used in making this post on this blog’s Github page.

import numpy
from numpy.fft import fft2, ifft2, fftshift, ifftshift
from scipy import misc
from scipy import ndimage
import math

def makeGaussianFilter(numRows, numCols, sigma, highPass=True):
   centerI = int(numRows/2) + 1 if numRows % 2 == 1 else int(numRows/2)
   centerJ = int(numCols/2) + 1 if numCols % 2 == 1 else int(numCols/2)

   def gaussian(i,j):
      coefficient = math.exp(-1.0 * ((i - centerI)**2 + (j - centerJ)**2) / (2 * sigma**2))
      return 1 - coefficient if highPass else coefficient

   return numpy.array([[gaussian(i,j) for j in range(numCols)] for i in range(numRows)])

def filterDFT(imageMatrix, filterMatrix):
   shiftedDFT = fftshift(fft2(imageMatrix))
   filteredDFT = shiftedDFT * filterMatrix
   return ifft2(ifftshift(filteredDFT))

def lowPass(imageMatrix, sigma):
   n,m = imageMatrix.shape
   return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=False))

def highPass(imageMatrix, sigma):
   n,m = imageMatrix.shape
   return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=True))

if __name__ == "__main__":
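   # note: ndimage.imread and misc.imsave were removed in later SciPy releases;
   # on a modern setup, substitute an image I/O library such as imageio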
   marilyn = ndimage.imread("marilyn.png", flatten=True)
   lowPassedMarilyn = lowPass(marilyn, 20)
   misc.imsave("low-passed-marilyn.png", numpy.real(lowPassedMarilyn))

The first function samples the values from a Gaussian function with the specified parameters, discretizing the function and storing the values in a matrix. Then the filterDFT function applies the filter by doing coordinatewise multiplication (note these are all numpy arrays). We can do the same thing with a high-pass filter, producing the edgy image below

high-passed-marilyn

And if we add these two images together, we basically get back to the original.

sum-of-marilyns

So the only difference between this and a hybrid image is that you take the low-passed part of one image and the high-passed part of another. Then the art is in balancing the parameters so as to make the combined image look right. Indeed, with the following picture of Einstein and the above shot of Monroe, we can get a pretty good recreation of the Oliva-Torralba-Schyns piece. I think with more tinkering it could be even better (I did barely any centering/aligning/resizing to the original images).

Albert Einstein, Marilyn Monroe, and their hybridization.

And here’s the code for it

def hybridImage(highFreqImg, lowFreqImg, sigmaHigh, sigmaLow):
   highPassed = highPass(highFreqImg, sigmaHigh)
   lowPassed = lowPass(lowFreqImg, sigmaLow)

   return highPassed + lowPassed
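
And a hypothetical driver for it. The file names and sigma values below are placeholders to tinker with; it assumes the two images are grayscale and the same size.

einstein = ndimage.imread("einstein.png", flatten=True)
marilyn = ndimage.imread("marilyn.png", flatten=True)

hybrid = hybridImage(einstein, marilyn, sigmaHigh=25, sigmaLow=10)
misc.imsave("marilyn-einstein.png", numpy.real(hybrid))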

Interestingly enough, doing it in reverse doesn’t give quite as pleasing results, but it still technically works. So it seems to matter that the high-passed image actually has a lot of high-frequency content, and vice versa for the low-passed one.

backwards

You can see some of the other hybrid images Oliva et al constructed over at their web gallery.

Next Steps

How can we take this idea further? There are a few avenues I can think of. The most obvious one would be to see how this extends to video. Could one come up with generic parameters so that when two videos are hybridized (frame by frame, using this technique) it is only easy to see one at close distance? Or else, could we apply a three-dimensional transform to a video and modify that in some principled way? I think one would not likely find anything astounding, but who knows?

Second would be to look at the many other transforms we have at our disposal. How does manipulating the spectra of these transforms affect the original image, and can you make images that are hybridized in senses other than this one?

And finally, can we bring this idea down in dimension to work with one-dimensional signals? In particular, can we hybridize music? It could usher in a new generation of mashup songs that sound different depending on whether you wear earmuffs 🙂

Until next time!

Bezier Curves and Picasso

Pablo Picasso

Pablo Picasso in front of The Kitchen, photo by Herbert List.

Simplicity and the Artist

Some of my favorite of Pablo Picasso’s works are his line drawings. He did a number of them about animals: an owl, a camel, a butterfly, etc. This piece called “Dog” is on my wall:

Dachshund-Picasso-Sketch

(Jump to interactive demo where we recreate “Dog” using the math in this post)

These paintings are extremely simple but somehow strike the viewer as deeply profound. They give the impression of being quite simple to design and draw. A single stroke of the hand and a scribbled signature, but what a masterpiece! It simultaneously feels like a hasty afterthought and a carefully tuned overture to a symphony of elegance. In fact, we know that Picasso’s process was deep. For example, in 1945-1946, Picasso made a series of eleven drawings (lithographs, actually) showing the progression of his rendition of a bull. The first few are more or less lifelike, but as the series progresses we see the bull boiled down to its essence, the final painting requiring a mere ten lines. Along the way we see drawings of a bull that resemble some of Picasso’s other works (number 9 reminding me of the sculpture at Daley Center Plaza in Chicago). Read more about the series of lithographs here.

the bull

Picasso’s, “The Bull.” Photo taken by Jeremy Kun at the Art Institute of Chicago in 2013. Click to enlarge.

Now I don’t pretend to be a qualified artist (I couldn’t draw a bull to save my life), but I can recognize the mathematical aspects of his paintings, and I can write a damn fine program. There is one obvious way to consider Picasso-style line drawings as a mathematical object, and it is essentially the Bezier curve. Let’s study the theory behind Bezier curves, and then write a program to draw them. The mathematics involved requires no background knowledge beyond basic algebra with polynomials, and we’ll do our best to keep the discussion low-tech. Then we’ll explore a very simple algorithm for drawing Bezier curves, implement it in Javascript, and recreate one of Picasso’s line drawings as a sequence of Bezier curves.

The Bezier Curve and Parameterizations

When asked to conjure a “curve” most people (perhaps plagued by their elementary mathematics education) will either convulse in fear or draw part of the graph of a polynomial. While these are fine and dandy curves, they only represent a small fraction of the world of curves. We are particularly interested in curves which are not part of the graphs of any functions.

Three French curves.

For instance, a French curve is a physical template used in (manual) sketching to aid the hand in drawing smooth curves. Tracing the edges of any part of these curves will usually give you something that is not the graph of a function. It’s obvious that we need to generalize our idea of what a curve is a bit. The problem is that many fields of mathematics define a curve to mean different things.  The curves we’ll be looking at, called Bezier curves, are a special case of  single-parameter polynomial plane curves. This sounds like a mouthful, but what it means is that the entire curve can be evaluated with two polynomials: one for the $ x$ values and one for the $ y$ values. Both polynomials share the same variable, which we’ll call $ t$, and $ t$ is evaluated at real numbers.

An example should make this clear. Let’s pick two simple polynomials in $ t$, say $ x(t) = t^2$ and $ y(t) = t^3$. If we want to find points on this curve, we can just choose values of $ t$ and plug them into both equations. For instance, plugging in $ t = 2$ gives the point $ (4, 8)$ on our curve. Plotting all such values gives a curve that is definitely not the graph of a function:

example-curve

But it’s clear that we can write any single-variable function $ f(x)$ in this parametric form: just choose $ x(t) = t$ and $ y(t) = f(t)$. So these are really more general objects than regular old functions (although we’ll only be working with polynomials in this post).

Quickly recapping, a single-parameter polynomial plane curve is defined as a pair of polynomials $ x(t), y(t)$ in the same variable $ t$. Sometimes, if we want to express the whole gadget in one piece, we can take the coefficients of common powers of $ t$ and write them as vectors in the $ x$ and $ y$ parts. Using the $ x(t) = t^2, y(t) = t^3$ example above, we can rewrite it as

$ \mathbf{f}(t) = (0,1) t^3 + (1,0) t^2$

Here the coefficients are points (which are the same as vectors) in the plane, and we represent the function $ \mathbf{f}$ in boldface to emphasize that the output is a point. The linear-algebraist might recognize that pairs of polynomials form a vector space, and further combine them as $ (0, t^3) + (t^2, 0) = (t^2, t^3)$. But for us, thinking of points as coefficients of a single polynomial is actually better.

We will also restrict our attention to single-parameter polynomial plane curves for which the variable $ t$ is allowed to range from zero to one. This might seem like an awkward restriction, but in fact every finite single-parameter polynomial plane curve can be written this way (we won’t bother too much with the details of how this is done). For the purpose of brevity, we will henceforth call a “single-parameter polynomial plane curve where $ t$ ranges from zero to one” simply a “curve.”

Now there are some very nice things we can do with curves. For instance, given any two points in the plane $ P = (p_1, p_2), Q = (q_1, q_2)$ we can describe the straight line between them as a curve: $ \mathbf{L}(t) = (1-t)P + tQ$. Indeed, at $ t=0$ the value $ \mathbf{L}(t)$ is exactly $ P$, at $ t=1$ it’s exactly $ Q$, and the equation is a linear polynomial in $ t$. Moreover (without getting too much into the calculus details), the line $ \mathbf{L}$ travels at “unit speed” from $ P$ to $ Q$. In other words, we can think of $ \mathbf{L}$ as describing the motion of a particle from $ P$ to $ Q$ over time, and at time $ 1/4$ the particle is a quarter of the way there, at time $ 1/2$ it’s halfway, etc. (An example of a straight line which doesn’t have unit speed is, e.g. $ (1-t^2) P + t^2 Q$.)

More generally, let’s add a third point $ R$. We can describe a path which goes from $ P$ to $ R$, and is “guided” by $ Q$ in the middle. This idea of a “guiding” point is a bit abstract, but computationally no more difficult. Instead of travelling from one point to another at constant speed, we want to travel from one line to another at constant speed. That is, call the two curves describing lines from $ P \to Q$ and $ Q \to R$ $ \mathbf{L}_1, \mathbf{L_2}$, respectively. Then the curve “guided” by $ Q$ can be written as a curve

$ \displaystyle \mathbf{F}(t) = (1-t)\mathbf{L}_1(t) + t \mathbf{L}_2(t)$

Multiplying this all out gives the formula

$ \displaystyle \mathbf{F}(t) = (1-t)^2 P + 2t(1-t)Q + t^2 R$
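
For a concrete sanity check (with numbers of our own choosing), take $ P = (0,0)$, $ Q = (1,2)$, and $ R = (2,0)$. At $ t = 1/2$ the formula gives

$ \displaystyle \mathbf{F}(1/2) = \tfrac{1}{4}P + \tfrac{1}{2}Q + \tfrac{1}{4}R = (1, 1),$

the midpoint of the two moving points $ \mathbf{L}_1(1/2) = (1/2, 1)$ and $ \mathbf{L}_2(1/2) = (3/2, 1)$, sitting below the guiding point $ Q$.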

We can interpret this again in terms of a particle moving. At the beginning of our curve the value of $ t$ is small, and so we’re sticking quite close to the line $ \mathbf{L}_1$. As time goes on the point $ \mathbf{F}(t)$ moves along the line between the points $ \mathbf{L}_1(t)$ and $ \mathbf{L}_2(t)$, which are themselves moving. This traces out a curve which looks like this

Screen Shot 2013-05-07 at 9.38.42 PM

This screenshot was taken from a wonderful demo by data visualization consultant Jason Davies. It expresses the mathematical idea quite superbly, and one can drag the three points around to see how it changes the resulting curve. One should play with it for at least five minutes.

The entire idea of a Bezier curve is a generalization of this principle: given a list $ P_0, \dots, P_n$ of points in the plane, we want to describe a curve which travels from the first point to the last, and is “guided” in between by the remaining points. A Bezier curve is a realization of such a curve (a single-parameter polynomial plane curve) which is the inductive continuation of what we described above: we travel at unit speed from the Bezier curve defined by all but the last point in the list to the Bezier curve defined by all but the first point. The base case is the straight-line segment (or the single point, if you wish). Formally,

Definition: Given a list of points in the plane $ P_0, \dots, P_n$ we define the degree $ n$ Bezier curve recursively as

$ \begin{aligned} \mathbf{B}_{P_0}(t) &= P_0 \\ \mathbf{B}_{P_0 P_1 \dots P_n}(t) &= (1-t)\mathbf{B}_{P_0 P_1 \dots P_{n-1}}(t) + t \mathbf{B}_{P_1P_2 \dots P_n}(t) \end{aligned}$

We call $ P_0, \dots, P_n$ the control points of $ \mathbf{B}$.

While the concept of travelling at unit speed between two lower-order Bezier curves is the real heart of the matter (and allows us true computational insight), one can multiply all of this out (using the formula for binomial coefficients) and get an explicit formula. It is:

$ \displaystyle \mathbf{B}_{P_0 \dots P_n}(t) = \sum_{k=0}^n \binom{n}{k}(1-t)^{n-k}t^k P_k$

And for example, a cubic Bezier curve with control points $ P_0, P_1, P_2, P_3$ would have equation

$ \displaystyle (1-t)^3 P_0 + 3(1-t)^2t P_1 + 3(1-t)t^2 P_2 + t^3 P_3$
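
If you want to play with the explicit formula numerically before we get to the drawing algorithm, here is a quick sketch (in Python for brevity; the drawing code later in this post is Javascript):

from math import factorial

def choose(n, k):
   return factorial(n) // (factorial(k) * factorial(n - k))

def bezierPoint(controlPoints, t):
   # evaluate the explicit (Bernstein) form at a single value of t;
   # controlPoints is a list of (x, y) pairs
   n = len(controlPoints) - 1
   x = sum(choose(n, k) * (1 - t)**(n - k) * t**k * p[0] for k, p in enumerate(controlPoints))
   y = sum(choose(n, k) * (1 - t)**(n - k) * t**k * p[1] for k, p in enumerate(controlPoints))
   return (x, y)

print(bezierPoint([(1, 2), (5, 5), (4, 0), (9, 3)], 0.5))  # (4.625, 2.5)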

Higher degree Bezier curves can be quite complicated to picture geometrically. For instance, the following is a fifth-degree Bezier curve (with six control points).

BezierCurve

A degree five Bezier curve, credit Wikipedia.

The additional line segments drawn show the recursive nature of the curve. The simplest are the green points, which travel from control point to control point. Then the blue points travel on the line segments between green points, the pink travel along the line segments between blue, the orange between pink, and finally the red point travels along the line segment between the orange points.

Without the recursive structure of the problem (just seeing the curve) it would be a wonder how one could actually compute with these things. But as we’ll see, the algorithm for drawing a Bezier curve is very natural.

Bezier Curves as Data, and de Casteljau’s Algorithm

Let’s derive and implement the algorithm for painting a Bezier curve to a screen using only the ability to draw straight lines. For simplicity, we’ll restrict our attention to degree-three (cubic) Bezier curves. In practice, cubic curves strike a good balance between computational efficiency and expressiveness, and more complicated shapes are drawn by stringing together many cubic pieces. All of the code we present in this post will be in Javascript, and is available on this blog’s Github page.

So then a cubic Bezier curve is represented in a program by a list of four points. For example,

var curve = [[1,2], [5,5], [4,0], [9,3]];

Most graphics libraries (including the HTML5 canvas standard) provide a drawing primitive that can output Bezier curves given a list of four points. But suppose we aren’t given such a function. Suppose that we only have the ability to draw straight lines. How would one go about drawing an approximation to a Bezier curve? If such an algorithm exists (it does, and we’re about to see it) then we could make the approximation so fine that it is visually indistinguishable from a true Bezier curve.

The key property of Bezier curves that allows us to come up with such an algorithm is the following:

Any cubic Bezier curve $ \mathbf{B}$ can be split into two, end to end,
which together trace out the same curve as $ \mathbf{B}$.

Let’s see exactly how this is done. Let $ \mathbf{B}(t)$ be a cubic Bezier curve with control points $ P_0, P_1, P_2, P_3$, and let’s say we want to split it exactly in half. Plugging $ t = 1/2$ into the formula for the curve gives

$ \displaystyle \mathbf{B}(1/2) = \frac{1}{2^3}(P_0 + 3P_1 + 3P_2 + P_3)$

Moreover, our recursive definition gave us a way to evaluate the point in terms of smaller-degree curves. But when these are evaluated at 1/2 their formulae are similarly easy to write down. The picture looks like this:

subdivision

The green points are the degree one curves, the pink points are the degree two curves, and the blue point is the cubic curve. We notice that, since each of the curves is evaluated at $ t=1/2$, each of these points can be described as the midpoints of points we already know. So $ m_0 = (P_0 + P_1) / 2, q_0 = (m_0 + m_1)/2$, etc.

In fact, the splitting of the two curves we want is precisely given by these points. That is, the “left” half of the curve is given by the curve $ \mathbf{L}(t)$ with control points $ P_0, m_0, q_0, \mathbf{B}(1/2)$, while the “right” half $ \mathbf{R}(t)$ has control points $ \mathbf{B}(1/2), q_1, m_2, P_3$.

How can we be completely sure these are the same Bezier curves? Well, they’re just polynomials. We can compare them for equality by doing a bunch of messy algebra. But note, since $ \mathbf{L}(t)$ only travels halfway along $ \mathbf{B}(t)$, to check they are the same is to equate $ \mathbf{L}(t)$ with $ \mathbf{B}(t/2)$, since as $ t$ ranges from zero to one, $ t/2$ ranges from zero to one half. Likewise, we can compare $ \mathbf{B}((t+1)/2)$ with $ \mathbf{R}(t)$.

The algebra is very messy, but doable. As a test of this blog’s newest tools, here’s a screen cast of me doing the algebra involved in proving the two curves are identical.
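
If you’d rather let a computer do the symbol-pushing, here is a small sketch using sympy (in Python; the control points are treated as scalars, which is enough since the identity holds coordinatewise):

import sympy as sp

t, P0, P1, P2, P3 = sp.symbols('t P0 P1 P2 P3')

def cubicBezier(a, b, c, d, s):
   return (1 - s)**3*a + 3*(1 - s)**2*s*b + 3*(1 - s)*s**2*c + s**3*d

# the de Casteljau midpoints at t = 1/2
m0, m1, m2 = (P0 + P1)/2, (P1 + P2)/2, (P2 + P3)/2
q0, q1 = (m0 + m1)/2, (m1 + m2)/2
mid = (q0 + q1)/2   # this is B(1/2)

left = cubicBezier(P0, m0, q0, mid, t)
right = cubicBezier(mid, q1, m2, P3, t)

print(sp.simplify(left - cubicBezier(P0, P1, P2, P3, t/2)))         # 0
print(sp.simplify(right - cubicBezier(P0, P1, P2, P3, (t + 1)/2)))  # 0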


Now that that’s settled, we have a nice algorithm for splitting a cubic Bezier (or any Bezier) into two pieces. In Javascript,

function subdivide(curve) {
   var firstMidpoints = midpoints(curve);
   var secondMidpoints = midpoints(firstMidpoints);
   var thirdMidpoints = midpoints(secondMidpoints);

   return [[curve[0], firstMidpoints[0], secondMidpoints[0], thirdMidpoints[0]],
          [thirdMidpoints[0], secondMidpoints[1], firstMidpoints[2], curve[3]]];
}

Here “curve” is a list of four points, as described at the beginning of this section, and the output is a list of two curves with the correct control points. The “midpoints” function used is quite simple, and we include it here for completeness:

function midpoints(pointList) {
   var midpoint = function(p, q) {
      return [(p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0];
   };

   var midpointList = new Array(pointList.length - 1);
   for (var i = 0; i < midpointList.length; i++) {
      midpointList[i] = midpoint(pointList[i], pointList[i+1]);
   }

   return midpointList;
}

It just accepts as input a list of points and computes their sequential midpoints. So a list of $ n$ points is turned into a list of $ n-1$ points. As we saw, we need to call this function $ d$ times to compute the segmentation of a degree $ d$ Bezier curve.

As explained earlier, we can keep subdividing our curve over and over until each of the tiny pieces are basically lines. That is, our function to draw a Bezier curve from the beginning will be as follows:

function drawCurve(curve, context) {
   if (isFlat(curve)) {
      drawSegments(curve, context);
   } else {
      var pieces = subdivide(curve);
      drawCurve(pieces[0], context);
      drawCurve(pieces[1], context);
   }
}

In words, as long as the curve isn’t “flat,” we want to subdivide and draw each piece recursively. If it is flat, then we can simply draw the three line segments of the curve and be reasonably sure that it will be a good approximation. The context variable sitting there represents the canvas to be painted to; it must be passed through to the “drawSegments” function, which simply paints a straight line to the canvas.

Of course this raises the obvious question: how can we tell if a Bezier curve is flat? There are many ways to do so. One could compute the angles of deviation (from a straight line) at each interior control point and add them up. Or one could compute the area of the enclosed quadrilateral. However, computing angles and areas is usually not very nice: angles take a long time to compute and areas have stability issues, and the algorithms which are stable are not very simple. We want a measurement which requires only basic arithmetic and perhaps a few logical conditions to check.

It turns out there is such a measurement. It’s originally attributed to Roger Willcocks, but it’s quite simple to derive by hand.

Essentially, we want to measure the “flatness” of a cubic Bezier curve by computing the distance of the actual curve at time $ t$ from where the curve would be at time $ t$ if the curve were a straight line.

Formally, given $ \mathbf{B}(t)$ with control points $ P_0, P_1, P_2, P_3$ as usual, we can define the straight-line Bezier cubic as the colossal sum

$ \displaystyle \mathbf{S}(t) = (1-t)^3P_0 + 3(1-t)^2t \left ( \frac{2}{3}P_0 + \frac{1}{3}P_3 \right ) + 3(1-t)t^2 \left ( \frac{1}{3}P_0 + \frac{2}{3}P_3 \right ) + t^3 P_3 $

There’s nothing magical going on here. We’re simply giving the Bezier curve with control points $ P_0, \frac{2}{3}P_0 + \frac{1}{3}P_3, \frac{1}{3}P_0 + \frac{2}{3}P_3, P_3$. One should think about this as points which are a 0, 1/3, 2/3, and 1 fraction of the way from $ P_0$ to $ P_3$ on a straight line.

Then we define the function $ d(t) = \left \| \mathbf{B}(t) - \mathbf{S}(t) \right \|$ to be the distance between the two curves at the same time $ t$. The flatness value of $ \mathbf{B}$ is the maximum of $ d$ over all values of $ t$. If this flatness value is below a certain tolerance level, then we call the curve flat.

With a bit of algebra we can simplify this expression. First, the value of $ t$ for which the distance is maximized is the same as when its square is maximized, so we can omit the square root computation at the end and take that into account when choosing a flatness tolerance.

Now let’s actually write out the difference as a single polynomial. First, we can cancel the 3’s in $ \mathbf{S}(t)$ and write the polynomial as

$ \displaystyle \mathbf{S}(t) = (1-t)^3 P_0 + (1-t)^2t (2P_0 + P_3) + (1-t)t^2 (P_0 + 2P_3) + t^3 P_3$

and so $ \mathbf{B}(t) - \mathbf{S}(t)$ is (by collecting coefficients of the like terms $ (1-t)^it^j$)

$ \displaystyle (1-t)^2t (3 P_1 - 2P_0 - P_3) + (1-t)t^2 (3P_2 - P_0 - 2P_3) $

Factoring out the $ (1-t)t$ from both terms and setting $ a = 3P_1 - 2P_0 - P_3$, $ b = 3P_2 - P_0 - 2P_3$, we get

$ \displaystyle d^2(t) = \left \| (1-t)t ((1-t)a + tb) \right \|^2 = (1-t)^2t^2 \left \| (1-t)a + tb \right \|^2$

Since the maximum of a product is at most the product of the maxima, we can bound the above quantity by the product of the two maxes. The reason we want to do this is because we can easily compute the two maxes separately. It wouldn’t be hard to compute the maximum without splitting things up, but this way ends up with fewer computational steps for our final algorithm, and the visual result is equally good.

Using some elementary single-variable calculus, the maximum value of $ (1-t)^2t^2$ for $ 0 \leq t \leq 1$ turns out to be $ 1/16$ (the function is symmetric about $ t = 1/2$, where it attains its maximum). And the squared norm of a vector is just the sum of squares of its components. If $ a = (a_x, a_y)$ and $ b = (b_x, b_y)$, then the squared norm above is exactly

$ \displaystyle ((1-t)a_x + tb_x)^2 + ((1-t)a_y + tb_y)^2$

And notice: for any real numbers $ z, w$ the quantity $ (1-t)z + tw$ is exactly the straight line from $ z$ to $ w$ we know so well. The maximum over all $ t$ between zero and one is obviously the maximum of the endpoints $ z, w$. So the max of our distance function $ d^2(t)$ is bounded by

$ \displaystyle \frac{1}{16} (\textup{max}(a_x^2, b_x^2) + \textup{max}(a_y^2, b_y^2))$

And so our condition for being flat is that this bound is smaller than some allowable tolerance. We may safely factor the 1/16 into this tolerance bound, and so this is enough to write a function.

function isFlat(curve) {
   var tol = 10; // anything below 50 is roughly good-looking

   var ax = 3.0*curve[1][0] - 2.0*curve[0][0] - curve[3][0]; ax *= ax;
   var ay = 3.0*curve[1][1] - 2.0*curve[0][1] - curve[3][1]; ay *= ay;
   var bx = 3.0*curve[2][0] - curve[0][0] - 2.0*curve[3][0]; bx *= bx;
   var by = 3.0*curve[2][1] - curve[0][1] - 2.0*curve[3][1]; by *= by;

   return (Math.max(ax, bx) + Math.max(ay, by) <= tol);
}

And there we have it. We write a simple HTML page to access a canvas element and a few extra helper functions to draw the line segments when the curve is flat enough, and present the final result in this interactive demonstration (you can perturb the control points).

The picture you see on that page (given below) is my rendition of Picasso’s “Dog” drawing as a sequence of nine Bezier curves. I think the resemblance is uncanny 🙂

Picasso's "Dog," redesigned as a sequence of eight bezier curves.

Picasso’s “Dog,” redesigned as a sequence of nine bezier curves.

While we didn’t invent the drawing itself (and hence shouldn’t attach our signature to it), we did come up with the representation as a sequence of Bezier curves. It only seems fitting to present that as the work of art. Here we’ve distilled the representation down to a single file: the first line is the dimension of the canvas, and each subsequent line represents a cubic Bezier curve. Comments are included for readability.

bezier-dog-for-web

“Dog” Jeremy Kun, 2013. Click to enlarge.

Because standardizing things seems important, we define a new filetype “.bezier”, which has the format given above:

int int
(int) curve 
(int) curve
...

Where the first two ints specify the size of the canvas, the first (optional) int on each line specifies the width of the stroke, and a “curve” has the form

[int,int] [int,int] ... [int,int]

If an int is omitted at the beginning of a line, this specifies a width of three pixels.
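
To make the format concrete, here is a parsing sketch in Python. The comment character (we assume lines starting with “#”) and the exact whitespace rules are our own assumptions, since the description above leaves them open.

import re

def parseBezierFile(path):
   with open(path) as f:
      # skip blank lines and comment lines
      lines = [ln.strip() for ln in f if ln.strip() and not ln.strip().startswith('#')]

   width, height = map(int, lines[0].split())
   curves = []

   for line in lines[1:]:
      tokens = line.split()
      if tokens[0].startswith('['):
         strokeWidth, pointTokens = 3, tokens             # the default width of three pixels
      else:
         strokeWidth, pointTokens = int(tokens[0]), tokens[1:]

      points = [tuple(int(c) for c in re.findall(r'-?\d+', tok)) for tok in pointTokens]
      curves.append({'width': strokeWidth, 'points': points})

   return (width, height), curves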

In a general .bezier file we allow a curve to have arbitrarily many control points, though the code we gave above does not draw them that generally. As an exercise, write a program which accepts as input a .bezier file and produces as output an image of the drawing. This will require an extension of the algorithm above for drawing arbitrary Bezier curves, which loops its computation of the midpoints and keeps track of which end up in the resulting subdivision. Alternatively, one could write a program which accepts as input a .bezier file with only cubic Bezier curves, and produces as output an SVG file of the drawing (SVG only supports cubic Bezier curves). So a .bezier file is a simplification (fewer features) and an extension (Bezier curves of arbitrary degree) of an SVG file.

We didn’t go as deep into the theory of Bezier curves as we could have. If the reader is itching for more (and a more calculus-based approach), see this lengthy primer. It contains practically everything one could want to know about Bezier curves, with nice interactive demos written in Processing.

Low-Complexity Art

There are some philosophical implications of what we’ve done today with Picasso’s “Dog.” Previously on this blog we’ve investigated the idea of low-complexity art, and it’s quite relevant here. The thesis is that “beautiful” art has a small description length, and more formally the “complexity” of some object (represented by text) is the length of the shortest program that outputs that object given no inputs. More on that in our primer on Kolmogorov complexity. The fact that we can describe Picasso’s line drawings with a small number of Bezier curves (and a relatively short program to output the bezier curves) is supposed to be a deep statement about the beauty of the art itself. Obviously this is very subjective, but not without its proponents.

There has been a bit of recent interest in computers generating art. For instance, this recent programming competition (in Dutch) gave the task of generating art similar to the work of Piet Mondrian. The idea is that the more elegant the algorithm, the higher it would be scored. The winner used MD5 hashes to generate Mondrian pieces, and there were many many other impressive examples (the link above has a gallery of submissions).

In our earlier post on low-complexity art, we explored the possibility of representing all images within a coordinate system involving circles with shaded interiors. But it’s obvious that such a coordinate system wouldn’t be able to represent “Dog” with very low complexity. It seems that Bezier curves are a much more natural system of coordinates. Some of the advantages include that the length of lines and slight perturbations don’t affect the resulting complexity. A cubic Bezier curve can be described by any set of four points, and more “intricate” (higher complexity) descriptions of curves require a larger number of points. Bezier curves can be scaled up arbitrarily, and this doesn’t significantly change the complexity of the curve (although scaling many orders of magnitude will introduce a logarithmic factor complexity increase, this is quite small). Curves with larger stroke are slightly more complex than those with smaller stroke, and representing many small sharp bends requires more curves than long, smooth arcs.

On the downside, it’s not so easy to represent a circle as a Bezier curve. In fact, it is impossible to do so exactly. Despite the simplicity of this object (it’s even defined as a single polynomial, albeit in two variables), the best one can do is approximate it. The same goes for ellipses. There are actually ways to overcome this (the concept of rational Bezier curves which are quotients of polynomials), but they add to the inherent complexity of the drawing algorithm and the approximations using regular Bezier curves are good enough.

And so we define the complexity of a drawing to be the number of bits in its .bezier file representation. Comments are ignored in this calculation.

The real prize, and what we’ll explore next time, is to find a way to generate art automatically. That is to do one of two things:

  1. Given some sort of “seed,” write a program that produces a pseudo-random line drawing.
  2. Given an image, produce a .bezier file which accurately depicts the image as a line drawing.

We will attempt to explore these possibilities in the follow-up to this post. Depending on how things go, this may involve some local search algorithms, genetic algorithms, or other methods.

Until then!

Addendum: want to buy a framed print of the source code for “Dog”? Head over to our page on Society6.

Random (Psychedelic) Art

And a Pinch of Python

Next semester I am a lab TA for an introductory programming course, and it’s taught in Python. My Python experience has a number of gaps in it, so we’ll have the opportunity for a few more Python primers, and small exercises to go along with it. This time, we’ll be investigating the basics of objects and classes, and have some fun with image construction using the Python Imaging Library. Disappointingly, the folks who maintain the PIL are slow to update it for any relatively recent version of Python (it’s been a few years since 3.x, honestly!), so this post requires one use Python 2.x (we’re using 2.7). As usual, the full source code for this post is available on this blog’s Github page, and we encourage the reader to follow along and create his own randomized pieces of art! Finally, we include a gallery of generated pictures at the end of this post. Enjoy!

How to Construct the Images

An image is a two-dimensional grid of pixels, and each pixel is a tiny dot of color displayed on the screen. In a computer, one represents each pixel as a triple of numbers $ (r,g,b)$, where $ r$ represents the red content, $ g$ the green content, and $ b$ the blue content. Each of these is a nonnegative integer between 0 and 255. Note that this gives us a total of $ 256^3 = 2^{24}$ distinct colors, which is nearly 17 million. Some estimates of how much color the eye can see range as high as 10 million (depending on the definition of color) but usually stick around 2.4 million, so it’s generally agreed that we don’t need more.

The general idea behind our random psychedelic art is that we will generate three randomized functions $ (f,g,h)$, each with domain $ [-1,1] \times [-1,1]$ and codomain $ [-1,1]$, and at each pixel $ (x,y)$ we will determine the color at that pixel by the triple $ (f(x,y), g(x,y), h(x,y))$. This will require some translation between pixel coordinates, but we’ll get to that soon enough. As an example, if our colors are defined by the functions $ (\sin(\pi x), \cos(\pi xy), \sin(\pi y))$, then the resulting image is:

We use the extra factor of $ \pi$ because without it the oscillation is just too slow, and the resulting picture is decidedly boring. Of course, the goal is to randomly generate such functions, so we should pick a few functions on $ [-1,1]$ and nest them appropriately. The first which come to mind are $ \sin(\pi \cdot -), \cos(\pi \cdot -),$ and simple multiplication. With these, we can create such convoluted functions like

$ \sin(\pi x \cos(\pi xy \sin(\pi (\cos (\pi xy)))))$

We could randomly generate these functions two ways, but both require randomness, so let’s familiarize ourselves with the capabilities of Python’s random library.

Random Numbers

Pseudorandom number generators are a fascinating topic in number theory, and one of these days we plan to cover it on this blog. Until then, we will simply note the basics. First, contemporary computers cannot generate truly random numbers. Everything on a computer is deterministic, meaning that if one completely determines a situation in a computer, the following action will always be the same. With the complexity of modern operating systems (and the aggravating nuances of individual systems), some might facetiously disagree.

For an entire computer the “determined situation” can be as drastic as choosing every single bit in memory and the hard drive. In a pseudorandom number generator the “determined situation” is a single number called a seed. This initializes the random number generator, which then proceeds to compute a sequence of bits via some complicated arithmetic. The point is that one may choose the seed, and choosing the same seed twice will result in the same sequence of “randomly” generated numbers. The default seed (which is what one uses when one is not testing for correctness) is usually some sort of time-stamp which is guaranteed to never repeat. Flaws in random number generator design (hubris, off-by-one errors, and even using time-stamps!) have allowed humans to take advantage of people who try to rely on random number generators. The interested reader will find a detailed account of how a group of software engineers wrote a program to cheat at online poker, simply by reverse-engineering the random number generator used to shuffle the deck.

In any event, Python makes generating random numbers quite easy:

import random

random.seed()
print(random.random())
print(random.choice(["clubs", "hearts", "diamonds", "spades"]))

We import the random library, we seed it with the default seed, we print out a random number in $ (0,1)$, and then we randomly pick one element from a list. For a full list of the functions in Python’s random library, see the documentation. As it turns out, we will only need the choice() function.

Representing Mathematical Expressions

One neat way to represent a mathematical function is via…a function! In other words, just like Racket and Mathematica and a whole host of other languages, Python functions are first-class objects, meaning they can be passed around like variables. (Indeed, they are objects in another sense, but we will get to that later). Further, Python has support for anonymous functions, or lambda expressions, which work as follows:

>>> print((lambda x: x + 1)(4))
5

So one might conceivably randomly construct a mathematical expression by nesting lambdas:

import math

def makeExpr():
   if random.random() < 0.5:
      # build the sub-expression once, at construction time, so the resulting
      # function is fixed by the time we evaluate it
      innerExpr = makeExpr()
      return lambda x: math.sin(math.pi * innerExpr(x))
   else:
      return lambda x: x

Note that we need to import the math library, which has support for all of the necessary mathematical functions and constants. One could easily extend this to support two variables, cosines, etc., but there is one flaw with the approach: once we’ve constructed the function, we have no idea what it is. Here’s what happens:

>>> x = lambda y: y + 1
>>> str(x)
'<function <lambda> at 0xb782b144>'

There’s no way for Python to know the textual contents of a lambda expression at runtime!  In order to remedy this, we turn to classes.

The inquisitive reader may have noticed by now that lots of things in Python have “associated things,” which roughly correspond to what you can type after suffixing an expression with a dot. Lists have methods like “[1,2,3,4].append(5)”, dictionaries have associated lists of keys and values, and even numbers have some secretive methods:

>>> 45.7.is_integer()
False

In many languages like C, this would be rubbish. Many languages distinguish between primitive types and objects, and numbers usually fall into the former category. However, in Python everything is an object. This means the dot operator may be used after any type, and as we see above this includes literals.

A class, then, is just a more transparent way of creating an object with certain associated pieces of data (the fancy word is encapsulation). For instance, if I wanted to have a type that represents a dog, I might write the following Python program:

class Dog:
   age = 0
   name = ""

   def bark(self):
      print("Ruff ruff! (I'm %s)" % self.name)

Then to use the new Dog class, I could create it and set its attributes appropriately:

fido = Dog()
fido.age = 4
fido.name = "Fido"
fido.weight = 100
fido.bark()

The details of the class construction requires a bit of explanation. First, we note that the indented block of code is arbitrary, and one need not “initialize” the member variables. Indeed, they simply pop into existence once they are referenced, as in the creation of the weight attribute. To make it more clear, Python provides a special function called “__init__()” (with two underscores on each side of “init”; heaven knows why they decided it should be so ugly), which is called upon the creation of a new object, in this case the expression “Dog()”. For instance, one could by default name their dogs “Fido” as follows:

class Dog:
   def __init__(self):
      self.name = "Fido"

d = Dog()
d.name             # contains "Fido"

This brings up another point: all methods of a class that wish to access the attributes of the class require an additional argument. The first argument passed to any method is always the object which represents the owning instance of the object. In Java, this is usually hidden from view, but available by the keyword “this”. In Python, one must explicitly represent it, and it is standard to name the variable “self”.

If we wanted to give the user a choice when instantiating their dog, we could include an extra argument for the name like this:

class Dog:
   def __init__(self, name = 'Fido'):
      self.name = name

d = Dog()
d.name                   # contains "Fido" 
e = Dog("Manfred")
e.name                   # contains "Manfred"

Here we made it so the “name” argument is not required, and if it is excluded we default to “Fido.”

To get back to representing mathematical functions, we might represent the identity function on $ x$ by the following class:

class X:
   def eval(self, x, y):
      return x

expr = X()
expr.eval(3,4)           # returns 3

That’s simple enough. But we still have the problem of not being able to print anything sensibly. Trying gives the following output:

>>> str(X)
'__main__.X'

In other words, all it does is print the name of the class, which is not enough if we want to have complicated nested expressions. It turns out that the “str” function is quite special. When one calls “str()” of something, Python first checks to see if the object being called has a method called “__str__()”, and if so, calls that. The awkward “__main__.X” is a default behavior. So if we soup up our class by adding a definition for “__str__()”, we can define the behavior of string conversion. For the X class this is simple enough:

class X:
   def eval(self, x, y):
      return x

   def __str__(self):
      return "x"

For nested functions we could recursively convert the argument, as in the following definition for a SinPi class:

class SinPi:
   def __str__(self):
      return "sin(pi*" + str(self.arg) + ")"

   def eval(self, x, y):
      return math.sin(math.pi * self.arg.eval(x,y))

Of course, this requires we set the “arg” attribute before calling these functions, and since we will only use these classes for random generation, we could include that sort of logic in the “__init__()” function.

To randomly construct expressions, we create the function “buildExpr”, which randomly picks to terminate or continue nesting things:

def buildExpr(prob = 0.99):
   if random.random() < prob:
      return random.choice([SinPi, CosPi, Times])(prob)
   else:
      return random.choice([X, Y])()

Here we have classes for cosine, sine, and multiplication, and the two variables. The reason for the interesting syntax (picking the class name from a list and then instantiating it, noting that these classes are objects even before instantiation and may be passed around as well!), is so that we can do the following trick, and avoid unnecessary recursion:

class SinPi:
   def __init__(self, prob):
      self.arg = buildExpr(prob * prob)

   ...

In words, each time we nest further, we exponentially decrease the probability that we will continue nesting in the future, and all the nesting logic is contained in the initialization of the object. We’re building an expression tree, and then when we evaluate an expression we have to walk down the tree and recursively evaluate the branches appropriately. Implementing the remaining classes is a quick exercise, and we remind the reader that the entire source code is available from this blog’s Github page. Printing out such expressions results in some nice long trees, but also some short ones:

>>> str(buildExpr())
'cos(pi*y)*sin(pi*y)'
>>> str(buildExpr())
'cos(pi*cos(pi*y*y*x)*cos(pi*sin(pi*x))*cos(pi*sin(pi*sin(pi*x)))*sin(pi*x))'
>>> str(buildExpr())
'cos(pi*cos(pi*y))*sin(pi*sin(pi*x*x))*cos(pi*y*cos(pi*sin(pi*sin(pi*x))))*sin(pi*cos(pi*sin(pi*x*x*cos(pi*y)))*cos(pi*y))'
>>> str(buildExpr())
'cos(pi*cos(pi*sin(pi*cos(pi*y)))*cos(pi*cos(pi*x)*y)*sin(pi*sin(pi*x)))'
>>> str(buildExpr())
'sin(pi*cos(pi*sin(pi*cos(pi*cos(pi*y)*x))*sin(pi*y)))'
>>> str(buildExpr())
'cos(pi*sin(pi*cos(pi*x)))*y*cos(pi*cos(pi*y)*y)*cos(pi*x)*sin(pi*sin(pi*y*y*x)*y*cos(pi*x))*sin(pi*sin(pi*x*y))'
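
For completeness, here is what one of the remaining classes, the product node, might look like (a sketch following the same pattern; the version in the repository may differ in its details):

class Times:
   def __init__(self, prob):
      self.left = buildExpr(prob * prob)
      self.right = buildExpr(prob * prob)

   def __str__(self):
      return str(self.left) + "*" + str(self.right)

   def eval(self, x, y):
      return self.left.eval(x, y) * self.right.eval(x, y)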

This should work well for our goals. The rest is constructing the images.

Images in Python, and the Python Imaging Library

The Python Imaging Library is not part of the standard Python installation, but once it is installed we can access the part we need by adding the following line to our header:

from PIL import Image

Now we can construct a new canvas, and start setting some pixels.

canvas = Image.new("L", (300,300))
canvas.putpixel((150,150), 255)
canvas.save("test.png", "PNG")

This gives us a nice black square with a single white pixel in the center. The “L” argument to Image.new() says we’re working in grayscale, so that each pixel is a single 0-255 integer representing intensity. We can do this for three images, and merge them into a single color image using the following:

finalImage = Image.merge("RGB",
   (redCanvas, greenCanvas, blueCanvas))

Where we construct “redCanvas”, “greenCanvas”, and “blueCanvas” in the same way above, but with the appropriate intensities. The rest of the details in the Python code are left for the reader to explore, but we dare say it is just bookkeeping and converting between image coordinate representations. At the end of this post, we provide a gallery of the randomly generated images, and a text file containing the corresponding expression trees is packaged with the source code on this blog’s Github page.
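
For the curious, that bookkeeping might look roughly like the following sketch (the names and conventions are ours, not necessarily those of the code in the repository):

def plotIntensity(exprTree, pixelsPerUnit=150):
   canvasWidth = 2 * pixelsPerUnit + 1
   canvas = Image.new("L", (canvasWidth, canvasWidth))

   for i in range(canvasWidth):
      for j in range(canvasWidth):
         # map the pixel (i, j) to a point (x, y) in [-1,1] x [-1,1]
         x = (i - pixelsPerUnit) / float(pixelsPerUnit)
         y = (j - pixelsPerUnit) / float(pixelsPerUnit)
         z = exprTree.eval(x, y)                          # a value in [-1, 1]
         canvas.putpixel((i, j), int(127.5 + 127.5 * z))  # rescale to 0-255

   return canvas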

Extending the Program With New Functions!

There is decidedly little mathematics in this project, but there are some things we can discuss. First, we note that there are many many many functions on the interval $ [-1,1]$ that we could include in our random trees. A few examples are: the average of two numbers in that range, the absolute value, certain exponentials, and reciprocals of interesting sequences of numbers. We leave it as an exercise to the reader to add new functions to our existing code, and to further describe which functions achieve coherent effects.

Indeed, the designs are all rather psychedelic, and the layers of color are completely unrelated. It would be an interesting venture to write a program which, given an image of something (pretend it’s a simple image containing some shapes), constructs expression trees that are consistent with the curves and lines in the image. This follows suit with our goal of constructing low-complexity pictures from a while back, and indeed, these pictures have rather low Kolmogorov complexity. This method is another framework in which to describe their complexity, in that smaller expression trees correspond to simpler pictures. We leave this for future work. Until then, enjoy these pictures!

Gallery

Low Complexity Art

The Art of Omission

Whether in painting, fiction, film, landscape architecture, or paper folding, art is often said to be the art of omission. Simplicity breeds elegance, and engages the reader at a deep, aesthetic level.

A prime example is the famous six-word story written by Ernest Hemingway:

For sale: baby shoes, never worn.

He called it his best work, and rightfully so. To say so much with so simple a sentence is a monumental feat that authors have been trying to recreate since Hemingway’s day. Unsurprisingly, some mathematicians (for whom the art of proof had better not omit anything!) want to apply their principles to describe elegance.

Computation and Complexity

This study of artistic elegance will be from a computational perspective, and it will be based loosely on the paper of the same name. While we include the main content of the paper in a condensed form, we will deviate in two important ways: we alter an axiom with justification, and we provide a working implementation for the reader’s use. We do not require extensive working knowledge of theoretical computation, but the informed reader should be aware that everything here is theoretically performed on a Turing machine, but the details are unimportant.

So let us begin with the computational characterization of simplicity. Unfortunately, due to our own lack of knowledge of the subject, we will overlook the underlying details and take them for granted. [At some point in the future, we will provide a primer on Kolmogorov complexity. We just ordered a wonderful book on it, and can’t wait to dig into it!]

Here we recognize that all digital images are strings of bits, and so when we speak of the complexity of a string, in addition to meaning strings in general, we specifically mean the complexity of an image.

Definition: The Kolmogorov complexity of a string is the length of the shortest program which generates it.

In order to specify “length” appropriately, we must fix some universal description language, so that all programs have the same frame of reference. Any Turing-complete programming language will do, so let us choose Python for the following examples. More specifically, there exists a universal Turing machine $ U$, for which any program on any machine may be translated (compiled) into an equivalent program for $ U$ by a program of fixed size. Hence, once a fixed description language is specified (in this case Python), Kolmogorov complexity is a well-defined measure, and changing the language changes the complexity of any given string by at most an additive constant.
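In symbols (this is the standard invariance theorem for Kolmogorov complexity, stated here for context rather than proved): if $ K_{L_1}$ and $ K_{L_2}$ denote the complexities measured relative to two Turing-complete languages $ L_1$ and $ L_2$, then there is a constant $ c$ depending only on the two languages such that for every string $ x$,

$ \displaystyle |K_{L_1}(x) - K_{L_2}(x)| \leq c$

So the choice of Python affects every complexity by at most a fixed additive amount, which is irrelevant when comparing long strings.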

Here is a simple example illustrating Kolmogorov complexity: consider the string of one hundred zeros. This string is obviously not very “complex,” in the sense that one could write a very short program to generate it. In Python:

print "0" * 100

One can imagine that a compiler which optimizes for brevity would output rather short assembly code as well, with little more than a print instruction, a conditional branch, and some constants. On the other hand, we want to call a string like

“00111010010000101101001110101000111101”

complex, because it follows no apparent pattern. Indeed, in Python the shortest program to output this string is just to print the string itself:

print "00111010010000101101001110101000111101"

And so we see that this random string of ones and zeros has a higher Kolmogorov complexity than the string of all zeros. In other words, the boring string of all zeros is “simple,” while the other is “complicated.”

Kolmogorov himself proved that there is no algorithm to compute Kolmogorov complexity (the number itself) for any input. In other words, the problem of determining exact Kolmogorov complexity is undecidable (by reduction from the halting problem; see the Turing machines primer). So we will not try in vain to actually get a number for the Kolmogorov complexity of arbitrary strings (though it is easy to count the lengths of provably short examples like the ones above); instead we speak of complexity in terms of bounds and relative comparisons.
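One bound is worth stating explicitly, since it is implicit in the print-the-string-itself example above: for any string $ x$,

$ \displaystyle K(x) \leq |x| + c$

where $ |x|$ is the length of $ x$ and $ c$ is the constant overhead of a print statement. No string is much more complex than its own length; the interesting strings (and, for us, the interesting pictures) are those whose complexity falls far below that bound.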

Kolmogorov Meets Picasso

To apply this to art, we want to ask, “for a given picture, what is the length of the shortest program that outputs it?” This will tell us whether a picture is simple or complex. Unfortunately for us, most pictures are neither generated by programs, nor do they have obvious programmatic representations. More feasibly, we can ask, “can we come up with pictures which have low Kolmogorov complexity and are also beautiful?” This is truly a tough task.

To do so, we must first invent an encoding for pictures, and write a program to interpret the encoding. That’s the easy part. Then, the true test, we must paint a beautiful picture.

We don’t pretend to be capable of such artistry. However, there are some who have created an encoding based on circles and drawn very nice pictures with it. Here we will present those pictures as motivation, and then develop a very similar encoding method, providing the code and examples for the reader to play with.

Jürgen Schmidhuber, a long time proponent of low-complexity art, spent a very long time (on the order of thousands of sketches) creating drawings using his circle encoding method, and here are some of his results:

Some of Schmidhuber's low-complexity drawings

Marvelous. Our creations will be much uglier. But we admit, one must start somewhere, and it might as well be where we feel most comfortable: mathematics and programming.

Magnificence Meets Method

There are many possible encodings for drawings. We will choose one which is fairly easy to implement, and based on intersecting circles. The strokes in a drawing are arcs of these circles. We call the circles used to generate drawings legal circles, while the arcs are legal arcs. Here is an axiomatic specification of how to generate legal circles:

  1. Arbitrarily define a circle $ C$ with radius 1 as legal. All other circles are generated with respect to this circle. Define a second legal circle whose center is on $ C$ and whose radius is also 1.
  2. Wherever two legal circles of equal radius intersect, a third circle of equal radius is centered at the point of intersection.
  3. Every legal circle of radius $ r$ has at its center another legal circle of radius $ r/2$.

A legal arc is then simply any arc of a legal circle, and a legal drawing is any list of legal arcs, where each arc has a width corresponding to some fixed set of values. Now we generate all circles which intersect the interior of the base circle $ C$, and sort them first by radius, then by $ x$ coordinate, then by $ y$ coordinate. Given this order on the circles, we may number them from 1 to $ n$, and specify a particular circle by its index in the list. In this way, we have defined a coordinate space of arcs, with points of the form (circle index, thickness, arc-start, arc-end), where the arc-start and arc-end coordinates are measured in radians.
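To make the coordinate space concrete before the Mathematica implementation below, here is a toy sketch in Python. The circles are a hand-picked sample, not the notebook's data structures, and we assume larger circles are ordered first, which matches how the indexing code later in the post walks through the layers.

from math import pi, sqrt

# a hand-picked sample of legal circles, written as (radius, x, y)
legal_circles = [
    (1.0, 0.0, 0.0),          # the base circle C (axiom 1)
    (1.0, 1.0, 0.0),          # the second circle from axiom 1
    (1.0, 0.5, sqrt(3) / 2),  # centered at an intersection point (axiom 2)
    (0.5, 0.0, 0.0),          # the half-radius circle at C's center (axiom 3)
]

# sort by radius (largest first), then by x, then by y; a circle's
# position in this list is its index
ordered = sorted(legal_circles, key=lambda c: (-c[0], c[1], c[2]))

# an arc coordinate: (circle index, thickness, arc start, arc end),
# with the last two in radians; e.g. the top half of some circle
arc = (3, 0.004, 0.0, pi)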

We describe the programmatic construction of these circles later. For now, here is the generated picture of all circles which intersect the unit circle up to radius $ 2^{-5}$:

The legal circles

In addition, we provide an animation showing the different layers:

And another animation displaying the circles in increasing order of index. As an animated GIF this file is rather large (5MB), so we link to it separately.

As we construct smaller and smaller circles, the interior of the base circle is covered up by a larger proportion of legally usable area. By using obscenely small circles, we may theoretically construct any drawing. On the other hand, what we care about is how much information is needed to do so.

Because of our nice well ordering on circles, those circles with very small radii will have huge indices! Indeed, there are about four circles of radius $ 2^{-i-1}$ for each circle of radius $ 2^{-i}$ in any fixed area. Then, we can measure the complexity of a drawing by how many characters its list of legal arcs requires. Clearly, a rendition of Starry Night would have a large number of high-indexed circles, and hence have high Kolmogorov complexity. (On second thought, I wonder how hard it would be to get a rough sketch of a Starry-Night-esque picture in this circle encoding…it might not be all that complex).
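To put a rough number on that growth (a back-of-the-envelope estimate, not a figure from Schmidhuber's paper): if each halving of the radius multiplies the number of circles in a fixed area by about four, then the number of legal circles of radius $ 2^{-i}$ meeting the unit disk grows like $ 4^i$, and merely writing down the index of one of them costs on the order of

$ \displaystyle \log_2 4^i = 2i$

bits. Fine detail is expensive in this encoding, which is exactly the point.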

Note that Schmidhuber defines things slightly differently. In particular, he requires that the endpoints of a legal arc must be the intersection points of two other legal arcs, making the arc-start and arc-end coordinates integers instead of radian measures. We respectfully disagree with this axiom, and we explain why here:

Which of the two arcs is more “complex”?

Of the two arcs in the picture to the left, which would you say is more complex, the larger or the smaller? We observe that two arcs of the same circle, regardless of how long or short they are, should not be significantly different in complexity.

Schmidhuber, on the other hand, implicitly claims that arcs which begin or terminate at non-standard locations (locations which only correspond to the intersections of sufficiently small circles) should be deemed more complex. But this can be a difference as small as $ \pi/100$, and it drastically alters the complexity. We consider this specification unrealistic, at least to the extent to which human beings consider complexity in art. So we stick to radians.

Indeed, our model does alter the complexity for some radian measures, simply because finely specifying fractions requires more bits than integral values. But the change in complexity is hardly as drastic.

In addition, Schmidhuber allows for region shading between legal arcs. Since we did not find an easy way to implement this in Mathematica, we skipped it as extraneous.

Such Stuff as Programs are Made of

We implemented this circle encoding in Mathematica. The reader is encouraged to download and experiment with the full notebook, available from this blog’s Github page. We will explain the important bits here.

First, we have a function to compute all the circles whose centers lie on a given circle:

(* the six points at distance r from {x, y}, at angle multiples of 60
   degrees: the centers of the neighboring legal circles of radius r *)
borderCircleCenters[{x_, y_}, r_] :=
  Table[{x + r Cos[i 2 Pi/6], y + r Sin[i 2 Pi/6]}, {i, 0, 5}];

We arbitrarily picked the first legal circle to be the unit circle, defined with center (0,0), while the second has center (1,0). This made generating all legal circles a relatively simple search task. In addition, we recognize that any other choice of second circle is simply a rotation of this configuration, so one may rotate the final drawing to account for a different initialization.

Second, we have the brute-force search of all circles. We loop through all circles in a list, generating the six border circles appropriately, and then filtering for the ones we need, repeating until we have all the circles which intersect the interior of the unit circle. Note our inefficiencies: we search out as far as radius 2 to find small circles which do not necessarily intersect the unit circle, and we calculate the border circles of each circle many times. On the other hand, finding all circles as small as radius $ 2^{-5}$ takes about a minute on an Intel Atom processor, which is not so slow as to need excessive tuning for a prototype's sake.

(* collect the centers of all legal circles of radius r which intersect
   the interior of the unit circle, sorted by x and then y coordinate *)
getAllCenters[r_] := Module[{centers, borderCenters, searchR,
                             ord, rt},
   ord[{a_, b_}, {c_, d_}] := If[a < c, True, b < d];
   centers = {{0, 0}};

   rt = Power[r, 1/2];
   While[Norm[centers[[-1]]] <= Min[2, 1 + rt],
    borderCenters = Map[borderCircleCenters[#, r] &, centers];
    centers = centers \[Union] Flatten[borderCenters, 1]];

   Sort[Select[centers, Norm[#] < 1 + r &], ord]
   ];

Finally, we have a function to extract from the resulting list of all centers the center and radius of a given index, and a function to convert a coordinate to its graphical representation:

(* extracts a pair {center, radius} given the
   index of the circle *)
indexToCenterRadius[layeredCenters_, index_] :=
  Module[{row, length, counter},
   row = 1;
   length = Length[layeredCenters[[row]]];
   counter = index;

   While[counter > length,
    counter -= length;
    row++;
    length = Length[layeredCenters[[row]]];
    ];

   {layeredCenters[[row, counter]], 1/2^(row - 1)}
   ];

(* render a single legal arc from a coordinate
   (index, thickness, arc start, arc end) *)
drawArc[{index_, thickness_, arcStart_, arcEnd_}] :=
  Module[{center, radius},
   {center, radius} = indexToCenterRadius[allCenters, index];
   Graphics[{Thickness[thickness],
     Circle[center, radius, {arcStart, arcEnd}]},
     ImagePadding -> 5, PlotRange -> {{-1, 1}, {-1, 1}},
     ImageSize -> {400, 400}]
   ];

And a front-end style function, which takes a list of coordinates and draws the resulting picture:

paint[coordinates_] := Show[Map[drawArc, coordinates]];

Any omitted details (at least one global variable name) are clarified in the notebook.

Now, with our paintbrush in hand, we unveil our very first low-complexity piece of art. Behold! Surprised Mr. Moustache Witnessing a Collapsing Soufflé:

Surprised Mr. Moustache, © Jeremy Kun, 2011

Its coordinates are:

{{7, 0.005, 0, 2 Pi}, {197, 0.002, 0, 2 Pi},
{299, 0.002, 0, 2 Pi}, {783, 0.002, 0, 2 Pi},
{2140, 0.001, 0, 2 Pi}, {3592, 0.001, 0, 2 Pi},
{22, 0.004, 8 Pi/6, 10 Pi/6}, {29, 0.004, 4 Pi/3, 5 Pi/3},
{21, 0.004, Pi/3, 2 Pi/3}, {28, 0.004, Pi/3, 2 Pi/3}}

Okay, so it’s lame, and took all of ten minutes to create (guess-and-check on the indices is quick, thanks to Mathematica’s interpreter). But it has low Kolmogorov complexity! And that’s got to count for something, right?

Even if you disagree with our obviously inspired artistic genius, the Mathematica framework for creating such drawings is free and available for anyone to play with. So please, should you have any artistic talent at all (and access to Mathematica), we would love to see your low-complexity art! If we somehow come across three days of being locked in a room with access to nothing but a computer and a picture of Starry Night, we might attempt to recreate a sketch of it for this blog. But until then, we will explore other avenues.

Happy sketching!

Addendum: Note that the outstanding problem here is how to algorithmically take a given picture (or specification of what one wants to draw), and translate it into this system of coordinates. As of now, no such algorithm is known, and hence we call the process of making a drawing art. We may attempt to find such a method in the future, but it is likely hard, and if we produced an algorithm even a quarter as good as we might hope, we would likely publish a paper first, and blog about it second.