Elliptic Curves as Elementary Equations

Finding solutions to systems of polynomial equations is one of the oldest and deepest problems in all of mathematics. This is broadly the domain of algebraic geometry, and mathematicians wield some of the most sophisticated and abstract tools available to attack these problems.

The elliptic curve straddles the elementary and advanced mathematical worlds in an interesting way. On one hand, it’s easy to describe in elementary terms: it’s the set of solutions to a cubic equation in two variables. But despite how simple they seem, deep theorems govern their behavior, and many natural questions about elliptic curves are still wide open. Since elliptic curves underlie some of the strongest and most widely used encryption protocols, understanding them more deeply would give insight into the security (or potential insecurity) of these protocols.

Our first goal in this series is to treat elliptic curves as mathematical objects, and derive the elliptic curve group as the primary object of study. We’ll see what “group” means next time, and afterward we’ll survey some of the vast landscape of unanswered questions. But this post will be entirely elementary, and will gently lead into the natural definition of the group structure on an elliptic curve.

Elliptic Curves as Equations

The simplest way to describe an elliptic curve is as the set of all solutions to a specific kind of polynomial equation in two real variables, $ x,y$. Specifically, the equation has the form:

$ \displaystyle y^2 = x^3 + ax + b$

Where $ a,b$ are real numbers such that

$ \displaystyle -16(4a^3 + 27b^2) \neq 0$

One would naturally ask, “Who the hell came up with that?” A thorough answer requires a convoluted trip through 19th and 20th-century mathematical history, but it turns out that this is a clever form of a very natural family of equations. We’ll elaborate on this in another post, but for now we can give an elementary motivation.

Say you have a pyramid of spheres whose layers are squares, like the one below:


We might wonder when it’s the case that we can rearrange these spheres into a single square. Clearly you can do it for a pyramid of height 1, because a single ball is also a 1×1 square (and for one of height zero, if you allow a 0×0 square). But are there any others?

This question turns out to be a question about an elliptic curve. First, recall that the number of spheres in such a pyramid is given by

$ \displaystyle 1 + 4 + 9 + 16 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$
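As a quick sanity check, this closed form is easy to verify in code (a throwaway Python sketch of our own):

```python
# Verify the square pyramidal number formula for the first hundred heights.
def pyramid_count(n):
    """Number of spheres in a square pyramid of height n: 1 + 4 + ... + n^2."""
    return sum(k * k for k in range(1, n + 1))

def closed_form(n):
    return n * (n + 1) * (2 * n + 1) // 6

assert all(pyramid_count(n) == closed_form(n) for n in range(1, 101))
```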

And so we’re asking if there are any positive integers $ x, y$ such that

$ \displaystyle y^2 = \frac{x(x+1)(2x+1)}{6}$

Here is a graph of this equation in the plane. As you admire it, though, remember that we’re chiefly interested in integer solutions.


The equation doesn’t quite have the special form we mentioned above, but the reader can rest assured (and we’ll prove it later) that one can transform our equation into that form without changing the set of solutions. In the meantime let’s focus on the question: are there any integer-valued points on this curve besides $ (0,0)$ and $ (1,1)$? The method we use to answer this question comes from ancient Greece, and is due to Diophantus. The idea is that we can use the two points we already have to construct a third point. This method is important because it forms the basis for our entire study of elliptic curves.

Take the line passing through $ (0,0)$ and  $ (1,1)$, given by the equation $ y = x$, and compute the intersection of this line and the original elliptic curve. The “intersection” simply means to solve both equations simultaneously. In this case it’s

$ \begin{aligned} y^2 &= \frac{x(x+1)(2x+1)}{6} \\ y &= x \end{aligned}$

It’s clear what to do: substitute the second equation into the first. That is, solve

$ \displaystyle x^2 = \frac{x(x+1)(2x+1)}{6}$

Rearranging this into a single polynomial and multiplying through by 3 gives

$ \displaystyle x^3 - \frac{3x^2}{2} + \frac{x}{2} = 0$

Factoring cubics happens to be easy, but let’s instead use a different trick that will come up again later: a fact taught in elementary algebra and precalculus courses and promptly forgotten, that the sum of the roots of any polynomial is $ \frac{-a_{n-1}}{a_n}$, where $ a_{n}$ is the leading coefficient and $ a_{n-1}$ is the next coefficient. Here $ a_n = 1$, so the sum of the roots is $ 3/2$. This is useful because we already know two roots, namely the solutions 0 and 1 we used to define the system of equations in the first place. So the third root satisfies

$ \displaystyle r + 0 + 1 = \frac{3}{2}$

And it’s $ r = 1/2$, giving the point $ (1/2, 1/2)$ since the line was $ y=x$. Because of the symmetry of the curve, we also get the point $ (1/2, -1/2)$.

Here’s a zoomed-in picture of what we just did to our elliptic curve. We used the two pink points (which gave us the dashed line) to find the purple point.


The bad news is that these two new points don’t have integer coordinates. So it doesn’t answer our question. The good news is that now we have more points! So we can try this trick again to see if it will give us still more points, and hope to find some that are integer valued. (It sounds like a hopeless goal, but just hold out a bit longer.) If we try this trick again using $ (1/2, -1/2)$ and $ (1,1)$, whose connecting line is $ y = 3x - 2$, we get the equation

$ \displaystyle (3x - 2)^2 = \frac{x(x+1)(2x+1)}{6}$

And redoing all the algebraic steps we did before gives us the solution $ x=24, y=70$. In other words, we just proved that

$ \displaystyle 1^2 + 2^2 + \dots + 24^2 = 70^2$
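In fact, the chord trick we just performed by hand is entirely mechanical, and we can sketch it in a few lines of Python. This is our own throwaway implementation (the function name is ours), using exact rational arithmetic, and it assumes the two given points have distinct $ x$-coordinates so the chord isn’t vertical:

```python
from fractions import Fraction

def third_point(p, q):
    """Given two rational points with distinct x-coordinates on the curve
    y^2 = x(x+1)(2x+1)/6, return the third intersection of their chord
    with the curve, found via the sum-of-roots trick."""
    (x1, y1), (x2, y2) = [(Fraction(a), Fraction(b)) for a, b in (p, q)]
    m = (y2 - y1) / (x2 - x1)  # slope of the chord y = m x + c
    c = y1 - m * x1
    # Substituting y = mx + c and clearing denominators gives the cubic
    # 2x^3 + (3 - 6m^2)x^2 + (1 - 12mc)x - 6c^2 = 0, whose three roots
    # sum to (6m^2 - 3)/2.  Two of the roots are x1 and x2.
    x3 = 3 * m * m - Fraction(3, 2) - x1 - x2
    return (x3, m * x3 + c)

# The two chords from the text: first (0,0) and (1,1), then the new
# point (1/2, -1/2) with (1,1).
assert third_point((0, 0), (1, 1)) == (Fraction(1, 2), Fraction(1, 2))
assert third_point((Fraction(1, 2), Fraction(-1, 2)), (1, 1)) == (24, 70)
```

Negating the $ y$-coordinate of either output gives the mirror-image point on the curve, just as in the picture above.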

Great! Here’s another picture showing what we just did.


In reality we don’t care about this little puzzle. Its solution might be a fun distraction (and even more distracting: try to prove there aren’t any other integer solutions), but it’s not the real treasure. The mathematical gem is the method of finding the solution. We can ask the natural question: if you have two points on an elliptic curve, and you take the line between those two points, will you always get a third point on the curve?

Certainly the answer is no. See this example of two points whose line is vertical.


But with some mathematical elbow grease, we can actually force it to work! That is, we can define things just right so that the line between any two points on an elliptic curve will always give you another point on the curve. This sounds like mysterious black magic, but it lights the way down a long mathematical corridor of new ideas, and is required to make sense of using elliptic curves for cryptography.

Shapes of Elliptic Curves

Before we continue, let’s take a little detour to get a good feel for the shapes of elliptic curves. We have defined elliptic curves by a special kind of equation (we’ll give it a name in a future post). During most of our study we won’t be able to make any geometric sense of these equations. But for now, we can pretend that we’re working over real numbers and graph these equations in the plane.

Elliptic curves in the form $ y^2 = x^3 + ax + b$ have a small handful of different shapes that we can see as $ a,b$ vary:


The problem is when we cross the point at which the rounded part pinches off in the first animation, and the circular component appears in the second. At those precise moments, the curve becomes “non-smooth” (or singular), and for reasons we’ll see later this is bad. The condition from the beginning of the article (that $ -16(4a^3 + 27b^2) \neq 0$) ensures that these two cases are excluded from consideration, and it’s one crucial part of our “elbow grease” to ensure that lines behave nicely.
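The condition is also easy to test directly: the quantity $ -16(4a^3 + 27b^2)$ is, up to the constant factor, the discriminant of the cubic $ x^3 + ax + b$, and it vanishes exactly when that cubic has a repeated root. A quick Python check (the example curves below are our own choices):

```python
def is_smooth(a, b):
    """True when y^2 = x^3 + a x + b defines a non-singular curve,
    i.e. when the discriminant -16(4a^3 + 27b^2) is nonzero."""
    return -16 * (4 * a**3 + 27 * b**2) != 0

assert is_smooth(-1, 1)       # the "canonical" curve y^2 = x^3 - x + 1
assert not is_smooth(0, 0)    # y^2 = x^3 has a cusp at the origin
assert not is_smooth(-3, 2)   # x^3 - 3x + 2 = (x - 1)^2 (x + 2): a node
```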

The “canonical” shape of the elliptic curve is given by the specific example $ y^2 = x^3 - x + 1$. It’s the example that should pop up whenever you imagine an elliptic curve, and it’s the example we’ll use for all of our pictures.


So in the next post we’ll roll up our sleeves and see exactly how “drawing lines” can be turned into an algebraic structure on an elliptic curve.

Until then!

k-Means Clustering and Birth Rates

A common problem in machine learning is to take some kind of data and break it up into “clumps” that best reflect how the data is structured. A set of points which are all collectively close to each other should be in the same clump.

A simple picture will clarify any vagueness in this:


Here the data consists of points in the plane. There is an obvious clumping of the data into three pieces, and we want a way to automatically determine which points are in which clumps. The formal name for this problem is the clustering problem. That is, these clumps of points are called clusters, and there are various algorithms which find a “best” way to split the data into appropriate clusters.

The important applications of this are inherently similarity-based: if our data comes from, say, the shopping habits of users of some website, we may want to target a group of shoppers who buy similar products at similar times, and provide them with a coupon for a specific product which is valid during their usual shopping times. Determining exactly who is in that group of shoppers (and more generally, how many groups there are, and what features the groups correspond to) is the main application of clustering.

This is something one can do quite easily as a human on small, visualizable datasets, but the usual digital representation (a list of numeric points in some number of dimensions) doesn’t yield any obvious insights. Moreover, as the data becomes more complicated (be it by dimension increase, data collection errors, or sheer volume) the “human method” can easily fail or become inconsistent. And so we turn to mathematics to formalize the question.

In this post we will derive one possible version of the clustering problem known as the k-means clustering or centroid clustering problem, see that it is a difficult problem to solve exactly, and implement a heuristic algorithm in place of an exact solution.

And as usual, all of the code used in this post is freely available on this blog’s Github page.

Partitions and Squared Deviations

The process of clustering is really a process of choosing a good partition of the data. Let’s call our data set $ S$, and formalize it as a list of points in space. To be completely transparent and mathematical, we let $ S$ be a finite subset of a metric space $ (X,d)$, where $ d$ is our distance metric.

Definition: We call a partition of a set $ S$ a choice of subsets $ A_1, \dots, A_n$ of $ S$ so that every element of $ S$ is in exactly one of the $ A_i$.

A couple of important things to note about partitions are that the union of all the $ A_i$ is $ S$, and that any two $ A_i, A_j$ intersect trivially. These are immediate consequences of the definition, and together they provide an equivalent, alternative definition for a partition. As a simple example, the even and odd integers form a partition of the whole set of integers.

There are many different kinds of clustering problems, but every clustering problem seeks to partition a data set in some way depending on the precise formalization of the goal of the problem. We should note that while this section does give one of many possible versions of this problem, it culminates in the fact that this formalization is too hard to solve exactly. An impatient reader can safely skip to the following section where we discuss the primary heuristic algorithm used in place of an exact solution.

In order to properly define the clustering problem, we need to specify the desired features of a cluster, or a desired feature of the set of all clusters combined. Intuitively, we think of a cluster as a bunch of points which are all close to each other. We can measure this explicitly as follows. Let $ A$ be a fixed subset of the partition we’re interested in. Then we might take the sum of the distances over all pairs of points within $ A$ as a measure of its “clusterity.” In symbols, this would be

$ \displaystyle \sum_{x \neq y \in A} d(x, y)$

If this quantity is small, then it says that all of the points in the cluster $ A$ are close to each other, and $ A$ is a good cluster. Of course, we want all clusters to be “good” simultaneously, so we’d want to minimize the sum of these sums over all subsets in the partition.

Note that if there are $ n$ points in $ A$, then the above sum involves $ \binom{n}{2} \sim n^2$ distance calculations, and so this could get quite inefficient with large data sets. One of the many alternatives is to pick a “center” for each of the clusters, and try to minimize the sum of the distances of each point in a cluster from its center. Using the same notation as above, this would be

$ \displaystyle \sum_{x \in A} d(x, c)$

where $ c$ denotes the center of the cluster $ A$. This only involves $ n$ distance calculations, and is perhaps a better measure of “clusterity.” Specifically, if we use the first option and one point in the cluster is far away from the others, we essentially record that single piece of information $ n - 1$ times, whereas in the second we only record it once.

The method we will use to determine the center can be very general. We could use one of a variety of measures of center, like the arithmetic mean, or we could try to force one of the points in $ A$ to be considered the “center.” Fortunately, when the distance is Euclidean, the arithmetic mean is the choice of $ c$ minimizing the sum of squared distances $ \sum_{x \in A} d(x,c)^2$ (these are the “squared deviations” of this section’s title; minimizing the unsquared sum would instead call for the geometric median). So we’ll stick with the mean and squared distances for now.

And so the clustering problem is formalized.

Definition: Let $ (X,d)$ be a metric space with metric $ d$, and let $ S \subset (X,d)$ be a finite subset. The centroid clustering problem is the problem of finding for any positive integer $ k$ a partition $ \left \{ A_1 ,\dots A_k \right \}$ of $ S$ so that the following quantity is minimized:

$ \displaystyle \sum_{i=1}^k\sum_{x \in A_i} d(x, c(A_i))^2$

where $ c(A_i)$ denotes the center of a cluster, defined as the arithmetic mean of the points in $ A_i$:

$ \displaystyle c(A) = \frac{1}{|A|} \sum_{x \in A} x$
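To make the objective concrete, here is a direct Python transcription of it (the helper names are ours):

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def centroid(cluster):
    """Arithmetic mean of a nonempty list of points (equal-length tuples)."""
    n = len(cluster)
    return tuple(sum(coordinates) / n for coordinates in zip(*cluster))

def clustering_cost(partition):
    """Sum, over all clusters, of squared Euclidean distances from each
    point to its cluster's centroid -- the quantity to be minimized."""
    return sum(dist(x, centroid(A)) ** 2 for A in partition for x in A)

# Two tight clusters score far better than one lumped-together cluster.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
assert clustering_cost([points[:2], points[2:]]) < clustering_cost([points])
```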

Before we continue, we have a confession to make: the centroid clustering problem is prohibitively difficult. In particular, it falls into a class of problems known as NP-hard problems. For the working programmer, NP-hard means that an efficient exact algorithm is unlikely to exist: as far as anyone knows, solving the problem exactly is not substantially better than trying all possible partitions.

We’ll touch more on this after we see some code, but the salient fact is that a heuristic algorithm is our best bet. That is, all of this preparation with partitions and squared deviations really won’t come into the algorithm design at all. Formalizing this particular problem in terms of sets and a function we want to optimize only allows us to rigorously prove it is difficult to solve exactly. And so, of course, we will develop a naive and intuitive heuristic algorithm to substitute for an exact solution, observing its quality in practice.

Lloyd’s Algorithm

The most common heuristic for the centroid clustering problem is Lloyd’s algorithm, more commonly known as the k-means clustering algorithm. It was named after its inventor Stuart Lloyd, a University of Chicago graduate and member of the Manhattan project who designed the algorithm in 1957 during his time at Bell Labs.

Heuristics tend to be on the simpler side, and Lloyd’s algorithm is no exception. We start by fixing a number of clusters $ k$ and choosing an arbitrary initial partition $ A = \left \{ A_1, \dots, A_k \right \}$. The algorithm then proceeds as follows:

   compute the arithmetic mean c[i] of each A[i]
   construct a new partition B:
      each subset B[i] is given a center c[i] computed from A
      each point x is assigned to the subset B[i] whose center c[i] is closest to x
   stop if B is equal to the old partition A, else set A = B

Intuitively, we imagine each center being pulled toward the center of mass of the points currently assigned to its cluster, and then the points reassigning themselves to whichever center is now closest. (Indeed, precisely because of this the algorithm may not always give sensible results, but more on that later.)

One who is in tune with their inner pseudocode will readily understand the above algorithm. But perhaps the simplest way to think about this algorithm is functionally. That is, we are constructing this partition-updating function $ f$ which accepts as input a partition of the data and produces as output a new partition as follows: first compute the centers (the means) of the subsets in the old partition, and then create the new partition by gathering all the points closest to each center. These are the first and fourth lines of the pseudocode above.

Indeed, the rest of the pseudocode is merely pomp and scaffolding! What we are really after is a fixed point of the partition-updating function $ f$. In other words, we want a partition $ P$ such that $ f(P) = P$. We go about finding one in this algorithm by applying $ f$ to our initial partition $ A$, and then recursively applying $ f$ to its own output until we no longer see a change.

Perhaps we should break away from traditional pseudocode illegibility and rewrite the algorithm as follows:

define updatePartition(A):
   let c[i] = center(A[i])
   return a new partition B:
      each B[i] is given the points which are closest to c[i]

compute a fixed point by recursively applying 
updatePartition to any initial partition.

Of course, the difference between these pseudocode snippets is just the difference between functional and imperative programming. Neither is superior, but the perspective of both is valuable in its own right.

And so we might as well implement Lloyd’s algorithm in two such languages! The first, weighing in at a whopping four lines, is our Mathematica implementation:

closest[x_, means_] :=
  means[[First[Ordering[Map[EuclideanDistance[x, #] &, means]]]]];

partition[points_, means_] := GatherBy[points, closest[#, means]&];
updatePartition[points_, old_] := partition[points, Map[Mean, old]];

kMeans[points_, initialMeans_] := FixedPoint[updatePartition[points, #]&, partition[points, initialMeans]];

While it’s a little bit messy (as nesting 5 function calls and currying by hand will inevitably be), the ideas are simple. The “closest” function computes the closest mean to a given point $ x$. The “partition” function uses Mathematica’s built-in GatherBy function to partition the points by the closest mean; GatherBy[L, f] partitions its input list L by putting together all points which have the same value under f. The “updatePartition” function creates the new partition based on the centers of the old partition. And finally, the “kMeans” function uses Mathematica’s built-in FixedPoint function to iteratively apply updatePartition to the initial partition until there are no more changes in the output.

Indeed, this is as close as it gets to the “functional” pseudocode we had above. And applying it to some synthetic data (three randomly-sampled Gaussian clusters that are relatively far apart) gives a good clustering in a mere two iterations:


Indeed, we rarely see a large number of iterations, and we leave it as an exercise to the reader to test Lloyd’s algorithm on random noise to see just how bad it can get (remember, all of the code used in this post is available on this blog’s Github page). One will likely see convergence on the order of tens of iterations. On the other hand, there are pathologically complicated sets of points (even in the plane) for which Lloyd’s algorithm takes exponentially long to converge to a fixed point. And even then, the solution is never guaranteed to be optimal. Indeed, having the possibility for terrible run time and a lack of convergence is one of the common features of heuristic algorithms; it is the trade-off we must make to overcome the infeasibility of NP-hard problems.

Our second implementation was in Python, and compared to the Mathematica implementation it looks like the lovechild of MUMPS and C++. Sparing the reader too many unnecessary details, here is the main function which loops the partition updating, a la the imperative pseudocode:

def kMeans(points, k, initialMeans, d=euclideanDistance):
   oldPartition = []
   newPartition = partition(points, k, initialMeans, d)

   while oldPartition != newPartition:
      oldPartition = newPartition
      newMeans = [mean(S) for S in oldPartition]
      newPartition = partition(points, k, newMeans, d)

   return newPartition

We added in the boilerplate functions for euclideanDistance, partition, and mean appropriately, and the reader is welcome to browse the source code for those.
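For completeness, here is one plausible sketch of those helpers. These are our own guesses at the definitions, not necessarily what’s in the repository:

```python
from math import dist as euclideanDistance  # Euclidean metric (Python 3.8+)

def mean(points):
    """Coordinate-wise arithmetic mean of a nonempty list of points."""
    n = len(points)
    return tuple(sum(coordinates) / n for coordinates in zip(*points))

def partition(points, k, means, d=euclideanDistance):
    """Assign each point to the cluster of its nearest mean.  Note that a
    cluster can come out empty, in which case kMeans would crash on the
    next call to mean -- one of several reasons to restart the algorithm
    with fresh random initial means."""
    clusters = [[] for _ in range(k)]
    for x in points:
        nearest = min(range(k), key=lambda i: d(x, means[i]))
        clusters[nearest].append(x)
    return clusters
```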

Birth and Death Rates Clustering

To test our algorithm, let’s apply it to a small set of real-world data. This data will consist of one data point for each country with two features: birth rate and death rate, measured in annual number of births/deaths per 1,000 people in the population. Since the population is constantly changing, it is measured at some time in the middle of the year to act as a reasonable estimate of the median of all population values throughout the year.

The raw data comes directly from the CIA’s World Factbook data estimate for 2012. Formally, we’re collecting the “crude birth rate” and “crude death rate” of each country with known values for both (some minor self-governing principalities had unknown rates). The “crude rate” simply means that the data does not account for anything except pure numbers; there is no compensation for the age distribution and fertility rates. Of course, there are many many issues affecting the birth rate and death rate, but we don’t have the background or the stamina to investigate their implications here. Indeed, part of the point of studying learning methods is that we want to extract useful information from the data without too much human intervention (in the form of ambient knowledge).

Here is a plot of the data with some interesting values labeled (click to enlarge):


Specifically, we note that there is a distinct grouping of the data into two clusters (with a slanted line apparently separating the clusters). As a casual aside, it seems that the majority of the countries in the cluster on the right are countries with active conflict.

Applying Lloyd’s algorithm with $ k=2$ to this data results in the following (not quite so good) partition:


Note how some of the points which we would expect to be in the “left” cluster are labeled as being in the right. This is unfortunate, but we’ve seen this issue before in our post on k-nearest-neighbors: the different axes are on different scales. That is, death rates just tend to vary more wildly than birth rates, and the two variables have different expected values.

Compensating for this is quite simple: we just need to standardize the data. That is, we need to replace each coordinate of each data point with its deviation from the mean divided by the standard deviation (its z-score), using the usual formula:

$ \displaystyle z = \frac{x - \mu}{\sigma}$

where for a random variable $ X$, its (sample) expected value is $ \mu$ and its (sample) standard deviation is $ \sigma$. Doing this in Mathematica is quite easy:

Transpose[Map[Standardize, Transpose[L]]]

where L is a list containing our data. Re-running Lloyd’s algorithm on the standardized data gives a much better picture:


Now the boundary separating one cluster from the other is in line with what our intuition dictates it should be.
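Incidentally, the same standardization is easy to replicate in pure Python; here’s a sketch of our own using only the standard library (with the sample standard deviation, matching Mathematica’s Standardize):

```python
from statistics import mean, stdev

def standardize(points):
    """Replace each coordinate of each point with its z-score, column by
    column: subtract the column mean, divide by the column's (sample)
    standard deviation."""
    columns = list(zip(*points))
    means = [mean(column) for column in columns]
    sigmas = [stdev(column) for column in columns]
    return [tuple((x - mu) / sigma
                  for x, mu, sigma in zip(point, means, sigmas))
            for point in points]
```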

Heuristics… The Air Tastes Bitter

We should note at this point that we really haven’t solved the centroid clustering problem yet. There is one glaring omission: the choice of $ k$. This question is central to the problem of finding a good partition; a bad choice can yield bunk insights at best. Below we’ve calculated Lloyd’s algorithm for varying values of $ k$ again on the birth-rate data set.


Lloyd’s algorithm processed on the birth-rate/death-rate data set with varying values of k between 2 and 7 (click to enlarge).

The problem of finding $ k$ has been addressed by many a researcher, and unfortunately the only methods to find a good value for $ k$ are heuristic in nature as well. In fact, many believe that determining the correct value of $ k$ is a learning problem in and of itself! We will try not to go into too much detail about parameter selection here, but needless to say it is an enormous topic.

And as we’ve already said, even if the correct choice of $ k$ is known, there is no guarantee that Lloyd’s algorithm (or any algorithm attempting to solve the centroid clustering problem) will converge to a global optimum solution. In the same fashion as our posts on cryptanalysis and deck-stacking in Texas Hold ’Em, the process of finding a minimum can converge to a local minimum.

Here is an example with four clusters, where each frame is a step, and the algorithm progresses from left to right (click to enlarge):

One way to alleviate the issues of local minima is the same here as in our other posts: simply start the algorithm over again from a different randomly chosen starting point. That is, as in our implementations above, our “initial means” are chosen uniformly at random from among the data set points. Alternatively, one may randomly partition the data (without respect to any center; each data point is assigned to one of the $ k$ clusters with probability $ 1/k$). We encourage the reader to try both starting conditions as an exercise, and implement the repeated algorithm to return that output which minimizes the objective function (as detailed in the “Partitions and Squared Deviations” section).

And even when the algorithm does converge to a global minimum, it might not do so efficiently. As we already mentioned, solving the problem of centroid clustering (even for a fixed $ k$) is NP-hard. And so (assuming $ \textup{P} \neq \textup{NP}$) any algorithm which converges to a global minimum will take exponentially long on some pathological inputs. The interested reader can see this exponentially slow convergence even in the case of $ k=2$ for points in the plane (which is as simple as it gets).

These kinds of reasons make Lloyd’s algorithm and the centroid clustering problem a bit of a poster child of machine learning. In theory it’s difficult to solve exactly, but it has an efficient and widely employed heuristic used in practice which is often good enough. Moreover, since the exact solution is more or less hopeless, much of the focus has shifted to finding randomized algorithms which on average give solutions that are within some constant-factor approximation of the true minimum.

A Word on Expectation Maximization

This algorithm shares quite a few features with a very famous algorithm called the Expectation-Maximization algorithm. We plan to investigate this after we spend some more time on probability theory on this blog, but the (very rough) idea is that the algorithm operates in two steps. First, a measure of “center” is chosen for each of a number of statistical models based on given data. Then a maximization step occurs which chooses the optimal parameters for those statistical models, in the sense that the probability that the data was generated by statistical models with those parameters is maximized. These statistical models are then used as the “old” statistical models whose centers are computed in the next step.

Continuing the analogy with clustering, one feature of expectation-maximization that makes it nice is that it allows the “clusters” to have varying sizes, whereas Lloyd’s algorithm tends to make its clusters have equal size (as we saw with varying values of $ k$ in our birth-rates example above).

And so the ideas involved in this post are readily generalizable, and the applications extend to a variety of fields like image reconstruction, natural language processing, and computer vision. The reader who is interested in the full mathematical details can see this tutorial.

Until next time!

The Cellular Automaton Method for Cave Generation

Dear reader, this post has an interactive simulation! We encourage you to play with it as you read the article below.

In our series of posts on cellular automata, we explored Conway’s classic Game of Life and discovered some interesting patterns therein. And then in our primers on computing theory, we built up a theoretical foundation for similar kinds of machines, including a discussion of Turing machines and the various computational complexity classes surrounding them. But cellular automata served us pretty exclusively as a toy. They gave us a basic model of computation, which we were interested in only for its theoretical universality. One wouldn’t expect too many immediately practical (and efficient) applications of something which needs a ridiculous scale to perform basic logic. In fact, it’s amazing that there are as many as there are.

In this post we’ll look at one particular application of cellular automata to procedural level generation in games.

An example of a non-randomly generated cave level from Bethesda’s The Elder Scrolls series.

The Need for More Caves

Level design in video games is a time-consuming and difficult task. It’s extremely difficult for humans to hand-craft areas that both look natural and are simultaneously fun to play in. This is particularly true of the multitude of contemporary role-playing games modeled after Dungeons and Dragons, in which players move through a series of areas defeating enemies, collecting items, and developing their character. With a high demand for such games and so many levels in each game, it would save an unfathomable amount of money to have computers generate the levels on the fly. Perhaps more importantly, a game with randomly generated levels inherently has a much higher replay value.

The idea of randomized content generation (often called procedural generation) is not particularly new. It has been around at least since the 1980’s. Back then, computers simply didn’t have enough space to store large, complex levels in memory. To circumvent this problem, video game designers simply generated the world as the player moved through it. This opened up an infinitude of possible worlds for the user to play in, and the seminal example of this is a game called Rogue, which has since inspired series such as Diablo, Dwarf Fortress, and many many others. The techniques used to design these levels have since been refined and expanded into a toolbox of techniques which have become ubiquitous in computer graphics and game development.

We’ll explore more of these techniques in the future, but for now we’ll see how a cellular automaton can be used to procedurally generate two-dimensional cave-like maps.

A Quick Review of Cellular Automata

While the interested reader can read more about cellular automata on this blog, we will give a quick refresher here.

For our purposes here, a 2-dimensional cellular automaton is a grid of cells $ G$, where each cell $ c \in G$ is in one of a fixed number of states, and has a pre-determined and fixed set of neighbors. Then $ G$ is updated by applying a fixed rule to each cell simultaneously, and the process is repeated until something interesting happens or boredom strikes the observer. The most common kind of cellular automaton has only two states, ‘dead’ and ‘alive’ (for us, 0 and 1), and the rule applied to each cell is given as conditions to be ‘born’ or ‘survive’ based on the number of adjacent live cells. This is often denoted “Bx/Sy” where x and y are lists of single-digit numbers. Furthermore, the choice of neighborhood is the eight nearest cells (i.e., including the diagonally-adjacent ones). For instance, B3/S23 is the cellular automaton rule where a dead cell becomes alive if it has exactly three living neighbors, a live cell survives if it has either two or three living neighbors, and every other cell dies (or stays dead). Technically, these are called ‘Life-like automata,’ because they are modest generalizations of Conway’s original Game of Life. We give an example of a B3/S23 cellular automaton initialized by a finite grid of randomly populated cells below. Note that each of the black (live) cells in the resulting stationary objects satisfies the S23 part of the rule, but none of the neighboring white (dead) cells satisfy the B3 condition.

A cellular automaton should really be defined for an arbitrary graph (or more generally, an arbitrary state space). There is really nothing special about a grid other than that it’s easy to visualize. Indeed, some cellular automata are designed for hexagonal grids, others are embedded on a torus, and still others are one- or three-dimensional. Of course, nothing stops automata from existing in arbitrary dimension, or from operating with arbitrary (albeit deterministic) rules, but to avoid pedantry we won’t delve into a general definition here. It would take us into a discussion of discrete dynamical systems (of which there are many, often with interesting pictures).

It All Boils Down to a Simple Rule

Now the particular cellular automaton we will use for cave generation is simply B678/S345678, applied to a random initial grid with a fixed live border. We interpret the live cells as walls, and the dead cells as open space. This rule should intuitively work: walls will stay walls even if more cells are born nearby, but isolated or near-isolated cells will often be removed. In other words, this cellular automaton should ‘smooth out’ a grid arrangement to some extent. Here is an example animation quickly sketched up in Mathematica to witness the automaton in action:

An example cave generated via the automaton rule B678/S345678. The black cells are alive, and the white cells are dead.

As usual, the code to generate this animation (which is only a slight alteration to the code used in our post on cellular automata) is available on this blog’s Github page.
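For readers who prefer not to run Mathematica, the whole procedure fits in a few lines of Python. This is only a sketch of the idea described above, not the blog’s actual code; the grid size, live proportion, and iteration count are illustrative choices:

```python
import random

def generate_cave(rows=30, cols=30, p_live=0.45, iterations=15, seed=None):
    """Run B678/S345678 on a random grid; 1 = wall, 0 = open space.
    The border cells are held alive (solid rock) throughout."""
    rng = random.Random(seed)
    grid = [[1 if r in (0, rows - 1) or c in (0, cols - 1)
             or rng.random() < p_live else 0
             for c in range(cols)] for r in range(rows)]
    for _ in range(iterations):
        new = [row[:] for row in grid]
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                # count live cells among the eight neighbors
                n = sum(grid[r + dr][c + dc]
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)) - grid[r][c]
                if grid[r][c]:
                    new[r][c] = 1 if n >= 3 else 0  # S345678
                else:
                    new[r][c] = 1 if n >= 6 else 0  # B678
        grid = new
    return grid
```

Printing the result with, say, `'#'` for walls and `' '` for open space gives a rough ASCII version of the caves in the animation above.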

This map is already pretty great! It has a number of large open caverns, and they are connected by relatively small passageways. With a bit of imagination, it looks absolutely cavelike!

We should immediately note that there is no guarantee that the resulting regions of whitespace will be connected. We got lucky with this animation, in that there are only two disconnected components, and one is quite small. But in fact one can be left with multiple large caves which have no connecting paths.

Furthermore, we should note the automaton’s rapid convergence to a stable state. Unlike Conway’s Game of Life, in practice this automaton almost always converges within 15 steps, and this author has yet to see any oscillatory patterns. Indeed, they are unlikely to exist because the survival rate is so high and our initial grid has a roughly even proportion of live and dead cells. There is no overpopulation that causes cells to die off, so once a cell is born it will always survive. The only cells that do not survive are those that begin isolated or nearly so. In a sense, B678/S345678 is designed to prune the sparse areas of the grid, and fill in the dense areas by patching up holes.

We should also note that the initial proportion of cells which are alive has a strong effect on the density of the resulting picture.  For the animation we displayed above, we initially chose that 45% of the cells would be live. If we increase that a mere 5%, we get a picture like the following.

A cave generated with the initial proportion of live cells equal to 0.5

As expected, there are many more disconnected caverns. Some game designers prefer a denser grid combined with heuristic methods to connect the caverns. Since our goal is just to explore the mathematical ideas, we will leave this as a parameter in our final program.

Javascript Implementation, and Greater Resolution

One important thing to note is that B678/S345678 doesn’t scale well to fine grid sizes. For instance, if we increase the grid size to 200×200, we get something resembling an awkward camouflage pattern.

A 200×200 grid cave generation. Click the image to enlarge it.

What we really want is a way to achieve the major features of the low-resolution image on a larger grid. Since cellular automata are inherently local manipulations, we should not expect any modification of B678/S345678 to do this for us. Instead, we will use B678/S345678 to create a low-resolution image, increase its resolution manually, and smooth it out with — you guessed it — another cellular automaton! We’ll design this automaton specifically for the purpose of smoothing out corners.

To increase the resolution, we may simply divide the cells into four pieces. The picture doesn’t change, but the total number of cells increases fourfold. There are a few ways to do this programmatically, but the way we chose simply uses the smallest resolution possible, and simulates higher resolution by doing block computations. The interested programmer can view our Javascript program available on this blog’s Github page to see this directly (or view the page source of this post’s interactive simulator).
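The cell-splitting step itself is a one-liner; here is a hedged Python sketch (a helper of our own, not the Javascript program’s implementation):

```python
def double_resolution(grid):
    """Replace each cell with a 2x2 block of identical cells, doubling
    both dimensions without changing the picture."""
    return [[cell for cell in row for _ in (0, 1)]
            for row in grid for _ in (0, 1)]
```

Each application multiplies the total number of cells by four while leaving the image visually unchanged.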

To design a “smoothing automaton,” we should investigate more closely what we need to improve on in the above examples. In particular, once we increase the resolution, we will have a lot of undesirable convex and concave corners. Since a “corner” is simply a block satisfying certain local properties, we can single those out to be removed by an automaton. It’s easy to see that convex corners have exactly 3 live neighbors, so we should not allow those cells to survive. Similarly, the white cell just outside a concave corner has 5 live neighbors, so we should allow that cell to be born. On the other hand, we still want the major properties of our old B678/S345678 to still apply, so we can simply add 5 to the B part and remove 3 from the S part. Lastly, for empirical reasons, we also decide to kill off cells with 4 live neighbors.

And so our final “smoothing automaton” is simply B5678/S5678.

We present this application as an interactive javascript program. Some basic instructions:

  • The “Apply B678/S345678” button does what you’d expect: it applies B678/S345678 to the currently displayed grid. It iterates the automaton 20 times in an animation.
  • The “Apply B5678/S5678” button applies the smoothing automaton, but it does so only once, allowing the user to control the degree of smoothing at the specific resolution level.
  • The “Increase Resolution” button splits each cell into four, and may be applied until the cell size is down to a single pixel.
  • The “Reset” button resets the entire application, creating a new random grid.

We used this program to generate a few interesting looking pictures by varying the order in which we pressed the various buttons (it sounds silly, but it’s an exploration!). First, a nice cave:

An example of a higher resolution cave created with our program. In order to achieve similar results, first apply B678/S345678, and then alternate increasing the resolution and applying B5678/S5678 1-3 times.

We note that this is not perfect. There are some obvious geometric artifacts lingering in this map, mostly in the form of suspiciously straight diagonal lines and flawless circles. Perhaps one might imagine the circles are the bases of stalactites or stalagmites. But on the whole, in terms of keeping the major features of the original automaton present while smoothing out corners, this author thinks B5678/S5678 has done a phenomenal job. Further, in the cellular automaton’s defense, when local rules are applied uniformly across the entire grid, such regularities are bound to occur. That’s just another statement of the non-chaotic nature of B5678/S5678 (in stark contrast to Conway’s Game of Life).

There are various modifications one could perform (or choose not to, depending on the type of game) to make the result more accessible for the player. For instance, one could remove all regions which fit inside a sufficiently small circle, or add connections between the disconnected components at some level of resolution. This would require some sort of connected-component labeling, which is a nontrivial task; current research goes into optimizing connected-component algorithms for large-scale grids. We plan to cover such topics on this blog in the future.
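As a sketch of what such a post-processing pass might look like, here is a simple breadth-first flood fill in Python that labels the connected regions of open space (the naive approach, not the optimized large-scale algorithms mentioned above; names are our own):

```python
from collections import deque

def open_regions(grid):
    """Group the open (0) cells of a cave grid into connected regions,
    using 4-way adjacency so diagonal gaps do not count as passages."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not seen[r][c]:
                # breadth-first search from this unvisited open cell
                queue, region = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 0 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions
```

One could then delete every region below a size threshold, or carve tunnels between the largest regions.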

Another example of a cool picture we created with this application might be considered a more “retro” style of cave.

Apply B678/S345678 once, and increase the resolution as much as possible before applying B5678/S5678 as many times as desired.

We encourage the reader to play around with the program to see what other sorts of creations one can make. As of the time of this writing, changing the initial proportion of live cells (50%) or changing the automaton rules cannot be done in the browser; it requires one to modify the source code. We may implement the ability to control these in the browser given popular demand, but (of course) it would be a wonderful exercise for the intermediate Javascript programmer.

Caves in Three Dimensions

It’s clear that this same method can be extended to a three-dimensional model for generating caverns in a game like Minecraft. While we haven’t personally experimented with three-dimensional cellular automata here on this blog, it’s far from a new idea. Once we reach graphics programming on this blog (think: distant future) we plan to revisit the topic and see what we can do.

Until then!

Numerical Integration

Rectangles, Trapezoids, and Simpson’s

I just wrapped up a semester of calculus TA duties, and I thought it would be fun to revisit the problem of integration from a numerical standpoint. In other words, the goal of this article is to figure out how quickly we can approximate the definite integral of a function $ f:\mathbb{R} \to \mathbb{R}$. Intuitively, a definite integral is the area between the curve $ f$ and the $ x$-axis over some interval, where we count area as negative when $ f(x) < 0$.

If any of my former students are reading this, you should note that we will touch on quite a number of important ideas from future calculus courses, but until you see them in class, the proofs may seem quite mystical.

As usual, the source code for this post is available on this blog’s Github page.

The Baby Integral

Let’s quickly recall the definition of an integral $ \int_a^b f(x) dx$:

Definition: Let $ P = \left \{ [a_i, b_i] \right \}_{i = 1}^n$ be a partition of an interval $ [a,b]$ into $ n$ sub-intervals, let $ \Delta x_i = b_i - a_i$ be the length of each interval, and let $ \widehat{x_i}$ be a chosen point inside $ [a_i,b_i]$ (which we call a tag). Then a Riemann sum of $ f$ from $ a$ to $ b$ is a sum

$ \displaystyle R(f, P) = \sum_{i = 1}^n f(\widehat{x_i}) \Delta x_i$.

Geometrically, a Riemann sum approximates the area under the curve $ f$ by using sufficiently small rectangles whose heights are determined by the tagged points. The terms in the sum above correspond to the areas of the approximating rectangles.

We note that the intervals in question need not have the same lengths, and the points $ \widehat{x_i}$ may be chosen in any haphazard way one wishes. Of course, as we come up with approximations, we will pick the partition and tags very deliberately.

Definition: The integral $ \displaystyle \int_a^b f(x) dx$ is the limit of Riemann sums as the maximum length of any sub-interval in the partition goes to zero. In other words, for a fixed partition $ P$ let $ \delta_P = \max_i(\Delta x_i)$. Then

$ \displaystyle \int_a^b f(x) dx = \lim_{\delta_P \to 0}R(f, P)$

Another way to put this definition is that if you have any sequence of partitions $ P_k$ so that $ \delta_{P_k} \to 0$ as $ k \to \infty$, then the integral is just the limit of Riemann sums for this particular sequence of partitions.

Our first and most naive attempt at computing a definite integral is to interpret the definition quite literally. The official name for it is a left Riemann sum. We constrain our partitions $ P$ so that each sub-interval has the same length, namely $ \Delta x = (b-a)/n$. We choose our tags to be the leftmost points in each interval, so that if we name each interval $ [a_i, b_i]$, we have $ \widehat{x_i} = a_i$. Then we simply use a large enough value of $ n$, and we have a good approximation of the integral.

For this post we used Mathematica (gotta love those animations!), but the code to implement this is quite short in any language:

LeftRiemannSum[f_, n_, a_, b_] :=
 Module[{width = (b-a)/n},
  N[width * Sum[f[a + i*width], {i, 0, n-1}]]]

Note that we may factor the constant “width” out of the sum, since here it does not depend on the interval. The only other detail is that Mathematica leaves all expressions as exact numbers, unless they are wrapped within a call to N[ ], which stands for “numerical” output. In most general languages numerical approximations are the default.
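For comparison, a direct translation into Python (where floating-point arithmetic is the default) might look like this:

```python
def left_riemann_sum(f, n, a, b):
    """Approximate the integral of f on [a, b] using n equal-width
    rectangles, each tagged at its left endpoint."""
    width = (b - a) / n
    return width * sum(f(a + i * width) for i in range(n))
```

For the increasing function $ f(x) = x$ on $ [0,1]$ with $ n = 1000$ this returns approximately 0.4995, an underestimate of the true value $ 1/2$, as we would expect from the discussion below.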

The computational complexity should be relatively obvious, as we require “one” computation per interval, and hence $ n$ computations for $ n$ intervals. (Really, it depends on the cost of evaluating $ f(x)$, but for the sake of complexity we can assume computing each term is constant.) And so this algorithm is $ O(n)$.

However, we should note that our concern is not necessarily computational complexity, but how fast the sum converges. In other words, we want to know how large $ n$ needs to be before we get a certain number of decimal places of accuracy.

For all of our tests and visualizations, we will use the following arbitrarily chosen, but sufficiently complicated function on $ [0, \pi]$:

$ \displaystyle f(x) = 5 \cos(x) \sin^{10}(x) + \frac{1}{5} \cos^9(x) e^{\sqrt{x}}$

Our test function.

For n = 15, we have the following left Riemann sum:

A left Riemann Sum with n = 15

Here’s an animation of the sum as $ n \to 100$:

An animation of a left Riemann sum where n goes from 2 to 100

Unfortunately, it seems that on the regions where $ f$ is increasing, that portion of the Riemann sum is an underestimate, and where $ f$ is decreasing, the Riemann sum is an overestimate. Eventually this error will get small enough, but we note that even for $ n = 10,000$, the sum requires almost 7 seconds to compute, and only achieves 3 decimal places of accuracy! From a numerical standpoint, left Riemann sums converge slower than paint dries and are effectively useless. We can certainly do better.

More Geometric Intuition

Continuing with the geometric ideas, we could conceivably pick a better tag in each sub-interval. Instead of picking the left (or right) endpoint, why not pick the midpoint of each sub-interval? Then the rectangles will be neither overestimates nor underestimates, and hence the sums will be inherently more accurate. The change from a left Riemann sum to a midpoint Riemann sum is trivial enough to be an exercise for the reader (remember, the source code for this post is available on this blog’s Github page). We leave it as such, and turn to more interesting methods.

Instead of finding the area of a rectangle under the curve, let’s use a trapezoid whose endpoints are both on the curve. (Recall that a trapezoid with parallel sides of lengths $ y_1, y_2$ and width $ w$ has area $ \frac{w}{2}(y_1 + y_2)$.) We call this a trapezoid sum, and a first attempt at the code is not much different from the left Riemann sum:

TrapezoidSum[f_, n_, a_, b_] :=
 Module[{width = (b-a)/n},
  N[width * Sum[1/2 (f[a + i*width] + f[a + (i+1)*width]),
     {i, 0, n-1}]]]

Here is a picture for $ n = 15$:

A trapezoid sum for n = 15

And an animation as $ n \to 100$:

An animation of the trapezoid sum as n goes to 100

The animation hints that this method converges much faster than left Riemann sums, and indeed we note that for $ n = 100$, the sum requires a mere 0.16 seconds, yet achieves the three decimal places of accuracy for which the left Riemann sum required $ n = 10,000$. This method appears to be a drastic improvement, and indeed plotting the accuracy of left Riemann sums against trapezoid sums gives a nice indication:

Errors of the left Riemann sums (blue, positive) and the trapezoid sums (red, negative) for increasing values of n.

Now that is quite an improvement!

Going back to the code, the computational complexity is again $ O(n)$, but we note that at a micro-efficiency level, we are being a bit wasteful. We call the function $ f$ twice for each trapezoid, even when adjacent trapezoids share edges and hence base heights. If a call to $ f$ is relatively expensive (and it just so happens that calls to Sin, Cos, Exp are somewhat expensive), then this becomes a significant issue. We leave it as an exercise to the adventurous reader to optimize the above code, so that no superfluous calls to $ f$ are made (hint: surround the function $ f$ with a cache, so that you can reuse old computations).

Before we move on to our final method for this post, we will take a not-so-short aside to give a proof of how accurate the trapezoid rule is. In fact, we will give an upper bound on the error of trapezoid sums based solely on $ n$ and easily computed properties of $ f$. In its raw form, we have the following theorem:

Theorem: Supposing $ f: [a,b] \to \mathbb{R}$ is thrice differentiable, let $ h = (b-a)/n$, let $ a_i$ be $ a + (i-1)h$, and let $ T_n(f)$ be a trapezoidal approximation of $ f$ with $ n$ trapezoids, as above. Then

$ \displaystyle T_n(f) - \int_{a}^b f(x) dx = \frac{b-a}{12} h^2 f''(c)$ for some $ c \in [a,b]$

Proof. Let $ \varphi_i : [0,h] \to \mathbb{R}$ be defined by

$ \displaystyle \varphi_i(t) = \frac{t}{2}(f(a_i) + f(a_i + t)) – \int_{a_i}^{a_i + t} f(x) dx$

We claim that $ \sum \limits_{i=1}^n \varphi_i(h) = T_n(f) - \int_a^b f(x) dx$, and one can see this by simply expanding the sum according to the definition of $ \varphi_i$. Now we turn to the question of bounding $ \varphi_i$ on $ [0,h]$.

We note $ \varphi_i(0) = 0$, and by the fundamental theorem of calculus:

$ \displaystyle \varphi_i'(t) = \frac{1}{2}(f(a_i) - f(a_i + t)) + \frac{t}{2}f'(a_i + t)$

Furthermore, $ \varphi_i'(0) = 0$ as is evident by the above equation, and differentiating again gives us

$ \displaystyle \varphi_i''(t) = \frac{t}{2}f''(a_i + t)$

With again $ \varphi_i''(0) = 0$.

As $ f''$ is continuous, the extreme-value theorem says there exist bounds for $ f''$ on $ [a,b]$. We call the lower one $ m$ and the upper $ M$, so that

$ \displaystyle \frac{1}{2}mt \leq \varphi_i”(t) \leq \frac{1}{2}Mt$

Integrating twice on $ [0,t]$ (the constants of integration vanish, since $ \varphi_i(0) = \varphi_i'(0) = 0$), we get

$ \displaystyle \frac{1}{12}mt^3 \leq \varphi_i(t) \leq \frac{1}{12}Mt^3$

Then the sum of all the $ \varphi_i(h)$ may be bounded by

$ \displaystyle \frac{n}{12}mh^3 \leq \sum \limits_{i=1}^n \varphi_i(h) \leq \frac{n}{12}Mh^3$

The definition of $ h$ and some simplification gives

$ \displaystyle \frac{b-a}{12}mh^2 \leq \sum \limits_{i=1}^n \varphi_i(h) \leq \frac{(b-a)}{12}Mh^2$

And from here we note that by continuity (the intermediate value theorem), $ f''(x)$ obtains every value between its bounds $ m, M$, so that for some $ c \in [a,b]$, $ f''(c)$ obtains the value needed to make the middle term equal to $ \frac{b-a}{12}h^2 f''(c)$, as desired.

As a corollary, we can bound the magnitude of the error by using the larger of $ |m|, |M|$, to obtain a fixed value $ B$ such that

$ \displaystyle \left | T_n(f) - \int_a^b f(x) dx \right | \leq \frac{(b-a)}{12}h^2 B$

$ \square$

So what we’ve found is that the error of our trapezoidal approximation (in absolute value) is proportional to $ 1/n^2$. There is a similar theorem bounding the error of a left Riemann sum, and we leave it as an exercise to the reader to find and prove it (hint: use a similar argument, or look at the Taylor expansion of $ f$ at the left endpoint).
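We can sanity-check the $ 1/n^2$ rate numerically. The following Python snippet (our own check, not part of the blog’s code) uses $ \sin$ on $ [0, \pi]$, whose integral is exactly 2, and watches the error drop by roughly a factor of 4 each time $ n$ doubles:

```python
import math

def trapezoid_sum(f, n, a, b):
    """Composite trapezoid rule with n equal sub-intervals."""
    h = (b - a) / n
    return h * (f(a) / 2 + f(b) / 2 + sum(f(a + i * h) for i in range(1, n)))

# exact value of the integral of sin on [0, pi] is 2
errors = [abs(trapezoid_sum(math.sin, n, 0, math.pi) - 2) for n in (10, 20, 40)]
# doubling n should cut the error by a factor of about 2^2 = 4
ratios = [errors[i] / errors[i + 1] for i in range(2)]
```

Both ratios come out very close to 4, matching the $ h^2$ factor in the theorem.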

Interpolating Polynomials and Simpson’s Rule

One way to interpret the left Riemann sum is that we estimate the integral by integrating a step function which is close to the actual function. For the trapezoidal rule we estimate the function by piecewise lines, and integrate that. Naturally, our next best approximation would be estimating the function by piecewise quadratic polynomials. To pick the right ones, we should investigate the idea of an interpolating polynomial.

Following the same pattern, given one point there is a unique constant function (degree zero polynomial) passing through that point. Given two points there is a unique line (degree one) which contains both. We might expect that given $ n+1$ points in the plane with distinct $ x$-coordinates, there is a unique polynomial of degree at most $ n$ passing through them.

For three points, $ (x_1, y_1), (x_2, y_2), (x_3, y_3)$ we may concoct a working curve as follows (remember $ x$ is the variable here)

$ \displaystyle \frac{(x-x_1)(x-x_2)}{(x_3-x_1)(x_3-x_2)}y_3 + \frac{(x-x_1)(x-x_3)}{(x_2-x_1)(x_2-x_3)}y_2 + \frac{(x-x_2)(x-x_3)}{(x_1-x_2)(x_1-x_3)}y_1$

Notice that each of the terms is second degree, and plugging in any one of the three given $ x$ values annihilates two of the terms and gives precisely the right $ y$ value in the remaining term. We may extend this in the obvious way to establish the interpolating polynomial for a given set of points. A proof of uniqueness is quite short: if $ p, q$ are two such interpolating polynomials, then $ p-q$ has $ n+1$ roots but degree at most $ n$, and a nonzero polynomial of degree at most $ n$ has at most $ n$ roots, so $ p-q$ must be the zero polynomial.
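The three-point formula above generalizes directly. Here is a small Python sketch (a generic implementation with names of our own choosing) that builds the Lagrange interpolating polynomial through any list of points:

```python
def interpolating_polynomial(points):
    """Return the Lagrange interpolating polynomial through the given
    (x, y) points, as a callable function of x."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            # basis term: equals y_i at x_i and 0 at every other x_j
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p
```

For the points $ (0,1), (1,2), (2,5)$, which lie on $ y = x^2 + 1$, the resulting quadratic evaluates to 10 at $ x = 3$.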

If we wish to approximate $ f$ with a number of these quadratics, we can simply integrate the polynomials which interpolate $ (a_i, f(a_i)), (a_{i+1}, f(a_{i+1})), (a_{i+2}, f(a_{i+2}))$, and do this for $ i = 1, 3, \dots, n-2$. This is called Simpson’s Rule, and it gives the next level of accuracy for numerical integration.

With a bit of algebra, we may write the integrals of the interpolating polynomials in terms of the points themselves. Without loss of generality, assume the three points are centered at 0, i.e. the points are $ (- \delta, y_1), (0, y_2), (\delta, y_3)$. This is fine because shifting the function left or right does not change the integral. Then the interpolating polynomial is (as above),

$ \displaystyle p(x) = \frac{(x+\delta)(x)}{2 \delta^2}y_3 + \frac{(x+\delta)(x-\delta)}{-\delta^2}y_2 + \frac{(x)(x-\delta)}{2 \delta^2}y_1$.

Integrating this over $ [-\delta, \delta]$ and simplifying gives the quantity

$ \displaystyle \int_{-\delta}^{\delta} p(x) dx = \frac{\delta}{3}(y_1 + 4y_2 + y_3)$.

And so we may apply this to each consecutive pair of sub-intervals (pairing up the $ n$ sub-intervals, which requires $ n$ to be even) to get our quadratic approximation of the entire interval. Note, though, that as we range over the pairs, each interior junction point is shared by two interpolating quadratics and hence counted twice, while the remaining interior points each pick up the coefficient 4 from the middle term. Writing $ \delta = (b-a)/n$, our entire approximation is

$ \displaystyle \frac{\delta}{3} \left( f(a) + 4f(a+\delta) + 2f(a+ 2\delta) + 4f(a + 3\delta) + \dots + 4f(a + (n-1) \delta) + f(b) \right)$

Translating this into code is as straightforward as one could hope:

(* n must be even, since the sub-intervals are consumed in pairs *)
SimpsonsSum[f_, n_, a_, b_] :=
 Module[{width = (b-a)/n, coefficient},
  coefficient[i_?EvenQ] = 2;
  coefficient[i_?OddQ] = 4;
  N[width/3 * (f[a] + f[b] +
     Sum[coefficient[i] * f[a + i*width], {i, 1, n-1}])]]
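An equivalent Python version (a direct reimplementation of the rule just described, not code from this blog’s repository) makes the alternating 4/2 coefficient pattern explicit:

```python
def simpsons_sum(f, n, a, b):
    """Composite Simpson's rule with n equal sub-intervals; n must be
    even, since the sub-intervals are consumed in pairs."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    width = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # odd interior points get coefficient 4, even ones get 2
        total += (4 if i % 2 == 1 else 2) * f(a + i * width)
    return width / 3 * total
```

Since the pieces are quadratics, the rule turns out to be exact on polynomials of degree up to three; for instance, integrating $ x^2$ on $ [0,3]$ with $ n = 4$ gives exactly 9.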

As usual, here is a picture for $ n = 8$:

Simpson’s Rule for n=8 (4 interpolated quadratics)

And an animation showing the convergence:

An animation showing the convergence of Simpson’s Rule

There is a similar bound for Simpson’s Rule as there was for the trapezoid sums. Here it is:

Theorem: Supposing $ f$ is five times differentiable, and let $ B$ be the maximum of the values $ |f^{(4)}(c)|$ for $ c$ in $ [a,b]$. Then the magnitude of the difference between the true integral $ \int_a^b f(x) dx$ and the Simpson’s Rule approximation with $ n$ interpolated polynomials is at most

$ \displaystyle \frac{(b-a)^5}{180 n^4}B$

The proof is similar in nature to the proof for trapezoid sums, but requires an annoying amount of detail. It’s quite difficult to find a complete proof of the error estimate for reference. This is probably because it is a special case of a family of Newton-Cotes formulas. We leave the proof as a test of gumption, and provide a reference to a slightly more advanced treatment by Louis A. Talman.

As an easier cop-out, we show a plot of the convergence of Simpson’s Rule versus the trapezoid sums:

The error convergence of Simpson’s Rule (red, above), versus the trapezoid sums (blue, below) for increasing values of n.

Judging by the graph, the improvement from trapezoid sums to Simpson’s rule is about as drastic as the improvement from Riemann sums to trapezoid sums.

Final Thoughts, and Future Plans

There are a host of other integral approximations which fall in the same category as what we’ve done here. Each is increasingly accurate, but requires a bit more computation in each step, and the constants involved in the error bound are based on larger and larger derivatives. Unfortunately, in practice it may be hard to bound large derivatives of an arbitrary function, so confidence in the error bounds for simpler methods might be worth the loss of efficiency in some cases.

Furthermore, we always assumed that the length of each sub-interval was uniform across the entire partition. It stands to reason that some functions are wildly misbehaved in small intervals, but well-behaved elsewhere. Consider, for example, $ \sin(1/x)$ on $ (0, \pi)$. It logically follows that we would not need small intervals for the sub-intervals close to $ \pi$, but we would need increasingly small intervals close to 0. We will investigate such methods next time.

We will also investigate the trials and tribulations of multidimensional integrals. If we require 100 evenly spaced points to get a good approximation of a one-dimensional integral, then we would require $ 100^2$ evenly spaced points for a two-dimensional integral, and once we start working on interesting problems in 36,000-dimensional space, integrals will require $ 100^{36,000}$ evenly spaced points, which is far greater than the number of atoms in the universe (i.e., far exceeds the number of bits available in all computers, and hence cannot be computed). We will investigate alternative methods for evaluating higher-dimensional integrals, at least one of which will be based on random sampling.

Before we close, we note that even today the question of how to approximate integrals is considered important research. Within the last twenty years there have been papers generalizing these rules to arbitrary spaces, and significant (“clever”) applications to the biological sciences. Here are two examples: Trapezoidal rules in Sobolev spaces, and Trapezoidal rule and Simpson’s rule for “rough” continuous functions. Of course, as we alluded to, when dimension goes up integrals become exponentially harder to compute. As our world is increasingly filled with high-dimensional data, rapid methods for approximating integrals in arbitrary dimension are worth quite a bit of money and fame.

Until next time!