Properties of Morphisms

This post is mainly mathematical. We left it out of our introduction to categories for brevity, but we should lay these definitions down and some examples before continuing on to universal properties and doing more computation. The reader should feel free to skip this post and return to it later when the words “isomorphism,” “monomorphism,” and “epimorphism” come up again. Perhaps the most important part of this post is the description of an isomorphism.

Isomorphisms, Monomorphisms, and Epimorphisms

Perhaps the most important paradigm shift in category theory is the focus on morphisms as the main object of study. In particular, category theory stipulates that the only knowledge one can gain about an object is in how it relates to other objects. Indeed, this is true in nearly all fields of mathematics: in groups we consider all isomorphic groups to be the same. In topology, homeomorphic spaces are not distinguished. The list goes on. The only way to do determine if two objects are “the same” is by finding a morphism with special properties. Barry Mazur gives a more colorful explanation by considering the meaning of the number 5 in his essay, “When is one thing equal to some other thing?” The point is that categories, more than existing to be a “foundation” for all mathematics as a formal system (though people are working to make such a formal system), exist primarily to “capture the essence” of mathematical discourse, as Mazur puts it. A category defines objects and morphisms, but literally all of the structure of a category lies in its morphisms. And so we study them.

The strongest kind of morphism we can consider is an isomorphism. An isomorphism is the way we say two objects in a category are “the same.” We don’t usually care whether two objects are equal, but rather whether some construction is unique up to isomorphism (more on that when we talk of universal properties). The choices made in defining morphisms in a particular category allow us to strengthen or weaken this idea of “sameness.”

Definition: A morphism $f : A \to B$ in a category $\mathbf{C}$ is an isomorphism if there exists a morphism $g: B \to A$ so that both ways to compose $f$ and $g$ give the identity morphisms on the respective objects. That is,

$gf = 1_A$ and $fg = 1_B$.

The most basic (usually obvious, but sometimes very deep) question in approaching a new category is to ask what the isomorphisms are. Let us do this now.

In $\mathbf{Set}$ the morphisms are set-functions, and it is not hard to see that any two sets of equal cardinality have a bijection between them. As all bijections have two-sided inverses, two objects in $\mathbf{Set}$ are isomorphic if and only if they have the same cardinality. For example, all sets of size 10 are isomorphic. This is quite a weak notion of “sameness.” In contrast, there is a wealth of examples of groups of equal cardinality which are not isomorphic (the smallest example has cardinality 4). On the other end of the spectrum, a poset category $\mathbf{Pos}_X$ has no isomorphisms except for the identity morphisms. The poset categories still have useful structure, but (as with objects within a category) the interesting structure is in how a poset category relates to other categories. This will become clearer later when we look at functors, but we just want to dissuade the reader from ruling out poset categories as uninteresting due to a lack of interesting morphisms.

Consider the category $\mathbf{C}$ of ML types with ML functions as morphisms. An isomorphism in this category would be a function which has a two-sided inverse. Can the reader think of such a function?

Let us now move on to other, weaker properties of morphisms.

Definition: A morphism $f: A \to B$ is a monomorphism if for every object $C$ and every pair of morphisms $g,h: C \to A$ the condition $fg = fh$ implies $g = h$.

The reader should parse this notation carefully, but truly think of it in terms of the following commutative diagram:

Whenever this diagram commutes and $f$ is a monomorphism, then we conclude (by definition) that $g=h$. Remember that a diagram commuting just means that all ways to compose morphisms (and arrive at morphisms with matching sources and targets) result in an identical morphism. In this diagram, commuting is the equivalent of claiming that $fg = fh$, since there are only two nontrivial ways to compose.

The idea is that monomorphisms allow one to “cancel” $f$ from the left side of a composition (although, confusingly, this means the cancelled part is on the right hand side of the diagram).

The corresponding property for cancelling on the right is defined identically.

Definition: A morphism $f: A \to B$ is an epimorphism if for every object $C$ and every pair of morphism $g,h: B \to C$ the condition $gf = hf$ implies $g = h$.

Again, the relevant diagram.

Whenever $f$ is an epimorphism and this diagram commutes, we can conclude $g=h$.

Now one of the simplest things one can do when considering a category is to identify the monomorphisms and epimorphisms. Let’s do this for a few important examples.

Monos and Epis in Various Categories

In the category $\mathbf{Set}$, monomorphisms and epimorphisms correspond to injective and surjective functions, respectively. Lets see why this is the case for monomorphisms. Recall that an injective function $f$ has the property that $f(x) = f(y)$ implies $x=y$. With this property we can show $f$ is a monomorphism because if $f(g(x)) = f(h(x))$ then the injective property gives us immediately that $g(x) = h(x)$. Conversely, if $f$ is a monomorphism and $f(x) = f(y)$, we will construct a set $C$ and two convenient functions $g, h: C \to A$ to help us show that $x=y$. In particular, pick $C$ to be the one point set $\left \{ c \right \}$, and define $g(c) = x, h(c) = y$. Then as functions $fg = fh$. Indeed, there is only one value in the domain, so saying this amounts to saying $f(x) = fg(c) = fh(c) = f(y)$, which we know is true by assumption. By the monomorphism property $g = h$, so $x = g(c) = h(c) = y$.

Now consider epimorphisms. It is clear that a surjective map is an epimorphism, but the converse is a bit trickier. We prove by contraposition. Instead of now picking the “one-point set,” for our $C$, we must choose something which is one element bigger than $B$. In particular, define $g, h : B \to B'$, where $B'$ is $B$ with one additional point $x$ added (which we declare to not already be in $B$). Then if $f$ is not surjective, and there is some $b_0 \in B$ which is missed by $f$, we define $g(b_0) = x$ and $g(b) = b$ otherwise. We can also define $h$ to be the identity on $B$, so that $gf = hf$, but $g \neq h$. So epimorphisms are exactly the surjective set-maps.

There is one additional fact that makes the category of sets well-behaved: a morphism in $\mathbf{Set}$ is an isomorphism if and only if it is both a monomorphism and an epimorphism. Indeed, isomorphisms are set-functions with two-sided inverses (bijections) and we know from classical set theory that bijections are exactly the simultaneous injections and surjections. A warning to the reader: not all categories are like this! We will see in a moment an example of a nontrivial category in which isomorphisms are not the same thing as simultaneous monomorphisms and epimorphisms.

The category $\mathbf{Grp}$ is very similar to $\mathbf{Set}$ in regards to monomorphisms and epimorphisms. The former are simply injective group homomorphisms, while the latter are surjective group homomorphisms. And again, a morphisms is an isomorphism if and only if it is both a monomorphism and an epimorphism. We invite the reader to peruse the details of the argument above and adapt it to the case of groups. In both cases, the hard decision is in choosing $C$ when necessary. For monomorphisms, the “one-point group” does not work because we are constrained to send the identity to the identity in any group homomorphism. The fortuitous reader will avert their eyes and think about which group would work, and otherwise we suggest trying $C = \mathbb{Z}$. After completing the proof, the reader will see that the trick is to find a $C$ for which only one “choice” can be made. For epimorphisms, the required $C$ is a bit more complex, but we invite the reader to attempt a proof to see the difficulties involved.

Why do these categories have the same properties but they are acquired in such different ways? It turns out that although these proofs seem different in execution, they are the same in nature, and they follow from properties of the category as a whole. In particular, the “one-point object” (a singleton set for $\mathbf{Set} and$latex \mathbb{Z}$for $\mathbf{Grp}$) is more categorically defined as the “free object on one generator.” We will discuss this more when we get to universal properties, but a “free object on $n$ generators” is roughly speaking an object $A$ for which any morphism with source $A$ must make exactly $n$ “choices” in its definition. With sets that means $n$ choices for the images of elements, for groups that means $n$ consistent choices for images of group elements. On the epimorphism side, the construction is a sort of “disjoint union object” which is correctly termed a “coproduct.” But momentarily putting aside all of this new and confusing terminology, let us see some more examples of morphisms in various categories. Our recent primer on rings was well-timed, because the category $\mathbf{Ring}$ of rings (with identity) is an example of a not-so-well-behaved category. As with sets and groups, we do have that monomorphisms are equivalent to injective ring homomorphisms, but the argument is trickier than it was for groups. It is not obvious which “convenient” object $C$ to choose here, since maps $\mathbb{Z} \to R$ yield no choices: 1 maps to 1, 0 maps to 0, and the properties of a ring homomorphism determine everything else (in fact, the abelian group structure and the fact that units are preserved is enough). This makes $\mathbb{Z}$ into what’s called an “initial object” in $\mathbf{Ring}$; more on that when we study universal properties. In fact, we invite the reader to return to this post after we talk about the universal property of polynomial rings. It turns out that $\mathbb{Z}[x]$ is a suitable choice for $C$, and the “choice” made is where to send the indeterminate $x$. On the other hand, things go awry when trying to apply analogous arguments to epimorphisms. While it is true that every surjective ring homomorphism is an epimorphism (it is already an epimorphism in $\mathbf{Set}$, and the argument there applies), there are ring epimorphisms which are not isomorphisms! Consider the inclusion map of rings $i : \mathbb{Z} \to \mathbb{Q}$. The map $i$ is not surjective, but it is an epimorphism. Suppose $g, h : \mathbb{Q} \to R$ are two parallel ring morphisms, and they agree on $\mathbb{Z}$ (they will always do so, since there is only one ring homomorphism $\mathbb{Z} \to R$). Then $g,h$ must also agree on $\mathbb{Q}$, because if $p,q \in \mathbb{Z}$ with $q \neq 0$, then $\displaystyle g(p/q) = g(p)g(q^{-1}) = g(p)g(q)^{-1} = h(p)h(q)^{-1} = h(p/q)$ And so, the category of rings is a very naturally occurring instance of a category which has morphisms that are both epimorphisms and monomorphisms, but not isomorphisms. There are instances in which monomorphisms and epimorphisms are trivial. Take, for instance any poset category. There is at most one morphism between any two objects, and so the conditions for an epimorphism and monomorphism vacuously hold. This is an extreme example of a time when simultaneous monomorphisms and epimorphisms are not the same thing as isomorphisms! The only isomorphisms in a poset category are the identity morphisms. Morals about Good and Bad Categories The inspection of epimorphisms and monomorphisms is an important step in the analysis of a category. It gives one insight into how “well-behaved” the category is, and picks out the objects which are special either for their useful properties or confounding trickery. This reminds us of a quote of Alexander Grothendieck, one of the immortal gods of mathematics who popularized the use of categories in mainstream mathematics. A good category containing some bad objects is preferable to a bad category containing only good objects. I suppose the thesis here is that the having only “good” objects yields less interesting and useful structure, and that one should be willing to put up with the bad objects in order to have access to that structure. Next time we’ll jump into a discussion of universal properties, and start constructing some programs to prove that various objects satisfy them. Until then! “Dog” Print Available for Sale My girlfriend and I decided we want a print of my recent artistic creation from the post Bezier Curves and Picasso. I set up an account at Society6, which does professional art printing (framed or unframed). I can’t imagine anyone beside me would be interested in including this in their art collection, but just in case it’s available for sale here. I plan to put it below my print of Picasso’s “Dog.” Donning my pretentious art-snob hat, the juxtaposition of two abstract representations of the same idea adds a profound depth to the work as a whole. Warning: I have not yet ordered this print, so I cannot attest to the quality (perhaps it’s pixelated and I need to produce a higher quality version). I will be travelling to conferences and doing last-minute preparations for a preliminary exam over the next three weeks, so I won’t be ordering the print until June. If people are interested, leave a comment and I will send a notification via Twitter/Google+/Facebook with my thoughts once I get it. Disclosure: I receive a$5.00 commission on any purchase. If you want a more direct way to support this blog, consider donating via Paypal.

Here’s a screenshot from Society6′s automatically generated preview of one framing option. Looks pretty cool

Bezier Curves and Picasso

Pablo Picasso in front of The Kitchen, photo by Herbert List.

Simplicity and the Artist

Some of my favorite of Pablo Picasso’s works are his line drawings. He did a number of them about animals: an owl, a camel, a butterfly, etc. This piece called “Dog” is on my wall:

These paintings are extremely simple but somehow strike the viewer as deeply profound. They give the impression of being quite simple to design and draw. A single stroke of the hand and a scribbled signature, but what a masterpiece! It simultaneously feels like a hasty afterthought and a carefully tuned overture to a symphony of elegance. In fact, we know that Picasso’s process was deep. For example, in 1945-1946, Picasso made a series of eleven drawings (lithographs, actually) showing the progression of his rendition of a bull. The first few are more or less lifelike, but as the series progresses we see the bull boiled down to its essence, the final painting requiring a mere ten lines. Along the way we see drawings of a bull that resemble some of Picasso’s other works (number 9 reminding me of the sculpture at Daley Center Plaza in Chicago). Read more about the series of lithographs here.

Picasso’s, “The Bull.” Photo taken by Jeremy Kun at the Art Institute of Chicago in 2013. Click to enlarge.

Now I don’t pretend to be a qualified artist (I couldn’t draw a bull to save my life), but I can recognize the mathematical aspects of his paintings, and I can write a damn fine program. There is one obvious way to consider Picasso-style line drawings as a mathematical object, and it is essentially the Bezier curve. Let’s study the theory behind Bezier curves, and then write a program to draw them. The mathematics involved requires no background knowledge beyond basic algebra with polynomials, and we’ll do our best to keep the discussion low-tech. Then we’ll explore a very simple algorithm for drawing Bezier curves, implement it in Javascript, and recreate one of Picasso’s line drawings as a sequence of Bezier curves.

The Bezier Curve and Parameterizations

When asked to conjure a “curve” most people (perhaps plagued by their elementary mathematics education) will either convulse in fear or draw part of the graph of a polynomial. While these are fine and dandy curves, they only represent a small fraction of the world of curves. We are particularly interested in curves which are not part of the graphs of any functions.

Three French curves.

For instance, a French curve is a physical template used in (manual) sketching to aid the hand in drawing smooth curves. Tracing the edges of any part of these curves will usually give you something that is not the graph of a function. It’s obvious that we need to generalize our idea of what a curve is a bit. The problem is that many fields of mathematics define a curve to mean different things.  The curves we’ll be looking at, called Bezier curves, are a special case of  single-parameter polynomial plane curves. This sounds like a mouthful, but what it means is that the entire curve can be evaluated with two polynomials: one for the $x$ values and one for the $y$ values. Both polynomials share the same variable, which we’ll call $t$, and $t$ is evaluated at real numbers.

An example should make this clear. Let’s pick two simple polynomials in $t$, say $x(t) = t^2$ and $y(t) = t^3$. If we want to find points on this curve, we can just choose values of $t$ and plug them into both equations. For instance, plugging in $t = 2$ gives the point $(4, 8)$ on our curve. Plotting all such values gives a curve that is definitely not the graph of a function:

But it’s clear that we can write any single-variable function $f(x)$ in this parametric form: just choose $x(t) = t$ and $y(t) = f(t)$. So these are really more general objects than regular old functions (although we’ll only be working with polynomials in this post).

Quickly recapping, a single-parameter polynomial plane curve is defined as a pair of polynomials $x(t), y(t)$ in the same variable $t$. Sometimes, if we want to express the whole gadget in one piece, we can take the coefficients of common powers of $t$ and write them as vectors in the $x$ and $y$ parts. Using the $x(t) = t^2, y(t) = t^3$ example above, we can rewrite it as

$\mathbf{f}(t) = (0,1) t^3 + (1,0) t^2$

Here the coefficients are points (which are the same as vectors) in the plane, and we represent the function $f$ in boldface to emphasize that the output is a point. The linear-algebraist might recognize that pairs of polynomials form a vector space, and further combine them as $(0, t^3) + (t^2, 0) = (t^2, t^3)$. But for us, thinking of points as coefficients of a single polynomial is actually better.

We will also restrict our attention to single-parameter polynomial plane curves for which the variable $t$ is allowed to range from zero to one. This might seem like an awkward restriction, but in fact every finite single-parameter polynomial plane curve can be written this way (we won’t bother too much with the details of how this is done). For the purpose of brevity, we will henceforth call a “single-parameter polynomial plane curve where $t$ ranges from zero to one” simply a “curve.”

Now there are some very nice things we can do with curves. For instance, given any two points in the plane $P = (p_1, p_2), Q = (q_1, q_2)$ we can describe the straight line between them as a curve: $\mathbf{L}(t) = (1-t)P + tQ$. Indeed, at $t=0$ the value $\mathbf{L}(t)$ is exactly $P$, at $t=1$ it’s exactly $Q$, and the equation is a linear polynomial in $t$. Moreover (without getting too much into the calculus details), the line $\mathbf{L}$ travels at “unit speed” from $P$ to $Q$. In other words, we can think of $\mathbf{L}$ as describing the motion of a particle from $P$ to $Q$ over time, and at time $1/4$ the particle is a quarter of the way there, at time $1/2$ it’s halfway, etc. (An example of a straight line which doesn’t have unit speed is, e.g. $(1-t^2) P + t^2 Q$.)

More generally, let’s add a third point $R$. We can describe a path which goes from $P$ to $R$, and is “guided” by $Q$ in the middle. This idea of a “guiding” point is a bit abstract, but computationally no more difficult. Instead of travelling from one point to another at constant speed, we want to travel from one line to another at constant speed. That is, call the two curves describing lines from $P \to Q$ and $Q \to R$ $\mathbf{L}_1, \mathbf{L_2}$, respectively. Then the curve “guided” by $Q$ can be written as a curve

$\displaystyle \mathbf{F}(t) = (1-t)\mathbf{L}_1(t) + t \mathbf{L}_2(t)$

Multiplying this all out gives the formula

$\displaystyle \mathbf{F}(t) = (1-t)^2 P + 2t(1-t)Q + t^2 R$

We can interpret this again in terms of a particle moving. At the beginning of our curve the value of $t$ is small, and so we’re sticking quite close to the line $\mathbf{L}_1$ As time goes on the point $\mathbf{F}(t)$ moves along the line between the points $\mathbf{L}_1(t)$ and $\mathbf{L}_2(t)$, which are themselves moving. This traces out a curve which looks like this

This screenshot was taken from a wonderful demo by data visualization consultant Jason Davies. It expresses the mathematica idea quite superbly, and one can drag the three points around to see how it changes the resulting curve. One should play with it for at least five minutes.

The entire idea of a Bezier curve is a generalization of this principle: given a list $P_0, \dots, P_n$ of points in the plane, we want to describe a curve which travels from the first point to the last, and is “guided” in between by the remaining points. A Bezier curve is a realization of such a curve (a single-parameter polynomial plane curve) which is the inductive continuation of what we described above: we travel at unit speed from a Bezier curve defined by the first $n-1$ points in the list to the curve defined by the last $n-1$ points. The base case is the straight-line segment (or the single point, if you wish). Formally,

Definition: Given a list of points in the plane $P_0, \dots, P_n$ we define the degree $n-1$ Bezier curve recursively as

\begin{aligned} \mathbf{B}_{P_0}(t) &= P_0 \\ \mathbf{B}_{P_0 P_1 \dots P_n}(t) &= (1-t)\mathbf{B}_{P_0 P_1 \dots P_{n-1}} + t \mathbf{B}_{P_1P_2 \dots P_n}(t) \end{aligned}

We call $P_0, \dots, P_n$ the control points of $\mathbf{B}$.

While the concept of travelling at unit speed between two lower-order Bezier curves is the real heart of the matter (and allows us true computational insight), one can multiply all of this out (using the formula for binomial coefficients) and get an explicit formula. It is:

$\displaystyle \mathbf{B}_{P_0 \dots P_n} = \sum_{k=0}^n \binom{n}{k}(1-t)^{n-k}t^k P_i$

And for example, a cubic Bezier curve with control points $P_0, P_1, P_2, P_3$ would have equation

$\displaystyle (1-t)^3 P_0 + 3(1-t)^2t P_1 + 3(1-t)t^2 P_2 + t^3 P_3$

Higher dimensional Bezier curves can be quite complicated to picture geometrically. For instance, the following is a fifth-degree Bezier curve (with six control points).

A degree five Bezier curve, credit Wikipedia.

The additional line segments drawn show the recursive nature of the curve. The simplest are the green points, which travel from control point to control point. Then the blue points travel on the line segments between green points, the pink travel along the line segments between blue, the orange between pink, and finally the red point travels along the line segment between the orange points.

Without the recursive structure of the problem (just seeing the curve) it would be a wonder how one could actually compute with these things. But as we’ll see, the algorithm for drawing a Bezier curve is very natural.

Bezier Curves as Data, and de Casteljau’s Algorithm

We will now derive and implement the algorithm for painting a Bezier curve to a screen using only the ability to draw straight lines. For simplicity, we’ll restrict our attention to degree-three (cubic) Bezier curves. Indeed, every Bezier curve can be written as a combination of cubic curves via the recursive definition, and in practice cubic curves balance computational efficiency and expressiveness. All of the code we present in this post will be in Javascript, and is available on this blog’s Google code page.

So then a cubic Bezier curve is represented in a program by a list of four points. For example,

var curve = [[1,2], [5,5], [4,0], [9,3]];

Most graphics libraries (including the HTML5 canvas standard) provide a drawing primitive that can output Bezier curves given a list of four points. But suppose we aren’t given such a function. Suppose that we only have the ability to draw straight lines. How would one go about drawing an approximation to a Bezier curve? If such an algorithm exists (it does, and we’re about to see it) then we could make the approximation so fine that it is visually indistinguishable from a true Bezier curve.

The key property of Bezier curves that allows us to come up with such an algorithm is the following:

Any cubic Bezier curve $\mathbf{B}$ can be split into two, end to end,
which together trace out the same curve as $\mathbf{B}$.

Let see exactly how this is done. Let $\mathbf{B}(t)$ be a cubic Bezier curve with control points $P_0, P_1, P_2, P_3$, and let’s say we want to split it exactly in half. We notice that the formula for the curve when we plug in $1/2$, which is

$\displaystyle \mathbf{B}(1/2) = \frac{1}{2^3}(P_0 + 3P_1 + 3P_2 + P_3)$

Moreover, our recursive definition gave us a way to evaluate the point in terms of smaller-degree curves. But when these are evaluated at 1/2 their formulae are similarly easy to write down. The picture looks like this:

The green points are the degree one curves, the pink points are the degree two curves, and the blue point is the cubic curve. We notice that, since each of the curves are evaluated at $t=1/2$, each of these points can be described as the midpoints of points we already know. So $m_0 = (P_0 + P_1) / 2, q_0 = (m_0 + m_1)/2$, etc.

In fact, the splitting of the two curves we want is precisely given by these points. That is, the “left” half of the curve is given by the curve $\mathbf{L}(t)$ with control points $P_0, m_0, q_0, \mathbf{B}(1/2)$, while the “right” half $\mathbf{R}(t)$ has control points $\mathbf{B}(1/2), q_1, m_2, P_3$.

How can we be completely sure these are the same Bezier curves? Well, they’re just polynomials. We can compare them for equality by doing a bunch of messy algebra. But note, since $\mathbf{L}(t)$ only travels halfway along $\mathbf{B}(t)$, to check they are the same is to equate $\mathbf{L}(t)$ with $\mathbf{B}(t/2)$, since as $t$ ranges from zero to one, $t/2$ ranges from zero to one half. Likewise, we can compare $\mathbf{B}((t+1)/2)$ with $\mathbf{R}(t)$.

The algebra is very messy, but doable. As a test of this blog’s newest tools, we present this screen cast of this author performing the algebra involved in proving the two curves are identical.

Now that that’s settled, we have a nice algorithm for splitting a cubic Bezier (or any Bezier) into two pieces. In Javascript,

function subdivide(curve) {
var firstMidpoints = midpoints(curve);
var secondMidpoints = midpoints(firstMidpoints);
var thirdMidpoints = midpoints(secondMidpoints);

return [[curve[0], firstMidpoints[0], secondMidpoints[0], thirdMidpoints[0]],
[thirdMidpoints[0], secondMidpoints[1], firstMidpoints[2], curve[3]]];
}

Here “curve” is a list of four points, as described at the beginning of this section, and the output is a list of two curves with the correct control points. The “midpoints” function used is quite simple, and we include it here for compelteness:

function midpoints(pointList) {
var midpoint = function(p, q) {
return [(p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0];
};

var midpointList = new Array(pointList.length - 1);
for (var i = 0; i < midpointList.length; i++) {
midpointList[i] = midpoint(pointList[i], pointList[i+1]);
}

return midpointList;
}


It just accepts as input a list of points and computes their sequential midpoints. So a list of $n$ points is turned into a list of $n-1$ points. As we saw, we need to call this function $d-1$ times to compute the segmentation of a degree $d$ Bezier curve.

As we explained earlier, we can keep subdividing our curve over and over until each of the tiny pieces are basically lines. That is, our function to draw a Bezier curve from the beginning will be as follows:

function drawCurve(curve, context) {
if (isFlat(curve)) {
drawSegments(curve, context);
} else {
var pieces = subdivide(curve);
drawCurve(pieces[0], context);
drawCurve(pieces[1], context);
}
}


In words, as long as the curve isn’t “flat,” we want to subdivide and draw each piece recursively. If it is flat, then we can simply draw the three line segments of the curve and be reasonably sure that it will be a good approximation. The context variable sitting there represents the canvas to be painted to; it must be passed through to the “drawSegments” function, which simply paints a straight line to the canvas.

Of course this raises the obvious question: how can we tell if a Bezier curve is flat? There are many ways to do so. One could compute the angles of deviation (from a straight line) at each interior control point and add them up. Or one could compute the volume of the enclosed quadrilateral. However, computing angles and volumes is usually not very nice: angles take a long time to compute and volumes have stability issues, and the algorithms which are stable are not very simple. We want a measurement which requires only basic arithmetic and perhaps a few logical conditions to check.

It turns out there is such a measurement. It is is originally attributed to Roger Willcocks, but it is quite simple to derive by hand.

Essentially, we want to measure the “flatness” of a cubic Bezier curve by computing the distance of the actual curve at time $t$ from where the curve would be at time $t$ if the curve were a straight line.

Formally, given $\mathbf{B}(t)$ with control points $P_0, P_1, P_2, P_3$ as usual, we can define the straight-line Bezier cubic as the colossal sum

$\displaystyle \mathbf{S}(t) = (1-t)^3P_0 + 3(1-t)^2t \left ( \frac{2}{3}P_0 + \frac{1}{3}P_3 \right ) + 3(1-t)t^2 \left ( \frac{1}{3}P_0 + \frac{2}{3}P_3 \right ) + t^3 P_3$

There is nothing magical going on here. We’re simply giving the Bezier curve with control points $P_0, \frac{2}{3}P_0 + \frac{1}{3}P_3, \frac{1}{3}P_0 + \frac{2}{3}P_3, P_3$. One should think about this as points which are a 0, 1/3, 2/3, and 1 fraction of the way from $P_0$ to $P_3$ on a straight line.

Then we define the function $d(t) = \left \| \mathbf{B}(t) - \mathbf{S}(t) \right \|$ to be the distance between the two curves at the same time $t$. The flatness value of $\mathbf{B}$ is the maximum of $d$ over all values of $t$. If this flatness value is below a certain tolerance level, then we call the curve flat.

With a bit of algebra we can simplify this expression. First, the value of $t$ for which the distance is maximized is the same as when its square is maximized, so we can omit the square root computation at the end and take that into account when choosing a flatness tolerance.

Now lets actually write out the difference as a single polynomial. First, we can cancel the 3′s in $\mathbf{S}(t)$ and write the polynomial as

$\displaystyle \mathbf{S}(t) = (1-t)^3 P_0 + (1-t)^2t (2P_0 + P_3) + (1-t)t^2 (P_0 + 2P_3) + t^3 P_3$

and so $\mathbf{B}(t) - \mathbf{S}(t)$ is (by collecting coefficients of the like terms $(1-t)^it^j$)

$\displaystyle (1-t)^2t (3 P_1 - 2P_0 - P_3) + (1-t)t^2 (3P_2 - P_0 - 2P_3)$

Factoring out the $(1-t)t$ from both terms and setting $a = 3P_1 - 2P_0 - P_3$, $b = 3P_2 - P_0 - 2P_3$, we get

$\displaystyle d^2(t) = \left \| (1-t)t ((1-t)a + tb) \right \|^2 = (1-t)^2t^2 \left \| (1-t)a + tb \right \|^2$

Since the maximum of a product is at most the product of the maxima, we can bound the above quantity by the product of the two maxes. The reason we want to do this is because we can easily compute the two maxes separately. It wouldn’t be hard to compute the maximum without splitting things up, but this way ends up with fewer computational steps for our final algorithm, and the visual result is equally good.

Using some elementary single-variable calculus, the maximum value of $(1-t)^2t^2$ for $0 \leq t \leq 1$ turns out to be $1/16$. And the norm of a vector is just the sum of squares of its components. If $a = (a_x, a_y)$ and $b = (b_x, b_y)$, then the norm above is exactly

$\displaystyle ((1-t)a_x + tb_x)^2 + ((1-t)a_y + tb_y)^2$

And notice: for any real numbers $z, w$ the quantity $(1-t)z + tw$ is exactly the straight line from $z$ to $w$ we know so well. The maximum over all $t$ between zero and one is obviously the maximum of the endpoints $z, w$. So the max of our distance function $d^2(t)$ is bounded by

$\displaystyle \frac{1}{16} (\textup{max}(a_x^2, b_x^2) + \textup{max}(a_y^2, b_y^2))$

And so our condition for being flat is that this bound is smaller than some allowable tolerance. We may safely factor the 1/16 into this tolerance bound, and so this is enough to write a function.

function isFlat(curve) {
var tol = 10; // anything below 50 is roughly good-looking

var ax = 3.0*curve[1][0] - 2.0*curve[0][0] - curve[3][0]; ax *= ax;
var ay = 3.0*curve[1][1] - 2.0*curve[0][1] - curve[3][1]; ay *= ay;
var bx = 3.0*curve[2][0] - curve[0][0] - 2.0*curve[3][0]; bx *= bx;
var by = 3.0*curve[2][1] - curve[0][1] - 2.0*curve[3][1]; by *= by;

return (Math.max(ax, bx) + Math.max(ay, by) <= tol);
}


And there we have it. We write a simple HTML page to access a canvas element and a few extra helper functions to draw the line segments when the curve is flat enough, and present the final result in this interactive demonstration (you can perturb the control points).

The picture you see on that page (given below) is this author’s rendition of Picasso’s “Dog” drawing as a sequence of nine Bezier curves. This author thinks the resemblance is uncanny

Picasso’s “Dog,” redesigned as a sequence of nine bezier curves.

While we didn’t invent the drawing itself (and hence shouldn’t attach our signature to it), we did come up with the representation as a sequence of Bezier curves. It only seems fitting to present that as the work of art. Here we’ve distilled the representation down to a single file: the first line is the dimension of the canvas, and each subsequent line represents a cubic Bezier curve. Comments are included for readability.

“Dog” Jeremy Kun, 2013. Click to enlarge.

Because standardizing things seems important, we define a new filetype “.bezier”, which has the format given above:

int int
(int) curve
(int) curve
...

Where the first two ints specify the size of the canvas, the first (optional) int on each line specifies the width of the stroke, and a “curve” has the form

[int,int] [int,int] ... [int,int]

If an int is omitted at the beginning of a line, this specifies a width of three pixels.

In a general .bezier file we allow a curve to have arbitrarily many control points, though the code we gave above does not draw them that generally. As an exercise, write a program which accepts as input a .bezier file and produces as output an image of the drawing. This will require an extension of the algorithm above for drawing arbitrary Bezier curves, which loops its computation of the midpoints and keeps track of which end up in the resulting subdivision. Alternatively, one could write a program which accepts as input a .bezier file with only cubic Bezier curves, and produces as output an SVG file of the drawing (SVG only supports cubic Bezier curves). So a .bezier file is a simplification (fewer features) and an extension (Bezier curves of arbitrary degree) of an SVG file.

We didn’t go as deep into the theory of Bezie curves as we could have. If the reader is itching for more (and a more calculus-based approach), see this lengthy primer. It contains practically everything one could want to know about Bezier curves, with nice interactive demos written in Processing.

Low-Complexity Art

There are some philosophical implications of what we’ve done today with Picasso’s “Dog.” Previously on this blog we’ve investigated the idea of low-complexity art, and it’s quite relevant here. The thesis is that “beautiful” art has a small description length, and more formally the “complexity” of some object (represented by text) is the length of the shortest program that outputs that object given no inputs. More on that in our primer on Kolmogorov complexity. The fact that we can describe Picasso’s line drawings with a small number of Bezier curves (and a relatively short program to output the bezier curves) is supposed to be a deep statement about the beauty of the art itself. Obviously this is very subjective, but not without its proponents.

There has been a bit of recent interest in computers generating art. For instance, this recent programming competition (in Dutch) gave the task of generating art similar to the work of Piet Mondrian. The idea is that the more elegant the algorithm, the higher it would be scored. The winner used MD5 hashes to generate Mondrian pieces, and there were many many other impressive examples (the link above has a gallery of submissions).

In our earlier post on low-complexity art, we explored the possibility of representing all images within a coordinate system involving circles with shaded interiors. But it’s obvious that such a coordinate system wouldn’t be able to represent “Dog” with very low complexity. It seems that Bezier curves are a much more natural system of coordinates. Some of the advantages include that length of lines and slight perturbations don’t affect the resulting complexity. A cubic Bezier curve can be described by any set of four points, and more “intricate” (higher complexity) descriptions of curves require a larger number of points. Bezier curves can be scaled up arbitrarily, and this doesn’t significantly change the complexity of the curve (although scaling many orders of magnitude will introduce a logarithmic factor complexity increase, this is quite small). Curves with larger stroke are slightly more complex than those with smaller stroke, and representing many small sharp bends require more curves than long, smooth arcs.

On the downside, it’s not so easy to represent a circle as a Bezier curve. In fact, it is impossible to do so exactly. Despite the simplicity of this object (it’s even defined as by a single polynomial, albeit in two variables), the best one can do is approximate it. The same goes for ellipses. There are actually ways to overcome this (the concept of rational Bezier curves which are quotients of polynomials), but they add to the inherent complexity of the drawing algorithm and the approximations using regular Bezier curves are good enough.

And so we define the complexity of a drawing to be the number of bits in its .bezier file representation. Comments are ignored in this calculation.

The real prize, and what we’ll explore next time, is to find a way to generate art automatically. That is to do one of two things:

1. Given some sort of “seed,” write a program that produces a pseudo-random line drawing.
2. Given an image, produce a .bezier image which accurately depicts the image as a line drawing.

We will attempt to explore these possibilities in the follow-up to this post. Depending on how things go, this may involve some local search algorithms, genetic algorithms, or other methods.

Until then!

Facebook Page, Google+ Community, and Whispers of Guest Posts

I recently decided to add a few new ways for readers to keep up with what’s going on at Math ∩ Programming. These come in the form of a Google+ community and a Facebook group. I also have a seasoned Twitter feed (@MathProgramming) that I check regularly and post links to.

I plan to regularly post updates to these pages whenever I write a new post. Everyone is welcome to join/subscribe/follow these things, and hopefully they can generate some interesting discussion above and beyond what goes into my posts.

Aside from that, I’m having some discussions with my colleagues about the potential for guest posts on various topics. These include topics in elliptic curves, financial mathematics, and game theory. In at least one of these topics there are already working programs floating around. Keep an eye out for it (I’m super excited to see how they turn out), and with luck I can recruit more of my colleagues to join the fun.

Until next time!

Categories as Types

In this post we’ll get a quick look at two ways to define a category as a type in ML. The first way will be completely trivial: we’ll just write it as a tuple of functions. The second will involve the terribly-named “functor” expression in ML, which allows one to give a bit more structure on data types.

The reader unfamiliar with the ML programming language should consult our earlier primer.

What Do We Want?

The first question to ask is “what do we want out of a category?” Recall from last time that a category is a class of objects, where each pair of objects is endowed with a set of morphisms. The morphisms were subject to a handful of conditions, which we paraphrase below:

• Composition has to make sense when it’s defined.
• Composition is associative.
• Every object has an identity morphism.

What is the computational heart of such a beast? Obviously, we can’t explicitly represent the class of objects itself. Computers are finite (or at least countable) beings, after all. And in general we can’t even represent the set of all morphisms explicitly. Think of the category $\mathbf{Set}$: if $A, B$ are infinite, then there are infinitely many morphisms (set-functions) $A \to B$.

A subtler and more important point is that everything in our representation of a category needs to be a type in ML (for indeed, how can a program run without being able to understand the types involved). This nudges us in an interesting direction: if an object is a type, and a morphism is a type, then we might reasonably expect a composition function:

compose: 'arrow * 'arrow -> 'arrow

We might allow this function to raise an exception if the two morphisms are uncomposable. But then, of course, we need a way to determine if morphisms are composable, and this requires us to know what the source and target functions are.

source: 'arrow -> 'object
target: 'arrow -> 'object

We can’t readily enforce composition to be associative at this stage (or any), because morphisms don’t have any other manipulable properties. We can’t reasonably expect to be able to compare morphisms for equality, as we aren’t able to do this in $\mathbf{Set}$.

But we can enforce the final axiom: that every object has an identity morphism. This would come in the form of a function which accepts as input an object and produces a morphism:

identity: 'object -> 'arrow

But again, we can’t necessarily enforce that identity behaves as its supposed to.

Even so, this should be enough to define a category. That is, in ML, we have the datatype

datatype ('object, 'arrow)Category =
category of ('arrow -> 'object) *
('arrow -> 'object) *
('object -> 'arrow) *
('arrow * 'arrow -> 'arrow)

Where we understand the first two functions to be source and target, and the third and fourth to be identity and compose, respectively.

Categories as Values

In order to see this type in action, we defined (and included in the source code archive for this post) a type for homogeneous sets. Since ML doesn’t support homogeneous lists natively, we’ll have to settle for a particular subcategory of $\mathbf{Set}$. We’ll call it $\mathbf{Set}_a$, the category whose objects are finite sets whose elements have type $a$, and whose morphisms are again set-functions. Hence, we can define an object in this category as the ML type

'a Set

and an arrow as a datatype

datatype 'a SetMap = setMap of ('a Set) * ('a -> 'a) * ('a Set)

where the first object is the source and the second is the target (mirroring the notation $A \to B$).

Now we must define the functions required for the data constructor of a Category. Source and target are trivial.

fun setSource(setMap(a, f, b)) = a
fun setTarget(setMap(a, f, b)) = b

And identity is equally natural.

fun setIdentity(aSet) = setMap(aSet, (fn x => x), aSet)

Finally, compose requires us to check for set equality of the relevant source and target, and compose the bundled set-maps. A few notational comments: the “o” operator does function composition in ML, and we defined “setEq” and the “uncomposable” exception elsewhere.

fun setCompose(setMap(b', g, c), setMap(a, f, b)) =
if setEq(b, b') then
setMap(a, (g o f), c)
else
raise uncomposable

Note that the second map in the composition order is the first argument to the function, as is standard in mathematical notation.

And now this category of finite sets is just the instantiation

val FiniteSets = category(setSource, setTarget, setIdentity, setCompose)

Oddly enough, this definition does not depend on the choice of a type for our sets! None of the functions we defined care about what the elements of sets are, so long as they are comparable for equality (in ML, such a type is denoted with two leading apostrophes: ”a).

This example is not very interesting, but we can build on it in two interesting ways. The first is to define the poset category of sets. Recall that the poset category of a set $X$ has as objects subsets of $X$ and there is a unique morphism from $A \to B$ if and only if $A \subset B$. There are two ways to model this in ML. The first is that we can assume the person constructing the morphisms is giving correct data (that is, only those objects which satisfy the subset relation). This would amount to a morphism just representing “nothing” except the pair $(A,B)$. On the other hand, we may also require a function which verifies that the relation holds. That is, we could describe a morphism in this (or any) poset category as a datatype

datatype 'a PosetArrow = ltArrow of 'a * ('a -> 'a -> bool) * 'a

We still do need to assume the user is not creating multiple arrows with different functions (or we must tacitly call them all equal regardless of the function). We can do a check that the function actually returns true by providing a function for creating arrows in this particular poset category of a set with subsets.

exception invalidArrow
fun setPosetArrow(x)(y) = if subset(x)(y) then ltArrow(x, subset, y)
else raise invalidArrow

Then the functions defining a poset category are very similar to those of a set. The only real difference is in the identity and composition functions, and it is minor at that.

fun posetSource(ltArrow(x, f, y)) = x
fun posetTarget(ltArrow(x, f, y)) = y
fun posetIdentity(x) = ltArrow(x, (fn z => fn w => true), x)
fun posetCompose(ltArrow(b', g, c), ltArrow(a, f, b)) =
if b = b' then
ltArrow(a, (fn x => fn y => f(x)(b) andalso g(b)(y)), c)
else
raise uncomposable

We know that a set is always a subset of itself, so we can provide the constant true function in posetIdentity. In posetCompose, we assume things are transitive and provide the logical conjunction of the two verification functions for the pieces. Then the category as a value is

val Posets = category(posetSource, posetTarget, posetIdentity, posetCompose)

One will notice again that the homogeneous type of our sets is irrelevant. The poset category is parametric. To be completely clear, the type of the Poset category defined above is

('a Set, ('a Set)PosetArrow)Category

and so we can define a shortcut for this type.

type 'a PosetCategory = ('a Set, ('a Set)PosetArrow)Category

The only way we could make this more general is to pass the constructor for creating a particular kind of posetArrow (in our case, it just means fixing the choice of verification function, subset, for the more generic type constructor). We leave this as an exercise to the reader.

Our last example will return again to our abstract “diagram category.” Recall that if we fix an object $A$ in a category $\mathbf{C}$, the category $\mathbf{C}_A$ has as objects the morphisms with fixed source $A$,

and the arrows are commutative diagrams

Lets fix $A$ to be a set, and define a function to instantiate objects in this category:

fun makeArrow(fixedSet)(f)(target) = setMap(fixedSet, f, target)

That is, if I have a fixed set $A$, I can call the function like so

val makeObject = makeArrow(fixedSet)

And use this function to create all of my objects. A morphism is then just a pair of set-maps with a connecting function. Note how similar this snippet of code looks structurally to the other morphism datatypes we’ve defined. This should reinforce the point that morphisms really aren’t any different in this abstract category!

datatype 'a TriangleDiagram =
triangle of ('a SetMap) * ('a -> 'a) * ('a SetMap)

And the corresponding functions and category instantiation are

fun triangleSource(triangle(x, f, y)) = x
fun triangleTarget(triangle(x, f, y)) = y
fun triangleIdentity(x) = triangle(x, (fn z => z), x)
fun triangleCompose(triangle(b', g, c), triangle(a, f, b)) =
if setEq(setTarget(b'))(setTarget(b)) then
triangle(a, (g o f), c)
else
raise uncomposable

val SetDiagrams =
category(triangleSource, triangleTarget, triangleIdentity, triangleCompose)

Notice how we cannot compare the objects in this category for equality! Indeed, two functions cannot be compared in ML, and the best we can do is compare the targets of these functions to ensure that the connecting morphisms can be composed. A malicious programmer might then be able to compose uncomposable morphisms, a devious and startling proposition.

Notice further how we can’t avoid using set-specific functions with our current definition of a category. What we could do instead is require a category to have a function which checks for composability. Then compose could be written in a more homogeneous (but not quite completely parametric fashion). The reader is welcome to try this on their own, but we’ll soon have a more organized way to do this later, when we package up a category into an ML structure.

Categories as Signatures

Let’s look at a more advanced feature of ML and see how we can apply it to representing categories in code. The idea is familiar to most Java and C++ programmers, and that is of an interface: the programmer specifies an abstract type which has specific functions attached to it, but does not implement those functions. Then some (perhaps different) programmer implements the interface by defining those functions for a concrete type.

For the reader who isn’t so interested in these features of ML, feel free to skip this section. We won’t be introducing any new examples of categories, just rephrasing what we’ve done above in a different way. It is also highly debatable whether this new way is actually any better (it’s certainly longer).

The corresponding concept in ML is called a signature (interface) and a structure (concrete instance). The only difference is that a structure is removed one step further from ML’s type system. This is in a sense necessary: a structure should be able to implement the requirements for many different signatures, and a signature should be able to have many different structures implement it. So there is no strict hierarchy of types to lean on (it’s just a many-to-many relationship).

On the other hand, not having any way to think of a structure as a type at all would be completely useless. The whole point is to be able to write function which manipulate a structure (concrete instance) by abstractly accessing the signature’s (interface’s) functions regardless of the particular implementation details. And so ML adds in another layer of indirection: the functor (don’t confuse this with categorical functors, which we haven’t talked abou yet). A functor in ML is a procedure which accepts as input a structure and produces as output another structure.

Let’s see this on a simple example. We can define a structure for “things that have magnitudes,” which as a signature would be

signature MAG_OBJ =
sig
type object
val mag : object -> int
end

This introduced a few new language forms: the “signature” keyword is like “fun” or “val,” it just declares the purpose of the statement as a whole. The “sig … end” expression is what actually creates the signature, and the contents are a list of type definitions, datatype definitions, and value/function type definitions. The reader should think of this as a named “type” for structures, analogous to a C struct.

To implement a signature in ML is to “ascribe” it. That is, we can create a structure that defines the functions laid out in a signature (and perhaps more). Here is an example of a structure for a mag object of integers:

structure IntMag : MAG_OBJ =
struct
type object = int
fun mag(x) = if x >= 0 then x else ~x
end

The colon is a type ascription, and at the compiler-level it goes through the types and functions defined in this structure and attempts to match them with the required ones from the MAG_OBJ signature. If it fails, it raises a compiler error. A more detailed description of this process can be found here. We can then use the structure as follows:

local open IntMag in
val y = ~7
val z = mag(y)
end

The open keyword binds all of the functions and types to the current environment (and so it’s naturally a bad idea to open structures in the global environment). The “local” construction is made to work specifically with open.

In any case, the important part about signatures and structures is that one can write functions which accept as input structures with a given signature and produce structures with other signatures. The name for this is a “functor,” but we will usually call it an “ML functor” to clarify that it is not a categorical functor (if you don’t know what a categorical functor is, don’t worry yet).

For example, here is a signature for a “distance object”

signature DIST_OBJ =
sig
type object
val dist: object * object -> int
end

And a functor which accepts as input a MAG_OBJ and creates a DIST_OBJ in a contrived and silly way.

functor MakeDistObj(structure MAG: MAG_OBJ) =
struct
type object = MAG.object
val dist = fn (x, y) =>
let
val v = MAG.mag(x) - MAG.mag(y)
in
if v > 0 then v else ~v
end
end

Here the argument to the functor is a structure called MAG, and we need to specify which type is to be ascribed to it; in this case, it’s a MAG_OBJ. Only things within the specified signature can be used within the struct expression that follows (if one needs a functor to accept as input something which has multiple type ascriptions, one can create a new signature that “inherits” from both). Then the right hand side of the functor is just a new structure definition, where the innards are allowed to refer to the properties of the given MAG_OBJ.

We could in turn define a functor which accepts a DIST_OBJ as input and produces as output another structure, and compose the two functors. This is started to sound a bit category-theoretical, but we want to remind the reader that a “functor” in category theory is quite different from a “functor” in ML (in particular, we have yet to mention the former, and the latter is just a mechanism to parameterize and modularize code).

In any case, we can do the same thing with a category. A general category will be represented by a signature, and a particular instance of a category is a structure with the implemented pieces.

signature CATEGORY =
sig
type object
type arrow
exception uncomposable

val source: arrow -> object
val target: arrow -> object
val identity: object -> arrow
val composable: arrow * arrow -> bool
val compose: arrow * arrow -> arrow
end

As it turns out, ML has an implementation of “sets” already, and it uses signatures. So instead of using our rinky-dink implementation of sets from earlier in this post, we can implement an instance of the ORD_KEY signature, and then call one of the many set-creating functors available to us. We’ll use an implementation based on red-black trees for now.

First the key:

structure OrderedInt : ORD_KEY =
struct
type ord_key = int
val compare = Int.compare
end

then the functor:

functor MakeSetCategory(structure ELT: ORD_KEY) =
struct
structure Set:ORD_SET = RedBlackSetFn(ELT)
type elt = ELT.ord_key
type object = Set.set
datatype arrow = function of object * (elt -> elt) * object

exception uncomposable

fun source(function(a, _, _)) = a
fun target(function(_, _, b)) = b
fun identity(a) = function(a, fn x => x, a)
fun composable(a,b) = Set.equal(a,b)
fun compose(function(b2, g, c), function(a, f, b1)) =
if composable(b1, b2) then
function(a, (g o f), c)
else
raise uncomposable
fun apply(function(a, f, b), x) = f(x)
end

The first few lines are the most confusing; the rest is exactly what we’ve seen from our first go-round of defining the category of sets. In particular, we call the RedBlackSetFn functor given the ELT structure, and it produces an implementation of sets which we name Set (and superfluously ascribe the type ORD_SET for clarity).

Then we define the “elt” type which is used to describe an arrow, the object type as the main type of an ORD_SET (see in this documentation that this is the only new type available in that signature), and the arrow type as we did earlier.

We can then define the category of sets with a given type as follows

structure IntSets = MakeSetCategory(structure ELT = OrderedInt)

The main drawback of this approach is that there’s so much upkeep! Every time we want to make a new kind of category, we need to define a new functor, and every time we want to make a new kind of category of sets, we need to implement a new kind of ORD_KEY. All of this signature and structure business can be quite confusing; it often seems like the compiler doesn’t want to cooperate when it should.

Nothing Too Shocking So Far

To be honest, we haven’t really done anything very interesting so far. We’ve seen a single definition (of a category), looked at a number of examples to gain intuition, and given a flavor of how these things will turn into code.

Before we finish, let’s review the pros and cons of a computational representation of category-theoretical concepts.

Pros:

• We can prove results by explicit construction (more on this next time).
• Different-looking categories are structurally similar in code.
• We can faithfully represent the idea of using (objects and morphisms of) categories as parameters to construct other categories.
• Writing things in code gives a fuller understanding of how they work.

Cons:

• All computations are finite, requiring us to think much harder than a mathematician about the definition of an object.
• The type system is too weak. we can’t enforce the axioms of a category directly or even ensure the functions act sensibly at all. As such, the programmer becomes responsible for any failure that occur from bad definitions.
• The type system is too strong. when we work concretely we are forced to work within homogeneous categories (e.g. of ‘a Set).
• We cannot ensure the ability to check equality on objects. This showed up in our example of diagram categories.
• The functions used in defining morphisms, e.g. in the category of sets, are somewhat removed from real set-functions. For example, nothing about category theory requires functions to be computable. Moreover, nothing about our implementation requires the functions to have any outputs at all (they may loop infinitely!). Moreover, it is not possible to ensure that any given function terminates on a given input set (this is the Halting problem).

This list looks quite imbalanced, but one might argue that the cons are relatively minor compared to the pros. In particular (and this is what this author hopes to be the case), being able to explicitly construct proofs to theorems in category theory will give one a much deeper understanding both of category theory and of programming. This is the ultimate prize.

Next time we will begin our investigation of universal properties (where the real fun starts), and we’ll use the programmatic constructions we laid out in this post to give constructive proofs that various objects are universal.

Until then!

Rings — A Primer

Previously on this blog, we’ve covered two major kinds of algebraic objects: the vector space and the group. There are at least two more fundamental algebraic objects every mathematician should something know about. The first, and the focus of this primer, is the ring. The second, which we’ve mentioned briefly in passing on this blog, is the field. There are a few others important to the pure mathematician, such as the $R$-module (here $R$ is a ring). These do have some nice computational properties, but in order to even begin to talk about them we need to know about rings.

A Very Special Kind of Group

Recall that an abelian group $(G, +)$ is a set $G$ paired with a commutative binary operation $+$, where $G$ has a special identity element called 0 which acts as an identity for $+$. The archetypal example of an abelian group is, of course, the integers $\mathbb{Z}$ under addition, with zero playing the role of the identity element.

The easiest way to think of a ring is as an abelian group with more structure. This structure comes in the form of a multiplication operation which is “compatible” with the addition coming from the group structure.

Definition:ring $(R, +, \cdot)$ is a set $R$ which forms an abelian group under $+$ (with additive identity 0), and has an additional associative binary operation $\cdot$ with an element 1 serving as a (two-sided) multiplicative identity. Furthermore, $\cdot$ distributes over $+$ in the sense that for all $x,y,z \in R$

$x(y+z) = xy + xz$ and $(y+z)x = yx + zx$

The most important thing to note is that multiplication is not commutative both in general rings and for most rings in practice. If multiplication is commutative, then the ring is called commutative. Some easy examples of commutative rings include rings of numbers like $\mathbb{Z}, \mathbb{Z}/n\mathbb{Z}, \mathbb{Q}, \mathbb{R}$, which are just the abelian groups we know and love with multiplication added on.

If the reader takes anything away from this post, it should be the following:

Rings generalize arithmetic with integers.

Of course, this would imply that all rings are commutative, but this is not the case. More meaty and tempestuous examples of rings are very visibly noncommutative. One of the most important examples are rings of matrices. In particular, denote by $M_n(\mathbb{R})$ the set of all $n \times n$ matrices with real valued entries. This forms a ring under addition and multiplication of matrices, and has as a multiplicative identity the $n \times n$ identity matrix $I_n$.

Commutative rings are much more well-understood than noncommutative rings, and the study of the former is called commutative algebra. This is the main prerequisite for fields like algebraic geometry, which (in the simplest examples) associate commutative rings to geometric objects.

For us, all rings will have an identity, but many ring theorists will point out that one can just as easily define a ring to not have a multiplicative identity. We will call these non-unital rings, and will rarely, if ever, see them on this blog.

Another very important example of a concrete ring is the polynomial ring in $n$ variables with coefficients in $\mathbb{Q}$ or $\mathbb{R}$. This ring is denoted with square brackets denoting the variables, e.g. $\mathbb{R}[x_1, x_2, \dots , x_n]$. We rest assured that the reader is familiar with addition and multiplication of polynomials, and that this indeed forms a ring.

Kindergarten Math

Let’s start with some easy properties of rings. We will denote our generic ring by $R$.

First, the multiplicative identity of a ring is unique. The proof is exactly the same as it was for groups, but note that identities must be two-sided for this to work. If $1, 1'$ are two identities, then $1 = 1 \cdot 1' = 1'$.

Next, we prove that $0a = a0 = 0$ for all $a \in R$. Indeed, by the fact that multiplication distributes across addition, $0a = (0 + 0)a = 0a + 0a$, and additively canceling $0a$ from both sides gives $0a = 0$. An identical proof works for $a0$.

In fact, pretty much any “obvious property” from elementary arithmetic is satisfied for rings. For instance, $-(-a) = a$ and $(-a)b = a(-b) = -(ab)$ and $(-1)^2 = 1$ are all trivial to prove. Here is a list of these and more properties which we invite the reader to prove.

Zero Divisors, Integral Domains, and Units

One thing that is very much not automatically given in the general theory of rings is multiplicative cancellation. That is, if I have $ac = bc$ then it is not guaranteed to be the case that $a = b$. It is quite easy to come up with examples in modular arithmetic on integers; if $R = \mathbb{Z}/8\mathbb{Z}$ then $2*6 = 6*6 = 4 \mod 8$, but $2 \neq 6$.

The reason for this phenomenon is that many rings have elements that lack multiplicative inverses. In $\mathbb{Z}/8\mathbb{Z}$, for instance, $2$ has no multiplicative inverse (and neither does 6). Indeed, one is often interested in determining which elements are invertible in a ring and which elements are not. In a seemingly unrelated issue, one is interested in determining whether one can multiply any given element $x \in R$ by some $y$ to get zero. It turns out that these two conditions are disjoint, and closely related to our further inspection of special classes of rings.

Definition: An element $x$ of a ring $R$ is said to be a left zero-divisor if there is some $y \neq 0$ such that $xy = 0$. Similarly, $x$ is a right zero-divisor if there is a $z$ for which $zx = 0$. If $x$ is a left and right zero-divisor (e.g. if $R$ is commutative), it is just called a zero-divisor.

Definition: Let $x,y \in R$. The element $y$ is said to be a left inverse to $x$ if $yx = 1$, and a right inverse if $xy = 1$. If there is some $z \neq 0$ for which $xz = zx = 1$, then $x$ is said to be a two-sided inverse and $z$ is called the inverse of $x$, and $x$ is called a unit.

As a quick warmup, we prove that if $x$ has a left  and a right inverse then it has a two-sided inverse. Indeed, if $zx = 1 = xy$, then $z = z(xy) = (zx)y = y$, so in fact the left and right inverses are the same.

The salient fact here is that having a (left- or right-) inverse allows one to do (left- or right-) cancellation, since obviously when $ac = bc$ and $c$ has a right inverse, we can multiply $acc^{-1} = bcc^{-1}$ to get $a=b$. We will usually work with two-sided inverses and zero-divisors (since we will usually work in a commutative ring). But in non-commutative rings, like rings of matrices, one-sided phenomena do run rampant, and one must distinguish between them.

The right way to relate these two concepts is as follows. If $c$ has a right inverse, then define the right-multiplication function $(- \cdot c) : R \to R$ which takes $x$ and spits out $xc$. In fact, this function is an injection. Indeed, we already proved that (because $c$ has a right inverse) if $xc = yc$ then $x = y$. In particular, there is a unique preimage of $0$ under this map. Since $0c = 0$ is always true, then it must be the case that the only way to left-multiply $c$ times something to get zero is $0c$. That is, $c$ is not a right zero-divisor if right-multiplication by $c$ is injective. On the other hand, if the map is not injective, then there are some $x \neq y$ such that $xc = yc$, implying $(x-y)c = 0$, and this proves that $c$ is a right zero-divisor. We can do exactly the same argument with left-multiplication.

But there is one minor complication: what if right-multiplication is injective, but $c$ has no inverses? It’s not hard to come up with an example: 2 as an element of the ring of integers $\mathbb{Z}$ is a perfectly good one. It’s neither a zero-divisor nor a unit.

This basic study of zero-divisors gives us some natural definitions:

Definition: A division ring is a ring in which every element has a two-sided inverse.

If we allow that $R$ is commutative, we get something even better (more familiar: $\mathbb{Q}, \mathbb{R}, \mathbb{C}$ are the standard examples of fields).

Definition: field is a nonzero commutative division ring.

The “nonzero” part here is just to avoid the case when the ring is the trivial ring (sometimes called the zero ring) with one element. i.e., the set $\left \{ 0 \right \}$ is a ring in which zero satisfies both the additive and multiplicative identities. The zero ring is excluded from being a field for silly reasons: elegant theorems will hold for all fields except the zero ring, and it would be messy to require every theorem to add the condition that the field in question is nonzero.

We will have much more to say about fields later on this blog, but for now let’s just state one very non-obvious and interesting result in non-commutative algebra, known as Wedderburn’s Little Theorem.

Theorem: Every finite divison ring is a field.

That is, simply having finitely many elements in a division ring is enough to prove that multiplication is commutative. Pretty neat stuff. We will actually see a simpler version of this theorem in a moment.

Now as we saw units and zero-divisors are disjoint, but not quite opposites of each other. Since we have defined a division ring as a ring where all (non-zero) elements are units, it is natural to define a ring in which the only zero-divisor is zero. This is considered a natural generalization of our favorite ring $\mathbb{Z}$, hence the name “integral.”

Definition: An integral domain is a commutative ring in which zero is the only zero-divisor.

Note the requirement that the ring is commutative. Often we will simply call it a domain, although most authors allow domains to be noncommutative.

Already we can prove a very nice theorem:

Theorem: Every finite integral domain is a field.

Proof. Integral domains are commutative by definition, and so it suffices to show that every non-zero element has an inverse. Let $R$ be our integral domain in question, and $x \in R$ the element whose inverse we seek. By our discussion of above, right multiplication by $x$ is an injective map $R \to R$, and since $R$ is finite this map must be a bijection. Hence $x$ must have some $y \neq 0$ so that $yx = 1$. And so $y$ is the inverse of $x$.

$\square$

We could continue traveling down this road of studying special kinds of rings and their related properties, but we won’t often use these ideas on this blog. We do think the reader should be familiar with the names of these special classes of rings, and we will state the main theorems relating them.

Definition: A nonzero element $p \in R$ is called prime if whenever $p$ divides a product $ab$ it either divides $a$ or $b$ (or both). A unique factorization domain (abbreviated UFD) is an integral domain in which every element can be written uniquely as a product of primes.

Definition:Euclidean domain is a ring in which the division algorithm can be performed. That is, there is a norm function $| \cdot | : R \ \left \{ 0 \right \} \to \mathbb{N}$, for which every pair $a,b \neq 0$ can be written as $a = bq + r$ with r satisfying $|r| < |b|$.

Paolo Aluffi has a wonderful diagram showing the relations among the various special classes of integral domains. This image comes from his book, Algebra: Chapter 0, which is a must-have for the enterprising mathematics student interested in algebra.

In terms of what we have already seen, this diagram says that every field is a Euclidean domain, and in turn every Euclidean domain is a unique factorization domain. These are standard, but non-trivial theorems. We will not prove them here.

The two big areas in this diagram we haven’t yet mentioned on this blog are PIDs and Noetherian domains. The reason for that is because they both require a theory of ideals in rings (perhaps most briefly described as a generalization of the even numbers). We will begin next time with a discussion of ideals, and their important properties in studying rings, but before we finish we want to focus on one main example that will show up later on this blog.

Polynomial Rings

Let us formally define the polynomial ring.

Definition: Let $R$ be a commutative ring. Define the ring $R[x]$, to be the set of all polynomials in $x$ with coefficients in $R$, where addition and multiplication are the usual addition and multiplication of polynomials. We will often call $R[x]$ the polynomial ring in one variable over $R$.

We will often replace $x$ by some other letter representing an “indeterminate” variable, such as $t$, or $y$, or multiple indexed variables as in the following definition.

Definition: Let $R$ be a commutative ring. The ring $R[x_1, x_2, \dots, x_n]$ is the set of all polynomials in the $n$ variables with the usual addition and multiplication of polynomials.

What can we say about the polynomial ring in one variable $R[x]$? It’s additive and multiplicative identities are clear: the constant 0 and 1 polynomials, respectively. Other than that, we can’t quite get much more. There are some very bizarre features of polynomial rings with bizarre coefficient rings, such as multiplication decreasing degree.

However, when we impose additional conditions on $R$, the situation becomes much nicer.

Theorem: If $R$ is a unique factorization domain, then so is $R[x]$.

Proof. As we have yet to discuss ideals, we refer the reader to this proof, and recommend the reader return to it after our next primer.

$\square$

On the other hand, we will most often be working with polynomial rings over a field. And here the situation is even better:

Theorem: If $k$ is a field, then $k[x]$ is a Euclidean domain.

Proof. The norm function here is precisely the degree of the polynomial (the highest power of a monomial in the polynomial). Then given $f,g$, the usual algorithm for polynomial division gives a quotient and a remainder $q, r$ so that $f = qg + r$. In following the steps of the algorithm, one will note that all multiplication and division operations are performed in the field $k$, and the remainder always has a smaller degree than the quotient. Indeed, one can explicitly describe the algorithm and prove its correctness, and we will do so in full generality in the future of this blog when we discuss computational algebraic geometry.

$\square$

For multiple variables, things are a bit murkier. For instance, it is not even the case that $k[x,y]$ is a euclidean domain. One of the strongest things we can say originates from this simple observation:

Lemma: $R[x,y]$ is isomorphic to $R[x][y]$.

We haven’t quite yet talked about isomorphisms of rings (we will next time), but the idea is clear: every polynomial in two variables $x,y$ can be thought of as a polynomial in $y$ where the coefficients are polynomials in $x$ (gathered together by factoring out common factors of $y^k$). Similarly, $R[x_1, \dots, x_n]$ is the same thing as $R[x_1, \dots, x_{n-1}][x_n]$ by induction. This allows us to prove that any polynomial ring is a unique factorization domain:

Theorem: If $R$ is a UFD, so is $R[x_1, \dots, x_n]$.

Proof. $R[x]$ is a UFD as described above. By the lemma, $R[x_1, \dots, x_n] = R[x_1, \dots, x_{n-1}][x_n]$ so by induction $R[x_1, \dots, x_{n-1}]$ is a UFD implies $R[x_1, \dots, x_n]$ is as well.

$\square$

We’ll be very interested in exactly how to compute useful factorizations of polynomials into primes when we start our series on computational algebraic geometry. Some of the applications include robot motion planning, and automated theorem proving.

Next time we’ll visit the concept of an ideal, see quotient rings, and work toward proving Hilbert’s Nullstellensatz, a fundamental result in algebraic geometry.

Until then!

Persistent Homology Talk at UIC: Slides

Today I gave a twenty-minute talk at UI Chicago as part of the first annual Chicago Area Student SIAM Conference. My talk was titled “Recent Developments in Persistent Homology,” and it foreshadows the theoretical foundations and computational implementations we’ll be laying out on this blog in the coming months. Here’s the abstract:

Persistent homology is a recently developed technique for analyzing the topology of data sets. We will give a rough overview of the technique and sample successful applications to areas such as natural image analysis & texture classification, breast and liver cancer classification, molecular dynamical systems, and others.

The talk was received very well — mostly, I believe, because I waved my hands on the theoretical aspects and spent most of my time talking about the applications.

In any case, although the slides I used for my talk were largely unannotated (I spoke much more than is contained in text on the slides), I did list a number of references to papers that have shown successful applications. As such, some of the audience members asked me to post the slides on the web.

I personally hate it when people post slides because they’re often taken out of context (or just slide after slide of dense text), so this is just a warning to the reader. If you didn’t attend my talk, the chances are you won’t get much out of these slides. If you did, I hope the slides will be useful for the references and to jog your memory about my talk.