Last time we worked through some basic examples of universal properties, specifically singling out quotients, products, and coproducts. There are many many more universal properties that we will mention as we encounter them, but there is one crucial topic in category theory that we have only hinted at: functoriality.

As we’ve repeatedly stressed, the meat of category theory is in the morphisms. One natural question one might ask is, what notion of morphism is there between categories themselves? Indeed, the most straightforward way to see category theoretic concepts in classical mathematics is in a clever choice of functor. For example (and this example isn’t necessary for the rest of the article) one can “associate” to each topological space a group, called the homology group, in such a way that continuous functions on topological spaces translate to group homomorphisms. Moreover, this translation is functorial in the following sense: the group homomorphism associated to a composition is the composition of the associated group homomorphisms. If we denote the association by a subscripted asterisk, then we get the following common formula.

$ \displaystyle (fg)_* = f_* g_*$

This is the crucial property that maintains the structure of morphisms. Again, this should reinforce the idea that the crucial ingredient of every definition in category theory is its effect on morphisms.

Functors: a Definition

In complete generality, a functor is a mapping between two categories which preserves the structure of morphisms. Formally,

Definition: Let $ \mathbf{C}, \mathbf{D}$ be categories. A functor $ \mathscr{F}$ consists of two parts:

  • For each object $ C \in \mathbf{C}$ an associated object $ \mathscr{F}(C) \in \mathbf{D}$.
  • For each morphism $ f \in \textup{Hom}_{\mathbf{C}}(A,B)$ a corresponding morphism $ \mathscr{F}(f) \in \textup{Hom}_{\mathbf{D}}(\mathscr{F}(A), \mathscr{F}(B))$. Specifically, for each $ A,B$ we have a set-function $ \textup{Hom}_{\mathbf{C}}(A,B) \to \textup{Hom}_{\mathbf{C}}(\mathscr{F}(A), \mathscr{F}(B))$.

There are two properties that a functor needs to satisfy to “preserve structure.” The first is that the identity morphisms are preserved for every object; that is, $ \mathscr{F}(1_A) = 1_{\mathscr{F}(A)}$ for every object $ A$. Second, composition must be preserved. That is, if $ f \in \textup{Hom}_{\mathbf{C}}(A,B)$ and $ g \in \textup{Hom}_{\mathbf{C}}(B,C)$, we have

$ \displaystyle \mathscr{F}(gf) = \mathscr{F}(g) \mathscr{F}(f)$

We often denote a functor as we would a function $ \mathscr{F}: \mathbf{C} \to \mathbf{D}$, and use the function application notation as if everything were with sets.

Let’s look at a few simple examples.

Let $ \mathbf{FiniteSet}$ be the poset category of finite sets with subsets as morphisms, and let $ \mathbf{Int}$ be the category whose objects are integers where there is a unique morphism from $ x \to y$ if $ x \leq y$. Then the size function is a functor $ \mathbf{Set} \to \mathbf{Int}$. Continuing with $ \mathbf{Int}$, remember that $ \mathbf{Int}$ forms a group under addition (also known as $ \mathbb{Z}$). And so by its very definition any group homomorphism $ \mathbb{Z} \to \mathbb{Z}$ is a functor from $ \mathbf{Int} \to \mathbf{Int}$. A functor from a category to itself is often called an endofunctor.

There are many examples of functors from the category $ \mathbf{Top}$ of topological spaces to the category $ \mathbf{Grp}$ of groups. These include some examples we’ve seen on this blog, such as the fundamental group and homology groups.

One trivial example of a functor is called the forgetful functor. Let $ \mathbf{C}$ be a category whose objects are sets and whose morphisms are set-maps with additional structure, for example the category of groups. Define a functor $ \mathbf{C} \to \mathbf{Set}$ which acts as the identity on both sets and functions. This functor simply “forgets” the structure in $ \mathbf{C}$. In the realm of programs and types (this author is thinking Java), one can imagine this as a ‘type-cast’ from String to Object.  In the same vein, one could define an “identity” endofunctor $ \mathbf{C} \to \mathbf{C}$ which does absolutely nothing.

One interesting way to think of a functor is as a “realization” of one category inside another. In particular, because the composition structure of $ \mathbf{C}$ is preserved by a functor $ \mathbf{C} \to \mathbf{D}$, it must be the case that all commutative diagrams are “sent” to commutative diagrams. In addition, isomorphisms are sent to isomorphisms because if $ f, g$ are inverses of each other, $ 1_{\mathscr{F}(A)} = \mathscr{F}(1_A) = \mathscr{F}(gf) = \mathscr{F}(g)\mathscr{F}(f)$ and likewise for the reverse composition. And so if we have a functor $ \mathscr{F}$ from a poset category (say, the real numbers with the usual inequality) to some category $ \mathbf{C}$, then we can realize the structure of the poset sitting inside of $ \mathbf{C}$ (perhaps involving only some of the objects of $ \mathbf{C}$). This view comes in handy in a few places we’ll see later in our series on computational topology.

The Hom Functor

There is a very important and nontrivial example called the “hom functor” which is motivated by the category of vector spaces. We’ll stick to the concrete example of vector spaces, but the generalization to arbitrary categories is straightforward. If the reader knows absolutely nothing about vector spaces, replace “vector space” with “object” and “linear map” with “morphism.” It won’t quite be correct, but it will get the idea across.

To each vector space $ V$ one can define a dual vector space of functions $ V \to \mathbb{C}$ (or whatever the field of scalars for $ V$ is). Following the lead of hom sets, the dual vector space is denoted $ \textup{Hom}(V,\mathbb{C})$. Here the morphisms in the set are those from the category of vector spaces (that is, linear maps $ V \to \mathbb{C}$). Indeed, this is a vector space: one can add two functions pointwise ($ (f+g)(v) = f(v) + g(v)$) and scale them ($ (\lambda f)(v) = \lambda f(v)$), and the properties for a vector spaces are trivial to check.

Now the mapping $ \textup{Hom}(-, \mathbb{C})$ which takes $ V$ and produces $ \textup{Hom}(V, \mathbb{C})$ is a functor called the hom functor. But let’s inspect this one more closely. The source category is obviously the category of vector spaces, but what is the target category? The objects are clear: the hom sets $ \textup{Hom}(V,\mathbb{C})$ where $ V$ is a vector space. The morphisms of the category are particularly awkward. Officially, they are written as

$ \displaystyle \textup{Hom}(\textup{Hom}(V, \mathbb{C}), \textup{Hom}(W, \mathbb{C}))$

so a morphism in this category takes as input a linear map $ V \to \mathbb{C}$ and produces as output one $ W \to \mathbb{C}$. But what are the morphisms in words we can understand? And how can we compose them? Before reading on, think about what a morphism of morphisms should look like.

Okay, ready?

The morphisms in this category can be thought of as linear maps $ W \to V$. More specifically, given a morphism $ \varphi: V \to \mathbb{C}$ and a linear map $ g: W \to V$, we can construct a linear map $ W \to \mathbb{C}$ by composing $ \varphi g$.

And so if we apply the $ \textup{Hom}$ functor to a morphism $ f: W \to V$, we get a morphism in $ \textup{Hom}(\textup{Hom}(V, \mathbb{C}), \textup{Hom}(W, \mathbb{C}))$. Let’s denote the application of the hom functor using an asterisk so that $ f \mapsto f_*$.

But wait a minute! The mapping here is going in the wrong direction: we took a map in one category going from the $ V$ side to the $ W$ side, and after applying the functor we got a map going from the $ W$ side ($ \textup{Hom}(W, \mathbb{C})$) to the $ V$ side ($ \textup{Hom}(V, \mathbb{C})$). It seems there is no reasonable way to take a map $ V \to \mathbb{C}$ and get a map in $ W \to \mathbb{C}$ using just $ f$, but the other way is obvious. The hom functor “goes backward” in a sense. In other words, the composition property for our “functor” makes the composite $ (g f)_*$ to the map taking $ \varphi$ to $ \varphi g f$. On the other hand, there is no way to compose $ g_* f_*$, as they operate on the wrong domains! It must be the other way around:

$ \displaystyle (gf)_* = f_* g_*$

We advise the reader to write down the commutative diagram and trace out the compositions to make sure everything works out. But this is a problem, because it makes the hom functor fail the most important requirement. In order to fix this reversal “problem,” we make the following definition:

Definition: A functor $ \mathscr{F} : \mathbf{C} \to \mathbf{D}$ is called covariant if it preserves the order of morphism composition, so that $ \mathscr{F}(gf) = \mathscr{F}(g) \mathscr{F}(f)$. If it reverses the order, we call it contravariant.

And so the hom functor on vector spaces is a contravariant functor, while all of the other functors we’ve defined in this post are covariant.

There is another way to describe a contravariant functor as a covariant functor which is often used. It involves the idea of an “opposite” category. For any category $ \mathbf{C}$ we can define the opposite category $ \mathbf{C}^{\textup{op}}$ to be a category with the same objects as $ \mathbf{C}$, but with all morphisms reversed. That is, we define

$ \displaystyle \textup{Hom}_{\mathbf{C}^{\textup{op}}}(A,B) = \textup{Hom}_{\mathbf{C}}(B,A)$

We leave it to the reader to verify that this is indeed a category. It is also not hard to see that $ (\mathbf{C}^{\textup{op}})^{\textup{op}} = \mathbf{C}$. Opposite categories give us a nice recharacterization of a contrvariant functor. Indeed, because composition in opposite categories is reversed, a contravariant functor $ \mathbf{C} \to \mathbf{D}$ is just a covariant functor on the opposite category $ \mathbf{C}^{\textup{op}} \to \mathbf{D}$. Or equivalently, one $ \mathbf{C} \to \mathbf{D}^{\textup{op}}$. More than anything, opposite categories are syntactical sugar. Composition is only reversed artificially to make domains and codomains line up, but the actual composition is the same as in the original category.

Functors as Types

Before we move on to some code, let’s take a step back and look at the big picture (we’ve certainly plowed through enough details up to this point). The main thesis is that functoriality is a valuable property for an operation to have, but it’s not entirely clear why. Even the brightest of readers can only assume such properties are useful for mathematical analysis. It seems that the question we started this series out with, “what does category theory allow us to do that we couldn’t do before?” still has the answer, “nothing.” More relevantly, the question of what functoriality allows us to do is unclear. Indeed, once again the answer is “nothing.” Rather, functoriality in a computation allows one to analyze the behavior of a program. It gives the programmer a common abstraction in which to frame operations, and ease in proving the correctness of one’s algorithms.

In this light, the best we can do in implementing functors in programs is to give a type definition and examples. And in this author’s opinion this series is quickly becoming boring (all of the computational examples are relatively lame), so we will skip the examples in favor of the next post which will analyze more meaty programming constructs from a categorical viewpoint.

So recall the ML type definition of a category, a tuple of operations for source, target, identity, and composition:

datatype ('object, 'arrow)Category =
    category of ('arrow -> 'object) *
                ('arrow -> 'object) *
                ('object -> 'arrow) *
                ('arrow * 'arrow -> 'arrow)

And so a functor consists of the two categories involved (as types), and the mapping on objects, and the mapping on morphisms.

datatype ('cObject, 'cArrow, 'dObject, 'dArrow)Functor =
   aFunctor of ('cObject, 'cArrow)Category *
               ('cObject -> 'dObject) *
               ('cArrow -> 'dArrow) *
               ('dObject -> 'dArrow)Category

We encourage the reader who is uncomfortable with these type definitions to experiment with them by implementing some of our simpler examples (say, the size functor from sets to integers). Insofar as the basic definitions go, functors are not all that interesting. They become much more interesting when additional structure is imposed on them, and in the distant future we will see a glimpse of this in the form of adjointness. We hope to get around to analyzing statements like “syntax and semantics are adjoint functors.” For the next post in this series, we will take the three beloved functions of functional programming (map, foldl(r), and filter), and see what their categorical properties are.

Until then!

Universal Properties

Previously in this series we’ve seen the definition of a category and a bunch of examples, basic properties of morphisms, and a first look at how to represent categories as types in ML. In this post we’ll expand these ideas and introduce the notion of a universal property. We’ll see examples from mathematics and write some programs which simultaneously prove certain objects have universal properties and construct the morphisms involved.

A Grand Simple Thing

One might go so far as to call universal properties the most important concept in category theory. This should initially strike the reader as odd, because at first glance universal properties are so succinctly described that they don’t seem to be very interesting. In fact, there are only two universal properties and they are that of being initial and final.

Definition: An object $ A$ in a category $ \mathbf{C}$ is called initial if for every object $ B$ there is a unique morphism $ A \to B$. An object $ Z$ is called final if for every object $ B$ there is a unique morphism $ B \to Z$. If an object satisfies either of these properties, it is called universal. If an object satisfies both, it is called a zero object.

In both cases, the existence of a unique morphism is the same as saying the relevant Hom set is a singleton (i.e., for initial objects $ A$, the Hom set $ \textup{Hom}_{\mathbf{C}}(A,B)$ consists of a single element). There is one and only one morphism between the two objects. In the particular case of $ \textup{Hom}(A,A)$ when $ A$ is initial (or final), the definition of a category says there must be at least one morphism, the identity, and the universal property says there is no other.

There’s only one way such a simple definition could find fruitful applications, and that is by cleverly picking categories. Before we get to constructing interesting categories with useful universal objects, let’s recognize some universal objects in categories we already know.

In $ \mathbf{Set}$ the single element set is final, but not initial; there is only one set-function to a single-element set. It is important to note that the single-element set is far from unique. There are infinitely many (uncountably many!) singleton sets, but as we have already seen all one-element sets are isomorphic in $ \mathbf{Set}$ (they all have the same cardinality). On the other hand, the empty set is initial, since the “empty function” is the only set-mapping from the empty set to any set. Here the initial object truly is unique, and not just up to isomorphism.

It turns out universal objects are always unique up to isomorphism, when they exist. Here is the official statement.

Proposition: If $ A, A’$ are both initial in $ \mathbf{C}$, then $ A \cong A’$ are isomorphic. If $ Z, Z’$ are both final, then $ Z \cong Z’$.

Proof. Recall that a mophism $ f: A \to B$ is an isomorphism if it has a two sided inverse, a $ g:B \to A$ so that $ gf = 1_A$ and $ fg=1_B$ are the identities. Now if $ A,A’$ are two initial objects there are unique morphisms $ f : A \to A’$ and $ g: A’ \to A$. Moreover, these compose to be morphisms $ gf: A \to A$. But since $ A$ is initial, the only morphism $ A \to A$ is the identity. The situation for $ fg : A’ \to A’$ is analogous, and so these morphisms are actually inverses of each other, and $ A, A’$ are isomorphic. The proof for final objects is identical. $ \square$

Let’s continue with examples. In the category of groups, the trivial group $ \left \{ 1 \right \}$ is both initial and final, because group homomorphisms must preserve the identity element. Hence the trivial group is a zero object. Again, “the” trivial group is not unique, but unique up to isomorphism.

In the category of types with computable (halting) functions as morphisms, the null type is final. To be honest, this depends on how we determine whether two computable functions are “equal.” In this case, we only care about the set of inputs and outputs, and for the null type all computable functions have the same output: null.

Partial order categories are examples of categories which need not have universal objects. If the partial order is constructed from subsets of a set $ X$, then the initial object is the empty set (by virtue of being a subset of every set), and $ X$ as a subset of itself is obviously final. But there are other partial orders, such as inequality of integers, which have no “smallest” or “largest” objects. Partial order categories which have particularly nice properties (such as initial and final objects, but not quite exactly) are closely related to the concept of a domain in denotational semantics, and the language of universal properties is relevant to that discussion as well.

The place where universal properties really shine is in defining new constructions. For instance, the direct product of sets is defined by the fact that it satisfies a universal property. Such constructions abound in category theory, and they work via the ‘diagram categories’ we defined in our introductory post. Let’s investigate them now.


Let’s recall the classical definition from set theory of a quotient. We described special versions of quotients in the categories of groups and topological spaces, and we’ll see them all unified via the universal property of a quotient in a moment.

Definition: An equivalence relation on a set $ X$ is a subset of the set product $ \sim \subset X \times X$ which is reflexive, symmetric, and transitive. The quotient set $ X / \sim$ is the set of equivalence classes on $ X$. The canonical projection $ \pi : X \to X/\sim$ is the map sending $ x$ to its equivalence class under $ \sim$.

The quotient set $ X / \sim$ can also be described in terms of a special property: it is the “largest” set which agrees with the equivalence relation $ \sim$. On one hand, it is the case that whenever $ a \sim b$ in $ X$ then $ \pi(a) = \pi(b)$. Moreover, for any set $ Y$ and any map $ g: X \to Y$ which equates equivalent things ($ g(a) = g(b)$ for all $ a \sim b$), then there is a unique map $ f : X/\sim \to Y$ such that $ f \pi = g$. This word salad is best translated into a diagram.


Here we use a dashed line to assert the existence of a morphism (once we’ve proven such a morphism exists, we use a solid line instead), and the symbol $ \exists !$ signifies existence ($ \exists$) and uniqueness (!).

We can prove this explicitly in the category $ \mathbf{Set}$. Indeed, if $ g$ is any map such that $ g(a) = g(b)$ for all equivalent $ a,b \in X$, then we can define $ f$ as follows: for any $ a \in X$ whose equivalence class is denoted by $ [a]$ in $ X / \sim$, and define $ f([a]) = g(a)$. This map is well defined because if $ a \sim b$, then $ f([a]) = g(a) = g(b) = f([b])$. It is unique because if $ f \pi = g = h \pi$ for some other $ h: X / \sim \to Y$, then $ h([x]) = g(x) = f([x])$; this is the only possible definition.

Now the “official” way to state this universal property is as follows:

The quotient set $ X / \sim$ is universal with respect to the property of mapping $ X$ to a set so that equivalent elements have the same image.

But as we said earlier, there are only two kinds of universal properties: initial and final. Now this $ X / \sim$ looks suspiciously like an initial object ($ f$ is going from $ X / \sim$, after all), but what exactly is the category we’re considering? The trick to dissecting this sentence is to notice that this is not a statement about just $ X / \sim$, but of the morphism $ \pi$.

That is, we’re considering a diagram category. In more detail: fix an object $ X$ in $ \mathbf{Set}$ and an equivalence relation $ \sim$ on $ X$. We define a category $ \mathbf{Set}_{X,\sim}$ as follows. The objects in the category are morphisms $ f:X \to Y$ such that $ a \sim b$ in $ X$ implies $ f(a) = f(b)$ in $ Y$. The morphisms in the category are commutative diagrams


Here $ f_1, f_2$ need to be such that they send equivalent things to equal things (or they wouldn’t be objects in the category!), and by the commutativity of the diagram $ f_2 = \varphi f_1$. Indeed, the statement about quotients is that $ \pi : X \to X / \sim$ is an initial object in this category. In fact, we have already proved it! But note the abuse of language in our offset statement above: it’s not really $ X / \sim$ that is the universal object, but $ \pi$. Moreover, the statement itself doesn’t tell us what category to inspect, nor whether we care about initial or final objects in that category. Unfortunately this abuse of language is widespread in the mathematical world, and for arguably good reason. Once one gets acquainted with these kinds of constructions, reading between the limes becomes much easier and it would be a waste of time to spell it out. After all, once we understand $ X / \sim$ there is no “obvious” choice for a map $ X \to X / \sim$ except for the projection $ \pi$. This is how $ \pi$ got its name, the canonical projection.

Two last bits of terminology: if $ \mathbf{C}$ is any category whose objects are sets (and hence, where equivalence relations make sense), we say that $ \mathbf{C}$ has quotients if for every object $ X$ there is a morphism $ \pi$ satisfying the universal property of a quotient. Another way to state the universal property is to say that all maps respecting the equivalence structure factor through the quotient, in the sense that we get a diagram like the one above.

What is the benefit of viewing $ X / \sim$ by its universal property? For one, the set $ X / \sim$ is unique up to isomorphism. That is, if any other pair $ (Z, g)$ satisfies the same property, we automatically get an isomorphism $ X / \sim \to Z$. For instance, if $ \sim$ is defined via a function $ f : X \to Y$ (that is, $ a \sim b$ if $ f(a) = f(b)$), then the pair $ (\textup{im}(f), f)$ satisfies the universal property of a quotient. This means that we can “decompose” any function into three pieces:

$ \displaystyle X \to X / \sim \to \textup{im}(f) \to Y$

The first map is the canonical projection, the second is the isomorphism given by the universal property of the quotient, and the last is the inclusion map into $ Y$. In a sense, all three of these maps are “canonical.” This isn’t so magical for set-maps, but the same statement (and essentially the same proof) holds for groups and topological spaces, and are revered as theorems. For groups, this is called The First Isomorphism Theorem, but it’s essentially the claim that the category of groups has quotients.

This is getting a bit abstract, so let’s see how the idea manifests itself as a program. In fact, it’s embarrassingly simple. Using our “simpler” ML definition of a category from last time, the constructive proof that quotient sets satisfy the universal property is simply a concrete version of the definition of $ f$ we gave above. In code,

fun inducedMapFromQuotient(setMap(x, pi, q), setMap(x, g, y)) =
   setMap(q, (fn a => g(representative(a))), y)

That is, once we have $ \pi$ and $ X / \sim$ defined for our given equivalence relation, this function accepts as input any morphism $ g$ and produces the uniquely defined $ f$ in the diagram above. Here the “representative” function just returns an arbitrary element of the given set, which we added to the abstract datatype for sets. If the set $ X$ is empty, then all functions involved will raise an “empty” exception upon being called, which is perfectly fine. We leave the functions which explicitly construct the quotient set given $ X, \sim$ as an exercise to the reader.

Products and Coproducts

Just as the concept of a quotient set or quotient group can be generalized to a universal property, so can the notion of a product. Again we take our intuition from $ \mathbf{Set}$. There the product of two sets $ X,Y$ is the set of ordered pairs

$ \displaystyle X \times Y = \left \{ (x,y) : x \in X, y \in Y \right \}$

But as with quotients, there’s much more going on and the key is in the morphisms. Specifically, there are two obvious choices for morphisms $ X \times Y \to X$ and $ X \times Y \to Y$. These are the two projections onto the components, namely $ \pi_1(x,y) = x$ and $ \pi_2(x,y) = y$. These projections are also called “canonical projections,” and they satisfy their own universal property.

The product of sets is universal with respect to the property of having two morphisms to its factors.

Indeed, this idea is so general that it can be formulated in any category, not just categories whose objects are sets. Let $ X,Y$ be two fixed objects in a category $ \mathbf{C}$. Should it exist, the product $ X \times Y$ is defined to be a final object in the following diagram category. This category has as objects pairs of morphisms


and as morphisms it has commutative diagrams


In words, to say products are final is to say that for any object in this category, there is a unique map $ \varphi$ that factors through the product, so that $ \pi_1 \varphi = f$ and $ \pi_2 \varphi = g$. In a diagram, it is to claim the following commutes:


If the product $ X \times Y$ exists for any pair of objects, we declare that the category $ \mathbf{C}$ has products.

Indeed, many familiar product constructions exist in pure mathematics: sets, groups, topological spaces, vector spaces, and rings all have products. In fact, so does the category of ML types. Given two types ‘a and ‘b, we can form the (aptly named) product type ‘a * ‘b. The canonical projections exist because ML supports parametric polymorphism. They are

fun leftProjection(x,y) = x
fun rightProjection(x,y) = y

And to construct the unique morphism to the product,

fun inducedMapToProduct(f,g) = fn a => (f(a), g(a))

We leave the uniqueness proof to the reader as a brief exercise.

There is a similar notion called a coproduct, denoted $ X \coprod Y$, in which everything is reversed: the arrows in the diagram category go to $ X \coprod Y$ and the object is initial in the diagram category. Explicitly, for a fixed $ X, Y$ the objects in the diagram category are again pairs of morphisms, but this time with arrows going to the central object


The morphisms are again commutative diagrams, but with the connecting morphism going away from the central object


And a coproduct is defined to be an initial object in this category. That is, a pair of morphisms $ i_1, i_2$ such that there is a unique connecting morphism in the following diagram.


Coproducts are far less intuitive than products in their concrete realizations, but the universal property is no more complicated.  For the category of sets, the coproduct is a disjoint union (in which shared elements of two sets $ X, Y$ are forcibly considered different), and the canonical morphisms are “inclusion” maps (hence the switch from $ \pi$ to $ i$ in the diagram above). Specifically, if we define the coproduct

$ \displaystyle X \coprod Y = (X \times \left \{ 1 \right \}) \cup (Y \times \left \{ 2 \right \})$

as the set of “tagged” elements (the right entry in a tuple signifies which piece of the coproduct the left entry came from), then the inclusion maps $ i_1(x) = (x,1)$ and $ i_2(y) = (y,2)$ are the canonical morphisms.

There are similar notions of disjoint unions for topological spaces and graphs, which are their categories’ respective coproducts. However, in most categories the coproducts are dramatically different from “unions.” In group theory, it is a somewhat complicated object known as the free product. We mentioned free products in our hasty discussion of the fundamental group, but understanding why and where free groups naturally occur is quite technical. Similarly, coproducts of vector spaces (or $ R$-modules) are more like a product, with the stipulation that at most finitely many of the entries of a tuple are nonzero (e.g., formal linear combinations of things from the pieces). Even without understanding these examples, the reader should begin to believe that relatively simple universal properties can yield very useful objects with potentially difficult constructions in specific categories. The ubiquity of the concepts across drastically different fields of mathematics is part of why universal properties are called “universal.”

Luckily, the category of ML types has a nice coproduct which feels like a union, but it is not supported as a “native” language feature like products types are. Specifically, given two types ‘a, ‘b we can define the “coproduct type”

datatype ('a, 'b)Coproduct = left of 'a | right of 'b

Let’s prove this is actually a coproduct: fix two types ‘a and ‘b, and let $ i_1, i_2$ be the functions

fun leftInclusion(x) = left(x)
fun rightInclusion(y) = right(y)

Then given any other pair of functions $ f,g$ which accept as input types ‘a and ‘b, respectively, there is a unique function $ \varphi$ operating on the coproduct type. We construct it as follows.

fun inducedCoproductMap(f,g) =
      theMap(left(a)) = f(a)
      theMap(right(b)) = g(b)

The uniqueness of this construction is self-evident. This author finds it fascinating that these definitions are so deep and profound (indeed, category theory is heralded as the queen of abstraction), but their realizations are trivially obvious to the working programmer. Perhaps this is a statement about how well-behaved the category of ML types is.

Continuing On

So far we have seen three relatively simple examples of universal properties, and explored how they are realized in some categories. We should note before closing two things. The first is that not every category has objects with these universal properties. Unfortunately poset categories don’t serve as a good counterexample for this (they have both products and coproducts; what are they?), but it may be the case that in some categories only some pairs of objects have products or coproducts, while others do not.

Second, there are many more universal properties that we haven’t covered here. For instance, the notion of a limit is a universal property, as is the notion of a “free” object. There are kernels, pull-backs, equalizers, and many other ad-hoc universal properties without names. And for every universal property there is a corresponding “dual” property that results from reversing the arrows in every diagram, as we did with coproducts. We will visit the relevant ones as they come up in our explorations.

In the next few posts we’ll encounter functors and the concept of functoriality, and start asking some poignant questions about familiar programmatic constructions.

Until then!

Categories as Types

In this post we’ll get a quick look at two ways to define a category as a type in ML. The first way will be completely trivial: we’ll just write it as a tuple of functions. The second will involve the terribly-named “functor” expression in ML, which allows one to give a bit more structure on data types.

The reader unfamiliar with the ML programming language should consult our earlier primer.

What Do We Want?

The first question to ask is “what do we want out of a category?” Recall from last time that a category is a class of objects, where each pair of objects is endowed with a set of morphisms. The morphisms were subject to a handful of conditions, which we paraphrase below:

  • Composition has to make sense when it’s defined.
  • Composition is associative.
  • Every object has an identity morphism.

What is the computational heart of such a beast? Obviously, we can’t explicitly represent the class of objects itself. Computers are finite (or at least countable) beings, after all. And in general we can’t even represent the set of all morphisms explicitly. Think of the category $ \mathbf{Set}$: if $ A, B$ are infinite, then there are infinitely many morphisms (set-functions) $ A \to B$.

A subtler and more important point is that everything in our representation of a category needs to be a type in ML (for indeed, how can a program run without being able to understand the types involved). This nudges us in an interesting direction: if an object is a type, and a morphism is a type, then we might reasonably expect a composition function:

compose: 'arrow * 'arrow -> 'arrow

We might allow this function to raise an exception if the two morphisms are uncomposable. But then, of course, we need a way to determine if morphisms are composable, and this requires us to know what the source and target functions are.

source: 'arrow -> 'object
target: 'arrow -> 'object

We can’t readily enforce composition to be associative at this stage (or any), because morphisms don’t have any other manipulable properties. We can’t reasonably expect to be able to compare morphisms for equality, as we aren’t able to do this in $ \mathbf{Set}$.

But we can enforce the final axiom: that every object has an identity morphism. This would come in the form of a function which accepts as input an object and produces a morphism:

identity: 'object -> 'arrow

But again, we can’t necessarily enforce that identity behaves as its supposed to.

Even so, this should be enough to define a category. That is, in ML, we have the datatype

datatype ('object, 'arrow)Category =
 category of ('arrow -> 'object) *
             ('arrow -> 'object) *
             ('object -> 'arrow) *
             ('arrow * 'arrow -> 'arrow)

Where we understand the first two functions to be source and target, and the third and fourth to be identity and compose, respectively.

Categories as Values

In order to see this type in action, we defined (and included in the source code archive for this post) a type for homogeneous sets. Since ML doesn’t support homogeneous lists natively, we’ll have to settle for a particular subcategory of $ \mathbf{Set}$. We’ll call it $ \mathbf{Set}_a$, the category whose objects are finite sets whose elements have type $ a$, and whose morphisms are again set-functions. Hence, we can define an object in this category as the ML type

'a Set

and an arrow as a datatype

datatype 'a SetMap = setMap of ('a Set) * ('a -> 'a) * ('a Set)

where the first object is the source and the second is the target (mirroring the notation $ A \to B$).

Now we must define the functions required for the data constructor of a Category. Source and target are trivial.

fun setSource(setMap(a, f, b)) = a
fun setTarget(setMap(a, f, b)) = b

And identity is equally natural.

fun setIdentity(aSet) = setMap(aSet, (fn x => x), aSet)

Finally, compose requires us to check for set equality of the relevant source and target, and compose the bundled set-maps. A few notational comments: the “o” operator does function composition in ML, and we defined “setEq” and the “uncomposable” exception elsewhere.

fun setCompose(setMap(b', g, c), setMap(a, f, b)) =
      if setEq(b, b') then
         setMap(a, (g o f), c)
         raise uncomposable

Note that the second map in the composition order is the first argument to the function, as is standard in mathematical notation.

And now this category of finite sets is just the instantiation

val FiniteSets = category(setSource, setTarget, setIdentity, setCompose)

Oddly enough, this definition does not depend on the choice of a type for our sets! None of the functions we defined care about what the elements of sets are, so long as they are comparable for equality (in ML, such a type is denoted with two leading apostrophes: ”a).

This example is not very interesting, but we can build on it in two interesting ways. The first is to define the poset category of sets. Recall that the poset category of a set $ X$ has as objects subsets of $ X$ and there is a unique morphism from $ A \to B$ if and only if $ A \subset B$. There are two ways to model this in ML. The first is that we can assume the person constructing the morphisms is giving correct data (that is, only those objects which satisfy the subset relation). This would amount to a morphism just representing “nothing” except the pair $ (A,B)$. On the other hand, we may also require a function which verifies that the relation holds. That is, we could describe a morphism in this (or any) poset category as a datatype

datatype 'a PosetArrow = ltArrow of 'a * ('a -> 'a -> bool) * 'a

We still do need to assume the user is not creating multiple arrows with different functions (or we must tacitly call them all equal regardless of the function). We can do a check that the function actually returns true by providing a function for creating arrows in this particular poset category of a set with subsets.

exception invalidArrow
fun setPosetArrow(x)(y) = if subset(x)(y) then ltArrow(x, subset, y)
                          else raise invalidArrow

Then the functions defining a poset category are very similar to those of a set. The only real difference is in the identity and composition functions, and it is minor at that.

fun posetSource(ltArrow(x, f, y)) = x
fun posetTarget(ltArrow(x, f, y)) = y
fun posetIdentity(x) = ltArrow(x, (fn z => fn w => true), x)
fun posetCompose(ltArrow(b', g, c), ltArrow(a, f, b)) =
      if b = b' then
         ltArrow(a, (fn x => fn y => f(x)(b) andalso g(b)(y)), c)
         raise uncomposable

We know that a set is always a subset of itself, so we can provide the constant true function in posetIdentity. In posetCompose, we assume things are transitive and provide the logical conjunction of the two verification functions for the pieces. Then the category as a value is

val Posets = category(posetSource, posetTarget, posetIdentity, posetCompose)

One will notice again that the homogeneous type of our sets is irrelevant. The poset category is parametric. To be completely clear, the type of the Poset category defined above is

('a Set, ('a Set)PosetArrow)Category

and so we can define a shortcut for this type.

type 'a PosetCategory = ('a Set, ('a Set)PosetArrow)Category

The only way we could make this more general is to pass the constructor for creating a particular kind of posetArrow (in our case, it just means fixing the choice of verification function, subset, for the more generic type constructor). We leave this as an exercise to the reader.

Our last example will return again to our abstract “diagram category.” Recall that if we fix an object $ A$ in a category $ \mathbf{C}$, the category $ \mathbf{C}_A$ has as objects the morphisms with fixed source $ A$,


and the arrows are commutative diagrams

diagram-morphismLets fix $ A$ to be a set, and define a function to instantiate objects in this category:

fun makeArrow(fixedSet)(f)(target) = setMap(fixedSet, f, target)

That is, if I have a fixed set $ A$, I can call the function like so

val makeObject = makeArrow(fixedSet)

And use this function to create all of my objects. A morphism is then just a pair of set-maps with a connecting function. Note how similar this snippet of code looks structurally to the other morphism datatypes we’ve defined. This should reinforce the point that morphisms really aren’t any different in this abstract category!

datatype 'a TriangleDiagram = 
   triangle of ('a SetMap) * ('a -> 'a) * ('a SetMap)

And the corresponding functions and category instantiation are

fun triangleSource(triangle(x, f, y)) = x
fun triangleTarget(triangle(x, f, y)) = y
fun triangleIdentity(x) = triangle(x, (fn z => z), x)
fun triangleCompose(triangle(b', g, c), triangle(a, f, b)) =
      if setEq(setTarget(b'))(setTarget(b)) then
         triangle(a, (g o f), c)
         raise uncomposable

val SetDiagrams =
   category(triangleSource, triangleTarget, triangleIdentity, triangleCompose)

Notice how we cannot compare the objects in this category for equality! Indeed, two functions cannot be compared in ML, and the best we can do is compare the targets of these functions to ensure that the connecting morphisms can be composed. A malicious programmer might then be able to compose uncomposable morphisms, a devious and startling proposition.

Notice further how we can’t avoid using set-specific functions with our current definition of a category. What we could do instead is require a category to have a function which checks for composability. Then compose could be written in a more homogeneous (but not quite completely parametric fashion). The reader is welcome to try this on their own, but we’ll soon have a more organized way to do this later, when we package up a category into an ML structure.

Categories as Signatures

Let’s look at a more advanced feature of ML and see how we can apply it to representing categories in code. The idea is familiar to most Java and C++ programmers, and that is of an interface: the programmer specifies an abstract type which has specific functions attached to it, but does not implement those functions. Then some (perhaps different) programmer implements the interface by defining those functions for a concrete type.

For the reader who isn’t so interested in these features of ML, feel free to skip this section. We won’t be introducing any new examples of categories, just rephrasing what we’ve done above in a different way. It is also highly debatable whether this new way is actually any better (it’s certainly longer).

The corresponding concept in ML is called a signature (interface) and a structure (concrete instance). The only difference is that a structure is removed one step further from ML’s type system. This is in a sense necessary: a structure should be able to implement the requirements for many different signatures, and a signature should be able to have many different structures implement it. So there is no strict hierarchy of types to lean on (it’s just a many-to-many relationship).

On the other hand, not having any way to think of a structure as a type at all would be completely useless. The whole point is to be able to write function which manipulate a structure (concrete instance) by abstractly accessing the signature’s (interface’s) functions regardless of the particular implementation details. And so ML adds in another layer of indirection: the functor (don’t confuse this with categorical functors, which we haven’t talked abou yet). A functor in ML is a procedure which accepts as input a structure and produces as output another structure.

Let’s see this on a simple example. We can define a structure for “things that have magnitudes,” which as a signature would be

signature MAG_OBJ =
   type object
   val mag : object -> int

This introduced a few new language forms: the “signature” keyword is like “fun” or “val,” it just declares the purpose of the statement as a whole. The “sig … end” expression is what actually creates the signature, and the contents are a list of type definitions, datatype definitions, and value/function type definitions. The reader should think of this as a named “type” for structures, analogous to a C struct.

To implement a signature in ML is to “ascribe” it. That is, we can create a structure that defines the functions laid out in a signature (and perhaps more). Here is an example of a structure for a mag object of integers:

structure IntMag : MAG_OBJ = 
   type object = int
   fun mag(x) = if x >= 0 then x else ~x

The colon is a type ascription, and at the compiler-level it goes through the types and functions defined in this structure and attempts to match them with the required ones from the MAG_OBJ signature. If it fails, it raises a compiler error. A more detailed description of this process can be found here. We can then use the structure as follows:

local open IntMag in
   val y = ~7
   val z = mag(y)

The open keyword binds all of the functions and types to the current environment (and so it’s naturally a bad idea to open structures in the global environment). The “local” construction is made to work specifically with open.

In any case, the important part about signatures and structures is that one can write functions which accept as input structures with a given signature and produce structures with other signatures. The name for this is a “functor,” but we will usually call it an “ML functor” to clarify that it is not a categorical functor (if you don’t know what a categorical functor is, don’t worry yet).

For example, here is a signature for a “distance object”

signature DIST_OBJ =
   type object
   val dist: object * object -> int

And a functor which accepts as input a MAG_OBJ and creates a DIST_OBJ in a contrived and silly way.

functor MakeDistObj(structure MAG: MAG_OBJ) =
   type object = MAG.object
   val dist = fn (x, y) =>
                  val v = MAG.mag(x) - MAG.mag(y)
                  if v > 0 then v else ~v

Here the argument to the functor is a structure called MAG, and we need to specify which type is to be ascribed to it; in this case, it’s a MAG_OBJ. Only things within the specified signature can be used within the struct expression that follows (if one needs a functor to accept as input something which has multiple type ascriptions, one can create a new signature that “inherits” from both). Then the right hand side of the functor is just a new structure definition, where the innards are allowed to refer to the properties of the given MAG_OBJ.

We could in turn define a functor which accepts a DIST_OBJ as input and produces as output another structure, and compose the two functors. This is started to sound a bit category-theoretical, but we want to remind the reader that a “functor” in category theory is quite different from a “functor” in ML (in particular, we have yet to mention the former, and the latter is just a mechanism to parameterize and modularize code).

In any case, we can do the same thing with a category. A general category will be represented by a signature, and a particular instance of a category is a structure with the implemented pieces.

signature CATEGORY =
   type object
   type arrow
   exception uncomposable

   val source: arrow -> object
   val target: arrow -> object
   val identity: object -> arrow
   val composable: arrow * arrow -> bool
   val compose: arrow * arrow -> arrow

As it turns out, ML has an implementation of “sets” already, and it uses signatures. So instead of using our rinky-dink implementation of sets from earlier in this post, we can implement an instance of the ORD_KEY signature, and then call one of the many set-creating functors available to us. We’ll use an implementation based on red-black trees for now.

First the key:

structure OrderedInt : ORD_KEY =
   type ord_key = int
   val compare =

then the functor:

functor MakeSetCategory(structure ELT: ORD_KEY) =
   structure Set:ORD_SET = RedBlackSetFn(ELT)
   type elt = ELT.ord_key
   type object = Set.set
   datatype arrow = function of object * (elt -> elt) * object

   exception uncomposable

   fun source(function(a, _, _)) = a
   fun target(function(_, _, b)) = b
   fun identity(a) = function(a, fn x => x, a)
   fun composable(a,b) = Set.equal(a,b)
   fun compose(function(b2, g, c), function(a, f, b1)) =
      if composable(b1, b2) then
         function(a, (g o f), c)
         raise uncomposable
   fun apply(function(a, f, b), x) = f(x)

The first few lines are the most confusing; the rest is exactly what we’ve seen from our first go-round of defining the category of sets. In particular, we call the RedBlackSetFn functor given the ELT structure, and it produces an implementation of sets which we name Set (and superfluously ascribe the type ORD_SET for clarity).

Then we define the “elt” type which is used to describe an arrow, the object type as the main type of an ORD_SET (see in this documentation that this is the only new type available in that signature), and the arrow type as we did earlier.

We can then define the category of sets with a given type as follows

structure IntSets = MakeSetCategory(structure ELT = OrderedInt)

The main drawback of this approach is that there’s so much upkeep! Every time we want to make a new kind of category, we need to define a new functor, and every time we want to make a new kind of category of sets, we need to implement a new kind of ORD_KEY. All of this signature and structure business can be quite confusing; it often seems like the compiler doesn’t want to cooperate when it should.

Nothing Too Shocking So Far

To be honest, we haven’t really done anything very interesting so far. We’ve seen a single definition (of a category), looked at a number of examples to gain intuition, and given a flavor of how these things will turn into code.

Before we finish, let’s review the pros and cons of a computational representation of category-theoretical concepts.


  • We can prove results by explicit construction (more on this next time).
  • Different-looking categories are structurally similar in code.
  • We can faithfully represent the idea of using (objects and morphisms of) categories as parameters to construct other categories.
  • Writing things in code gives a fuller understanding of how they work.


  • All computations are finite, requiring us to think much harder than a mathematician about the definition of an object.
  • The type system is too weak. we can’t enforce the axioms of a category directly or even ensure the functions act sensibly at all. As such, the programmer becomes responsible for any failure that occur from bad definitions.
  • The type system is too strong. when we work concretely we are forced to work within homogeneous categories (e.g. of ‘a Set).
  • We cannot ensure the ability to check equality on objects. This showed up in our example of diagram categories.
  • The functions used in defining morphisms, e.g. in the category of sets, are somewhat removed from real set-functions. For example, nothing about category theory requires functions to be computable. Moreover, nothing about our implementation requires the functions to have any outputs at all (they may loop infinitely!). Moreover, it is not possible to ensure that any given function terminates on a given input set (this is the Halting problem).

This list looks quite imbalanced, but one might argue that the cons are relatively minor compared to the pros. In particular (and this is what this author hopes to be the case), being able to explicitly construct proofs to theorems in category theory will give one a much deeper understanding both of category theory and of programming. This is the ultimate prize.

Next time we will begin our investigation of universal properties (where the real fun starts), and we’ll use the programmatic constructions we laid out in this post to give constructive proofs that various objects are universal.

Until then!

A Sample of Standard ML, the TreeSort Algorithm, and Monoids

In this post we will assume the reader has a passing familiarity with some of the basic concepts of functional programming (the map, fold, and filter functions). We introduce these topics in our Racket primer, but the average reader will understand the majority of this primer without expertise in functional programming.

Follow-ups to this post can be found in the Computational Category Theory section of the Main Content page.

Preface: ML for Category Theory

A few of my readers have been asking for more posts about functional languages and algorithms written in functional languages. While I do have a personal appreciation for the Haskell programming language (and I plan to do a separate primer for it), I have wanted to explore category theory within the context of programming for quite a while now. From what I can tell, ML is a better choice than Haskell for this.

Part of the reason is that, while many Haskell enthusiasts claim it to be a direct implementation of category theory, Haskell actually tweaks category theoretic concepts in certain ways. I rest assured that the designers of Haskell (who are by assumption infinitely better at everything than I am) have very good reasons for doing this. But rather than sort through the details of the Haskell language specification to clarify the finer details, we would learn a lot more by implementing category theory by hand in a programming language that doesn’t have such concepts already.

And so we turn to ML.

ML, which stands for MetaLanguage, apparently has historical precedents for being a first in many things. One of these is an explicit recognition of parametric polymorphism, which is the idea that an operation can have the same functionality regardless of the types of the data involved; the types can, in effect, be considered variables. Another ground-breaking aspect of ML is an explicit type inference system. Similar to Haskell, an ML program will not run unless the compiler can directly prove that the program produces the correct types in every step of the computation.

Both of these features are benefits for the student of category theory. Most of our time in category theory will be spent working with very general assumptions on the capabilities of our data involved, and parametric polymorphism will be our main tool for describing what these assumptions are and for laying out function signatures.

As a side note, I’ve noticed through my ever-growing teaching experiences that one of the main things new programming students struggle with (specifically, after mastering the syntax and semantics of basic language constructs) is keeping their types straight. This is especially prominent in a language like Python (which is what I teach), where duck-typing is so convenient that it lulls the students into a false sense of security. Sooner as opposed to later they’ll add strings to numbers with the blind confidence that Python will simply get it. Around this time in their first semester of programming, I would estimate that type errors lie at the heart of 75% of the bugs my students face and fail to resolve before asking me for help. So one benefit of programming in ML for pedagogy is that it is literally impossible to make type errors. The second you try to run a program with bad types, the compiler points out what the expected type is and what the given (incorrect) type was. It takes a while to get used to type variables (and appeasing the type checker when you want to play fast and loose). But once you do you’ll find the only bugs that remain in your code are conceptual ones, which are of course much more rewarding and important bugs to fix.

So enough of this preamble. Let’s learn some ML!

Warming Up: Basic Arithmetic and Conditionals

We’ll be working with the Standard ML of New Jersey compiler, which you can download for free at their website. The file extension for ML files is .sml.

As one would expect, ML has variables and arithmetic which work in much the same way as other languages. Each variable declaration is prefixed by the word “val,” as below

val x = 7;
val y = 2;

This statements modify the global environment (the list of which variable names are associated to which values). Semicolons are required to terminate variable declarations at the global level. We can declare multiple variables in a single line using the “and” keyword

val x = 7 and y = 2;

As a precaution, “and” is only used in ML for syntactic conjunctions of variable/function declarations, and is only necessary when the two defined variable names are mutually defined in terms of each other (this can happen naturally for recursive function definitions). We will see in a moment that the logical and operation is denoted “andalso.”

We can also use pattern matching to bind these two variables in one line, much the same way as it might work in Python:

val (x,y) = (7,2);

We note that while ML does not require us to specify the type of a variable, the type is known and ever present under the surface. If we run the above code through the sml compiler (which after running the contents of a file opens a REPL to further evaluate commands), we see the following output

[opening vars.sml]
val x = 7 : int
val n = 2 : int

The environment is printed out to the user, and it displays that the two types are “int.”

Arithmetic is defined for integers, and the standard ones we will use are the expected +, -, *, and (not a slash, but) div. Here are some examples, and here is a list of all basic operations on ints. A few things to note: the unary negation operator is a tilde (~), and the semicolons are only used terminate statements in the REPL, which tells the compiler we’re ready for it to evaluate our code. Semicolons can also be used to place multiple statements on a single line of code. The “it” variable is a REPL construct which saves the most recent unbound expression evaluation.

- 3 + 6;
val it = 9 : int
- 6 div 3;      
val it = 2 : int
- 2 * 9;
val it = 18 : int
- 2 - 9;
val it = ~7 : int
- ~9;
val it = ~9 : int

ML also has floating point arithmetic (in ML this type is called “real”), but treats it in a very prudish manner. Specifically (and this is a taste of the type checker doing its job too well), ML does not coerce types for you. If you want to multiply a real number and an integer, you have to first convert the int to a real and then multiply. An error will occur if you do not:

- val x = 4.0;
val x = 4.0 : real
- val y = 7;
val y = 7 : int
- x * y;
stdIn:5.1-5.6 Error: operator and operand don't agree [tycon mismatch]
  operator domain: real * real
  operand:         real * int
  in expression:
    x * y
- x * Real.fromInt(y);
val it = 28.0 : real

Here is a list of all operations on reals. We don’t anticipate using reals much, but it’s good to know that ML fervently separates them.

It seems odd that we’re talking so much about statements, because often enough we will be either binding function names (and tests) to the global environment, or restricting ourselves to local variable declarations. The latter has a slightly more complicated syntax, simply surrounding your variable declarations and evaluated code in a “let … in … end” expression. This will be a much more common construction for us.

   val x = 7
   val y = 9
   (x + 2*y) div 3

The “in” expression is run with the combined variables from the ambient environment (the variables declared outside of the let) and those defined inside the let. The variables defined in the let leave scope after the “in” expression is evaluated, and the entire let expression as a whole evaluates to the result of evaluating the “in” expression. Clearly and example shows what is going on much more directly than words.

The last major basic expression form are the boolean expressions and operations. The type for booleans in ML is called “bool,” and the two possible values are “true,” and “false.” They have the usual unary and binary operations, but the names are a bit weird. Binary conjunction is called “andalso,” while binary disjunction is called “orelse.”

val a = true and b = false;
val c = (a andalso b) orelse ((not a) andalso (not b));

But aside from that, boolean expressions work largely as one would expect. There are the six standard numerical comparison functions, where testing for equality is given by a single equals sign (in most languages, comparison for equality is ==), and inequality is given by the diamond operator <>. The others are, as usual, <, <=, >, >=.

- 6 = 7;
val it = false : bool
- 6 = 6;
val it = true : bool
- 6 < 7;
val it = true : bool
- 7 <= 6;
val it = false : bool
- 6 <> 7;
val it = true : bool

ML also has the standard if statement, which has the following syntax, which is more or less the same as most languages:

- val x = if 6 < 7 then ~1 else 4;
val x = ~1 : int

ML gives the programmer more or less complete freedom with whitespace, so any of these expressions can be spread out across multiple lines if the writer desires.

val x = if 6 < 7

This can sometimes be helpful when defining things inside of let expressions inside of function definitions (inside of other function definitions, inside of …).

So far the basics are easy, if perhaps syntactically different from most other languages we’re familiar with. So let’s move on to the true heart of ML and all functional programming languages: functions.

Functions and cases, recursion

Now that we have basic types and conditions, we can start to define some simple functions. In the global environment, functions are defined the same way as values, using the word “fun” in the place of “val.” For instance, here is a function that adds 3 to a number.

fun add3(x) = x+3

The left hand side is the function signature, and the right hand side is the body expression. As in Racket, and distinct from most imperative languages, a function evaluates to whatever the body expression evaluates to. Calling functions has two possible syntaxes:

add3 5
(add3 5)

In other words, if the function application is unambiguous, parentheses aren’t required. Otherwise, one can specify precedence using parentheses in either Racket (Lisp) style or in standard mathematical style.

The most significant difference between ML and most other programming languages, is that ML’s functions have case-checking. That is, we can specify what action is to be taken based on the argument, and these actions are completely disjoint in the function definition (no if statements are needed).

For instance, we could define an add3 function which nefariously does the wrong thing when the user inputs 7.

fun add3(7) = 2
  | add3(x) = x+3

The vertical bar is read “or,” and the idea is that the possible cases for the function definition must be written in most-specific to least-specific order. For example, interchanging the orders of the add3 function cases gives the following error:

- fun add3(x) = x+3 
=   | add3(7) = 2;
stdIn:13.5-14.16 Error: match redundant
          x => ...
    -->   7 => ...

Functions can call themselves recursively, and this is the main way to implement loops in ML. For instance (and this is quite an inefficient example), I could define a function to check whether a number is even as follows.

fun even(0) = true 
  | even(n) = not(even(n-1))

Don’t cringe too visibly; we will see recursion used in less horrifying ways in a moment.

Functions with multiple arguments are similarly easy, but there are two semantic possibilities for how to define the arguments. The first, and simplest is what we would expect from a typical language: put commas in between the arguments.

fun add(x,y) = x+y

When one calls the add function, one is forced to supply both arguments immediately. This is usually how programs are written, but often times it can be convenient to only supply one argument, and defer the second argument until later.

If this sounds like black magic, you can thank mathematicians for it. The technique is called currying, and the idea stems from the lambda calculus, in which we can model all computation using just functions (with a single argument) as objects, and function application. Numbers, arithmetic, lists, all of these things are modeled in terms of functions and function calls; the amazing thing is that everything can be done with just these two tools. If readers are interested, we could do a post or two on the lambda calculus to see exactly how these techniques work; the fun part would be that we can actually write programs to prove the theorems.

Function currying is built-in to Standard ML, and to get it requires a minor change in syntax. Here is the add function rewritten in a curried style.

fun add(x)(y) = x+y

Now we can, for instance, define the add3 function in terms of add as follows:

val add3 = add(3)

And we can curry the second argument by defining a new function which defers the first argument appropriately.

fun add6(x) = add(x)(6)

Of course, in this example addition is commutative so which argument you pick is useless.

We should also note that we can define anonymous functions as values (for instance, in a let expression) using this syntax:

val f = (fn x => x+3)

The “fn x => x+3” is just like a “lambda x: x+3” expression in Python, or a “(lambda (x) (+ x 3))” in Racket. Note that one can also define functions using the “fun” syntax in a let expression, so this is truly only for use-once function arguments.

Tuples, Lists, and Types

As we’ve discovered already, ML figures out the types of our expressions for us. That is, if we define the function “add” as above (in the REPL),

- fun add(x,y) = x+y;
val add = fn : int * int -> int

then ML is smart enough to know that “add” is to accept a list of two ints (we’ll get to what the asterisk means in a moment) and returns an int.

The curried version is similarly intuited:

- fun add(x)(y) = x+y;
val add = fn : int -> int -> int

The parentheses are implied here: int -> (int -> int). So that this is a function which accepts as input an int, and produces as output another function which accepts an int and returns an int.

But, what if we’d like to use “add” to add real numbers? ML simply won’t allow us to; it will complain that we’re providing the wrong types. In order to make things work, we can tell ML that the arguments are reals, and it will deduce that “+” here means addition on reals and not addition on ints. This is one awkward thing about ML; while the compiler is usually able to determine the most general possible type for our functions, it has no general type for elements of a field, and instead defaults to int whenever it encounters arithmetic operations. In any case, this is how to force a function to have a type signature involving reals:

- fun add(x:real, y:real) = x+y;
val add = fn : real * real -> real

If we’re going to talk about types, we need to know all of ML’s syntax for its types. Of course there are the basics (int, real, bool). Then there are function types: int -> int is the type of a function which accepts one int and returns an int.

We’ll see two new types in this section, and the first is the tuple. In ML, tuples are heterogeneous, so we can mix types in them. For instance,

- val tup = (true, 7, 4.4);
val tup = (true,7,4.4) : bool * int * real

Here the asterisk denotes the tuple type, and one familiar with set theory can think of a tuple as an element of the product of sets, in this case

$ \displaystyle \left \{ \textup{true}, \textup{false} \right \} \times \mathbb{Z} \times \mathbb{R}$

Indeed, there is a distinct type for each possible kind of tuple. A tuple of three ints (int * int * int) is a distint type from a tuple of three booleans (bool * bool * bool). When we define a function that has multiple arguments using the comma syntax, we are really defining a function which accepts as input a single argument which is a tuple. This parallels exactly how functions on multiple arguments work in classical mathematics.

The second kind of compound type we’ll use quite often is the list. Lists are distinct from tuples in ML in that lists must be homogenous. So a list of integers (which has type “int list”) is different from a list of booleans.

The operations on lists are almost identical as in Haskell. To construct explicit lists use square brackets with comma-delimited elements. To construct them one piece at a time, use the :: list constructor operation. For those readers who haven’t had much experience with lists in functional programming: all lists are linked lists, and the :: operation is the operation of appending a single value to the beginning of a given list. Here are some examples.

- val L = [1,2,3];
val L = [1,2,3] : int list

- val L = 1::(2::(3::nil));
val L = [1,2,3] : int list

The “nil” expression is the empty list, as is the empty-square-brackets expression “[]”.

There is a third kind of compound type called the record, and it is analogous to a C struct, where each field is named. We will mention this in the future once we have a need for it.

The most important thing about lists and tuples in ML is that functions which operate on them don’t always have an obvious type signature. For instance, here is a function which takes in a tuple of two elements and returns the first element.

fun first(x,y) = x

What should the type of this function be? We want it to be able to work with any tuple, no matter the type. As it turns out, ML was one of the first programming language to allow this sort of construction to work. The formal name is parametric polymorphism. It means that a function can operate on many different kinds of types (because the actual computation is the same regardless of the types involved) and the full type signature can be deduced for each function call based on the parameters for that particular call.

The other kind of polymorphism is called ad-hoc polymorphism. Essentially this means that multiple (very different) operations can have the same name. For instance, addition of integers and addition of floating point numbers require two very different sets of instructions, but they both go under the name of +.

What ML does to make its type system understand parametric polymorphism is introduce so-called type variables. A type variable is any string prefixed by a single quote, e.g. ‘a, and they can represent any type. When ML encounters a function with an ambiguous type signature, it decides what the most general possible type is (which usually involves a lot of type variables), and then uses that type.

So to complete our example, the first function has the type

'a * 'b -> 'a

As a side note, the “first” function is in a sense the “canonical” operation that has this type signature. If nothing is known about the types, then no other action can happen besides the projection. There are some more interesting things to be said about such canonical operations (for instance, could we get away with not having to even define them at all).

The analogous version for lists is as follows. Note that in order to decompose a list into its first element and the tail list, we need to use pattern matching.

fun listFirst([x]) = x
  | listFirst(head::tail) = head

And this function has the type signature ‘a list -> ‘a. As a slightly more complicated example (where we need recursion), we can write a function to test for list membership.

fun member(x, nil) = false
  | member(x, (head::tail)) = if x = head then true
                              else member(x, tail)

If you run this program and see some interesting warning messages, see this StackOverflow question for a clarification.

Defining New Types

The simplest way to define a new type is to just enumerate all possibilities. For instance, here is an enumerated datatype with four possibilities.

datatype maths = algebra | analysis | logic | computation

Then we can define functions which operate on those types using pattern matching.

fun awesome(algebra) = true 
  | awesome(analysis) = false
  | awesome(logic) = false 
  | awesome(computation) = true

And this function has type maths -> bool (don’t take it too seriously :-)). We can also define data types whose constructors require arguments.

datatype language = functional of string*bool 
                  | imperative of string
                  | other

Here we define a language to be functional or imperative. The functional type consists of a name and a boolean representing whether it is purely functional, while the imperative type just consists of a name. We can then construct these types by treating the type constructors as if they were functions.

val haskell = functional("Haskell", true) and
    java = imperative("Java") and
    prolog = other;

Perhaps more useful than this is to define types using type variables. A running example we will use for the remainder of this post is a binary tree of homogeneous elements at each node. Defining such types is easy: all one needs to do is place the appropriate type (with parentheses to clarify when the description of a type starts or ends) after the “of” keyword.

datatype 'a Tree = empty 
                 | leaf of 'a 
                 | node of (('a Tree) * 'a * ('a Tree))

We can create instances of an integer tree as expected:

val t2 = node(node(leaf(2), 3, leaf(4)), 6, leaf(8))

The TreeSort Algorithm

We can define a host of useful operations on trees (of any type). For instance, below we compute the breadth of a tree (the total number of leaves), the depth of a tree (the maximal length of a path from the root to a leaf), and the ability to flatten a tree (traverse the tree in order and place all of the values into a list). These first two are a nice warm-up.

fun breadth(empty) = 0
  | breadth(leaf(_)) = 1
  | breadth(node(left, _, right)) = breadth(left) + breadth(right)

Here the underscore indicates a pattern match with a variable we won’t actually use. This is more space efficient for the compiler; it can avoid adding extra values to the current environment in a potentially deep recursion.

fun depth(empty) = 0
  | depth(leaf(_)) = 1
  | depth(node(left, _, right)) =
         val (lDepth, rDepth) = (1 + depth(left), 1 + depth(right))
         if lDepth > rDepth then lDepth else rDepth

This function should be self explanatory.

fun flatten(empty) = []
   | flatten(leaf(x)) = [x]
   | flatten(node(left, x, right)) = 
               flatten(left) @ (x :: flatten(right))

Here the @ symbol is list concatenation. This is not quite the most efficient way to do this (we are going to write a forthcoming post about tail-call optimization, and there we will see why), but it is certainly the clearest. In the final recursive call, we traverse the left subtree first, flattening it into a list in order. Then we flatten the right hand side, and put the current node’s element in between the two flattened parts.

Note that if our tree is ordered, then flatten will produce a strictly increasing list of the elements in the tree. For those readers unfamiliar with ordered binary trees, for all intents and purposes this is an “int Tree” and we can compare the values at different nodes. Then an ordered binary tree is a tree which satisfies the following property for each node: all of the values in the left child’s subtree are strictly smaller than the current node’s value, and all of the values in the right child’s subtree are greater than or equal to the current node’s value.

Indeed, we can use the flatten function as part of a simple algorithm to sort lists of numbers. First we insert the numbers in the unsorted list into a tree (in a way that preserves the ordered property at each step), and then we flatten the tree into a sorted list. This algorithm is called TreeSort, and the insert function is simple as well.

fun insert(x, empty) = leaf(x)
  | insert(y, leaf(x)) = if x <= y 
                         then node(empty, x, leaf(y)) 
                         else node(leaf(y), x, empty)
  | insert(y, node(left, x, right)) = 
                if x <= y 
                then node(left, x, insert(y, right))
                else node(insert(y, left), x, right)

If we’re at a nonempty node, then we just recursively insert into the appropriate subtree (taking care to create new interior nodes if we’re at a leaf). Note that we do not do any additional computation to ensure that the tree is balanced (that each node has approximately as many children in its left subtree as in its right). Doing so would digress from the point of this primer, but rest assured that the problem of keeping trees balanced has long been solved.

Now the process of calling insert on every element of a list is just a simple application of fold. ML does have the standard map, foldl, foldr, and filter functions, although apparently filter is not available in the standard library (one needs to reference the List module via List.filter).

In any case, foldl is written in the currying style and building the tree is a simple application of it. As we said, the full sorting algorithm is just the result of flattening the resulting tree with our in-order traversal.

fun sort(L) = flatten(foldl(insert)(empty)(L))

So there you have it! One of the simplest (efficient) sorting algorithms I can think of in about twelve lines of code.

Free Monoid Homomorphisms: A More Advanced Example

Just to get a quick taste of what our series on category theory will entail, let’s write a program with a slightly more complicated type signature. The idea hinges on the idea that lists form a what’s called a free monoid. In particular,

Definition: monoid is a set $ X$ equipped with an associative binary operation $ \cdot: X \times X \to X$ and an identity element $ 1 \in X$ for which $ x1 = 1x = x$ for all $ x \in X$.

Those readers who have been following our series on group theory will recognize a monoid as a group with less structure (there is no guarantee of inverses for the $ \cdot$ operation). The salient fact for this example is that the set of ML values of type $ \textup{‘a list}$ forms a monoid. The operation is list concatenation, and the identity element is the empty list. Call the empty list nil and the append operation @, as it is in ML.

More than that, $ \textup{‘a list}$ forms a free monoid, and the idea of freeness has multiple ways of realization. One sort of elementary way to understand freeness is that the only way to use the binary operation to get to the identity is to have the two summands both be the identity element. In terms of lists, the only way to concatenate two lists to get the empty list is to have both pieces be empty to begin with.

Another, more mature (more category-theoretical) way to think about freeness is to say is satisfies the following “universal” property. Call $ A$ the set of values of type ‘a, and $ [A]$ the set of values of type $ \textup{‘a list}$, and suppose that $ (B, \cdot_B, 1_B)$ is the datum of an arbitrary monoid. The universal property says that if we are given a function $ f: A \to B$, and we take the canonical map $ g: A \to [A]$ which maps an element $ a \in A$ to the single-entry list $ [a] \in [A]$, then there is a unique way to extend $ f$ to a monoid homomorphism $ f^* : [A] \to B$ on lists, so that $ f^*g = f$. We have mentioned monoid homomorphisms on this blog before in the context of string metrics, but briefly a monoid homomorphism respects the monoid structure in the sense that (for this example) $ f^*(a \textup{ @ } b) = f^*(a) \cdot_B f^*(b)$ no matter what $ a, b$ are.

This was quite a mouthful, but the data is often written in terms of a so-called “commutative diagram,” whose general definition we will defer to a future post. The diagram for this example looks like:


The dashed line says we are asserting the existence of $ f^*$, and the symbol $ \exists !$ says this function exists and is uniquely determined by $ f, g$. The diagram “commutes” in the sense that traveling from $ A$ to $ B$ along $ f$ gives you the same computational result as traveling by $ g$ and then $ f^*$. The reason for the word “universal” will become clear in future posts, but vaguely it’s because the set $ [A]$ is a unique “starting place” in a special category.

If this talk is too mysterious, we can just go ahead and prove that $ f^*$ exists by writing a program that computes the function transforming $ f \mapsto f^*$. We call the function “listMonoidLift” because it “lifts” the function $ f$ from just operating on $ A$ to the status of a monoid homomorphism. Very regal, indeed.

Part of the beauty of this function is that a number of different list operations (list length, list sum, member tests, map, etc.), when viewed under this lens, all become special cases of this theorem! By thinking about things in terms of monoids, we write less code, and more importantly we recognize that these functions all have the same structural signature. Perhaps one can think of it like parametric polymorphism on steroids.

fun listMonoidLift(f:('a->'b), (combine:(('b * 'b) -> 'b), id:'b)) =
      fun f'(nil) = id
        | f'(head::tail) = combine(f(head), f'(tail))

Here we specified the types of the input arguments to be completely clear what’s what. The first argument is our function $ f$ as in the above diagram, and the second two arguments together form the data of our monoid (the set $ B$ is implicitly the collection of types $ ‘b$ determined at the time of the function call). Now let’s see how the list summation and list length functions can be written in terms of the listMonoidLift function.

fun plus(x, y) = x + y
val sum = listMonoidLift((fn x => x), (plus, 0))
val length = listMonoidLift((fn x => 1), (plus, 0))

The plus function on integers with zero as the identity is the monoid $ B$ in both cases (and also happens to be $ A$ by coincidence), but in the summation case the function $ f$ is the identity and for length it is the constant $ 1$ function.

As a more interesting example, see how list membership is a lift.

fun member(x) = listMonoidLift((fn y => y = x),
                      ((fn (a,b) => a orelse b), false))

Here the member function is curried; it has type ‘a -> ‘a list -> bool (though it’s a bit convoluted since listMonoidLift is what’s returning the ‘a list -> bool part of the type signature). Here the $ B$ monoid is the monoid of boolean values, where the operation is logical “or” and the identity is false. It is a coincidence and a simple exercise to prove that $ B$ is a free monoid as well.

Now the mapping $ f(y)$ is the test to see if $ y$ is the same object as $ x$. The lift to the list monoid will compute the logical “or” of all evaluations of $ f$ on the values.

Indeed, (although this author hates bringing up too many buzzwords where they aren’t entirely welcome) the monoid lifting operation we’ve just described is closely related to the MapReduce framework (without all the distributed computing parts). Part of the benefit of MapReduce is that the programmer need only define the Map() and Reduce() functions (the heart of the computation) and MapReduce does the rest. What this example shows is that defining the Map() function can be even simpler: one only needs define the function $ f$, and Map() is computed as $ f^*$ automatically. The Reduce() part is simply the definition of the target monoid $ B$.

Just to drive this point home, we give the reader a special exercise: write map as a special case of listMonoidLift. The result (the map function) should have one of the two type signatures:

map : ('a -> 'b) * ('a list) -> 'b list
map : 'a -> 'b -> 'a list -> b' list

As a hint, the target monoid should also be a list monoid.

Part of why this author is hesitant to bring up contemporary software in discussing these ideas is because the ideas themselves are far from contemporary. Burstall and Landin’s, 1969 text Programs and Their Proofs (which this author would love to find a copy of) details this exact reduction and other versions of this idea in a more specific kind of structure called “free algebras.” So MapReduce (minus the computer programs and distributed systems) was a well-thought-out idea long before the internet or massively distributed systems were commonplace.

In any case, we’ll start the next post in this series right off with the mathematical concept of a category. We’ll start slow and detail many examples of categories that show up both in (elementary) mathematics and computing, and then move on to universal properties, functors, natural transformations, and more, implementing the ideas all along the way.

Until then!