# Rings — A Second Primer

Last time we defined and gave some examples of rings. Recapping, a ring is a special kind of group with an additional multiplication operation that “plays nicely” with addition. The important thing to remember is that a ring is intended to remind us arithmetic with integers (though not too much: multiplication in a ring need not be commutative). We proved some basic properties, like zero being unique and negation being well-behaved. We gave a quick definition of an integral domain as a place where the only way to multiply two things to get zero is when one of the multiplicands was already zero, and of a Euclidean domain where we can perform nice algorithms like the one for computing the greatest common divisor. Finally, we saw a very important example of the ring of polynomials.

In this post we’ll take a small step upward from looking at low-level features of rings, and start considering how general rings relate to each other. The reader familiar with this blog will notice many similarities between this post and our post on group theory. Indeed, the definitions here will be “motivated” by an attempt to replicate the same kinds of structures we found helpful in group theory (subgroups, homomorphisms, quotients, and kernels). And because rings are also abelian groups, we will play fast and loose with a number of the proofs here by relying on facts we proved in our two-post series on group theory. The ideas assume a decidedly distinct flavor here (mostly in ideals), and in future posts we will see how this affects the computational aspects in more detail.

## Homomorphisms and Sub-structures

The first question we should ask about rings is: what should mappings between rings look like? For groups they were set-functions which preserved the group operation ($f(xy) = f(x)f(y)$ for all $x,y$). The idea of preserving algebraic structure translates to rings, but as rings have two operations we must preserve both.

Definition: Let $(R, +_R, \cdot_R), (S, +_S, \cdot_S)$ be rings, and let $f: R \to S$ be a function on the underlying sets. We call $f$ a ring homomorphism if $f$ is a group homomorphism of the underlying abelian groups, and for all $x,y \in R$ we have $f(x \cdot_R y) = f(x) \cdot_S f(y)$. A bijective ring homomorphism is called an isomorphism.

Indeed, we have the usual properties of ring homomorphisms that we would expect. All ring homomorphisms preserve the additive and multiplicative identities, multiplicative inverses (when they exist), and things like zero-divisors. We leave these verifications to the reader.

Ring homomorphisms have the same kinds of constructions as group homomorphisms did. One of particular importance is the kernel $\textup{ker} f$, which is the preimage of zero under $f$, or equivalently the set $\left \{ x : f(x) = 0 \right \}$. This is the same definition as for group homomorphisms, linear maps, etc.

The second question we want to ask about rings is: what is the appropriate notion of a sub-structure for rings? One possibility is obvious.

Definition: A subring $S$ of a ring $R$ is a subset $S \subset R$ which is also a ring under the same operations as those of $R$ (and with the same identities).

Unfortunately subrings do not have the properties we want them to have, and this is mostly because of the requirement that our rings have a multiplicative identity. For instance, we want to say that the kernel of a ring homomorphism $f: R \to S$ is a subring of $R$. This will obviously not be the case, because the multiplicative identity never maps to zero under a ring homomorphism! We also want an appropriate notion of a quotient ring, but if we quotient out (“declare to be zero”) a subring, then the identity will become zero, which yields all sorts of contradictions in all but the most trivial of rings. One nice thing, however, is that the image of a ring homomorphism $R \to S$ is a subring of $S$. It is a trivial exercise to prove.

Rather than modify our definition of a kernel or a quotient to make it work with the existing definition of a subring, we pick a different choice of an appropriate sub-structure: the ideal.

## The Study of Ideals

From an elementary perspective, an ideal is easiest understood as a generalization of even numbers. Even integers have this nice property that multiplying any integer by an even integer gives you an even integer. This is a kind of closure property that mathematicians really love to think about, because they tend to lead to further interesting properties and constructions. Indeed, we can generalize it as follows:

Definition: Let $R$ be a commutative ring, and $I \subset R$. We call $I$ an ideal if two conditions are satisfied:

1. $I$ is a subgroup of $R$ under addition.
2. For all $r \in R$ and for all $x \in I$, we have $rx \in I$.

That is, an ideal is a subgroup closed under multiplication. The even integers as a subset of $\mathbb{Z}$ give a nice example, as does the set $\left \{ 0 \right \}$. In fact, every ideal contains zero by being a subgroup. A slightly more complicated example is the set of polynomials divisible by $(x-1)$ as a subset of the polynomial ring $\mathbb{R}[x]$.

We have to be a little careful here if our ring is not commutative. Technically this definition above is for a left-ideal, which is closed under left-multiplication. You can also define right-ideals closed under right-multiplication, and the official name for a plain old “ideal” is a two-sided ideal. The only time we will ever work with noncommutative rings (that we can envision) is with matrix rings, so excluding that case our ideals will forevermore be two-sided. Often times we will use the notation $rI$ to denote the set of all possible left multiplications $\left \{ rx : x \in I \right \}$, and so the ideal condition is just $rI = I$. This is a multiplicative kind of coset, although as we are about to see we will use both additive and multiplicative cosets in talking about rings.

Let’s look at some properties of ideals that show how neatly this definition encapsulates what we want. First,

Proposition: The kernel of a ring homomorphism is an ideal.

Proof. Let $f: R \to S$ be a ring homomorphism, and let $I = \textup{ker}(f)$. We already know that $I$ is a subgroup of $R$ under addition because kernels of group homomorphisms are subgroups. To show the ideal condition, simply take $r \in R, x \in I$ and note that $f(rx) = f(r)f(x) = f(r)0 = 0$, and so $rx \in \textup{ker}(f)$. $\square$

In fact the correspondence is one-to-one: every ideal is the kernel of a ring homomorphism and every ring homomorphism’s kernel is an ideal. This is not surprising as it was the case for groups, and the story starts here with quotients, too.

Definition: Let $R$ be a ring and $I$ an ideal of $R$. The quotient group $R/I$ forms a ring called the quotient ring, and is still denoted by $R / I$.

To show this definition makes any sense requires some thought: what are the operations of this new ring? Are they well-defined on coset representatives?

For the first, we already know the addition operation because it is the same operation when we view $R / I$ as a quotient group; that is, $(x + I) + (y + I) = (x+y) + I$. The additive identity is just $0 + I = I$. The multiplication operation is similar: $(x + I)(y + I) = xy + I$. And the multiplicative identity is clearly $1 + I$.

The fact that multiplication works as we said it does above gives more motivation for the definition of an ideal. To prove it, pick any representatives $x + z_1 \in x + I$ and $y + z_2 \in y + I$. Their product is

$(xy + xz_2 + yz_1 + z_1z_2) \in xy + xI + yI + I^2$

Where we denote by $I^2 = \left \{ xy : x \in I, y \in I \right \}$. The condition for an ideal to be an ideal is precisely that the weird parts, the $xI + yI + I^2$, become just $I$. And indeed, $I$ is an additive group, so sums of things in $I$ are in $I$, and moreover $I$ is closed under multiplication by arbitrary elements of $R$. More rigorously, $xy + xz_2 + yz_1 + z_1z_2$ is equivalent to $xy$ under the coset equivalence relation because their difference $xz_2 + yz_1 + z_1z_2$ is a member of $I$.

Now we can realize that every ideal is the kernel of some homomorphism. Indeed, if $I$ is an ideal of $R$, then there is a natural map $\pi : R \to R/I$ (called the canonical projection, see our post on universal properties for more) defined by sending an element to its coset: $\pi(x) = x+ I$. It should be obvious to the reader that the kernel of $\pi$ is exactly $I$ (because $x + i = 0 + I$ if and only if $x \in I$; this is a fact we borrow from groups). And so there is a correspondence between kernels and ideals. They are really the same concept manifested in two different ways.

Because we have quotient rings and ring homomorphisms, we actually get a number of “theorems” for free. These are the isomorphism theorems for rings, the most important one being the analogue of the First Isomorphism Theorem for groups. That is, if $f: R \to S$ is a ring homomorphism, $\textup{im}(f)$ is a subring of $S$, and moreover $R/ \textup{ker}(f) \cong \textup{im}(f)$. We invite the reader to prove it by hand (start by defining a map $\varphi(x + I) = f(x)$ and prove it’s a ring isomorphism). There are some other isomorphism theorems, but they’re not particularly deep; rather, they’re just commonly-applied corollaries of the first isomorphism theorem. In light of this blog’s discussions on category theory, the isomorphism theorems are a trivial consequence of the fact that rings have quotients, and that $\textup{im}(f)$ happens to be well-behaved.

## Noetherian Rings and Principal Ideal Domains

One cannot overestimate the importance of ideals as a fundamental concept in ring theory. They are literally the cornerstone of everything interesting in the subject, and especially the computational aspects of the field (more on that in future posts). So to study ideals, we make some basic definitions.

Definition: Let $A \subset R$ be any subset of the ring $R$. The ideal generated by $A$ is the smallest ideal of $R$ containing $A$, denoted $I(A)$. It is “smallest” in the sense that all ideals containing $A$ must also contain $I(A)$.

Indeed, we can realize $I(A)$ directly as the set of all possible finite linear combinations $r_1 a_1 + \dots r_k a_k$ where the $r_i \in R$ and $a_i \in A$. Such a linear combination is called an $R$-linear combination because the “coefficients” come from $R$. It is not hard to see that this is an ideal. It is obviously a subgroup since sums of linear combinations are also linear combinations, and negation distributes across the sum. It satisfies multiplicative closure because

$r(r_1a_1 + \dots + r_k a_k) = (rr_1)a_1 + \dots + (rr_k)a_k$

is another $R$-linear combination.

One convenient way to think of this ideal is as the intersection of all ideals $I$ which contain $A$. Since the intersection of any family of ideals is an ideal (check this!) this will always give us the smallest possible ideal containing $A$.

One important bit of notation is that if $I = I(A)$ is the ideal generated by a finite set $A$, then we write $I = \left \langle a_1, \dots, a_n \right \rangle$ where $A = \left \{ a_1, \dots, a_n \right \}$. We say that $I$ is finitely generated. If $A$ happens to be a singleton, we say that $I$ is a principal ideal. Following the notation of linear algebra, a minimal (by cardinality) generating set for an ideal is called a basis for the ideal.

Thinking about how ideals are generated is extremely important both in the pure theory of rings and in the computational algorithms that deal with them. It’s so important, in fact, that there are entire classes of rings defined by how their ideals behave. We give the two most important ones here.

Definition: A commutative ring is called Noetherian if all of its ideals are finitely generated. An integral domain is called a principal ideal domain if all of its ideals are principal.

The concept of a Noetherian ring is a particularly juicy one, and it was made famous by the founding mother of commutative ring theory, Emmy Noether. Without going into too much detail, just as an integral domain is the most faithful abstraction of the ring of integers, a Noetherian ring is the best way to think about polynomial rings (and quotients of polynomial rings). This is highlighted by a few basic facts and some deep theorems:

Fact: If $R$ is a Noetherian ring and $I$ an ideal, then $R/I$ is Noetherian.

This follows from the correspondence of ideals between $R$ and $R/I$. Indeed, for every ideal $J \subset R/I$ there is an ideal $\pi^{-1}(J)$ of $R$ which contains $I$. Moreover, this correspondence is a bijection. So if we want a generating set for $J$, we can lift it up to $R$ via $\pi$, get a finite generating set for $\pi^{-1}(J)$, and project the generators back down to $R/I$. Some of the generators will be killed off (sent to $I$ by $\pi$), but what remains will be a valid generating set for $J$, and still finite.

Theorem (Hilbert Basis Theorem): If $R$ is a Noetherian ring, then the polynomial ring in one variable $R[x]$ is Noetherian. Inductively, $R[x_1, \dots, x_n]$ is also Noetherian.

These theorems start to lay the foundation for algebraic geometry, which connects ideals generated by a family of polynomials to the geometric solution set of those polynomials. Since a vast array of industrial problems can be reduced to solving systems of polynomial equations (take robot motion planning, for example), it would be quite convenient if we could write programs to reason about those systems of equations. Indeed, such algorithms do exist, and we make heavy use of theorems like the Hilbert Basis Theorem to ensure their correctness.

In the next post in this series, we’ll start this journey through elementary algebraic geometry by defining a variety, and establishing its relationship to ideals of polynomial rings. We’ll then work towards theorems like Hilbert’s Nullstellensatz, the computational operations we wish to perform on ideals, and the algorithms that carry out those operations. As usual, we’ll eventually see some applications and write some code.

Until then!

# Universal Properties

Previously in this series we’ve seen the definition of a category and a bunch of examples, basic properties of morphisms, and a first look at how to represent categories as types in ML. In this post we’ll expand these ideas and introduce the notion of a universal property. We’ll see examples from mathematics and write some programs which simultaneously prove certain objects have universal properties and construct the morphisms involved.

## A Grand Simple Thing

One might go so far as to call universal properties the most important concept in category theory. This should initially strike the reader as odd, because at first glance universal properties are so succinctly described that they don’t seem to be very interesting. In fact, there are only two universal properties and they are that of being initial and final.

Definition: An object $A$ in a category $\mathbf{C}$ is called initial if for every object $B$ there is a unique morphism $A \to B$. An object $Z$ is called final if for every object $B$ there is a unique morphism $B \to Z$. If an object satisfies either of these properties, it is called universal. If an object satisfies both, it is called a zero object.

In both cases, the existence of a unique morphism is the same as saying the relevant Hom set is a singleton (i.e., for initial objects $A$, the Hom set $\textup{Hom}_{\mathbf{C}}(A,B)$ consists of a single element). There is one and only one morphism between the two objects. In the particular case of $\textup{Hom}(A,A)$ when $A$ is initial (or final), the definition of a category says there must be at least one morphism, the identity, and the universal property says there is no other.

There’s only one way such a simple definition could find fruitful applications, and that is by cleverly picking categories. Before we get to constructing interesting categories with useful universal objects, let’s recognize some universal objects in categories we already know.

In $\mathbf{Set}$ the single element set is final, but not initial; there is only one set-function to a single-element set. It is important to note that the single-element set is far from unique. There are infinitely many (uncountably many!) singleton sets, but as we have already seen all one-element sets are isomorphic in $\mathbf{Set}$ (they all have the same cardinality). On the other hand, the empty set is initial, since the “empty function” is the only set-mapping from the empty set to any set. Here the initial object truly is unique, and not just up to isomorphism.

It turns out universal objects are always unique up to isomorphism, when they exist. Here is the official statement.

Proposition: If $A, A'$ are both initial in $\mathbf{C}$, then $A \cong A'$ are isomorphism. If $Z, Z'$ are both final, then $Z \cong Z'$.

Proof. Recall that a mophism $f: A \to B$ is an isomorphism if it has a two sided inverse, a $g:B \to A$ so that $gf = 1_A$ and $fg=1_B$ are the identities. Now if $A,A'$ are two initial objects there are unique morphisms $f : A \to A'$ and $g: A' \to A$. Moreover, these compose to be morphisms $gf: A \to A$. But since $A$ is initial, the only morphism $A \to A$ is the identity. The situation for $fg : A' \to A'$ is analogous, and so these morphisms are actually inverses of each other, and $A, A'$ are isomorphic. The proof for final objects is identical. $\square$

Let’s continue with examples. In the category of groups, the trivial group $\left \{ 1 \right \}$ is both initial and final, because group homomorphisms must preserve the identity element. Hence the trivial group is a zero object. Again, “the” trivial group is not unique, but unique up to isomorphism.

In the category of types with computable (halting) functions as morphisms, the null type is final. To be honest, this depends on how we determine whether two computable functions are “equal.” In this case, we only care about the set of inputs and outputs, and for the null type all computable functions have the same output: null.

Partial order categories are examples of categories which need not have universal objects. If the partial order is constructed from subsets of a set $X$, then the initial object is the empty set (by virtue of being a subset of every set), and $X$ as a subset of itself is obviously final. But there are other partial orders, such as inequality of integers, which have no “smallest” or “largest” objects. Partial order categories which have particularly nice properties (such as initial and final objects, but not quite exactly) are closely related to the concept of a domain in denotational semantics, and the language of universal properties is relevant to that discussion as well.

The place where universal properties really shine is in defining new constructions. For instance, the direct product of sets is defined by the fact that it satisfies a universal property. Such constructions abound in category theory, and they work via the ‘diagram categories’ we defined in our introductory post. Let’s investigate them now.

## Quotients

Let’s recall the classical definition from set theory of a quotient. We described special versions of quotients in the categories of groups and topological spaces, and we’ll see them all unified via the universal property of a quotient in a moment.

Definition: An equivalence relation on a set $X$ is a subset of the set product $\sim \subset X \times X$ which is reflexive, symmetric, and transitive. The quotient set $X / \sim$ is the set of equivalence classes on $X$. The canonical projection $\pi : X \to X/\sim$ is the map sending $x$ to its equivalence class under $\sim$.

The quotient set $X / \sim$ can also be described in terms of a special property: it is the “largest” set which agrees with the equivalence relation $\sim$. On one hand, it is the case that whenever $a \sim b$ in $X$ then $\pi(a) = \pi(b)$. Moreover, for any set $Y$ and any map $g: X \to Y$ which equates equivalent things ($g(a) = g(b)$ for all $a \sim b$), then there is a unique map $f : X/\sim \to Y$ such that $f \pi = g$. This word salad is best translated into a diagram.

Here we use a dashed line to assert the existence of a morphism (once we’ve proven such a morphism exists, we use a solid line instead), and the symbol $\exists !$ signifies existence ($\exists$) and uniqueness (!).

We can prove this explicitly in the category $\mathbf{Set}$. Indeed, if $g$ is any map such that $g(a) = g(b)$ for all equivalent $a,b \in X$, then we can define $f$ as follows: for any $a \in X$ whose equivalence class is denoted by $[a]$ in $X / \sim$, and define $f([a]) = g(a)$. This map is well defined because if $a \sim b$, then $f([a]) = g(a) = g(b) = f([b])$. It is unique because if $f \pi = g = h \pi$ for some other $h: X / \sim \to Y$, then $h([x]) = g(x) = f([x])$; this is the only possible definition.

Now the “official” way to state this universal property is as follows:

The quotient set $X / \sim$ is universal with respect to the property of mapping $X$ to a set so that equivalent elements have the same image.

But as we said earlier, there are only two kinds of universal properties: initial and final. Now this $X / \sim$ looks suspiciously like an initial object ($f$ is going from $X / \sim$, after all), but what exactly is the category we’re considering? The trick to dissecting this sentence is to notice that this is not a statement about just $X / \sim$, but of the morphism $\pi$.

That is, we’re considering a diagram category. In more detail: fix an object $X$ in $\mathbf{Set}$ and an equivalence relation $\sim$ on $X$. We define a category $\mathbf{Set}_{X,\sim}$ as follows. The objects in the category are morphisms $f:X \to Y$ such that $a \sim b$ in $X$ implies $f(a) = f(b)$ in $Y$. The morphisms in the category are commutative diagrams

Here $f_1, f_2$ need to be such that they send equivalent things to equal things (or they wouldn’t be objects in the category!), and by the commutativity of the diagram $f_2 = \varphi f_1$. Indeed, the statement about quotients is that $\pi : X \to X / \sim$ is an initial object in this category. In fact, we have already proved it! But note the abuse of language in our offset statement above: it’s not really $X / \sim$ that is the universal object, but $\pi$. Moreover, the statement itself doesn’t tell us what category to inspect, nor whether we care about initial or final objects in that category. Unfortunately this abuse of language is widespread in the mathematical world, and for arguably good reason. Once one gets acquainted with these kinds of constructions, reading between the limes becomes much easier and it would be a waste of time to spell it out. After all, once we understand $X / \sim$ there is no “obvious” choice for a map $X \to X / \sim$ except for the projection $\pi$. This is how $\pi$ got its name, the canonical projection.

Two last bits of terminology: if $\mathbf{C}$ is any category whose objects are sets (and hence, where equivalence relations make sense), we say that $\mathbf{C}$ has quotients if for every object $X$ there is a morphism $\pi$ satisfying the universal property of a quotient. Another way to state the universal property is to say that all maps respecting the equivalence structure factor through the quotient, in the sense that we get a diagram like the one above.

What is the benefit of viewing $X / \sim$ by its universal property? For one, the set $X / \sim$ is unique up to isomorphism. That is, if any other pair $(Z, g)$ satisfies the same property, we automatically get an isomorphism $X / \sim \to Z$. For instance, if $\sim$ is defined via a function $f : X \to Y$ (that is, $a \sim b$ if $f(a) = f(b)$), then the pair $(\textup{im}(f), f)$ satisfies the universal property of a quotient. This means that we can “decompose” any function into three pieces:

$\displaystyle X \to X / \sim \to \textup{im}(f) \to Y$

The first map is the canonical projection, the second is the isomorphism given by the universal property of the quotient, and the last is the inclusion map into $Y$. In a sense, all three of these maps are “canonical.” This isn’t so magical for set-maps, but the same statement (and essentially the same proof) holds for groups and topological spaces, and are revered as theorems. For groups, this is called The First Isomorphism Theorem, but it’s essentially the claim that the category of groups has quotients.

This is getting a bit abstract, so let’s see how the idea manifests itself as a program. In fact, it’s embarrassingly simple. Using our “simpler” ML definition of a category from last time, the constructive proof that quotient sets satisfy the universal property is simply a concrete version of the definition of $f$ we gave above. In code,

fun inducedMapFromQuotient(setMap(x, pi, q), setMap(x, g, y)) =
setMap(q, (fn a => g(representative(a))), y)

That is, once we have $\pi$ and $X / \sim$ defined for our given equivalence relation, this function accepts as input any morphism $g$ and produces the uniquely defined $f$ in the diagram above. Here the “representative” function just returns an arbitrary element of the given set, which we added to the abstract datatype for sets. If the set $X$ is empty, then all functions involved will raise an “empty” exception upon being called, which is perfectly fine. We leave the functions which explicitly construct the quotient set given $X, \sim$ as an exercise to the reader.

## Products and Coproducts

Just as the concept of a quotient set or quotient group can be generalized to a universal property, so can the notion of a product. Again we take our intuition from $\mathbf{Set}$. There the product of two sets $X,Y$ is the set of ordered pairs

$\displaystyle X \times Y = \left \{ (x,y) : x \in X, y \in Y \right \}$

But as with quotients, there’s much more going on and the key is in the morphisms. Specifically, there are two obvious choices for morphisms $X \times Y \to X$ and $X \times Y \to Y$. These are the two projections onto the components, namely $\pi_1(x,y) = x$ and $\pi_2(x,y) = y$. These projections are also called “canonical projections,” and they satisfy their own universal property.

The product of sets is universal with respect to the property of having two morphisms to its factors.

Indeed, this idea is so general that it can be formulated in any category, not just categories whose objects are sets. Let $X,Y$ be two fixed objects in a category $\mathbf{C}$. Should it exist, the product $X \times Y$ is defined to be a final object in the following diagram category. This category has as objects pairs of morphisms

and as morphisms it has commutative diagrams

In words, to say products are final is to say that for any object in this category, there is a unique map $\varphi$ that factors through the product, so that $\pi_1 \varphi = f$ and $\pi_2 \varphi = g$. In a diagram, it is to claim the following commutes:

If the product $X \times Y$ exists for any pair of objects, we declare that the category $\mathbf{C}$ has products.

Indeed, many familiar product constructions exist in pure mathematics: sets, groups, topological spaces, vector spaces, and rings all have products. In fact, so does the category of ML types. Given two types ‘a and ‘b, we can form the (aptly named) product type ‘a * ‘b. The canonical projections exist because ML supports parametric polymorphism. They are

fun leftProjection(x,y) = x
fun rightProjection(x,y) = y

And to construct the unique morphism to the product,

fun inducedMapToProduct(f,g) = fn a => (f(a), g(a))

We leave the uniqueness proof to the reader as a brief exercise.

There is a similar notion called a coproduct, denoted $X \coprod Y$, in which everything is reversed: the arrows in the diagram category go to $X \coprod Y$ and the object is initial in the diagram category. Explicitly, for a fixed $X, Y$ the objects in the diagram category are again pairs of morphisms, but this time with arrows going to the central object

The morphisms are again commutative diagrams, but with the connecting morphism going away from the central object

And a coproduct is defined to be an initial object in this category. That is, a pair of morphisms $i_1, i_2$ such that there is a unique connecting morphism in the following diagram.

Coproducts are far less intuitive than products in their concrete realizations, but the universal property is no more complicated.  For the category of sets, the coproduct is a disjoint union (in which shared elements of two sets $X, Y$ are forcibly considered different), and the canonical morphisms are “inclusion” maps (hence the switch from $\pi$ to $i$ in the diagram above). Specifically, if we define the coproduct

$\displaystyle X \coprod Y = (X \times \left \{ 1 \right \}) \cup (Y \times \left \{ 2 \right \})$

as the set of “tagged” elements (the right entry in a tuple signifies which piece of the coproduct the left entry came from), then the inclusion maps $i_1(x) = (x,1)$ and $i_2(y) = (y,2)$ are the canonical morphisms.

There are similar notions of disjoint unions for topological spaces and graphs, which are their categories’ respective coproducts. However, in most categories the coproducts are dramatically different from “unions.” In group theory, it is a somewhat complicated object known as the free product. We mentioned free products in our hasty discussion of the fundamental group, but understanding why and where free groups naturally occur is quite technical. Similarly, coproducts of vector spaces (or $R$-modules) are more like a product, with the stipulation that at most finitely many of the entries of a tuple are nonzero (e.g., formal linear combinations of things from the pieces). Even without understanding these examples, the reader should begin to believe that relatively simple universal properties can yield very useful objects with potentially difficult constructions in specific categories. The ubiquity of the concepts across drastically different fields of mathematics is part of why universal properties are called “universal.”

Luckily, the category of ML types has a nice coproduct which feels like a union, but it is not supported as a “native” language feature like products types are. Specifically, given two types ‘a, ‘b we can define the “coproduct type”

datatype ('a, 'b)Coproduct = left of 'a | right of 'b

Let’s prove this is actually a coproduct: fix two types ‘a and ‘b, and let $i_1, i_2$ be the functions

fun leftInclusion(x) = left(x)
fun rightInclusion(y) = right(y)

Then given any other pair of functions $f,g$ which accept as input types ‘a and ‘b, respectively, there is a unique function $\varphi$ operating on the coproduct type. We construct it as follows.

fun inducedCoproductMap(f,g) =
let
theMap(left(a)) = f(a)
theMap(right(b)) = g(b)
in
theMap
end

The uniqueness of this construction is self-evident. This author finds it fascinating that these definitions are so deep and profound (indeed, category theory is heralded as the queen of abstraction), but their realizations are trivially obvious to the working programmer. Perhaps this is a statement about how well-behaved the category of ML types is.

## Continuing On

So far we have seen three relatively simple examples of universal properties, and explored how they are realized in some categories. We should note before closing two things. The first is that not every category has objects with these universal properties. Unfortunately poset categories don’t serve as a good counterexample for this (they have both products and coproducts; what are they?), but it may be the case that in some categories only some pairs of objects have products or coproducts, while others do not.

Second, there are many more universal properties that we haven’t covered here. For instance, the notion of a limit is a universal property, as is the notion of a “free” object. There are kernels, pull-backs, equalizers, and many other ad-hoc universal properties without names. And for every universal property there is a corresponding “dual” property that results from reversing the arrows in every diagram, as we did with coproducts. We will visit the relevant ones as they come up in our explorations.

In the next few posts we’ll encounter functors and the concept of functoriality, and start asking some poignant questions about familiar programmatic constructions.

Until then!

# Properties of Morphisms

This post is mainly mathematical. We left it out of our introduction to categories for brevity, but we should lay these definitions down and some examples before continuing on to universal properties and doing more computation. The reader should feel free to skip this post and return to it later when the words “isomorphism,” “monomorphism,” and “epimorphism” come up again. Perhaps the most important part of this post is the description of an isomorphism.

## Isomorphisms, Monomorphisms, and Epimorphisms

Perhaps the most important paradigm shift in category theory is the focus on morphisms as the main object of study. In particular, category theory stipulates that the only knowledge one can gain about an object is in how it relates to other objects. Indeed, this is true in nearly all fields of mathematics: in groups we consider all isomorphic groups to be the same. In topology, homeomorphic spaces are not distinguished. The list goes on. The only way to do determine if two objects are “the same” is by finding a morphism with special properties. Barry Mazur gives a more colorful explanation by considering the meaning of the number 5 in his essay, “When is one thing equal to some other thing?” The point is that categories, more than existing to be a “foundation” for all mathematics as a formal system (though people are working to make such a formal system), exist primarily to “capture the essence” of mathematical discourse, as Mazur puts it. A category defines objects and morphisms, but literally all of the structure of a category lies in its morphisms. And so we study them.

The strongest kind of morphism we can consider is an isomorphism. An isomorphism is the way we say two objects in a category are “the same.” We don’t usually care whether two objects are equal, but rather whether some construction is unique up to isomorphism (more on that when we talk of universal properties). The choices made in defining morphisms in a particular category allow us to strengthen or weaken this idea of “sameness.”

Definition: A morphism $f : A \to B$ in a category $\mathbf{C}$ is an isomorphism if there exists a morphism $g: B \to A$ so that both ways to compose $f$ and $g$ give the identity morphisms on the respective objects. That is,

$gf = 1_A$ and $fg = 1_B$.

The most basic (usually obvious, but sometimes very deep) question in approaching a new category is to ask what the isomorphisms are. Let us do this now.

In $\mathbf{Set}$ the morphisms are set-functions, and it is not hard to see that any two sets of equal cardinality have a bijection between them. As all bijections have two-sided inverses, two objects in $\mathbf{Set}$ are isomorphic if and only if they have the same cardinality. For example, all sets of size 10 are isomorphic. This is quite a weak notion of “sameness.” In contrast, there is a wealth of examples of groups of equal cardinality which are not isomorphic (the smallest example has cardinality 4). On the other end of the spectrum, a poset category $\mathbf{Pos}_X$ has no isomorphisms except for the identity morphisms. The poset categories still have useful structure, but (as with objects within a category) the interesting structure is in how a poset category relates to other categories. This will become clearer later when we look at functors, but we just want to dissuade the reader from ruling out poset categories as uninteresting due to a lack of interesting morphisms.

Consider the category $\mathbf{C}$ of ML types with ML functions as morphisms. An isomorphism in this category would be a function which has a two-sided inverse. Can the reader think of such a function?

Let us now move on to other, weaker properties of morphisms.

Definition: A morphism $f: A \to B$ is a monomorphism if for every object $C$ and every pair of morphisms $g,h: C \to A$ the condition $fg = fh$ implies $g = h$.

The reader should parse this notation carefully, but truly think of it in terms of the following commutative diagram:

Whenever this diagram commutes and $f$ is a monomorphism, then we conclude (by definition) that $g=h$. Remember that a diagram commuting just means that all ways to compose morphisms (and arrive at morphisms with matching sources and targets) result in an identical morphism. In this diagram, commuting is the equivalent of claiming that $fg = fh$, since there are only two nontrivial ways to compose.

The idea is that monomorphisms allow one to “cancel” $f$ from the left side of a composition (although, confusingly, this means the cancelled part is on the right hand side of the diagram).

The corresponding property for cancelling on the right is defined identically.

Definition: A morphism $f: A \to B$ is an epimorphism if for every object $C$ and every pair of morphism $g,h: B \to C$ the condition $gf = hf$ implies $g = h$.

Again, the relevant diagram.

Whenever $f$ is an epimorphism and this diagram commutes, we can conclude $g=h$.

Now one of the simplest things one can do when considering a category is to identify the monomorphisms and epimorphisms. Let’s do this for a few important examples.

## Monos and Epis in Various Categories

In the category $\mathbf{Set}$, monomorphisms and epimorphisms correspond to injective and surjective functions, respectively. Lets see why this is the case for monomorphisms. Recall that an injective function $f$ has the property that $f(x) = f(y)$ implies $x=y$. With this property we can show $f$ is a monomorphism because if $f(g(x)) = f(h(x))$ then the injective property gives us immediately that $g(x) = h(x)$. Conversely, if $f$ is a monomorphism and $f(x) = f(y)$, we will construct a set $C$ and two convenient functions $g, h: C \to A$ to help us show that $x=y$. In particular, pick $C$ to be the one point set $\left \{ c \right \}$, and define $g(c) = x, h(c) = y$. Then as functions $fg = fh$. Indeed, there is only one value in the domain, so saying this amounts to saying $f(x) = fg(c) = fh(c) = f(y)$, which we know is true by assumption. By the monomorphism property $g = h$, so $x = g(c) = h(c) = y$.

Now consider epimorphisms. It is clear that a surjective map is an epimorphism, but the converse is a bit trickier. We prove by contraposition. Instead of now picking the “one-point set,” for our $C$, we must choose something which is one element bigger than $B$. In particular, define $g, h : B \to B'$, where $B'$ is $B$ with one additional point $x$ added (which we declare to not already be in $B$). Then if $f$ is not surjective, and there is some $b_0 \in B$ which is missed by $f$, we define $g(b_0) = x$ and $g(b) = b$ otherwise. We can also define $h$ to be the identity on $B$, so that $gf = hf$, but $g \neq h$. So epimorphisms are exactly the surjective set-maps.

There is one additional fact that makes the category of sets well-behaved: a morphism in $\mathbf{Set}$ is an isomorphism if and only if it is both a monomorphism and an epimorphism. Indeed, isomorphisms are set-functions with two-sided inverses (bijections) and we know from classical set theory that bijections are exactly the simultaneous injections and surjections. A warning to the reader: not all categories are like this! We will see in a moment an example of a nontrivial category in which isomorphisms are not the same thing as simultaneous monomorphisms and epimorphisms.

The category $\mathbf{Grp}$ is very similar to $\mathbf{Set}$ in regards to monomorphisms and epimorphisms. The former are simply injective group homomorphisms, while the latter are surjective group homomorphisms. And again, a morphisms is an isomorphism if and only if it is both a monomorphism and an epimorphism. We invite the reader to peruse the details of the argument above and adapt it to the case of groups. In both cases, the hard decision is in choosing $C$ when necessary. For monomorphisms, the “one-point group” does not work because we are constrained to send the identity to the identity in any group homomorphism. The fortuitous reader will avert their eyes and think about which group would work, and otherwise we suggest trying $C = \mathbb{Z}$. After completing the proof, the reader will see that the trick is to find a $C$ for which only one “choice” can be made. For epimorphisms, the required $C$ is a bit more complex, but we invite the reader to attempt a proof to see the difficulties involved.

Why do these categories have the same properties but they are acquired in such different ways? It turns out that although these proofs seem different in execution, they are the same in nature, and they follow from properties of the category as a whole. In particular, the “one-point object” (a singleton set for $\mathbf{Set}$ and $\mathbb{Z}$ for $\mathbf{Grp}$) is more categorically defined as the “free object on one generator.” We will discuss this more when we get to universal properties, but a “free object on $n$ generators” is roughly speaking an object $A$ for which any morphism with source $A$ must make exactly $n$ “choices” in its definition. With sets that means $n$ choices for the images of elements, for groups that means $n$ consistent choices for images of group elements. On the epimorphism side, the construction is a sort of “disjoint union object” which is correctly termed a “coproduct.” But  momentarily putting aside all of this new and confusing terminology, let us see some more examples of morphisms in various categories.

Our recent primer on rings was well-timed, because the category $\mathbf{Ring}$ of rings (with identity) is an example of a not-so-well-behaved category. As with sets and groups, we do have that monomorphisms are equivalent to injective ring homomorphisms, but the argument is trickier than it was for groups. It is not obvious which “convenient” object $C$ to choose here, since maps $\mathbb{Z} \to R$ yield no choices: 1 maps to 1, 0 maps to 0, and the properties of a ring homomorphism determine everything else (in fact, the abelian group structure and the fact that units are preserved is enough). This makes $\mathbb{Z}$ into what’s called an “initial object” in $\mathbf{Ring}$; more on that when we study universal properties. In fact, we invite the reader to return to this post after we talk about the universal property of polynomial rings. It turns out that $\mathbb{Z}[x]$ is a suitable choice for $C$, and the “choice” made is where to send the indeterminate $x$.

On the other hand, things go awry when trying to apply analogous arguments to epimorphisms. While it is true that every surjective ring homomorphism is an epimorphism (it is already an epimorphism in $\mathbf{Set}$, and the argument there applies), there are ring epimorphisms which are not surjections! Consider the inclusion map of rings $i : \mathbb{Z} \to \mathbb{Q}$. The map $i$ is not surjective, but it is an epimorphism. Suppose $g, h : \mathbb{Q} \to R$ are two parallel ring morphisms, and they agree on $\mathbb{Z}$ (they will always do so, since there is only one ring homomorphism $\mathbb{Z} \to R$). Then $g,h$ must also agree on $\mathbb{Q}$, because if $p,q \in \mathbb{Z}$ with $q \neq 0$, then

$\displaystyle g(p/q) = g(p)g(q^{-1}) = g(p)g(q)^{-1} = h(p)h(q)^{-1} = h(p/q)$

Because the map above is also an injection, the category of rings is a very naturally occurring example of a category which has morphisms that are both epimorphisms and monomorphisms, but not isomorphisms.

There are instances in which monomorphisms and epimorphisms are trivial. Take, for instance any poset category. There is at most one morphism between any two objects, and so the conditions for an epimorphism and monomorphism vacuously hold. This is an extreme example of a time when simultaneous monomorphisms and epimorphisms are not the same thing as isomorphisms! The only isomorphisms in a poset category are the identity morphisms.

The inspection of epimorphisms and monomorphisms is an important step in the analysis of a category. It gives one insight into how “well-behaved” the category is, and picks out the objects which are special either for their useful properties or confounding trickery.

This reminds us of a quote of Alexander Grothendieck, one of the immortal gods of mathematics who popularized the use of categories in mainstream mathematics.

A good category containing some bad objects is preferable to a bad category containing only good objects.

I suppose the thesis here is that the having only “good” objects yields less interesting and useful structure, and that one should be willing to put up with the bad objects in order to have access to that structure.

Next time we’ll jump into a discussion of universal properties, and start constructing some programs to prove that various objects satisfy them.

Until then!

# Categories as Types

In this post we’ll get a quick look at two ways to define a category as a type in ML. The first way will be completely trivial: we’ll just write it as a tuple of functions. The second will involve the terribly-named “functor” expression in ML, which allows one to give a bit more structure on data types.

The reader unfamiliar with the ML programming language should consult our earlier primer.

## What Do We Want?

The first question to ask is “what do we want out of a category?” Recall from last time that a category is a class of objects, where each pair of objects is endowed with a set of morphisms. The morphisms were subject to a handful of conditions, which we paraphrase below:

• Composition has to make sense when it’s defined.
• Composition is associative.
• Every object has an identity morphism.

What is the computational heart of such a beast? Obviously, we can’t explicitly represent the class of objects itself. Computers are finite (or at least countable) beings, after all. And in general we can’t even represent the set of all morphisms explicitly. Think of the category $\mathbf{Set}$: if $A, B$ are infinite, then there are infinitely many morphisms (set-functions) $A \to B$.

A subtler and more important point is that everything in our representation of a category needs to be a type in ML (for indeed, how can a program run without being able to understand the types involved). This nudges us in an interesting direction: if an object is a type, and a morphism is a type, then we might reasonably expect a composition function:

compose: 'arrow * 'arrow -> 'arrow

We might allow this function to raise an exception if the two morphisms are uncomposable. But then, of course, we need a way to determine if morphisms are composable, and this requires us to know what the source and target functions are.

source: 'arrow -> 'object
target: 'arrow -> 'object

We can’t readily enforce composition to be associative at this stage (or any), because morphisms don’t have any other manipulable properties. We can’t reasonably expect to be able to compare morphisms for equality, as we aren’t able to do this in $\mathbf{Set}$.

But we can enforce the final axiom: that every object has an identity morphism. This would come in the form of a function which accepts as input an object and produces a morphism:

identity: 'object -> 'arrow

But again, we can’t necessarily enforce that identity behaves as its supposed to.

Even so, this should be enough to define a category. That is, in ML, we have the datatype

datatype ('object, 'arrow)Category =
category of ('arrow -> 'object) *
('arrow -> 'object) *
('object -> 'arrow) *
('arrow * 'arrow -> 'arrow)

Where we understand the first two functions to be source and target, and the third and fourth to be identity and compose, respectively.

## Categories as Values

In order to see this type in action, we defined (and included in the source code archive for this post) a type for homogeneous sets. Since ML doesn’t support homogeneous lists natively, we’ll have to settle for a particular subcategory of $\mathbf{Set}$. We’ll call it $\mathbf{Set}_a$, the category whose objects are finite sets whose elements have type $a$, and whose morphisms are again set-functions. Hence, we can define an object in this category as the ML type

'a Set

and an arrow as a datatype

datatype 'a SetMap = setMap of ('a Set) * ('a -> 'a) * ('a Set)

where the first object is the source and the second is the target (mirroring the notation $A \to B$).

Now we must define the functions required for the data constructor of a Category. Source and target are trivial.

fun setSource(setMap(a, f, b)) = a
fun setTarget(setMap(a, f, b)) = b

And identity is equally natural.

fun setIdentity(aSet) = setMap(aSet, (fn x => x), aSet)

Finally, compose requires us to check for set equality of the relevant source and target, and compose the bundled set-maps. A few notational comments: the “o” operator does function composition in ML, and we defined “setEq” and the “uncomposable” exception elsewhere.

fun setCompose(setMap(b', g, c), setMap(a, f, b)) =
if setEq(b, b') then
setMap(a, (g o f), c)
else
raise uncomposable

Note that the second map in the composition order is the first argument to the function, as is standard in mathematical notation.

And now this category of finite sets is just the instantiation

val FiniteSets = category(setSource, setTarget, setIdentity, setCompose)

Oddly enough, this definition does not depend on the choice of a type for our sets! None of the functions we defined care about what the elements of sets are, so long as they are comparable for equality (in ML, such a type is denoted with two leading apostrophes: ”a).

This example is not very interesting, but we can build on it in two interesting ways. The first is to define the poset category of sets. Recall that the poset category of a set $X$ has as objects subsets of $X$ and there is a unique morphism from $A \to B$ if and only if $A \subset B$. There are two ways to model this in ML. The first is that we can assume the person constructing the morphisms is giving correct data (that is, only those objects which satisfy the subset relation). This would amount to a morphism just representing “nothing” except the pair $(A,B)$. On the other hand, we may also require a function which verifies that the relation holds. That is, we could describe a morphism in this (or any) poset category as a datatype

datatype 'a PosetArrow = ltArrow of 'a * ('a -> 'a -> bool) * 'a

We still do need to assume the user is not creating multiple arrows with different functions (or we must tacitly call them all equal regardless of the function). We can do a check that the function actually returns true by providing a function for creating arrows in this particular poset category of a set with subsets.

exception invalidArrow
fun setPosetArrow(x)(y) = if subset(x)(y) then ltArrow(x, subset, y)
else raise invalidArrow

Then the functions defining a poset category are very similar to those of a set. The only real difference is in the identity and composition functions, and it is minor at that.

fun posetSource(ltArrow(x, f, y)) = x
fun posetTarget(ltArrow(x, f, y)) = y
fun posetIdentity(x) = ltArrow(x, (fn z => fn w => true), x)
fun posetCompose(ltArrow(b', g, c), ltArrow(a, f, b)) =
if b = b' then
ltArrow(a, (fn x => fn y => f(x)(b) andalso g(b)(y)), c)
else
raise uncomposable

We know that a set is always a subset of itself, so we can provide the constant true function in posetIdentity. In posetCompose, we assume things are transitive and provide the logical conjunction of the two verification functions for the pieces. Then the category as a value is

val Posets = category(posetSource, posetTarget, posetIdentity, posetCompose)

One will notice again that the homogeneous type of our sets is irrelevant. The poset category is parametric. To be completely clear, the type of the Poset category defined above is

('a Set, ('a Set)PosetArrow)Category

and so we can define a shortcut for this type.

type 'a PosetCategory = ('a Set, ('a Set)PosetArrow)Category

The only way we could make this more general is to pass the constructor for creating a particular kind of posetArrow (in our case, it just means fixing the choice of verification function, subset, for the more generic type constructor). We leave this as an exercise to the reader.

Our last example will return again to our abstract “diagram category.” Recall that if we fix an object $A$ in a category $\mathbf{C}$, the category $\mathbf{C}_A$ has as objects the morphisms with fixed source $A$,

and the arrows are commutative diagrams

Lets fix $A$ to be a set, and define a function to instantiate objects in this category:

fun makeArrow(fixedSet)(f)(target) = setMap(fixedSet, f, target)

That is, if I have a fixed set $A$, I can call the function like so

val makeObject = makeArrow(fixedSet)

And use this function to create all of my objects. A morphism is then just a pair of set-maps with a connecting function. Note how similar this snippet of code looks structurally to the other morphism datatypes we’ve defined. This should reinforce the point that morphisms really aren’t any different in this abstract category!

datatype 'a TriangleDiagram =
triangle of ('a SetMap) * ('a -> 'a) * ('a SetMap)

And the corresponding functions and category instantiation are

fun triangleSource(triangle(x, f, y)) = x
fun triangleTarget(triangle(x, f, y)) = y
fun triangleIdentity(x) = triangle(x, (fn z => z), x)
fun triangleCompose(triangle(b', g, c), triangle(a, f, b)) =
if setEq(setTarget(b'))(setTarget(b)) then
triangle(a, (g o f), c)
else
raise uncomposable

val SetDiagrams =
category(triangleSource, triangleTarget, triangleIdentity, triangleCompose)

Notice how we cannot compare the objects in this category for equality! Indeed, two functions cannot be compared in ML, and the best we can do is compare the targets of these functions to ensure that the connecting morphisms can be composed. A malicious programmer might then be able to compose uncomposable morphisms, a devious and startling proposition.

Notice further how we can’t avoid using set-specific functions with our current definition of a category. What we could do instead is require a category to have a function which checks for composability. Then compose could be written in a more homogeneous (but not quite completely parametric fashion). The reader is welcome to try this on their own, but we’ll soon have a more organized way to do this later, when we package up a category into an ML structure.

## Categories as Signatures

Let’s look at a more advanced feature of ML and see how we can apply it to representing categories in code. The idea is familiar to most Java and C++ programmers, and that is of an interface: the programmer specifies an abstract type which has specific functions attached to it, but does not implement those functions. Then some (perhaps different) programmer implements the interface by defining those functions for a concrete type.

For the reader who isn’t so interested in these features of ML, feel free to skip this section. We won’t be introducing any new examples of categories, just rephrasing what we’ve done above in a different way. It is also highly debatable whether this new way is actually any better (it’s certainly longer).

The corresponding concept in ML is called a signature (interface) and a structure (concrete instance). The only difference is that a structure is removed one step further from ML’s type system. This is in a sense necessary: a structure should be able to implement the requirements for many different signatures, and a signature should be able to have many different structures implement it. So there is no strict hierarchy of types to lean on (it’s just a many-to-many relationship).

On the other hand, not having any way to think of a structure as a type at all would be completely useless. The whole point is to be able to write function which manipulate a structure (concrete instance) by abstractly accessing the signature’s (interface’s) functions regardless of the particular implementation details. And so ML adds in another layer of indirection: the functor (don’t confuse this with categorical functors, which we haven’t talked abou yet). A functor in ML is a procedure which accepts as input a structure and produces as output another structure.

Let’s see this on a simple example. We can define a structure for “things that have magnitudes,” which as a signature would be

signature MAG_OBJ =
sig
type object
val mag : object -> int
end

This introduced a few new language forms: the “signature” keyword is like “fun” or “val,” it just declares the purpose of the statement as a whole. The “sig … end” expression is what actually creates the signature, and the contents are a list of type definitions, datatype definitions, and value/function type definitions. The reader should think of this as a named “type” for structures, analogous to a C struct.

To implement a signature in ML is to “ascribe” it. That is, we can create a structure that defines the functions laid out in a signature (and perhaps more). Here is an example of a structure for a mag object of integers:

structure IntMag : MAG_OBJ =
struct
type object = int
fun mag(x) = if x >= 0 then x else ~x
end

The colon is a type ascription, and at the compiler-level it goes through the types and functions defined in this structure and attempts to match them with the required ones from the MAG_OBJ signature. If it fails, it raises a compiler error. A more detailed description of this process can be found here. We can then use the structure as follows:

local open IntMag in
val y = ~7
val z = mag(y)
end

The open keyword binds all of the functions and types to the current environment (and so it’s naturally a bad idea to open structures in the global environment). The “local” construction is made to work specifically with open.

In any case, the important part about signatures and structures is that one can write functions which accept as input structures with a given signature and produce structures with other signatures. The name for this is a “functor,” but we will usually call it an “ML functor” to clarify that it is not a categorical functor (if you don’t know what a categorical functor is, don’t worry yet).

For example, here is a signature for a “distance object”

signature DIST_OBJ =
sig
type object
val dist: object * object -> int
end

And a functor which accepts as input a MAG_OBJ and creates a DIST_OBJ in a contrived and silly way.

functor MakeDistObj(structure MAG: MAG_OBJ) =
struct
type object = MAG.object
val dist = fn (x, y) =>
let
val v = MAG.mag(x) - MAG.mag(y)
in
if v > 0 then v else ~v
end
end

Here the argument to the functor is a structure called MAG, and we need to specify which type is to be ascribed to it; in this case, it’s a MAG_OBJ. Only things within the specified signature can be used within the struct expression that follows (if one needs a functor to accept as input something which has multiple type ascriptions, one can create a new signature that “inherits” from both). Then the right hand side of the functor is just a new structure definition, where the innards are allowed to refer to the properties of the given MAG_OBJ.

We could in turn define a functor which accepts a DIST_OBJ as input and produces as output another structure, and compose the two functors. This is started to sound a bit category-theoretical, but we want to remind the reader that a “functor” in category theory is quite different from a “functor” in ML (in particular, we have yet to mention the former, and the latter is just a mechanism to parameterize and modularize code).

In any case, we can do the same thing with a category. A general category will be represented by a signature, and a particular instance of a category is a structure with the implemented pieces.

signature CATEGORY =
sig
type object
type arrow
exception uncomposable

val source: arrow -> object
val target: arrow -> object
val identity: object -> arrow
val composable: arrow * arrow -> bool
val compose: arrow * arrow -> arrow
end

As it turns out, ML has an implementation of “sets” already, and it uses signatures. So instead of using our rinky-dink implementation of sets from earlier in this post, we can implement an instance of the ORD_KEY signature, and then call one of the many set-creating functors available to us. We’ll use an implementation based on red-black trees for now.

First the key:

structure OrderedInt : ORD_KEY =
struct
type ord_key = int
val compare = Int.compare
end

then the functor:

functor MakeSetCategory(structure ELT: ORD_KEY) =
struct
structure Set:ORD_SET = RedBlackSetFn(ELT)
type elt = ELT.ord_key
type object = Set.set
datatype arrow = function of object * (elt -> elt) * object

exception uncomposable

fun source(function(a, _, _)) = a
fun target(function(_, _, b)) = b
fun identity(a) = function(a, fn x => x, a)
fun composable(a,b) = Set.equal(a,b)
fun compose(function(b2, g, c), function(a, f, b1)) =
if composable(b1, b2) then
function(a, (g o f), c)
else
raise uncomposable
fun apply(function(a, f, b), x) = f(x)
end

The first few lines are the most confusing; the rest is exactly what we’ve seen from our first go-round of defining the category of sets. In particular, we call the RedBlackSetFn functor given the ELT structure, and it produces an implementation of sets which we name Set (and superfluously ascribe the type ORD_SET for clarity).

Then we define the “elt” type which is used to describe an arrow, the object type as the main type of an ORD_SET (see in this documentation that this is the only new type available in that signature), and the arrow type as we did earlier.

We can then define the category of sets with a given type as follows

structure IntSets = MakeSetCategory(structure ELT = OrderedInt)

The main drawback of this approach is that there’s so much upkeep! Every time we want to make a new kind of category, we need to define a new functor, and every time we want to make a new kind of category of sets, we need to implement a new kind of ORD_KEY. All of this signature and structure business can be quite confusing; it often seems like the compiler doesn’t want to cooperate when it should.

## Nothing Too Shocking So Far

To be honest, we haven’t really done anything very interesting so far. We’ve seen a single definition (of a category), looked at a number of examples to gain intuition, and given a flavor of how these things will turn into code.

Before we finish, let’s review the pros and cons of a computational representation of category-theoretical concepts.

Pros:

• We can prove results by explicit construction (more on this next time).
• Different-looking categories are structurally similar in code.
• We can faithfully represent the idea of using (objects and morphisms of) categories as parameters to construct other categories.
• Writing things in code gives a fuller understanding of how they work.

Cons:

• All computations are finite, requiring us to think much harder than a mathematician about the definition of an object.
• The type system is too weak. we can’t enforce the axioms of a category directly or even ensure the functions act sensibly at all. As such, the programmer becomes responsible for any failure that occur from bad definitions.
• The type system is too strong. when we work concretely we are forced to work within homogeneous categories (e.g. of ‘a Set).
• We cannot ensure the ability to check equality on objects. This showed up in our example of diagram categories.
• The functions used in defining morphisms, e.g. in the category of sets, are somewhat removed from real set-functions. For example, nothing about category theory requires functions to be computable. Moreover, nothing about our implementation requires the functions to have any outputs at all (they may loop infinitely!). Moreover, it is not possible to ensure that any given function terminates on a given input set (this is the Halting problem).

This list looks quite imbalanced, but one might argue that the cons are relatively minor compared to the pros. In particular (and this is what this author hopes to be the case), being able to explicitly construct proofs to theorems in category theory will give one a much deeper understanding both of category theory and of programming. This is the ultimate prize.

Next time we will begin our investigation of universal properties (where the real fun starts), and we’ll use the programmatic constructions we laid out in this post to give constructive proofs that various objects are universal.

Until then!