# Anti-Coordination Games and Stable Graph Colorings

## My First Paper

I’m pleased to announce that my first paper, titled “Anti-Coordination Games and Stable Colorings,” has been accepted for publication! The venue is the Symposium on Algorithmic Game Theory, which will take place in Aachen, Germany this October. A professor of mine once told me that everyone puts their first few publications on a pedestal, so I’ll do my best to keep things down to earth by focusing on the contents of the paper and not my swirling cocktail of pride. The point of this post is to explain the results of my work to a non-research-level audience; the research level folks will likely feel more comfortable reading the paper itself. So here we’ll spend significantly longer explaining the proofs and the concepts, and significantly less time on previous work.

I will assume familiarity with basic graph theory (we have a gentle introduction to that here) and NP-completeness proofs (again, see our primer). We’ll give a quick reminder about the latter when we get to it.

## Anti-Coordination Games on Graphs

The central question in the paper is how to find stable strategy profiles for anti-coordination games played on graphs. This section will flush out exactly what all of that means.

The easiest way to understand the game is in terms of fashion. Imagine there is a group of people. Every day they choose their outfits individually and interact with their friends. If any pair of friends happens to choose the same clothing, then they both suffer some embarrassment. We can alternatively say that whenever two friends anti-coordinate their outfits, they each get some kind of reward. If not being embarrassed is your kind of reward, then these really are equivalent. Not every pair of people are friends, so perhaps the most important aspect of this problem is how the particular friendship network considered affects their interactions. This kind of game is called an anti-coordination game, and the network of friends makes it a “game on a graph.” We’ll make this more rigorous shortly.

We can ask questions like, if everyone is acting independently and greedily will their choices converge over time to a single choice of outfit? If so how quickly? How much better could a centralized fashion-planner who knows the entire friendship network fare in choosing outfits? Is the problem of finding a best strategy for picking outfits computationally hard? What if some pairs of people want to coordinate their outfits and others don’t? What if caring about another’s fashion is only one-sided in some cases?

Already this problem is rooted in the theory of social networks, but the concept of an anti-coordination game played on a graph is quite broad, and the relevance of this model to the real world comes from the generality of a graph. For example, one may consider the trading networks of various countries; in this case not all countries are trading partners, and it is beneficial to produce different commodities than your trading partners so that you actually benefit from the interaction. Likewise, neighboring radio towers want to emit signals on differing wavelengths to minimize interference, and commuters want to pick different roadways to minimize traffic. These are all examples of this model which we’re about to formalize.

In place of our “network of friends,” the game entails a graph $G = (V,E)$ in which each player is represented by a vertex, and there is an edge between two vertices whenever the corresponding players are trying to anti-coordinate. We will use the terms player and vertex interchangeably. For now the graph is undirected, but later we will relax this assumption and work with directed graphs. In place of “outfits” we’ll have a generic set of strategies denoted by the numbers $1, \dots, k$, and each vertex will choose a strategy from this set. In one round of the game, each vertex $v$ chooses a strategy, and this defines a function $f : V \to \left \{ 1, \dots, k \right \}$ from the set of vertices to the set of strategies. Then the payoff of a vertex $v$ in a round, which we denote $\mu_f(v)$, is the number of neighbors of $v$ which have chosen a different strategy than $v$. That is, it is

$\displaystyle \mu_f(v) = \sum_{(v,w) \in E} \mathbf{1}_{\left \{ f(v) \neq f(w) \right \}}$

Where $\mathbf{1}_{A}$ denotes the indicator function for the event $A$, which assumes a value of 1 when the event occurs and 0 otherwise. Here is an example of an instance of the game. We have three strategies, denoted by colors, and the payoff for the vertex labeled $v$ is three.

If this game is played over many many rounds, we can ask if there is a so-called Nash equilibrium. That is, is there a choice of strategies for the players so that no single player will have an incentive to change in future rounds, assuming nobody else changes? We restrict even further to thinking about pure strategy Nash equilibria, which means there are no probabilistic choices made in choosing a strategy. Equivalently, a pure strategy equilibrium is just a choice of a strategy for each vertex which doesn’t change across rounds. In terms of the graph, we call a strategy function $f$ which is a Nash equilibrium a stable equilibrium (or, as will be made clear in the next paragraph, a stable coloring). It must satisfy the property that no vertex can increase its payoff by switching to a different strategy. So our question now becomes: how can we find a stable coloring which as good as possible for all players involved? Slightly more generally, we call a Nash equilibrium a strictly stable equilibrium (or a strictly stable coloring) if every vertex would strictly decrease its payoff by switching to another strategy. As opposed to a plain old stable coloring where one could have the same payoff by switching strategies, if any player tries to switch strategy then it will get a necessarily worse payoff. Though it’s not at all clear now, we will see that this distinction is the difference between computational tractability and infeasibility.

We can see a very clear connection between this game and graph coloring. Here an edge produces a payoff of 1 for each of its two vertices if and only if it’s properly colored. And so if the strategy choice function $f$ is also a proper coloring, this will produce the largest possible payoff for all vertices in the graph. But it may not be the case that (for a fixed set of strategies) the graph is properly colorable, and we already know that finding a proper coloring with more than two colors is a computationally hard problem. So this isn’t a viable avenue for solving our fashion game. In any case, the connection confuses us enough to interchangeably call the strategy choice function $f$coloring of $G$.

As an interesting side note, a slight variation of this game was actually tested on humans (with money as payoff!) to see how well they could do. Each human player was only shown the strategies of their neighbors, and received $5 for every round in which they collectively arrived at a proper coloring of the graph. See this article in Science for more details. Since our game allows for the presence of improperly colored edges, we could instead propose to find an assignment of colors to vertices which maximizes the sum of the payoffs of all players. In this vein, we define the social welfare of a graph and a coloring, denoted $W(G,f)$, to be the sum of the payoffs for all vertices $\sum_v \mu_f(v)$. This is a natural quantity one wants to analyze. Unfortunately, even in the case of two strategies, this quantity is computationally difficult (NP-hard) to maximize. It’s a version of the MAX-CUT problem, in which we try to separate the graph into two sets $X, Y$ such that the largest number of edges crosses from $X$ to $Y$. The correspondence between the two problems is seen by having $X$ represent those vertices which get strategy 1 and $Y$ represent strategy 2. So we can’t hope to find an efficient algorithm maximizing social welfare. The next natural question is: can we find stable or strictly stable colorings at all? Will they even necessarily exist? The answers to these questions form the main results of our paper. ## An Algorithm for Stable Colorings, and the Price of Anarchy It turns out that there is a very simple greedy algorithm for finding stable colorings of a graph. We state it in the form of a proposition. By stable $k$-coloring we mean a stable coloring of the graph with $k$ colors (strategies). Proposition: For every graph $G$ and every $k \geq 1$, $G$ admits a stable $k$-coloring, and such a coloring can be found in polynomial time. Proof. The proof operates by using the social welfare function as a so-called potential function. That is, a change in a player’s strategy which results in a higher payoff results in a higher value of the social welfare function. It is easy to see why: if a player $v$ changes to a color that helps him, then it will result in more properly colored edges (adjacent to $v$) than there were before. This means that more of $v$‘s neighbors receive an additional 1 unit of payoff than those that lost 1 as a result of $v$‘s switch. We call a vertex which has the potential to improve its payoff unhappy, and one which cannot improve its payoff happy. And so our algorithm to find a stable coloring simply finds some unhappy vertex, switches its color to the most uncommon color among its neighbors, and repeats the process until all vertices are happy. Indeed, this is a local maximum of the social welfare function, and the very definition of a stable coloring. $\square$ So that was nice, but we might ask: how much worse is the social welfare arrived at by this algorithm than the optimal social welfare? How much do we stand to lose by accepting the condemnation of NP-hardness and settling for the greedy solution we found? More precisely, if we call $Q$ the set of stable colorings and $C$ the set of all possible colorings, what is the value of $\displaystyle \frac{\max_{c' \in C} W(G, c')}{\min_{c \in Q} W(G, c)}$ This is a well-studied quantity in many games, called the price of anarchy. The name comes from the thought: what do we stand to gain by having a central authority, who can see the entire network topology and decide what is best for us, manage our strategies? The alternative (anarchy) is to have each player in the game act as selfishly and rationally as possible without complete information. It turns out that as the number of strategies grows large in our anti-coordination game, there is no price of anarchy. For our game this obviously depends on the choice of graph, but we know what it is and we formally state the result as a proposition: Proposition: For any graph, the price of anarchy for the $k$ strategy anti-coordination game is at most $k/(k-1)$ and this value is actually achieved by some instances of the game. Proof. The pigeonhole principle says that every vertex can always achieve at least a $(k-1)/k$ fraction of its maximum possible payoff. Specifically, if a vertex $v_i$ can’t achieve a proper coloring, then every color must be accounted for among $v_i$‘s neighbors. Choosing the minimally occurring color will give $v_i$ at least a payoff of $d_i(k-1)/k$ where $d_i$ is the number of neighbors of $v_i$. Since every stable coloring has to satisfy the condition that no vertex can do any better than the strategy it already has, even in the worst stable coloring every vertex already has chosen such a minority color. Since the maximum payoff is twice the number of edges $2 |E|$, and the sum of the degrees $\sum_i d_i = 2 |E|$, we have that the price of anarchy is at most $\displaystyle \frac{2|E|}{\frac{k-1}{k} \sum_i d_i} = \frac{k}{k-1}$ Indeed, we can’t do any better than this in general, because the following graph gives an example where the price of anarchy exactly meets this bound. An instance of the anti-coordination game with 5 strategies which meets the upper bound on price of anarchy. This example can easily be generalized to work with arbitrary $k$. We leave the details as an exercise to the reader. $\square$ ## Strictly Stable Colorings are Hard to Find Perhaps surprisingly, the relatively minor change of adding strictness is enough to make computability intractable. We’ll give an explicit proof of this, but first let’s recall what it means to be intractable. Recall that a problem is in NP if there is an efficient (read, polynomial-time) algorithm which can verify a solution to the problem is actually a solution. For example, the problem of proper graph $k$-coloring is in NP, because if someone gives you a purported coloring all you have to do is verify that each of the $O(n^2)$ edges are properly colored. Similarly, the problem of strictly stable coloring is in NP; all one need do is verify that no choice of a different color for any vertex improves its payoff, and it is trivial to come up with an algorithm which checks this. Next, call a problem $A$ NP-hard if a solution to $A$ allows you to solve any problem in NP. More formally, $A$ being NP-hard means that there is a polynomial-time reduction from any problem in NP $B$ to $A$ in the following (rough) sense: there is a polynomial-time computable function (i.e. deterministic program) $f$ which takes inputs for $B$ and transforms them into inputs for $A$ such that: $w$ is a solvable instance of $B$ is if and only if $f(w)$ is solvable for $A$. This is not a completely formal definition (see this primer on NP-completeness for a more serious treatment), but it’s good enough for this post. In order to prove a problem $C$ is NP-hard, all you need to do is come up with a polynomial-time reduction from a known NP-hard problem $A$ to your new problem $C$. The composition of the reduction used for $A$ can be composed with the reduction for $C$ to get a new reduction proving $C$ is NP-hard. Finally, we call a problem NP-complete if it is both in NP and NP-hard. One natural question to ask is: if we don’t already know of any NP-hard problems, how can we prove anything is NP-hard? The answer is: it’s very hard, but it was done once and we don’t need to do it again (but if you really want to, see these notes). As a result, we have generated a huge list of problems that are NP-complete, and unless P = NP none of these algorithms have polynomial-time algorithms to solve them. We need two examples of NP-hard problems for this paper: graph coloring, and boolean satisfiability. Since we assume the reader is familiar with the former, we recall the latter. Given a set of variables $x_i$, we can form a boolean formula over those variables of the form $\varphi = C_1 \wedge C_2 \wedge \dots \wedge C_m$ where each clause $C_i$ is a disjunction of three literals (negated or unnegated variables). For example, $C_i = (x_2 \vee \bar{x_5} \vee \bar{x_9})$ might be one clause. Here interpret a formula as the $x_i$ having the value true or false, the horizontal bars denoting negation, the wedges $\wedge$ meaning “and” and the vees $\vee$ meaning “or.” We call this particular form conjunctive normal form. A formula $\varphi$ is called satisfiable if there is a choice of true and false assignments to the variables which makes the entire logical formula true. The problem of determining whether there is any satisfying assignment of such a formula, called 3-SAT, is NP-hard. Going back to strictly stable equilibria and anti-coordination games, we will prove that the problem of determining whether a graph has a strictly stable coloring with $k$ colors is NP-hard. As a consequence, finding such an equilibrium is NP-hard. Since the problem is also in NP, it is in fact NP-complete. Theorem: For all $k \geq 2$, the problem of determining whether a graph $G$ has a strictly stable coloring with $k$ colors is NP-complete. Proof. The hardest case is $k =2$, but $k \geq 3$ is a nice warmup to understand how a reduction works, so we start there. The $k \geq 3$ part works by reducing from graph coloring. That is, our reduction will take an input to the graph $k$-coloring problem (a graph $G$ whose $k$-colorability is in question) and we produce a graph $G'$ such that $G$ is $k$-colorable if and only if $G'$ has a strictly stable coloring with $k$ colors. Since graph coloring is hard for $k \geq 3$, this will prove our problem is NP-hard. More specifically, we will construct $G'$ in such a way that the strictly stable colorings also happen to be proper colorings! So finding a strictly stable coloring of $G'$ will immediately give us a proper coloring of $G$. The construction of $G'$ is quite straightforward. We start with $G' = G$, and then for each edge $e = (u,v)$ we add a new subgraph which we call $H_e$ that looks like: By $K_{k-2}$ we mean the complete graph on $k-2$ vertices (all possible edges are present), and the vertices $u,v$ are adjacent to all vertices of the $H_e = K_{k-2}$ part. That is, the graph $H_e \cup \left \{ u,v \right \}$ is the complete graph on $k$ vertices. Now if the original graph $G$ was $k$-colorable, then we can use the same colors for the corresponding vertices in $G'$, and extend to a proper coloring (and hence a strictly stable equilibrium) of all of $G'$. Indeed, for any $H_e$ we can use one different color for each vertex of the $K_{k-2}$ part if we don’t use either of the colors used for $u,v$, then we’ll have a proper coloring. On the other hand, if $G'$ has a strictly stable equilibrium, then no edge $e$ which originally came from $G$ can be improperly colored. If some edge $e = (u,v)$ were improperly colored, then there would be some vertex in the corresponding $H_e$ which is not strictly stable. To see this, notice that in the $k$ vertices among $H_e \cup \left \{ u,v \right \}$ there can be at most $k-1$ colors used, and so any vertex will always be able to switch to that color without hurting his payoff. That is, the coloring might be stable, but it won’t be strictly so. So strictly stable colorings are the same as proper colorings, and we already see that the subgraph $G \subset G'$ is $k$-colorable, completing the reduction. Well that was a bit of a cheap trick, but it shows the difficulty of working with strictly stable equilibria: preventing vertices from changing with no penalty is hard! What’s worse is that it’s still hard even if there are only two colors. The reduction here is a lot more complicated, so we’ll give a sketch of how it works. The reduction is from 3-SAT. So given a boolean formula $\varphi = C_1 \wedge \dots \wedge C_m$ we produce a graph $G$ so that $\varphi$ has a satisfying assignment if and only if $G$ has a strictly stable coloring with two colors. The principle part of the reduction is the following gadget which represents the logic inherent in a clause. We pulled the figure directly from our paper, since the caption gives a good explanation of how it works. To reiterate, the two “appendages” labeled by $x$ correspond to the literal $x$, and the choice of colors for these vertices correspond to truth assignments in $\varphi$. In particular, if the two vertices have the same color, then the literal is assigned true. Of course, we have to ensure that any $x$‘s showing up in other clause gadgets agree, and any $\bar{x}$‘s will have opposite truth values. That’s what the following two gadgets do: The gadget on the left enforces x’s to have the same truth assignment across gadgets (otherwise the center vertex won’t be in strict equilibrium). The gadget on the right enforces two literals to be opposites. And if we stitch the clauses together in a particular way (using the two gadgets above) then we will guarantee consistency across all of the literals. All that’s left to check is that the clauses do what they’re supposed to. That is, we need it to be the case that if all of the literals in a clause gadget are “false,” then we can’t complete the coloring to be strictly stable, and otherwise we can. Indeed, the following diagram gives all possible cases of this up to symmetry: The last figure deserves an explanation: if the three literals are all false, then we can pick any color we wish for $v_1$, and its two remaining neighbors must both have the same color (or else $v_1$ is not in strict equilibrium). Call this color $a$, and using the same reasoning call the neighbors of $v_2$ and $v_3$ $b$ and $c$, respectively. Now by the pigeonhole principle, either $a=b, b=c,$ or $b=c$. Suppose without loss of generality that $a=b$, then the edge labeled $(a,b)$ will have the $a$ part not in strict equilibrium (it will have two neighbors of its same color and only one of the other color). This shows that no strict equilibrium can exist. The reduction then works by taking a satisfying assignment for the variables, coloring the literals in $G$ appropriately, and extending to a strictly stable equilibrium of all of $G$. Conversely, if $G$ has a strictly stable coloring, then the literals must be consistent and each clause must be fully colorable, which the above diagram shows is the same as the clauses being satisfiable. So all of $\varphi$ is satisfiable and we’re done (excluding a few additional details we describe in the paper). $\square$ ## Directed Graphs and Cooperation That was the main result of our paper, but we go on to describe some interesting generalizations. Since this post is getting quite long, we’ll just give a quick overview of the interesting parts. The rest of the paper is dedicated to directed graphs, where we define the payoff of a directed edge $(u,v)$ to go to the $u$ player if $u$ and $v$ anti-coordinate, but $v$ gets nothing. Here the computational feasibility is even worse than it was in the undirected case, but the structure is noticeably more interesting. For the former, not only is in NP-hard to compute strictly stable colorings, it’s even NP-hard to do so in the non-strict case! One large part of the reason for this is that stable colorings might not even exist: a directed 3-cycle has no stable equilibrium. We use this fact as a tool in our reductions to prove the following theorem. Theorem: For all $k \geq 2$, determining whether a directed graph has a stable$latex k\$-coloring is NP-complete.

See section 5 of our paper for a full proof.

To address the interesting structure that arises in the directed case, we observe that we can use a directed graph to simulate the desire of one vertex to actually cooperate with another. To see this for two colors, instead of adding an edge $(u,v)$ we add a proxy edge $u'$ and directed edges $(u,u'), (u',v)$. To be in equilibrium, the proxy has no choice but to anti-coordinate with $v$, and this will give $u$ more incentive to cooperate with $v$ by anti-cooperating with its proxy. This can be extended to $k$ colors by using an appropriately (acyclically) directed copy of $K_{k-1}$.

## Thoughts, and Future Work

While the results in this paper are nice, and I’m particularly proud that I came up with a novel NP-hardness reduction, it is unfortunate that the only results in this paper were hardness results. Because of the ubiquity of NP-hard problems, it’s far more impressive to have an algorithm which actually does something (approximate a good solution, do well under some relaxed assumption, do well in expectation with some randomness) than to prove something is NP-hard. To get an idea of the tone set by researchers, NP-hardness results are often called “negative” results (in the sense that they give a “no” answer to the question of whether there is an efficient algorithm) and finding an algorithm that does something is called a positive result. That being said the technique of using two separate vertices to represent a single literal in a reduction proof is a nice trick, and I have since used it in another research paper, so I’m happy with my work.

On the positive side, though, there is some interesting work to be done. We could look at varying types of payoff structures, where instead of a binary payoff it is a function of the colors involved (say, $|i - j|$. Another interesting direction is to consider distributed algorithms (where each player operates independently and in parallel) and see what kinds of approximations of the optimal payoff can be achieved in that setting. Yet another direction favored by a combinatorialist is to generalize the game to hypergraphs, which makes me wonder what type of payoff structure is appropriate (payoff of 1 for a rainbow edge? a non-monochromatic edge?). There is also some more work that can be done in inspecting the relationship between cooperation and anti-cooperation in the directed version. Though I don’t have any immediate open questions about it, it’s a very interesting phenomenon.

In any event, I’m currently scheduled to give three talks about the results in this paper (one at the conference venue in Germany, and two at my department’s seminars). Here’s to starting off my research career!

# Conferences, Summer Work, and an Advisor

I’ve been spending a little less time on my blog recently then I’d like to, but for good reason: I’ve been attending two weeks of research conferences, I’m getting ready for a summer internship in cybersecurity, and I’ve finally chosen an advisor.

## Visions, STOC, and CCC

The Simons Institute at UC Berkeley

I’ve been taking a break from the Midwest for the last two weeks to attend some of this year’s seminal computer science theory conferences. The first (which is already over) was the Simon’s Institute Symposium on the Visions of the Theory of Computing. It wasn’t a typical computer science theory conference, because the Simon’s Institute exists to bring techniques from computer science to other fields. So there were many talks in life sciences, ranging from neurobiology (modelling the brain as a computer) to nanochemistry (using DNA to engineer computers) to physics (quantum computing). These kinds of sciences are interesting to me, but pretty far beyond my knowledge base. Other talks were on social networks (and the web), designing markets to be efficient, and computational privacy.

All of these talks shared a common theme: how can we apply the computational “lens” to everything? Interestingly, there were some very deep philosophical questions that computational insight can give answers to. For instance, Christos Papadimitriou gave a talk on evolution and sex, and raised an interesting question: what is the place of sex in evolution? Briefly, the problem with sex is as follows. Say you have a perfect human: perfect health, perfect genes, perfect intelligence, perfect everything. Because of sex, all of this human’s offspring are guaranteed to be imperfect. If the “goal” of evolution is to promote the fitness of a species, it seems that sexual reproduction works against us. Why then, is sexual reproduction the norm and asexual reproduction the exception?

Indeed, the same patterns hold in computer science. Genetic algorithms are the analogue of sexual reproduction, while “simulated annealing” (gradient descent with random mutation) is the analogue of asexual reproduction. And there we can actually experiment, and (without insulting anyone too heavily) it’s largely the case that the asexual methods outperform sexual methods.

But with computational (mathematical) models, we can actually analyze them to see what’s going on. And in fact, genetic algorithms are not optimizing a fitness function. Instead, they are simultaneously optimizing fitness and entropy in the gene pool. One might argue this allows for maximal adaptability, and there are some connections to learning-theory methods that mirror evolution (see Multiplicative Weights Update Algorithm).

This is pretty cool to think about, but more deeply there are philosophical claims about how we can apply the computational lens to other fields. The idea is that computer scientists can create and analyze generative models which reflect the empirical phenomena we observe. Indeed, this mentality is in our blood. It stems from (in the words of my advisor) a “deeply religious” view that everything is computation. And even beyond “big data” and other popular trends in the news, researchers in other fields are increasingly appreciative of the algorithmic mindset computer scientists have to offer.

The second two conferences, the Symposium on the Theory of Computing (STOC) and the Conference on Computational Complexity (CCC) are much more technical. STOC is a whirlwind of a conference, with around 15 twenty-minute talks per day (30 in total with two running in parallel at a time). And I admit that I understand very little. As I write this I’m lost during a talk on an online version of the Steiner tree problem. But even so, it gives a great glimpse into the frontier of research, and tells me what things I should be looking at in my own work.

One new notion that seems omnipresent at STOC this year is the concept of differential privacy. The idea stems from recent controversies over privacy in “anonymized” database information that was released to the public being de-anonymized in clever ways. So the question becomes: is it possible for a database to release information from its databases without revealing any information about the contents of the databases? “Information” could mean simple things like means and medians, or more complicated things like principal components. It’s a question about the mechanism for releasing data, and it’s inherently a computational problem: we want it to be (probabilistically) hard to accurately represent the data from the information provided. At a very high level, the idea is to add random noise to the data as you compute, and modify your methods to be resilient to such noise. There is a flurry of work being done on this topic, with a wealth of open problems waiting to be tackled. A related idea is that of homomorphic encryption, in which the user sends encrypted data to the compute server, the compute server computes things without decrypting the data, and then the user decrypts the result of the computation locally. The server, then, can have computational powers far beyond that of the user, but the user need not relinquish sensitive information to access that power.

A schematic example of differential privacy at work. Image credit: Microsoft Research.

Another popular topic (which I know even less about) is the idea of communication complexity and information complexity. Roughly, this field is concerned with two communicating parties with private information trying to compute a function that depends on both of their information. The communication complexity question is how can they do this while spending the least amount of resources on communication. The information complexity concern is to keep as much information private as possible, both between the two parties involved and any nefarious outsiders listening in on the conversation.

It’s amazing to me how much privacy concerns have permeated into theoretical computer science. The usual picture of a mathematician is of a wiry-haired chalk-dust-covered professor with no concern for the world outside his or her own mind. But in reality, we are actively working on the problems faced by average users, to protect them against the dangers of small groups of individuals making bad decisions with their sensitive data. Anyone worried about the upcoming dystopia posed by the masses of available data is apparently not informed in the magical powers of modern cryptography. It seems like it will only be a few years before these kinds of systems become feasible for large-scale applications. Maybe I’ll be lucky enough to have a hand in that

Starting Wednesday is the CCC, and this will likely be more concerned with more “pure” kinds of theoretical computer science research. I’m pretty excited for it, too (although I know by the end I will be exhausted; I’m already feeling the strain today).

## MIT Lincoln Lab

This summer (actually, in a few days), I’ll be at MIT’s Lincoln Lab to work on problems in cybersecurity. I still don’t know quite what I’ll be working on (and I probably won’t be at liberty to describe it when I do find out), but vaguely the problem is in finding the right data representations for massive amounts of data so as to reason about it.

I know that the idea of the government analyzing the world’s internet activity is a scary thought, but I’m going to do my best to go into this with an open mind. At the very least, I’ll be familiar enough with the kind of work they do there to decide if it violates any of my principles. I had minor worries about this last year when I went to work at Lawrence Livermore, but by the time I got there it was quite obvious that my work would keep me far far away from weapons development and closer to problems with renewable energy.

And at least it will be fun to live on the east coast for a little while. In anticipation I’ve joined the Boston Python Users group, so it will be fun to go to some of their project nights and talk math and programming with like-minded people in the area.

Finally, I’m quite pleased to announce that I have secured an advisor for my PhD program! His name is Lev Reyzin, and he’s a learning theorist. (For the last year or so I’ve been meaning to start talking about learning theory on this blog). That means that I’m officially in the field of theoretical computer science.

I was a little bit late in choosing an advisor, mostly because I meandered around algebraic geometry and algebraic topology and logic before deciding that theoretical computer science was the right field for me.

It’s exciting to finally have an advisor because it implies a few things:

• I’ve passed all of my prelim exams.
• I can focus on one area and start to deeply understand important problems in theoretical computer science.
• My graduate life will shift from mainly course-taking to mainly research. Research is much more fun.

Lev is also a good fit for me both because I’m interested in learning theory, and because he has nontrivial experience in industry-focused research. He’s done work at Yahoo! Research and Google Research, in particular on improving their algorithms for placing news stories and ads to the front page. The formal name for this problem is the multi-armed bandit problem, and I’ll definitely be writing about it on this blog. While I’d probably be totally happy doing pure learning theory, I think it’s fun and worthwhile to look at wider applications. So I have a lot of optimism that the next few years will be fun and productive.

Until next time!

# A Sample of Standard ML, the TreeSort Algorithm, and Monoids

In this post we will assume the reader has a passing familiarity with some of the basic concepts of functional programming (the map, fold, and filter functions). We introduce these topics in our Racket primer, but the average reader will understand the majority of this primer without expertise in functional programming.

## Preface: ML for Category Theory

A few of my readers have been asking for more posts about functional languages and algorithms written in functional languages. While I do have a personal appreciation for the Haskell programming language (and I plan to do a separate primer for it), I have wanted to explore category theory within the context of programming for quite a while now. From what I can tell, ML is a better choice than Haskell for this.

Part of the reason is that, while many Haskell enthusiasts claim it to be a direct implementation of category theory, Haskell actually tweaks category theoretic concepts in certain ways. I rest assured that the designers of Haskell (who are by assumption infinitely better at everything than I am) have very good reasons for doing this. But rather than sort through the details of the Haskell language specification to clarify the finer details, we would learn a lot more by implementing category theory by hand in a programming language that doesn’t have such concepts already.

And so we turn to ML.

ML, which stands for MetaLanguage, apparently has historical precedents for being a first in many things. One of these is an explicit recognition of parametric polymorphism, which is the idea that an operation can have the same functionality regardless of the types of the data involved; the types can, in effect, be considered variables. Another ground-breaking aspect of ML is an explicit type inference system. Similar to Haskell, an ML program will not run unless the compiler can directly prove that the program produces the correct types in every step of the computation.

Both of these features are benefits for the student of category theory. Most of our time in category theory will be spent working with very general assumptions on the capabilities of our data involved, and parametric polymorphism will be our main tool for describing what these assumptions are and for laying out function signatures.

As a side note, I’ve noticed through my ever-growing teaching experiences that one of the main things new programming students struggle with (specifically, after mastering the syntax and semantics of basic language constructs) is keeping their types straight. This is especially prominent in a language like Python (which is what I teach), where duck-typing is so convenient that it lulls the students into a false sense of security. Sooner as opposed to later they’ll add strings to numbers with the blind confidence that Python will simply get it. Around this time in their first semester of programming, I would estimate that type errors lie at the heart of 75% of the bugs my students face and fail to resolve before asking me for help. So one benefit of programming in ML for pedagogy is that it is literally impossible to make type errors. The second you try to run a program with bad types, the compiler points out what the expected type is and what the given (incorrect) type was. It takes a while to get used to type variables (and appeasing the type checker when you want to play fast and loose). But once you do you’ll find the only bugs that remain in your code are conceptual ones, which are of course much more rewarding and important bugs to fix.

So enough of this preamble. Let’s learn some ML!

## Warming Up: Basic Arithmetic and Conditionals

As one would expect, ML has variables and arithmetic which work in much the same way as other languages. Each variable declaration is prefixed by the word “val,” as below

val x = 7;
val y = 2;

This statements modify the global environment (the list of which variable names are associated to which values). Semicolons are required to terminate variable declarations at the global level. We can declare multiple variables in a single line using the “and” keyword

val x = 7 and y = 2;

As a precaution, “and” is only used in ML for syntactic conjunctions of variable/function declarations, and is only necessary when the two defined variable names are mutually defined in terms of each other (this can happen naturally for recursive function definitions). We will see in a moment that the logical and operation is denoted “andalso.”

We can also use pattern matching to bind these two variables in one line, much the same way as it might work in Python:

val (x,y) = (7,2);

We note that while ML does not require us to specify the type of a variable, the type is known and ever present under the surface. If we run the above code through the sml compiler (which after running the contents of a file opens a REPL to further evaluate commands), we see the following output

[opening vars.sml]
val x = 7 : int
val n = 2 : int

The environment is printed out to the user, and it displays that the two types are “int.”

Arithmetic is defined for integers, and the standard ones we will use are the expected +, -, *, and (not a slash, but) div. Here are some examples, and here is a list of all basic operations on ints. A few things to note: the unary negation operator is a tilde (~), and the semicolons are only used terminate statements in the REPL, which tells the compiler we’re ready for it to evaluate our code. Semicolons can also be used to place multiple statements on a single line of code. The “it” variable is a REPL construct which saves the most recent unbound expression evaluation.

- 3 + 6;
val it = 9 : int
- 6 div 3;
val it = 2 : int
- 2 * 9;
val it = 18 : int
- 2 - 9;
val it = ~7 : int
- ~9;
val it = ~9 : int

ML also has floating point arithmetic (in ML this type is called “real”), but treats it in a very prudish manner. Specifically (and this is a taste of the type checker doing its job too well), ML does not coerce types for you. If you want to multiply a real number and an integer, you have to first convert the int to a real and then multiply. An error will occur if you do not:

- val x = 4.0;
val x = 4.0 : real
- val y = 7;
val y = 7 : int
- x * y;
stdIn:5.1-5.6 Error: operator and operand don't agree [tycon mismatch]
operator domain: real * real
operand:         real * int
in expression:
x * y
- x * Real.fromInt(y);
val it = 28.0 : real

Here is a list of all operations on reals. We don’t anticipate using reals much, but it’s good to know that ML fervently separates them.

It seems odd that we’re talking so much about statements, because often enough we will be either binding function names (and tests) to the global environment, or restricting ourselves to local variable declarations. The latter has a slightly more complicated syntax, simply surrounding your variable declarations and evaluated code in a “let … in … end” expression. This will be a much more common construction for us.

let
val x = 7
val y = 9
in
(x + 2*y) div 3
end

The “in” expression is run with the combined variables from the ambient environment (the variables declared outside of the let) and those defined inside the let. The variables defined in the let leave scope after the “in” expression is evaluated, and the entire let expression as a whole evaluates to the result of evaluating the “in” expression. Clearly and example shows what is going on much more directly than words.

The last major basic expression form are the boolean expressions and operations. The type for booleans in ML is called “bool,” and the two possible values are “true,” and “false.” They have the usual unary and binary operations, but the names are a bit weird. Binary conjunction is called “andalso,” while binary disjunction is called “orelse.”

val a = true and b = false;
val c = (a andalso b) orelse ((not a) andalso (not b));

But aside from that, boolean expressions work largely as one would expect. There are the six standard numerical comparison functions, where testing for equality is given by a single equals sign (in most languages, comparison for equality is ==), and inequality is given by the diamond operator <>. The others are, as usual, <, <=, >, >=.

- 6 = 7;
val it = false : bool
- 6 = 6;
val it = true : bool
- 6 < 7;
val it = true : bool
- 7 <= 6;
val it = false : bool
- 6 <> 7;
val it = true : bool

ML also has the standard if statement, which has the following syntax, which is more or less the same as most languages:

- val x = if 6 < 7 then ~1 else 4;
val x = ~1 : int

ML gives the programmer more or less complete freedom with whitespace, so any of these expressions can be spread out across multiple lines if the writer desires.

val x = if 6 < 7
then
~1
else
4

This can sometimes be helpful when defining things inside of let expressions inside of function definitions (inside of other function definitions, inside of …).

So far the basics are easy, if perhaps syntactically different from most other languages we’re familiar with. So let’s move on to the true heart of ML and all functional programming languages: functions.

## Functions and cases, recursion

Now that we have basic types and conditions, we can start to define some simple functions. In the global environment, functions are defined the same way as values, using the word “fun” in the place of “val.” For instance, here is a function that adds 3 to a number.

fun add3(x) = x+3

The left hand side is the function signature, and the right hand side is the body expression. As in Racket, and distinct from most imperative languages, a function evaluates to whatever the body expression evaluates to. Calling functions has two possible syntaxes:

add3(5)
(add3 5)

In other words, if the function application is unambiguous, parentheses aren’t required. Otherwise, one can specify precedence using parentheses in either Racket (Lisp) style or in standard mathematical style.

The most significant difference between ML and most other programming languages, is that ML’s functions have case-checking. That is, we can specify what action is to be taken based on the argument, and these actions are completely disjoint in the function definition (no if statements are needed).

For instance, we could define an add3 function which nefariously does the wrong thing when the user inputs 7.

fun add3(7) = 2
| add3(x) = x+3

The vertical bar is read “or,” and the idea is that the possible cases for the function definition must be written in most-specific to least-specific order. For example, interchanging the orders of the add3 function cases gives the following error:

- fun add3(x) = x+3
stdIn:13.5-14.16 Error: match redundant
x => ...
-->   7 => ...

Functions can call themselves recursively, and this is the main way to implement loops in ML. For instance (and this is quite an inefficient example), I could define a function to check whether a number is even as follows.

fun even(0) = true
| even(n) = not(even(n-1))

Don’t cringe too visibly; we will see recursion used in less horrifying ways in a moment.

Functions with multiple arguments are similarly easy, but there are two semantic possibilities for how to define the arguments. The first, and simplest is what we would expect from a typical language: put commas in between the arguments.

fun add(x,y) = x+y

When one calls the add function, one is forced to supply both arguments immediately. This is usually how programs are written, but often times it can be convenient to only supply one argument, and defer the second argument until later.

If this sounds like black magic, you can thank mathematicians for it. The technique is called currying, and the idea stems from the lambda calculus, in which we can model all computation using just functions (with a single argument) as objects, and function application. Numbers, arithmetic, lists, all of these things are modeled in terms of functions and function calls; the amazing thing is that everything can be done with just these two tools. If readers are interested, we could do a post or two on the lambda calculus to see exactly how these techniques work; the fun part would be that we can actually write programs to prove the theorems.

Function currying is built-in to Standard ML, and to get it requires a minor change in syntax. Here is the add function rewritten in a curried style.

fun add(x)(y) = x+y

Now we can, for instance, define the add3 function in terms of add as follows:

val add3 = add(3)

And we can curry the second argument by defining a new function which defers the first argument appropriately.

fun add6(x) = add(x)(6)

Of course, in this example addition is commutative so which argument you pick is useless.

We should also note that we can define anonymous functions as values (for instance, in a let expression) using this syntax:

val f = (fn x => x+3)

The “fn x => x+3″ is just like a “lambda x: x+3″ expression in Python, or a “(lambda (x) (+ x 3))” in Racket. Note that one can also define functions using the “fun” syntax in a let expression, so this is truly only for use-once function arguments.

## Tuples, Lists, and Types

As we’ve discovered already, ML figures out the types of our expressions for us. That is, if we define the function “add” as above (in the REPL),

- fun add(x,y) = x+y;
val add = fn : int * int -> int

then ML is smart enough to know that “add” is to accept a list of two ints (we’ll get to what the asterisk means in a moment) and returns an int.

The curried version is similarly intuited:

- fun add(x)(y) = x+y;
val add = fn : int -> int -> int

The parentheses are implied here: int -> (int -> int). So that this is a function which accepts as input an int, and produces as output another function which accepts an int and returns an int.

But, what if we’d like to use “add” to add real numbers? ML simply won’t allow us to; it will complain that we’re providing the wrong types. In order to make things work, we can tell ML that the arguments are reals, and it will deduce that “+” here means addition on reals and not addition on ints. This is one awkward thing about ML; while the compiler is usually able to determine the most general possible type for our functions, it has no general type for elements of a field, and instead defaults to int whenever it encounters arithmetic operations. In any case, this is how to force a function to have a type signature involving reals:

- fun add(x:real, y:real) = x+y;
val add = fn : real * real -> real

If we’re going to talk about types, we need to know all of ML’s syntax for its types. Of course there are the basics (int, real, bool). Then there are function types: int -> int is the type of a function which accepts one int and returns an int.

We’ll see two new types in this section, and the first is the tuple. In ML, tuples are heterogeneous, so we can mix types in them. For instance,

- val tup = (true, 7, 4.4);
val tup = (true,7,4.4) : bool * int * real

Here the asterisk denotes the tuple type, and one familiar with set theory can think of a tuple as an element of the product of sets, in this case

$\displaystyle \left \{ \textup{true}, \textup{false} \right \} \times \mathbb{Z} \times \mathbb{R}$

Indeed, there is a distinct type for each possible kind of tuple. A tuple of three ints (int * int * int) is a distint type from a tuple of three booleans (bool * bool * bool). When we define a function that has multiple arguments using the comma syntax, we are really defining a function which accepts as input a single argument which is a tuple. This parallels exactly how functions on multiple arguments work in classical mathematics.

The second kind of compound type we’ll use quite often is the list. Lists are distinct from tuples in ML in that lists must be homogenous. So a list of integers (which has type “int list”) is different from a list of booleans.

The operations on lists are almost identical as in Haskell. To construct explicit lists use square brackets with comma-delimited elements. To construct them one piece at a time, use the :: list constructor operation. For those readers who haven’t had much experience with lists in functional programming: all lists are linked lists, and the :: operation is the operation of appending a single value to the beginning of a given list. Here are some examples.

- val L = [1,2,3];
val L = [1,2,3] : int list

- val L = 1::(2::(3::nil));
val L = [1,2,3] : int list

The “nil” expression is the empty list, as is the empty-square-brackets expression “[]“.

There is a third kind of compound type called the record, and it is analogous to a C struct, where each field is named. We will mention this in the future once we have a need for it.

The most important thing about lists and tuples in ML is that functions which operate on them don’t always have an obvious type signature. For instance, here is a function which takes in a tuple of two elements and returns the first element.

fun first(x,y) = x

What should the type of this function be? We want it to be able to work with any tuple, no matter the type. As it turns out, ML was one of the first programming language to allow this sort of construction to work. The formal name is parametric polymorphism. It means that a function can operate on many different kinds of types (because the actual computation is the same regardless of the types involved) and the full type signature can be deduced for each function call based on the parameters for that particular call.

The other kind of polymorphism is called ad-hoc polymorphism. Essentially this means that multiple (very different) operations can have the same name. For instance, addition of integers and addition of floating point numbers require two very different sets of instructions, but they both go under the name of +.

What ML does to make its type system understand parametric polymorphism is introduce so-called type variables. A type variable is any string prefixed by a single quote, e.g. ‘a, and they can represent any type. When ML encounters a function with an ambiguous type signature, it decides what the most general possible type is (which usually involves a lot of type variables), and then uses that type.

So to complete our example, the first function has the type

'a * 'b -> 'a

As a side note, the “first” function is in a sense the “canonical” operation that has this type signature. If nothing is known about the types, then no other action can happen besides the projection. There are some more interesting things to be said about such canonical operations (for instance, could we get away with not having to even define them at all).

The analogous version for lists is as follows. Note that in order to decompose a list into its first element and the tail list, we need to use pattern matching.

fun listFirst([x]) = x
| listFirst(head::tail) = head

And this function has the type signature ‘a list -> ‘a. As a slightly more complicated example (where we need recursion), we can write a function to test for list membership.

fun member(x, nil) = false
else member(x, tail)

If you run this program and see some interesting warning messages, see this StackOverflow question for a clarification.

## Defining New Types

The simplest way to define a new type is to just enumerate all possibilities. For instance, here is an enumerated datatype with four possibilities.

datatype maths = algebra | analysis | logic | computation

Then we can define functions which operate on those types using pattern matching.

fun awesome(algebra) = true
| awesome(analysis) = false
| awesome(logic) = false
| awesome(computation) = true

And this function has type maths -> bool (don’t take it too seriously :-)). We can also define data types whose constructors require arguments.

datatype language = functional of string*bool
| imperative of string
| other

Here we define a language to be functional or imperative. The functional type consists of a name and a boolean representing whether it is purely functional, while the imperative type just consists of a name. We can then construct these types by treating the type constructors as if they were functions.

val haskell = functional("Haskell", true) and
java = imperative("Java") and
prolog = other;

Perhaps more useful than this is to define types using type variables. A running example we will use for the remainder of this post is a binary tree of homogeneous elements at each node. Defining such types is easy: all one needs to do is place the appropriate type (with parentheses to clarify when the description of a type starts or ends) after the “of” keyword.

datatype 'a Tree = empty
| leaf of 'a
| node of (('a Tree) * 'a * ('a Tree))

We can create instances of an integer tree as expected:

val t2 = node(node(leaf(2), 3, leaf(4)), 6, leaf(8))

## The TreeSort Algorithm

We can define a host of useful operations on trees (of any type). For instance, below we compute the breadth of a tree (the total number of leaves), the depth of a tree (the maximal length of a path from the root to a leaf), and the ability to flatten a tree (traverse the tree in order and place all of the values into a list). These first two are a nice warm-up.

fun breadth(empty) = 0
| breadth(node(left, _, right)) = breadth(left) + breadth(right)

Here the underscore indicates a pattern match with a variable we won’t actually use. This is more space efficient for the compiler; it can avoid adding extra values to the current environment in a potentially deep recursion.

fun depth(empty) = 0
| depth(leaf(_)) = 1
| depth(node(left, _, right)) =
let
val (lDepth, rDepth) = (1 + depth(left), 1 + depth(right))
in
if lDepth > rDepth then lDepth else rDepth
end

This function should be self explanatory.

fun flatten(empty) = []
| flatten(leaf(x)) = [x]
| flatten(node(left, x, right)) =
flatten(left) @ (x :: flatten(right))

Here the @ symbol is list concatenation. This is not quite the most efficient way to do this (we are going to write a forthcoming post about tail-call optimization, and there we will see why), but it is certainly the clearest. In the final recursive call, we traverse the left subtree first, flattening it into a list in order. Then we flatten the right hand side, and put the current node’s element in between the two flattened parts.

Note that if our tree is ordered, then flatten will produce a strictly increasing list of the elements in the tree. For those readers unfamiliar with ordered binary trees, for all intents and purposes this is an “int Tree” and we can compare the values at different nodes. Then an ordered binary tree is a tree which satisfies the following property for each node: all of the values in the left child’s subtree are strictly smaller than the current node’s value, and all of the values in the right child’s subtree are greater than or equal to the current node’s value.

Indeed, we can use the flatten function as part of a simple algorithm to sort lists of numbers. First we insert the numbers in the unsorted list into a tree (in a way that preserves the ordered property at each step), and then we flatten the tree into a sorted list. This algorithm is called TreeSort, and the insert function is simple as well.

fun insert(x, empty) = leaf(x)
| insert(y, leaf(x)) = if x <= y
then node(empty, x, leaf(y))
else node(leaf(y), x, empty)
| insert(y, node(left, x, right)) =
if x <= y
then node(left, x, insert(y, right))
else node(insert(y, left), x, right)

If we’re at a nonempty node, then we just recursively insert into the appropriate subtree (taking care to create new interior nodes if we’re at a leaf). Note that we do not do any additional computation to ensure that the tree is balanced (that each node has approximately as many children in its left subtree as in its right). Doing so would digress from the point of this primer, but rest assured that the problem of keeping trees balanced has long been solved.

Now the process of calling insert on every element of a list is just a simple application of fold. ML does have the standard map, foldl, foldr, and filter functions, although apparently filter is not available in the standard library (one needs to reference the List module via List.filter).

In any case, foldl is written in the currying style and building the tree is a simple application of it. As we said, the full sorting algorithm is just the result of flattening the resulting tree with our in-order traversal.

fun sort(L) = flatten(foldl(insert)(empty)(L))

So there you have it! One of the simplest (efficient) sorting algorithms I can think of in about twelve lines of code.

## Free Monoid Homomorphisms: A More Advanced Example

Just to get a quick taste of what our series on category theory will entail, let’s write a program with a slightly more complicated type signature. The idea hinges on the idea that lists form a what’s called a free monoid. In particular,

Definition: monoid is a set $X$ equipped with an associative binary operation $\cdot: X \times X \to X$ and an identity element $1 \in X$ for which $x1 = 1x = x$ for all $x \in X$.

Those readers who have been following our series on group theory will recognize a monoid as a group with less structure (there is no guarantee of inverses for the $\cdot$ operation). The salient fact for this example is that the set of ML values of type $\textup{'a list}$ forms a monoid. The operation is list concatenation, and the identity element is the empty list. Call the empty list nil and the append operation @, as it is in ML.

More than that, $\textup{'a list}$ forms a free monoid, and the idea of freeness has multiple ways of realization. One sort of elementary way to understand freeness is that the only way to use the binary operation to get to the identity is to have the two summands both be the identity element. In terms of lists, the only way to concatenate two lists to get the empty list is to have both pieces be empty to begin with.

Another, more mature (more category-theoretical) way to think about freeness is to say is satisfies the following “universal” property. Call $A$ the set of values of type ‘a, and $[A]$ the set of values of type $\textup{'a list}$, and suppose that $(B, \cdot_B, 1_B)$ is the datum of an arbitrary monoid. The universal property says that if we are given a function $f: A \to B$, and we take the canonical map $g: A \to [A]$ which maps an element $a \in A$ to the single-entry list $[a] \in [A]$, then there is a unique way to extend $f$ to a monoid homomorphism $f^* : [A] \to B$ on lists, so that $f^*g = f$. We have mentioned monoid homomorphisms on this blog before in the context of string metrics, but briefly a monoid homomorphism respects the monoid structure in the sense that (for this example) $f^*(a \textup{ @ } b) = f^*(a) \cdot_B f^*(b)$ no matter what $a, b$ are.

This was quite a mouthful, but the data is often written in terms of a so-called “commutative diagram,” whose general definition we will defer to a future post. The diagram for this example looks like:

The dashed line says we are asserting the existence of $f^*$, and the symbol $\exists !$ says this function exists and is uniquely determined by $f, g$. The diagram “commutes” in the sense that traveling from $A$ to $B$ along $f$ gives you the same computational result as traveling by $g$ and then $f^*$. The reason for the word “universal” will become clear in future posts, but vaguely it’s because the set $[A]$ is a unique “starting place” in a special category.

If this talk is too mysterious, we can just go ahead and prove that $f^*$ exists by writing a program that computes the function transforming $f \mapsto f^*$. We call the function “listMonoidLift” because it “lifts” the function $f$ from just operating on $A$ to the status of a monoid homomorphism. Very regal, indeed.

Part of the beauty of this function is that a number of different list operations (list length, list sum, member tests, map, etc.), when viewed under this lens, all become special cases of this theorem! By thinking about things in terms of monoids, we write less code, and more importantly we recognize that these functions all have the same structural signature. Perhaps one can think of it like parametric polymorphism on steroids.

fun listMonoidLift(f:('a->'b), (combine:(('b * 'b) -> 'b), id:'b)) =
let
fun f'(nil) = id
in
f'
end

Here we specified the types of the input arguments to be completely clear what’s what. The first argument is our function $f$ as in the above diagram, and the second two arguments together form the data of our monoid (the set $B$ is implicitly the collection of types $'b$ determined at the time of the function call). Now let’s see how the list summation and list length functions can be written in terms of the listMonoidLift function.

fun plus(x, y) = x + y
val sum = listMonoidLift((fn x => x), (plus, 0))
val length = listMonoidLift((fn x => 1), (plus, 0))

The plus function on integers with zero as the identity is the monoid $B$ in both cases (and also happens to be $A$ by coincidence), but in the summation case the function $f$ is the identity and for length it is the constant $1$ function.

As a more interesting example, see how list membership is a lift.

fun member(x) = listMonoidLift((fn y => y = x),
((fn (a,b) => a orelse b), false))

Here the member function is curried; it has type ‘a -> ‘a list -> bool (though it’s a bit convoluted since listMonoidLift is what’s returning the ‘a list -> bool part of the type signature). Here the $B$ monoid is the monoid of boolean values, where the operation is logical “or” and the identity is false. It is a coincidence and a simple exercise to prove that $B$ is a free monoid as well.

Now the mapping $f(y)$ is the test to see if $y$ is the same object as $x$. The lift to the list monoid will compute the logical “or” of all evaluations of $f$ on the values.

Indeed, (although this author hates bringing up too many buzzwords where they aren’t entirely welcome) the monoid lifting operation we’ve just described is closely related to the MapReduce framework (without all the distributed computing parts). Part of the benefit of MapReduce is that the programmer need only define the Map() and Reduce() functions (the heart of the computation) and MapReduce does the rest. What this example shows is that defining the Map() function can be even simpler: one only needs define the function $f$, and Map() is computed as $f^*$ automatically. The Reduce() part is simply the definition of the target monoid $B$.

Just to drive this point home, we give the reader a special exercise: write map as a special case of listMonoidLift. The result (the map function) should have one of the two type signatures:

map : ('a -> 'b) * ('a list) -> 'b list
map : 'a -> 'b -> 'a list -> b' list

As a hint, the target monoid should also be a list monoid.

Part of why this author is hesitant to bring up contemporary software in discussing these ideas is because the ideas themselves are far from contemporary. Burstall and Landin’s, 1969 text Programs and Their Proofs (which this author would love to find a copy of) details this exact reduction and other versions of this idea in a more specific kind of structure called “free algebras.” So MapReduce (minus the computer programs and distributed systems) was a well-thought-out idea long before the internet or massively distributed systems were commonplace.

In any case, we’ll start the next post in this series right off with the mathematical concept of a category. We’ll start slow and detail many examples of categories that show up both in (elementary) mathematics and computing, and then move on to universal properties, functors, natural transformations, and more, implementing the ideas all along the way.

Until then!