Other Complexity Classes

Not Just Time, But Space Too!

So far on this blog we’ve introduced models for computation, focused on Turing machines and given a short overview of the two most fundamental classes of problems: P and NP. While the most significant open question in the theory of computation is still whether P = NP, it turns out that there are hundreds (almost 500, in fact!) of other “classes” of problems whose relationships are more or less unknown.

For instance, we can ask about problems that can be solved using polynomially-bounded space instead of time. This class is called “PSPACE,” and it’s easy to see that \textup{P} \subseteq \textup{PSPACE}, since a machine that runs in polynomial time can only ever touch polynomially many tape cells. As usual, there is a corresponding notion of problems solvable with polynomially-bounded space on a nondeterministic Turing machine, which goes by the predictable name NPSPACE. One might expect another huge open question: does PSPACE = NPSPACE? It turns out they are equal (this is a consequence of Savitch’s theorem), and this just goes to show how sometimes nondeterminism doesn’t allow us to “do more” in one sense, but it does in another.

Another interesting question is the computational complexity of problems in the presence of oracles. Suppose we had a magical machine that could solve some difficult or impossible problem (say, the Halting Problem) in constant time. Then what sorts of problems can be solved in polynomial time? This notion gives rise to a whole new family of complexity classes that rely critically on using the oracle.

Above we give an example of some believed relationships between some basic complexity classes (I say basic, but even I haven’t ever heard of ZPP before). In a simplistic mindset, the goal of computing theory as a field of mathematics is to determine all of the relationships between these classes. In other words, we want to be able to answer all questions like, “Is computation with logarithmically-bounded space fundamentally different from computation with polynomially-bounded time?” Tacking this on to the fat list of open questions like P vs. NP, we admit that nobody knows the answer.

Unfortunately a primer on any more of these complexity classes would take us too far from the scope of this blog. We hardly skimmed the surface on P and NP, leaving quite a bit of low-hanging fruit to speculation (for instance, is there a problem which is not in NP? Or more subtly, is there a problem in NP which is neither in P nor NP-complete? Assuming P ≠ NP there is such a problem, by Ladner’s theorem, but as far as this author knows, no naturally occurring example is known).

But fear not, interested reader! Even if we can’t reasonably commit to cataloging more complexity classes here, we do have an excellent reference: the Complexity Zoo. This is a more or less complete list of all known complexity classes, with pages and pages of descriptions and documented relationships with links to papers. Browsing the front-page list of classes, we see such intriguing and mysterious names as “Almost-PSPACE,” and “HeuristicP,” and “Quantum Refereed Games.”

The novice reader should start with the Petting Zoo, which expands on our meager introduction by describing, mostly in informal terms, the twenty most important complexity classes, and providing awesome inclusion graphs like the one below:

"Really Important Inclusions" among complexity classes.

The project was conceived and is organized by Scott Aaronson, a prominent researcher at MIT (with a neat blog of his own, though it’s pretty difficult material when he talks about his own research in quantum circuits). The Zoo also has a small side-exhibit focusing on the problems themselves, which is called the Complexity Garden. For instance he covers the quintessential NP-complete problem that we mentioned last time, namely N-Satisfiability.

Here on this blog we do have one more primer in the realm of Complexity Theory planned, but it’s more a question about data than Turing machines. As mentioned in our post on low-complexity art, we eventually mean to introduce the notion of Kolmogorov complexity, and prove a few of its basic and interesting properties.

Until then!


P vs. NP, A Primer (And a Proof Written in Racket)

Decidability Versus Efficiency

In the early days of computing theory, the important questions were primarily about decidability. What sorts of problems are beyond the power of a Turing machine to solve? As we saw in our last primer on Turing machines, the halting problem is such an example: it can never be solved in a finite amount of time by a Turing machine. However, more recently (in the past half-century) the focus of computing theory has shifted away from possibility in favor of determining feasibility. In particular, we want to know how fast we can solve a particular problem on a Turing machine. The goal is to design efficient algorithms for important real-world problems, and know when it is impossible to design a more efficient algorithm than what we already have. From a mathematical perspective, that often means we would like to find the boundary problems: those which are “just barely” too hard to admit an efficient algorithm. Of course, this requires us to first define what it means for an algorithm to be “efficient.”

These questions are intimately intertwined with big-O analysis, which we presented in our very first primer on this blog, and the definitions we investigate in the sequel require fluency in such notation. That being said, we try to give colloquial analogues to the most important ideas.

The Class P

The general goal here is to describe classes of problems that roughly correspond to how efficiently one can solve them. The first such class (and by all rights, the smallest class we will consider) is called P, which stands for “polynomial.”

Definition: P is the set of languages which can be decided by a Turing machine in O(n^k) time for some k, where n is the input size of the problem. In the common notation:

\displaystyle \textup{P} = \bigcup_{k \in \mathbb{N}} \textup{TIME}(n^k)

Where \textup{TIME}(n^k) is the set of problems decidable in O(n^k) time. Recall that a language is any set of strings (any subset of \Sigma^*, as we have been investigating in our past computing theory primers and our post about metrics on words).

We don’t have too much to say about P itself, except that many common problems are in P. The real difficulty is in proving that some problem does not lie in P.

Here is an example this author has used to explain to elementary school children what it’s like to not be in P. Consider a deck of 52 playing cards, where your goal is to sort the cards by suit and then by rank (as it is when you open a fresh pack of cards). There is one obvious way to do this that children teach themselves quite readily: just look for the Ace of spades and put it on the bottom of a new stack. Then find the two of spades, and continue in this manner until the deck is sorted. This algorithm takes O(n^2) time, since in the very worst case the deck is sorted in reverse order, and if you’re too dim to realize it, you look through the entire deck at every step.
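
For the programming-inclined, here is a minimal sketch of that childhood algorithm in Racket, sorting a list of numbers rather than a deck of cards (the function names are ours, not anything standard):

(define (selection-sort lst)
  ;; repeatedly pull out the smallest remaining element: O(n^2) comparisons
  (if (null? lst)
      '()
      (let ([smallest (apply min lst)])
        (cons smallest (selection-sort (remove smallest lst))))))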

Of course, there are much faster ways to do it than the approach we detailed above, and it is a well-known theorem that comparison-based sorting is \Theta(n \log(n)). That is, there is both a lower bound and an upper bound of n \log(n) for the general sorting problem.

Now let’s investigate a ridiculous algorithm for sorting cards. What if, at each step, we shuffled the deck and then checked to see if it magically became sorted due to our shuffling? Forgetting for a moment that randomness comes into play, we would expect to find the correct sorting once in 52! trials, and it would take 52 steps to check whether the deck is sorted. For a deck of n cards, this would hence take time O(n! \cdot n), which would take so long even for 52 cards that the sun would go out before you could finish. But what if we didn’t know a better way?
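
Here is a sketch of the “shuffle and pray” algorithm in the same vein (again, the names are ours); it is barely more code than the previous sketch, but its expected running time is astronomically worse:

(define (sorted? lst)
  ;; check adjacent pairs: O(n) time
  (or (null? lst)
      (null? (cdr lst))
      (and (<= (first lst) (second lst)) (sorted? (cdr lst)))))

(define (bogosort lst)
  ;; expected number of shuffles is n!, and each check costs O(n)
  (if (sorted? lst) lst (bogosort (shuffle lst))))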

For an example of a problem where we don’t know a better way, consider the problem of graph coloring. Is there a Turing machine which, when given as input a graph G and a number k, can determine in polynomial time whether G admits a k-coloring (an assignment of one of k colors to each vertex so that no two adjacent vertices receive the same color)? There is an obvious algorithm to decide the problem: there are only finitely many assignments of vertices to colors, and we can simply check them all. In fact, there are k^n of them, where n is the number of vertices of G.

Unfortunately, this algorithm is not polynomial in runtime: we would have to enumerate all of the different colorings and check whether each is a valid coloring; this process is easily seen to be O(nk^n), which is exponential in n, and hence far from polynomial, for any fixed k \geq 2.
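
To make the brute-force algorithm concrete, here is a sketch in Racket, where we assume (purely as our own convention) that a graph is given by a list of vertices and a list of two-element edge lists:

;; does a coloring (a hash from vertex to color) avoid giving any edge two
;; endpoints of the same color? this check runs in polynomial time
(define (valid-coloring? edges coloring)
  (for/and ([e edges])
    (not (equal? (hash-ref coloring (first e))
                 (hash-ref coloring (second e))))))

;; try all k^n assignments of the colors 0 .. k-1 to the vertices
(define (k-colorable? vertices edges k)
  (let loop ([vs vertices] [coloring (hash)])
    (if (null? vs)
        (valid-coloring? edges coloring)
        (for/or ([c (in-range k)])
          (loop (rest vs) (hash-set coloring (first vs) c))))))

For instance, (k-colorable? '(a b c) '((a b) (b c) (a c)) 2) evaluates to #f, while with k = 3 it evaluates to #t. The checking step is fast; it is the k^n loop that ruins us.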

But the true challenge is this: how do we know there is no “faster” algorithm? Of all the crazy wild ideas one could have to solve a problem, how do we know none can be so clever that they reduce the running time to be a polynomial in n?

In fact, we don’t know for sure that there isn’t! This is the heart of the open problem which is succinctly called “P vs. NP”.

The Class NP

While P is a class of “easy” problems from one perspective (problems that can be solved quickly, even in the worst case), being a member of NP is another measure of “easiness,” but from a different perspective.

Definition: NP is the class of problems which have a polynomial-time verifier. That is, A \in \textup{NP} if there is a Turing machine M (the verifier) such that w \in A if and only if there exists a string c, called a certificate, for which M accepts the pair (w, c), and M runs in time polynomial in the length of w.

This definition is a bit hard to swallow, but examples clarify the situation greatly. For the problem of graph coloring, we note that a certificate would simply be a list of pairs (v_i, n_i) which give a coloring of the graph G. It is quite trivial to define a polynomial-time Turing machine that ensures the coloring of G is valid. Hence, graph coloring is in NP. This is the case with most problems in NP: a proof that w \in A is hard to find, but easy to verify once you have it.

There is another definition of NP which is often useful, and it gives a reason for prefixing the “polynomial” part of P with an N.

Definition: NP is the set of problems which are solvable by a nondeterministic Turing machine in polynomial time.

For the motivating picture behind “nondeterministic Turing machines,” we turn to an analogy. Imagine you have an infinite number of computers running in parallel, and they can communicate instantaneously. What sorts of problems could we solve in polynomial time with such a machine? We could certainly solve graph coloring: simply have each machine try one of the k^n different colorings, and have the entire machine accept as soon as one machine finds a valid coloring (or, if they all finish without finding one, we can safely reject: the graph is not k-colorable).

So we can reformulate the definition in set notation as:

\displaystyle \textup{NP} = \bigcup_{k \in \mathbb{N}} \textup{NTIME}(n^k)

Here \textup{NTIME}(f(n)) is the set of all languages which can be solved in time f(n) on a nondeterministic Turing machine.

In other words, “NP” stands for “nondeterministic polynomial-time” problems. And in fact this definition is equivalent to the existence of a polynomial-time verifier, as in our first definition. To see this, note that we can construct a nondeterministic machine that enumerates all possible certificates (lending to our analogy, one on each of the infinitely numerous parallel computers), and then tests them using the polynomial-time verifier. Since each branch uses a polynomial-time algorithm, the whole Turing machine runs in (nondeterministic) polynomial time. On the other hand, if some branch of the nondeterministic computation accepts in polynomial time, then the sequence of configurations along that branch has polynomial length, and that sequence itself can serve as a certificate: a deterministic Turing machine can check, in polynomial time, that each configuration follows from the previous one according to the rules of the machine and that the final configuration is accepting.

One might ask why we care about infinitely parallel computers. In reality, we can only have finitely many computations going at once, so why bother with this silly mathematical uselessness? As it often turns out in mathematics, it is useful to think about such structures simply because they capture the essence of what we wish to study in a concise and formal manner. For complexity theory, nondeterministic Turing machines capture the level of complexity we wish to understand: in a sense, it’s the “next level up” from polynomial-time decidable problems.

K-Clique and 3-Sat

We have two more examples of problems in NP which will be important later in this post: the problems of 3-Satisfiability and finding a k-Clique.

Definition: Let \varphi(x_1, \dots, x_n) be a propositional formula in n boolean variables. \varphi is satisfiable if there exists an assignment of the variables x_1, \dots, x_n \to \left \{ \textup{T}, \textup{F} \right \} that makes \varphi true.

For example, we could have the formula

\displaystyle (x \vee y \vee \overline{z}) \wedge (\overline{x} \vee \overline{y} \vee z)

And this is satisfiable by the assignment of x = \textup{T}, y = \textup{F}, z = \textup{T}.

It should be obvious to the reader now that determining whether a formula is satisfiable is in NP: a certificate is just a list of variables mapping to true or false, and checking that the formula is satisfied by a given assignment can be done in polynomial time.

In 3-Satisfiability (usually shortened to 3-Sat), we simply restrict the form of \varphi and ask the same question. The form is called conjunctive normal form, and colloquially it is a bunch of clauses joined with “and,” where each clause is a bunch of literals (variables and their negations) connected by “or”. Moreover, the “3” in 3-Sat requires that each clause contain exactly three literals. For example, the formula given above would be a valid input to 3-Sat.
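
Here is a sketch of such a verifier in Racket, using our own (not canonical) representation: a formula in conjunctive normal form is a list of clauses, each clause a list of literals, where a literal is either a variable symbol or (not x), and an assignment is a hash from variables to booleans:

;; does a single literal evaluate to true under the assignment?
(define (literal-true? lit assignment)
  (if (and (pair? lit) (eq? (first lit) 'not))
      (not (hash-ref assignment (second lit)))
      (hash-ref assignment lit)))

;; a CNF formula is satisfied when every clause contains at least one true
;; literal; this check runs in time linear in the size of the formula
(define (satisfies? cnf assignment)
  (for/and ([clause cnf])
    (for/or ([lit clause])
      (literal-true? lit assignment))))

For the example formula above, (satisfies? '((x y (not z)) ((not x) (not y) z)) (hash 'x #t 'y #f 'z #t)) evaluates to #t.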

k-Clique, on the other hand, is a question about graphs. Given a graph G and a positive integer k, determine whether G contains a complete subgraph of k vertices. (In a complete graph, there is an edge connecting each pair of vertices; the name “clique” is motivated by the party problem, in the sense that there is a “clique” of friends at the party who are all mutually friends with each other.)

As expected, a certificate that a graph has a k-Clique is just a list of the vertices in the clique, and checking that all pairs of vertices listed have a connecting edge is easily seen to take O(|G|k^2) = O(n^3), which is polynomial in the size of the input.
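
A sketch of that check in Racket, using the same edge-list representation as before (all names are ours):

;; is there an edge between u and v? scanning the edge list takes O(|G|) time
(define (edge? edges u v)
  (or (member (list u v) edges)
      (member (list v u) edges)))

;; a certificate for k-Clique is just a list of k vertices; we check all O(k^2)
;; pairs, so the whole verification is polynomial in the size of the input
(define (clique? edges claimed-vertices)
  (for/and ([u claimed-vertices])
    (for/and ([v claimed-vertices])
      (or (equal? u v) (edge? edges u v)))))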

NP-Completeness

As it turns out, the problems of 3-Satisfiability and k-Clique are quite special (as is graph coloring). They belong to a special sub-class of NP called NP-complete. Before we can define what NP-complete means, we have to be able to compare problems in NP.

Definition: Given two languages A, B, we say A \leq_p B, or A is polynomial-time reducible to B if there exists a computable function f: \Sigma^* \to \Sigma^* such that w \in A if and only if f(w) \in B, and f can be computed in polynomial time.

We have seen this same sort of idea with mapping reducibility in our last primer on Turing machines. Given a language B that we wanted to show as undecidable, we could show that if we had a Turing machine which decided B, we could solve the halting problem. This is precisely the same idea: given a solution for B and an input for A, we can construct in polynomial time an input for B, use a decider for B to solve it, and then output accordingly. The only new thing is that the conversion of the input must happen in polynomial time.

Of course, this discussion essentially proves a clarifying lemma:

Lemma: If B \in \textup{P} and A \leq_p B, then A \in \textup{P}.

The proof is immediate: to decide A, convert the input to an input for B in polynomial time, and then run the polynomial-time decider for B on the result. Since a polynomial composed with a polynomial is again a polynomial, the whole procedure runs in polynomial time. We leave the construction of an explicit Turing machine to decide A as an exercise to the reader.
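
The construction is short enough to sketch in Racket, assuming hypothetical procedures convert-A->B (computing the function f) and decide-B (a polynomial-time decider for B):

;; a hypothetical sketch: both helpers run in polynomial time, and a polynomial
;; composed with a polynomial is again a polynomial, so decide-A is polynomial too
(define (decide-A w)
  (decide-B (convert-A->B w)))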

To phrase things more colloquially, A \leq_p B is true if A is an “easier” problem than B, hence justifying the less-than notation.

And now for the amazing part: there are problems in NP which are at least as hard as every other problem in NP.

Definition: A language A \in \textup{NP} is called NP-complete if for all problems B \in \textup{NP}, B \leq_p A.

In other words, all problems in NP reduce to an NP-complete problem in polynomial time. In fact, we get another nice fact about NP-completeness that mirrors our observation about P above:

Lemma: If A is NP-complete and B \in \textup{NP} with A \leq_p B, then B is NP-complete.

Obviously the composition of two polynomial-time reductions is a polynomial-time reduction, so we can conclude that all problems in NP which reduce to A also reduce to B.

The cautious reader should be rather hesitant to believe that NP-complete problems should even exist. There is no reason we can’t come up with harder and harder problems, so why should there be a point after which we can’t quickly verify a solution?

Well, Stephen Cook proved in 1971 that there is an NP-complete problem, and shortly thereafter many more were found. Today, there are thousands of known NP-complete problems.

Perhaps unsurprisingly, Cook’s original NP-complete problem was N-Satisfiability (i.e., 3-Satisfiability without the restriction to three literals per clause or to conjunctive normal form). Unfortunately the proof is quite difficult. We point the reader to the relevant Wikipedia page, and briefly mention the outline of a proof.

Given a nondeterministic polynomial-time Turing machine, we can bound the number of parallel computations and the length of each computation by n^k for some fixed k. Then we create an n^k by n^k table of the configurations of the Turing machine (the i,j cell for the i-th branch of computation and the j-th step). From this, we can construct a monstrously enormous (yet polynomial in size) formula which has a satisfying assignment if and only if the Turing machine halts on some branch of computation in an accept state. Here is a table of the formulas needed to do this. In short, the formula traces the computation of the machine at each step, and ensures the transition function is honestly followed, the tape is reliably updated, and the tape head moves in the correct direction.
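
For a taste of what these formulas look like, here is the piece of the standard tableau construction (following, for instance, Sipser’s presentation) which asserts that every cell of the table contains exactly one symbol; the variable x_{i,j,s} is true when cell (i,j) holds the symbol s, drawn from a finite alphabet C:

\displaystyle \varphi_{\textup{cell}} = \bigwedge_{1 \leq i, j \leq n^k} \left [ \left ( \bigvee_{s \in C} x_{i,j,s} \right ) \wedge \bigwedge_{s \neq t \in C} \left ( \overline{x_{i,j,s}} \vee \overline{x_{i,j,t}} \right ) \right ]

The other pieces, which encode the starting configuration, the accepting condition, and the legality of each local window of cells, have the same flavor.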

The reason we say it’s unsurprising that Satisfiability is NP-complete is because it’s commonly believed that every aspect of mathematics boils down to pure logic, although the process of doing so is mired in gory detail every step of the way. So it’s understandable that all problems in NP reduce to a problem about logic which is also in NP. We note that other complexity classes likely have “complete” problems that are essentially questions about logical formulas.

A New Way to Find NP-Complete Problems

Now that we have established the NP-completeness of Satisfiability, we can do the same for other problems by reduction from a known NP-complete problem. First, we claim that 3-Satisfiability is NP-complete, and we leave the proof as an exercise to the reader (hint: reduce from regular Satisfiability by putting the formula into the right form).

Now given that 3-Sat is NP-complete, we will prove that k-Clique is NP-complete, by reduction from 3-Sat (in fact our conversion function will work for any formulas in conjunctive normal form, but 3 is enough).

Theorem: k-Clique is NP-complete.

Proof. Given a formula \varphi in conjunctive normal form, we construct an instance of k-Clique as follows. First, let k be the number of clauses in \varphi. Construct a graph G_{\varphi} by creating a vertex for each literal term in \varphi, and (to help visualization) organize them into columns by their originating clause, and label the vertex with its corresponding literal. Introduce an edge connecting two terms a, b in different columns when the formula a \wedge b is not a contradiction. In other words, it cannot be of the form x \wedge \overline{x} for some variable x.

As a concrete example, the formula

\displaystyle (x \vee \overline{y} \vee z) \wedge (\overline{x} \vee \overline{z} \vee w)

converts to the graph

We claim that \varphi has a satisfying assignment of variables if and only if G_{\varphi} has a k-clique. Suppose there is a valid assignment of variables in \varphi. Then each clause must contain at least one literal which evaluates to true; pick one such literal from each clause, giving k literals in all. These correspond to k vertices in G_{\varphi}, one in each column, and no two of them conflict with each other, so G_{\varphi} has an edge connecting each pair of them; that is, they form a k-clique. Conversely, suppose G_{\varphi} has a k-clique. By our construction, two vertices in the same column are never connected by an edge, and hence this k-clique must have exactly one vertex in every column. If a vertex is labeled with a negated variable \overline{x}, assign x the value F (so that the literal evaluates to true), and otherwise assign the vertex’s variable the value T; any variable not appearing among the chosen vertices may be assigned arbitrarily. Since no two vertices in the clique conflict, this assignment is well-defined, and it is a satisfying assignment of the variables of \varphi, since each clause evaluates to true under it.

The final part of the proof is that the conversion function runs in polynomial time, and we claim this is obvious from the construction: if \varphi has n literals, then we create O(n) vertices and O(n^2) edges. The creation of each vertex and edge can be done in constant time, as can the verification that two literals do not conflict. \square

Of course, this is a question about the possibilities of computers. Instead of giving a theoretical proof, why not just write a program to compute the conversion? Well we did just that, and the main body of the code turned out to be quite tidy:

;; n-sat-to-clique: formula -> (setof edge)
;; transform an input to n-sat into an input for clique,
;; assuming the input expression is already in conjunctive normal form
(define (n-sat-to-clique expr)
  (let* ([conjuncts (∧-conjuncts expr)]                                   ; the clauses of the formula
         [columns (map (λ (x) (∨-disjuncts x)) conjuncts)]                ; the literals within each clause
         [labeled-columns (label-graph columns 1)]                        ; give each literal occurrence a distinct vertex label
         [possible-edges (all-pairs-in-distinct-lists labeled-columns)])  ; candidate edges: pairs of vertices in distinct columns
     (list->set (filter no-conflict? possible-edges))))                   ; drop any pair consisting of a literal and its negation

We have a data structure for a general formula (and provide a function to compute the conjunctive normal form of any expression), and a data structure for a graph (essentially, a list of pairs of labelled vertices), and so the process of checking all possible edges, and filtering out those which have no conflict, clearly takes O(n^2) time.

The rest of the code required to run the function above is available on this blog’s Github page.

Other NP-complete Problems, and P =? NP

In the real world NP-complete problems show up everywhere. Application domains include cryptography, financial portfolio and investment management, scheduling and operation dynamics, packing problems, chem- and bioinformatics, guarding art galleries, circuit design, compiler design, and even modelling social networks.

There are even many questions one can ask about games that turn out to be NP-complete in complexity. For instance, many questions about the classic game of Tetris are NP-complete, along with Minesweeper, FreeCell, Lemmings, Battleship, and Mastermind.

Now the big unsolved question is: does P = NP? In other words, can any of these seemingly hard problems be solved quickly? The simple fact is, if the answer is yes for even one NP-complete problem, then the answer is yes not only for all NP-complete problems, but for all problems in NP (as we saw in our lemma earlier). This is the heart of the million-dollar question that has been crowned the most difficult open problem in computer science to date. Almost everyone agrees that P and NP should not be equal, but nobody can prove it.

Of course, people love to talk about P and NP because of all of the wild things that would happen if we suddenly discovered that P = NP. All widely used security systems would fail, internet transactions would no longer be safe, efficiency in industry would increase by orders of magnitude, we’d unlock the secrets of the human genome, we’d quickly solve open mathematical problems and find staggeringly ginormicon primes (ginormicon = gigantic + enormous + decepticon), governments would topple, there would be rioting in the streets, the artificial intelligence singularity would occur, etc., etc.

But all of this is just media hype. The likely reality is that some problems are simply too hard to be solved in polynomial time in general, just as there are probably problems which have no polynomial-time verifiers (i.e., problems outside of NP), setting aside the undecidable problems, which trivially lie outside of NP. In the end, it’s just a matter of time until mathematics sorts everything out, or proves that it’s impossible to do so. Indeed, just two years ago this author remembers waking up to the news that there was a 100-page proof that P is not equal to NP, and moreover that Stephen Cook himself considered it a serious attempt. Unfortunately it turned out to have irreparable flaws, but the failure made it no less exciting: this is how great mathematics is made.

On the other hand, there are cranks out there who have, for the last few years, been convinced that they have proved P = NP, but are ignorant of their own flaws and the rest of the world’s criticism. May Gauss have mercy on their mathematical souls.

Until next time!

Busy Beavers, and the Quest for Big Numbers

Finding Bigger Numbers, a Measure of Human Intellectual Progress

Before we get into the nitty gritty mathematics, I’d like to mirror the philosophical and historical insights that one can draw from the study of large numbers.

That may seem odd at first. What does one even mean by “studying” a large number? Of course, I don’t mean we stare at the number 1,000,000,000,000, which is quite large, and wonder how mankind can benefit from its elusive properties. What I really mean is that scientific and mathematical discoveries are very closely tied to our collective ability to describe large numbers.

That may seem even odder, but let’s enjoy a short historical digression to illustrate the point. Note that much of this historical information is borrowed from an essay by Scott Aaronson. He gives a wonderfully entertaining essay on the topic of Busy Beavers, but where he assumes the reader has no background knowledge of computing theory, we will assume the reader is familiar with the relevant computational topics in our primers.

In the age of the Greeks, it was generally believed that some quantities were beyond counting. Things like the number of grains of sand in the desert were forfeit to the realm of “infinity.” But in the third century B.C.E., Archimedes recognized that they weren’t beyond counting. And in his treatise The Sand Reckoner, he developed a rudimentary notion of exponentials, and was able to provide an upper bound on such mysterious numbers:

There are some [...] who think that the number of the sand is infinite in multitude [...] again there are some who, without regarding it as infinite, yet think that no number has been named which is great enough to exceed its multitude [...] But I will try to show you [numbers that] exceed not only the number of the mass of sand equal in magnitude to the earth [...] but also that of a mass equal in magnitude to the universe.

He proceeded to give an upper bound on the number of grains of sand needed to fill the universe: 10^{63}. Now this was a quite large number, and certainly beyond most people’s ability to visualize in quantity. But by the time Arabic numerals and algebra became common worldly knowledge in the Middle Ages, exponentiation became a paradigm trivially expressed, allowing people to write such large numbers as 10^{10^{10}} with the same ease we do today. These sorts of counting exercises became the topic of mathematical folklore (think of the tale of rice filling a chessboard), and they run amok in contemporary discussions of finance, statistics, physics, economics, and computer science.

Let’s take a moment to investigate exactly how ridiculously huge exponentials are. And in spite of the awe we show for such gargantuan quantities, we foreshadow that exponentials are far from the biggest numbers we can grapple with.

Let’s take a large digit, say, 9, and repeat it a thousand times to make a number. By all rights, this is a big number! It has a whopping one thousand digits, and each decimal place is as large as it can be. Now, consider the meager 9^9. Given a few annoying minutes, we could actually perform the eight multiplications and get this number exactly; we’ll save you the trouble, it’s 387,420,489, a number with nine digits. Now, let’s inspect 9^{9^9}. This number, requiring us to write only one more digit on the page, is 9^{387,420,489}, a number with a staggering 369,693,100 decimal digits! Raising 9 to this power once more, to get 9^{9^{9^9}}, is already too large for Wolfram Alpha to give much information about.
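
Where does that digit count come from? A positive integer N has \lfloor \log_{10} N \rfloor + 1 decimal digits, so

\displaystyle \lfloor 387{,}420{,}489 \cdot \log_{10} 9 \rfloor + 1 = \lfloor 369{,}693{,}099.6\ldots \rfloor + 1 = 369{,}693{,}100.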

Now, come the 20th century, we see two big jumps in the expression of large numbers. The first is the Ackermann Function. For the purposes of this post, we can think of it as simply a sequence of numbers, which is constructed as follows. The first number in the sequence, A(1), is simply 1+1. The next is A(2) = 2*2, then A(3) = 3^3, and so on. Of course, defining “and so on” is tough, because we don’t have commonly known names for such operations. The fourth number in this sequence, A(4), is 4 raised to the power of itself four times. I.e., A(4) = 4^{4^{4^4}}. We call this operation a tetration of the number 4 to the height of 4, and denote it with the notation 4 \uparrow \uparrow 4. This isn’t so hard to visualize, but while it’s a huge number in itself (it has about 10^{154} decimal digits), the best is yet to come. The fifth element of this sequence, A(5), is the nested tetration of 5 with itself 5 times. In other words, it is

A(5) = 5 \uparrow \uparrow (5 \uparrow \uparrow (5 \uparrow \uparrow (5 \uparrow \uparrow 5)))

For convenience, we give this the notation 5 \uparrow \uparrow \uparrow 5, and call it pentation. Of course, A(6) would then be 6 hexated to the height of 6, and so on forever. The brave reader will take this to its extreme, and imagine how huge A(A(5)) is… Mind blowing.
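
For concreteness, here is a sketch of this sequence in Racket using the up-arrow recursion (the names are ours, and anything past A(3) is hopelessly infeasible to actually evaluate):

;; (arrow a k b) applies the k-th Knuth arrow to a and b:
;; k = 1 is exponentiation, k = 2 is tetration, k = 3 is pentation, and so on
(define (arrow a k b)
  (cond [(= k 1) (expt a b)]
        [(= b 0) 1]
        [else (arrow a (- k 1) (arrow a k (- b 1)))]))

;; the diagonal sequence from the text: A(1) = 1+1, A(2) = 2*2, A(3) = 3^3, ...
(define (A n)
  (cond [(= n 1) (+ 1 1)]
        [(= n 2) (* 2 2)]
        [else (arrow n (- n 2) n)]))

(A 3) evaluates to 27 in an instant, while merely storing (A 4) would require more digits than there are atoms in the observable universe.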

Of course, one might ask: where the hell did this beast come from? What are Ackermann numbers good for anyway? The Ackermann Function and its associated sequence, it turns out, serve as important counterexamples in the theory of computable functions. The familiar reader will recognize the claim that the Ackermann Function is computable, but not primitive recursive. For the rest of the world, here’s a historical explanation of its origin.

Alan Turing, the inventor of the Turing machine.

Back in the days before computers (the early 20th century), mathematicians were fiercely interested in the theory of computing things. The common visitor to this blog will recall our primers on Finite Automata, Turing Machines, and such things as the Game of Life. It might not be a surprise to hear that there are many many different models of computation, and before real computers actually came along it was the job of mathematicians to compare them, sorting out which were “more useful” or “more expressive” than others. In the end, the Turing machine model was actually used to create rudimentary computers, which have since evolved into the supercomputers and iPads we have today. The mathematicians exploring computation during this era were at the forefront of a technological and intellectual revolution. On their shoulders, humanity entered the so-called Age of Information.

Perhaps the original such study of computation was the idea of a computable function of the natural numbers. Given a set of elementary functions which are axiomatically defined to be computable, and a way of combining these elementary functions to make more complicated functions, mathematicians constructed a huge class of functions that were computable. For instance, the function which adds two numbers together is computable; we all learned a nice algorithm to perform addition in grade school. The definitions and axioms were chosen in order to affirm such easy tasks, and then mathematicians explored the axioms to see how far they could push them. This line of reasoning would bring one to the ultimate quest: for a given model of computation, figure out what sorts of things are not computable.

Of course, even without a firm foundation in the theory of computable functions, we would guess that the sequence of Ackermann numbers is computable. In fact, using the model of Turing machines (which happens to be equivalent to the theory of computable functions, insofar as the tasks it deems computable) we can prove it so. We have unbounded space and unbounded time, and we know for a fact that all of these mathematical operations boil down to repeated multiplication. We can even present pseudocode, and such sites as RosettaCode have hundreds of implementations, ranging from a single line of K to a tail-recursive implementation in OCaml. There is no doubt about it, the Ackermann function is computable. In a somewhat naive sense, the Ackermann sequence is no deeper or more insightful than Archimedes’s first attempt at exponentiation. It just carries the idea of exponentiation as far as it can go.
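
Here is one such implementation, a Racket sketch of the standard two-argument Ackermann–Péter function, whose diagonal values A(n, n) explode in essentially the same way as the sequence above:

;; computable, but not primitive recursive: the recursion on the first argument
;; cannot be bounded by any fixed number of nested loops
(define (ack m n)
  (cond [(= m 0) (+ n 1)]
        [(= n 0) (ack (- m 1) 1)]
        [else (ack (- m 1) (ack m (- n 1)))]))

(ack 3 3) evaluates to 61 almost instantly, while (ack 4 2), were we patient enough to wait out the recursion, is a number with 19,729 decimal digits.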

The next natural question is: are there any sequences of numbers which grow so ridiculously fast that they simply cannot be computed? Of course, we know already of one undecidable (non-computable) task: the halting problem. Recall that the halting problem asks whether it is possible to design a computer program T which, when given the source code of any other computer program S and an input to that program w, can determine in a finite amount of time whether S will loop infinitely on the input w, or whether it will halt. The blatant fact is that this is impossible. A hint at the proof is the question: how would this program analyze itself? What about its evil twin that does exactly the opposite? Here’s a comical version of the proof in Dr. Seuss form, and a more formal proof can be found in our primers on the subject.

As we have seen before, one way to prove that something is not computable is to first assume that it is computable, and then use that to construct a program that solves the halting problem. This results in a contradiction, and we conclude that the original problem could not be computable. In other words, we are looking for a sequence of numbers which has the following property: if we were able to compute an arbitrarily large entry of the sequence in a finite amount of time, we could use that ability to determine whether any program will halt or loop infinitely in a finite amount of time.

Of course, the idea that such a sequence should even exist is dubious. But amazingly enough, we will construct one, and find the next level of big numbers.

Busy Beaver Numbers

The idea behind Busy Beavers is quite simple. It was discovered by Tibor Radó, a Hungarian mathematician, in May of 1962, and does not rely at all on the arithmetic expressions of the past. The idea requires a bit of knowledge about Turing machines, so after promoting our primer once more, we will reiterate the ideas.

A Turing machine is essentially an infinite strip of paper called the tape on which one scribbles symbols in a deterministic way. In order to know what to do at each step, a Turing machine has an internal state record, with a finite set of rules for how to transition from state to state and manipulate the tape (write new symbols, erase old symbols, and move around to different places on the tape). In full rigor, the Turing machine is only allowed to look at one of the symbols on the tape at a time. So one should truly imagine a little beaver running back and forth across his dam, making tick marks in the wood and never knowing what he’s going to do next until he sees the tick mark next to the one he just made. The Turing machine has no limit on the amount of time it can run, and the tape is infinite, so it can write as many symbols as it needs to do its work. In the end, it will either halt and report a “yes” or a “no,” or it will keep on going until the end of time, never to halt.

Perhaps understandably, the Busy Beaver numbers have to do with counting steps of computation in Turing machines. In particular, for a fixed number n, it is possible to look at all Turing machines which have exactly n states in their internal record (indeed, there can only be finitely many, and it is not too difficult to come up with an exact number). Among these Turing machines, some will run forever and some will eventually stop. We only care about the ones that stop, and the largest possible number of steps that those machines take before stopping.

Definition: The nth Busy Beaver number, denoted BB(n), is the largest number of steps taken by any eventually-halting Turing machine with exactly n states, which uses only two symbols on the tape (say 0 and 1), and which begins with an initially blank tape (all of the entries start as 0).

In other words, BB(n) is the number of steps taken by the busiest beaver of all. We call a Turing machine which achieves the required number of steps a busy beaver.

Look at that cute little beaver run along.
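
To make the step counting concrete, here is a minimal sketch in Racket of a step-counting simulator for a single two-symbol machine; the transition-table format and all of the names are our own invention, and a genuine search over all n-state machines would wrap something like this in an enumeration of transition tables:

;; transitions: a hash mapping (list state symbol-read) to (list next-state symbol-to-write move),
;; where move is 'L or 'R and the distinguished state 'halt stops the machine
(define (run-and-count transitions max-steps)
  (let loop ([state 'start] [tape (hash)] [pos 0] [steps 0])
    (cond [(eq? state 'halt) steps]
          [(>= steps max-steps) #f]  ; give up: this machine may never halt
          [else
           (let* ([key (list state (hash-ref tape pos 0))]  ; a blank cell reads as 0
                  [rule (hash-ref transitions key)])
             (loop (first rule)
                   (hash-set tape pos (second rule))
                   (if (eq? (third rule) 'R) (+ pos 1) (- pos 1))
                   (+ steps 1)))])))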

Unsurprisingly, there are very few one-state busy beavers. Recalling the formal definition of a Turing machine, it must have either an accept state or a reject state, and if there is only one state and the machine halts, then it takes only one step before halting. Hence, BB(1) = 1. A bit of number crunching will convince the reader that the second Busy Beaver number is in fact BB(2) = 6. Not impressed? Well it took a bit of research to get there, but Radó and his student Shen Lin proved BB(3) = 21. This required a mountain of work and many different partial results to acquire. Why is it so hard, you ask? The fact is, there is no algorithm to list the Busy Beaver numbers. It is an uncomputable function. Let’s give a proof of this before we get to the other known values of BB(n).

Theorem: BB(n) is uncomputable. That is, there is no Turing machine that takes as input n and computes BB(n).

Proof. Suppose to the contrary that such a machine existed, and call it B. We will use B to construct a machine which solves the halting problem as follows:

On input \left \langle T, w \right \rangle, a description of a Turing machine T and an input w to it, we can determine in finite time the number of states that T has (it is encoded in the description of T). Now use B to compute BB(n), where n is the number of states in T, and simulate T on input w, counting its steps as it proceeds. Eventually the simulation of T either halts, or takes more than BB(n) steps. In the former case, we may stop the computation and declare that T halts. In the latter, we know that BB(n) is the maximal number of steps that can occur for any Turing machine with n states that eventually halts. Therefore, if T has not halted within BB(n) steps on the input w, it must never halt. We may stop our computation there, proclaim an infinite loop, and thereby solve the halting problem.

This is a contradiction, so the sequence BB(n) cannot be computable. \square
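
To restate the argument as a (necessarily impossible) sketch in Racket: if a procedure bb computing BB(n) existed, then together with the helpers number-of-states and halts-within? (hypothetical names, but genuinely implementable), the following would decide the halting problem, and therefore no such bb can exist:

;; hypothetical: bb cannot actually be written, which is the entire point
(define (halting-decider machine input)
  (let ([bound (bb (number-of-states machine))])
    (if (halts-within? machine input bound)  ; simulate for at most bound steps
        'halts
        'loops-forever)))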

This gives some insight into how ridiculously fast BB(n) grows. In fact, it not only grows faster than the Ackermann function, but it eventually outgrows any sequence of numbers that could ever be computed by a machine, no matter how much time that machine is given! Indeed, if BB(n) were bounded above by some computable function f(n), then to compute BB(n) we could simulate every n-state Turing machine for f(n) steps; any machine still running after that many steps can never halt, so the longest run among the machines that did halt is exactly BB(n), and this would all happen in a finite amount of time.

In a sense, this is completely understandable. Computers are extremely complex! Applications like web browsers only need millions or billions of states, and in the proof above we’re pretending we could determine what all programs would do, even if they have trillions or quadrillions or A(1000) states! Imagine all of the crazy things that could happen with that amount of computing power. The colloquial thesis is that a single program is simply not powerful enough to know all of that. It’s like asking a human to read the minds of every human on the planet, all of the conscious thoughts of simpler animals, and all of the minds of aliens that planet Earth has never dreamed of, but which perhaps reside in faraway galaxies. For indeed, a Turing machine that could count Busy Beaver numbers must itself have finitely many states, and halt on any well-formed input, but be able to reason about such mysterious behaviors of Turing machines that have never been created.

The first three values of BB(n) were indeed wimpy, but now that we know it grows unfathomably quickly, we can mention the remaining known results. First, BB(4) = 107 was proved in the 1980s. But for BB(5), the best known lower bound is 47,176,870, and this was found as recently as 1990. On the other hand, in July of 2010 BB(6) was discovered to be no smaller than 7.412 \cdot 10^{36,534}, and it is generally believed that its exact value will never be known. Nobody is brave enough to take a stab at higher numbers in the sequence such as BB(10), and understandably so.

Plato: “One day, mankind will bask in enlightenment, having computed the Busy Beaver function!” Aristotle: “Word.”

In this light, the Ackermann function is a wimp. But there are some important philosophical issues to consider once more. In Archimedes’s day, people generally believed the sands were beyond counting, and they were later enlightened to know better. Even Archimedes would have been baffled by such modern constructions as pentation and hexation, but nowadays a high school mathematics student can handle the idea (at least, without attempting to visualize it). The day may come to pass when we invent a new way of defining large numbers, some which make the Busy Beaver numbers seem like plain old exponentiation. In those days, perhaps we will define a new model of computation that defies the halting problem and ushers in a new era of mathematical insight and technological prowess.

Indeed, such a model would make long-standing mathematical problems moot. With the power to compute Busy Beaver numbers, it would certainly be trivial to settle the Goldbach Conjecture, or any other conjecture that can be refuted by a finite search for a single counterexample. In fact, we know already of a simple way to settle the Goldbach Conjecture using Busy Beavers. We leave this as an exercise to the aspiring reader.

Heiner Marxen’s page provides an up-to-date account of the current state of the Busy Beaver Problem, and a list of possible candidates for BB(5), the estimates on BB(6), and links to relevant papers. Scott Aaronson’s essay on Busy Beavers delves deeper into the philosophical implications of solving the Busy Beaver problem (and gives rich analogies), but these considerations, which are essentially fanciful speculation, are beyond the scope of this blog.

Speaking of the scope of this blog, we have yet to write any programs! This will not do, and so next time we will write a program which simulates the currently known busy beavers (or busy beaver candidates) for n \leq 6, and displays the changing tape as the beaver runs his route. In a sense, we will be writing a universal Turing machine, which is a commonly used tool in the theory, and not hard to implement.

Until next time!

Fundamental Theorem of Algebra (With Picard’s Little Theorem)

This post assumes familiarity with some basic concepts in complex analysis, including continuity and entire (everywhere complex-differentiable) functions. This is likely the simplest proof of the theorem (at least, among those that this author has seen), although it stands on the shoulders of a highly nontrivial theorem.

The fundamental theorem of algebra has quite a number of proofs (enough to fill a book!). In fact, it seems a new tool in mathematics can prove its worth by being able to prove the fundamental theorem in a different way. This series of proofs of the fundamental theorem also highlights how in mathematics there are many, many ways to prove a single theorem, and in re-proving an established theorem we introduce new concepts and strategies. And perhaps most of all, we accentuate the unifying beauty of mathematics.

Problem: Let p(z): \mathbb{C} \to \mathbb{C} be a non-constant polynomial. Prove p(z) has a root in \mathbb{C}.

Solution: Picard’s Little Theorem states that any entire function \mathbb{C} \to \mathbb{C} whose range omits two distinct points must be constant. (See here for a proof sketch and other notes.)

Suppose to the contrary that p has no zero. We claim p also fails to achieve some number in the set \left \{ \frac{1}{k} : k \in \mathbb{N} \right \}. Indeed, if it achieved all such values, we claim continuity would provide a zero. First, observe that as |p(z)| grows without bound for large enough |z|, we may pick a c \in \mathbb{R} such that |p(z)| > 1 whenever |z| > c. Then any points z_k with p(z_k) = 1/k all lie within the closed disk of radius c. Since this disk is compact, the sequence (z_k) has a subsequence z_{k_j} converging to some z in the disk, and by continuity p(z) = 0.
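
Explicitly, continuity gives

\displaystyle p(z) = \lim_{j \to \infty} p(z_{k_j}) = \lim_{j \to \infty} \frac{1}{k_j} = 0.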

In other words, no such sequence of z_k can exist, and hence there is some 1/k omitted from the range of p(z). Together with 0, the range of p omits two distinct points, so since polynomials are entire, Picard’s Little Theorem applies, and we conclude that p(z) is constant, a contradiction. \square