**Problem**: Show that every natural number can be unambiguously described in fewer than twenty words.

**“Solution”**: Suppose to the contrary that not every natural number can be so described. Let be the set of all natural numbers which are describable in fewer than twenty words. Consider , the set of all words which cannot be described in fewer than twenty words.

Since is a subset of the natural numbers, which is well-ordered, it has a unique smallest element which we call . Now, we may describe unambiguously with the sentence: “the smallest natural number which cannot be unambiguously described in fewer than twenty words.” Since this description uses only fourteen words, we see that , a contradiction. Hence, must be the empty set.

**Explanation**: Let’s analyze this a bit further. Suppose that every number were indeed unambiguously describable in fewer than twenty words. It should be obvious that this admits only finitely many descriptions of infinitely many numbers!

In particular, let there be words in the English language, which we for the sake of argument say is the number of words in the Oxford English Dictionary. Then there are different phrases. This is a large number, but it is indeed finite. Even if every phrase describes a natural number, there’s no way that we could get them all! This proof is clearly nonsense.

This apparent paradox is in the same vein as Russell’s paradox, which we cover at the end of our set theory primer. Indeed, in its paradox form, this problem is sometimes called the Richard-Berry paradox. The problem is with what kinds of sets we construct. While with Russell’s paradox, the problem was with elements of our set, here it is with the propositions used to determine membership. This proof gives evidence that the English language (and any human language, really), is not rigorous enough for the purposes of mathematics. It also convinces us that naive set theory is problematic.

We find the meat of the problem when we finally get down to the definition of a description. In short (and I don’t want to spoil upcoming content on this blog), a description of a number is a program which computes the number (on some fixed universal Turing machine ). Note that this assumes that whatever a human can compute a computer can too, which is a controversial statement better known as the Church-Turing Thesis. If we further define a number’s “definition” as the shortest program which computes it, then this statement transforms into “x is the smallest integer whose definition has fewer than 100 characters.” As it turns out, when this English description is appropriately reformulated on a Turing machine, it is not an effective description: the statement is undeciable. In fact, once we get into the theory of Kolmogorov complexity, we will find that Berry’s paradox can be used to prove Gödel’s incompleteness theorem!

The problem of description is a very deep and historically debated one. The various approaches and sub-fields are encapsulated in the study of information theory. One such framework for descriptions is provided by Kolmogorov complexity, which we are diligently working toward understanding. Keep an eye out for our future primer on the subject, which we will write as extra background for the already-written post on low-complexity art and unknown future topics.

A word can be used multiple times in a description I.e. A description for 5 could be “The least prime number that is greater than the least square prime.” That uses “the” and “least” twice. So, an upper bound for the argument of the summation would be n^k, not that this changes the overall point.

Good point, Allan! It’s nice to see that you’re reading, and I will smoothly pretend that I put that there as a test to all the readers 🙂

Hope your new job at CSUCI goes well!

I see two problems with the above:

o You give one proof for X and another for not-X, but you never explain why the first proof is faulty. (My first take would be the possibility that the proposed sentence does not, after all, unambiguously describe a number. This would prevent the valid use of reductio ad absurdum.)

o You assume a finite word language. However, this need not be the case. To take a trivial example, consider a language in which all decimal representations of integers are considered words. We can now describe any integer with “the number whose decimal representation is [the number in question]”.

Good points! The real issue here is not that the description is ambiguous. In fact, the smallest element of any well-ordered set is indeed unique. The problem is that there is no definition of what a description is. This problem is so subtle that outside of saying “description is undefined,” or more specifically “the predicate is self-referential,” one cannot point to a fault in the proof. That is part of why it is such a famous paradox.

The use of English words is a poor one, specifically because some words are larger or more descriptive than others. It might be more prudent (and indeed the false proof would stand with this modification) if we substituted “a hundred characters” or “a hundred syllables” for “twenty words.” Indeed, the original version of the problem was phrased in syllables, but the classical paradox usually states it with words. Switching to characters or syllables would remove the issue of allowing a long decimal representation to count as a word, and we retain the meat of the proof.

Of course, that is a cop-out. But on the other hand, the description of an integer by its decimal representation is at best arguably short. For instance, if we required these descriptions be spoken (clearly this wouldn’t change the proof), then any decimal of order would require (on the order of) words. This hints at a deeper problem, that our descriptions of numbers are inherently not unified. We simply

don’ttreat numbers as single words. Of course we could invent an arbitrarily complicated language that satisfies our needs. We could even allowinfinitedescriptions, and describe each real number in one word! But this is simply uninteresting.The meat of the problem is when we finally get down to the definition of a description. In short (and I don’t want to spoil upcoming content on this blog), a description of a number is a program which computes the number (on some fixed universal Turing machine ). Note that this assumes that whatever a human can compute a computer can too, which is a controversial statement better known as the Church-Turing Thesis. If we further define a number’s “definition” as the shortest program which computes it, then this statement transforms into “x is the smallest integer whose definition has fewer than 100 characters.” As it turns out, when this “English” description is appropriately reformulated on a Turing machine, it is not an effective description: the statement is undeciable. In fact, once we get into the theory of Kolmogorov complexity, Berry’s paradox can be used to prove Gödel’s incompleteness theorem!

So you’re right, I didn’t do the problem justice here, because to do so would require a volume of work. I will, however, try to incorporate my response into the body so that new viewers don’t have to read through the comments. Thanks for all your feedback!

I have just come across this website … fantastic!

On a cheeky note – surely the proof is:

“every natural number” (only three words long). QED.

🙂

Again, love the site.

C

Your explanation got very philosophical very quickly, while I felt that there were clear logical holes in the “solution” that could have been discussed. Here’s a simple definition for “description” for the purposes of giving some enlightening logical “mistakes” in the “solution”. Let N be the set of natural numbers, let W be the assumed finite set of English words, let V = W¹ ∪ W² ∪ ··· ∪ W²⁰, the set of all word lists of up to 20 words, and let v = “the smallest natural number which cannot be unambiguously described in fewer than twenty words”, one of our elements of V.

Define a choice of descriptions as a function f: T → N, where T is a subset of V. This definition forces each description to unambiguously refer to one number. (It is a little overzealous in that concatenations of elements of T could actually be other elements of T, which English may or may not be able to handle with its grammar depending on the elements, but being more powerful and still falling short of a true solution is okay.) Note that in order to describe things unambiguously, this choice must be fixed and agreed upon by both the speaker and listener in any given conversation.

Problem 0: v is only well-defined for a _fixed_ choice of descriptions, which alters the conclusion.

That is, given a choice f: T → N, we may want to force v into T and assign f(v) = r, where r is the smallest element of N – f(T), but that is conflating notation all over the place. More precisely, you are making a new choice f’: T ∪ {v} → N, for which f'(x) = f(x) when x ≠ v, and f'(v) = r, the smallest element of N – f(T). With this new choice f’, there is a new r’ that is the smallest element of N – f'(T), and would be what you want some f”(v) to refer to. This recurses indefinitely. So, while we wanted to set out to prove

“There is no ‘smallest natural number which cannot be unambiguously described in fewer than twenty words’ for any one choice of descriptions.”

which would be required if there were any natural numbers left out of the choice of descriptions, we instead proved that

“There is no _unique_ ‘smallest natural number which cannot be unambiguously described in fewer than twenty words’ over _all possible_ choices of descriptions.”

which simply has no implication for the desired contradiction. As mentioned in your explanation, there would be a unique maximal ‘smallest natural number […] twenty words’, which shows that out of all possible choices of descriptions, there is none that successfully describes all natural numbers.

Problem 1: The desired reassignment of f(v) assumes that v is not already in T.

Where problem 0 shows that the “solution” stated the incorrect conclusion for the steps given, this problem shows that one of the steps has an explicit logical error, because it says that we may “unambiguously” assign f(v) = r. This is true if v is not already in T, but if v _is_ in T, then there are problems to discuss. If you really want it to be unambiguous, you can only handle that case by _re_assigning f(v) = r, after which you have just lost a description for what f(v) used to be, which may or may not affect f(T), and oh, dear, so much case analysis that wasn’t even addressed. It particularly needs to be addressed, because that’s where the possibility of V being finite becomes important.