In this post we’ll cover the second of the “basic four” methods of proof: the contrapositive implication. We will build off our material from last time and start by defining functions on sets.
Functions as Sets
So far we have become comfortable with the definition of a set, but the most common way to use sets is to construct functions between them. As programmers we readily understand the nature of a function, but how can we define one mathematically? It turns out we can do it in terms of sets, but let us recall the desired properties of a function:
- Every input must have an output.
- Every input can only correspond to one output (the functions must be deterministic).
One might try at first to define a function in terms of subsets of size two. That is, if are sets then a function would be completely specified by
where to enforce those two bullets, we must impose the condition that every occurs in one and only one of those subsets. Notationally, we would say that means is a member of the function. Unfortunately, this definition fails miserably when , because we have no way to distinguish the input from the output.
To compensate for this, we introduce a new type of object called a tuple. A tuple is just an ordered list of elements, which we write using round brackets, e.g. .
As a quick aside, one can define ordered tuples in terms of sets. We will leave the reader to puzzle why this works, and generalize the example provided:
And so a function is defined to be a list of ordered pairs where the first thing in the pair is an input and the second is an output:
Subject to the same conditions, that each value from must occur in one and only one pair. And again by way of notation we say if the pair is a member of as a set. Note that the concept of a function having “input and output” is just an interpretation. A function can be viewed independent of any computational ideas as just a set of pairs. Often enough we might not even know how to compute a function (or it might be provably uncomputable!), but we can still work with it abstractly.
It is also common to call functions “maps,” and to define “map” to mean a special kind of function (that is, with extra conditions) depending on the mathematical field one is working in. Even in other places on this blog, “map” might stand for a continuous function, or a homomorphism. Don’t worry if you don’t know these terms off hand; they are just special cases of functions as we’ve defined them here. For the purposes of this series on methods of proof, “function” and “map” and “mapping” mean the same thing: regular old functions on sets.
One of the most important and natural properties of a function is that of injectivity.
Definition: A function is an injection if whenever are distinct members of , then . The adjectival version of the word injection is injective.
As a quick side note, it is often the convention for mathematicians to use a capital letter to denote a set, and a lower-case letter to denote a generic element of that set. Moreover, the apostrophe on the is called a prime (so is spoken, “a prime”), and it’s meant to denote a variation on the non-prime’d variable in some way. In this case, the variation is that .
So even if we had not explicitly mentioned where the objects came from, the knowledgeable mathematician (which the reader is obviously becoming) would be reasonably certain that they come from . Similarly, if I were to lackadaisically present out of nowhere, the reader would infer it must come from .
One simple and commonly used example of an injection is the so-called inclusion function. If are sets, then there is a canonical function representing this subset relationship, namely the function defined by . It should be clear that non-equal things get mapped to non-equal things, because the function doesn’t actually do anything except change perspective on where the elements are sitting: two nonequal things sitting in are still nonequal in .
Another example is that of multiplication by two as a map on natural numbers. More rigorously, define by . It is clear that whenever as natural numbers then . For one, must have differing prime factorizations, and so must because we added the same prime factor of 2 to both numbers. Did you catch the quick proof by direct implication there? It was sneaky, but present.
Now the property of being an injection can be summed up by a very nice picture:
The arrows above represent the pairs , and the fact that no two arrows end in the same place makes this function an injection. Indeed, drawing pictures like this can give us clues about the true nature of a proposed fact. If the fact is false, it’s usually easy to draw a picture like this showing so. If it’s true, then the pictures will support it and hopefully make the proof obvious. We will see this in action in a bit (and perhaps we should expand upon it later with a post titled, “Methods of Proof — Proof by Picture”).
There is another, more subtle concept associated with injectivity, and this is where its name comes from. The word “inject” gives one the mental picture that we’re literally placing one set inside another set without changing the nature of . We are simply realizing it as being inside of , perhaps with different names for its elements. This interpretation becomes much clearer when one investigates sets with additional structure, such as groups, rings, or topological spaces. Here the word “injective mapping” much more literally means placing one thing inside another without changing the former’s structure in any way except for relabeling.
In any case, mathematicians have the bad (but time-saving) habit of implicitly identifying a set with its image under an injective mapping. That is, if is an injective function, then one can view as the same thing as . That is, they have the same elements except that renames the elements of as elements of . The abuse comes in when they start saying even when this is not strictly the case.
Here is an example of this abuse that many programmers commit without perhaps noticing it. Suppose is the set of all colors that can be displayed on a computer (as an abstract set; the elements are “this particular green,” “that particular pinkish mauve”). Now let be the set of all finite hexadecimal numbers. Then there is an obvious injective map from sending each color to its 6-digit hex representation. The lazy mathematician would say “Well, then, we might as well say , for this is the obvious way to view as a set of hexadecimal numbers.” Of course there are other ways (try to think of one, and then try to find an infinite family of them!), but the point is that this is the only way that anyone really uses, and that the other ways are all just “natural relabelings” of this way.
The precise way to formulate this claim is as follows, and it holds for arbitrary sets and arbitrary injective functions. If are two such ways to inject inside of , then there is a function such that the composition is precisely the map . If this is mysterious, we have some methods the reader can use to understand it more fully: give examples for simplified versions (what if there were only three colors?), draw pictures of “generic looking” set maps, and attempt a proof by direct implication.
Proof by Contrapositive
Often times in mathematics we will come across a statement we want to prove that looks like this:
If X does not have property A, then Y does not have property B.
Indeed, we already have: to prove a function is injective we must prove:
If x is not equal to y, then f(x) is not equal to f(y).
A proof by direct implication can be quite difficult because the statement gives us very little to work with. If we assume that does not have property , then we have nothing to grasp and jump-start our proof. The main (and in this author’s opinion, the only) benefit of a proof by contrapositive is that one can turn such a statement into a constructive one. That is, we can write “p implies q” as “not q implies not p” to get the equivalent claim:
If Y has property B then X has property A.
This rewriting is called the “contrapositive form” of the original statement. It’s not only easier to parse, but also probably easier to prove because we have something to grasp at from the beginning.
To the beginning mathematician, it may not be obvious that “if p then q” is equivalent to “if not q then not p” as logical statements. To show that they are requires a small detour into the idea of a “truth table.”
In particular, we have to specify what it means for “if p then q” to be true or false as a whole. There are four possibilities: p can be true or false, and q can be true or false. We can write all of these possibilities in a table.
p q T T T F F T F F
If we were to complete this table for “if p then q,” we’d have to specify exactly which of the four cases correspond to the statement being true. Of course, if the p part is true and the q part is true, then “p implies q” should also be true. We have seen this already in proof by direct implication. Next, if p is true and q is false, then it certainly cannot be the case that truth of p implies the truth of q. So this would be a false statement. Our truth table so far looks like
p q p->q T T T T F F F T ? F F ?
The next question is what to do if the premise p of “if p then q” is false. Should the statement as a whole be true or false? Rather then enter a belated philosophical discussion, we will zealously define an implication to be true if its hypothesis is false. This is a well-accepted idea in mathematics called vacuous truth. And although it seems to make awkward statements true (like “if 2 is odd then 1 = 0”), it is rarely a confounding issue (and more often forms the punchline of a few good math jokes). So we can complete our truth table as follows
p q p->q T T T T F F F T T F F T
Now here’s where contraposition comes into play. If we’re interested in determining when “not q implies not p” is true, we can add these to the truth table as extra columns:
p q p->q not q not p not q -> not p T T T F F T T F F T F F F T T F T T F F T T T T
As we can see, the two columns corresponding to “p implies q” and “not q implies not p” assume precisely the same truth values in all possible scenarios. In other words, the two statements are logically equivalent.
And so our proof technique for contrapositive becomes: rewrite the statement in its contrapositive form, and proceed to prove it by direct implication.
Examples and Exercises
Our first example will be completely straightforward and require nothing but algebra. Let’s show that the function is injective. Contrapositively, we want to prove that if then . Assuming the hypothesis, we start by supposing . Applying algebra, we get , and dividing by 7 shows that $x = x’$ as desired. So is injective.
This example is important because if we tried to prove it directly, we might make the mistake of assuming algebra works with the same way it does with equality. In fact, many of the things we take for granted about equality fail with inequality (for instance, if and it need not be the case that ). The contrapositive method allows us to use our algebraic skills in a straightforward way.
Next let’s prove that the composition of two injective functions is injective. That is, if and are injective functions, then the composition defined by is injective.
In particular, we want to prove that if then . Contrapositively, this is the same as proving that if then . Well by the fact that is injective, we know that (again contrapositively) whenever then , so it must be that . But by the same reasoning is injective and hence . This proves the statement.
This was a nice symbolic proof, but we can see the same fact in a picturesque form as well:
If we maintain that any two arrows in the diagram can’t have the same head, then following two paths starting at different points in will never land us at the same place in . Since is injective we have to travel to different places in , and since is injective we have to travel to different places in . Unfortunately, this proof cannot replace the formal one above, but it can help us understand it from a different perspective (which can often make or break a mathematical idea).
Expanding upon this idea we give the reader a challenge: Let be finite sets of the same size. Prove or disprove that if and are (arbitrary) functions, and if the composition is injective, then both of must be injective.
Another exercise which has a nice contrapositive proof: prove that if are finite sets and is an injection, then has at most as many elements as . This one is particularly susceptible to a “picture proof” like the one above. Although the formal the formal name for the fact one uses to prove this is the pigeonhole principle, it’s really just a simple observation.
Aside from inventing similar exercises with numbers (e.g., if is odd then is odd or is odd), this is all there is to the contrapositive method. It’s just a direct proof disguised behind a fact about truth tables. Of course, as is usual in more advanced mathematical literature, authors will seldom announce the use of contraposition. The reader just has to be watchful enough to notice it.
Though we haven’t talked about either the real numbers nor proofs of existence or impossibility, we can still pose this interesting question: is there an injective function from ? In truth there is not, but as of yet we don’t have the proof technique required to show it. This will be our next topic in the series: the proof by contradiction.