*This is a guest post by my friend and colleague Adam Lelkes. Adam’s interests are in algebra and theoretical computer science. This gem came up because Adam gave a talk on probabilistic computation in which he discussed this technique.*

**Problem:** Simulate a fair coin given only access to a biased coin.

**Solution:** (in Python)

    def fairCoin(biasedCoin):
        coin1, coin2 = 0, 0
        while coin1 == coin2:
            coin1, coin2 = biasedCoin(), biasedCoin()
        return coin1

**Discussion:** This is originally von Neumann’s clever idea. If we have a biased coin (i.e. a coin that comes up heads with probability different from 1/2), we can simulate a fair coin by tossing pairs of coins until the two results are different. Given that we have different results, the probability that the first is “heads” and the second is “tails” is the same as the probability of “tails” then “heads”. So if we simply return the value of the first coin, we will get “heads” or “tails” with the same probability, i.e. 1/2.

Note that we did not have to know or assume anything about our biasedCoin function other than that it returns 0 or 1 every time, and that the results of successive calls are independent and identically distributed. In particular, we do not need to know the probability of getting 1. (However, that probability should be strictly between 0 and 1.) Also, we do not use any randomness directly, only through the biasedCoin function.

Here is a simple simulation:

    from random import random

    def biasedCoin():
        return int(random() < 0.2)

This function will return 1 with probability 0.2. If we try

sum(biasedCoin() for i in range(10000))

with high probability we will get a number that is close to 2000. I got 2058.

On the other hand, if we try

sum(fairCoin(biasedCoin) for i in range(10000))

we should see a value that is approximately 5000. Indeed, when I tried it, I got 4982, which is evidence that fairCoin(biasedCoin) returns 1 with probability 1/2 (although I already gave a proof!).

One might wonder how many calls to biasedCoin we expect to make before the function returns. One can recognize the experiment as a geometric distribution and use the known expected value, but the derivation is short, so here it is. Let $ s$ be the probability of seeing two different outcomes in a pair of biased coin flips, and $ t$ the expected number of flips until that happens. If after two flips we see the same outcome (HH or TT), then by independence the expected number of additional flips we need is unchanged. Hence

$ t = 2s + (1-s)(2 + t)$

Simplifying gives $ t = 2/s$, and since we know $ s = 2p(1-p)$ we expect to flip the coin $ \frac{1}{p(1-p)}$ times.
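We can check this formula empirically. The sketch below wraps a biased coin so that it counts its own flips, then averages the count over many runs of fairCoin; the bias `p = 0.2` and the trial count are arbitrary choices, and `countingBiasedCoin` is a helper invented for this check. With `p = 0.2` we expect about $1/(0.2 \cdot 0.8) = 6.25$ flips per run.

    from random import random

    def fairCoin(biasedCoin):
        # Von Neumann's trick: flip pairs until they differ, return the first.
        coin1, coin2 = 0, 0
        while coin1 == coin2:
            coin1, coin2 = biasedCoin(), biasedCoin()
        return coin1

    def countingBiasedCoin(p, counter):
        # A biased coin that records how many times it was flipped.
        def coin():
            counter[0] += 1
            return int(random() < p)
        return coin

    p = 0.2
    counter = [0]
    coin = countingBiasedCoin(p, counter)
    trials = 10000
    for _ in range(trials):
        fairCoin(coin)
    print(counter[0] / trials)  # should be close to 1/(p*(1-p)) = 6.25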

For a deeper dive into this topic, see these notes by Michael Mitzenmacher from Harvard University. They discuss strategies for simulating a fair coin from a biased coin that are optimal in the expected number of flips required to run the experiment once. He has also written a book on the subject of randomness in computing.

Hehe, this question was in the creativity section of our mathematical olympiad competition for high schoolers 🙂

You assume the “bias” applies to individual tosses.

Try a 50% chance of returning !lasttoss and see what happens.

The more general problem with randomness is that if I can construct ANY function that predicts the value of a bit with probability other than 50%, the source isn’t random – including predicting from N bits earlier, or from the previous N bits being mostly 0 or 1 (as in some chaotic systems), or anything else.

I need to be careful here, as you can create compression functions even for true random noise (thermal, background radiation, quantum, radioactivity). In a few gigabytes of output there will probably be runs of repeated bits, so a function tailored to that particular blob of bits, together with the blob itself, can be smaller – compressing to fewer bits than the original.

I think you’re mixing up some concepts here…

Fun little problem! I just spent my Sunday morning trying out simple variations on ways to solve this problem in Python. Turns out there are a lot of approaches that seem plausible but don’t actually work when you test them 🙂

The post’s title combined with your profile picture on facebook made me think that it was inspired by Scott Aaronson’s reference to that same problem in his book. Or is it just a coincidence?

Anyway, nice one 😀

Ha, I think it’s a common trick when you’re discussing probabilistic computation, but no, it was Adam who brought it up.

I’m curious, does this trick apply strictly to coins?

Or does it also work for loaded dice (i.e. an increased number of sides) ?