# Simulating a Biased Coin with a Fair Coin

This is a guest post by my friend and colleague Adam Lelkes. Adam’s interests are in algebra and theoretical computer science. This gem came up because Adam gave a talk on probabilistic computation in which he discussed this technique.

Problem: simulate a biased coin using a fair coin.

Solution: (in Python)

```python
def biasedCoin(binaryDigitStream, fairCoin):
    for d in binaryDigitStream:
        if fairCoin() != d:
            return d
```


Discussion: This function takes two arguments, an iterator representing the binary expansion of the intended probability of getting 1 (let us denote it as $p$) and another function that returns 1 or 0 with equal probability. At first glance this might seem like an overcomplicated way of solving this problem: why can’t the probability be a floating point number?

The point is that $p$ can have infinite precision! Assuming that fairCoin() gives us a perfectly random stream of 1’s and 0’s (independently and with probability 1/2) and we can read each bit of the binary expansion of $p$, this function returns 1 with probability exactly $p$ even if $p$ is irrational or a fraction with an infinite binary expansion. If we used floating point arithmetic there would be a small chance we get unlucky and exhaust the precision available. We would only get an approximation of the true bias at best.

Now let us explain why this algorithm works. We keep tossing our fair coin to get a sequence of random bits, until one of them differs from the corresponding bit in the binary expansion of $p$. Suppose we stop after $i$ steps. This means the first $i-1$ random bits agreed with the first $i-1$ bits of $p$ (which happens with probability $\frac{1}{2^{i-1}}$) and the $i$th random bit differed (which happens with probability $\frac{1}{2}$), so we stop at step $i$ with probability $\frac{1}{2^i}$. When we stop at step $i$ we return the $i$th bit of $p$; let us denote this bit by $p_i$. Then the probability of returning 1 is $\sum_{i=1}^\infty \frac{p_i}{2^i}$, which is exactly the binary expansion of $p$.

This algorithm is also efficient. By efficient here we mean that the expected running time is constant. Of course, to show this we need to make some assumption about the computational complexity of calculating the bits of $p$. If we assume that the bits of $p$ are efficiently computable in the sense that the time required to compute $p_i$ is bounded by a polynomial in $i$, then this algorithm does run in constant expected time.

Indeed, suppose computing $p_i$ takes at most $Ci^n$ steps for some constants $C$ and $n$. Since the algorithm reaches step $i$ with probability $\frac{1}{2^{i-1}}$, the expected running time is bounded by a constant multiple of $\sum_{i=0}^\infty \frac{i^n}{2^i}$. Showing that this sum is a constant is an easy calculus exercise: using the ratio test we get that

$\displaystyle \limsup_{i \to \infty} \left | \frac{(i+1)^n/2^{i+1}}{i^n/2^i} \right | = \limsup_{i\to\infty} \frac{1}{2}\left(\frac{i+1}{i}\right)^n = \frac{1}{2} < 1$,

thus the series is convergent.
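
As a quick numerical illustration (not in the original post), take $n = 2$, where the sum has the closed form $\sum_{i \geq 0} i^2/2^i = 6$; a couple hundred terms already pin it down:

```python
# Partial sums of sum_{i>=0} i^2 / 2^i converge rapidly; the limit is 6.
partial = sum(i**2 / 2**i for i in range(200))
print(partial)  # 6.0 (up to floating point error)
```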

Now that we have proved that our algorithm works, it’s time to try it! Let’s say that we want to simulate a coin which gives “heads” with probability 1/3. We need to construct our binary digit stream. Since 1/3 is 0.010101… in binary, we could use the following simple generator:

```python
def oneThird():
    while True:
        yield 0
        yield 1
```


However, we might want to have a more general generator that gives us the binary representation of any number. The following function, which takes a number between 0 and 1 as its argument, does the job:

```python
def binaryDigits(fraction):
    while True:
        fraction *= 2
        yield int(fraction)
        fraction = fraction % 1
```
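
As a quick sanity check (not part of the original post), we can take the first few digits this generator produces for 1/3 and compare them with 0.010101…; itertools.islice just truncates the infinite stream:

```python
from itertools import islice

def binaryDigits(fraction):
    while True:
        fraction *= 2
        yield int(fraction)       # the next binary digit
        fraction = fraction % 1   # keep only the fractional part

print(list(islice(binaryDigits(1.0 / 3), 6)))  # [0, 1, 0, 1, 0, 1]
```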


We also need a fair coin simulator. For this simulation, let’s just use Python’s built-in pseudo-random number generator:

```python
import random

def fairCoin():
    return random.choice([0, 1])
```


Let us toss our biased coin 10000 times and take the sum. We expect the sum to be around 3333. Indeed, when I tried it:

```python
>>> sum(biasedCoin(oneThird(), fairCoin) for i in range(10000))
3330
```


It might be worth noting that oneThird() is approximately ten times faster than binaryDigits(fractions.Fraction(1, 3)), so when a large number of biased coin flips is needed, it can pay to hardwire the binary representation of $p$ into the program.
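
Putting the pieces together as one self-contained script (the fixed seed is ours, added so the run is reproducible):

```python
import random
random.seed(0)  # fixed seed for reproducibility (not in the original)

def biasedCoin(binaryDigitStream, fairCoin):
    for d in binaryDigitStream:
        if fairCoin() != d:
            return d

def binaryDigits(fraction):
    while True:
        fraction *= 2
        yield int(fraction)
        fraction = fraction % 1

def fairCoin():
    return random.choice([0, 1])

# Simulate a coin of bias 1/3 and check the empirical frequency.
tosses = 10000
total = sum(biasedCoin(binaryDigits(1.0 / 3), fairCoin) for _ in range(tosses))
print(total / tosses)  # should be close to 0.3333
```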

# Simulating a Fair Coin with a Biased Coin


Problem: Simulate a fair coin given only access to a biased coin.

Solution: (in Python)

```python
def fairCoin(biasedCoin):
    coin1, coin2 = 0, 0
    while coin1 == coin2:
        coin1, coin2 = biasedCoin(), biasedCoin()
    return coin1
```


Discussion: This is originally von Neumann’s clever idea. If we have a biased coin (i.e. a coin that comes up heads with probability different from 1/2), we can simulate a fair coin by tossing pairs of coins until the two results are different. Given that we have different results, the probability that the first is “heads” and the second is “tails” is the same as the probability of “tails” then “heads”. So if we simply return the value of the first coin, we will get “heads” or “tails” with the same probability, i.e. 1/2.

Note that we did not have to know or assume anything about our biasedCoin function other than that it returns 0 or 1 every time, and that the results of different calls are independent and identically distributed. In particular, we do not need to know the probability of getting 1. (However, that probability should be strictly between 0 and 1.) Also, we do not use any randomness directly, only through the biasedCoin function.

Here is a simple simulation:

```python
from random import random

def biasedCoin():
    return int(random() < 0.2)
```


This function will return 1 with probability 0.2. If we try

```python
sum(biasedCoin() for i in range(10000))
```


with high probability we will get a number that is close to 2000. I got 2058.

On the other hand, if we try

```python
sum(fairCoin(biasedCoin) for i in range(10000))
```


we should see a value that is approximately 5000. Indeed, when I tried it, I got 4982, which is evidence that fairCoin(biasedCoin) returns 1 with probability 1/2 (although I already gave a proof!).

One might wonder how many calls to biasedCoin we expect to make before fairCoin returns. One can recognize the experiment as a geometric distribution and use the known expected value, but the computation is short, so here is a proof. Let $s$ be the probability of seeing two different outcomes in a pair of biased coin flips, and $t$ the expected number of flips until that happens. If the first two flips show the same outcome (HH or TT), then by independence the expected number of additional flips we need is again $t$. Hence

$t = 2s + (1-s)(2 + t)$

Simplifying gives $t = 2/s$, and since we know $s = 2p(1-p)$ we expect to flip the coin $\frac{1}{p(1-p)}$ times.
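
As an empirical sanity check (not in the original), we can count the biased flips per fair flip; for $p = 0.2$ the formula predicts $\frac{1}{0.2 \cdot 0.8} = 6.25$:

```python
import random
random.seed(0)  # fixed seed for reproducibility (not in the original)

calls = 0  # counts every biased coin flip

def biasedCoin():
    global calls
    calls += 1
    return int(random.random() < 0.2)

def fairCoin(biasedCoin):
    coin1, coin2 = 0, 0
    while coin1 == coin2:
        coin1, coin2 = biasedCoin(), biasedCoin()
    return coin1

trials = 10000
for _ in range(trials):
    fairCoin(biasedCoin)
print(calls / trials)  # should be close to 1/(0.2 * 0.8) = 6.25
```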

For a deeper dive into this topic, see these notes by Michael Mitzenmacher from Harvard University. They discuss strategies for simulating a fair coin from a biased coin that are optimal in the expected number of flips required to run the experiment once. He has also written a book on the subject of randomness in computing.

# Reservoir Sampling

Problem: Given a data stream of unknown size $n$, pick an entry uniformly at random. That is, each entry has a $1/n$ chance of being chosen.

Solution: (in Python)

```python
import random

def reservoirSample(stream):
    for k, x in enumerate(stream, start=1):
        # the k-th element replaces the currently held element with probability 1/k
        if random.random() < 1.0 / k:
            chosen = x
    return chosen
```

Discussion: This is one of many techniques for solving a problem called reservoir sampling. We often encounter data sets that we’d like to sample elements from at random. But with the advent of big data, the lists involved are so large that we either can’t fit them all at once into memory, or we don’t even know how big they are because the data arrives as a stream (e.g., the sequence of atomic price movements in the stock market over a year). Reservoir sampling is the problem of sampling from such streams, and the technique above is one way to achieve it.

In words, the above algorithm holds one element from the stream at a time, and when it inspects the $k$-th element (indexing $k$ from 1), it flips a coin of bias $1/k$ to decide whether to keep its currently held element or to drop it in favor of the new one.

We can prove quite easily by induction that this works. Indeed, let $n$ be the (unknown) size of the list, and suppose $n=1$. In this case there is only one element to choose from, and so the probability of picking it is 1. The case of $n=2$ is similar, and more illustrative. Now suppose the algorithm works for lists of size $n$, and suppose we increase the size of the list by 1, adding some new element $y$ to the end. For any given $x$ among the first $n$ elements, the probability we’re holding $x$ when we inspect $y$ is $1/n$ by induction. Now we flip a coin which lands heads with probability $1/(n+1)$, and if it lands heads we take $y$ and otherwise we keep $x$. The probability we get $y$ is exactly $1/(n+1)$, as desired, and the probability we get $x$ is $\frac{1}{n}\frac{n}{n+1} = \frac{1}{n+1}$. Since $x$ was arbitrary, this means that after the last step of the algorithm each entry is held with probability $1/(n+1)$.

$\square$
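
A quick empirical check (with a fixed seed of our choosing, for reproducibility): sampling from a 10-element stream 10000 times should pick each element roughly 1000 times.

```python
import random
random.seed(0)

def reservoirSample(stream):
    for k, x in enumerate(stream, start=1):
        if random.random() < 1.0 / k:
            chosen = x
    return chosen

counts = [0] * 10
for _ in range(10000):
    counts[reservoirSample(range(10))] += 1
print(counts)  # each entry should be near 1000
```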

It’s easy to see how one could flip more coins per element to get a sampling algorithm that picks any fixed number of elements (with replacement, although a variant without replacement is not so hard to construct using this method). Other variants exist, such as distributed and weighted sampling.

Python’s generators make this algorithm for reservoir sampling particularly nice. One can define a generator which abstractly represents a data stream (perhaps querying the entries from files distributed across many different disks), and this logic is hidden from the reservoir sampling algorithm. Indeed, this algorithm works for any iterable, although if we knew the size of the list we could sample much faster (by uniformly generating a random number and indexing the list appropriately). The start parameter given to the enumerate function makes the $k$ variable start at 1.
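
For instance (an illustrative sketch, not from the original post; the generator and its record names are hypothetical), the stream can be a lazy generator whose length is never materialized:

```python
import random
random.seed(0)

def reservoirSample(stream):
    for k, x in enumerate(stream, start=1):
        if random.random() < 1.0 / k:
            chosen = x
    return chosen

def lineStream():
    # stands in for, e.g., lines read lazily from files spread across disks
    for i in range(1000):
        yield "record-%d" % i

print(reservoirSample(lineStream()))  # one record, chosen uniformly at random
```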

# Miller-Rabin Primality Test

Problem: Determine if a number is prime, with an acceptably small error rate.

Solution: (in Python)

```python
import random

def decompose(n):
    # write n as 2^exponentOfTwo * (odd part)
    exponentOfTwo = 0

    while n % 2 == 0:
        n = n // 2  # integer division; n / 2 would produce a float in Python 3
        exponentOfTwo += 1

    return exponentOfTwo, n

def isWitness(possibleWitness, p, exponent, remainder):
    possibleWitness = pow(possibleWitness, remainder, p)

    if possibleWitness == 1 or possibleWitness == p - 1:
        return False

    for _ in range(exponent):
        possibleWitness = pow(possibleWitness, 2, p)

        if possibleWitness == p - 1:
            return False

    return True

def probablyPrime(p, accuracy=100):
    if p == 2 or p == 3: return True
    if p < 2: return False

    exponent, remainder = decompose(p - 1)

    for _ in range(accuracy):
        possibleWitness = random.randint(2, p - 2)
        if isWitness(possibleWitness, p, exponent, remainder):
            return False

    return True
```


Discussion: This algorithm is known as the Miller-Rabin primality test, and it was a very important breakthrough in the study of probabilistic algorithms.

Efficiently testing whether a number is prime is a crucial problem in cryptography, because the security of many cryptosystems depends on the use of large randomly chosen primes. Indeed, we’ve seen one on this blog already which is in widespread use: RSA. Randomized algorithms are also quite useful in general, because a solution that is incorrect with probability, say, $2^{-100}$ is often good enough for practice.

But from a theoretical and historical perspective, primality testing lay at the center of a huge problem in complexity theory. In particular, it is unknown whether algorithms which have access to randomness and can output probably correct answers are more powerful than those that don’t. The use of randomness in algorithms comes in a number of formalizations, the most prominent of which is called BPP (Bounded-error Probabilistic Polynomial time). The Miller-Rabin algorithm shows that primality testing is in BPP. On the other hand, the problems solvable in polynomial time without randomness form the class called P.

For a long time (from 1975 to 2002), it was unknown whether primality testing was in P or not. There are very few remaining important problems that have BPP algorithms but are not known to be in P; polynomial identity testing is the main example, and until 2002 primality testing shared that distinction. Now primality testing has a known deterministic polynomial-time algorithm. One might argue that (in theory) the Miller-Rabin test is now useless, but it’s still a nice example of a nontrivial BPP algorithm.

The algorithm relies on the following theorem:

Theorem: If $p$ is a prime, let $2^s$ be the largest power of 2 dividing $p-1$, so that $p-1 = 2^{s}d$ and $d$ is odd. Then for any $1 \leq n \leq p-1$, one of two things happens:

- $n^d = 1 \mod p$, or
- $n^{2^j d} = -1 \mod p$ for some $0 \leq j < s$.

The algorithm then simply operates as follows: pick nonzero $n$ at random until both of the above conditions fail. Such an $n$ is called a witness to the fact that $p$ is composite. If $p$ is not prime, then at least 3/4 of the possible choices of $n$ are witnesses, so each randomly chosen $n$ detects compositeness with probability at least 3/4.

We leave the proof of the theorem as an exercise. Start with the fact that $n^{p-1} = 1 \mod p$ (this is Fermat’s Little Theorem). Then repeatedly take square roots: each square root of 1 must be $\pm 1 \mod p$ because $p$ is prime, so either some square root is $-1$ (the second condition) or you continue all the way down to $n^{d} = 1 \mod p$ (the first condition).
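
To make the theorem concrete, one can check it by brute force for a small prime, say $p = 13$, where $p - 1 = 12 = 2^2 \cdot 3$ gives $s = 2$ and $d = 3$:

```python
p, s, d = 13, 2, 3  # p - 1 = 12 = 2^2 * 3

for n in range(1, p):
    cond1 = pow(n, d, p) == 1
    cond2 = any(pow(n, (2**j) * d, p) == p - 1 for j in range(s))
    assert cond1 or cond2, n  # the theorem guarantees one of the two holds

print("theorem holds for every n in 1..%d" % (p - 1))
```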

The Python code above uses Python’s built-in modular exponentiation function pow to do fast modular exponents. The isWitness function first checks whether $n^d = \pm 1 \mod p$, and then checks whether $n^{2^j d} = -1 \mod p$ for the remaining powers. The probablyPrime function then simply generates random potential witnesses and checks them via the previous function. The output of the function is True if and only if none of the inspected candidates turns out to be a witness. The endpoints 2 and $p-2$ are chosen because $n = 1$ and $n = p-1$ are never witnesses: $1^d = 1 \mod p$ trivially, and since $d$ is odd, $(p-1)^d = -1 \mod p$.
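
For completeness, here is a condensed, self-contained variant of the same test (the name millerRabin and the fixed seed are ours, not from the original; the loop structure follows the usual textbook presentation), tried on the Carmichael number $561 = 3 \cdot 11 \cdot 17$, which fools the plain Fermat test but not this one:

```python
import random
random.seed(0)  # fixed seed for reproducibility (not in the original)

def millerRabin(p, accuracy=20):
    if p in (2, 3): return True
    if p < 2 or p % 2 == 0: return False

    # write p - 1 = 2^exponent * remainder with remainder odd
    exponent, remainder = 0, p - 1
    while remainder % 2 == 0:
        exponent += 1
        remainder //= 2

    for _ in range(accuracy):
        n = pow(random.randint(2, p - 2), remainder, p)
        if n == 1 or n == p - 1:
            continue  # not a witness; try another candidate
        for _ in range(exponent - 1):
            n = pow(n, 2, p)
            if n == p - 1:
                break  # not a witness
        else:
            return False  # n is a witness: p is composite
    return True

print(millerRabin(561))  # False: 561 = 3 * 11 * 17 is composite
print(millerRabin(997))  # True: 997 is prime
```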