# Zero-Knowledge: Definitions and Theory

The next Monday, when the fathers were all back at work, we kids were playing in a field. One kid says to me, “See that bird? What kind of bird is that?” I said, “I haven’t the slightest idea what kind of a bird it is.” He says, “It’s a brown-throated thrush. Your father doesn’t teach you anything!” But it was the opposite. He had already taught me: “See that bird?” he says. “It’s a Spencer’s warbler.” (I knew he didn’t know the real name.) “Well, in Italian, it’s a Chutto Lapittida. In Portuguese, it’s a Bom da Peida. In Chinese, it’s a Chung-long-tah, and in Japanese, it’s a Katano Tekeda. You can know the name of that bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird. You’ll only know about humans in different places, and what they call the bird. So let’s look at the bird and see what it’s doing—that’s what counts.” I learned very early the difference between knowing the name of something and knowing something.

In the first post in this series, we defined and implemented a simple zero-knowledge proof for graph isomorphism. In the second post, we saw a different protocol with a much heftier tagline: it gives a zero knowledge proof for any problem in the class NP. These two protocols used the same underlying framework—an interaction between a prover and a verifier—but they were actually very different.

Indeed, the graph isomorphism proof was “perfect” in two senses. First, it didn’t require any assumptions about cryptography. Nobody knows how to prove the Blum-Blum-Shub function is actually a one-way permutation (at the least, this would imply $\textup{P} \neq \textup{NP}$, so it’s probably hard to prove).

Second, in the graph isomorphism proof, the simulator produced transcripts of the protocol that came from the exact same distribution as the true transcripts created by the prover and verifier. This is why we called it zero-knowledge; anything the verifier can do with the output of the protocol, the simulator could do too. The verifier can’t be making use of the prover’s secret knowledge, since that information isn’t even in the simulator’s universe but the simulator can still compute what the verifier can.

But I didn’t tell you precisely why the 3-coloring protocol is a zero-knowledge proof, and in particular why it’s different from the graph isomorphism protocol. I also hinted that I had been misleading you, dear reader, as to the full 3-coloring proof of zero-knowledge. So in this post we’ll get into those nitty-gritty definitions that make the math rigorous. We’ll give a short sketch of the proof of zero-knowledge (the full proof would take many pages, not because it’s hard but because there are a lot of annoying details). And then we’ll give an overview of the landscape of theorems and conjectures about zero knowledge.

## Information vs. knowledge

You can’t understand where the following definitions come from without the crucial distinction between information and knowledge from the computer scientist’s perspective. Information concerns how many essential bits are encoded in a message, and nothing more. In particular, information is not the same as computational complexity, the required amount of computational resources required to actually do something. Knowledge, on the other hand, refers to the computational abilities you gain with the information provided.

Here’s an example in layman’s terms: say I give you a zero-knowledge proof that cancer can be cured using a treatment that takes only five days. Even though I might thoroughly convince you my cure works by exhibiting patients with vanishing tumors, you’ll still struggle to find a cure. This is despite the fact that there might be more bits of information relayed in the messages sent during my “zero-knowledge proof” than the number of bits needed to describe the cure! On the other hand, every proof that 1+1=2 is a zero-knowledge proof, because it’s not computationally difficult to prove this on your own in the first place. You don’t gain any new computational powers even if I tell you flat out what the proof is.

Knowledge is Richard Feynman’s worldview, information is knowing the many names for a bird.

From this perspective information theory is abstruse, though it’s much easier to prove theoretical limits on computation in terms of information theory than in terms of computational complexity. However, the theoretical limits of computation are lower that the limits of information theory, which is the essence of what enables cryptography (from an information theory standpoint, all public-key cryptography is broken!). Reframing what’s possible in terms of computation opens a treasure chest of useful algorithms.

From here we’ll explore three definitions of zero-knowledge. First, we’ll see “perfect” zero knowledge, which is believed to be too strong to be practical. We’ll give a sketch of the “right” proof that graph isomorphism has a perfect zero-knowledge proof. Then we’ll see two relaxations to “statistical” and “computational” zero knowledge. We’ll discuss their theoretical relationships to each other and the rest of the world from a high level.

## Perfect zero-knowledge

Recall from our first post, the interactive protocol for graph isomorphism has a very nice property. Say you took all the messages sent back and forth between the prover and verifier, and say you think of those messages as a random outcome of a probabilistic event. Then a simulator could have produced a set of messages drawn from the same distribution, without needing to see the messages from the protocol in the first place! So anything that the verifier could compute as a result of participating in the protocol, the simulator could compute without needing the protocol in the first place. The simulator just needs to assume the truth of the claim. This type of zero-knowledge is called perfect zero-knowledge.

Let’s distill this into some notation. In order to do this, we’ll shift to the theoretical way of discussing a “problem” as a language, i.e., a set of strings. For example, for 3-coloring you fix a method for describing graphs as strings (it doesn’t really matter which method you use), and the language is the set of strings

$\displaystyle \{ G : G \textup{ has a valid 3-coloring } \}$

Membership in the set is the same as saying an instance of the problem “does $G$ have a 3-coloring?” has a yes answer. Throughout the entire discussion we will fix a generic way to encode any object as a binary string, and we’ll implicitly equate $G$ with the string representing $G$, and call them both $G$. In this post I’ll always use $L$ for a language.

Now say that $P$ is an algorithm for the prover (a probabilistic Turing machine of arbitrary complexity), $V$ is the analogous polynomial-time algorithm for the verifier, and $M(P,V)$ is the resulting concatenated string of messages from one run of a zero-knowledge proof. Because $P$ and $V$ may be randomized, $M(P,V)$ is a distribution over strings $\{ 0, 1\}^m$, and one run of the protocol is equivalent to drawing $y \sim \{0,1\}^m$ according to this distribution. This just means: run an instance of the the protocol, which is random, and encode the output messages as a single string.

Now define a simulator as any probabilistic polynomial-time turing machine $S$. We interpret the simulator as trying to reproduce the output of the protocol without having access to the protocol’s messages, but with computational limits. The prover, verifier, and simulator all take as input the claim $x$ of length $n$, and the verifier and simulator run in time polynomial in $n$. Reminder: in order to be an interactive proof for the language $L$, if $x \in L$ then the verifier must be convinced with probability $1$, and if $x \not \in L$ then the verifier must reject with probability at least $1/2$.

Definition (not quite correct): An interactive proof $M(P,V)$ for a language $L$ is said to have perfect zero-knowledge if some simulator $S$ exists such that for every input $x \in L$, the distributions of the random variables $M(P,V)(x)$ and $S(x)$ are equal.

This definition isn’t quite correct, but it’s correct in spirit so that’s why we used it in our first post. Now let’s see a bit more detail about what really makes a zero-knowledge proof. Let’s remind ourselves of the argument for graph isomorphism, and then see why it’s not strong enough. The protocol we described is summed up as: prover has $G_1, G_2$, and sends a random permutation of $G_1$ to the verifier, calls it $H$. The verifier picks a random integer $k$ which is either 1 or 2, sends it the prover. The prover produces an isomorphism between $G_k$ and $H$, and sends the isomorphism back to the verifier, who checks to make sure the isomorphism is valid and accepts (otherwise, rejects).

In our argument, we said the simulator could simulate messages by picking $k$ for the verifier before choosing the random permutation $H$, and this is the nit we need to pick. The problem is that the prover doesn’t trust the verifier, but the simulator does trust the verifier, and performs actions for the verifier that are faithful to the protocol!

What if a naughty verifier were able to glean more information by misbehaving in the protocol? Say, instead of randomly choosing $k$ as 1 or 2, the verifier might always just pick $k=1$. Could such a misbehavior give the verifier more knowledge than strictly following the protocol?

Sure, maybe if the verifier did that, he wouldn’t end up with a valid proof (his acceptances and rejections may not be valid or sound), but it might be worth it if he can get a small bit of information about the hidden isomorphism. The goal is to treat the verifier like a hacker, but our simulator only simulates honest verifiers.

So a real zero-knowledge proof needs to account for a misbehaving verifier. One way to do this is to give the simulator black-box access to the verifier $V$, and prove that the distribution of messages produced by the simulator $S(V, x)$ is equal to the true distribution of messages produced in the actual protocol with the same (possibly misbehaving) verifier, $M(P,V)(x)$. Note that $S$ can still know about the protocol that $V$ is supposed to follow, but he can’t know exactly how $V$ is defined.

Unfortunately, there is no known zero-knowledge proof for a nontrivial problem that satisfies this stronger criterion for perfect zero knowledge, and the graph isomorphism proof in particular doesn’t satisfy it. The way to fix it, which I’ll just say informally, is to allow the simulator to fail with probability at most 1/2. So to summarize, the simulator gets black-box access to the verifier and is guaranteed an input $x \in L$, and has to sample from the exact distribution of messages as the real protocol, but the simulator is allowed to say “KAPUT!” at most half the time, when it thinks the output will be bad. The distribution of simulator outputs (conditioned on no-KAPUT) should be equal to the true distribution of outputs.

Now we can sketch the real graph isomorphism proof. The simulator is defined as follows:

1. Pick $i \in \{ 1, 2 \}$ randomly and let $H$ be a random permutation of $G_i$.
2. Invoke the verifier $V(H)$ as a subroutine with input $H$, and call the output $k$, which is either 1 or 2.
3. If $k \neq i$, output “KAPUT!”, otherwise invoke $V$ as a subroutine with the isomorphism $H \to G_i$ as input.

Now to show the message distributions are equal. The message sent in step 2 is produced with a different process from the actual zero knowledge protocol (since the prover just takes a random isomorphic copy of $G_1$), but since we know $G_1, G_2$ are isomorphic, the distribution over $H$ is the same in both cases.

Now when we invoke the verifier subroutine, we can’t know what the verifier will choose. But the point is that regardless of what the verifier does, there’s still a 50% chance it will pick $k=i$ since we chose $i$ randomly and independently of the definition of $V$.

If you ignore that we need to prove a few probability-theory facts (like that the choice of $H$ really is uniformly random), then this completes the proof.

$\square$

## Statistical and computational zero-knowledge

While perfect zero-knowledge is the strongest form of zero-knowledge, it’s believed to be too strong. A more tractable definition is called statistical zero-knowledge, and for this we relax the requirement that the message distributions are equal to the requirement that they’re “very similar.” Informally, by “very similar” I mean that if the output strings have length $m$, then the sum of the differences of the probabilities is “basically” exponentially small in $m$.

More formally, if $D_1, D_2$ are two distributions over $\{0,1\}^m$, then we say they’re statistically close if the following distance function vanishes (asymptotically in $m$) faster than $1/m^c$ for any $c > 0$.

$\displaystyle d(D_1, D_2) = \sum_{x \in \{0,1\}^m} \left | \Pr_{y \sim D_1}[y=x] - \Pr_{y \sim D_2}[y=x] \right |$

This distance function just sums, for every outcome, the difference between the probabilities of those outcomes. In other words, being statistically close means that the two distributions can’t disagree on any particular input too much, nor can they disagree on most the outputs more than a tiny bit.

In little-o notation, statistically close means $d(D_1, D_2) = o(1/m^c)$ for every constant $c > 0$. It could be like $2^{-m}$ or something vanishing much slower like $1/m^{\log \log m}$. A function of $m$ which has this property is called negligible, and it’s the standard choice of a security guarantee for cryptography.

Definition: Using the same notation as perfect zero-knowledge, an interactive proof is said to have statistical zero-knowledge if there is a simulator such that for every input $x \in L$, the distributions of $M(P,V)(x)$ and $S(x)$ are statistically close.

Interestingly, for this definition and the next (computational zero-knowledge), giving the simulator the power to output KAPUT with probability 1/2 doesn’t add any power. It turns out that any KAPUTful simulator can be transformed into a KAPUTless simulator with polynomial overhead. A sketch of how this works: run the simulator until either it does not KAPUT, or else you’ve tried $n^2$ times with all KAPUTs. The latter event happens with exponentially small probability, so in that case the KAPUTless simulator outputs a uniformly random string (it’s a “don’t care”). This maintains statistical closeness (and, as we’ll see, computational indistinguishability).

Finally, the weakest form of zero-knowledge isn’t so much a property of the two message distributions, but a property of a computationally limited algorithm trying to distinguish between them. We model this by having a probabilistic polynomial-time algorithm $A$ which accepts as input the ability to draw from a distribution, and $A$ produces as output a single bit. We interpret this bit as $A$‘s claim that the two distributions are “different.$A$‘s goal is to answer correctly. If $A$ can do that with non-negligible probability, then $A$ wins and “distinguishes” between the two input distributions. So we’ll call $A$ the distinguisher.  Note that $A$ can do things like draw a $\textup{poly}(m)$ number of samples and do any polynomial-time computation with those samples.

Definition: An interactive proof $M(P,V)$ is said to have computational zero-knowledge if there is a simulator $S$ such that for every input $x$, and for every probabilistic polynomial-time algorithm $A$, the following probability is negligible.

$\displaystyle \Pr[A(M(P,V)(x)) \neq A(S(x))]$

In words, $A$ fails to distinguish between $M(P,V)(x)$ and $S(x)$ with a significant (non-negligible) edge over random guessing. Another way to say it is that the distributions $A(M(P,V)(x))$ and $A(S(x))$ are statistically close. So the simulator can’t necessarily produce a statistically close distribution on messages, but the simulator can produce a distribution that fools a computationally limited observer.

If general, two distributions $D_1, D_2$ are called computationally indistinguishable if they don’t have a distinguisher. I.e., no polynomial-time adversary can distinguish between them with non-negligible probability, in the same way we described above.

So our landscape consists of three classes of problems nested as follows:

Perfect ZK $\subset$ statistical ZK $\subset$ Computational ZK

An interesting question we’ll discuss by towards the end of this post is whether these classes of problems are actually different.

## Back to 3-coloring

This computational distinguishing property shows up all over cryptography. The reason is simple: almost all cryptographic hardness assumptions can be rephrased as the computational indistinguishability properties. The hardness of one-way functions, predicting pseudorandom generators, all of it is the inability of a polynomial-time adversary to solve a problem.

From this perspective, we can already see why it should be obvious that the zero-knowledge proof for 3-coloring has computational zero knowledge: we used crypto to commit to a coloring, and revealed the secret key later. If the verifier could break the zero-knowledge aspect, then they could defeat the underlying cryptographic primitive. The formal proof of computational zero-knowledge is a drawn-out, detail-oriented quest to entrench this idea in formalism and fit the simulator definition.

Nevertheless, it’s interesting to see where precisely the assumption makes its way into the simulator. Reminder of the protocol:

1. The prover has a 3-coloring, and publicly commits to a random permutation of that coloring, sends it to the verifier.
2. The verifier picks an edge uniformly at random, sends it to the prover.
3. The prover reveals the colors for that edge, sends them to the verifier.
4. The verifier checks that the revealed colors were indeed the ones committed to, and accepts if they are different colors.

Here’s the graph 3-coloring simulator, which involves some KAPUTing:

1. Pick $n$ colors $c_1, \dots, c_n \in \{ 1,2,3 \}$ uniformly at random. Commit to these as the “coloring.”
2. Invoke the verifier with the committed “coloring” as input, and receive an edge $(i,j)$ as output.
3. If $c_i = c_j$, KAPUT, else, run the verifier as a subroutine with the true $c_i, c_j$.

Note that if an honest verifier picks the edge randomly, then it will be properly colored with probability $2/3$, and the simulator wins.

A dishonest verifier is trickier, because it gets access to the entire committed coloring. A priori there might be a devious way to select a bad edge. So let’s suppose for the sake of contradiction that the verifier has a method for choosing an improperly colored edge. While it takes a few steps to get here (and a lot more detail), in the end this means that the verifier can determine, with probability better at least $\frac{1}{3} + \frac{1}{n^c}$ for some constant $c$, the difference between a commitment of the bit zero and a commitment of the bit one.

In other words, we can use a verifier that breaks zero-knowledge of this protocol as a bit-commitment-scheme breaker! But the verifier is a probabilistic polynomial-time algorithm, which contradicts the security assumption of a bit-commitment scheme.

Actually, I don’t think I ever formally said what the security assumption for a bit commitment scheme is, since in the previous posts I hadn’t provided the definition of computational indistinguishability. So here it is: a bit-commitment scheme, which is defined by a one-way permutation $G$ with a hard-core predicate $f$, maps $b \mapsto (G(x), f(x) \oplus b)$ and has the property that the distributions committing to zero and one are computationally indistinguishable. That is, the following two distributions:

$\textup{commit}_n(0) = \{ (G(x), f(x) \oplus 0): x \in \{ 0,1 \}^n \}$

$\textup{commit}_n(1) = \{ (G(x), f(x) \oplus 1): x \in \{ 0,1 \}^n \}$

So the verifier can pick a bad edge with probability as most $1/3 + \textup{negligible}$, which is far less than 1/2. Then you can use the trick we showed earlier to get a good enough simulator that never KAPUTs. And that proves computational zero-knowledge.

## Theorems and limitations

Here’s an interesting theorem:

Theorem: Any zero-knowledge protocol which doesn’t use randomization (on both sides) can be solved in randomized polynomial time (is in BPP).

So the random choices made by the prover/verifier are actually essential to the novelty of this concept.

I haven’t yet said anything about statistical zero-knowledge beyond the definition. It turns out there are a lot of interesting problems with statistical zero-knowledge proofs. I should really expand the discussion of statistical zero knowledge into its own post, but I want to wrap up this series and explore some other projects.

One such problem is pretty meta: the problem of telling whether two distributions are statistically close. In fact, this problem is complete for the class of problems with statistical zero-knowledge proofs, in the same sense that 3-coloring is NP-complete; any problem which has a statistical zero-knowledge proof can be reduced to a question about statistical closeness of two specific distributions. The class of problems with statistical zero-knowledge proofs is usually shortened to SZK (along with PZK for perfect zero-knowledge and CZK for computational zero-knowledge).

In addition to being interesting from a theoretical perspective, it’s a very practical question to study: if you can sample from a distribution efficiently, which properties of that distribution can you estimate efficiently? The completeness result says that the inherent complexity of this class of problems is captured by statistical zero-knowledge proofs.

As a theoretical object of study, SZK has massive advantages over perfect/computational zero-knowledge. In particular,

Theorem: Every interactive proof that has statistical zero-knowledge against an honest verifier can be transformed into a statistical zero-knowledge proof against an arbitrary adversary.

So the entire “gotcha” content of this post, which we had to worry about with perfect zero-knowledge, doesn’t exist for statistical zero-knowledge. It still exists for computational zero-knowledge, as far as I know, because it’s not know that every computational zero-knowledge proof also has statistical zero-knowledge (a stronger property).

On another front, people are relatively convinced that perfect zero-knowledge is too restrictive to be useful. This comes from the following theorem due to Fortnow

Theorem: If an NP-complete problem has a statistical zero-knowledge proof, then the polynomial hierarchy collapses.

Without going into detail about what the polynomial hierarchy is, suffice it to say that most people believe it’s unlikely. And because this means that SZK probably does not contain all of NP, we can ask natural followup questions like, can problems in SZK be solved by polynomial-time quantum algorithms? It’s also open whether these can solve NP-complete problems.

Statistical zero-knowledge can be specialized even further.

Theorem: Every problem with a statistical zero-knowledge proof has a statistical zero-knowledge proof in which the only messages the verifier sends are random coin flips.

This is a so-called “public coin” protocol, which makes zero-knowledge proofs much easier to reason about.

For further reading, see this introduction by Salil Vadhan, arguably the world’s leading expert on the topic. Particularly interesting is sections 3-5 in which Vadhan describes more detail about how zero-knowledge has been used as an “interface” for techniques to pass between complexity theory and cryptography.

For further reading with on the excruciating details of the definitions and proofs presented in this series, see Oded Goldreich’s Foundations of Cryptography (Volume I, Chapter 4). I have mixed feelings about this book because, despite the fact that I’ve found it enlightening, the book somehow manages to be simultaneously dense and scattered. It leaves no snag unsmoothed, for better or worse.

Until next time!

# Book mailing list

I’m launching an email list for my book-in-progress, which is tentatively titled, “A Programmer’s Introduction to Mathematics.”

This will probably end up being a monthly or once-every-two-months newsletter with my progress updates. I’ll probably also use this email list to send out sneak peeks of certain chapters and ask for feedback. A lot about the book is still up in the air, but now that the excitement of my PhD defense has passed, I can focus deeply on the book.

A short description of the intended audience (from the comments on this post)

My target audience is a programmer who has gotten through the standard CS education, but never felt they deeply understood math. Think of it as a step up from Kalid’s Better Explained blog (I won’t explain the basics of exponents or trigonometry), but a big step below the hard posts on this blog. I’ll emphasize connections and analogies between math and programming. Think of it as a coherent, linear version of the introductory posts on this blog with cool and nontrivial applications. I will also focus on building the skills and mindset that allows one to pick up other math books and continue learning.

More than anything, I want to keep this blog free of too much non-math, so this will be my last plug for the book until the day it’s released.

A mock cover. Look closely and you’ll see it’s got 95 pages so far.

# Concrete Examples of Quantum Gates

So far in this series we’ve seen a lot of motivation and defined basic ideas of what a quantum circuit is. But on rereading my posts, I think we would all benefit from some concreteness.

## “Local” operations

So by now we’ve understood that quantum circuits consist of a sequence of gates $A_1, \dots, A_k$, where each $A_i$ is an 8-by-8 matrix that operates “locally” on some choice of three (or fewer) qubits. And in your head you imagine starting with some state vector $v$ and applying each $A_i$ locally to its three qubits until the end when you measure the state and get some classical output.

But the point I want to make is that $A_i$ actually changes the whole state vector $v$, because the three qubits it acts “locally” on are part of the entire basis. Here’s an example. Suppose we have three qubits and they’re in the state

$\displaystyle v = \frac{1}{\sqrt{14}} (e_{001} + 2e_{011} - 3e_{101})$

Recall we abbreviate basis states by subscripting them by binary strings, so $e_{011} = e_0 \otimes e_1 \otimes e_1$, and a valid state is any unit vector over the $2^3 = 8$ possible basis elements. As a vector, this state is $\frac{1}{\sqrt{14}} (0,1,0,2,0,-3,0,0)$

Say we apply the gate $A$ that swaps the first and third qubits. “Locally” this gate has the following matrix:

$\displaystyle \begin{pmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{pmatrix}$

where we index the rows and columns by the relevant strings in lexicographic order: 00, 01, 10, 11. So this operation leaves $e_{00}$ and $e_{11}$ the same while swapping the other two. However, as an operation on three qubits the operation looks quite different. And it’s sort of hard to describe a general way to write it down as a matrix because of the choice of indices. There are three different perspectives.

Perspective 1: if the qubits being operated on are sequential (like, the third, fourth, and fifth qubits), then we can write the matrix as $I_{2^{a}} \otimes A \otimes I_{2^{b}}$ where a tensor product of matrices is the Kronecker product and $a + b + \log \textup{dim}(A) = n$ (the number of qubits adds up). Then the final operation looks like a “tiled product” of identity matrices by $A$, but it’s a pain to write out. Let me hurt my self for your sake, dear reader.

And each copy of $A \otimes I_{2^b}$ looks like

That’s a mess, but if you write it out for our example of swapping the first and third qubits of a three-qubit register you get the following:

And this makes sense: the gate changes any entry of the state vector that has values for the first and third qubit that are different. This is what happens to our state:

$\displaystyle v = \frac{1}{\sqrt{14}} (0,1,0,2,0,-3,0,0) \mapsto \frac{1}{\sqrt{14}} (0,0,0,0,1,-3,2,0)$

Perspective 2: just assume every operation works on the first three qubits, and wrap each operation $A$ in between an operation that swaps the first three qubits with the desired three. So like $BAB$ for $B$ a swap operation. Then the matrix form looks a bit simpler, and it just means we permute the columns of the matrix form we gave above so that it just has the form $A \otimes I_a$. This allows one to retain a shred of sanity when trying to envision the matrix for an operation that acts on three qubits that are not sequential. The downside is that to actually use this perspective in an analysis you have to carry around the extra baggage of these permutation matrices. So one might use this as a simplifying assumption (a “without loss of generality” statement).

Perspective 3: ignore matrices and write things down in a summation form. So if $\sigma$ is the permutation that swaps 1 and 3 and leaves the other indices unchanged, we can write the general operation on a state $v = \sum_{x \in \{ 0,1 \}^n } a_x e_{x}$ as $Av = \sum_{x \in \{ 0,1 \}^n} a_x e_{\sigma(x)}$.

The third option is probably the nicest way to do things, but it’s important to keep the matrix view in mind for many reasons. Just one quick reason: “errors” in quantum gates (that are meant to approximately compute something) compound linearly in the number of gates because the operations are linear. This is a key reason that allows one to design quantum analogues of error correcting codes.

So we’ve established that the basic (atomic) quantum gates are “local” in the sense that they operate on a fixed number of qubits, but they are not local in the sense that they can screw up the entire state vector.

## A side note on the meaning of “local”

When I was chugging through learning this stuff (and I still have far to go), I wanted to come up with an alternate characterization of the word “local” so that I would feel better about using the word “local.” Mathematicians are as passionate about word choice as programmers are about text editors. In particular, for a long time I was ignorantly convinced that quantum gates that act on a small number of qubits don’t affect the marginal distribution of measurement outcomes for other qubits. That is, I thought that if $A$ acts on qubits 1,2,3, then $Av$ and $v$ have the same probability of a measurement producing a 1 in index 4, 5, etc, conditioned on fixing a measurement outcome for qubits 1,2,3. In notation, if $x$ is a random variable whose values are binary strings and $v$ is a state vector, I’ll call $x \sim v$ the random process of measuring a state vector $v$ and getting a string $x$, then my claim was that the following was true for every $b_1, b_2, b_3 \in \{0,1\}$ and every $1 \leq i \leq n$:

\displaystyle \begin{aligned}\Pr_{x \sim v}&[x_i = 1 \mid x_1 = b_1, x_2 = b_2, x_3 = b_3] = \\ \Pr_{x \sim Av}&[x_i = 1 \mid x_1 = b_1, x_2 = b_2, x_3 = b_3] \end{aligned}

You could try to prove this, and you would fail because it’s false. In fact, it’s even false if $A$ acts on only a single qubit! Because it’s so tedious to write out all of the notation, I decided to write a program to illustrate the counterexample. (The most brazenly dedicated readers will try to prove this false fact and identify where the proof fails.)

import numpy

H = (1/(2**0.5)) * numpy.array([[1,1], [1,-1]])
I = numpy.identity(4)
A = numpy.kron(H,I)


Here $H$ is the 2 by 2 Hadamard matrix, which operates on a single qubit and maps $e_0 \mapsto \frac{e_0 + e_1}{\sqrt{2}}$, and $e_1 \mapsto \frac{e_0 - e_1}{\sqrt{2}}$. This matrix is famous for many reasons, but one simple use as a quantum gate is to generate uniform random coin flips. In particular, measuring $He_0$ outputs 1 and 0 with equal probability.

So in the code sample above, $A$ is the mapping which applies the Hadamard operation to the first qubit and leaves the other qubits alone.

Then we compute some arbitrary input state vector $w$

def normalize(z):
return (1.0 / (sum(abs(z)**2) ** 0.5)) * z

v = numpy.arange(1,9)
w = normalize(v)


And now we write a function to compute the probability of some query conditioned on some fixed bits. We simply sum up the square norms of all of the relevant indices in the state vector.

def condProb(state, query={}, fixed={}):
num = 0
denom = 0
dim = int(math.log2(len(state)))

for x in itertools.product([0,1], repeat=dim):
if any(x[index] != b for (index,b) in fixed.items()):
continue

i = sum(d << i for (i,d) in enumerate(reversed(x)))
denom += abs(state[i])**2
if all(x[index] == b for (index, b) in query.items()):
num += abs(state[i]) ** 2

if num == 0:
return 0

return num / denom


So if the query is query = {1:0} and the fixed thing is fixed = {0:0}, then this will compute the probability that the measurement results in the second qubit being zero conditioned on the first qubit also being zero.

And the result:

Aw = A.dot(w)
query = {1:0}
fixed = {0:0}
print((condProb(w, query, fixed), condProb(Aw, query, fixed)))
# (0.16666666666666666, 0.29069767441860467)


So they are not equal in general.

Also, in general we won’t work explicitly with full quantum gate matrices, since for $n$ qubits the have size $2^{2n}$ which is big. But for finding counterexamples to guesses and false intuition, it’s a great tool.

## Some important gates on 1-3 qubits

Let’s close this post with concrete examples of quantum gates. Based on the above discussion, we can write out the 2 x 2 or 4 x 4 matrix form of the operation and understand that it can apply to any two qubits in the state of a quantum program. Gates are most interesting when they’re operating on entangled qubits, and that will come out when we visit our first quantum algorithm next time, but for now we will just discuss at a naive level how they operate on the basis vectors.

Let $H$ be the following 2 by 2 matrix, which operates on a single qubit and maps $e_0 \mapsto \frac{e_0 + e_1}{\sqrt{2}}$, and $e_1 \mapsto \frac{e_0 - e_1}{\sqrt{2}}$.

$\displaystyle H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$

One can use $H$ to generate uniform random coin flips. In particular, measuring $He_0$ outputs 1 and 0 with equal probability.

### Quantum NOT gate:

Let $X$ be the 2 x 2 matrix formed by swapping the columns of the identity matrix.

$\displaystyle X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

This gate is often called the “Pauli-X” gate by physicists. This matrix is far too simple to be named after a person, and I can only imagine it is still named after a person for the layer of obfuscation that so often makes people feel smarter (same goes for the Pauli-Y and Pauli-Z gates, but we’ll get to those when we need them).

If we’re thinking of $e_0$ as the boolean value “false” and $e_1$ as the boolean value “true”, then the quantum NOT gate simply swaps those two states. In particular, note that composing a Hadamard and a quantum NOT gate can have interesting effects: $XH(e_0) = H(e_0)$, but $XH(e_1) \neq H(e_1)$. In the second case, the minus sign is the culprit. Which brings us to…

### Phase shift gate:

Given an angle $\theta$, we can “shift the phase” of one qubit by an angle of $\theta$ using the 2 x 2 matrix $R_{\theta}$.

$\displaystyle R_{\theta} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i \theta} \end{pmatrix}$

“Phase” is a term physicists like to use for angles. Since the coefficients of a quantum state vector are complex numbers, and since complex numbers can be thought of geometrically as vectors with direction and magnitude, it makes sense to “rotate” the coefficient of a single qubit. So $R_{\theta}$ does nothing to $e_0$ and it rotates the coefficient of $e_1$ by an angle of $\theta$.

Continuing in our theme of concreteness, if I have the state vector $v = \frac{1}{\sqrt{204}} (1,2,3,4,5,6,7,8)$ and I apply a rotation of $pi$ to the second qubit, then my operation is the matrix $I_2 \otimes R_{\pi} \otimes I_2$ which maps $e_{i0k} \mapsto e_{i0k}$ and $e_{i1k} \mapsto -e_{i1k}$. That would map the state $v$ to $(1,2,-3,-4,5,6,-7,-8)$.

If we instead used the rotation by $\pi/2$ we would get the output state $(1,2,3i, 4i, 5, 6, 7i, 8i)$.

### Quantum AND/OR gate:

In the last post in this series we gave the quantum AND gate and left the quantum OR gate as an exercise. Rather than write out the matrix again, let me remind you of this gate using a description of the effect on the basis $e_{ijk}$ where $i,j,k \in \{ 0,1 \}$. Recall that we need three qubits in order to make the operation reversible (which is a consequence of all unitary gates being unitary matrices). Some notation: $\oplus$ is the XOR of two bits, and $\wedge$ is AND, $\vee$ is OR. The quantum AND gate maps

$\displaystyle e_{ijk} \mapsto e_{ij(k \oplus (i \wedge j))}$

In words, the third coordinate is XORed with the AND of the first two coordinates. We think of the third coordinate as a “scratchwork” qubit which is maybe prepared ahead of time to be in state zero.

Simiarly, the quantum OR gate maps $e_{ijk} \mapsto e_{ij(k \oplus (i \vee j))}$. As we saw last time these combined with the quantum NOT gate (and some modest number of scratchwork qubits) allows quantum circuits to simulate any classical circuit.

### Controlled-* gate:

The last example in this post is a meta-gate that represents a conditional branching. If we’re given a gate $A$ acting on $k$ qubits, then we define the controlled-A to be an operation which acts on $k+1$ qubits. Let’s call the added qubit “qubit zero.” Then controlled-A does nothing if qubit zero is in state 0, and applies $A$ if qubit zero is in state 1. Qubit zero is generally called the “control qubit.”

The matrix representing this operation decomposes into blocks if the control qubit is actually the first qubit (or you rearrange).

$\displaystyle \begin{pmatrix} I_{2^{k-1}} & 0 \\ 0 & A \end{pmatrix}$

A common example of this is the controlled-NOT gate, often abbreviated CNOT, and it has the matrix

$\displaystyle \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$

## Looking forward

Okay let’s take a step back and evaluate our life choices. So far we’ve spent a few hours of our time motivating quantum computing, explaining the details of qubits and quantum circuits, and seeing examples of concrete quantum gates and studying measurement. I’ve hopefully hammered into your head the notion that quantum states which aren’t pure tensors (i.e. entangled) are where the “weirdness” of quantum computing comes from. But we haven’t seen any examples of quantum algorithms yet!

Next time we’ll see our first example of an algorithm that is genuinely quantum. We won’t tackle factoring yet, but we will see quantum “weirdness” in action.

Until then!