**Problem: **Determine if two polynomial expressions represent the same function. Specifically, if $ p(x_1, x_2, \dots, x_n)$ and $ q(x_1, x_2, \dots, x_n)$ are a polynomial with inputs, outputs and coefficients in a field $ F$, where $ |F|$ is sufficiently large, then the problem is to determine if $ p(\mathbf{x}) = q(\mathbf{x})$ for every $ x \in F$, in time polynomial in the number of bits required to write down $ p$ and $ q$.

**Solution:** Let $ d$ be the maximum degree of all terms in $ p, q$. Choose a finite set $ S \subset F$ with $ |S| > 2d$. Repeat the following process 100 times:

- Choose inputs $ z_1, z_2, \dots, z_n \in S$ uniformly at random.
- Check if $ p(z_1, \dots, z_n) = q(z_1, \dots, z_n)$.

If every single time the two polynomials agree, accept the claim that they are equal. If they disagree on any input, reject. You will be wrong with probability at most $ 2^{-100}$.

**Discussion: **At first glance it’s unclear why this problem is hard.

If you have two representations of polynomials $ p, q$, say expressed in algebraic notation, why can’t you just do the algebra to convert them both into the same format, and see if they’re equal?

Unfortunately, that conversion can take exponential time. For example, suppose you have a polynomial $ p(x) = (x+1)^{1000}$. Though it only takes a few bits to write down, expressing it in a “canonical form,” often in the monomial form $ a_0 + a_1x + \dots + a_d x^d$, would require exponentially many bits in the original representation. In general, it’s unknown how to algorithmically transform polynomials into a “canonical form” (so that they can be compared) in subexponential time.

Instead, the best we know how to do is treat the polynomials as black boxes and plug values into them.

Indeed, for single variable polynomials it’s well known that a nonzero degree $ d$ polynomial has at most $ d$ roots. A similar result is true for polynomials with many variables, and so we can apply that result to the polynomial $ p – q$ to determine if $ p = q$. This theorem is so important (and easy to prove) that it deserves the name of *lemma.*

**The Schwartz-Zippel lemma. **Let $ p$ be a nonzero polynomial of total degree $ d \geq 0$ over a field $ F$. Let $ S$ be a finite subset of $ F$ and let $ z_1, \dots, z_n$ be chosen uniformly at random from $ S$. The probability that $ p(z_1, \dots, z_n) = 0$ is at most $ d / |S|$.

*Proof. *By induction on the number of variables $ n$. For the case of $ n=1$, it’s the usual fact that a single-variable polynomial can have at most $ d$ roots. Now for the inductive step, assume this is true for all polynomials with $ n-1$ variables, and we will prove it for $ n$ variables. Write $ p$ as a polynomial in the variable $ x_1$, whose coefficients are other polynomials:

$ \displaystyle p(x_1, \dots, x_n) = \sum_{k=1}^d Q_k(x_2, \dots, x_n) x_1^k$

Here we’ve grouped $ p$ by the powers of $ x_1$, so that $ Q_i$ are the coefficients of each $ x_1^k$. This is useful because we’ll be able to apply the inductive hypothesis to one of the $ Q_i$’s, which have fewer variables.

Indeed, we claim there must be some $ Q_k$ which is nonzero for $ k > 0$. Clearly, since $ p$ is not the zero polynomial, some $ Q_k$ must be nonzero. If the only nonzero $ Q_k$ is $ Q_0$, then we’re done because $ p$ doesn’t depend on $ x_1$ at all. Otherwise, take the largest nonzero $ Q_k$. It’s true that the degree of $ Q_k$ is at most $ d-k$. This is true because the term $ x_1^k Q_k$ has degree at most $ d$.

By the inductive hypothesis, if we choose $ z_2, \dots, z_n$ and plug them into $ Q_k$, we get zero with probability at most $ \frac{d-k}{|S|}$. The crucial part is that if this *polynomial* coefficient is nonzero, then the entire polynomial $ p$ is nonzero. This is true even if an unlucky choice of $ x_1 = z_1$ causes the resulting evaluation $ p(z_1, \dots, z_n) \neq 0$.

To think about it a different way, imagine we’re evaluating the polynomial in phases. In the first phase, we pick the $ z_2, \dots, z_n$. We could also pick $ z_1$ independently but not reveal what it is, for the sake of this story. Then we plug in the $ z_2, \dots, z_n$, and the result is a one-variable polynomial whose largest coefficient is $ Q_k(z_1, \dots, z_n)$. The inductive hypothesis tells us that this one-variable polynomial is the zero polynomial with probability at most $ \frac{d-k}{|S|}$. (It’s probably a smaller probability, since *all* the coefficients have to be zero, but we’re just considering the largest one for the sake of generality and simplicity)

Indeed, the resulting polynomial after we plug in $ z_2, \dots, z_n$ has degree $ k$, so we can apply the inductive hypothesis to it as well, and the probability that it’s zero for a random choice of $ z_1$ is at most $ k / |S|$.

Finally, the probability that both occur can be computed using basic probability algebra. Let $ A$ be the event that, for these $ z_i$ inputs, $ Q_k$ is zero, and $ B$ the event that $ p$ is zero for the $ z_i$ and the additional $ z_1$.

Then $ \textup{Pr}[B] = \textup{Pr}[B \textup{ and } A] + \textup{Pr}[B \textup{ and } !A] = \textup{Pr}[B \mid A] \textup{Pr}[A] + \textup{Pr}[B \mid !A] \textup{Pr}[!A]$.

Note the two quantities above that we don’t know are $ \textup{Pr}[B \mid A]$ and $ \textup{Pr}[!A]$, so we’ll bound them from above by 1. The rest of the quantities add up to exactly what we want, and so

$ \displaystyle \textup{Pr}[B] \leq \frac{d-k}{|S|} + \frac{k}{|S|} = \frac{d}{|S|},$

which proves the theorem.

$ \square$

While this theorem is almost trivial to prove (it’s elementary induction, and the obvious kind), it can be used to solve polynomial identity testing, as well as finding perfect matchings in graphs and test numbers for primality.

But while the practical questions are largely solved–it’s hard to imagine a setting where you’d need faster primality testing than the existing randomized algorithms–the theory and philosophy of the result is much more interesting.

Indeed, checking two polynomials for equality has no known deterministic polynomial time algorithm. It’s one of a small class of problems, like integer factoring and the discrete logarithm, which are not known to be efficiently solvable in theory, but are also not known to be NP-hard, so there is still hope. The existence of this randomized algorithm increases hope (integer factorization sure doesn’t have one!). And more generally, the fact that there are so few natural problems in this class makes one wonder whether randomness is actually beneficial at all. From a polynomial-time-defined-as-efficient perspective, can every problem efficiently solvable with access to random bits also be solved without such access? In the computational complexity lingo, does P = BPP? Many experts think the answer is yes.

I’m sure this is a dumb question but I’ll ask anyway.

If you have two quadratic functions, you can check for equality by testing 3 values, knowing that any 3 specific points uniquely describe a quadratic.

Why doesn’t this idea scale up with multivariable equations?

@atrix256 This is a fantastic question, and I’m embarrassed I didn’t see it coming and preemptively answer it in the post. (It also hints at some of the subtler issues of this post, which largely went unanswered as I wrote them)

The first problem is that when you have a multivariate function (even over R), it can have infinitely many roots. Take, for example, the function f(x,y) = xy. It has as its roots the two lines y=0, and x=0. So the usual bound on the number of roots before being forced to be the zero polynomial is simply not true. In fact, if you look closely we’re applying the single-variable rule in a modified guise in the proof (since we write the original polynomial as a “single variable” polynomial whose coefficients are polynomials in the other variables). Probability and counting are two sides of the same coin here.

A second problem is that, for finite fields of small size (like F_2), there are nonzero polynomials (not all coefficients are zero) which still *evaluate* to zero on all inputs! While the problem, as stated here, avoids this by letting the size of F be “sufficiently large,” it shows that when the field is not infinite, or not algebraically closed, strange things can happen. However, in this case you can solve the small field problem by taking any field extension of F which is sufficiently large to apply this theorem, and you get the same result. I didn’t want to assume too much background because that’s sort of a “trick” that isn’t at the heart of this proof. The problem is clear enough when you have F_p for a very large prime p.

This conversation had me thinking – I wonder if multivariable equations might have roots expressed not as scalars, but as lines, curves, surfaces, hypersurfaces etc.

That led me to algebraic geometry (https://en.m.wikipedia.org/wiki/Algebraic_geometry) which I don’t yet know much of anything about, but looks like it’s relevant.

Makes me wonder then, if you could find roots as equations, you could maybe do that recursively (?) until you get to univariate equations which you could then test the usual way.

Probably flawed reasoning but it’ll be interesting to see how it’s flawed and why.

Thanks for this post, it’s an interesting topic!

Yes, I think this is actually what the proof in this post is doing, albeit indirectly. This is another perspective that is more direct: https://math.stackexchange.com/questions/367108/counting-roots-of-a-multivariate-polynomial-over-a-finite-field

You’re right that algebraic geometry is the field to look at. It’s also one of the crown jewels of pure mathematics. If you’re coming at this from a programming/computer science perspective, a fantastic book is called Ideals, Varieties, and Algorithms

Nice post. I didn’t know what you meant by largest , until it was clear later that you meant largest .

I am intrigued by the philosophy at the end. In principle, we could do without randoms, but computers have finite precision, and programmers have finite patience. I recently decided to resort to randoms to make completion of my project more plausible. In particular, I need to choose a plane that intersects previously chosen planes in a given pattern. The problem is the calculation of the intersections might produce nearly singular matrices. In principle, there might be an abstraction that solves that for me, but I don’t know what it is, and I probably would not understand it. In practice, it suffices for me to try several constrained random planes.