Problem: Determine if two polynomial expressions represent the same function. Specifically, if $f(x_1, \dots, x_n)$ and $g(x_1, \dots, x_n)$ are polynomials with inputs, outputs, and coefficients in a field $F$, where $|F|$ is sufficiently large, then the problem is to determine if $f(\mathbf{x}) = g(\mathbf{x})$ for every $\mathbf{x} \in F^n$, in time polynomial in the number of bits required to write down $f$ and $g$.
Solution: Let $d$ be the maximum degree of all terms in $f$ and $g$. Choose a finite set $S \subset F$ with $|S| > 2d$. Repeat the following process 100 times:
- Choose inputs $z_1, z_2, \dots, z_n \in S$ uniformly at random.
- Check if $f(z_1, \dots, z_n) = g(z_1, \dots, z_n)$.

If every single time the two polynomials agree, accept the claim that they are equal. If they disagree on any input, reject. You will be wrong with probability at most $2^{-100}$.
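To make the procedure concrete, here is a minimal sketch in Python, assuming $f$ and $g$ can be evaluated as black boxes over a prime field $\mathbb{F}_p$ (the function name and interface are illustrative, not from any particular library):

```python
import random

def probably_equal(f, g, num_vars, max_degree, p, trials=100):
    """Randomized identity test for two black-box polynomials over F_p.

    f, g: callables taking a list of num_vars field elements (ints mod p).
    max_degree: an upper bound on the total degree of f and g.
    p: a prime with p > 2 * max_degree, so each trial errs with probability
       at most max_degree / p < 1/2 when f != g (by the Schwartz-Zippel lemma).
    """
    assert p > 2 * max_degree, "the sample set must have more than 2d elements"
    for _ in range(trials):
        z = [random.randrange(p) for _ in range(num_vars)]
        if f(z) % p != g(z) % p:
            return False  # a single disagreement proves f != g
    return True  # wrong with probability at most 2**(-trials)

# Example: (x + y)^2 versus x^2 + 2xy + y^2 over F_p.
p = 2**31 - 1  # a Mersenne prime, comfortably larger than 2 * degree
f = lambda z: (z[0] + z[1]) ** 2
g = lambda z: z[0] ** 2 + 2 * z[0] * z[1] + z[1] ** 2
print(probably_equal(f, g, num_vars=2, max_degree=2, p=p))  # True
```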
Discussion: At first glance it’s unclear why this problem is hard.
If you have two representations of polynomials $f$ and $g$, say expressed in algebraic notation, why can’t you just do the algebra to convert them both into the same format, and see if they’re equal?
Unfortunately, that conversion can take exponential time. For example, take the polynomial $f(x_1, \dots, x_n) = (x_1 + 1)(x_2 + 1) \cdots (x_n + 1)$. Though it only takes a few bits to write down in factored form, expressing it in a “canonical form,” such as the fully expanded monomial form $\sum_{T \subseteq \{1, \dots, n\}} \prod_{i \in T} x_i$, requires $2^n$ terms, exponentially more than the original representation. In general, it’s unknown how to algorithmically transform polynomials into a “canonical form” (so that they can be compared) in subexponential time.
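A quick way to see the blow-up, assuming the sympy library is available, is to expand the factored form symbolically and count the monomials:

```python
import sympy

# Count the monomials in the expansion of (x_1 + 1)(x_2 + 1)...(x_n + 1).
# The factored form has n small factors, but the expanded "canonical" monomial
# form has 2^n terms: one for every subset of the variables.
n = 12
xs = sympy.symbols(f"x1:{n + 1}")  # x1, x2, ..., x12
factored = 1
for x in xs:
    factored *= (x + 1)
expanded = sympy.expand(factored)
print(len(expanded.args))  # 4096 == 2**12
```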
Instead, the best we know how to do is treat the polynomials as black boxes and plug values into them.
Indeed, for single-variable polynomials it’s well known that a nonzero polynomial of degree $d$ has at most $d$ roots. A similar result is true for polynomials with many variables, and so we can apply that result to the polynomial $f - g$ to determine if $f = g$. This theorem is so important (and easy to prove) that it deserves the name of lemma.
The Schwartz-Zippel lemma. Let $P$ be a nonzero polynomial of total degree $d \geq 0$ over a field $F$. Let $S$ be a finite subset of $F$, and let $r_1, r_2, \dots, r_n$ be chosen uniformly at random from $S$. Then the probability that $P(r_1, r_2, \dots, r_n) = 0$ is at most $d / |S|$.
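As a quick sanity check on the bound, here is a brute-force count for $P(x, y) = xy$, which has total degree $d = 2$, with $S = \{0, 1, \dots, 9\}$:

```python
# Exhaustively compute the probability that P(x, y) = x * y vanishes when
# x and y are drawn uniformly from S = {0, ..., 9}.
S = range(10)
zeros = sum(1 for x in S for y in S if x * y == 0)
print(zeros / len(S) ** 2)  # 0.19, within the Schwartz-Zippel bound d/|S| = 0.2
```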
Proof. By induction on the number of variables $n$. For the case of $n = 1$, it’s the usual fact that a nonzero single-variable polynomial of degree $d$ can have at most $d$ roots. Now for the inductive step, assume this is true for all polynomials with $n - 1$ variables, and we will prove it for $n$ variables. Write $P$ as a polynomial in the variable $x_1$, whose coefficients are other polynomials:

$$P(x_1, \dots, x_n) = \sum_{i=0}^{d} x_1^i \, P_i(x_2, \dots, x_n).$$

Here we’ve grouped by the powers of $x_1$, so that the $P_i$ are the coefficients of each $x_1^i$. This is useful because we’ll be able to apply the inductive hypothesis to one of the $P_i$’s, which have fewer variables.
Indeed, we claim there must be some nonzero $P_i$ with $i \geq 1$, unless we are in an easy case. Clearly, since $P$ is not the zero polynomial, some $P_i$ must be nonzero. If the only nonzero $P_i$ is $P_0$, then we’re done: $P = P_0$ doesn’t depend on $x_1$ at all, so it is a nonzero polynomial in $n - 1$ variables of degree at most $d$, and the inductive hypothesis already bounds its probability of vanishing by $d / |S|$. Otherwise, take the largest $i \geq 1$ for which $P_i$ is nonzero. The degree of $P_i$ is at most $d - i$, because the term $x_1^i P_i$ has total degree at most $d$.
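For a concrete check of that degree bound, take $P(x_1, x_2) = x_1^2 x_2 + x_1 x_2^2 + x_2$, which has total degree $d = 3$. Grouping by powers of $x_1$ gives

$$P = \underbrace{x_2}_{P_0} + x_1 \underbrace{x_2^2}_{P_1} + x_1^2 \underbrace{x_2}_{P_2},$$

and the largest nonzero coefficient $P_2$ has degree $1 = d - 2$, as claimed.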
By the inductive hypothesis, if we choose $r_2, \dots, r_n$ uniformly at random from $S$ and plug them into $P_i$, we get zero with probability at most $(d - i) / |S|$. The crucial part is that if this coefficient evaluates to something nonzero, then the entire polynomial $P(x_1, r_2, \dots, r_n)$, viewed as a polynomial in $x_1$ alone, is a nonzero polynomial. This is true even if an unlucky later choice of $r_1$ causes the full evaluation $P(r_1, r_2, \dots, r_n) = 0$.
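For example, with $P(x_1, x_2) = x_1 x_2 - x_1$ and the draw $r_2 = 2$, the restricted polynomial

$$P(x_1, 2) = 2 x_1 - x_1 = x_1$$

is a nonzero polynomial in $x_1$, even though the unlucky further draw $r_1 = 0$ evaluates it to zero.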
To think about it a different way, imagine we’re evaluating the polynomial in phases. In the first phase, we pick the $r_2, \dots, r_n$. We could also pick $r_1$ independently but not reveal what it is, for the sake of this story. Then we plug in the $r_2, \dots, r_n$, and the result is a one-variable polynomial in $x_1$ whose leading coefficient is $P_i(r_2, \dots, r_n)$. The inductive hypothesis tells us that this one-variable polynomial is the zero polynomial with probability at most $(d - i) / |S|$. (It’s probably a smaller probability, since all the coefficients would have to be zero, but we’re just considering the leading one for the sake of generality and simplicity.)

If that leading coefficient is nonzero, the resulting one-variable polynomial has degree exactly $i$, so we can apply the single-variable fact to it as well, and the probability that it’s zero for a random choice of $r_1$ is at most $i / |S|$.
Finally, the probability of the bad event can be computed using basic probability algebra. Let $A$ be the event that, for the random inputs $r_2, \dots, r_n$, the leading coefficient $P_i(r_2, \dots, r_n)$ is zero, and let $B$ be the event that $P$ evaluates to zero for those $r_2, \dots, r_n$ and the additional random $r_1$.

Then

$$\Pr[B] = \Pr[B \mid A] \Pr[A] + \Pr[B \mid \bar{A}] \Pr[\bar{A}].$$

Note the two quantities above that we don’t know are $\Pr[B \mid A]$ and $\Pr[\bar{A}]$, so we’ll bound them from above by 1. The rest of the quantities add up to exactly what we want, and so

$$\Pr[B] \leq \Pr[A] + \Pr[B \mid \bar{A}] \leq \frac{d - i}{|S|} + \frac{i}{|S|} = \frac{d}{|S|},$$

which proves the theorem.
While this theorem is almost trivial to prove (it’s elementary induction, and the obvious kind), it can be used to solve polynomial identity testing: apply it to $f - g$, and with $|S| > 2d$ each of the 100 trials in the algorithm above errs with probability less than $1/2$, giving the claimed $2^{-100}$ bound. It also yields randomized algorithms for finding perfect matchings in graphs (sketched below) and for testing numbers for primality.
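The matching application deserves a sketch. For a bipartite graph, build the Edmonds matrix, whose $(i, j)$ entry is a distinct variable if $(i, j)$ is an edge and zero otherwise; its determinant is a nonzero polynomial precisely when the graph has a perfect matching, so a random evaluation over a large prime field detects matchings with the lemma’s error bound. A minimal sketch, assuming sympy for exact integer determinants (the function name and interface are illustrative):

```python
import random
import sympy

def maybe_has_perfect_matching(n, edges, p=2**61 - 1, trials=20):
    """Randomized test for a perfect matching in a bipartite graph with left
    and right vertex sets {0, ..., n-1} and the given edge list.

    Entry (i, j) of the Edmonds matrix gets an independent random element of
    F_p if (i, j) is an edge, and 0 otherwise. Symbolically, its determinant
    is a nonzero polynomial of degree n exactly when a perfect matching
    exists, so by Schwartz-Zippel a random evaluation is zero with
    probability at most n / (p - 1).
    """
    for _ in range(trials):
        M = sympy.zeros(n, n)
        for (i, j) in edges:
            M[i, j] = random.randrange(1, p)
        if M.det() % p != 0:
            return True   # a nonzero evaluation certifies a nonzero polynomial
    return False          # all trials were zero: probably no perfect matching

# A complete bipartite graph on 2 + 2 vertices clearly has a perfect matching.
print(maybe_has_perfect_matching(2, [(0, 0), (0, 1), (1, 0), (1, 1)]))  # True
# A graph whose second left vertex has no edges cannot have one.
print(maybe_has_perfect_matching(2, [(0, 0), (0, 1)]))                  # False
```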
But while the practical questions are largely settled (it’s hard to imagine a setting where you’d need faster primality testing than the existing randomized algorithms), the theory and philosophy of the result are much more interesting.
Indeed, checking two polynomials for equality has no known deterministic polynomial-time algorithm. It’s one of a small class of problems, like integer factoring and the discrete logarithm, which are not known to be solvable in deterministic polynomial time, but are also not known to be NP-hard, so there is still hope. The existence of this randomized algorithm increases that hope (integer factorization sure doesn’t have one!). And more generally, the fact that there are so few natural problems in this class makes one wonder whether randomness is actually beneficial at all. From a polynomial-time-defined-as-efficient perspective, can every problem efficiently solvable with access to random bits also be solved without such access? In the computational complexity lingo, does P = BPP? Many experts think the answer is yes.
I’m sure this is a dumb question but I’ll ask anyway.
If you have two quadratic functions, you can check for equality by testing 3 values, knowing that any 3 specific points uniquely describe a quadratic.
Why doesn’t this idea scale up with multivariable equations?
@atrix256 This is a fantastic question, and I’m embarrassed I didn’t see it coming and preemptively answer it in the post. (It also hints at some of the subtler issues of this post, which largely went unanswered as I wrote them)
The first problem is that when you have a multivariate function (even over R), it can have infinitely many roots. Take, for example, the function f(x,y) = xy. It has as its roots the two lines y=0, and x=0. So the usual bound on the number of roots before being forced to be the zero polynomial is simply not true. In fact, if you look closely we’re applying the single-variable rule in a modified guise in the proof (since we write the original polynomial as a “single variable” polynomial whose coefficients are polynomials in the other variables). Probability and counting are two sides of the same coin here.
A second problem is that, for finite fields of small size (like F_2), there are nonzero polynomials (not all coefficients are zero) which still *evaluate* to zero on all inputs! While the problem, as stated here, avoids this by letting the size of F be “sufficiently large,” it shows that when the field is not infinite, or not algebraically closed, strange things can happen. However, in this case you can solve the small field problem by taking any field extension of F which is sufficiently large to apply this theorem, and you get the same result. I didn’t want to assume too much background because that’s sort of a “trick” that isn’t at the heart of this proof. The problem is clear enough when you have F_p for a very large prime p.
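For a concrete instance over the smallest field, the nonzero polynomial x^2 + x over F_2 evaluates to zero on both inputs; a one-line check:

```python
# Over F_2, the nonzero polynomial x^2 + x vanishes at every field element,
# so equality as functions does not imply equality of coefficients when the
# field is too small relative to the degree.
p = 2
print([(x ** 2 + x) % p for x in range(p)])  # [0, 0]
```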
This conversation had me thinking – I wonder if multivariable equations might have roots expressed not as scalars, but as lines, curves, surfaces, hypersurfaces etc.
That led me to algebraic geometry (https://en.m.wikipedia.org/wiki/Algebraic_geometry) which I don’t yet know much of anything about, but looks like it’s relevant.
Makes me wonder then, if you could find roots as equations, you could maybe do that recursively (?) until you get to univariate equations which you could then test the usual way.
Probably flawed reasoning but it’ll be interesting to see how it’s flawed and why.
Thanks for this post, it’s an interesting topic!
Yes, I think this is actually what the proof in this post is doing, albeit indirectly. This is another perspective that is more direct: https://math.stackexchange.com/questions/367108/counting-roots-of-a-multivariate-polynomial-over-a-finite-field
You’re right that algebraic geometry is the field to look at. It’s also one of the crown jewels of pure mathematics. If you’re coming at this from a programming/computer science perspective, a fantastic book is Ideals, Varieties, and Algorithms.
Nice post. I didn’t know what you meant by largest $P_i$, until it was clear later that you meant largest $i$.
I am intrigued by the philosophy at the end. In principle, we could do without randoms, but computers have finite precision, and programmers have finite patience. I recently decided to resort to randoms to make completion of my project more plausible. In particular, I need to choose a plane that intersects previously chosen planes in a given pattern. The problem is the calculation of the intersections might produce nearly singular matrices. In principle, there might be an abstraction that solves that for me, but I don’t know what it is, and I probably would not understand it. In practice, it suffices for me to try several constrained random planes.