Learning a single-variable polynomial, or the power of adaptive queries

Problem: Alice chooses a secret polynomial $p(x)$ with nonnegative integer coefficients. Bob wants to discover this polynomial by querying Alice for the value of $p(x)$ for some integer $x$ of Bob’s choice. What is the minimal number of queries Bob needs to determine $p(x)$ exactly?

Solution: Two queries. The first is $p(1)$, and if we call $N = p(1) + 1$, then the second query is $p(N)$.

To someone who is familiar with polynomials, this may seem shocking, and I’ll explain why it works in a second. After all, it’s very easy to prove that if Bob gives Alice all of his queries at the same time (if the queries are not adaptive), then it’s impossible to discover what $p(x)$ is using fewer than $\textup{deg}(p) + 1$ queries. This is due to a fact called polynomial interpolation, which we’ve seen on this blog before in the context of secret sharing. Specifically, there is a unique single-variable degree $d$ polynomial passing through $d+1$ points (with distinct $x$-values). So if you knew the degree of $p$, you could determine it easily. But Bob doesn’t know the degree of the polynomial, and there’s no way he can figure it out without adaptive queries! Indeed, if Bob tries and gives a set of $d$ queries, Alice could have easily picked a polynomial of degree $d+1$. So it’s literally impossible to solve this problem without adaptive queries.

The lovely fact is that once you allow adaptiveness, the number of queries you need doesn’t even depend on the degree of the secret polynomial!

Okay let’s get to the solution. It was crucial that our polynomial had nonnegative integer coefficients, because we’re going to do a tiny bit of number theory. Let $p(x) = a_0 + a_1 x + \dots + a_d x^d$. First, note that $p(1)$ is exactly the sum of the coefficients $\sum_i a_i$, and in particular $p(1) + 1$ is larger than any single coefficient. So call this $N$, and query $p(N)$. This gives us a number $y_0$ of the form

$\displaystyle y_0 = a_0 + a_1N + a_2N^2 + \dots + a_dN^d$

And because $N$ is so big, we can compute $a_0$ easily by computing $y_0 \mod N$. Now set $y_1 = (y_0 - a_0) / N$, and this has the form $a_1 + a_2N + \dots + a_dN^{d-1}$. We can compute modulus again to get $a_1$, and repeat until we have all the coefficients. We’ll stop once we get a $y_i$ that is zero.

As a small technical note, this is a polynomial-time algorithm in the number of bits needed to write down $p(x)$. So this demonstrates the power of adaptive queries: we get from something which is uncomputable with any number of queries to something which is efficiently computable with a constant number of queries.

The obvious follow-up question is: can you come up with an efficient algorithm if we allow the coefficients to be negative integers?

21 thoughts on “Learning a single-variable polynomial, or the power of adaptive queries”

1. U

Holy ****! this is amazing.
Is there any conceivable way to generalize this to positive reals (or rationals with unknown denominator)?

Like

2. Ask for p(10^N) – then all coefficient in the polynomial will be visible inside the number p(10^N).

Like

• Sure, you can pick any base larger than p(1). It’s the same process to discover the coefficients for an algorithm that we don’t assume can “see” stuff.

Like

• Randy

Wouldn’t this only work if the coefficients were less than 10?

Like

• You can write numbers in any base you want. Base 16, for example, is called hexadecimal

Like

3. Allen Knutson

How about multivariate polynomials with natural-number coefficients?

Like

4. To answer your follow-up question: No, it’s impossible. Not only is it not efficient, there is no algorithm which always halts and returns the correct answer.
Let L be a finite list of integers, which we will first guess p( L_0 ), if it returns a 0, then we test p( L_1 ). If it doesn’t return 0, then we have some information about the polynomial, and we can use some other strategy which is irrelevant here (for soon to be obvious reasons.) We continue going down the list L until something returns non-zero, or we get to the end.
Now, lets construct p such that p(x)=(x – L_0 )(x – L_1 )(x – L_2 )* … *(x – L_last )
Trivially, for any L_i in L, p( L_i )=0
So there is no finite list of integers L which will for sure know the polynomial. By that lemma, there is no infinite list of L which at any point will definitely know p. Consider the case of p(x)=0, no guessing strategy will at any finite point know whether p(x)=0, or if p(x) is a polynomial as constructed above.
Since no guessing strategy will ever know for sure that p(x)=0 if all queries so far have returned 0, there is no guessing strategy which in a finite number of guesses can figure out all polynomials.

Like

• I’m not entirely convinced by this argument, and here’s why. In a problem like this one allows the queries to be adaptive, and so the following is a valid algorithm:

i = 1
while the result of my query is zero:
query p(i)
i = i+1


This would produce an infinite list of queries, but you’re guaranteed that not only will this loop terminate, because your adversary “chose” a polynomial in advance, but it will only take linear time in the size of the unknown polynomial! You can imagine that Alice has to commit to at least the degree of the polynomial in advance, but then she could pick the roots of the polynomial in this worst-case way. And then with this algorithm the next query I ask will give me a unique polynomial (d+1 points for a degree d polynomial), and I’ll win (though I don’t know for sure the polynomial has degree d, I’d have to find another way to determine that). In any case, this is where I feel your argument falls short, but it is exactly the right mindset for thinking about these questions.

Like

• Tom

It’s possible that I’m missing something, but couldn’t you use the above argument to prove that the problem is impossible as stated? The guessing strategy does matter.

Like

• No, again Scott’s argument assumes that the queries are not adaptive (hence the subtitle, “the power of adaptive queries”).

Like

5. Aren’t you answering the question for the maximal number of queries, instead of the minimal number of queries that Bob needs?

Like

• It is a worst case assumption on the inputs (the choice of polynomial), and I’m asking for the best algorithm in terms of query complexity. So it’s both.

Like

6. stephanwehner

I think your method gives the maximal number of queries that Bob needs, not the minimal number of queries.

Like

7. To prove an algorithm is correct, you must first provide the algorithm, and then prove for all polynomials that the algorithm terminates with the correct answer. My argument is that you can’t both prove the algorithm terminates and that it always provides the correct answer.
For any finite number of queries, we can create a polynomial P (as shown in my previous post) which will return 0 for all queries to it. This means that there is no limit to the number of queries the algorithm may need to make in order to correctly return the polynomial.
However, the polynomial may instead of P(x)=0, it always returns 0. For an algorithm to correctly determine that the polynomial it must at some point terminate and return P(x)=0.
There’s no way that an algorithm would be able to handle both cases. In the case of your provided algorithm, it will simply loop forever, always seeing that the answer to the query is 0. If at any point it “gave up” and returns the answer of P(x)=0, then there’s polynomials that it’s incorrectly labeling as such.

That said, looking back at your original wording, there may be some ambiguity which would potentially allow for this to be false (but not for the reasons you provided.) You said negative integers, does that allow for coefficients of 0? One could argue that the coefficients of any polynomial for things larger than the degree of the polynomial are 0, but ignoring that: You may say that P(x)=0 is illegal if you also say that there’s no 0 coefficients less than the degree, and that P(x)==0 is also illegal. On the other side, it’s easy to make my construction of the larger degree polynomials illegal, since it’s trivial to provide a list {say, -1, and 1} which produces a polynomial {P(x)=(x+1)(x-1)=x^2-1} which has 0 coefficients less than the degree. There may be ways to multiply additional arguments {in this case, (x+2), so that p(x)=(x + 1)(x – 1)(x + 2)=x^3+2 x^2-x-2} so that the polynomial remains valid under these very arbitrary rules, however I’m not sure where to even start to prove or disprove that.

Like

• @Scott, the problem states that Alice chooses a secret polynomial p(x) with nonnegative integer coefficients. Such polynomials are never zero for positive values of x. But Jeremy’s method queries at 1 and N = p(1) + 1, both stricly positive integers. I think your argument doesn’t take that into consideration.

Like

• Scott is replying to a different problem (when the polynomial is allowed to have negative coefficients). He’s trying to prove it’s impossible (he’s right), but his proof is just a little bit off.

Like

• I was not trying to prove my algorithm was correct. I was showing you why your proof was incorrect. In your original comment you argued “Suppose an algorithm queries T queries (where I haven’t decided yet what the polynomial is). Then I will build a polynomial with T roots, and pretend I was answering queries for that all along. Since there are lots of polynomials with more than T roots, no algorithm can ever win.” Then you go on to say something about p(x) being identically zero as an afterthought (after you finished your proof). This is an incorrect proof for the same reasons I outlined above; you cannot assume that an algorithm will have a universally bounded number of queries, since the number of queries depends on which polynomial you commit to.

Here is a correct proof, which starts from your afterthought. Suppose A is a deterministic algorithm which always terminates and learns the unknown polynomial. Run A on the identically zero polynomial, and let $q_1, \dots, q_m$ be the queries it makes during that run. Now $m$ is finite because it depends only on the run of A for a single polynomial (see how it differs here from your argument?). *Now* you can fool A by using the $q_i$ to build a polynomial $\prod_i (x - q_i)$ that A cannot distinguish from the identically zero polynomial because its behavior deterministically depends on the responses to the queries.

Now a more interesting task: find a *randomized* algorithm that can learn any single variable polynomial with arbitrarily high probability. Or easier: if you assume the polynomial is not identically zero, can you learn it then? It’s clear that once you figure out the degree of the polynomial you are done; just keep querying until you can interpolate. So the question is can you determine the degree?

Like

8. Daniel

Any function is the sum of an even and an odd function: F = Feven + Fodd . If you ask for two values P(a) and P(-a) then you have one value for Feven Feven(a) = 1/2( P(a)+P(-a)) and you also know one value of Fodd = 1/2(P(a)-P(-a)). Now solve the problem using your technique, that is ask for Feven(F(even(a)+1) and you obtain Feven. You can solve Fodd using the same technique, that requires 4 queries, but perhaps you can take some advantage of knowing Feven to reduce that to 3, I am not sure. Good post.

Like

9. Daniel

After reading other comments, I see now that there is a problem with the zero polynomial, since you can’t determine that a polynomial is zero from a finite number of queries, there is no algorithm that can determine correctly in a finite number of steps that a polynomial is zero from a finite number of values. If you bound the degree or the input then you can guarantee a terminanting algorithm.

A version for which my proof works: Given a polynomial with integer coefficients and whose roots are all non integer, find the minimal number of queries to determine the polynomial. With four is enough, Take 1, -1, and b=1+max(Feven(1), – Fodd(-1)), and -b. Then apply your technique to determine Fodd and Feven. A question: can it be done with three queries?

Like