# Sample Extraction from RLWE to LWE

In this article I’ll derive a trick used in FHE called sample extraction. In brief, it allows one to partially convert a ciphertext in the Ring Learning With Errors (RLWE) scheme to the Learning With Errors (LWE) scheme.

Here are some other articles I’ve written about other FHE building blocks, though they are not prerequisites for this article.

## LWE and RLWE

The first two articles in the list above define the Learning With Errors problem (LWE). I will repeat the definition here:

LWE: The LWE encryption scheme has the following parameters:

• A plaintext space $\mathbb{Z}/q\mathbb{Z}$, where $q \geq 2$ is a positive integer. This is the space that the underlying message $m$ comes from.
• An LWE dimension $n \in \mathbb{N}$.
• A discrete Gaussian error distribution $D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector $s \in \{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $a = (a_1, \dots, a_n)$, sampled uniformly over $(\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $b = \langle a, s \rangle + m + e$, where $m$ is the message, $e$ is drawn from $D$ and all arithmetic is done modulo $q$. Note: the message $m$ usually is represented by placing an even smaller message (say, a 4-bit message) in the highest-order bits of a 32-bit unsigned integer. So then decryption corresponds to computing $b - \langle a, s \rangle = m + e$ and rounding the result to recover $m$ while discarding $e$.
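To make the definition concrete, here is a minimal Python sketch of LWE encryption and decryption. It is an illustration under assumptions rather than a reference implementation: the 4-bit message width, the standard deviation, the function names, and the use of a rounded continuous Gaussian in place of a true discrete Gaussian are all choices made for this example, and nothing here is secure for real use.

```python
import numpy as np

Q_BITS = 32                 # q = 2^32, i.e. unsigned 32-bit arithmetic
MSG_BITS = 4                # assumed message width for this sketch
SHIFT = Q_BITS - MSG_BITS   # the message occupies the top MSG_BITS bits
Q = 1 << Q_BITS


def lwe_keygen(n, rng):
    """Sample a uniform binary secret key of dimension n."""
    return rng.integers(0, 2, size=n, dtype=np.int64)


def lwe_encrypt(msg, s, rng, stddev=2.0**10):
    """Return (a, b) with b = <a, s> + (msg << SHIFT) + e mod q."""
    a = rng.integers(0, Q, size=s.shape[0], dtype=np.int64)
    # Assumption: a rounded continuous Gaussian stands in for D.
    e = int(round(rng.normal(0.0, stddev)))
    b = (int(np.dot(a, s)) + (msg << SHIFT) + e) % Q
    return a, b


def lwe_decrypt(a, b, s):
    """Compute the phase b - <a, s> = m + e, then round away the error."""
    phase = (b - int(np.dot(a, s))) % Q
    # Round to the nearest multiple of 2^SHIFT, then keep the top bits.
    return ((phase + (1 << (SHIFT - 1))) >> SHIFT) % (1 << MSG_BITS)
```

Decryption succeeds as long as the error stays well below half of $2^{28}$, which is the gap between adjacent encoded messages in this encoding.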

Without the error term, an attacker could determine the secret key from a polynomial-sized collection of LWE ciphertexts with something like Gaussian elimination. The set of samples looks like a linear (or affine) system, where the secret key entries are the unknown variables. With an error term, the problem of solving the system is believed to be hard, and only exponential time/space algorithms are known.

RLWE: The Ring Learning With Errors (RLWE) problem is the natural analogue of LWE, where all scalars involved are replaced with polynomials over a (carefully) chosen ring.

Formally, the RLWE encryption scheme has the following parameters:

• A ring $R = \mathbb{Z}/q\mathbb{Z}$, where $q \geq 2$ is a positive integer. This is the space of coefficients of all polynomials in the scheme. I usually think of $q$ as $2^{32}$, i.e., unsigned 32-bit integers.
• A plaintext space $R[x] / (x^N + 1)$, where $N$ is a power of 2. This is the space that the underlying message $m(x)$ comes from, and it is encoded as a list of $N$ integers forming the coefficients of the polynomial.
• An RLWE dimension $n \in \mathbb{N}$.
• A discrete Gaussian error distribution $D$ with a mean of zero and a fixed standard deviation.

An RLWE secret key $s$ is defined as a list of $n$ polynomials with binary coefficients in $\mathbb{B}[x] / (x^N+1)$, where $\mathbb{B} = \{0, 1\}$. The coefficients are uniformly sampled, like in LWE. An RLWE ciphertext is defined as a vector of $n$ polynomials $a = (a_1(x), \dots, a_n(x))$, sampled uniformly over $(R[x] / (x^N+1))^n$, and a polynomial $b(x) = \langle a, s \rangle + m(x) + e(x)$, where $m(x)$ is the message (with a similar “store it in the top bits” trick as LWE), $e(x)$ is a polynomial with coefficients drawn from $D$ and all the products of the inner product are done in $R[x] / (x^N+1)$. Decryption in RLWE involves computing $b(x) - \langle a, s \rangle$ and rounding appropriately to recover $m(x)$. Just like with LWE, the message is “hidden” in the noise added to an equation corresponding to the polynomial products (i.e., without the noise and with enough sample encryptions of the same message/secret key, you can solve the system and recover the message). For more notes on how polynomial multiplication ends up being trickier in this ring, see my negacyclic polynomial multiplication article.
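The RLWE definition can be sketched the same way. The following is a toy illustration, assuming $q = 2^{32}$, 4-bit messages in the top bits of each coefficient, and a rounded continuous Gaussian as a stand-in for $D$; the function names are hypothetical, and the schoolbook-style negacyclic product is chosen for clarity, not speed.

```python
import numpy as np

Q = 1 << 32
SHIFT = 28   # assumed: 4-bit messages stored in the top bits of each coefficient


def negacyclic_mul(f, g):
    """Multiply two length-N coefficient arrays modulo x^N + 1."""
    N = len(f)
    full = np.convolve(f, g)      # ordinary product, length 2N - 1
    out = full[:N].copy()
    out[: N - 1] -= full[N:]      # x^N = -1: high terms wrap with a sign flip
    return out


def rlwe_encrypt(m, s, rng, stddev=2.0**10):
    """m: length-N array of small messages; s: (n, N) array of binary keys."""
    n, N = s.shape
    a = rng.integers(0, Q, size=(n, N), dtype=np.int64)
    # Assumption: rounded continuous Gaussian noise, one sample per coefficient.
    e = np.rint(rng.normal(0.0, stddev, size=N)).astype(np.int64)
    inner = sum(negacyclic_mul(a[i], s[i]) % Q for i in range(n))
    b = (inner + (np.asarray(m, dtype=np.int64) << SHIFT) + e) % Q
    return a, b


def rlwe_decrypt(a, b, s):
    """Compute the phase b - <a, s> and round each coefficient."""
    inner = sum(negacyclic_mul(a[i], s[i]) % Q for i in range(s.shape[0]))
    phase = (b - inner) % Q
    return ((phase + (1 << (SHIFT - 1))) >> SHIFT) % 16
```

With $n = 1$ this reduces to the single-polynomial version of RLWE most often seen in the literature.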

The most common version of RLWE you will see in the literature sets the vector dimension $n=1$, and so the secret key $s$ is a single polynomial, the ciphertext is a single polynomial, and RLWE can be viewed as directly replacing the vector dot product in LWE with a polynomial product. However, making $n$ larger is believed to provide more security, and it can be traded off against making the polynomial degree smaller, which can be useful when tweaking parameters for performance (keeping the security level constant).

## Sample Extraction

Sample extraction is the trick of taking an RLWE encryption of $m(x) = m_0 + m_1 x + \dots + m_{N-1}x^{N-1}$, and outputting an LWE encryption of $m_0$. In our case, the degree $N$ and the dimension $n_{\textup{RLWE}}$ of the input RLWE ciphertext scheme are fixed, but we may pick the dimension $n_{\textup{LWE}}$ of the LWE scheme as we choose to make this trick work.

This is one of those times in math when it is best to “just work it out with a pencil.” It turns out there are no serious obstacles to our goal. We start with polynomials $a = (a_1(x), \dots, a_n(x))$ and $b(x) = \langle a, s \rangle + m(x) + e(x)$, and we want to produce a vector of scalars $x = (x_1, \dots, x_D)$ of some dimension $D$, a corresponding secret key $s_{\textup{LWE}}$, and a scalar $y = \langle x, s_{\textup{LWE}} \rangle + m_0 + e'$, where $e'$ may be different from the input error $e(x)$, but is hopefully not too much larger.

As with many of the articles in this series, we employ the so-called “phase function” to help with the analysis, which is just the partial decryption of an RLWE ciphertext without the rounding step: $\varphi(x) = b(x) - \langle a, s \rangle = m(x) + e(x)$. The idea is as follows: inspect the structure of the constant term of $\varphi(x)$, oh look, it’s an LWE encryption.

So let’s expand the constant term of $b(x) - \langle a, s \rangle$. Given a polynomial expression, I will use the notation $(-)[0]$ to denote the constant coefficient, and $(-)[k]$ for the $k$-th coefficient.

$$\begin{aligned}(b(x) - \langle a, s \rangle)[0] &= b[0] - \left ( (a_1s_1)[0] + \dots + (a_n s_n)[0] \right ) \end{aligned}$$

Each entry in the dot product is a negacyclic polynomial product, so its constant term requires summing all the pairs of coefficients of $a_i$ and $s_i$ whose degrees sum to zero mod $N$, and flipping signs when there’s wraparound. In particular, a single product above for $a_i s_i$ has the form:

$$(a_is_i)[0] = s_i[0]a_i[0] - s_i[1]a_i[N-1] - s_i[2]a_i[N-2] - \dots - s_i[N-1]a_i[1]$$

Notice that I wrote the coefficients of $s_i$ in increasing order. This was on purpose, because if we re-write this expression $(a_is_i)[0]$ as a dot product, we get

$$(a_is_i)[0] = \left \langle (s_i[0], s_i[1], \dots, s_i[N-1]), (a_i[0], -a_i[N-1], \dots, -a_i[1])\right \rangle$$
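As a quick sanity check of this identity, the sketch below compares the dot product against the constant term of a directly computed negacyclic product for $N = 4$. The helper names and the small example polynomials are made up for illustration.

```python
import numpy as np


def negacyclic_mul(f, g):
    """Multiply two length-N coefficient arrays modulo x^N + 1."""
    N = len(f)
    full = np.convolve(f, g)      # ordinary product, length 2N - 1
    out = full[:N].copy()
    out[: N - 1] -= full[N:]      # wrap high-degree terms with a sign flip
    return out


def extraction_vector(a):
    """Return (a[0], -a[N-1], -a[N-2], ..., -a[1])."""
    a = np.asarray(a, dtype=np.int64)
    # a[:0:-1] walks from a[N-1] down to a[1].
    return np.concatenate(([a[0]], -a[:0:-1]))


a = np.array([1, 2, 3, 4])   # a(x) = 1 + 2x + 3x^2 + 4x^3
s = np.array([1, 0, 1, 1])   # s(x) = 1 + x^2 + x^3

# s is left untouched; only the public a is reordered and sign-flipped.
constant_term = int(np.dot(s, extraction_vector(a)))
```

Here `extraction_vector(a)` is `[1, -4, -3, -2]`, and both the dot product and `negacyclic_mul(a, s)[0]` evaluate to $-4$.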

In particular, the $a_i[k]$ are public, so we can sign-flip and reorder them easily in our conversion trick. But $s_i$ is unknown at the time the sample extraction needs to occur, so it helps if we can leave the secret key untouched. And indeed, when we apply the above expansion to all of the terms in the computation of $\varphi(x)[0]$, we end up manipulating the $a_i$’s a lot, but merely “flattening” the coefficients of $s = (s_1(x), \dots, s_n(x))$ into a single long vector.

So combining all of the above products, we see that $(b(x) - \langle a, s \rangle)[0]$ is already an LWE encryption with $(x, y) = ((x_1, \dots, x_D), b[0])$, and $x$ being the very long ($D = n \cdot N$) vector

$$\begin{aligned} x = (& a_1[0], -a_1[N-1], \dots, -a_1[1], \\ &a_2[0], -a_2[N-1], \dots, -a_2[1], \\ &\dots , \\ &a_n[0], -a_n[N-1], \dots, -a_n[1] ) \end{aligned}$$

And the corresponding secret key is

$$\begin{aligned} s_{\textup{LWE}} = (& s_1[0], s_1[1], \dots, s_1[N-1], \\ &s_2[0], s_2[1], \dots, s_2[N-1], \\ &\dots , \\ &s_n[0], s_n[1], \dots, s_n[N-1] ) \end{aligned}$$

And the error in this ciphertext is exactly the constant coefficient of the error polynomial $e(x)$ from the RLWE encryption, which is independent of the error of all the other coefficients.
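Putting the pieces together, the whole conversion can be sketched in a few lines of Python. This is a toy check under assumptions, not production code: $q = 2^{32}$, messages sit in the top 4 bits, the noise is a rounded continuous Gaussian, and the names `sample_extract`, `flatten_key`, and `check_extraction` are hypothetical. The check confirms that the phase of the extracted LWE ciphertext equals $m_0 + e_0$, the constant coefficient of the RLWE phase.

```python
import numpy as np

Q = 1 << 32


def negacyclic_mul(f, g):
    """Multiply two length-N coefficient arrays modulo x^N + 1."""
    N = len(f)
    full = np.convolve(f, g)
    out = full[:N].copy()
    out[: N - 1] -= full[N:]
    return out


def sample_extract(a, b):
    """a: (n, N) array of mask polynomials, b: length-N body polynomial.

    Returns the LWE ciphertext (x, y) of dimension n * N encrypting m_0.
    """
    rows = [np.concatenate(([a_i[0]], -a_i[:0:-1])) for a_i in a]
    return np.concatenate(rows) % Q, int(b[0])


def flatten_key(s):
    """Flatten the (n, N) RLWE key into one long LWE key, unchanged."""
    return np.asarray(s).reshape(-1)


def check_extraction(n=2, N=8, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.integers(0, Q, size=(n, N), dtype=np.int64)
    s = rng.integers(0, 2, size=(n, N), dtype=np.int64)
    m = rng.integers(0, 16, size=N, dtype=np.int64) << 28
    e = np.rint(rng.normal(0.0, 2.0**10, size=N)).astype(np.int64)
    b = (sum(negacyclic_mul(a[i], s[i]) % Q for i in range(n)) + m + e) % Q
    x, y = sample_extract(a, b)
    lwe_phase = (y - int(np.dot(x, flatten_key(s)))) % Q
    # The extracted error is exactly e[0]; no new noise is introduced.
    return lwe_phase == int(m[0] + e[0]) % Q
```

Note that `sample_extract` never touches the secret key: it only reorders and negates public values, which is why the conversion needs no key material.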

## Commentary

This trick is a best case scenario. Unlike with key switching, we don’t need to encrypt the output LWE secret key to perform the conversion. And unlike modulus switching, there is no impact on the error growth in the conversion from RLWE to LWE. So in a sense, this trick is “perfect,” though it loses information about the other coefficients of $m(x)$ in the process. As it happens, the CGGI FHE scheme that these articles are building toward only uses the constant coefficient.

The only twist to think about is that the output LWE ciphertext is dependent on the RLWE scheme parameters. What if you wanted to get a smaller-dimensional LWE ciphertext as output? This is a realistic concern, as in the CGGI FHE scheme one starts from an LWE ciphertext of one dimension, goes to RLWE of another (larger) dimension, and needs to get back to LWE of the original dimension by the end.

To do this, you have two options: one is to pick the RLWE ciphertext parameters $n, N$, so that their product is the value you need. A second is to allow the RLWE parameters to be whatever you need for performance/security, and then employ a key switching operation after the sample extraction to get back to the LWE parameters you need.

It is worth mentioning, though I am far from fully understanding the methods, that there are other ways to convert between LWE and RLWE. One can go from LWE to RLWE, or from a collection of LWEs to RLWE. Some methods can be found in this paper and its references.

Until next time!

# Estimating the Security of Ring Learning with Errors (RLWE)

This article was written by my colleague, Cathie Yun. Cathie is an applied cryptographer and security engineer, currently working with me to make fully homomorphic encryption a reality at Google. She’s also done a lot of cool stuff with zero knowledge proofs.

In previous articles, we’ve discussed techniques used in Fully Homomorphic Encryption (FHE) schemes. The basis for many FHE schemes, as well as other privacy-preserving protocols, is the Learning With Errors (LWE) problem. In this article, we’ll talk about how to estimate the security of lattice-based schemes that rely on the hardness of LWE, as well as its widely used variant, Ring LWE (RLWE).

A previous article on modulus switching introduced LWE encryption, but as a refresher:

## Reminder of LWE

A literal repetition from the modulus switching article. The LWE encryption scheme I’ll use has the following parameters:

• A plaintext space $\mathbb{Z}/q\mathbb{Z}$, where $q \geq 2$ is a positive integer. This is the space that the underlying message comes from.
• An LWE dimension $n \in \mathbb{N}$.
• A discrete Gaussian error distribution $D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector in $\{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $a = (a_1, \dots, a_n)$, sampled uniformly over $(\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $b = \langle a, s \rangle + m + e$, where $e$ is drawn from $D$ and all arithmetic is done modulo $q$. Note that $e$ must be small for the encryption to be valid.

## Learning With Errors (LWE) security

Choosing appropriate LWE parameters is a nontrivial challenge when designing and implementing LWE based schemes, because there are conflicting requirements of security, correctness, and performance. Some of the parameters that can be manipulated are the LWE dimension $n$, error distribution $D$ (referred to in the next few sections as $X_e$), secret distribution $X_s$, and plaintext modulus $q$.

## Lattice Estimator

Here is where the Lattice Estimator tool comes to our assistance! The lattice estimator is a Sage module written by a group of lattice cryptography researchers which estimates the concrete security of Learning with Errors (LWE) instances.

For a given set of LWE parameters, the Lattice Estimator calculates the cost of all known efficient lattice attacks – for example, the Primal, Dual, and Coded-BKW attacks. It returns the estimated number of “rops” or “ring operations” required to carry out each attack; the attack that is the most efficient is the one that determines the security parameter. The bits of security for the parameter set can be calculated as $\log_2(\text{rops})$ for the most efficient attack.

## Running the Lattice Estimator

For example, let’s estimate the security of the security parameters originally published for the popular TFHE scheme:

```
n = 630
q = 2^32
Xs = UniformMod(2)
Xe = DiscreteGaussian(stddev=2^17)
```


After installing the Lattice Estimator and sage, we run the following commands in sage:

```
> from estimator import *
> schemes.TFHE630
LWEParameters(n=630, q=4294967296, Xs=D(σ=0.50, μ=-0.50), Xe=D(σ=131072.00), m=+Infinity, tag='TFHE630')
> _ = LWE.estimate(schemes.TFHE630)
bkw                  :: rop: ≈2^153.1, m: ≈2^139.4, mem: ≈2^132.6, b: 4, t1: 0, t2: 24, ℓ: 3, #cod: 552, #top: 0, #test: 78, tag: coded-bkw
usvp                 :: rop: ≈2^124.5, red: ≈2^124.5, δ: 1.004497, β: 335, d: 1123, tag: usvp
bdd                  :: rop: ≈2^131.0, red: ≈2^115.1, svp: ≈2^131.0, β: 301, η: 393, d: 1095, tag: bdd
bdd_hybrid           :: rop: ≈2^185.3, red: ≈2^115.9, svp: ≈2^185.3, β: 301, η: 588, ζ: 0, |S|: 1, d: 1704, prob: 1, ↻: 1, tag: hybrid
bdd_mitm_hybrid      :: rop: ≈2^265.5, red: ≈2^264.5, svp: ≈2^264.5, β: 301, η: 2, ζ: 215, |S|: ≈2^189.2, d: 1489, prob: ≈2^-146.6, ↻: ≈2^148.8, tag: hybrid
dual                 :: rop: ≈2^128.7, mem: ≈2^72.0, m: 551, β: 346, d: 1181, ↻: 1, tag: dual
dual_hybrid          :: rop: ≈2^119.8, mem: ≈2^115.5, m: 516, β: 314, d: 1096, ↻: 1, ζ: 50, tag: dual_hybrid
```


In this example, the most efficient attack is the dual_hybrid attack. It uses 2^119.8 ring operations, and so these parameters provide 119.8 bits of security. The reader may notice that the TFHE website claims those parameters give 128 bits of security. This discrepancy is due to the fact that they used an older library (the LWE estimator, which is no longer maintained), which doesn’t take into account the most up-to-date lattice attacks.
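The "pick the cheapest attack" step is simple enough to spell out. The sketch below does not call the Lattice Estimator; it only post-processes the $\log_2(\text{rops})$ values copied by hand from the output above, and the dictionary and function name are made up for this illustration.

```python
# log2(rop) for each attack, transcribed from the estimator output above.
ROP_LOG2 = {
    "bkw": 153.1,
    "usvp": 124.5,
    "bdd": 131.0,
    "bdd_hybrid": 185.3,
    "bdd_mitm_hybrid": 265.5,
    "dual": 128.7,
    "dual_hybrid": 119.8,
}


def best_attack(costs):
    """Return (attack name, bits of security) for the cheapest attack."""
    name = min(costs, key=costs.get)
    return name, costs[name]


attack, bits = best_attack(ROP_LOG2)
```

For these numbers, `attack` is `"dual_hybrid"` and `bits` is `119.8`, matching the reading of the output given above.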

For further reading, Benjamin Curtis wrote an article about parameter selection for the CONCRETE implementation of the TFHE scheme. Benjamin Curtis, Martin Albrecht, and other researchers also used the Lattice Estimator to estimate all the LWE and NTRU schemes.

## Ring Learning with Errors (RLWE) security

It is often desirable to use Ring LWE instead of LWE, for greater efficiency and smaller key sizes (as Chris Peikert illustrates via meme). We’d like to estimate the security of a Ring LWE scheme, but it wasn’t immediately obvious to us how to do this, since the Lattice Estimator only operates over LWE instances. In order to use the Lattice Estimator for this security estimate, we first needed to do a reduction from the RLWE instance to an LWE instance.

## Attempted RLWE to LWE reduction

Given an RLWE instance with $\text{RLWE_dimension} = k$ and $\text{poly_log_degree} = N$, we can create a relation that looks like an LWE instance of $\text{LWE_dimension} = N * k$ with the same security, as long as $N$ is a power of 2 and there are no known attacks that target the ring structure of RLWE that are more efficient than the best LWE attacks. Note: $N$ must be a power of 2 so that $x^N+1$ is a cyclotomic polynomial.

An RLWE encryption has the following form: $(a_0(x), a_1(x), \dots, a_{k-1}(x), b(x))$

• Public polynomials: $a_0(x), a_1(x), \dots, a_{k-1}(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} \left( (\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1) \right)^k$
• Secret (binary) polynomials: $s_0(x), s_1(x), \dots, s_{k-1}(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} (\mathbb{B}_N[x])^k$
• Error: $e(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} \chi_e$
• RLWE instance: $b(x) = \sum_{i=0}^{k-1} a_i(x) \cdot s_i(x) + e(x) \in (\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1)$

We would like to express this in the form of an LWE encryption. We can start with the simple case, where $k=1$. Therefore, we will only be working with the zero-entry polynomials, $a_0(x)$ and $s_0(x)$. (For simplicity, in the next example you can ignore the zero-subscript and think of them as $a(x)$ and $s(x)$).

## Naive reduction for $k=1$ (wrong!)

Naively, if we simply defined the LWE $A$ matrix to be a concatenation of the coefficients of the RLWE polynomial $a(x)$, we get:

$$A_{\text{LWE}} = ( a_{0, 0}, a_{0, 1}, \dots a_{0, N-1} )$$

We can do the same for the LWE $s$ vector:

$$s_{\text{LWE}} = ( s_{0, 0}, s_{0, 1}, \dots s_{0, N-1} )$$

But this doesn’t give us the value of $b_{\text{LWE}}$ for the LWE encryption that we want. In particular, the first entry of $b_{\text{LWE}}$, which we can call $b_{\text{LWE}, 0}$, is simply a product of the first entries of $a_0(x)$ and $s_0(x)$:

$$b_{\text{LWE}, 0} = a_{0, 0} \cdot s_{0, 0} + e_0$$

However, we want $b_{\text{LWE}, 0}$ to be a sum of the products of all the coefficients of $a_0(x)$ and $s_0(x)$ that give us a zero-degree coefficient mod $x^N + 1$. This modulus is important because it causes the product of high-degree monomials to “wrap around” to smaller degree monomials because of the negacyclic property, such that $x^N \equiv -1 \mod x^N + 1$. So the constant term $b_{\text{LWE}, 0}$ should include all of the following terms:

$$\begin{aligned} b_{\text{LWE}, 0} = & a_{0, 0} \cdot s_{0, 0} \\ - & a_{0, 1} \cdot s_{0, N-1} \\ - & a_{0, 2} \cdot s_{0, N-2} \\ - & \dots \\ - & a_{0, N-1} \cdot s_{0, 1}\\ + & e_0\\ \end{aligned}$$

## Improved reduction for $k=1$

We can achieve the desired value of $b_{\text{LWE}}$ by more strategically forming a matrix $A_{\text{LWE}}$, to reflect the negacyclic property of our polynomials in the RLWE space.
We can keep the naive construction for $s_\text{LWE}$.

$$A_{\text{LWE}} = \begin{pmatrix} a_{0, 0} & -a_{0, N-1} & -a_{0, N-2} & \dots & -a_{0, 1}\\ a_{0, 1} & a_{0, 0} & -a_{0, N-1} & \dots & -a_{0, 2}\\ \vdots & \ddots & & & \vdots \\ a_{0, N-1} & \dots & & & a_{0, 0} \\ \end{pmatrix}$$

This definition of $A_\text{LWE}$ gives us the desired value for $b_\text{LWE}$, when $b_{\text{LWE}}$ is interpreted as the coefficients of a polynomial. As an example, we can write out the first entry of $b_\text{LWE}$:

$$\begin{aligned} b_{\text{LWE}, 0} = & \sum_{i=0}^{N-1} A_{\text{LWE}, 0, i} \cdot s_{0, i} + e_0 \\ b_{\text{LWE}, 0} = & a_{0, 0} \cdot s_{0, 0} \\ - & a_{0, 1} \cdot s_{0, N-1} \\ - & a_{0, 2} \cdot s_{0, N-2} \\ - & \dots \\ - & a_{0, N-1} \cdot s_{0, 1}\\ + & e_0 \\ \end{aligned}$$

## Generalizing for all $k$

In the generalized $k$ case, we have the RLWE equation:

$$b(x) = a_0(x) \cdot s_0(x) + a_1(x) \cdot s_1(x) + \dots + a_{k-1}(x) \cdot s_{k-1}(x) + e(x)$$

We can construct the LWE elements as follows:

$$A_{\text{LWE}} = \left ( \begin{array}{c|c|c|c} A_{0, \text{LWE}} & A_{1, \text{LWE}} & \dots & A_{k-1, \text{LWE}} \end{array} \right )$$

where each sub-matrix is the construction from the previous section:

$$A_{i, \text{LWE}} = \begin{pmatrix} a_{i, 0} & -a_{i, N-1} & -a_{i, N-2} & \dots & -a_{i, 1}\\ a_{i, 1} & a_{i, 0} & -a_{i, N-1} & \dots & -a_{i, 2}\\ \vdots & \ddots & & & \vdots \\ a_{i, N-1} & \dots & & & a_{i, 0} \\ \end{pmatrix}$$

And the secret keys are stacked similarly:

$$s_{\text{LWE}} = ( s_{0, 0}, s_{0, 1}, \dots s_{0, N-1} \mid s_{1, 0}, s_{1, 1}, \dots s_{1, N-1} \mid \dots )$$

This is how we can reduce an RLWE instance with RLWE dimension $k$ and polynomial modulus degree $N$, to a relation that looks like an LWE instance of LWE dimension $N * k$.
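The construction above can be checked numerically. The sketch below builds the negacyclic matrix for each $a_i(x)$, concatenates the blocks, and stacks the keys; the function names are hypothetical, the error term is left out, and no modulus is applied so that the small example stays readable.

```python
import numpy as np


def negacyclic_matrix(a):
    """N x N matrix whose product with s gives a(x) * s(x) mod x^N + 1."""
    N = len(a)
    A = np.zeros((N, N), dtype=np.int64)
    for row in range(N):
        for col in range(N):
            if row >= col:
                A[row, col] = a[row - col]
            else:
                # Wraparound past degree N - 1 picks up a sign flip.
                A[row, col] = -a[N + row - col]
    return A


def rlwe_to_lwe(a_polys, s_polys):
    """Concatenate the k blocks side by side and stack the k secret keys."""
    A = np.hstack([negacyclic_matrix(a) for a in a_polys])
    s = np.concatenate([np.asarray(s, dtype=np.int64) for s in s_polys])
    return A, s


# k = 2, N = 4 example with made-up coefficients.
A, s = rlwe_to_lwe([[1, 2, 3, 4], [0, 1, 0, 0]], [[1, 0, 1, 1], [1, 1, 0, 0]])
b_without_error = A @ s
```

For this example `A` has shape $(4, 8)$, i.e. $N$ rows of an LWE-like relation of dimension $N \cdot k$, and `b_without_error` is `[-4, -4, 1, 7]`, the coefficients of $a_0 s_0 + a_1 s_1$ in $\mathbb{Z}[x]/(x^4 + 1)$.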
## Caveats and open research

This reduction does not result in a correctly formed LWE instance, since an LWE instance would have a matrix $A$ that is randomly sampled, whereas the reduction results in a matrix $A$ that has cyclic structure, due to the cyclic property of the RLWE instance. This is why I’ve been emphasizing that the reduction produces an instance that looks like LWE. All currently known attacks on RLWE do not take advantage of the structure, but rather directly attack this transformed LWE instance. Whether the additional ring structure can be exploited in the design of more efficient attacks remains an open question in the lattice cryptography research community.

In her PhD thesis, Rachel Player mentions the RLWE to LWE security reduction:

> In order to try to pick parameters in Ring-LWE-based schemes (FHE or otherwise) that we hope are sufficiently secure, we can choose parameters such that the underlying Ring-LWE instance should be hard to solve according to known attacks. Each Ring-LWE sample can be used to extract $n$ LWE samples. To the best of our knowledge, the most powerful attacks against $d$-sample Ring-LWE all work by instead attacking the $nd$-sample LWE problem. When estimating the security of a particular set of Ring-LWE parameters we therefore estimate the security of the induced set of LWE parameters.

This indicates that we can do this reduction for certain RLWE instances.
However, we must be careful to ensure that the polynomial modulus degree $N$ is a power of two, because otherwise the error distribution “breaks”, as my colleague Baiyu Li explained to me in conversation:

> The RLWE problem is typically defined in using the ring of integers of the cyclotomic field $\mathbb{Q}[X]/(f(X))$, where $f(X)$ is a cyclotomic polynomial of degree $k=\phi(N)$ (where $\phi$ is Euler’s totient function), and the error is a spherical Gaussian over the image of the canonical embedding into the complex numbers $\mathbb{C}^k$ (basically the images of primitive roots of unity under $f$). In many cases we set $N$ to be a power of 2, thus $f(X)=X^{N/2}+1$, since the canonical embedding for such $N$ has a nice property that the preimage of the spherical Gaussian error is also a spherical Gaussian over the coefficients of polynomials in $\mathbb{Q}[X]/(f(X))$. So in this case we can sample $k=N/2$ independent Gaussian numbers and use them as the coefficients of the error polynomial $e(x)$. For $N$ not a power of 2, $f(X)$ may have some low degree terms, and in order to get the spherical Gaussian with the same variance $s^2$ in the canonical embedding, we probably need to use a larger variance when sampling the error polynomial coefficients.
>
> The RLWE we frequently use in practice is actually a specialized version called “polynomial LWE”, and instantiated with $N$ = power of 2 and so $f(X)=X^{N/2}+1$. For other parameters the two are not exactly the same. This paper has some explanations: https://eprint.iacr.org/2018/170.pdf
>
> The error distribution “breaks” if $N$ is not a power of 2 due to the fact that the precise form of RLWE is not defined on integer polynomial rings $R = \mathbb{Z}[X]/(f(X))$, but is defined on its dual (or the dual in the underlying number field, which is a fractional ideal of $\mathbb{Q}[X]/(f(x))$), and the noise distribution is on the Minkowski embedding of this dual ring. For non-power of 2 $N$, the product mod $f$ of two small polynomials in $\mathbb{Q}[X]/(f(x))$ may be large, where small/large means their L2 norm on the coefficient vector. This means that in order to sample the required noise distribution, you may need a skewed coefficient distribution. Only when $N$ is a power of 2, the dual of $R$ is a scaling of $R$, and distance in the embedding of $R^{\text{dual}}$ is preserved in $R$, and so we can just sample iid gaussian coefficient to get the required noise.

Because working with a power-of-two RLWE polynomial modulus gives “nice” error behavior, this parameter choice is often recommended and chosen for concrete instantiations of RLWE. For example, the Homomorphic Encryption Standard recommends and only analyzes the security of parameters for power-of-two cyclotomic fields for use in homomorphic encryption (though future versions of the standard aim to extend the security analysis to generic cyclotomic rings):

> We stress that when the error is chosen from sufficiently wide and “well spread” distributions that match the ring at hand, we do not have meaningful attacks on RLWE that are better than LWE attacks, regardless of the ring. For power-of-two cyclotomics, it is sufficient to sample the noise in the polynomial basis, namely choosing the coefficients of the error polynomial $e \in \mathbb{Z}[x] / \phi_k(x)$ independently at random from a very “narrow” distribution.

Existing works analyzing and targeting the ring structure of RLWE include:

It would of course be great to have a definitive answer on whether we can be confident using this RLWE to LWE reduction to estimate the security of RLWE based schemes. In the meantime, we have seen many Fully Homomorphic Encryption (FHE) schemes using this RLWE to LWE reduction, and we hope that this article helps explain how that reduction works and the existing open questions around this approach.
# Negacyclic Polynomial Multiplication

In this article I’ll cover three techniques to compute special types of polynomial products that show up in lattice cryptography and fully homomorphic encryption. Namely, the negacyclic polynomial product, which is the product of two polynomials in the quotient ring $\mathbb{Z}[x] / (x^N + 1)$. As a precursor to the negacyclic product, we’ll cover the simpler cyclic product.

All of the Python code written for this article is on GitHub.

## The DFT and Cyclic Polynomial Multiplication

A recent program gallery piece showed how single-variable polynomial multiplication could be implemented using the Discrete Fourier Transform (DFT). This boils down to two observations:

1. The product of two polynomials $f, g$ can be computed via the convolution of the coefficients of $f$ and $g$.
2. The Convolution Theorem, which says that the Fourier transform of a convolution of two signals $f, g$ is the point-wise product of the Fourier transforms of the two signals. (The same holds for the DFT)

This provides a much faster polynomial product operation than one could implement using the naïve polynomial multiplication algorithm (though see the last section for an implementation anyway). The DFT can be used to speed up large integer multiplication as well.

A caveat with normal polynomial multiplication is that one needs to pad the input coefficient lists with enough zeros so that the convolution doesn’t “wrap around.” That padding results in the output having length at least as large as the sum of the degrees of $f$ and $g$ (see the program gallery piece for more details).

If you don’t pad the polynomials, instead you get what’s called a cyclic polynomial product.
More concretely, if the two input polynomials $f, g$ are represented by coefficient lists $(f_0, f_1, \dots, f_{N-1}), (g_0, g_1, \dots, g_{N-1})$ of length $N$ (implying the inputs are degree at most $N-1$, i.e., the lists may end in a tail of zeros), then the Fourier Transform technique computes

$$f(x) \cdot g(x) \mod (x^N - 1)$$

This modulus is in the sense of a quotient ring $\mathbb{Z}[x] / (x^N - 1)$, where $(x^N - 1)$ denotes the ring ideal generated by $x^N-1$, i.e., all polynomials that are evenly divisible by $x^N - 1$. A particularly important interpretation of this quotient ring is achieved by interpreting the ideal generator $x^N - 1$ as an equation $x^N - 1 = 0$, also known as $x^N = 1$. To get the canonical ring element corresponding to any polynomial $h(x) \in \mathbb{Z}[x]$, you “set” $x^N = 1$ and reduce the polynomial until there are no more terms with degree bigger than $N-1$. For example, if $N=5$ then $x^{10} + x^6 - x^4 + x + 2 = -x^4 + 2x + 3$ (the $x^{10}$ becomes 1, and $x^6 = x$).

To prove the DFT product computes a product in this particular ring, note how the convolution theorem produces the following formula, where $\textup{fprod}(f, g)$ denotes the process of taking the Fourier transform of the two coefficient lists, multiplying them entrywise, and taking a (properly normalized) inverse FFT, and $\textup{fprod}(f, g)(j)$ is the $j$-th coefficient of the output polynomial:

$$\textup{fprod}(f, g)(j) = \sum_{k=0}^{N-1} f_k g_{j-k \textup{ mod } N}$$

In words, the output polynomial coefficient $j$ equals the sum of all products of pairs of coefficients whose indices sum to $j$ when considered “wrapping around” $N$. Fixing $j=1$ as an example, $\textup{fprod}(f, g)(1) = f_0 g_1 + f_1g_0 + f_2 g_{N-1} + f_3 g_{N-2} + \dots$. This demonstrates the “set $x^N = 1$” interpretation above: the term $f_2 g_{N-1}$ corresponds to the product $f_2x^2 \cdot g_{N-1}x^{N-1}$, which contributes to the $x^1$ term of the polynomial product if and only if $x^{2 + N-1} = x$, if and only if $x^N = 1$.
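The “set $x^N = 1$” rule and the wraparound convolution formula are easy to compare directly. The following pure-Python sketch uses hypothetical helper names and no FFT: it reduces an ordinary product modulo $x^N - 1$ and checks that this agrees with the formula for $\textup{fprod}(f, g)(j)$.

```python
def polymul(f, g):
    """Schoolbook product of two coefficient lists (increasing degree)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def reduce_cyclic(coeffs, N):
    """Reduce modulo x^N - 1 by 'setting' x^N = 1: fold index k onto k mod N."""
    out = [0] * N
    for k, c in enumerate(coeffs):
        out[k % N] += c
    return out


def cyclic_convolution(f, g):
    """Coefficient j is the sum over k of f[k] * g[(j - k) mod N]."""
    N = len(f)
    return [sum(f[k] * g[(j - k) % N] for k in range(N)) for j in range(N)]


# x^10 + x^6 - x^4 + x + 2, coefficients in increasing degree.
h = [2, 1, 0, 0, -1, 0, 1, 0, 0, 0, 1]
reduced = reduce_cyclic(h, 5)
```

Here `reduced` is `[3, 2, 0, 0, -1]`, i.e. $-x^4 + 2x + 3$ as in the example above, and for any equal-length `f` and `g` the list `cyclic_convolution(f, g)` equals `reduce_cyclic(polymul(f, g), len(f))`.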
To achieve this in code, we simply use the version of the code from the program gallery piece, but fix the size of the arrays given to numpy.fft.fft in advance. We will also, for simplicity, assume the $N$ one wishes to use is a power of 2. The resulting code is significantly simpler than the original program gallery code (we omit zero-padding to length $N$ for brevity).

```python
import numpy
from numpy.fft import fft, ifft


def cyclic_polymul(p1, p2, N):
    """Multiply two integer polynomials modulo (x^N - 1).

    p1 and p2 are arrays of coefficients in degree-increasing order.
    """
    assert len(p1) == len(p2) == N
    product = fft(p1) * fft(p2)
    inverted = ifft(product)
    return numpy.round(numpy.real(inverted)).astype(numpy.int32)
```

As a side note, there’s nothing that stops this from working with polynomials that have real or complex coefficients, but so long as we use small magnitude integer coefficients and round at the end, I don’t have to worry about precision issues (hat tip to Brad Lucier for suggesting an excellent paper by Colin Percival, “Rapid multiplication modulo the sum and difference of highly composite numbers“, which covers these precision issues in detail).

## Negacyclic polynomials, DFT with duplication

Now the kind of polynomial quotient ring that shows up in cryptography is critically not $\mathbb{Z}[x]/(x^N-1)$, because that ring has enough easy-to-reason-about structure that it can’t hide secrets. Instead, cryptographers use the ring $\mathbb{Z}[x]/(x^N+1)$ (the minus becomes a plus), which is believed to be more secure for cryptography, although I don’t have a great intuitive grasp on why.

The interpretation is similar here as before, except we “set” $x^N = -1$ instead of $x^N = 1$ in our reductions. Repeating the above example, if $N=5$ then $x^{10} + x^6 - x^4 + x + 2 = -x^4 + 3$ (the $x^{10}$ becomes $(-1)^2 = 1$, and $x^6 = -x$). It’s called negacyclic because as a term $x^k$ passes $k \geq N$, it cycles back to $x^0 = 1$, but with a sign flip.
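The sign-flipping reduction can be written out in a few lines. This is a small sketch with a made-up function name; it handles inputs of any length by flipping the sign once for every full wrap past degree $N - 1$.

```python
def reduce_negacyclic(coeffs, N):
    """Reduce modulo x^N + 1 by 'setting' x^N = -1.

    The term c * x^k lands on x^(k mod N) with sign (-1)^(k // N).
    """
    out = [0] * N
    for k, c in enumerate(coeffs):
        sign = -1 if (k // N) % 2 else 1
        out[k % N] += sign * c
    return out


# x^10 + x^6 - x^4 + x + 2, coefficients in increasing degree.
h = [2, 1, 0, 0, -1, 0, 1, 0, 0, 0, 1]
reduced = reduce_negacyclic(h, 5)
```

Here `reduced` is `[3, 0, 0, 0, -1]`, i.e. $-x^4 + 3$: the $x^6$ term wraps once and cancels the $x$ term, while $x^{10}$ wraps twice and keeps its sign.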
The negacyclic polynomial multiplication can’t use the DFT without some special hacks. The first and simplest hack is to double the input lists with a negation. That is, starting from $f(x) \in \mathbb{Z}[x]/(x^N+1)$, we can define $f^*(x) = f(x) - x^Nf(x)$ in a different ring $\mathbb{Z}[x]/(x^{2N} - 1)$ (and similarly for $g^*$ and $g$).

Before seeing how this causes the DFT to (almost) compute a negacyclic polynomial product, some math wizardry. The ring $\mathbb{Z}[x]/(x^{2N} - 1)$ is special because it contains our negacyclic ring as a subring. Indeed, because the polynomial $x^{2N} - 1$ factors as $(x^N-1)(x^N+1)$, and because these two factors are coprime in $\mathbb{Z}[x]/(x^{2N} - 1)$, the Chinese remainder theorem (aka Sun-tzu’s theorem) generalizes to polynomial rings and says that any polynomial in $\mathbb{Z}[x]/(x^{2N} - 1)$ is uniquely determined by its remainders when divided by $(x^N-1)$ and $(x^N+1)$. Another way to say it is that the ring $\mathbb{Z}[x]/(x^{2N} - 1)$ factors as a direct product of the two rings $\mathbb{Z}[x]/(x^{N} - 1)$ and $\mathbb{Z}[x]/(x^{N} + 1)$.

Now mapping a polynomial $f(x)$ from the bigger ring $(x^{2N} - 1)$ to the smaller ring $(x^{N}+1)$ involves taking a remainder of $f(x)$ when dividing by $x^{N}+1$ (“setting” $x^N = -1$ and reducing). There are many possible preimage mappings, depending on what your goal is. In this case, we actually intentionally choose a non preimage mapping, because in general to compute a preimage requires solving a system of congruences in the larger polynomial ring. So instead we choose $f(x) \mapsto f^*(x) = f(x) - x^Nf(x) = -f(x)(x^N - 1)$, which maps back down to $2f(x)$ in $\mathbb{Z}[x]/(x^{N} + 1)$. This preimage mapping has a particularly nice structure, in that you build it by repeating the polynomial’s coefficients twice and flipping the sign of the second half. It’s easy to see that the product $f^*(x) g^*(x)$ maps down to $4f(x)g(x)$.
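Both constant factors can be verified on a small example before bringing in the FFT. The sketch below uses exact integer convolution instead of a Fourier transform, and the helper names (`lift`, `map_down`, `cyclic_mul`, `negacyclic_mul`) are invented for this check.

```python
import numpy as np


def lift(f):
    """f -> f* = f - x^N f: repeat the coefficients, negating the second copy."""
    f = np.asarray(f, dtype=np.int64)
    return np.concatenate([f, -f])


def map_down(c):
    """Reduce a length-2N coefficient array modulo x^N + 1 (set x^N = -1)."""
    c = np.asarray(c)
    N = len(c) // 2
    return c[:N] - c[N:]


def cyclic_mul(p, q):
    """Exact product modulo x^M - 1, where M = len(p) = len(q)."""
    M = len(p)
    full = np.convolve(p, q)
    out = full[:M].copy()
    out[: M - 1] += full[M:]      # x^M = 1: high terms wrap without a sign flip
    return out


def negacyclic_mul(f, g):
    """Exact product modulo x^N + 1, used here as the reference answer."""
    N = len(f)
    full = np.convolve(f, g)
    out = full[:N].copy()
    out[: N - 1] -= full[N:]
    return out


f, g = [1, 2, 3, 4], [1, 0, 1, 1]
product_upstairs = cyclic_mul(lift(f), lift(g))
```

For this example `map_down(lift(f))` is `[2, 4, 6, 8]`, which is $2f$, and `map_down(product_upstairs)` is `[-16, -20, 0, 28]`, which is four times the reference product `negacyclic_mul(f, g) = [-4, -5, 0, 7]`.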
So if we properly account for these extra constant factors floating around, our strategy to perform negacyclic polynomial multiplication is to map $f$ and $g$ up to the larger ring as described, compute their cyclic product (modulo $x^{2N} - 1$) using the FFT, and then the result should be a degree $2N-1$ polynomial which can be reduced with one more modular reduction step to the right degree $N-1$ negacyclic product, i.e., setting $x^N = -1$, which materializes as taking the second half of the coefficients, flipping their signs, and adding them to the corresponding coefficients in the first half.

The code for this is:

    def negacyclic_polymul_preimage_and_map_back(p1, p2):
        p1_preprocessed = numpy.concatenate([p1, -p1])
        p2_preprocessed = numpy.concatenate([p2, -p2])
        product = fft(p1_preprocessed) * fft(p2_preprocessed)
        inverted = ifft(product)
        rounded = numpy.round(numpy.real(inverted)).astype(p1.dtype)
        return (rounded[: p1.shape[0]] - rounded[p1.shape[0] :]) // 4

However, this chosen mapping hides another clever trick. The product of the two preimages has enough structure that we can "read" the result off without doing the full "set $x^N = -1$" reduction step. Mapping $f$ and $g$ up to $f^*, g^*$ and taking their product modulo $(x^{2N} - 1)$ gives

$\displaystyle \begin{aligned} f^*g^* &= -f(x^N-1) \cdot -g(x^N - 1) \\ &= fg (x^N-1)^2 \\ &= fg(x^{2N} - 2x^N + 1) \\ &= fg(2 - 2x^N) \\ &= 2(fg - x^Nfg) \end{aligned}$

This has the same syntactical format as the original mapping $f \mapsto f - x^Nf$, with an extra factor of 2, and so its coefficients also have the form "repeat the coefficients and flip the sign of the second half" (times two). We can then do the "inverse mapping" by reading only the first half of the coefficients and dividing by 2.

    def negacyclic_polymul_use_special_preimage(p1, p2):
        p1_preprocessed = numpy.concatenate([p1, -p1])
        p2_preprocessed = numpy.concatenate([p2, -p2])
        product = fft(p1_preprocessed) * fft(p2_preprocessed)
        inverted = ifft(product)
        rounded = numpy.round(0.5 * numpy.real(inverted)).astype(p1.dtype)
        return rounded[: p1.shape[0]]

Our chosen mapping $f \mapsto f-x^Nf$ is not particularly special, except that it uses a small number of pre- and post-processing operations. For example, if you instead used the mapping $f \mapsto 2f + x^Nf$ (which would map back to $f$ exactly), then the FFT product would result in $5fg + 4x^Nfg$ in the larger ring. You can still read off the coefficients as before, but you'd have to divide by 5 instead of 2 (which, the superstitious would say, is harder). It seems that "double and negate" followed by "halve and take first half" is the least amount of pre/post processing possible.

## Negacyclic polynomials with a "twist"

The previous section identified a nice mapping (or embedding) of the input polynomials into a larger ring. But studying that shows some symmetric structure in the FFT output. I.e., the coefficients of $f$ and $g$ are repeated twice, with some scaling factors. It also involves taking an FFT of two $2N$-dimensional vectors when we start from two $N$-dimensional vectors.

This sort of situation should make you think that we can do this more efficiently, either by using a smaller size FFT or by packing some data into the complex part of the input, and indeed we can do both.
[Aside: it's well known that if all the entries of an FFT input are real, then the result also has symmetry that can be exploited for efficiency by reframing the problem as a size-$N/2$ FFT in some cases, and just removing half the FFT algorithm's steps in other cases, see Wikipedia for more]

This technique was explained in Fast multiplication and its applications (pdf link) by Daniel Bernstein, a prominent cryptographer who specializes in cryptography performance, whose work appears in widely used standards like TLS and OpenSSH, and who designed a commonly used elliptic curve for cryptography.

[Aside: Bernstein cites this technique as using something called the "Tangent FFT (pdf link)." This is a drop-in FFT replacement he invented that is faster than the previous best (split-radix FFT), and Bernstein uses it mainly to give a precise expression for the number of operations required to do the multiplication end to end. We will continue to use the numpy FFT implementation, since in this article I'm just focusing on how to express negacyclic multiplication in terms of the FFT. Also worth noting: both the Tangent FFT and "Fast multiplication" papers frame their techniques (including FFT algorithm implementations!) in terms of polynomial ring factorizations and mappings. Be still, my beating cardioid.]

In terms of polynomial mappings, we start from the ring $\mathbb{R}[x] / (x^N + 1)$, where $N$ is a power of 2. We then pick a reversible mapping from $\mathbb{R}[x]/(x^N + 1) \to \mathbb{C}[x]/(x^{N/2} - 1)$ (note the field change from real to complex), apply the FFT to the image of the mapping, and reverse it appropriately at the end. One such mapping takes two steps, first mapping $\mathbb{R}[x]/(x^N + 1) \to \mathbb{C}[x]/(x^{N/2} - i)$ and then from $\mathbb{C}[x]/(x^{N/2} - i) \to \mathbb{C}[x]/(x^{N/2} - 1)$. The first mapping is as easy as the last section, because $(x^N + 1) = (x^{N/2} + i) (x^{N/2} - i)$, and so we can just set $x^{N/2} = i$ and reduce the polynomial.
This has the effect of making the second half of the polynomial's coefficients become the complex part of the first half of the coefficients.

The second mapping is more nuanced, because we're not just reducing via factorization. And we can't just map $i \mapsto 1$ generically, because that would reduce complex numbers down to real values. Instead, we observe that (momentarily using an arbitrary degree $k$ instead of $N/2$), for any polynomial $f \in \mathbb{C}[x]$, the remainder of $f \mod x^k-i$ uniquely determines the remainder of $f \mod x^k - 1$ via the change of variables $x \mapsto \omega_{4k} x$, where $\omega_{4k}$ is a $4k$-th primitive root of unity $\omega_{4k} = e^{\frac{2 \pi i}{4k}}$. Spelling this out in more detail: if $f(x) \in \mathbb{C}[x]$ has remainder $g(x)$ modulo $x^k - i$, that is, $f(x) = g(x) + h(x)(x^k - i)$ for some polynomial $h(x)$, then

$\displaystyle \begin{aligned} f(\omega_{4k}x) &= g(\omega_{4k}x) + h(\omega_{4k}x)((\omega_{4k}x)^{k} - i) \\ &= g(\omega_{4k}x) + h(\omega_{4k}x)(e^{\frac{\pi i}{2}} x^k - i) \\ &= g(\omega_{4k}x) + i h(\omega_{4k}x)(x^k - 1) \\ &= g(\omega_{4k}x) \mod (x^k - 1) \end{aligned}$

Translating this back to $k=N/2$, the mapping from $\mathbb{C}[x]/(x^{N/2} - i) \to \mathbb{C}[x]/(x^{N/2} - 1)$ is $f(x) \mapsto f(\omega_{2N}x)$. And if $f = f_0 + f_1x + \dots + f_{N/2 - 1}x^{N/2 - 1}$, then the mapping involves multiplying each coefficient $f_k$ by $\omega_{2N}^k$.

When you view polynomials as if they were a simple vector of their coefficients, then this operation $f(x) \mapsto f(\omega_{k}x)$ looks like $(a_0, a_1, \dots, a_n) \mapsto (a_0, \omega_{k} a_1, \dots, \omega_k^n a_n)$. Bernstein calls the operation a twist of $\mathbb{C}^n$, which I mused about in this Mathstodon thread.

What's most important here is that each of these transformations is invertible. The first because the top half coefficients end up in the complex parts of the polynomial, and the second because the mapping $f(x) \mapsto f(\omega_{2N}^{-1}x)$ is an inverse.
Together, this makes the preprocessing and postprocessing exact inverses of each other. The code is then

    def primitive_nth_root(n):
        """Return the primitive n-th root of unity e^(2*pi*i/n).

        Helper written out here to match the definition of omega in the text.
        """
        return numpy.exp(2j * numpy.pi / n)

    def negacyclic_polymul_complex_twist(p1, p2):
        n = p2.shape[0]
        primitive_root = primitive_nth_root(2 * n)
        root_powers = primitive_root ** numpy.arange(n // 2)

        # Fold the top half into the imaginary part, then twist.
        p1_preprocessed = (p1[: n // 2] + 1j * p1[n // 2 :]) * root_powers
        p2_preprocessed = (p2[: n // 2] + 1j * p2[n // 2 :]) * root_powers

        p1_ft = fft(p1_preprocessed)
        p2_ft = fft(p2_preprocessed)
        prod = p1_ft * p2_ft
        ifft_prod = ifft(prod)

        # Undo the twist, then unfold real and imaginary parts.
        ifft_rotated = ifft_prod * primitive_root ** numpy.arange(0, -n // 2, -1)
        return numpy.round(
            numpy.concatenate([numpy.real(ifft_rotated), numpy.imag(ifft_rotated)])
        ).astype(p1.dtype)

And so, at the cost of a bit more pre- and postprocessing, we can negacyclically multiply two degree $N-1$ polynomials using an FFT of length $N/2$. In theory, no information is wasted and this is optimal.

## And finally, a simple matrix multiplication

The last technique I wanted to share is not based on the FFT, but it's another method for doing negacyclic polynomial multiplication that has come in handy in situations where I am unable to use FFTs. I call it the Toeplitz method, because one of the polynomials is converted to a Toeplitz matrix. Sometimes I hear it referred to as a circulant matrix technique, but due to the negacyclic sign flip, I don't think it's a fully accurate term.

The idea is to put the coefficients of one polynomial $f(x) = f_0 + f_1x + \dots + f_{N-1}x^{N-1}$ into a matrix as follows:

$\begin{pmatrix} f_0 & -f_{N-1} & \dots & -f_1 \\ f_1 & f_0 & \dots & -f_2 \\ \vdots & \vdots & \ddots & \vdots \\ f_{N-1} & f_{N-2} & \dots & f_0 \end{pmatrix}$

The polynomial coefficients are written down in the first column unchanged, then in each subsequent column, the coefficients are cyclically shifted down one, and the term that wraps around the top has its sign flipped.
When the second polynomial is treated as a vector of its coefficients, say, $g(x) = g_0 + g_1x + \dots + g_{N-1}x^{N-1}$, then the matrix-vector product computes their negacyclic product (as a vector of coefficients):

$\begin{pmatrix} f_0 & -f_{N-1} & \dots & -f_1 \\ f_1 & f_0 & \dots & -f_2 \\ \vdots & \vdots & \ddots & \vdots \\ f_{N-1} & f_{N-2} & \dots & f_0 \end{pmatrix} \begin{pmatrix} g_0 \\ g_1 \\ \vdots \\ g_{N-1} \end{pmatrix}$

This works because each row $j$ corresponds to one output term $x^j$, and the cyclic shift for that row accounts for the degree-wrapping, with the sign flip accounting for the negacyclic part. (If there were no sign attached, this method could be used to compute a cyclic polynomial product.)

The Python code for this is

    def cylic_matrix(c: numpy.array) -> numpy.ndarray:
        """Generates a cyclic matrix with each column of the input shifted.

        For input: [1, 2, 3], generates the following matrix:

            [[1 3 2]
             [2 1 3]
             [3 2 1]]
        """
        c = numpy.asarray(c).ravel()
        a, b = numpy.ogrid[0 : len(c), 0 : -len(c) : -1]
        indx = a + b
        return c[indx]

    def negacyclic_polymul_toeplitz(p1, p2):
        n = len(p1)

        # Sign matrix with 1s on and below the diagonal and -1s above it.
        low_tri = numpy.tril(numpy.ones((n, n), dtype=int), 0)
        up_tri = numpy.triu(numpy.ones((n, n), dtype=int), 1) * -1
        sign_matrix = low_tri + up_tri

        cyclic_matrix = cylic_matrix(p1)
        toeplitz_p1 = sign_matrix * cyclic_matrix
        return numpy.matmul(toeplitz_p1, p2)

Obviously on most hardware this would be less efficient than an FFT-based method (and there is some relationship between circulant matrices and Fourier Transforms, see Wikipedia). But in some cases (when the polynomials are small, or one of the two polynomials is static, or a particular hardware choice doesn't handle FFTs with high-precision floats very well, or you want to take advantage of natural parallelism in the matrix-vector product) this method can be useful. It's also simpler to reason about.

Until next time!
# Key Switching in LWE

Last time we covered an operation in the LWE encryption scheme called modulus switching, which allows one to switch from one modulus to another, at the cost of introducing a small amount of extra noise, roughly $\sqrt{n}$, where $n$ is the dimension of the LWE ciphertext. This time we'll cover a more sophisticated operation called key switching, which allows one to switch an LWE ciphertext from being encrypted under one secret key to another, without ever knowing either secret key.

## Reminder of LWE

A literal repetition of the last article. The LWE encryption scheme I'll use has the following parameters:

• A plaintext space $\mathbb{Z}/q\mathbb{Z}$, where $q \geq 2$ is a positive integer. This is the space that the underlying message comes from.
• An LWE dimension $n \in \mathbb{N}$.
• A discrete Gaussian error distribution $D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector in $\{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $a = (a_1, \dots, a_n)$, sampled uniformly over $(\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $b = \langle a, s \rangle + m + e$, where $e$ is drawn from $D$ and all arithmetic is done modulo $q$. Note that $e$ must be small for the encryption to be valid.

Sometimes I will denote by $\textup{LWE}_s(x)$ the LWE encryption of plaintext $x$ under the secret key $s$, and it should be understood that this is a fixed (but arbitrary) draw from the distribution of LWE ciphertexts described above.

## Main idea: homomorphically almost-decrypt

The main idea is to encrypt each entry of the original secret key using the new secret key (this collection of encryptions is jointly called a key-switching key), and then use this to homomorphically evaluate the first step of the decryption function (i.e., compute $b - \langle a, s \rangle$). The result is an encryption of the (noisy) message under the new key.

First we'll show how this works in a naïve sense.
In particular, doing what I said in the last paragraph verbatim won't work because the error will grow too large. But we'll do it anyway, measure the error, and the remainder of the article will show how the gadget decomposition can be used to reduce the error.

## Key switching, without gadget decompositions

Start with an LWE ciphertext for the plaintext $m$. Call it

$\displaystyle c = (a_1, \dots, a_n, b) \in (\mathbb{Z}/q\mathbb{Z})^{n+1}$

where

$\displaystyle b = \left ( \sum_{i=1}^n a_i s_i \right ) + m + e_{\textup{original}}$

and $s = (s_1, \dots, s_n) \in \{ 0,1\}^n$ is the secret key.

Now say we have another secret key $t = (t_1, \dots, t_m) \in \{ 0, 1\}^m$, possibly of a different dimension, and we would like to switch the ciphertext $c$ to a ciphertext $c'$ which encrypts the same underlying message $m$, but under the new secret key $t$. That is, we would like to write

$\displaystyle c' = (a'_1, \dots, a'_m, b') \in (\mathbb{Z}/q\mathbb{Z})^{m+1}$

where

$\displaystyle b' = \left ( \sum_{i=1}^m a'_i t_i \right ) + m + e_{\textup{original}} + e_{\textup{new}}$

implying that there is possibly some additional error introduced as a result. As usual, so long as the total error in the ciphertext remains small enough (and $m$ is stored in the significant bits of the underlying integer space), the result will still be a valid LWE ciphertext.

Define the key switching key $\textup{KSK}(s, t)$ as follows (I will omit the $s, t$ and just call it KSK from now on):

$\displaystyle \textup{KSK} = \{ \textup{KSK}_i = \textup{LWE}_t(s_i) = (x_{i, 1}, \dots, x_{i, m}, y_i) \mid i=1, \dots, n\}$

In other words, $\textup{KSK}_i$ encrypts bit $s_i$, and $y_i = \langle x_i, t \rangle + s_i + e_i$ makes it a valid LWE encryption.

Now the algorithm to switch keys is merely as follows (where the first vector has $m$ leading zeros to ensure the dimensions align):

$\displaystyle c' = (0, \dots, 0, b) - \sum_{i=1}^n a_i \textup{KSK}_i$

This is computing a linear combination of the $\textup{KSK}_i$.
The specific linear combination is the first step of LWE decryption ($b – \langle a, s \rangle$), but performed on ciphertexts of$b$and the$s_i$. Note,$(0, \dots, 0, b)$is a valid (but insecure) LWE ciphertext of$b$under any secret key, in part because we’re pretending the LWE samples and error were all sampled as zero; an unlikely but coherent outcome used to jumpstart a homomorphic computation in more places than key switching. So if you wanted to, you could write$c’$as follows, to highlight how we’re computing additions and linear scalings of LWE ciphertexts.$\displaystyle c’ = \textup{LWE}_{\textup{t}}(b) – \sum_{i=1}^n a_i \textup{LWE}_t(s_i)$This should be enough to show that$c’$is a valid LWE encryption (if we accept that adding and scaling preserves LWE validity). But to warm up for the rest of the article we’ll reprove it with a slightly different technique. This will also help us understand the error growth. Because LWE naturally admits sums and scalar products with corresponding added error, we expect the error to grow proportionally to the number of additions and the magnitudes of the$a_i$’s. And you may already be able to tell that because the$a_i$’s are uniform$\mathbb{Z}/q\mathbb{Z}$elements, this part will be far too large to be useful. Let’s make this explicit now. To show it’s a valid LWE encryption, we define the function$\varphi_s$, defined on any LWE ciphertext$c = (a_1, \dots, a_n, b)$as$\varphi_s(c) = b – \langle a, s \rangle$. Some authors call$\varphi_s$the “phase” function, but I think of it as a close friend: the first step of the decryption function for LWE (the second step would be rounding off the error). Critically, an LWE encryption is valid if and only if$\varphi_s(c) = m + e$(provided$e$is sufficiently small). 
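
To see the pieces so far in one place, here is a minimal toy sketch of LWE encryption, the phase function $\varphi_s$, and the naïve key switch $c' = (0, \dots, 0, b) - \sum_i a_i \textup{KSK}_i$. Everything here is an assumption made for illustration: the tiny dimensions, the rounded-Gaussian error from Python's `random` module, and the function names. It uses plain Python integers so that products modulo $2^{32}$ cannot overflow, and it is nowhere near a secure implementation.

```python
import random

Q = 2**32  # ciphertext modulus, matching the article's running example

def keygen(dim, rng):
    """A uniformly random binary secret key."""
    return [rng.randrange(2) for _ in range(dim)]

def encrypt(message, key, sigma, rng):
    """Toy LWE encryption: returns [a_1, ..., a_n, b]."""
    a = [rng.randrange(Q) for _ in key]
    e = round(rng.gauss(0, sigma))
    b = (sum(ai * si for ai, si in zip(a, key)) + message + e) % Q
    return a + [b]

def phase(ciphertext, key):
    """phi_s(c) = b - <a, s> mod q, i.e. message plus error."""
    *a, b = ciphertext
    return (b - sum(ai * si for ai, si in zip(a, key))) % Q

def centered(x):
    """Representative of x mod Q in [-Q/2, Q/2)."""
    return (x + Q // 2) % Q - Q // 2

def naive_key_switch(ciphertext, ksk, new_dim):
    """c' = (0, ..., 0, b) - sum_i a_i * KSK_i, with no decomposition."""
    *a, b = ciphertext
    out = [0] * new_dim + [b]
    for ai, ksk_i in zip(a, ksk):
        out = [(o - ai * k) % Q for o, k in zip(out, ksk_i)]
    return out

rng = random.Random(0)
n, new_dim, sigma = 8, 6, 4.0
s, t = keygen(n, rng), keygen(new_dim, rng)
ksk = [encrypt(si, t, sigma, rng) for si in s]  # KSK_i = LWE_t(s_i)

message = 5 << 28  # a 4-bit message stored in the top bits
c = encrypt(message, s, sigma, rng)
c_switched = naive_key_switch(c, ksk, new_dim)

error_before = centered(phase(c, s) - message)
error_after = centered(phase(c_switched, t) - message)
print(error_before, error_after)
```

Comparing `error_before` with `error_after` shows the problem directly: the switched ciphertext's error includes the $a_i$, which are uniform in $\mathbb{Z}/q\mathbb{Z}$, multiplied by the key-switching key's errors.
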
Because $\varphi_s$ is a linear function, it factors through the definition of $c'$ nicely, and we get

$\displaystyle \begin{aligned} \varphi_t(c') &= \varphi_t((0, \dots, 0, b)) - \sum_{i=1}^n a_i \varphi_t(\textup{KSK}_i) \\ &= b - \sum_{i=1}^n a_i (y_i - \langle x_i, t \rangle) \\ &= b - \sum_{i=1}^n a_i (s_i + e_i) \end{aligned}$

where (reminder) $e_i$ is the error sample from $\textup{KSK}_i$'s definition. Distributing $a_i$ across the $(s_i + e_i)$ simplifies everything nicely

$\displaystyle \begin{aligned} \varphi_t(c') &= b - \sum_{i=1}^n a_i s_i - \sum_{i=1}^n a_i e_i \\ &= m + e_{\textup{original}} - \sum_{i=1}^n a_i e_i \end{aligned}$

Now as we foreshadowed, $e_{\textup{new}} = -\sum_{i=1}^n a_i e_i$ is simply too large. A typical LWE ciphertext will have error at least 1 (or it would be useless), and if $q = 2^{32}$, the $a_i$'s would also be of magnitude roughly $2^{31}$, so summing even two of those would corrupt even a 1-bit message stored in the most significant bit of the plaintext.

The way to deal with this is to use a bit decomposition.

## Key switching, with gadget decompositions

Recall from the gadget decomposition article that the core function of a gadget decomposition is to preserve the ultimate value of a dot product while making the vectors being multiplied longer (spending space/time) but also making the size of the coefficients of one of the vectors smaller (reducing the accumulation of error due to that dot product).

This is exactly the approach we'll take here. The "dot product" in question is $(a_1, \dots, a_n) \cdot \textup{KSK}$ (where KSK is viewed as a matrix), and we'll expand the values $a_i$ into a vector of its digits in a base-$B$ number system, while modifying the key switching key so that those missing powers of $B$ are part of the LWE encryption. This will result in replacing the error term that looked like $\sum_{i=1}^n a_i e_i$ with an error term like $\sum_{i=1}^n c B e_i$ for some small constant $c$ (expect it to be even less than $B$).
More specifically, define decomposition parameters as a triple of numbers$(B, k, L)$. The number$B$is a power of 2 no bigger than$q/2$, and$L$, or the number of levels of the decomposition, is the positive integer such that$B^L = q$(this is forced by the choice of$B$). Then finally,$k$is a number between$0$and$L-1$describing the “lowest level” (or least-significant digit) included in the decomposition. An error-free decomposition sets the parameter$k=0$, and this is defined simply as a base-$B$representation of a number. For example, suppose$q = 2^{32}$, and$(B, k, L) = (256, 0, 4)$, and we’re decomposing$x=2^{32} – 2$. Then$\textup{Decomp}_{256, 0, 4}(x) = (254, 255, 255, 255)$. I subtracted 2 to emphasize that the digits are little-Endian (the right-most entry is the most significant, representing the$256^3$place). An approximate decomposition is one with$k > 0$. For example, suppose$(B, k, L) = (256, 2, 4)$and again$x=2^{32} – 2$. Setting$k=2$means that we represent this number as if it were$(0, 0, 255, 255)$, wiping out the two least significant digits. The error of this approximation is$65534 = 254 + 255 \cdot 256^1$. As we will see, an approximate decomposition may help reduce overall error by splitting the newly introduced error into a sum of two terms, where$k$scales the error differently in each term. Let’s go through the key-switching key derivation again, using an error-free decomposition$(B, 0, L)$. First, re-define the key switching key as follows.$\displaystyle \textup{KSK} = \{ \textup{KSK}_{i, j} = \textup{LWE}_t(s_i B^j) \mid i=1, \dots, n ; j = 0, \dots, L-1\}$Note that this increases the dimension of the key-switching key by 1. Previously the key-switching key was a list of LWE ciphertexts (2-dimensional array of numbers), and now it’s a 3-dimensional array, with the new dimension corresponding to the decomposition digit$j$. 
Because the powers of$B$are attached to the message, they will factor out and allow us to reconstruct the original$a_i$’s, but they will not be included in the error part because error is added to the message during encryption. Next, to perform the key switch, define$\textup{Decomp}(a_i) = (a_{i,0}, \dots, a_{i,L-1})$and compute$\displaystyle c’ = (0, \dots, 0, b) – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} \textup{KSK}_{i,j}$This is the same as the original key switch, but the extra summation accounts for the extra dimension introduced by the gadget decomposition. Then we can repeat the same$\varphi_t$trick and see how the original$a_i$’s are reconstructed.$\displaystyle \begin{aligned} \varphi_t(c’) &= b – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} \varphi_t(\textup{KSK}_{i,j}) \\ &= b -\sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} (s_i B^j + e_i) \\ &= b -\sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} s_i B^j – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} e_i \\ &= b -\sum_{i=1}^n a_i s_i – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} e_i \\ &= m + e_{\textup{original}} – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} e_i \end{aligned}$One key ingredient above is noticing that in$\sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} s_i B^j$, the$s_i$factors out of the innermost sum, and what you have left is$\sum_{j=0}^{L-1} a_{i,j} B^j$, which is exactly how to reconstruct$a_i$from its base-$B$digits. The second key ingredient is that the innermost term on the second line is$a_{i,j} (s_i B^j + e_i)$, which means that only the digits$a_{i,j}$are multiplied by the error terms, not including the powers of$B$, and so the final error can be bounded by the largest allowable value of a single digit$B-1$, resulting in the new error being$L (B-1) \sum_{i=1}^n e_i$. 
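
Here is a minimal toy sketch of the decomposition and the gadget-based key switch described above, using the article's example parameters $q = 2^{32}$ and $(B, k, L) = (256, 0, 4)$. The function names, the tiny key dimensions, and the rounded-Gaussian error from Python's `random` module are all assumptions for illustration. Each $\textup{KSK}_{i,j}$ gets its own fresh error sample here, and nothing about this is secure.

```python
import random

Q = 2**32  # ciphertext modulus from the article's running example

def decompose(x, base, num_levels, lowest_level=0):
    """Little-endian base-`base` digits of x, zeroing digits below lowest_level."""
    digits = []
    for j in range(num_levels):
        digits.append(x % base if j >= lowest_level else 0)
        x //= base
    return digits

def encrypt(message, key, sigma, rng):
    """Toy LWE encryption: returns [a_1, ..., a_n, b]."""
    a = [rng.randrange(Q) for _ in key]
    e = round(rng.gauss(0, sigma))
    b = (sum(ai * si for ai, si in zip(a, key)) + message + e) % Q
    return a + [b]

def phase(ciphertext, key):
    """phi_s(c) = b - <a, s> mod q."""
    *a, b = ciphertext
    return (b - sum(ai * si for ai, si in zip(a, key))) % Q

def make_ksk(old_key, new_key, base, num_levels, sigma, rng):
    """ksk[i][j] encrypts s_i * B^j under the new key."""
    return [
        [encrypt(si * base**j, new_key, sigma, rng) for j in range(num_levels)]
        for si in old_key
    ]

def key_switch(ciphertext, ksk, new_dim, base, num_levels):
    """c' = (0, ..., 0, b) - sum_i sum_j a_{i,j} * KSK_{i,j}."""
    *a, b = ciphertext
    out = [0] * new_dim + [b]
    for ai, ksk_i in zip(a, ksk):
        for digit, ksk_ij in zip(decompose(ai, base, num_levels), ksk_i):
            out = [(o - digit * k) % Q for o, k in zip(out, ksk_ij)]
    return out

print(decompose(2**32 - 2, 256, 4))     # [254, 255, 255, 255]
print(decompose(2**32 - 2, 256, 4, 2))  # [0, 0, 255, 255]

rng = random.Random(0)
n, new_dim, sigma, B, L = 8, 6, 4.0, 256, 4
s = [rng.randrange(2) for _ in range(n)]
t = [rng.randrange(2) for _ in range(new_dim)]
ksk = make_ksk(s, t, B, L, sigma, rng)

message = 5 << 28  # a 4-bit message stored in the top bits
c_switched = key_switch(encrypt(message, s, sigma, rng), ksk, new_dim, B, L)
noisy = phase(c_switched, t)
error_after = (noisy - message + Q // 2) % Q - Q // 2
decoded = ((noisy + (1 << 27)) >> 28) % 16  # round to the top 4 bits
```

Because only the digits (each at most $B - 1 = 255$) multiply the key-switching key's errors, `error_after` stays far below the $2^{27}$ rounding threshold in this toy setting and `decoded` recovers the 4-bit message.
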
For a Gaussian centered at zero, the expectation of these errors is zero, and using standard bounding arguments like Chernoff bounds, you can prove that with high probability this new error is at most $L(B-1) \sigma \sqrt{2n \log n}$, where $\sigma$ is the standard deviation of the error distribution.

Now, finally, we can run through this argument one more time, but using an approximate decomposition. This merely changes the sum's lower bound from $j=0$ to $j=k$. Start by calling $\tilde{a}_i = \sum_{j=k}^{L-1} a_{i,j} B^j$, the approximation of $a_i$ from its most significant digits. Then the error of this approximation is $a_i - \tilde{a}_i = \sum_{j=0}^{k-1} a_{i,j} B^j$, a relatively small quantity at most $B^k - 1$ (attained when each $a_{i,j} = B-1$ is as large as possible, since $\sum_{j=0}^{k-1} (B-1) B^j = B^k - 1$).

$\displaystyle \begin{aligned} \varphi_t(c') &= b - \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} \varphi_t(\textup{KSK}_{i,j}) \\ &= b -\sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} (s_i B^j + e_i) \\ &= b -\sum_{i=1}^n s_i \sum_{j=k}^{L-1} a_{i,j} B^j - \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \\ &= b -\sum_{i=1}^n s_i \tilde{a}_i - \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \end{aligned}$

Mentally zoom in on the first sum $\sum_{i=1}^n s_i \tilde{a}_i$. Use the trick of adding zero to get

$\displaystyle \sum_{i=1}^n s_i \tilde{a}_i = \sum_{i=1}^n s_i (a_i + \tilde{a}_i - a_i) = \sum_{i=1}^n s_i a_i - \sum_{i=1}^n s_i(a_i - \tilde{a}_i)$

The term $\sum_{i=1}^n s_i(a_i - \tilde{a}_i)$ is part of our new error term, and recalling that the secret key bits are binary, you should think of this as roughly $\frac{n}{2} B^{k}$ (about half the key bits are 1 in expectation, and each approximation error is at most $B^k - 1$).

Continuing, we arrive at

$\displaystyle \begin{aligned} \varphi_t(c') &= b -\sum_{i=1}^n a_i s_i + \sum_{i=1}^n s_i(a_i - \tilde{a}_i) - \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \\ &= m + e_{\textup{original}} + \sum_{i=1}^n s_i(a_i - \tilde{a}_i) - \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \end{aligned}$

## Rough error analysis

Now the choice of $k$ admits a tradeoff that one can optimize for to minimize the total newly introduced error. I'm going to switch to a sloppy mode of math to heuristically navigate this tradeoff.

The triangle inequality lets us bound the magnitude of the error by the sum of the magnitudes of the parts, i.e., the error is bounded from above by

$\displaystyle \left | \sum_{i=1}^n s_i(a_i - \tilde{a}_i) \right | + \left | \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \right |$

The left term is like $\frac{n}{2} B^{k}$ as we stated earlier, and with high probability it's at most $(n/2 + \sqrt{n \log n}) B^{k}$. The right term is at most $(L-k)B \sum_{i=1}^n e_i$ (worst case size of $a_{i,j}$, increasing $B-1$ to $B$ because why not), and with high probability the sum of the $e_i$ is like $\sigma \sqrt{2n \log n}$, making the whole term bounded by $(L-k)B \sigma \sqrt{2n \log n}$.

So we want to minimize the sum

$\displaystyle (n/2 + \sqrt{n \log n}) B^{k} + (L-k)B \sigma \sqrt{2n \log n}$

We could try to explicitly optimize this for $k$, treating the other terms as constant, but it won't be nice because $k$ is present in both a linear term and an exponent. We could also just stare at it and think.

The approximation error (the term on the left) is going to get exponentially larger as $k$ grows, so we want to keep $k$ relatively small. But on the other hand, the standard deviation $\sigma$ should be much larger than $n$ to keep LWE secure. This is effectively what we're trying to suppress: error that grows like $O(n)$ is small enough to deal with, but error that grows like $\omega(n)$ is problematic.
Increasing $k$ gives us a meager (but nontrivial) means to reduce the constant coefficient on that part of the error in exchange for $\Theta(n)$ growth in the other term.

I admit, as of the time of this writing I still don't understand how to set production security parameters for LWE. Is it still linear in $n$? Super-linear? Not sure. I'm betting future Jeremy will clarify this for me in another article. Even if it were linear in $n$, the right term multiplies $\sigma$ by $\sqrt{n \log n}$ which makes the whole thing super-linear, whereas the left term adds a square root factor. So the tradeoff in $k$ should still help. Until I understand LWE security, I won't have the asymptotics I need to analyze this further.

Moreover, the allowed values of $B, k$ are so small that we can brute force evaluate all options. For example, if $B = 16$ then $k$ can be between 0 and 7. And realistically, if $n \approx 2^{10}$, then letting $k = 4$ makes the first term roughly $2^{26}$, which leaves only 6 bits left for the message (further reduced by any error introduced by the second term).
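
The brute-force evaluation mentioned above can be sketched in a few lines. This is a heuristic illustration, not a parameter-selection tool. It assumes a worst-case truncation error of $B^k - 1$ per coefficient for the first term (each of the $k$ dropped digits equal to $B-1$) and uses $(L-k) B \sigma \sqrt{2n \log n}$ for the second. The value of $\sigma$ is an arbitrary placeholder, and the function name is made up.

```python
import math

def key_switch_error_bound(n, base, num_levels, k, sigma):
    """Heuristic upper bound on the error added by an approximate key switch."""
    # Approximation error: about n/2 key bits set, each contributing < B^k.
    approx_term = (n / 2 + math.sqrt(n * math.log(n))) * (base**k - 1)
    # Error from the key-switching key, scaled by the L - k remaining digits.
    noise_term = (num_levels - k) * base * sigma * math.sqrt(2 * n * math.log(n))
    return approx_term + noise_term

q_bits = 32
n = 2**10
base = 16
sigma = 2.0**10  # placeholder value, NOT a vetted security parameter
num_levels = q_bits // int(math.log2(base))  # L such that B^L = q

bounds = {
    k: key_switch_error_bound(n, base, num_levels, k, sigma)
    for k in range(num_levels)
}
best_k = min(bounds, key=bounds.get)

for k, bound in bounds.items():
    print(f"k={k}: error bound ~ 2^{math.log2(bound):.1f}")
print("k minimizing the bound:", best_k)
```

Since there are only $L$ candidate values of $k$ for each $B$, looping over both parameters is cheap, and the bits left for the message are roughly `q_bits` minus the base-2 logarithm of the bound.
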

Thanks to Cathie Yun and Asra Ali for providing feedback on an early draft of this article.

Until next time!