Encoding Schemes in FHE

In cryptography, we need a distinction between a cleartext and a plaintext. A cleartext is a message in its natural form. A plaintext is a cleartext that is represented in a specific way to prepare it for encryption in a specific scheme. The process of taking a cleartext and turning it into a plaintext is called encoding, and the reverse is called decoding.

In homomorphic encryption, the distinction matters. Cleartexts are generally all integers, though the bit width of allowed integers can be restricted (e.g., 4-bit integers). On the other hand, each homomorphic encryption (HE) scheme has its own special process for encoding and decoding, and since HEIR hopes to support all HE schemes, I set about cataloguing the different encoding schemes. This article is my notes on what they are.

If you’re not familiar with the terms Learning With Errors LWE and and its ring variant RLWE, then you may want to read up on those Wikipedia pages first. These problems are fundamental to most FHE schemes.

Bit field encoding for LWE

A bit field encoding simply places the bits of a small integer cleartext within a larger integer plaintext. An example might be a 3-bit integer cleartext placed in the top-most bits of a 32-bit integer plaintext. This is necessary because operations on FHE ciphertexts accumulate noise, which pollutes the lower-order bits of the corresponding plaintext (BGV is a special case that inverts this, see below).

Many papers in the literature will describe “placing in the top-most bits” as “applying a scaling factor,” which essentially means pick a power of 2 $\Delta$ and encode an integer $x$ as $\Delta x$. However, by using a scaling factor it’s not immediately clear if all of the top-most bits of the plaintext are safe to use.

To wit, the CGGI (aka TFHE) scheme has a slightly more specific encoding because it requires the topmost bit to be zero in order to use its fancy programmable bootstrapping feature. Don’t worry if you don’t know what it means, but just know that in this scheme the top-most bit is set aside.

This encoding is hence most generally described by specifying a starting bit and a bit width for the location of the cleartext in a plaintext integer. The code would look like

plaintext = message << (plaintext_bit_width - starting_bit - cleartext_bit_width)

There are additional steps that come into play when one wants to encode a decimal value in LWE, which can be done with fixed point representations.

As mentioned above, the main HE scheme that uses bit field LWE encodings is CGGI, but all the schemes use this encoding as part of their encoding because all schemes need to ensure there is space for noise growth during FHE operations.

Coefficient encoding for RLWE

One of the main benefits of RLWE-based FHE schemes is that you can pack lots of cleartexts into one plaintext. For this and all the other RLWE-based sections, the cleartext space is something like $(\mathbb{Z}/3\mathbb{Z})^{1024}$, vectors of small integers of some dimension. Many folks in the FHE world call $p$ the modulus of the cleartexts. And the plaintext space is something like $(\mathbb{Z}/2^{32}\mathbb{Z})[x] / (x^{1024} + 1)$, i.e., polynomials with large integer coefficients and a polynomial degree matching the cleartext space dimension. Many people call $q$ the coefficient modulus of the plaintext space.

In the coefficient encoding for RLWE, the bit-field encoding is applied to each input, and they are interpreted as coefficients of the polynomial.

This encoding scheme is also used in CGGI, in order to encrypt a lookup table as a polynomial for use in programmable bootstrapping. But it can also be used (though it is rarely used) in the BGV and BFV schemes, and rarely because both of those schemes use the polynomial multiplication to have semantic meaning. When you encode RLWE with the coefficient encoding, polynomial multiplication corresponds to a convolution of the underlying cleartexts, when most of the time those schemes prefer that multiplication corresponds to some kind of point-wise multiplication. The next encoding will handle that exactly.

Evaluation encoding for RLWE

The evaluation encoding borrows ideas from the Discrete Fourier Transform literature. See this post for a little bit more about why the DFT and polynomial multiplication are related.

The evaluation encoding encodes a vector $(v_1, \dots, v_N)$ by interpreting it as the output value of a polynomial $p(x)$ at some implicitly determined, but fixed points. These points are usually the roots of unity of $x^N + 1$ in the ring $\mathbb{Z}/q\mathbb{Z}$ (recall, the coefficients of the polynomial ring), and one computes this by picking $q$ in such a way that guarantees the multiplicative group $(\mathbb{Z}/q\mathbb{Z})^\times$ has a generator, which plays the analogous role of a $2N$-th root of unity that you would normally see in the complex numbers.

Once you have the root of unity, you can convert from the evaluation form to a coefficient form (which many schemes need for the encryption step) via an inverse number-theoretic transform (INTT). And then, of course, one must scale the coefficients using the bit field encoding to give room for noise. The coefficient form here is considered the “encoded” version of the cleartext vector.

Aside: one can perform the bit field encoding step before or after the INTT, since the bitfield encoding is equivalent to multiplying by a constant, and scaling a polynomial by a constant is equivalent to scaling its point evaluations by the same constant. Polynomial evaluation is a linear function of the coefficients.

The evaluation encoding is the most commonly used encoding used for both the BGV and BFV schemes. And then after encryption is done, one usually NTT’s back to the evaluation representation so that polynomial multiplication can be more quickly implemented as entry-wise multiplication.

Rounded canonical embeddings for RLWE

This embedding is for a family of FHE schemes related to the CKKS scheme, which focuses on approximate computation.

Here the cleartext space and plaintext spaces change slightly. The cleartext space is $\mathbb{C}^{N/2}$, and the plaintext space is again $(\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1)$ for some machine-word-sized power of two $q$. As you’ll note, the cleartext space is continuous but the plaintext space is discrete, so this necessitates some sort of approximation.

Aside: In the literature you will see the plaintext space described as just $(\mathbb{Z}[x] / (x^N + 1)$, and while this works in principle, in practice doing so requires multiprecision integer computations, and ends up being slower than the alternative, which is to use a residue number system before encoding, and treat the plaintext space as $(\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1)$. I’ll say more about RNS encoding in the next section.

The encoding is easier to understand by first describing the decoding step. Given a polynomial $f \in (\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1)$, there is a map called the canonical embedding $\varphi: (\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1) \to \mathbb{C}^N$ that evaluates $f$ at the odd powers of a primitive $2N$-th root of unity. I.e., letting $\omega = e^{2\pi i / 2N}$, we have

\[ \varphi(f) = (f(\omega), f(\omega^3), f(\omega^5), \dots, f(\omega^{2N-1})) \]

Aside: My algebraic number theory is limited (not much beyond a standard graduate course covering Galois theory), but this paper has some more background. My understanding is that we’re viewing the input polynomials as actually sitting inside the number field $\mathbb{Q}[x] / (x^N + 1)$ (or else $q$ is a prime and the original polynomial ring is a field), and the canonical embedding is a specialization of a more general theorem that says that for any subfield $K \subset \mathbb{C}$, the Galois group $K/\mathbb{Q}$ is exactly the set of injective homomorphisms $K \to \mathbb{C}$. I don’t recall exactly why these polynomial quotient rings count as subfields of $\mathbb{C}$, and I think it is not completely trivial (see, e.g., this stack exchange question).

As specialized to this setting, the canonical embedding is a scaled isometry for the 2-norm in both spaces. See this paper for a lot more detail on that. This is a critical aspect of the analysis for FHE, since operations in the ciphertext space add perturbations (noise) in the plaintext space, and it must be the case that those perturbations decode to similar perturbations so that one can use bounds on noise growth in the plaintext space to ensure the corresponding cleartexts stay within some desired precision.

Because polynomials commute with complex conjugation ($f(\overline{z}) = \overline{f(z)}$), and roots of unity satisfy $\overline{\omega^k} = \omega^{-k}$, this canonical embedding is duplicating information. We can throw out the second half of the roots of unity and retain the same structure (the scaling in the isometry changes as well). The result is that the canonical embedding is defined $\varphi: (\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1) \to \mathbb{C}^{N/2}$ via

\[ \varphi(f) = (f(\omega), f(\omega^3), \dots, f(\omega^{N-1})) \]

Since we’re again using the bit-field encoding to scale up the inputs for noise, the decoding is then defined by applying the canonical embedding, and then applying bit-field decoding (scaling down).

This decoding process embeds the discrete polynomial space inside $\mathbb{C}^{N/2}$ as a lattice, but input cleartexts need not lie on that lattice. And so we get to the encoding step, which involves rounding to a point on the lattice, then inverting the canonical embedding, then applying the bit-field encoding to scale up for noise.

Using commutativity, one can more specifically implement this by first inverting the canonical embedding (which again uses an FFT-like operation), the result of which is in $\mathbb{C}[x] / (x^N + 1)$, then apply the bit-field encoding to scale up, then round the coefficients to be in $\mathbb{Z}[x] / (x^N + 1)$. As mentioned above, if you want the coefficients to be machine-word-sized integers, you’ll have to design this all to ensure the outputs are sufficiently small, and then treat the output as $\mathbb{Z}/q\mathbb{Z}[x] / (x^N + 1)$. Or else use a RNS mechanism.

Residue Number System Pre-processing

In all of the above schemes, the cleartext spaces can be too small for practical use. In the CGGI scheme, for example, a typical cleartext space is only 3 or 4 bits. Some FHE schemes manage this by representing everything in terms of boolean circuits, and pack inputs to various boolean gates in those bits. That is what I’ve mainly focused on, but it has the downside of increasing the number of FHE operations, requiring deeper circuits and more noise management operations, which are slow. Other approaches try to use the numerical structure of the ciphertexts more deliberately, and Sunzi’s Theorem (colloquially known as the Chinese Remainder Theorem) comes to the rescue here.

There will be two “cleartext” spaces floating around here, one for the “original” message, which I’ll call the “original” cleartext space, and one for the Sunzi’s-theorem-decomposed message, which I’ll call the “RNS” cleartext space (RNS for residue number system).

The original cleartext space size $M$ must be a product of primes or co-prime integers $M = m_1 \cdot \dots \cdot m_r$, with each $m_i$ being small enough to be compatible with the desired FHE’s encoding. E.g., for a bit-field encoding, $M$ might be large, but each $m_i$ would have to be at most a 4-bit prime (which severely limits how much we can decompose).

Then, we represent a single original cleartext message $x \in \mathbb{Z}/M\mathbb{Z}$ via its residues mod each $m_i$. I.e., $x$ becomes $r$ different cleartexts $(x \mod m_1, x \mod m_2, \dots, x \mod m_r)$ in the RNS cleartext space. From there we can either encode all the cleartexts in a single plaintext—the various RLWE encodings support this so long as $r < N$ (or $N/2$ for the canonical embedding))—or else encode them as difference plaintexts. In the latter case, the executing program needs to ensure the plaintexts are jointly processed. E.g., any operation that happens to one must happen to all, to ensure that the residues stay in sync and can be reconstructed at the end.

And finally, after decoding we use the standard reconstruction algorithm from Sunzi’s theorem to rebuild the original cleartext from the decoded RNS cleartexts.

I’d like to write a bit more about RNS decompositions and Sunzi’s theorem in a future article, because it is critical to how many FHE schemes operate, and influences a lot of their designs. For example, I glazed over how inverting the canonical embedding works in detail, and it is related to Sunzi’s theorem in a deep way. So more on that in the future.

Estimating the Security of Ring Learning with Errors (RLWE)

This article was written by my colleague, Cathie Yun. Cathie is an applied cryptographer and security engineer, currently working with me to make fully homomorphic encryption a reality at Google. She’s also done a lot of cool stuff with zero knowledge proofs.


In previous articles, we’ve discussed techniques used in Fully Homomorphic Encryption (FHE) schemes. The basis for many FHE schemes, as well as other privacy-preserving protocols, is the Learning With Errors (LWE) problem. In this article, we’ll talk about how to estimate the security of lattice-based schemes that rely on the hardness of LWE, as well as its widely used variant, Ring LWE (RLWE).

A previous article on modulus switching introduced LWE encryption, but as a refresher:

Reminder of LWE

A literal repetition from the modulus switching article. The LWE encryption scheme I’ll use has the following parameters:

  • A plaintext space $\mathbb{Z}/q\mathbb{Z}$, where $q \geq 2$ is a positive integer. This is the space that the underlying message comes from.
  • An LWE dimension $n \in \mathbb{N}$.
  • A discrete Gaussian error distribution $ D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector in $\{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $a = (a_1, \dots, a_n)$, sampled uniformly over $(\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $b = \langle a, s \rangle + m + e$, where $e$ is drawn from $D$ and all arithmetic is done modulo $q$. Note that $e$ must be small for the encryption to be valid.

Learning With Errors (LWE) security

Choosing appropriate LWE parameters is a nontrivial challenge when designing and implementing LWE based schemes, because there are conflicting requirements of security, correctness, and performance. Some of the parameters that can be manipulated are the LWE dimension $n$, error distribution $D$ (referred to in the next few sections as $X_e$), secret distribution $X_s$, and plaintext modulus $q$.

Lattice Estimator

Here is where the Lattice Estimator tool comes to our assistance! The lattice estimator is a Sage module written by a group of lattice cryptography researchers which estimates the concrete security of Learning with Errors (LWE) instances.

For a given set of LWE parameters, the Lattice Estimator calculates the cost of all known efficient lattice attacks – for example, the Primal, Dual, and Coded-BKW attacks. It returns the estimated number of “rops” or “ring operations” required to carry out each attack; the attack that is the most efficient is the one that determines the security parameter. The bits of security for the parameter set can be calculated as $\log_2(\text{rops})$ for the most efficient attack.

For example, we used this script to sweep over a decent subset of the parameter space of LWE to determine which parameter settings had 128-bit security.

Running the Lattice Estimator

For example, let’s estimate the security of the security parameters originally published for the popular TFHE scheme:

n = 630
q = 2^32
Xs = UniformMod(2)
Xe = DiscreteGaussian(stddev=2^17)

After installing the Lattice Estimator and sage, we run the following commands in sage:

> from estimator import *
> schemes.TFHE630
LWEParameters(n=630, q=4294967296, Xs=D(σ=0.50, μ=-0.50), Xe=D(σ=131072.00), m=+Infinity, tag='TFHE630')
> _ = LWE.estimate(schemes.TFHE630)
bkw                  :: rop: ≈2^153.1, m: ≈2^139.4, mem: ≈2^132.6, b: 4, t1: 0, t2: 24, ℓ: 3, #cod: 552, #top: 0, #test: 78, tag: coded-bkw
usvp                 :: rop: ≈2^124.5, red: ≈2^124.5, δ: 1.004497, β: 335, d: 1123, tag: usvp
bdd                  :: rop: ≈2^131.0, red: ≈2^115.1, svp: ≈2^131.0, β: 301, η: 393, d: 1095, tag: bdd
bdd_hybrid           :: rop: ≈2^185.3, red: ≈2^115.9, svp: ≈2^185.3, β: 301, η: 588, ζ: 0, |S|: 1, d: 1704, prob: 1, ↻: 1, tag: hybrid
bdd_mitm_hybrid      :: rop: ≈2^265.5, red: ≈2^264.5, svp: ≈2^264.5, β: 301, η: 2, ζ: 215, |S|: ≈2^189.2, d: 1489, prob: ≈2^-146.6, ↻: ≈2^148.8, tag: hybrid
dual                 :: rop: ≈2^128.7, mem: ≈2^72.0, m: 551, β: 346, d: 1181, ↻: 1, tag: dual
dual_hybrid          :: rop: ≈2^119.8, mem: ≈2^115.5, m: 516, β: 314, d: 1096, ↻: 1, ζ: 50, tag: dual_hybrid

In this example, the most efficient attack is the dual_hybrid attack. It uses 2^119.8 ring operations, and so these parameters provide 119.8 bits of security. The reader may notice that the TFHE website claims those parameters give 128 bits of security. This discrepancy is due to the fact that they used an older library (the LWE estimator, which is no longer maintained), which doesn’t take into account the most up-to-date lattice attacks.

For further reading, Benjamin Curtis wrote an article about parameter selection for the CONCRETE implementation of the TFHE scheme. Benjamin Curtis, Martin Albrecht, and other researchers also used the Lattice Estimator to estimate all the LWE and NTRU schemes.

Ring Learning with Errors (RLWE) security

It is often desirable to use Ring LWE instead of LWE, for greater efficiency and smaller key sizes (as Chris Peikert illustrates via meme). We’d like to estimate the security of a Ring LWE scheme, but it wasn’t immediately obvious to us how to do this, since the Lattice Estimator only operates over LWE instances. In order to use the Lattice Estimator for this security estimate, we first needed to do a reduction from the RLWE instance to an LWE instance.

Attempted RLWE to LWE reduction

Given an RLWE instance with $ \text{RLWE_dimension} = k $ and $ \text{poly_log_degree} = N $, we can create a relation that looks like an LWE instance of $ \text{LWE_dimension} = N * k $ with the same security, as long as $N$ is a power of 2 and there are no known attacks that target the ring structure of RLWE that are more efficient than the best LWE attacks. Note: $N$ must be a power of 2 so that $x^N+1$ is a cyclotomic polynomial.

An RLWE encryption has the following form: $ (a_0(x), a_1(x), … a_{k-1}(x), b(x)) $

  •   Public polynomials: $ a_0(x), a_1(x), \dots a_{k-1}(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} (\mathbb{Z}/{q \mathbb{Z}[x]} ) / (x^N + 1)^k$
  •   Secret (binary) polynomials: $ s_0(x), s_1(x), \dots s_{k-1}(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} (\mathbb{B}_N[x])^k$
  •   Error: $ e(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} \chi_e$
  •   RLWE instance: $ b(x) = \sum_{i=0}^{k-1} a_i(x) \cdot s_i(x) + e(x) \in (\mathbb{Z}/{q \mathbb{Z}[x]} ) / (x^N + 1)$

We would like to express this in the form of an LWE encryption. We can make start with the simple case, where $ k=1 $. Therefore, we will only be working with the zero-entry polynomials, $a_0(x)$ and $s_0(x)$. (For simplicity, in the next example you can ignore the zero-subscript and think of them as $a(x)$ and $s(x)$).

Naive reduction for $k=1$ (wrong!)

Naively, if we simply defined the LWE $A$ matrix to be a concatenation of the coefficients of the RLWE polynomial $a(x)$, we get:

$$ A_{\text{LWE}} = ( a_{0, 0}, a_{0, 1}, \dots a_{0, N-1} ) $$

We can do the same for the LWE $s$ vector:

$$ s_{\text{LWE}} = ( s_{0, 0}, s_{0, 1}, \dots s_{0, N-1} ) $$

But this doesn’t give us the value of $b_{LWE}$ for the LWE encryption that we want. In particular, the first entry of $b_{LWE}$, which we can call $b_{\text{LWE}, 0}$, is simply a product of the first entries of $a_0(x)$ and $s_0(x)$:

$$ b_{\text{LWE}, 0} = a_{0, 0} \cdot s_{0, 0} + e_0 $$

However, we want $b_{\text{LWE}, 0}$ to be a sum of the products of all the coefficients of $a_0(x)$ and $s_0(x)$ that give us a zero-degree coefficient mod $x^N + 1$. This modulus is important because it causes the product of high-degree monomials to “wrap around” to smaller degree monomials because of the negacyclic property, such that $x^N \equiv -1 \mod x^N + 1$. So the constant term $b_{\text{LWE}, 0}$ should include all of the following terms:

$$\begin{aligned}
b_{\text{LWE}, 0} = & a_{0, 0} \cdot s_{0, 0} \\
 – & a_{0, 1} \cdot s_{0, N-1} \\
 – & a_{0, 2} \cdot s_{0, N-2} \\
 – & \dots \\
 – & a_{0, N-1} \cdot s_{0, 1}\\
 + & e_0\\
\end{aligned}
$$

Improved reduction for $k=1$

We can achieve the desired value of $b_{\text{LWE}}$ by more strategically forming a matrix $A_{\text{LWE}}$, to reflect the negacyclic property of our polynomials in the RLWE space. We can keep the naive construction for $s_\text{LWE}$.

$$ A_{\text{LWE}} =
\begin{pmatrix}
a_{0, 0}   & -a_{0, N-1} & -a_{0, N-2} & \dots & -a_{0, 1}\\
a_{0, 1}   & a_{0, 0}    & -a_{0, N-1} & \dots & -a_{0, 2}\\
\vdots     & \ddots      &             &       & \vdots   \\
a_{0, N-1} & \dots       &             &       & a_{0, 0} \\
\end{pmatrix}
$$

This definition of $A_\text{LWE}$ gives us the desired value for $b_\text{LWE}$, when $b_{\text{LWE}}$ is interpreted as the coefficients of a polynomial. As an example, we can write out the elements of the first row of $b_\text{LWE}$:

$$
\begin{aligned}
b_{\text{LWE}, 0} = & \sum_{i=0}^{N-1} A_{\text{LWE}, 0, i} \cdot s_{0, i} + e_0 \\
b_{\text{LWE}, 0} = & a_{0, 0} \cdot s_{0, 0} \\
 – & a_{0, 1} \cdot s_{0, N-1} \\
 – & a_{0, 2} \cdot s_{0, N-2} \\
 – & \dots \\
 – & a_{0, N-1} \cdot s_{0, 1}\\
 + & e_0 \\
\end{aligned}
$$

Generalizing for all $k$

In the generalized $k$ case, we have the RLWE equation:

$$ b(x) = a_0(x) \cdot s_0(x) + a_1(x) \cdot s_1(x) \cdot a_{k-1}(x) \cdot s_{k-1}(x) + e(x) $$

We can construct the LWE elements as follows:

$$A_{\text{LWE}} =
\left ( \begin{array}{c|c|c|c}
A_{0, \text{LWE}} & A_{1, \text{LWE}} & \dots & A_{k-1, \text{LWE}} \end{array}
 \right )
$$

where each sub-matrix is the construction from the previous section:

$$ A_{\text{LWE}} =
\begin{pmatrix}
a_{i, 0}   & -a_{i, N-1} & -a_{i, N-2} & \dots & -a_{i, 1}\\
a_{i, 1}   & a_{i, 0}    & -a_{i, N-1} & \dots & -a_{i, 2}\\
\vdots     & \ddots      &             &       & \vdots   \\
a_{i, N-1} & \dots       &             &       & a_{i, 0} \\
\end{pmatrix}
$$

And the secret keys are stacked similarly:

$$ s_{\text{LWE}} = ( s_{0, 0}, s_{0, 1}, \dots s_{0, N-1} \mid s_{1, 0}, s_{1, 1}, \dots s_{1, N-1} \mid \dots ) $$

This is how we can reduce an RLWE instance with RLWE dimension $k$ and polynomial modulus degree $N$, to a relation that looks like an LWE instance of LWE dimension $N * k$.

Caveats and open research

This reduction does not result in a correctly formed LWE instance, since an LWE instance would have a matrix $A$ that is randomly sampled, whereas the reduction results in an matrix $A$ that has cyclic structure, due to the cyclic property of the RLWE instance. This is why I’ve been emphasizing that the reduction produces an instance that looks like LWE. All currently known attacks on RLWE do not take advantage of the structure, but rather directly attack this transformed LWE instance. Whether the additional ring structure can be exploited in the design of more efficient attacks remains an open question in the lattice cryptography research community.

In her PhD thesis, Rachel Player mentions the RLWE to LWE security reduction:

In order to try to pick parameters in Ring-LWE-based schemes (FHE or otherwise) that we hope are sufficiently secure, we can choose parameters such that the underlying Ring-LWE instance should be hard to solve according to known attacks. Each Ring-LWE sample can be used to extract $n$ LWE samples. To the best of our knowledge, the most powerful attacks against $d$-sample Ring-LWE all work by instead attacking the $nd$-sample LWE problem. When estimating the security of a particular set of Ring-LWE parameters we therefore estimate the security of the induced set of LWE parameters.

This indicates that we can do this reduction for certain RLWE instances. However, we must be careful to ensure that the polynomial modulus degree $N$ is a power of two, because otherwise the error distribution “breaks”, as my colleague Baiyu Li explained to me in conversation:

The RLWE problem is typically defined in using the ring of integers of the cyclotomic field $\mathbb{Q}[X]/(f(X))$, where $f(X)$ is a cyclotomic polynomial of degree $k=\phi(N)$ (where $\phi$ is Euler’s totient function), and the error is a spherical Gaussian over the image of the canonical embedding into the complex numbers $\mathbb{C}^k$ (basically the images of primitive roots of unity under $f$). In many cases we set $N$ to be a power of 2, thus $f(X)=X^{N/2}+1$, since the canonical embedding for such $N$ has a nice property that the preimage of the spherical Gaussian error is also a spherical Gaussian over the coefficients of polynomials in $\mathbb{Q}[X]/(f(X))$. So in this case we can sample $k=N/2$ independent Gaussian numbers and use them as the coefficients of the error polynomial $e(x)$. For $N$ not a power of 2, $f(X)$ may have some low degree terms, and in order to get the spherical Gaussian with the same variance $s^2$ in the canonical embedding, we probably need to use a larger variance when sampling the error polynomial coefficients.

The RLWE we frequently use in practice is actually a specialized version called “polynomial LWE”, and instantiated with $N$ = power of 2 and so $f(X)=X^{N/2}+1$. For other parameters the two are not exactly the same. This paper has some explanations: https://eprint.iacr.org/2018/170.pdf

The error distribution “breaks” if $N$ is not a power of 2 due to the fact that the precise form of RLWE is not defined on integer polynomial rings $R = \mathbb{Z}[X]/(f(X))$, but is defined on its dual (or the dual in the underlying number field, which is a fractional ideal of $\mathbb{Q}[X]/(f(x))$), and the noise distribution is on the Minkowski embedding of this dual ring. For non-power of 2 $N$, the product mod $f$ of two small polynomials in $\mathbb{Q}[X]/(f(x))$ may be large, where small/large means their L2 norm on the coefficient vector. This means that in order to sample the required noise distribution, you may need a skewed coefficient distribution. Only when $N$ is a power of 2, the dual of $R$ is a scaling of $R$, and distance in the embedding of $R^{\text{dual}}$ is preserved in $R$, and so we can just sample iid gaussian coefficient to get the required noise.

Because working with a power-of-two RLWE polynomial modulus gives “nice” error behavior, this parameter choice is often recommended and chosen for concrete instantiations of RLWE. For example, the Homomorphic Encryption Standard
recommends and only analyzes the security of parameters for power-of-two cyclotomic fields for use in homomorphic encryption (though future versions of the standard aim to extend the security analysis to generic cyclotomic rings):

We stress that when the error is chosen from sufficiently wide and “well spread” distributions that match the ring at hand, we do not have meaningful attacks on RLWE that are better than LWE attacks, regardless of the ring. For power-of-two cyclotomics, it is sufficient to sample the noise in the polynomial basis, namely choosing the coefficients of the error polynomial $e \in \mathbb{Z}[x] / \phi_k(x)$ independently at random from a very “narrow” distribution.

Existing works analyzing and targeting the ring structure of RLWE include:

It would of course be great to have a definitive answer on whether we can be confident using this RLWE to LWE reduction to estimate the security of RLWE based schemes. In the meantime, we have seen many Fully Homomorphic Encryption (FHE) schemes using this RLWE to LWE reduction, and we hope that this article helps explain how that reduction works and the existing open questions around this approach.

Key Switching in LWE

Last time we covered an operation in the LWE encryption scheme called modulus switching, which allows one to switch from one modulus to another, at the cost of introducing a small amount of extra noise, roughly $\sqrt{n}$, where $n$ is the dimension of the LWE ciphertext.

This time we’ll cover a more sophisticated operation called key switching, which allows one to switch an LWE ciphertext from being encrypted under one secret key to another, without ever knowing either secret key.

Reminder of LWE

A literal repetition of the last article. The LWE encryption scheme I’ll use has the following parameters:

  • A plaintext space $\mathbb{Z}/q\mathbb{Z}$, where $q \geq 2$ is a positive integer. This is the space that the underlying message comes from.
  • An LWE dimension $n \in \mathbb{N}$.
  • A discrete Gaussian error distribution $D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector in $\{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $a = (a_1, \dots, a_n)$, sampled uniformly over $(\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $b = \langle a, s \rangle + m + e$, where $e$ is drawn from $D$ and all arithmetic is done modulo $q$. Note that $e$ must be small for the encryption to be valid.

Sometimes I will denote by $\textup{LWE}_s(x)$ the LWE encryption of plaintext $x$ under the secret key $s$, and it should be understood that this is a fixed (but arbitrary) draw from the distribution of LWE ciphertexts described above.

Main idea: homomorphically almost-decrypt

The main idea is to encrypt each entry of the original secret key using the new secret key (this collection of encryptions is jointly called a key-switching key), and then use this to homomorphically evaluate the first step of the decryption function (i.e., compute $b – \langle a, s \rangle$). The result is an encryption of the (noisy) message under the new key.

First we’ll show how this works in a naïve sense. In particular, doing what I said in the last paragraph verbatim won’t work because the error will grow too large. But we’ll do it anyway, measure the error, and the remainder of the article will show how the gadget decomposition can be used to reduce the error.

Key switching, without gadget decompositions

Start with an LWE ciphertext for the plaintext $m$. Call it

$\displaystyle c = (a_1, \dots, a_n, b) \in (\mathbb{Z}/q\mathbb{Z})^{n+1}$

where

$\displaystyle b = \left ( \sum_{i=1}^n a_i s_i \right ) + m + e_{\textup{original}}$

and $s = (s_1, \dots, s_n) \in \{ 0,1\}^n$ is the secret key. Now say we have another secret key, possibly of a different dimension $t = (t_1, \dots, t_m) \in \{ 0, 1\}^m$, and we would like to switch the ciphertext $c$ to a ciphertext $c’$ which encrypts the same underlying message $m$, but under the new secret key $t$. That is, we would like to write

$\displaystyle c’ = (a’_1, \dots, a’_m, b’) \in (\mathbb{Z}/q\mathbb{Z})^{m+1}$

where

$\displaystyle b’ = \left ( \sum_{i=1}^n a’_i t_i \right ) + m + e_{\textup{original}} + e_{\textup{new}}$

implying that there is possibly some additional error introduced as a result. As usual, so long as the total error in the ciphertext remains small enough (and $m$ is stored in the significant bits of the underlying integer space), the result will still be a valid LWE ciphertext.

Define the key switching key $\textup{KSK}(s, t)$ as follows (I will omit the $s, t$ and just call it KSK from now on):

$\displaystyle \textup{KSK} = \{ \textup{KSK}_i = \textup{LWE}_t(s_i) = (x_{i, 1}, \dots, x_{i, m}, y_i) \mid i=1, \dots, n\}$

In other words, $\textup{KSK}_i$ encrypts bit $s_i$, and $y_i = \langle x_i, t \rangle + s_i + e_i$ makes it a valid LWE encryption.

Now the algorithm to switch keys is merely as follows (where the first vector has $m$ leading zeros to ensure the dimensions align):

$\displaystyle c’ = (0, \dots, 0, b) – \sum_{i=1}^n a_i \textup{KSK}_i$

This is computing a linear combination of the $\textup{KSK}_i$. The specific linear combination is the first step of LWE decryption ($b – \langle a, s \rangle$), but performed on ciphertexts of $b$ and the $s_i$. Note, $(0, \dots, 0, b)$ is a valid (but insecure) LWE ciphertext of $b$ under any secret key, in part because we’re pretending the LWE samples and error were all sampled as zero; an unlikely but coherent outcome used to jumpstart a homomorphic computation in more places than key switching. So if you wanted to, you could write $c’$ as follows, to highlight how we’re computing additions and linear scalings of LWE ciphertexts.

$\displaystyle c’ = \textup{LWE}_{\textup{t}}(b) – \sum_{i=1}^n a_i \textup{LWE}_t(s_i)$

This should be enough to show that $c’$ is a valid LWE encryption (if we accept that adding and scaling preserves LWE validity). But to warm up for the rest of the article we’ll reprove it with a slightly different technique. This will also help us understand the error growth. Because LWE naturally admits sums and scalar products with corresponding added error, we expect the error to grow proportionally to the number of additions and the magnitudes of the $a_i$’s. And you may already be able to tell that because the $a_i$’s are uniform $\mathbb{Z}/q\mathbb{Z}$ elements, this part will be far too large to be useful. Let’s make this explicit now.

To show it’s a valid LWE encryption, we define the function $\varphi_s$, defined on any LWE ciphertext $c = (a_1, \dots, a_n, b)$ as $\varphi_s(c) = b – \langle a, s \rangle$. Some authors call $\varphi_s$ the “phase” function, but I think of it as a close friend: the first step of the decryption function for LWE (the second step would be rounding off the error). Critically, an LWE encryption is valid if and only if $\varphi_s(c) = m + e$ (provided $e$ is sufficiently small).

Because $\varphi_s$ is a linear function, it factors through the definition of $c’$ nicely, and we get

$\displaystyle \begin{aligned} \varphi_t(c’) &= \varphi_t((0, \dots, 0, b)) – \sum_{i=1}^n a_i \varphi_t(\textup{KSK}_i) \\ &= b – \sum_{i=1}^n a_i (y_i – \langle x_i, t \rangle) \\ &= b – \sum_{i=1}^n a_i (s_i + e_i) \end{aligned}$

where (reminder) $e_i$ is the error sample from $\textup{KSK}_i$’s definition. Distributing $a_i$ across the $(s_i + e_i)$ simplifies everything nicely

$\displaystyle \begin{aligned} &= b – \sum_{i=1}^n a_i s_i – \sum_{i=1}^n a_i e_i \\ &= m + e_{\textup{original}} – \sum_{i=1}^n a_i e_i \end{aligned}$

Now as we foreshadowed, $e_{\textup{new}} = -\sum_{i=1}^n a_i e_i$ is simply too large. A typical LWE ciphertext will have error at least 1 (or it would be useless), and if $q = 2^{32}$, the $a_i$’s would also be of magnitude roughly $2^{31}$, so summing even two of those would corrupt even a 1-bit message stored in the most significant bit of the plaintext.

The way to deal with this is to use a bit decomposition.

Key switching, with gadget decompositions

Recall from the gadget decomposition article that the core function of a gadget decomposition is to preserve the ultimate value of a dot product while making the vectors multiplicands larger (spending space/time) but also making the size of the coefficients of one of the vectors smaller (reducing the accumulation of error due to that dot product).

This is exactly the approach we’ll take here. The “dot product” in question is $(a_1, \dots, a_n) \cdot \textup{KSK}$ (where KSK is viewed as a matrix), and we’ll expand the values $a_i$ into a vector of its digits in a base-$B$ number system, while modifying the key switching key so that those missing powers of $B$ are part of the LWE encryption. This will result in replacing the error term that looked like $\sum_{i=1}^n a_i e_i$ with an error term like $\sum_{i=1}^n c B e_i$ for some small constant $c$ (expect it to be even less than $B$).

More specifically, define decomposition parameters as a triple of numbers $(B, k, L)$. The number $B$ is a power of 2 no bigger than $q/2$, and $L$, or the number of levels of the decomposition, is the positive integer such that $B^L = q$ (this is forced by the choice of $B$). Then finally, $k$ is a number between $0$ and $L-1$ describing the “lowest level” (or least-significant digit) included in the decomposition.

An error-free decomposition sets the parameter $k=0$, and this is defined simply as a base-$B$ representation of a number. For example, suppose $q = 2^{32}$, and $(B, k, L) = (256, 0, 4)$, and we’re decomposing $x=2^{32} – 2$. Then $\textup{Decomp}_{256, 0, 4}(x) = (254, 255, 255, 255)$. I subtracted 2 to emphasize that the digits are little-Endian (the right-most entry is the most significant, representing the $256^3$ place).

An approximate decomposition is one with $k > 0$. For example, suppose $(B, k, L) = (256, 2, 4)$ and again $x=2^{32} – 2$. Setting $k=2$ means that we represent this number as if it were $(0, 0, 255, 255)$, wiping out the two least significant digits. The error of this approximation is $65534 = 254 + 255 \cdot 256^1$. As we will see, an approximate decomposition may help reduce overall error by splitting the newly introduced error into a sum of two terms, where $k$ scales the error differently in each term.

Let’s go through the key-switching key derivation again, using an error-free decomposition $(B, 0, L)$. First, re-define the key switching key as follows.

$\displaystyle \textup{KSK} = \{ \textup{KSK}_{i, j} = \textup{LWE}_t(s_i B^j) \mid i=1, \dots, n ; j = 0, \dots, L-1\}$

Note that this increases the dimension of the key-switching key by 1. Previously the key-switching key was a list of LWE ciphertexts (2-dimensional array of numbers), and now it’s a 3-dimensional array, with the new dimension corresponding to the decomposition digit $j$.

Because the powers of $B$ are attached to the message, they will factor out and allow us to reconstruct the original $a_i$’s, but they will not be included in the error part because error is added to the message during encryption.

Next, to perform the key switch, define $\textup{Decomp}(a_i) = (a_{i,0}, \dots, a_{i,L-1})$ and compute

$\displaystyle c’ = (0, \dots, 0, b) – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} \textup{KSK}_{i,j}$

This is the same as the original key switch, but the extra summation accounts for the extra dimension introduced by the gadget decomposition. Then we can repeat the same $\varphi_t$ trick and see how the original $a_i$’s are reconstructed.

$\displaystyle \begin{aligned} \varphi_t(c’) &= b – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} \varphi_t(\textup{KSK}_{i,j}) \\ &= b -\sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} (s_i B^j + e_i) \\ &= b -\sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} s_i B^j – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} e_i \\ &= b -\sum_{i=1}^n a_i s_i – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} e_i \\ &= m + e_{\textup{original}} – \sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} e_i \end{aligned}$

One key ingredient above is noticing that in $\sum_{i=1}^n \sum_{j=0}^{L-1} a_{i,j} s_i B^j$, the $s_i$ factors out of the innermost sum, and what you have left is $\sum_{j=0}^{L-1} a_{i,j} B^j$, which is exactly how to reconstruct $a_i$ from its base-$B$ digits.

The second key ingredient is that the innermost term on the second line is $a_{i,j} (s_i B^j + e_i)$, which means that only the digits $a_{i,j}$ are multiplied by the error terms, not including the powers of $B$, and so the final error can be bounded by the largest allowable value of a single digit $B-1$, resulting in the new error being $L (B-1) \sum_{i=1}^n e_i$. For a Gaussian centered at zero, the expectation of these errors is zero, and using standard bounding arguments like Chernoff bounds, you can prove that with high probability this new error is at most $L(B-1) \sigma \sqrt{2n \log n}$, where $\sigma$ is the standard deviation of the error distribution.

Now, finally, we can run through this argument one more time, but using an approximate decomposition. This merely changes the sum’s lower bound from $j=0$ to $j=k$. Start by calling $\tilde{a}_i = \sum_{j=k}^{L-1} a_{i,j} B^j$, the approximation of $a_i$ from its most significant bits. Then the error of this approximation is $a_i – \tilde{a}_i = \sum_{j=0}^{k-1} a_{i,j} B^j$, a relatively small quantity at most $(B^k – 1) / (B-1)$ (if each $a_{i,j} = B-1$ is as large as possible).

$\displaystyle \begin{aligned} \varphi_t(c’) &= b – \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} \varphi_t(\textup{KSK}_{i,j}) \\ &= b -\sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} (s_i B^j + e_i) \\ &= b -\sum_{i=1}^n s_i \sum_{j=k}^{L-1} a_{i,j} B^j – \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \\ &= b -\sum_{i=1}^n s_i \tilde{a}_i – \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \end{aligned}$

Mentally zoom in on the first sum $\sum_{i=1}^n s_i \tilde{a}_i$. Use the trick of adding zero to get

$\displaystyle \sum_{i=1}^n s_i \tilde{a}_i = \sum_{i=1}^n s_i (a_i + \tilde{a}_i – a_i) = \sum_{i=1}^n s_i a_i – \sum_{i=1}^n s_i(a_i – \tilde{a}_i)$

The term $\sum_{i=1}^n s_i(a_i – \tilde{a}_i)$ is part of our new error term, and recalling that the secret key bits are binary, you should think of this in expectation as roughly $\frac{n}{2} B^{k-1}$ (more precisely, $\frac{n}{2} (B^{k}-1)/(B-1)$).

Continuing, we arrive at

$\displaystyle \begin{aligned} \varphi_t(c’) &= b -\sum_{i=1}^n a_i s_i – \sum_{i=1}^n s_i(a_i – \tilde{a}_i) – \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \\ &= m + e_{\textup{original}} – \sum_{i=1}^n s_i(a_i – \tilde{a}_i) – \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \end{aligned}$

Rough error analysis

Now the choice of $k$ admits a tradeoff that one can optimize for to minimize the total newly introduced error. I’m going to switch to a sloppy mode of math to heuristically navigate this tradeoff.

The triangle inequality lets us bound the magnitude of the error by the sum of the magnitudes of the parts, i.e., the error is bounded from above by

$\displaystyle \left | \sum_{i=1}^n s_i(a_i – \tilde{a}_i) \right | + \left | \sum_{i=1}^n \sum_{j=k}^{L-1} a_{i,j} e_i \right |$

The left term is like $\frac{n}{2} B^{k-1}$ as we stated earlier, and with high probability it’s at most $(n/2 + \sqrt{n \log n}) B^{k-1}$. The right term is at most $(L-k)B \sum_{i=1}^n e_i$, (worst case size of $a_{i,j}$, increasing $B-1$ to $B$ because why not), and with high probability the sum of the $e_i$ is like $\sigma \sqrt{2n \log n}$, making the whole term bounded by $(L-k)B \sigma \sqrt{2n \log n}$. So we want to minimize the sum

$\displaystyle (n/2 + \sqrt{n \log n}) B^{k-1} + (L-k)B \sigma \sqrt{2n \log n}$

We could try to explicitly optimize this for $k$, treating the other terms as constant, but it won’t be nice because $k$ is present in both a linear term and an exponent. We could also just stare at it and think. The approximation error (the term on the left) is going to get exponentially larger as $k$ grows, so we want to keep $k$ relatively small. But on the other hand, the standard deviation $\sigma$ should be much larger than $n$ to keep LWE secure. This is effectively what we’re trying to suppress: error that grows like $O(n)$ is small enough to deal with, but error that grows like $\omega(n)$ is problematic. Increasing $k$ gives us a meager (but nontrivial) means to reduce the constant coefficient on that part of the error in exchange for $\Theta(n)$ growth with in the other term.

I admit, as of the time of this writing I still don’t understand how to set production security parameters for LWE. Is it still linear in $n$? Super-linear? Not sure. I’m betting future Jeremy will clarify this to me in another article. Even if it were linear in $n$, the right term multiplies $\sigma$ by $\sqrt{n \log n}$ which makes the whole thing super-linear, whereas the left term adds a square root factor. So the tradeoff in $k$ should still help.

Until I understand LWE security, I won’t have the asymptotics I need to analyze this further. Moreover, the allowed values of $B, k$ are so small that we can brute force evaluate all options. For example, if $B = 16$ then $k$ can be between 0 and 7. And realistically, if $n \approx 2^{10}$, then letting $k = 4$ makes the first term roughly $2^{26}$, which leaves only 6 bits left for the message (further reduced by any error introduced by the second term).

Thanks to Cathie Yun and Asra Ali for providing feedback on an early draft of this article.

Until next time!

Modulus Switching in LWE

The Learning With Errors problem is the basis of a few cryptosystems, and a foundation for many fully homomorphic encryption (FHE) schemes. In this article I’ll describe a technique used in some of these schemes called modulus switching.

In brief, an LWE sample is a vector of values in $\mathbb{Z}/q\mathbb{Z}$ for some $q$, and in LWE cryptosystems an LWE sample can be modified so that it hides a secret message $m$. Modulus switching allows one to convert an LWE encryption from having entries in $\mathbb{Z}/q{Z}$ to entries in some other $\mathbb{Z}/q'{Z}$, i.e., change the modulus from $q$ to $q’ < q$.

The reason you’d want to do this are a bit involved, so I won’t get into them here and instead back-reference this article in the future.

LWE encryption

Briefly, the LWE encryption scheme I’ll use has the following parameters:

  • A plaintext space $ \mathbb{Z}/q\mathbb{Z}$, where $ q \geq 2$ is a positive integer. This is the space that the underlying message comes from.
  • An LWE dimension $ n \in \mathbb{N}$.
  • A discrete Gaussian error distribution $ D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector in $ \{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $ a = (a_1, \dots, a_n)$, sampled uniformly over $ (\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $ b = \langle a, s \rangle + m + e$, where $ e$ is drawn from $ D$ and all arithmetic is done modulo $ q$.

Without the error term, an attacker could determine the secret key from a polynomial-sized collection of LWE ciphertexts with something like Gaussian elimination. The set of samples looks like a linear (or affine) system, where the secret key entries are the unknown variables. With an error term, the problem of solving the system is believed to be hard, and only exponential time/space algorithms are known.

However, the error term in an LWE encryption encompasses all of the obstacles to FHE. For starters, if your message is $ m=1$ and the error distribution is wide (say, a standard deviation of 10), then the error will completely obscure the message from the start. You can’t decrypt the LWE ciphertext because you can’t tell if the error generated in a particular instance was 9 or 10. So one thing people do is have a much smaller cleartext space (actual messages) and encode cleartexts as plaintexts by putting the messages in the higher-order bits of the plaintext space. E.g., you can encode 10-bit messages in the top 10 bits of a 32-bit integer, and leave the remaining 22 bits of the plaintext for the error distribution.

Moreover, for FHE you need to be able to add and multiply ciphertexts to get the corresponding sum/product of the underlying plaintexts. One can easily see that adding two LWE ciphertexts produces an LWE ciphertext of the sum of the plaintexts (multiplication is harder and beyond the scope of this article). Summing ciphertexts also sums the error terms together. So the error grows with each homomorphic operation, and eventually the error may overtake the message, at which point decryption fails. How to deal with this error accumulation is 99% of the difficulty of FHE.

Finally, because the error can be negative, even if you store a message in the high-order bits of the plaintext, you can’t decrypt by simply clearing the low order error bits. In that case an error of -1 would result in a corrupted message. Instead, to decrypt, we round the value $ b – \langle a, s \rangle = m + e$ to the nearest multiple of $ 2^k$, where $k$ is the number of bits “reserved” for error, as described above. In particular, decryption will only succeed if the error is small enough in absolute value. So to make this work in practice, one must coordinate the encoding scheme (how many bits to reserve for error), the dimension of the vector $ a$, and the standard deviation of the error distribution.

Modulus switching

With a basic outline of an LWE ciphertext, we can talk about modulus switching.

Start with an LWE ciphertext for the plaintext $ m$. Call it $ (a_1, \dots, a_n, b) \in (\mathbb{Z}/q\mathbb{Z})^{n+1}$, where

$ \displaystyle b = \left ( \sum_{i=1}^n a_i s_i \right ) + m + e_{\textup{original}}$

Given $ q’ < q$, we would like to produce a vector $ (a’_1, \dots, a’_n, b’) \in (\mathbb{Z}/q’\mathbb{Z})^{n+1}$ (all that has changed is I’ve put a prime on all the terms to indicate which are changing, most notably the new modulus $ q’$) that also encrypts $ m$, without knowing $ m$ or $ e_{\textup{original}}$, i.e., without access to the secret key.

Failed attempt: why not simply reduce each entry in the ciphertext vector modulo $ q’$? That would set $ a’_i = a_i \mod q’$ and $ b’ = b \mod q’$. Despite the fact that this operation produces a perfectly valid equation, it won’t work. The problem is that taking $ m \mod q’$ destroys part or all of the underlying message. For example, say $ x$ is a 12-bit number stored in the top 12 bits of the plaintext, i.e., $ m = x \cdot 2^{20}$. If $ q’ = 2^{15}$, then the message is a multiple of $ q’$ already, so the proposed modulus produces zero.

For this reason, we can’t hope to perfectly encrypt $ m$, as the output ciphertext entries may not have a modulus large enough to represent $ m$ at all. Rather, we can only hope to encrypt something like “the message $ x$ that’s encoded in $ m$, but instead with $ x$ stored in lower order bits than $ m$ originally used.” In more succinct terms, we can hope to encrypt $ m’ = m q’ / q$. Indeed, the operation of $ m \mapsto m q’ / q$ shifts up by $ \log_2(q’)$ many bits (temporarily exceeding the maximum allowable bit length), and then shifting down by $ \log_2(q)$ many bits.

For example, say the number $ x=7$ is stored in the top 3 bits of a 32-bit unsigned integer ($ q = 2^{32}$), i.e., $ m = 7 \cdot 2^{29}$ and $ q’ = 2^{10}$. Then $ m q’ / q = 7 \cdot 2^{29} \cdot 2^{10} / 2^{32} = 7 \cdot 2^{29+10 – 32} = 7 \cdot 2^7$, which stores the same underlying number $ x=7$, but in the top three bits of a 10-bit message. In particular, $ x$ is in the same “position” in the plaintext space, while the plaintext space has shrunk around it.

Side note: because of this change to the cleartext-to-plaintext encoding, the decryption/decoding steps before and after a modulus switch are slightly different. In decryption you use different moduli, and in decoding you round to different powers of 2.

So the trick is instead to apply $ z \mapsto z q’ / q$ to all the entries of the LWE ciphertext vector. However, because the entries like $ a_i$ use the entire space of bits in the plaintext, this transformation will not necessarily result in an integer. So we can round the result to an integer and analyze that. The final proposal for a modulus switch is

$ \displaystyle a’_i = \textup{round}(a_i q’ / q)$

$ \displaystyle b’ = \textup{round}(b q’ / q)$

Because the error growth of LWE ciphertexts permeates everything, in addition to proving this transformation produces a valid ciphertext, we also have to understand how it impacts the error term.

Analyzing the modulus switch

The statement summarizing the last section:

Theorem: Let $ \mathbf{c} = (a_1, \dots, a_n, b) \in (\mathbb{Z}/q\mathbb{Z})^{n+1}$ be an LWE ciphertext encrypting plaintext $ m$ with error term $ e_\textup{original}$. Let $ q’ < q$. Then $ c’ = \textup{round}(\mathbf{c} q’ / q)$ (where rounding is performed entrywise) is an LWE encryption of $ m’ = m q’ / q$, provided $ m’$ is an integer.

Proof. The only substantial idea is that $ \textup{round}(x) = x + \varepsilon$, where $ |\varepsilon| \leq 0.5$. This is true by the definition of rounding, but that particular way to express it allows us to group the error terms across a sum-of-rounded-things in isolation, and then everything else has a factor of $ q’/q$ that can be factored out. Let’s proceed.

Let $ c’ = (a’_1, \dots, a’_n, b’)$, where $ a’_i = \textup{round}(a_i q’ / q)$ and likewise for $ b’$. need to show that $ b’ = \left ( \sum_{i=1}^n a’_i s_i \right ) + m q’ / q + e_{\textup{new}}$, where $ e_{\textup{new}}$ is a soon-to-be-derived error term.

Expanding $ b’$ and using the “only substantial idea” above, we get

$ \displaystyle b’ = \textup{round}(b q’ / q) = bq’/q + \varepsilon_b$

For some $ \varepsilon_b$ with magnitude at most $ 1/2$. Continuing to expand, and noting that $ b$ is related to the $ a_i$ only modulo $ q$, we have

$ \displaystyle \begin{aligned} b’ &= bq’/q + \varepsilon_b \\ b’ &= \left ( \left ( \sum_{i=1}^n a_i s_i \right ) + m + e_{\textup{original}} \right ) \frac{q’}{q} + \varepsilon_b \mod q \end{aligned}$

Because we’re switching moduli, it makes sense to rewrite this over the integers, which means we add a term $ Mq$ for some integer $ M$ and continue to expand

$ \displaystyle \begin{aligned} b’ &= \left ( \left ( \sum_{i=1}^n a_i s_i \right ) + m + e_{\textup{original}} + Mq \right ) \frac{q’}{q} + \varepsilon_b \\ &= \left ( \sum_{i=1}^n \left ( a_i \frac{q’}{q} \right) s_i \right ) + m \frac{q’}{q} + e_{\textup{original}}\frac{q’}{q} + Mq \frac{q’}{q} + \varepsilon_b \\ &= \left ( \sum_{i=1}^n \left ( a_i \frac{q’}{q} \right) s_i \right ) + m’ + e_{\textup{original}}\frac{q’}{q} + Mq’ + \varepsilon_b \end{aligned}$

The terms with $ a_i$ are still missing their rounding, so, just like $ b’$, rewrite $ a’_i = a_i q’/q + \varepsilon_i$ as $ a_i q’/q = a’_i – \varepsilon_i$, expanding, simplifying, and finally reducing modulo $ q’$ to get

$ \displaystyle \begin{aligned} b’ &= \left ( \sum_{i=1}^n \left ( a’_i – \varepsilon_i \right) s_i \right ) + m’ + e_{\textup{original}}\frac{q’}{q} + Mq’ + \varepsilon_b \\ &= \left ( \sum_{i=1}^n a’_i s_i \right ) – \left ( \sum_{i=1}^n \varepsilon_i s_i \right) + m’ + e_{\textup{original}}\frac{q’}{q} + Mq’ + \varepsilon_b \\ &= \left ( \sum_{i=1}^n a’_i s_i \right ) + m’ + Mq’ + \left [ e_{\textup{original}}\frac{q’}{q} – \left ( \sum_{i=1}^n \varepsilon_i s_i \right) + \varepsilon_b \right ] \\ &= \left ( \sum_{i=1}^n a’_i s_i \right ) + m’ + \left [ e_{\textup{original}}\frac{q’}{q} – \left ( \sum_{i=1}^n \varepsilon_i s_i \right) + \varepsilon_b \right ] \mod q’ \end{aligned}$

Define the square bracketed term as $ e_{\textup{new}}$, and we have proved the theorem.

$ \square$

The error after modulus switching is laid out. It’s the original error scaled, plus at most $ n+1$ terms, each of which is at most $ 1/2$. However, note that this is larger than it appears. If the new modulus is, say, $ q’=1024$, and the dimension is $ n = 512$, then in the worst case the error right after modulus switching will leave us only $ 1$ bit left for the message. This is not altogether unrealistic, as production (128-bit) security parameters for LWE put $ n$ around 600. But it is compensated for by the fact that the secret $ s$ is chosen uniformly at random, and the errors are symmetric around zero. So in expectation only half the bits will be set, and half of the set bits will have a positive error, and half a negative error. Using these facts, you can bound the probability that the error exceeds, say, $ \sqrt{n \log n}$ using a standard Hoeffding bound argument. I further believe that the error is bounded by $ \sqrt{n}$. I have verified it empirically, but haven’t been able to quite nail down a proof.

Until next time!