# We’re Knot Friends

It’s April Cools again.

For a few summers in high school and undergrad, I was a day camp counselor. I’ve written before about how it helped me develop storytelling skills, but recently I thought of it again because, while I was cleaning out a closet full of old junk, I happened upon a bag of embroidery thread. While stereotypically used to sew flowers into a pillowcase or write “home sweet home” on a hoop, at summer camps embroidery thread is used to make friendship bracelets.

For those who don’t know, a friendship bracelet is a simple form of macramé—meaning the design is constructed by tying knots, as opposed to weaving or braiding. Bracelet patterns are typically simple enough for a child of 8 or 9 to handle, albeit with a bit of practice. They are believed to originate among the indigenous peoples of the Americas, where knots were tied into string to track time and count, but in the United States their popularity arose among children as a gift-giving symbol of friendship. As the lore goes, when someone gives you a friendship bracelet, you put it on and make a wish, and you must leave it on until the bracelet naturally falls off, at which point your wish comes true.

Kids took the “falling off naturally” rule *very* seriously, but in retrospect I find a different aspect more fascinating. Tying friendship bracelets is a communal activity. It’s a repetitious task that you can’t do absentmindedly, it takes a few hours at least, and you have to stay put while you do it. But you can enjoy shared company, and at the end you’ve made something pretty. Kids would sit in a circle, each working on their own bracelet, sometimes even safety pinning them to each other’s backpacks in a circle-the-wagons manner, while chitchatting about whatever occupied their minds. Kids who were generally hyper and difficult to corral miraculously organized themselves into a serene, rhythmic focus. And it was pleasant to sit and knot along with them, when the job wasn’t pulling me away for some reason.

Thinking of this makes me realize how little I’ve experienced communal activities since then. It has the same feeling of a family sitting together making Christmas cookies, or a group of artists sitting together sketching. People complain about the difficulty of making friends in your thirties, and I wonder how much of that is simply because we don’t afford ourselves the time for such communal activities. We aren’t regularly around groups of people with the sort of free time that precipitates these moments of idle bonding.

Without any thoughts like this at the time, I nevertheless developed friendship bracelet making as a specialty. I spent a lot of time teaching kids how to tie them. I’m not sure how I grew into the role. I suspect the craft aspect of it tickled my brain, but at the time I was not nearly as conscious of my love for craftsmanship as I am now. I learned a dozen or so patterns, and figured out a means to tie a two-tone pattern of letters, with which I could write people’s names in a pixelated font. It impressed many pre-teens.

Ten years later, this bag of string managed to travel with me across the US through grad school and many apartments, and I thought maybe I could find a math circle activity involving knots and patterns and…well, something mathy. My attempt at making this an activity was a disaster, but not for the reason I thought it might be. It turns out eight year olds don’t yet have enough dexterity to tie bracelets accurately or efficiently enough to start asking questions about the possible knot patterns. I was clearly still re-acclimating to the ability range typical of that age.

After that I figured, why not try making one again? In the intervening years, I had occasionally seen a pattern that clearly wasn’t constructed using the techniques I knew. To elaborate, I’ll need to briefly explain how to make a simple bracelet. Compared to other forms of fiber arts, it’s quite simple, and requires nothing like a loom or knitting needles. Just the string and something to hold the piece in place.

You start by tying all your threads together in a single knot at one end, tape or pin it down for tension, and spread out your strings. Then, using the leftmost string, and gradually moving it from left to right, you proceed to tie “stitches,” where a single “stitch” consists of two overhand knots of the left string over the right string. As a result of one stitch, the “leading” string (the leftmost one, in this case) produces the color that is displayed on top, and it “moves” rightward one position. Doing this with the same string across all strings results in a (slightly diagonal) line of stitches of the same color. Once you complete a single row, the formerly leading string is on the rightmost end, and you use the leftmost string as your new leading string.

The stripe pattern is usually one of the first patterns one learns because it’s very simple. But you can imagine that, by tying strings in different orders, and judiciously picking which string is the “leading” string (i.e., which string’s color is shown in each stitch), you can make a variety of patterns. Some of them are pictured at the beginning of this article. However, the confounding patterns I saw couldn’t have been made this way. First, they were much more intricate than is possible to construct in the above style (there’s clearly some limiting structure there). And second, they used more colors than the width of the bracelet, meaning somehow new colored threads were swapped in and out part way through the design. See, for example, these cow bracelets.

Otherwise having no experience with fiber arts, I was clueless and curious about how this could be done. After some searching I found so-called *alpha* bracelets, which cracked the case wide open.

Instead of using strings both as the structure to hold knots and the things that tie the knots, an alpha bracelet has strings that go the length of the bracelet, and serve no purpose but to have knots tied on them. By analogy with weaving (which I knew nothing about a few months ago), they distinguish warp and weft threads, whereas “classical” bracelets do not. And because we’re tying knots, the “warp” threads’ color is never shown, except at the ends when being tied off.

To get more colors, there’s a slightly intricate process of “tying in” a new thread, where the old leading string is threaded between the two overhand knots of a new stitch and passes underneath the whole composition. Masha Knots, a bracelet YouTuber, has perhaps the most popular tutorial on the internet on how to make alpha bracelets. But through this search, I also discovered the website braceletbook.com, which has a compendium of different patterns. The diagrams on that site clarified for me one obvious difference between “classical” and alpha bracelets: the stitches of classical bracelets lie on a sheared lattice, while alpha bracelets lie on a standard Euclidean grid. And you can easily generate notation describing how to tie a pattern.

The alpha technique allows you to draw pixel art into your bracelet. And elaborate alpha patterns tend to be much larger than is practical to wear on your wrist. It effectively becomes a kind of miniaturized macramé tapestry.

So I wanted to try my hand at it. Since I’m now in my thirties and friendship isn’t what it used to be, I wasn’t quite sure what sort of bracelet to make. Thankfully my toddler loves Miyazaki films, so I made him this No Face bracelet.

It’s a little rough around the edges, but not bad for my first one. And a toddler doesn’t care. He’s just happy to have a No Face friend. After that I started on a new pattern, which is currently about 80% done. Continuing with the Japanese theme, it’s a take on Hokusai’s *Great Wave*.

If you look closely you can see a few places where I messed up, the worst being the bottom right, where I over-tightened a few stitches on the edge, causing the edge to slant. Because this one was so large I fastened the end to a small dowel, which makes it look like a scroll.

Again, since alpha bracelets are knotted pixel art tapestries, I figured why not put these on my wall and make a tiny gallery. And there are always a handful of contemporary artists whose art I adore, but whose prices are too high, or whose best pieces have been sold, and who don’t make prints, meaning I will never get to put their work on my wall. Take, for example, Kelly Reemtsen, known for her dramatically posed women in colorful 50’s dresses wielding power tools. I emailed her years ago asking about prints and she replied, “I don’t do prints.” Today she apparently does, but it’s still extremely hard to find any prints of her good pieces.

The first time I saw one of her pieces (in a restaurant on Newbury Street in Boston), it really struck me. But as I’ve saved up enough money to afford what her art used to cost, so has she gained enough fame that her prices stay perpetually impractical. I even tried painting my own imitation of one of her paintings, though it’s not all that good.

So instead I decided to convert one of her pieces to pixel art, and tie a friendship bracelet tapestry myself. Here’s my pixel-art-in-progress. It still needs some cleaning up, and I’m not sure how to get exactly the right colors of thread, but I’m working on it.

In my life, this craft has strayed quite far from communal tying and gift giving. But it still scratches a certain itch for working with my hands, and the slow, steady progression toward building something that is unhindered by anything outside your own effort. Plus, each stitch takes only a few seconds to tie, and unlike woodworking or knitting, it has no setup/suspend/teardown time. You just put the strings down. Having an ongoing project at my desk gives me something quick to do when my programs are compiling, or when I’m in a listening-only meeting. Instead of opening a social media site for an empty dopamine hit, or getting mad about someone else’s bad takes, or playing a game of bullet chess, I can do 1/500th of something that will beautify my life.

# Sample Extraction from RLWE to LWE

In this article I’ll derive a trick used in FHE called *sample extraction*. In brief, it allows one to partially convert a ciphertext in the Ring Learning With Errors (RLWE) scheme to the Learning With Errors (LWE) scheme.

Here are some other articles I’ve written about other FHE building blocks, though they are not prerequisites for this article.

- Modulus Switching in LWE
- Key Switching in LWE
- The Gadget Decomposition in FHE
- Negacyclic Polynomial Multiplication
- Estimating the Security of Ring Learning With Errors

## LWE and RLWE

The first two articles in the list above define the Learning With Errors problem (LWE). I will repeat the definition here:

**LWE:** The LWE encryption scheme has the following parameters:

- A plaintext space $ \mathbb{Z}/q\mathbb{Z}$, where $ q \geq 2$ is a positive integer. This is the space that the underlying message $m$ comes from.
- An *LWE dimension* $n \in \mathbb{N}$.
- A discrete Gaussian *error distribution* $D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector $s \in \{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $ a = (a_1, \dots, a_n)$, sampled uniformly over $ (\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $ b = \langle a, s \rangle + m + e$, where $m$ is the message, $e$ is drawn from $D$, and all arithmetic is done modulo $q$. Note: the message $m$ is usually represented by placing an even smaller message (say, a 4-bit message) in the highest-order bits of a 32-bit unsigned integer. So then decryption corresponds to computing $b - \langle a, s \rangle = m + e$ and rounding the result to recover $m$ while discarding $e$.
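To make the “message in the top bits” encoding concrete, here is a toy Python sketch of LWE encryption and decryption. The parameters (a tiny dimension $n = 16$, a crude uniform error in place of a discrete Gaussian) are my own illustrative choices and are nowhere near secure.

```python
# Toy LWE sketch: 4-bit message stored in the top 4 bits of a mod-2^32 scalar.
# Illustrative only; the parameters and error sampling are NOT secure.
import random

q = 2**32   # ciphertext modulus, i.e., unsigned 32-bit integers
n = 16      # LWE dimension, far too small for real security

def keygen():
    return [random.randint(0, 1) for _ in range(n)]

def encrypt(m4, s):
    """Encrypt a 4-bit message placed in the highest-order bits."""
    m = m4 << 28
    a = [random.randrange(q) for _ in range(n)]
    e = random.randint(-8, 8)   # stand-in for a small Gaussian error
    b = (sum(ai * si for ai, si in zip(a, s)) + m + e) % q
    return (a, b)

def decrypt(ct, s):
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, s))) % q   # m + e
    return ((phase + (1 << 27)) >> 28) % 16   # round away the error

s = keygen()
assert decrypt(encrypt(13, s), s) == 13
```

The rounding step adds half the message step size ($2^{27}$) before shifting, so small negative errors round back to the correct message.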

Without the error term, an attacker could determine the secret key from a polynomial-sized collection of LWE ciphertexts with something like Gaussian elimination. The set of samples looks like a linear (or affine) system, where the secret key entries are the unknown variables. With an error term, the problem of solving the system is believed to be hard, and only exponential time/space algorithms are known.
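To see why the error is load-bearing, here is a sketch of the attack when $e = 0$: collect $n$ samples of the form $b = \langle a, s \rangle$ (ignoring the message term for simplicity) and solve the linear system. I use a prime modulus here, my own simplification, so that every nonzero pivot is invertible; the real scheme uses $q = 2^{32}$.

```python
# With no error, n samples (a_i, <a_i, s>) form a linear system that
# Gaussian elimination solves for s. Prime modulus chosen for simplicity
# (my assumption; it makes every nonzero pivot invertible).
import random

p = 2**31 - 1   # a Mersenne prime
n = 8

def solve_mod_p(A, b):
    """Gaussian elimination mod p on the augmented matrix [A | b]."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % p != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)          # modular inverse of the pivot
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [(x - M[r][col] * y) % p for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

s = [random.randint(0, 1) for _ in range(n)]
A = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
b = [sum(ai * si for ai, si in zip(row, s)) % p for row in A]   # no error!
assert solve_mod_p(A, b) == s
```

Adding even a small random $e$ to each $b$ breaks this attack: elimination amplifies the errors until the recovered “key” is garbage.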

**RLWE:** The Ring Learning With Errors (RLWE) problem is the natural analogue of LWE, where all scalars involved are replaced with polynomials over a (carefully) chosen ring.

Formally, the RLWE encryption scheme has the following parameters:

- A ring $R = \mathbb{Z}/q\mathbb{Z}$, where $ q \geq 2$ is a positive integer. This is the space of coefficients of all polynomials in the scheme. I usually think of $q$ as $2^{32}$, i.e., unsigned 32-bit integers.
- A plaintext space $R[x] / (x^N + 1)$, where $N$ is a power of 2. This is the space that the underlying message $m(x)$ comes from, and it is encoded as a list of $N$ integers forming the coefficients of the polynomial.
- An *RLWE dimension* $n \in \mathbb{N}$.
- A discrete Gaussian *error distribution* $D$ with a mean of zero and a fixed standard deviation.

An RLWE secret key $s$ is defined as a list of $n$ polynomials with binary coefficients in $\mathbb{B}[x] / (x^N+1)$, where $\mathbb{B} = \{0, 1\}$. The coefficients are uniformly sampled, like in LWE. An RLWE ciphertext is defined as a vector of $n$ polynomials $a = (a_1(x), \dots, a_n(x))$, sampled uniformly over $(R[x] / (x^N+1))^n$, and a polynomial $b(x) = \langle a, s \rangle + m(x) + e(x)$, where $m(x)$ is the message (with a similar “store it in the top bits” trick as LWE), $e(x)$ is a polynomial with coefficients drawn from $D$, and all the products in the inner product are done in $R[x] / (x^N+1)$. Decryption in RLWE involves computing $b(x) - \langle a, s \rangle$ and rounding appropriately to recover $m(x)$. Just like with LWE, the message is “hidden” in the noise added to an equation corresponding to the polynomial products (i.e., without the noise and with enough sample encryptions of the same message/secret key, you can solve the system and recover the message). For more notes on how polynomial multiplication ends up being trickier in this ring, see my negacyclic polynomial multiplication article.

The most common version of RLWE you will see in the literature sets the vector dimension $n=1$, and so the secret key $s$ is a single polynomial, the ciphertext is a single polynomial, and RLWE can be viewed as directly replacing the vector dot product in LWE with a polynomial product. However, making $n$ larger is believed to provide more security, and it can be traded off against making the polynomial degree smaller, which can be useful when tweaking parameters for performance (keeping the security level constant).

## Sample Extraction

Sample extraction is the trick of taking an RLWE encryption of $m(x) = m_0 + m_1 x + \dots + m_{N-1}x^{N-1}$, and outputting an LWE encryption of $m_0$. In our case, the degree $N$ and the dimension $n_{\textup{RLWE}}$ of the input RLWE ciphertext scheme are fixed, but we may pick the dimension $n_{\textup{LWE}}$ of the LWE scheme as we choose to make this trick work.

This is one of those times in math when it is best to “just work it out with a pencil.” It turns out there are no serious obstacles to our goal. We start with polynomials $a = (a_1(x), \dots, a_n(x))$ and $b(x) = \langle a, s \rangle + m(x) + e(x)$, and we want to produce a vector of scalars $(x_1, \dots, x_D)$ of some dimension $D$, a corresponding secret key $s_{\textup{LWE}}$, and a scalar $b = \langle x, s_{\textup{LWE}} \rangle + m_0 + e'$, where $e'$ may be different from the input error $e(x)$, but is hopefully not too much larger.

As with many of the articles in this series, we employ the so-called “phase function” to help with the analysis, which is just the partial decryption of an RLWE ciphertext without the rounding step: $\varphi(x) = b(x) - \langle a, s \rangle = m(x) + e(x)$. The idea is as follows: **inspect the structure of the constant term of **$\varphi(x)$**, oh look, it’s an LWE encryption.**

So let’s expand the constant term of $b(x) – \langle a, s \rangle$. Given a polynomial expression, I will use the notation $(-)[0]$ to denote the constant coefficient, and $(-)[k]$ for the $k$-th coefficient.

$$ \begin{aligned}(b(x) - \langle a, s \rangle)[0] &= b[0] - \left ( (a_1s_1)[0] + \dots + (a_n s_n)[0] \right ) \end{aligned}$$

Each entry in the dot product is a negacyclic polynomial product, so its constant term requires summing all the pairs of coefficients of $a_i$ and $s_i$ whose degrees sum to zero mod $N$, and flipping signs when there’s wraparound. In particular, a single product above for $a_i s_i$ has the form:

$$(a_is_i)[0] = s_i[0]a_i[0] - s_i[1]a_i[N-1] - s_i[2]a_i[N-2] - \dots - s_i[N-1]a_i[1]$$

Notice that I wrote the coefficients of $s_i$ in increasing order. This was on purpose, because if we re-write this expression $(a_is_i)[0]$ as a dot product, we get

$$(a_is_i)[0] = \left \langle (s_i[0], s_i[1], \dots, s_i[N-1]), (a_i[0], -a_i[N-1], \dots, -a_i[1])\right \rangle$$

In particular, the $a_i[k]$ are public, so we can sign-flip and reorder them easily in our conversion trick. But $s_i$ is unknown at the time the sample extraction needs to occur, so it helps if we can leave the secret key untouched. And indeed, when we apply the above expansion to all of the terms in the computation of $\varphi(x)[0]$, we end up manipulating the $a_i$’s a lot, but merely “flattening” the coefficients of $s = (s_1(x), \dots, s_n(x))$ into a single long vector.

So combining all of the above products, we see that $(b(x) - \langle a, s \rangle)[0]$ is already an LWE encryption with $(x, y) = ((x_1, \dots, x_D), b[0])$, and $x$ being the very long ($D = nN$) vector

$$\begin{aligned} x = (& a_1[0], -a_1[N-1], \dots, -a_1[1], \\ &a_2[0], -a_2[N-1], \dots, -a_2[1], \\ &\dots , \\ &a_n[0], -a_n[N-1], \dots, -a_n[1] ) \end{aligned}$$

And the corresponding secret key is

$$\begin{aligned} s_{\textup{LWE}} = (& s_1[0], s_1[1], \dots, s_1[N-1], \\ & s_2[0], s_2[1], \dots, s_2[N-1], \\ &\dots , \\ & s_n[0], s_n[1], \dots, s_n[N-1] ) \end{aligned}$$

And the error in this ciphertext is exactly the constant coefficient of the error polynomial $e(x)$ from the RLWE encryption, which is independent of the error of all the other coefficients.
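To convince yourself the bookkeeping above is right, here is a small numeric check of the extraction formula on a toy $n = 1$ RLWE instance. The tiny, insecure parameters and helper names (`negmul`, the 4-bit top-bits encoding) are my own illustrative choices; the check verifies that $(x, b[0])$ decrypts under the flattened secret key to $m_0$.

```python
# Numeric check of sample extraction on a toy n = 1 RLWE ciphertext.
# Toy parameters only; NOT secure.
import random

q, N = 2**32, 8

def negmul(f, g):
    """Negacyclic product of coefficient lists mod x^N + 1."""
    out = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < N:
                out[i + j] = (out[i + j] + fi * gj) % q
            else:
                out[i + j - N] = (out[i + j - N] - fi * gj) % q
    return out

s = [random.randint(0, 1) for _ in range(N)]
m = [random.randrange(16) << 28 for _ in range(N)]
a = [random.randrange(q) for _ in range(N)]
e = [random.randint(-8, 8) for _ in range(N)]
b = [(asi + mi + ei) % q for asi, mi, ei in zip(negmul(a, s), m, e)]

# Sample extraction: reorder and sign-flip the public a-coefficients...
x = [a[0]] + [(-a[N - k]) % q for k in range(1, N)]
s_lwe = s[:]   # ...while the secret key is merely flattened, untouched.

# (x, b[0]) is an LWE encryption of m[0] under s_lwe.
phase = (b[0] - sum(xi * si for xi, si in zip(x, s_lwe))) % q
assert ((phase + (1 << 27)) >> 28) % 16 == m[0] >> 28
```

The extraction itself touches only public data: a reorder and sign-flip of the $a_i$ coefficients, with no homomorphic operations at all.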

## Commentary

This trick is a best case scenario. Unlike with key switching, we don’t need to encrypt the output LWE secret key to perform the conversion. And unlike modulus switching, there is no impact on the error growth in the conversion from RLWE to LWE. So in a sense, this trick is “perfect,” though it loses information about the other coefficients of $m(x)$ in the process. As it happens, the CGGI FHE scheme that these articles are building toward only uses the constant coefficient.

The only twist to think about is that the output LWE ciphertext is dependent on the RLWE scheme parameters. What if you wanted to get a smaller-dimensional LWE ciphertext as output? This is a realistic concern, as in the CGGI FHE scheme one starts from an LWE ciphertext of one dimension, goes to RLWE of another (larger) dimension, and needs to get back to LWE of the original dimension by the end.

To do this, you have two options: one is to pick the RLWE ciphertext parameters $n, N$, so that their product is the value you need. A second is to allow the RLWE parameters to be whatever you need for performance/security, and then employ a key switching operation after the sample extraction to get back to the LWE parameters you need.

It is worth mentioning—though I am far from fully understanding the methods—that there are other ways to convert between LWE and RLWE. One can go from LWE to RLWE, or from a collection of LWEs to RLWE. Some methods can be found in this paper and its references.

Until next time!

# Google’s Fully Homomorphic Encryption Compiler — A Primer

Back in May of 2022 I transferred teams at Google to work on Fully Homomorphic Encryption (newsletter announcement). Since then I’ve been working on a variety of projects in the space, including being the primary maintainer on github.com/google/fully-homomorphic-encryption, which is an open source FHE compiler for C++. This article will be an introduction to how to use it to compile programs to FHE, as well as a quick overview of its internals.

If you’d like to contribute to this project, please reach out to me at mathintersectprogramming@gmail.com or at j2kun@mathstodon.xyz. I have a few procedural hurdles to overcome before I can accept external contributions (with appropriate git commit credit), but if there’s enough interest I will make time for it sooner as opposed to later.

## Overview

The core idea of fully homomorphic encryption (henceforth FHE) is that you can encrypt data and then run programs on it without ever decrypting it. In the extreme, even if someone had physical access to the machine and could inspect the values of individual memory cells or registers while the program was running, they would not see any of the bits of the underlying data being operated on (without cracking the cryptosystem).

Our FHE compiler converts C++ programs that operate on plaintext to programs that operate on the corresponding FHE ciphertexts (since it emits high-level code that then needs to be further compiled, it could be described as a *transpiler*). More specifically, it converts a specific *subset* of valid C++ programs—more on what defines that subset later—to programs that run the same program on encrypted data via one of the supported FHE cryptosystem implementations. In this sense it’s close to a traditional compiler: parse the input, run a variety of optimization passes, and generate some output. However, as we’ll see in this article, the unique properties of FHE make the compiler more like hardware circuit toolchains.

The variety of FHE supported by the compiler today is called “gate bootstrapping.” I won’t have time to go into intense detail about the math behind it, but suffice it to say that this technique gives away performance in exchange for a simpler job of optimizing and producing a working program. What I will say is that this blend of FHE encrypts *each bit* of its input into a separate ciphertext, and then represents the program as a boolean (combinational) circuit—composed of gates like AND, OR, XNOR, etc. Part of the benefit of the compiler is that it manages a mapping of higher order types like integers, arrays, and structs, to lists of encrypted booleans and back again.

A few limitations result from this circuit-based approach, and they will be woven throughout the rest of this tutorial. First, all loops must be fully unrolled and have statically-known bounds. Second, constructs like pointers and dynamic memory allocation are not supported. Third, all control flow is multiplexed, meaning that all branches of all if statements are evaluated, and only then is one chosen. Finally, there are important practical considerations related to the bit-width of the types used and the expansion of cleartexts into ciphertexts, both of which impact the performance of the resulting program.
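To illustrate what “multiplexed control flow” means at the gate level, here is a tiny Python sketch on plain bits. This is my own illustration of the general technique, not the compiler's actual output: an if-statement becomes a mux that always computes both branches and selects one with boolean gates (in the FHE setting, each of these gate operations would run homomorphically on ciphertexts).

```python
def mux(cond, t, f):
    """cond ? t : f on single bits, built from AND/XOR/OR gates only.
    The circuit analogue of an if-statement: no branch is ever skipped."""
    return (cond & t) | ((cond ^ 1) & f)

# Both branches of an "if" are always evaluated; the condition only
# selects. E.g., choosing between two precomputed bit patterns:
def select_bits(cond, true_bits, false_bits):
    return [mux(cond, t, f) for t, f in zip(true_bits, false_bits)]

assert select_bits(1, [1, 0, 1], [0, 0, 0]) == [1, 0, 1]
assert select_bits(0, [1, 0, 1], [0, 0, 0]) == [0, 0, 0]
```

This is why data-dependent early exits give no speedup in FHE: the circuit pays for every branch on every run.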

On the other hand, combinational circuit optimization is a well-studied problem with off-the-shelf products that can be integrated (narrator: they did integrate some) into the FHE compiler to make the programs run faster.

## Dependencies

*tl;dr:* check out the dockerfiles.

Google’s internal build system is called `blaze`, and its open source counterpart (equivalent in all except name) is called `bazel`. One of the first curious things you’ll notice about the compiler is that bazel is used both to build the project and to *use* the project (the latter I’d like to change). So you’ll need to install bazel, and an easy way to do that is to install `bazelisk`, which is the analogue of `nvm` for Node or `pyenv` for Python. You won’t need multiple versions of bazel, but this is just the easiest way to install the latest version. I’ll be using Bazel 4.0.0, but there are newer versions that should work just fine as well.

You’ll need a C compiler (I use gcc12) because most of the project’s dependencies are built from source (see next paragraph), and a small number of external libraries and programs to support some of the circuit optimizer plugins. For Debian-based systems, this is the full list:

```
apt-get update && apt-get install -y \
gcc \
git \
libtinfo5 \
python \
python3 \
python3-pip \
autoconf \
libreadline-dev \
flex \
bison \
wget
```

As mentioned above, all the other dependencies are built *from source*, and this will take a while the first time you build the project. So you might as well clone and get that build started while you read. The command below will build the project and all the example binaries, and then cache the intermediate build artifacts for future builds, only recompiling what has changed in the mean time. See the Bazel/Starlark section for more details on what this command is doing. **Note:** the one weird case is LLVM. If you use an exotic operating system (or a docker container, don’t get me started on why this is an issue) then bazel may choose to build LLVM from scratch, which will take an hour or two for the first build. It may also fail due to a missing dependency of your system, which will be extremely frustrating (this is the #1 complaint in our GitHub issues). But, if you’re on a standard OS/architecture combination (as enumerated here), it will just fetch the right LLVM dependency and install it on your system.

```
git clone https://github.com/google/fully-homomorphic-encryption.git
cd fully-homomorphic-encryption
bazel build ...:all
```

A clean build on my home machine takes about 16 minutes.

## Two end-to-end examples: add and string_cap

In this section I’ll show two end-to-end examples of using the compiler as an end user. The first will be for a dirt-simple program that adds two 32-bit integers. The second will be for a program that capitalizes the first character of each word in an ASCII string. The examples are already in the repository under transpiler/examples by the names `simple_sum` and `string_cap`.

Both of these programs will have the form of compiling a single function that is the entry point for the FHE part of the program, and providing a library and API to integrate it with a larger program.

First, `simple_sum`. Add a header and source file as you would for any standard C++ program, but with one extra line to tell the compiler which function should be compiled (along with any functions called within it).

```
// add.h
int add(int a, int b);

// add.cc
#include "add.h"

#pragma hls_top
int add(int a, int b) {
  return a + b;
}
```

The line `#pragma hls_top` tells the compiler which function is the entry point. Incidentally, `hls` stands for “high level synthesis,” and the pragma itself comes from the XLS project, which we use as our parser and initial circuit builder. Here ‘top’ just means top level function.

Then, inside a file in the same directory called `BUILD` (see the Bazel/Starlark section next for an overview of the build system), create a build target that invokes the FHE compiler. In our case we’ll use the OpenFHE backend.

```
# BUILD
# loads the FHE compiler as an extension to Bazel.
load("//transpiler:fhe.bzl", "fhe_cc_library")

fhe_cc_library(
    name = "add_fhe_lib",
    src = "add.cc",
    hdrs = ["add.h"],
    encryption = "openfhe",  # backend cryptosystem library
    interpreter = True,      # use dynamic thread scheduling
    optimizer = "yosys",     # boolean circuit optimizer
)
```

The full options for this build rule (i.e., the documentation of the compiler’s main entry point) can be found in the docstring of the bazel macro. I picked the parameters that have what I think of as the best tradeoff between stability and performance.

If you run `bazel build add_fhe_lib`, then you will see it build but nothing else (see the “intermediate files” section for more on what’s happening behind the scenes). But if you typed something wrong in the build file, it would err at this point. It generates a header and `cc` file that contains the same API as `add`, but with different types for the arguments and extra arguments needed by the FHE library backend.

Next we need a main routine that uses the library. Since we’re using OpenFHE as our backend, it requires some configuration and the initial encryption of its inputs. The full code, with some slight changes for the blog, looks like this:

```
#include <stdio.h>

#include <iostream>
#include <ostream>

#include "absl/strings/numbers.h"
#include "transpiler/codelab/add/add_fhe_lib.h"
#include "transpiler/data/openfhe_data.h"

constexpr auto kSecurityLevel = lbcrypto::MEDIUM;

int main(int argc, char** argv) {
  if (argc < 3) {
    fprintf(stderr, "Usage: add_main [int] [int]\n\n");
    return 1;
  }

  int x, y;
  if (!absl::SimpleAtoi(argv[1], &x)) {
    std::cout << "Bad int " << argv[1] << std::endl;
    return 1;
  }
  if (!absl::SimpleAtoi(argv[2], &y)) {
    std::cout << "Bad int " << argv[2] << std::endl;
    return 1;
  }
  std::cout << "Computing " << x << " + " << y << std::endl;

  // Set up backend context and encryption keys.
  auto context = lbcrypto::BinFHEContext();
  context.GenerateBinFHEContext(kSecurityLevel);
  auto sk = context.KeyGen();
  context.BTKeyGen(sk);

  OpenFhe<int> ciphertext_x = OpenFhe<int>::Encrypt(x, context, sk);
  OpenFhe<int> ciphertext_y = OpenFhe<int>::Encrypt(y, context, sk);

  OpenFhe<int> result(context);
  auto status = add(result, ciphertext_x, ciphertext_y, context);
  if (!status.ok()) {
    std::cout << "FHE computation failed: " << status << std::endl;
    return 1;
  }

  std::cout << "Result: " << result.Decrypt(sk) << "\n";
  return 0;
}
```

The parts that are not obvious boilerplate include:

Configuring the security level of the OpenFHE library (which is called BinFHE to signal it’s doing binary circuit FHE).

```
constexpr auto kSecurityLevel = lbcrypto::MEDIUM;
```

Setting up the initial OpenFHE secret key

```
auto context = lbcrypto::BinFHEContext();
context.GenerateBinFHEContext(kSecurityLevel);
auto sk = context.KeyGen();
context.BTKeyGen(sk);
```

Encrypting the inputs. This uses an API provided by the compiler (though because the project was a research prototype, I think the original authors never got around to unifying the “set up the secret key” part behind an API), included via `#include "transpiler/data/openfhe_data.h"`

```
OpenFhe<int> ciphertext_x = OpenFhe<int>::Encrypt(x, context, sk);
OpenFhe<int> ciphertext_y = OpenFhe<int>::Encrypt(y, context, sk);
```

Then calling the FHE-enabled `add` function, and decrypting the results.

Then create another `BUILD` rule for the binary:

```
cc_binary(
    name = "add_openfhe_fhe_demo",
    srcs = ["add_openfhe_fhe_demo.cc"],
    deps = [
        ":add_fhe_lib",
        "//transpiler/data:openfhe_data",
        "@com_google_absl//absl/strings",
        "@openfhe//:binfhe",
    ],
)
```

Running it with bazel:

```
$ bazel run add_openfhe_fhe_demo -- 5 7
Computing 5 + 7
Result: 12
```

Timing this on my system, it takes a little less than 7 seconds.

On to a more complicated example: `string_cap`, which will showcase loops and arrays. This was slightly simplified from the GitHub example. First the header and source files:

```
// string_cap.h
#define MAX_LENGTH 32
void CapitalizeString(char my_string[MAX_LENGTH]);

// string_cap.cc
#include "string_cap.h"

#pragma hls_top
void CapitalizeString(char my_string[MAX_LENGTH]) {
  bool last_was_space = true;
#pragma hls_unroll yes
  for (int i = 0; i < MAX_LENGTH; i++) {
    char c = my_string[i];
    if (last_was_space && c >= 'a' && c <= 'z') {
      my_string[i] = c - ('a' - 'A');
    }
    last_was_space = (c == ' ');
  }
}
```

Now there’s a bit to discuss. First, the string has a static length known at compile time. This is required because the FHE program is a boolean circuit. It defines wires for each of the inputs, and it must know how many wires to define. In this case it will be a circuit with `32 * 8` wires, one for each bit of each character in the array.

The second new thing is the `#pragma hls_unroll yes`, which, like `hls_top`, tells the XLS compiler to fully unroll that loop. Because the FHE program is a static circuit, it cannot have any loops. XLS unrolls our loops for us, and incidentally, I learned recently that it uses the Z3 solver to first *prove* the loops can be unrolled (which can lead to some slow compile times for complex programs). I’m not aware of other compilers that do this proving part. It looks like LLVM’s loop unroller just slingshots its CPU cycles into the sun if it’s asked to fully unroll an infinite loop.

The main routine is similar to before:

```
#include <array>
#include <iostream>
#include <string>

#include "openfhe/binfhe/binfhecontext.h"
#include "transpiler/data/openfhe_data.h"
#include "transpiler/examples/string_cap/string_cap.h"
#include "transpiler/examples/string_cap/string_cap_openfhe_yosys_interpreted.h"

int main(int argc, char** argv) {
  if (argc < 2) {
    fprintf(stderr, "Usage: string_cap_openfhe_testbench string_input\n\n");
    return 1;
  }

  std::string input = argv[1];
  input.resize(MAX_LENGTH, '\0');
  std::string plaintext(input);

  auto cc = lbcrypto::BinFHEContext();
  cc.GenerateBinFHEContext(lbcrypto::MEDIUM);
  auto sk = cc.KeyGen();
  cc.BTKeyGen(sk);

  auto ciphertext = OpenFheArray<char>::Encrypt(plaintext, cc, sk);
  auto status = CapitalizeString(ciphertext, cc);
  if (!status.ok()) {
    std::cout << "FHE computation failed " << status << std::endl;
    return 1;
  }
  std::cout << "Decrypted result: " << ciphertext.Decrypt(sk) << std::endl;
}
```

The key differences are:

- We resize the input to be exactly `MAX_LENGTH`, padding with null bytes.
- We use `OpenFheArray` instead of `OpenFhe` to encode an array of characters.

And now omitting the binary’s build rule and running it, we get

```
$ bazel run string_cap_openfhe_yosys_interpreted_testbench -- 'hello there'
Decrypted result: Hello There
```

Interestingly, this also takes about 6 seconds to run on my machine (same as the “add 32-bit integers” program). It would be the same runtime for a longer string, up to 32 characters, since, of course, the program processes all `MAX_LENGTH` characters without knowing if they are null bytes.

## An overview of Bazel and Starlark

The FHE compiler originated within Google in a curious way. It was created by dozens of volunteer contributors (20%-ers, as they say), many of whom worked on the XLS hardware synthesis toolchain, which is a core component of the compiler. Because of these constraints, and also because it was happening entirely in Google, there wasn’t much bandwidth available to make the compiler independent of Google’s internal build tooling.

This brings us to Bazel and Starlark, which is the user-facing façade of the compiler today. Bazel is the open source analogue of Google’s internal build system (“Blaze” is the internal tool), and Starlark is its Python-inspired scripting language. There are lots of opinions about Bazel that I won’t repeat here. Instead I will give a minimal overview of how it works with regards to the FHE compiler.

First some terminology. To work with Bazel you do the following:

- Define a `WORKSPACE` file which defines all your project’s external dependencies, how to fetch their source code, and what bazel commands should be used to build them. This can be thought of as a top-level CMakeLists, except that it doesn’t contain any instructions for building the project beyond declaring the root of the project’s directory tree and the project’s name.
- Define a set of `BUILD` files in each subdirectory, declaring the *build targets* that can be built from the source files in that directory (but not its subdirectories). This is analogous to CMakeLists files in subdirectories. Each build target can declare dependence on other build targets, and `bazel build` ensures the dependencies are built first, and caches the build results across a session. Many projects have a `BUILD` file in the project root to expose the project’s public libraries and APIs.
- Use the built-in bazel *rules* like `cc_library`, `cc_binary`, and `cc_test` to group files into libraries that can be built with `bazel build`, executable binaries that can also be run with `bazel run`, and tests that can also be run with `bazel test`. Most bazel rules boil down to calling some executable program like `gcc` or `javac` with specific arguments, while also keeping track of the accumulated dependency set of build artifacts in a “hermetic” location on the filesystem.
- Write any additional bazel *macros* that chain together built-in bazel commands, e.g., for defining logical groupings of build commands that need to happen in a particular sequence. Macros look like Python functions that call individual bazel rules and possibly pass data between them. They’re written in `.bzl` files, which are interpreted directly by `bazel`.

Generally, `bazel` builds targets in two phases. First—the analysis phase—it loads all the `BUILD` files and imported `.bzl` files, and scans for all the rules that were called. In particular, it *runs* the macros, because it needs to know what rules are called by the macros (and rules can be guarded by control flow, or their arguments can be generated dynamically, etc.). But it doesn’t *run* the build rules themselves. In doing this, it can build a complete graph of dependencies, and report errors about typos, missing dependencies, cycles, etc. Once the analysis phase is complete, it runs the underlying rules in dependency order, and caches the results. Bazel will only run a rule again if something changes with the files it depends on or its underlying dependencies.

The FHE compiler is written in Starlark, in the sense that the main entrypoint for the compiler is the Bazel macro `fhe_cc_library`. This macro chains together a bunch of rules that call the parser, circuit optimizer, and codegen steps, each one being its own Bazel rule. Each of these rules in turn declares/writes files that we can inspect—see the next section.

Here’s what `fhe_cc_library` looks like (a subset of the control flow, for brevity):

```
def fhe_cc_library(name, src, hdrs, copts = [], num_opt_passes = 1,
                   encryption = "openfhe", optimizer = "xls",
                   interpreter = False, library_name = None, **kwargs):
    """A rule for building FHE-based cc_libraries. [docstring omitted]"""
    transpiled_xlscc_files = "{}.cc_to_xls_ir".format(name)
    library_name = library_name or name
    cc_to_xls_ir(
        name = transpiled_xlscc_files,
        library_name = library_name,
        src = src,
        hdrs = hdrs,
        defines = kwargs.get("defines", None),
    )

    # Below, adding a leading colon to the `src` argument points the source
    # files attribute to the files generated by a previously generated rule,
    # with the name being the unique identifier.
    transpiled_structs_headers = "{}.xls_cc_transpiled_structs".format(name)
    xls_cc_transpiled_structs(
        name = transpiled_structs_headers,
        src = ":" + transpiled_xlscc_files,
        encryption = encryption,
    )

    if optimizer == "yosys":  # other branch omitted for brevity
        verilog = "{}.verilog".format(name)
        xls_ir_to_verilog(name = verilog, src = ":" + transpiled_xlscc_files)
        netlist = "{}.netlist".format(name)
        verilog_to_netlist(name = netlist, src = ":" + verilog, encryption = encryption)
        cc_fhe_netlist_library(
            name = name,
            src = ":" + netlist,
            encryption = encryption,
            interpreter = interpreter,
            transpiled_structs = ":" + transpiled_structs_headers,
            copts = copts,
            **kwargs
        )
```

The rules invoked by the macro include:

- `cc_to_xls_ir`, which calls the parser `xlscc` and outputs an intermediate representation of the program as a high-level circuit. This step does the loop unrolling and other smarts related to converting C++ to a circuit.
- `xls_cc_transpiled_structs`, which calls a binary that handles structs (this part is complicated and will not be covered in this article).
- `xls_ir_to_verilog`, which converts the XLS IR to Verilog so that it can be optimized using Yosys/ABC, a popular circuit design and optimization program.
- `verilog_to_netlist`, which invokes Yosys to both optimize the circuit and convert it to the lowest-level IR, called a netlist.
- `cc_fhe_netlist_library`, which calls the codegen step to generate C++ code from the netlist in the previous step.

All of this results in a C++ library (generated by the last step) that can be linked against an existing program and whose generated source we can inspect. Now let’s see what each generated file looks like.

## The intermediate files generated by the compiler

Earlier I mentioned that bazel puts the intermediate files generated by each build rule into a “hermetic” location on the filesystem. That location is sym-linked from the workspace root by a link called `bazel-bin`.

```
$ ls -al . | grep bazel-bin
bazel-bin -> /home/j2kun/.cache/bazel/_bazel_j2kun/42987a3d4769c6105b2fa57d2291edc3/execroot/com_google_fully_homomorphic_encryption/bazel-out/k8-opt/bin
```

Within `bazel-bin` there’s a mirror of the project’s source tree, and in the directory for a build rule you can find all the generated files. For our 32-bit adder, here’s what it looks like:

```
$ ls
_objs add_test
add_fhe_lib.cc add_test-2.params
add_fhe_lib.entry add_test.runfiles
add_fhe_lib.generic.types.h add_test.runfiles_manifest
add_fhe_lib.h libadd.a
add_fhe_lib.ir libadd.a-2.params
add_fhe_lib.netlist.v libadd.pic.a
add_fhe_lib.netlist.v.dot libadd.pic.a-2.params
add_fhe_lib.opt.ir libadd.so
add_fhe_lib.types.h libadd.so-2.params
add_fhe_lib.v libadd_fhe_lib.a
add_fhe_lib.ys libadd_fhe_lib.a-2.params
add_fhe_lib_meta.proto libadd_fhe_lib.pic.a
add_openfhe_fhe_demo libadd_fhe_lib.pic.a-2.params
add_openfhe_fhe_demo-2.params libadd_fhe_lib.so
add_openfhe_fhe_demo.runfiles libadd_fhe_lib.so-2.params
add_openfhe_fhe_demo.runfiles_manifest
```

You can see the output `.h` and `.cc` files and their compiled `.so` files (the output build artifacts), but more important for us are the internal generated files. This is where we get to actually see the circuits generated.

The first one worth inspecting is `add_fhe_lib.opt.ir`, which is the output of the `xlscc` compiler plus an XLS-internal optimization step. This is the main part of how the compiler uses the XLS project: to convert an input program into a circuit. The file looks like:

```
package my_package
file_number 1 "./transpiler/codelab/add/add.cc"
top fn add(x: bits[32], y: bits[32]) -> bits[32] {
  ret add.3: bits[32] = add(x, y, id=3, pos=[(1,18,25)])
}
```

As you can see, it’s an XLS-defined intermediate representation (IR) of the main routine with some extra source code metadata. Because XLS-IR natively supports addition, the result is trivial. One interesting thing to note is that numbers are represented as bit arrays. In short, XLS-IR’s value type system supports only bits, arrays, and tuples, with tuples being the mechanism for supporting structures.

Next, the XLS-IR is converted to Verilog in `add_fhe_lib.v`, resulting in the (similarly trivial)

```
module add(
  input wire [31:0] x,
  input wire [31:0] y,
  output wire [31:0] out
);
  wire [31:0] add_6;
  assign add_6 = x + y;
  assign out = add_6;
endmodule
```

The next step is to run this Verilog through Yosys, a mature circuit synthesis suite, which for our purposes encapsulates two tasks:

- Convert higher-level operations to a specified set of boolean gates (that operate on individual bits)
- Optimize the resulting circuit to be as small as possible

XLS can also do this, and if you want to see that, you can change the build rule’s `optimizer` attribute from `yosys` to `xls`. But we’ve found that Yosys routinely produces 2-3x smaller circuits. The script that we give to Yosys can be found in `fhe_yosys.bzl`, which encapsulates the bazel macros and rules related to invoking Yosys. The output for our adder program is:

```
module add(x, y, out);
wire _000_;
wire _001_;
wire _002_;
[...]
wire _131_;
wire _132_;
output [31:0] out;
wire [31:0] out;
input [31:0] x;
wire [31:0] x;
input [31:0] y;
wire [31:0] y;
nand2 _133_ ( .A(x[12]), .B(y[12]), .Y(_130_));
xor2 _134_ ( .A(x[12]), .B(y[12]), .Y(_131_));
nand2 _135_ ( .A(x[11]), .B(y[11]), .Y(_132_));
or2 _136_ ( .A(x[11]), .B(y[11]), .Y(_000_));
nand2 _137_ ( .A(x[10]), .B(y[10]), .Y(_001_));
xor2 _138_ ( .A(x[10]), .B(y[10]), .Y(_002_));
nand2 _139_ ( .A(x[9]), .B(y[9]), .Y(_003_));
or2 _140_ ( .A(x[9]), .B(y[9]), .Y(_004_));
nand2 _141_ ( .A(x[8]), .B(y[8]), .Y(_005_));
xor2 _142_ ( .A(x[8]), .B(y[8]), .Y(_006_));
nand2 _143_ ( .A(x[7]), .B(y[7]), .Y(_007_));
or2 _144_ ( .A(x[7]), .B(y[7]), .Y(_008_));
[...]
xor2 _291_ ( .A(_006_), .B(_035_), .Y(out[8]));
xnor2 _292_ ( .A(x[9]), .B(y[9]), .Y(_128_));
xnor2 _293_ ( .A(_037_), .B(_128_), .Y(out[9]));
xor2 _294_ ( .A(_002_), .B(_039_), .Y(out[10]));
xnor2 _295_ ( .A(x[11]), .B(y[11]), .Y(_129_));
xnor2 _296_ ( .A(_041_), .B(_129_), .Y(out[11]));
xor2 _297_ ( .A(_131_), .B(_043_), .Y(out[12]));
endmodule
```

This produces a circuit with a total of 165 gates.

The codegen step then produces an `add_fhe_lib.cc` file, which loads this circuit into an interpreter that knows to map an operation like `and2` to the chosen backend cryptosystem’s library call (see the source for the OpenFHE backend), and uses thread-pool scheduling on the CPU to speed up the evaluation of the circuit.

For the string_cap circuit, the `opt.ir` shows off a bit more of XLS’s IR, including operations for sign extension, array indexing & slicing, and multiplexing (`sel`) branches. The resulting netlist after optimization is a 684-gate circuit (though many of those are “inverter” or “buffer” gates, which are effectively free for FHE).

The compiler also outputs a `.dot` file, which can be rendered to an SVG (warning: the SVG is ~2.3 MiB). If you browse this circuit, you’ll see it is rather shallow and wide, and this allows the thread-pool scheduler to take advantage of the parallelism in the circuit to make it run fast. Meanwhile, the 32-bit adder, though it has roughly 25% the total number of gates, is a much deeper circuit and hence has less parallelism.

## Supported C++ input programs and encryption overhead

This has so far been a tour of the compiler, but if you want to get started using the compiler to write programs, you’ll need to keep a few things in mind.

First, the subset of C++ supported by the compiler is rather small. As mentioned earlier, all data needs to have static sizes. This means, e.g., you can’t write a program that processes arbitrary images. Instead, you have to pick an upper bound on the image size, zero-pad the image appropriately before encrypting it, and then write the program to operate on that image size. In the same vein, the integer types you choose have nontrivial implications for performance. To see this, replace the `int` type in the 32-bit adder with a `char` and inspect the resulting circuit.

Similarly, loops need static bounds on their iteration count. Or, more precisely, `xlscc` needs to be able to fully unroll every loop—which permits some forms of while loops and recursion that provably terminate. This can cause problems if the input code has loops with complex exit criteria (e.g., `break`s guarded by if/else). It also requires you to think hard about how you write your loops, though future work will hopefully let the compiler do that thinking for you.

Finally, encrypting each bit of a plaintext message comes with a major tax on space usage. Each encryption of a single bit corresponds to a list of roughly 700 32-bit integers. If you want to encrypt a 100×100 pixel greyscale image, each pixel of which is an 8-bit integer (0-255), it will cost you **218 MiB** to store all the pixels in memory. It’s roughly a 20,000x overhead. For comparison, the music video for Rick Astley’s “Never Gonna Give You Up” at 360p is about 9 MiB (pretty small for a 3 minute video!), but encrypted in FHE would be **188 GiB**, which (generously) corresponds to 20 feature-length films at 1080p. Some other FHE schemes have smaller ciphertext sizes, but at the cost of even larger in-memory requirements to run the computations. So if you want to run programs to operate on video—you can do it, but you will need to distribute the work appropriately, and find useful ways to reduce the data size as much as possible before encrypting it (such as working in lower resolution, greyscale, and a lower frame rate), which will also result in overall faster programs.

Until next time!

[Personal note]: Now that I’m more or less ramped up on the FHE domain, I’m curious to know what aspects of FHE my readers are interested in. Mathematical foundations? More practical demonstrations? Library tutorials? Circuit optimization? Please comment and tell me about what you’re interested in.

# Estimating the Security of Ring Learning with Errors (RLWE)

This article was written by my colleague, Cathie Yun. Cathie is an applied cryptographer and security engineer, currently working with me to make fully homomorphic encryption a reality at Google. She’s also done a lot of cool stuff with zero knowledge proofs.

In previous articles, we’ve discussed techniques used in Fully Homomorphic Encryption (FHE) schemes. The basis for many FHE schemes, as well as other privacy-preserving protocols, is the Learning With Errors (LWE) problem. In this article, we’ll talk about how to estimate the security of lattice-based schemes that rely on the hardness of LWE, as well as its widely used variant, Ring LWE (RLWE).

A previous article on modulus switching introduced LWE encryption, but as a refresher:

## Reminder of LWE

This is a literal repetition from the modulus switching article. The LWE encryption scheme I’ll use has the following parameters:

- A plaintext space $\mathbb{Z}/q\mathbb{Z}$, where $q \geq 2$ is a positive integer. This is the space that the underlying message comes from.
- An *LWE dimension* $n \in \mathbb{N}$.
- A discrete Gaussian *error distribution* $D$ with a mean of zero and a fixed standard deviation.

An LWE secret key is defined as a vector in $\{0, 1\}^n$ (uniformly sampled). An LWE ciphertext is defined as a vector $a = (a_1, \dots, a_n)$, sampled uniformly over $(\mathbb{Z} / q\mathbb{Z})^n$, and a scalar $b = \langle a, s \rangle + m + e$, where $e$ is drawn from $D$ and all arithmetic is done modulo $q$. Note that $e$ must be small for the encryption to be valid.

## Learning With Errors (LWE) security

Choosing appropriate LWE parameters is a nontrivial challenge when designing and implementing LWE based schemes, because there are conflicting requirements of security, correctness, and performance. Some of the parameters that can be manipulated are the LWE dimension $n$, error distribution $D$ (referred to in the next few sections as $X_e$), secret distribution $X_s$, and plaintext modulus $q$.

## Lattice Estimator

Here is where the Lattice Estimator tool comes to our assistance! The Lattice Estimator is a Sage module, written by a group of lattice cryptography researchers, that estimates the concrete security of Learning with Errors (LWE) instances.

For a given set of LWE parameters, the Lattice Estimator calculates the cost of all known efficient lattice attacks – for example, the Primal, Dual, and Coded-BKW attacks. It returns the estimated number of “rops” or “ring operations” required to carry out each attack; the attack that is the most efficient is the one that determines the security parameter. The bits of security for the parameter set can be calculated as $\log_2(\text{rops})$ for the most efficient attack.

## Running the Lattice Estimator

For example, let’s estimate the security of the parameters originally published for the popular TFHE scheme:

```
n = 630
q = 2^32
Xs = UniformMod(2)
Xe = DiscreteGaussian(stddev=2^17)
```

After installing the Lattice Estimator and sage, we run the following commands in sage:

```
> from estimator import *
> schemes.TFHE630
LWEParameters(n=630, q=4294967296, Xs=D(σ=0.50, μ=-0.50), Xe=D(σ=131072.00), m=+Infinity, tag='TFHE630')
> _ = LWE.estimate(schemes.TFHE630)
bkw :: rop: ≈2^153.1, m: ≈2^139.4, mem: ≈2^132.6, b: 4, t1: 0, t2: 24, ℓ: 3, #cod: 552, #top: 0, #test: 78, tag: coded-bkw
usvp :: rop: ≈2^124.5, red: ≈2^124.5, δ: 1.004497, β: 335, d: 1123, tag: usvp
bdd :: rop: ≈2^131.0, red: ≈2^115.1, svp: ≈2^131.0, β: 301, η: 393, d: 1095, tag: bdd
bdd_hybrid :: rop: ≈2^185.3, red: ≈2^115.9, svp: ≈2^185.3, β: 301, η: 588, ζ: 0, |S|: 1, d: 1704, prob: 1, ↻: 1, tag: hybrid
bdd_mitm_hybrid :: rop: ≈2^265.5, red: ≈2^264.5, svp: ≈2^264.5, β: 301, η: 2, ζ: 215, |S|: ≈2^189.2, d: 1489, prob: ≈2^-146.6, ↻: ≈2^148.8, tag: hybrid
dual :: rop: ≈2^128.7, mem: ≈2^72.0, m: 551, β: 346, d: 1181, ↻: 1, tag: dual
dual_hybrid :: rop: ≈2^119.8, mem: ≈2^115.5, m: 516, β: 314, d: 1096, ↻: 1, ζ: 50, tag: dual_hybrid
```

In this example, the most efficient attack is the `dual_hybrid` attack. It uses `2^119.8` ring operations, and so these parameters provide `119.8` bits of security. The reader may notice that the TFHE website claims those parameters give 128 bits of security. This discrepancy arises because they used an older tool (the LWE Estimator, which is no longer maintained) that doesn’t take into account the most up-to-date lattice attacks.

For further reading, Benjamin Curtis wrote an article about parameter selection for the CONCRETE implementation of the TFHE scheme. Benjamin Curtis, Martin Albrecht, and other researchers also used the Lattice Estimator to estimate all the LWE and NTRU schemes.

## Ring Learning with Errors (RLWE) security

It is often desirable to use Ring LWE instead of LWE, for greater efficiency and smaller key sizes (as Chris Peikert illustrates via meme). We’d like to estimate the security of a Ring LWE scheme, but it wasn’t immediately obvious to us how to do this, since the Lattice Estimator only operates over LWE instances. In order to use the Lattice Estimator for this security estimate, we first needed to do a reduction from the RLWE instance to an LWE instance.

## Attempted RLWE to LWE reduction

Given an RLWE instance with $ \text{RLWE_dimension} = k $ and $ \text{poly_log_degree} = N $, we can create a relation that *looks like* an LWE instance of $ \text{LWE_dimension} = N * k $ with the same security, as long as $N$ is a power of 2 and there are no known attacks that target the ring structure of RLWE that are more efficient than the best LWE attacks. Note: $N$ must be a power of 2 so that $x^N+1$ is a cyclotomic polynomial.

An RLWE encryption has the following form: $ (a_0(x), a_1(x), \dots, a_{k-1}(x), b(x)) $

- Public polynomials: $ a_0(x), a_1(x), \dots, a_{k-1}(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} \left( (\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1) \right)^k$
- Secret (binary) polynomials: $ s_0(x), s_1(x), \dots, s_{k-1}(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} (\mathbb{B}_N[x])^k$
- Error: $ e(x) \overset{{\scriptscriptstyle\$}}{\leftarrow} \chi_e$
- RLWE instance: $ b(x) = \sum_{i=0}^{k-1} a_i(x) \cdot s_i(x) + e(x) \in (\mathbb{Z}/q\mathbb{Z})[x] / (x^N + 1)$

We would like to express this in the form of an LWE encryption. We can start with the simple case, where $ k=1 $. Therefore, we will only be working with the index-zero polynomials, $a_0(x)$ and $s_0(x)$. (For simplicity, in the next example you can ignore the zero subscript and think of them as $a(x)$ and $s(x)$.)

## Naive reduction for $k=1$ (wrong!)

**Naively**, if we simply defined the LWE $A$ matrix to be a concatenation of the coefficients of the RLWE polynomial $a(x)$, we get:

$$ A_{\text{LWE}} = ( a_{0, 0}, a_{0, 1}, \dots a_{0, N-1} ) $$

We can do the same for the LWE $s$ vector:

$$ s_{\text{LWE}} = ( s_{0, 0}, s_{0, 1}, \dots s_{0, N-1} ) $$

But this doesn’t give us the value of $b_{\text{LWE}}$ for the LWE encryption that we want. In particular, the first entry of $b_{\text{LWE}}$, which we can call $b_{\text{LWE}, 0}$, is simply a product of the first entries of $a_0(x)$ and $s_0(x)$:

$$ b_{\text{LWE}, 0} = a_{0, 0} \cdot s_{0, 0} + e_0 $$

However, we **want** $b_{\text{LWE}, 0}$ to be a sum of the products of all the coefficients of $a_0(x)$ and $s_0(x)$ that give us a zero-degree coefficient mod $x^N + 1$. This modulus is important because it causes the product of high-degree monomials to “wrap around” to smaller degree monomials because of the negacyclic property, such that $x^N \equiv -1 \mod x^N + 1$. So the constant term $b_{\text{LWE}, 0}$ should include all of the following terms:

$$\begin{aligned}
b_{\text{LWE}, 0} = & a_{0, 0} \cdot s_{0, 0} \\
- & a_{0, 1} \cdot s_{0, N-1} \\
- & a_{0, 2} \cdot s_{0, N-2} \\
- & \dots \\
- & a_{0, N-1} \cdot s_{0, 1} \\
+ & e_0
\end{aligned}$$

## Improved reduction for $k=1$

We can achieve the desired value of $b_{\text{LWE}}$ by more strategically forming a matrix $A_{\text{LWE}}$, to reflect the negacyclic property of our polynomials in the RLWE space. We can keep the naive construction for $s_\text{LWE}$.

$$ A_{\text{LWE}} =

\begin{pmatrix}

a_{0, 0} & -a_{0, N-1} & -a_{0, N-2} & \dots & -a_{0, 1}\\

a_{0, 1} & a_{0, 0} & -a_{0, N-1} & \dots & -a_{0, 2}\\

\vdots & \ddots & & & \vdots \\

a_{0, N-1} & \dots & & & a_{0, 0} \\

\end{pmatrix}

$$

This definition of $A_\text{LWE}$ gives us the desired value for $b_\text{LWE}$, when $b_{\text{LWE}}$ is interpreted as the coefficients of a polynomial. As an example, we can write out the first entry of $b_\text{LWE}$:

$$\begin{aligned}
b_{\text{LWE}, 0} = & \sum_{i=0}^{N-1} A_{\text{LWE}, 0, i} \cdot s_{0, i} + e_0 \\
b_{\text{LWE}, 0} = & a_{0, 0} \cdot s_{0, 0} \\
- & a_{0, 1} \cdot s_{0, N-1} \\
- & a_{0, 2} \cdot s_{0, N-2} \\
- & \dots \\
- & a_{0, N-1} \cdot s_{0, 1} \\
+ & e_0
\end{aligned}$$

## Generalizing for all $k$

In the generalized $k$ case, we have the RLWE equation:

$$ b(x) = a_0(x) \cdot s_0(x) + a_1(x) \cdot s_1(x) + \dots + a_{k-1}(x) \cdot s_{k-1}(x) + e(x) $$

We can construct the LWE elements as follows:

$$A_{\text{LWE}} =

\left ( \begin{array}{c|c|c|c}

A_{0, \text{LWE}} & A_{1, \text{LWE}} & \dots & A_{k-1, \text{LWE}} \end{array}

\right )

$$

where each sub-matrix is the construction from the previous section:

$$ A_{i, \text{LWE}} =

\begin{pmatrix}

a_{i, 0} & -a_{i, N-1} & -a_{i, N-2} & \dots & -a_{i, 1}\\

a_{i, 1} & a_{i, 0} & -a_{i, N-1} & \dots & -a_{i, 2}\\

\vdots & \ddots & & & \vdots \\

a_{i, N-1} & \dots & & & a_{i, 0} \\

\end{pmatrix}

$$

And the secret keys are stacked similarly:

$$ s_{\text{LWE}} = ( s_{0, 0}, s_{0, 1}, \dots s_{0, N-1} \mid s_{1, 0}, s_{1, 1}, \dots s_{1, N-1} \mid \dots ) $$

This is how we can reduce an RLWE instance with RLWE dimension $k$ and polynomial modulus degree $N$, to a relation that **looks like** an LWE instance of LWE dimension $N * k$.

## Caveats and open research

This reduction does not result in a correctly formed LWE instance, since an LWE instance would have a matrix $A$ that is randomly sampled, whereas the reduction results in a matrix $A$ that has cyclic structure, due to the cyclic property of the RLWE instance. This is why I’ve been emphasizing that the reduction produces an instance that *looks like* LWE. All currently known attacks on RLWE do not take advantage of this structure, but rather directly attack the transformed LWE instance. Whether the additional ring structure can be exploited in the design of more efficient attacks remains an open question in the lattice cryptography research community.

In her PhD thesis, Rachel Player mentions the RLWE to LWE security reduction:

In order to try to pick parameters in Ring-LWE-based schemes (FHE or otherwise) that we hope are sufficiently secure, we can choose parameters such that the underlying Ring-LWE instance should be hard to solve according to known attacks. Each Ring-LWE sample can be used to extract $n$ LWE samples. To the best of our knowledge, the most powerful attacks against $d$-sample Ring-LWE all work by instead attacking the $nd$-sample LWE problem. When estimating the security of a particular set of Ring-LWE parameters we therefore estimate the security of the induced set of LWE parameters.

This indicates that we can do this reduction for certain RLWE instances. However, we must be careful to ensure that the polynomial modulus degree $N$ is a power of two, because otherwise the error distribution “breaks”, as my colleague Baiyu Li explained to me in conversation:

The RLWE problem is typically defined using the ring of integers of the cyclotomic field $\mathbb{Q}[X]/(f(X))$, where $f(X)$ is a cyclotomic polynomial of degree $k=\phi(N)$ (where $\phi$ is Euler’s totient function), and the error is a spherical Gaussian over the image of the canonical embedding into the complex numbers $\mathbb{C}^k$ (basically the images of primitive roots of unity under $f$). In many cases we set $N$ to be a power of 2, thus $f(X)=X^{N/2}+1$, since the canonical embedding for such $N$ has a nice property that the preimage of the spherical Gaussian error is also a spherical Gaussian over the coefficients of polynomials in $\mathbb{Q}[X]/(f(X))$. So in this case we can sample $k=N/2$ independent Gaussian numbers and use them as the coefficients of the error polynomial $e(x)$. For $N$ not a power of 2, $f(X)$ may have some low degree terms, and in order to get the spherical Gaussian with the same variance $s^2$ in the canonical embedding, we probably need to use a larger variance when sampling the error polynomial coefficients.

The RLWE we frequently use in practice is actually a specialized version called “polynomial LWE”, and instantiated with $N$ = power of 2 and so $f(X)=X^{N/2}+1$. For other parameters the two are not exactly the same. This paper has some explanations: https://eprint.iacr.org/2018/170.pdf

The error distribution “breaks” if $N$ is not a power of 2 due to the fact that the precise form of RLWE is not defined on integer polynomial rings $R = \mathbb{Z}[X]/(f(X))$, but is defined on its dual (or the dual in the underlying number field, which is a fractional ideal of $\mathbb{Q}[X]/(f(x))$), and the noise distribution is on the Minkowski embedding of this dual ring. For non-power-of-2 $N$, the product mod $f$ of two small polynomials in $\mathbb{Q}[X]/(f(x))$ may be large, where small/large means their L2 norm on the coefficient vector. This means that in order to sample the required noise distribution, you may need a skewed coefficient distribution. Only when $N$ is a power of 2 is the dual of $R$ a scaling of $R$, and distance in the embedding of $R^{\text{dual}}$ is preserved in $R$, so we can just sample i.i.d. Gaussian coefficients to get the required noise.

Because working with a power-of-two RLWE polynomial modulus gives “nice” error behavior, this parameter choice is often recommended and chosen for concrete instantiations of RLWE. For example, the Homomorphic Encryption Standard recommends and only analyzes the security of parameters for power-of-two cyclotomic fields for use in homomorphic encryption (though future versions of the standard aim to extend the security analysis to generic cyclotomic rings):

We stress that when the error is chosen from sufficiently wide and “well spread” distributions that match the ring at hand, we do not have meaningful attacks on RLWE that are better than LWE attacks, regardless of the ring. For power-of-two cyclotomics, it is sufficient to sample the noise in the polynomial basis, namely choosing the coefficients of the error polynomial $e \in \mathbb{Z}[x] / \phi_k(x)$ independently at random from a very “narrow” distribution.

Existing works analyzing and targeting the ring structure of RLWE include:

It would of course be great to have a definitive answer on whether we can be confident using this RLWE to LWE reduction to estimate the security of RLWE based schemes. In the meantime, we have seen many Fully Homomorphic Encryption (FHE) schemes using this RLWE to LWE reduction, and we hope that this article helps explain how that reduction works and the existing open questions around this approach.