Back in May of 2022 I transferred teams at Google to work on Fully Homomorphic Encryption (newsletter announcement). Since then I’ve been working on a variety of projects in the space, including being the primary maintainer on github.com/google/fully-homomorphic-encryption, which is an open source FHE compiler for C++. This article will be an introduction to how to use it to compile programs to FHE, as well as a quick overview of its internals.

If you’d like to contribute to this project, please reach out to me at mathintersectprogramming@gmail.com or at j2kun@mathstodon.xyz. I have a few procedural hurdles to overcome before I can accept external contributions (with appropriate git commit credit), but if there’s enough interest I will make time for it sooner as opposed to later.

Overview

The core idea of fully homomorphic encryption (henceforth FHE) is that you can encrypt data and then run programs on it without ever decrypting it. In the extreme, even if someone had physical access to the machine and could inspect the values of individual memory cells or registers while the program was running, they would not see any of the bits of the underlying data being operated on (without cracking the cryptosystem).

Our FHE compiler converts C++ programs that operate on plaintext to programs that operate on the corresponding FHE ciphertexts (since it emits high-level code that then needs to be further compiled, it could be described as a transpiler). More specifically, it converts a specific subset of valid C++ programs—more on what defines that subset later—to programs that run the same program on encrypted data via one of the supported FHE cryptosystem implementations. In this sense it’s close to a traditional compiler: parse the input, run a variety of optimization passes, and generate some output. However, as we’ll see in this article, the unique properties of FHE make the compiler more like hardware circuit toolchains.

The variety of FHE supported by the compiler today is called “gate bootstrapping.” I won’t have time to go into intense detail about the math behind it, but suffice it to say that this technique gives away performance in exchange for a simpler job of optimizing and producing a working program. What I will say is that this blend of FHE encrypts each bit of its input into a separate ciphertext, and then represents the program as a boolean (combinational) circuit—composed of gates like AND, OR, XNOR, etc. Part of the benefit of the compiler is that it manages a mapping of higher order types like integers, arrays, and structs, to lists of encrypted booleans and back again.

A few limitations result from this circuit-based approach, which will be woven throughout the rest of this tutorial. First is that all loops must be fully unrolled and have statically-known bounds. Second, constructs like pointers, and dynamic memory allocation are not supported. Third, all control flow is multiplexed, meaning that all branches of all if statements are evaluated, and only then is one chosen. Finally, there are important practical considerations related to the bit-width of the types used and the expansion of cleartexts into ciphertexts that impact the performance of the resulting program.

On the other hand, combinational circuit optimization is a well-studied problem with off-the-shelf products that can be integrated (narrator: they did integrate some) into the FHE compiler to make the programs run faster.

Dependencies

tl;dr: check out the dockerfiles.

Google’s internal build system is called blaze, and its open source counterpart (equivalent in all except name) is called bazel. One of the first curious things you’ll notice about the compiler is that bazel is used both to build the project and to use the project (the latter I’d like to change). So you’ll need to install bazel, and an easy way to do that is to install bazelisk, which is the analogue of nvm for Node or pyenv for Python. You won’t need multiple versions of bazel, but this is just the easiest way to install the latest version. I’ll be using Bazel 4.0.0, but there are newer versions that should work just fine as well.

You’ll need a C compiler (I use gcc12) because most of the project’s dependencies are built from source (see next paragraph), and a small number of external libraries and programs to support some of the circuit optimizer plugins. For debian-based systems, this is the full list

apt-get update && apt-get install -y \
  gcc \
  git \
  libtinfo5 \
  python \
  python3 \
  python3-pip \
  autoconf \
  libreadline-dev \
  flex \
  bison \
  wget

As mentioned above, all the other dependencies are built from source, and this will take a while the first time you build the project. So you might as well clone and get that build started while you read. The command below will build the project and all the example binaries, and then cache the intermediate build artifacts for future builds, only recompiling what has changed in the mean time. See the Bazel/Starlark section for more details on what this command is doing. Note: the one weird case is LLVM. If you use an exotic operating system (or a docker container, don’t get me started on why this is an issue) then bazel may choose to build LLVM from scratch, which will take an hour or two for the first build. It may also fail due to a missing dependency of your system, which will be extremely frustrating (this is the #1 complaint in our GitHub issues). But, if you’re on a standard OS/architecture combination (as enumerated here), it will just fetch the right LLVM dependency and install it on your system.

git clone https://github.com/google/fully-homomorphic-encryption.git
cd fully-homomorphic-encryption
bazel build ...:all

A clean build on my home machine takes about 16 minutes.

Two end-to-end examples: add and string_cap

In this section I’ll show two end-to-end examples of using the compiler as an end user. The first will be for a dirt-simple program that adds two 32-bit integers. The second will be for a program that capitalizes the first character of each word in an ASCII string. The examples are already in the repository under transpiler/examples by the names simple_sum and string_cap.

Both of these programs will have the form of compiling a single function that is the entry point for the FHE part of the program, and providing a library and API to integrate it with a larger program.

First simple_sum. Add a header and source file like you would any standard C++ program, but with one extra line to tell the compiler which function is the function that should be compiled (along with any functions called within it).

// add.h
int add(int a, int b);

// add.cc
#include "add.h"

#pragma hls_top
int add(int a, int b) {
  return a + b;
}

The line #pragma hls_top tells the compiler which function is the entry point. Incidentally, hls stands for “high level synthesis,” and the pragma itself comes from the XLS project, which we use as our parser and initial circuit builder. Here ‘top’ just means top level function.

Then, inside a file in the same directory called BUILD (see the Bazel/Starlark section next for an overview of the build system), create a build target that invokes the FHE compiler. In our case we’ll use the OpenFHE backend.

# BUILD
# loads the FHE compiler as an extension to Bazel.
load("//transpiler:fhe.bzl", "fhe_cc_library")

fhe_cc_library(
  name = "add_fhe_lib",
  src = "add.cc",
  hdrs = ["add.h"],
  encryption = "openfhe",  # backend cryptosystem library
  interpreter = True,      # use dynamic thread scheduling
  optimizer = "yosys",     # boolean circuit optimizer
)

The full options for this build rule (i.e., the documentation of the compiler’s main entry point) can be found in the docstring of the bazel macro. I picked the parameters that have what I think of as the best tradeoff between stability and performance.

If you run bazel build add_fhe_lib, then you will see it build but nothing else (see the “intermediate files” section for more on what’s happening behind the scenes). But if you typed something wrong in the build file it would err at this point. It generates a header and cc file that contains the same API as add, but with different types for the arguments and extra arguments needed by the FHE library backend.

Next we need a main routine that uses the library. Since we’re using OpenFHE as our backend, it requires some configuration and the initial encryption of its inputs. The full code, with some slight changes for the blog, looks like this

#include <stdio.h>
#include <iostream>
#include <ostream>

#include "absl/strings/numbers.h"
#include "transpiler/codelab/add/add_fhe_lib.h"
#include "transpiler/data/openfhe_data.h"

constexpr auto kSecurityLevel = lbcrypto::MEDIUM;

int main(int argc, char** argv) {
  if (argc < 3) {
    fprintf(stderr, "Usage: add_main [int] [int]\n\n");
    return 1;
  }

  int x, y;
  if(!absl::SimpleAtoi(argv[1], &x)) {
    std::cout << "Bad int " << argv[1] << std::endl;
    return 1;
  }
  if(!absl::SimpleAtoi(argv[2], &y)) {
    std::cout << "Bad int " << argv[2] << std::endl;
    return 1;
  }
  std::cout << "Computing " << x << " + " << y << std::endl;

  // Set up backend context and encryption keys.
  auto context = lbcrypto::BinFHEContext();
  context.GenerateBinFHEContext(kSecurityLevel);
  auto sk = context.KeyGen();
  context.BTKeyGen(sk);

  OpenFhe<int> ciphertext_x = OpenFhe<int>::Encrypt(x, context, sk);
  OpenFhe<int> ciphertext_y = OpenFhe<int>::Encrypt(y, context, sk);
  OpenFhe<int> result(context);
  auto status = add(result, ciphertext_x, ciphertext_y, context);
  if(!status.ok()) {
    std::cout << "FHE computation failed: " << status << std::endl;
    return 1;
  }

  std::cout << "Result: " << result.Decrypt(sk) << "\n";
  return 0;
}

The parts that are not obvious boilerplate include:

Configuring the security level of the OpenFHE library (which is called BinFHE to signal it’s doing binary circuit FHE).

constexpr auto kSecurityLevel = lbcrypto::MEDIUM;

Setting up the initial OpenFHE secret key

 auto context = lbcrypto::BinFHEContext();
 context.GenerateBinFHEContext(kSecurityLevel);
 auto sk = context.KeyGen();
 context.BTKeyGen(sk);

Encrypting the inputs. This uses an API provided by the compiler (though because the project was a research prototype, I think the original authors never got around to unifying the “set up the secret key” part behind an API) and included in this from include "transpiler/data/openfhe_data.h"

 OpenFhe<int> ciphertext_x = OpenFhe<int>::Encrypt(x, context, sk);
 OpenFhe<int> ciphertext_y = OpenFhe<int>::Encrypt(y, context, sk);

Then calling the FHE-enabled add function, and decrypting the results.

Then create another BUILD rule for the binary:

cc_binary(
    name = "add_openfhe_fhe_demo",
    srcs = [
        "add_openfhe_fhe_demo.cc",
    ],
    deps = [
        ":add_fhe_lib",
        "//transpiler/data:openfhe_data",
        "@com_google_absl//absl/strings",
        "@openfhe//:binfhe",
    ],
)

Running it with bazel:

$ bazel run add_openfhe_fhe_demo -- 5 7
Computing 5 + 7
Result: 12

Timing this on my system, it takes a little less than 7 seconds.

On to a more complicated example: string_cap, which will showcase loops and arrays. This was slightly simplified from the GitHub example. First the header and source files:

// string_cap.h
#define MAX_LENGTH 32
void CapitalizeString(char my_string[MAX_LENGTH]);

// string_cap.cc
#include "string_cap.h"

#pragma hls_top
void CapitalizeString(char my_string[MAX_LENGTH]) {
  bool last_was_space = true;
#pragma hls_unroll yes
  for (int i = 0; i < MAX_LENGTH; i++) {
    char c = my_string[i];
    if (last_was_space && c >= 'a' && c <= 'z') {
      my_string[i] = c - ('a' - 'A');
    }
    last_was_space = (c == ' ');
  }
}

Now there’s a bit to discuss. First, the string has a static length known at compile time. This is required because the FHE program is a boolean circuit. It defines wires for each of the inputs, and it must know how many wires to define. In this case it will be a circuit with 32 * 8 wires, one for each bit of each character in the array.

The second new thing is the #pragma hsl_unroll yes, which, like hls_top, tells the XLS compiler to fully unroll that loop. Because the FHE program is a static circuit, it cannot have any loops. XLS unrolls our loops for us, and incidentally, I learned recently that it uses the Z3 solver to first prove the loops can be unrolled (which can lead to some slow compile times for complex programs). I’m not aware of other compilers that do this proving part. It looks like LLVM’s loop unroller just slingshots its CPU cycles into the sun if it’s asked to fully unroll an infinite loop.

The main routine is similar as before:

#include <array>
#include <iostream>
#include <string>

#include "openfhe/binfhe/binfhecontext.h"
#include "transpiler/data/openfhe_data.h"
#include "transpiler/examples/string_cap/string_cap.h"
#include "transpiler/examples/string_cap/string_cap_openfhe_yosys_interpreted.h"

int main(int argc, char** argv) {
  if (argc < 2) {
    fprintf(stderr, "Usage: string_cap_openfhe_testbench string_input\n\n");
    return 1;
  }

  std::string input = argv[1];
  input.resize(MAX_LENGTH, '\0');
  std::string plaintext(input);

  auto cc = lbcrypto::BinFHEContext();
  cc.GenerateBinFHEContext(lbcrypto::MEDIUM);
  auto sk = cc.KeyGen();
  cc.BTKeyGen(sk);

  auto ciphertext = OpenFheArray<char>::Encrypt(plaintext, cc, sk);
  auto status = CapitalizeString(ciphertext, cc);
  if (!status.ok()) {
    std::cout << "FHE computation failed " << status << std::endl;
    return 1;
  };
  std::cout << "Decrypted result: " << ciphertext.Decrypt(sk) << std::endl;
}

The key differences are:

And now omitting the binary’s build rule and running it, we get

$ bazel run string_cap_openfhe_yosys_interpreted_testbench -- 'hello there'
Decrypted result: Hello There

Interestingly, this also takes about 6 seconds to run on my machine (same as the “add 32-bit integers” program). It would be the same runtime for a longer string, up to 32 characters, since, of course, the program processes all MAX_LENGTH characters without knowing if they are null bytes.

An overview of Bazel and Starlark

The FHE compiler originated within Google in a curious way. It was created by dozens of volunteer contributors (20%-ers, as they say), many of whom worked on the XLS hardware synthesis toolchain, which is a core component of the compiler. Because of these constraints, and also because it was happening entirely in Google, there wasn’t much bandwidth available to make the compiler independent of Google’s internal build tooling.

This brings us to Bazel and Starlark, which is the user-facing façade of the compiler today. Bazel is the open source analogue of Google’s internal build system (“Blaze” is the internal tool), and Starlark is its Python-inspired scripting language. There are lots of opinions about Bazel that I won’t repeat here. Instead I will give a minimal overview of how it works with regards to the FHE compiler.

First some terminology. To work with Bazel you do the following.

Generally, bazel builds targets in two phases. First—the analysis phase—it loads all the BUILD files and imported .bzl files, and scans for all the rules that were called. In particular, it runs the macros, because it needs to know what rules are called by the macros (and rules can be guarded by control flow, or their arguments can be generated dynamically, etc.). But it doesn’t run the build rules themselves. In doing this, it can build a complete graph of dependencies, and report errors about typos, missing dependencies, cycles, etc. Once the analysis phase is complete, it runs the underlying rules in dependency order, and caches the results. Bazel will only run a rule again if something changes with the files it depends on or its underlying dependencies.

The FHE compiler is written in Starlark, in the sense that the main entrypoint for the compiler is the Bazel macro fhe_cc_library. This macro chains together a bunch of rules that call the parser, circuit optimizer, and codegen steps, each one being its own Bazel rule. Each of these rules in turn declare/write files that we can inspect—see the next section.

Here’s what fhe_cc_library looks like (a subset of the control flow for brevity)

def fhe_cc_library(name, src, hdrs, copts = [], num_opt_passes = 1,
        encryption = "openfhe", optimizer = "xls", interpreter = False, library_name = None,
        **kwargs):
    """A rule for building FHE-based cc_libraries. [docstring ommitted]"""
    transpiled_xlscc_files = "{}.cc_to_xls_ir".format(name)
    library_name = library_name or name
    cc_to_xls_ir(
        name = transpiled_xlscc_files,
        library_name = library_name,
        src = src,
        hdrs = hdrs,
        defines = kwargs.get("defines", None),
    )

    # below, adding a leading colon to the `src` argument points the source files attribute
    # to the files generated by a previously generated rule, with the name being the unique
    # identifier.
    transpiled_structs_headers = "{}.xls_cc_transpiled_structs".format(name)
    xls_cc_transpiled_structs(
        name = transpiled_structs_headers,
        src = ":" + transpiled_xlscc_files,
        encryption = encryption,
    )

    if optimizer == "yosys":  # other branch omitted for brevity
        verilog = "{}.verilog".format(name)
        xls_ir_to_verilog(name = verilog, src = ":" + transpiled_xlscc_files)
        netlist = "{}.netlist".format(name)
        verilog_to_netlist(name = netlist, src = ":" + verilog, encryption = encryption)
        cc_fhe_netlist_library(
            name = name,
            src = ":" + netlist,
            encryption = encryption,
            interpreter = interpreter,
            transpiled_structs = ":" + transpiled_structs_headers,
            copts = copts,
            **kwargs
        )

The rules invoked by the macro include:

All of this results in a C++ library (generated by the last step) that can be linked against an existing program and whose generated source we can inspect. Now let’s see what each generated file looks like.

The intermediate files generated by the compiler

Earlier I mentioned that bazel puts the intermediate files generated by each build rule into a “hermetic” location on the filesystem. That location is sym-linked from the workspace root by a link called bazel-bin.

$ ls -al . | grep bazel-bin
/home/j2kun/.cache/bazel/_bazel_j2kun/42987a3d4769c6105b2fa57d2291edc3/execroot/com_google_fully_homomorphic_encryption/bazel-out/k8-opt/bin

Within bazel-bin there’s a mirror of the project’s source tree, and in the directory for a build rule you can find all the generated files. For our 32-bit adder here’s what it looks like:

$ ls
_objs                                   add_test
add_fhe_lib.cc                          add_test-2.params
add_fhe_lib.entry                       add_test.runfiles
add_fhe_lib.generic.types.h             add_test.runfiles_manifest
add_fhe_lib.h                           libadd.a
add_fhe_lib.ir                          libadd.a-2.params
add_fhe_lib.netlist.v                   libadd.pic.a
add_fhe_lib.netlist.v.dot               libadd.pic.a-2.params
add_fhe_lib.opt.ir                      libadd.so
add_fhe_lib.types.h                     libadd.so-2.params
add_fhe_lib.v                           libadd_fhe_lib.a
add_fhe_lib.ys                          libadd_fhe_lib.a-2.params
add_fhe_lib_meta.proto                  libadd_fhe_lib.pic.a
add_openfhe_fhe_demo                    libadd_fhe_lib.pic.a-2.params
add_openfhe_fhe_demo-2.params           libadd_fhe_lib.so
add_openfhe_fhe_demo.runfiles           libadd_fhe_lib.so-2.params
add_openfhe_fhe_demo.runfiles_manifest

You can see the output .h and .cc files and their compiled .so files (the output build artifacts), but more importantly for us are the internal generated files. This is where we get to actually see the circuits generated.

The first one worth inspecting is add_fhe_lib.opt.ir, which is the output of the xlscc compiler plus an XLS-internal optimization step. This is the main part of how the compiler uses the XLS project: to convert an input program into a circuit. The file looks like:

package my_package

file_number 1 "./transpiler/codelab/add/add.cc"

top fn add(x: bits[32], y: bits[32]) -> bits[32] {
  ret add.3: bits[32] = add(x, y, id=3, pos=[(1,18,25)])
}

As you can see, it’s an XLS-defined internal representation (IR) of the main routine with some extra source code metadata. Because XLS-IR natively supports additions, the result is trivial. One interesting thing to note is that numbers are represented as bit arrays. In short, XLS-IR’s value type system supports only bits, arrays, and tuples, which tuples being the mechanism for supporting structures.

Next, the XLS-IR is converted to Verilog in add_fhe_lib.v, resulting in the (similarly trivial)

module add(
  input wire [31:0] x,
  input wire [31:0] y,
  output wire [31:0] out
);
  wire [31:0] add_6;
  assign add_6 = x + y;
  assign out = add_6;
endmodule

The next step is to run this verilog through Yosys, which is a mature circuit synthesis suite, and for our purposes is encapsulates the two tasks:

XLS can also do this, and if you want to see that you can change the build rule optimizer attribute from yosys to xls. But we’ve found that Yosys routinely produces 2-3x smaller circuits. The script that we give to yosys can be found in fhe_yosys.bzl, which encapsulates the bazel macros and rules related to invoking Yosys. The output for our adder program is:

module add(x, y, out);
  wire _000_;
  wire _001_;
  wire _002_;
  [...]
  wire _131_;
  wire _132_;
  output [31:0] out;
  wire [31:0] out;
  input [31:0] x;
  wire [31:0] x;
  input [31:0] y;
  wire [31:0] y;
  nand2 _133_ (.A(x[12]), .B(y[12]), .Y(_130_));
  xor2 _134_ ( .A(x[12]), .B(y[12]), .Y(_131_));
  nand2 _135_ ( .A(x[11]), .B(y[11]), .Y(_132_));
  or2 _136_ ( .A(x[11]), .B(y[11]), .Y(_000_));
  nand2 _137_ ( .A(x[10]), .B(y[10]), .Y(_001_));
  xor2 _138_ ( .A(x[10]), .B(y[10]), .Y(_002_));
  nand2 _139_ ( .A(x[9]), .B(y[9]), .Y(_003_));
  or2 _140_ ( .A(x[9]), .B(y[9]), .Y(_004_));
  nand2 _141_ ( .A(x[8]), .B(y[8]), .Y(_005_));
  xor2 _142_ ( .A(x[8]), .B(y[8]), .Y(_006_));
  nand2 _143_ ( .A(x[7]), .B(y[7]), .Y(_007_));
  or2 _144_ ( .A(x[7]), .B(y[7]), .Y(_008_));
  [...]
  xor2 _291_ ( .A(_006_), .B(_035_), .Y(out[8]));
  xnor2 _292_ ( .A(x[9]), .B(y[9]), .Y(_128_));
  xnor2 _293_ ( .A(_037_), .B(_128_), .Y(out[9]));
  xor2 _294_ ( .A(_002_), .B(_039_), .Y(out[10]));
  xnor2 _295_ ( .A(x[11]), .B(y[11]), .Y(_129_));
  xnor2 _296_ ( .A(_041_), .B(_129_), .Y(out[11]));
  xor2 _297_ ( .A(_131_), .B(_043_), .Y(out[12]));
endmodule

This produces a circuit with a total of 165 gates.

The codegen step then produces a add_fhe_lib.cc file which loads this circuit into an interpreter which knows to map the operation and2 to the chosen backend cryptosystem library call (see the source for the OpenFHE backend), and uses thread-pool scheduling on CPU to speed up the evaluation of the circuit.

For the string_cap circuit, the opt.ir shows off a bit more of XLS’s IR, including operations for sign extension, array indexing & slicing, and multiplexing (sel) branches. The resulting netlist after optimization is a 684-gate circuit (though many of those are “inverter” or “buffer” gates, which are effectively free for FHE).

The compiler also outputs a .dot file which can be rendered to an SVG (warning, the SVG is ~2.3 MiB). If you browse this circuit, you’ll see it is rather shallow and wide, and this allows the thread-pool scheduler to take advantage of the parallelism in the circuit to make it run fast. Meanwhile, the 32-bit adder, though it has roughly 25% the total number of gates, is a much deeper circuit and hence has less parallelism.

Supported C++ input programs and encryption overhead

This has so far been a tour of the compiler, but if you want to get started using the compiler to write programs, you’ll need to keep a few things in mind.

First, the subset of C++ supported by the compiler is rather small. As mentioned earlier, all data needs to have static sizes. This means, e.g., you can’t write a program that processes arbitrary images. Instead, you have to pick an upper bound on the image size, zero-pad the image appropriately before encrypting it, and then write the program to operate on that image size. In the same vein, the integer types you choose have nontrivial implications on performance. To see this, replace the int type in the 32-bit adder with a char and inspect the resulting circuit.

Similarly, loops need static bounds on their iteration count. Or, more precisely, xlscc needs to be able to fully unwrap every loop—which permits some forms of while loops and recursion that provably terminate. This can cause some problem if the input code has loops with complex exit criteria (i.e., break‘s guarded by if/else). It also requires you to think hard about how you write your loops, though future work will hopefully let the compiler do that thinking for you.

Finally, encrypting each bit of a plaintext message comes with major tax on space usage. Each encryption of a single bit corresponds to a list of roughly 700 32-bit integers. If you want to encrypt a 100×100 pixel greyscale image, each pixel of which is an 8-bit integer (0-255), it will cost you 218 MiB to store all the pixels in memory. It’s roughly a 20,000x overhead. For comparison, the music video for Rick Astley’s “Never Gonna Give You Up” at 360p is about 9 MiB (pretty small for a 3 minute video!), but encrypted in FHE would be 188 GiB, which (generously) corresponds to 20 feature-length films at 1080p. Some other FHE schemes have smaller ciphertext sizes, but at the cost of even larger in-memory requirements to run the computations. So if you want to run programs to operate on video—you can do it, but you will need to distribute the work appropriately, and find useful ways to reduce the data size as much as possible before encrypting it (such as working in lower resolution, greyscale, and a lower frame rate), which will also result in overall faster programs.

Until next time!

[Personal note]: Now that I’m more or less ramped up on the FHE domain, I’m curious to know what aspects of FHE my readers are interested in. Mathematical foundations? More practical demonstrations? Library tutorials? Circuit optimization? Please comment and tell me about what you’re interested in.