June 17, 2025
For Phantom VM, we needed to design an efficient fully homomorphic encryption (FHE)-based random-access memory (RAM). Today, we are releasing its implementation, built using our FHE library Poulpy, available here.
An FHE-based RAM allows a user to store their RAM encrypted on a server. The user can then read from or write to any memory address while keeping the content of the RAM, the read/write address, and the write value private. The server only learns the total size of the RAM and that the user made a read/write call, nothing else.
With a 1MB RAM, our construction achieves a read latency of 302 ms and a write latency of 871 ms when running on a single core of an Intel i9-12900K processor. The system is also embarrassingly parallelizable, which can significantly reduce runtimes in practice.
In the following sections, we provide an overview of our FHE-RAM design, along with instructions for using it. We also present performance benchmarks for various RAM sizes and discuss the factors that influence performance.
Note: In this section we use terms like GLWE, GGSW, and automorphisms. If you are new to these concepts, you can think of:
- a GLWE ciphertext as an encryption of a polynomial \(\sum_{i=0}^{n-1} m_{i}X^{i}\),
- a GGSW ciphertext as an encryption of a polynomial that enables the homomorphic product GLWE x GGSW -> GLWE. In our use case we need GGSW ciphertexts to encrypt monomials (i.e. \(X^{\pm i}\)),
- an automorphism as a homomorphic permutation over the message space of a GLWE or GGSW ciphertext which applies the map \(X^{i} \rightarrow X^{ki} \mod (X^{n}+1)\).
- a trace operation as homomorphically zeroing all coefficients except the constant one. For example, the trace maps \(\sum_{i=0}^{n-1} m_{i}X^{i}\) to \(m_{0}\).
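To make these analogies concrete, here is a plaintext sketch (plain Rust, no encryption; the helper names are ours, not Poulpy's) of the two operations the RAM relies on most: multiplying by a monomial \(X^{-i}\) modulo \(X^{n}+1\), and the trace's effect of zeroing every coefficient except the constant one.

```rust
/// Multiply a plaintext polynomial (length-n coefficient vector) by X^{-i}
/// modulo X^n + 1. Coefficients that wrap around pick up a sign flip
/// because X^n = -1 (the rotation is negacyclic).
fn rotate_neg(coeffs: &[i64], i: usize) -> Vec<i64> {
    let n = coeffs.len();
    let mut out = vec![0i64; n];
    for (j, &c) in coeffs.iter().enumerate() {
        if j >= i {
            out[j - i] = c; // X^j * X^{-i} = X^{j-i}
        } else {
            out[j + n - i] = -c; // wraps around: X^{j-i+n} with a sign flip
        }
    }
    out
}

/// Plaintext effect of the trace: zero all coefficients except the constant one.
fn trace(coeffs: &[i64]) -> Vec<i64> {
    let mut out = vec![0i64; coeffs.len()];
    out[0] = coeffs[0];
    out
}

fn main() {
    let m: Vec<i64> = (0..8).collect(); // message m_j = j, with n = 8
    let r = rotate_neg(&m, 3); // brings m_3 into the constant slot
    println!("{:?}", r); // [3, 4, 5, 6, 7, 0, -1, -2]
    println!("{:?}", trace(&r)); // [3, 0, 0, 0, 0, 0, 0, 0]
}
```

On encrypted data these become a GLWE x GGSW product and a homomorphic trace, but the effect on the underlying message is exactly the one shown here.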
In the setup phase, the client and server first retrieve the public parameters. These include the cryptographic parameters as well as some information about the RAM instance, such as its maximum supported address.
pub struct Parameters {
    module: Module<FFT64>, // FFT/NTT tables.
    basek: usize,          // Torus 2^{-k} decomposition.
    rank: usize,           // GLWE/GGLWE/GGSW rank.
    k_pt: usize,           // Ram plaintext (GLWE) Torus precision.
    k_ct: usize,           // Ram ciphertext (GLWE) Torus precision.
    k_addr: usize,         // Ram address (GGSW) Torus precision.
    k_evk: usize,          // Ram evaluation keys (GGLWE) Torus precision.
    xs: f64,               // Secret-key distribution.
    xe: f64,               // Noise standard deviation.
    max_addr: usize,       // Maximum supported address.
    decomp_n: Vec<u8>,     // Digit decomposition of N.
    word_size: usize,      // Digit decomposition of a Ram word.
}
An instantiation of the default parameters can be obtained with:
let params = Parameters::new();
Next, the client generates a key pair using the following method:
pub fn gen_keys(params: &Parameters) -> (GLWESecret<Vec<u8>, FFT64>, EvaluationKeys)
The GLWESecret<Vec<u8>, FFT64> struct represents the client’s symmetric encryption key for GLWE (General Learning With Errors). The EvaluationKeys struct is defined as:
pub struct EvaluationKeys {
pub(crate) auto_keys: HashMap<i64, AutomorphismKey<Vec<u8>, FFT64>>,
pub(crate) tensor_key: TensorKey<Vec<u8>, FFT64>,
}
It contains a set of public evaluation keys, which are sent to the machine operating the FHE-RAM. These keys are required for some of the homomorphic operations described later.
Finally, the client chooses some initial data to encrypt into the RAM. Before doing so, let’s take a look at how the RAM is structured.
pub struct Ram {
pub(crate) params: Parameters, // Global public parameters
pub(crate) subrams: Vec<SubRam>, // One SubRam for each word chunk
pub(crate) scratch: ScratchOwned, // Scratch space
}
The Ram contains the public parameters, a vector of SubRam, and some pre-allocated scratch space.
Our design allows words to be split into chunks (for example, a u32 into 4 × u8 or 8 × u4). This is useful if homomorphic operations will be applied to the values read from or written to the RAM. Each chunk is encrypted in its own SubRam.
pub(crate) struct SubRam {
    data: Vec<GLWECiphertext<Vec<u8>>>,      // Actual encrypted data
    tree: Vec<Vec<GLWECiphertext<Vec<u8>>>>, // Temporary tree used by the read step
    packer: StreamPacker,                    // GLWE packing procedure
    state: bool,                             // Whether it is ready to be written to
}
Each SubRam stores a vector of GLWE ciphertexts, where each ciphertext encrypts the respective chunk of the value at the RAM address. The tree holds a temporary structure populated during a read call to prepare for a subsequent write. The state boolean indicates whether the SubRam is ready for a write. The packer provides a memory-efficient way to merge several GLWE ciphertexts.
The client instantiates the RAM by calling:
pub fn encrypt_sk(&mut self, data: &[u8], sk: &GLWESecret<Vec<u8>, FFT64>);
with their byte buffer data. The encrypt_sk method encrypts the client’s data into the RAM.
For example, when the number of chunks is set to 4 (i.e., 4 × u8), and the client’s byte buffer is:
[x00, x01, x02, x03, x10, x11, x12, x13, x20, x21, x22, x23, ...]
then the RAM will contain 4 SubRams as follows:
SubRam[0] <- Vec<GLWE([x00, x10, x20, ...]), GLWE(...), ...>
SubRam[1] <- Vec<GLWE([x01, x11, x21, ...]), GLWE(...), ...>
SubRam[2] <- Vec<GLWE([x02, x12, x22, ...]), GLWE(...), ...>
SubRam[3] <- Vec<GLWE([x03, x13, x23, ...]), GLWE(...), ...>
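This chunk layout can be sketched in a few lines of plain Rust (an illustration only; split_chunks is our own helper, not a library function):

```rust
/// Split a byte buffer into `word_size` interleaved chunk streams, one per
/// SubRam: stream k receives bytes k, k + word_size, k + 2 * word_size, ...
fn split_chunks(data: &[u8], word_size: usize) -> Vec<Vec<u8>> {
    let mut chunks = vec![Vec::new(); word_size];
    for (i, &b) in data.iter().enumerate() {
        chunks[i % word_size].push(b);
    }
    chunks
}

fn main() {
    // Two 4-byte words: [x00, x01, x02, x03] and [x10, x11, x12, x13].
    let data = [0x00u8, 0x01, 0x02, 0x03, 0x10, 0x11, 0x12, 0x13];
    for (k, chunk) in split_chunks(&data, 4).iter().enumerate() {
        println!("SubRam[{}] <- {:02x?}", k, chunk);
    }
}
```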
An address is an encryption of a monomial \(X^{-i} \equiv -X^{n-i} \mod (X^{n}+1)\), where \(0 \leq i < \text{maxaddr}\) for some power of two \(n\) (the GLWE degree). If the maximum supported address exceeds \(n\), \(X^{-i}\) cannot be directly encrypted. Instead, it is decomposed in base \(n\) as:
\[ X^{-i} = \prod_{j=0}^{\log_n(\text{maxaddr})-1} X^{-d_j} \quad \text{with} \quad |d_j| < n \]
pub struct Address {
    coordinates: Vec<Coordinate<Vec<u8>>>, // Base n decomposition of the index
    k: usize,                              // Torus precision
    rows: usize,                           // Digit decomposition
    base2d: Base2D,                        // Full address base
}
pub(crate) struct Base2D(pub Vec<Base1D>);
pub(crate) struct Base1D(pub Vec<u8>);
Each Coordinate stores the \(j\)-th digit of \(i\) in base \(n\) (i.e. GGSW(\(X^{-d_j}\))).
pub(crate) struct Coordinate<D> {
pub(crate) value: Vec<GGSWCiphertext<D, FFT64>>,
pub(crate) base1d: Base1D,
}
A Coordinate can store more than a single GGSW ciphertext because we allow a second level of decomposition over the base-\(n\) digits of the index \(i\). This lets the address digits have small values that do not depend on \(n\), which makes it easier to compute the address homomorphically when the FHE-RAM is composed with other protocols. However, the second decomposition is not strictly needed for basic FHE-RAM functionality.
Let \(i=237\) and \(n=2^{6}\); then the first digit decomposition of \(i\) (in base \(n\)) is \(\vec{d} = [45, 3]\). Indeed, \(45 + 3\cdot 2^{6} = 237\). Now let's assume that we want to bound the digits of \(\vec{d}\) by \(B < n\) for some \(B\) with \(B \mid n\) (since \(n\) is a power of two, this implies that \(B\) is too), for example \(B=2^{3}\). We apply a second decomposition in base \(B\) to \(\vec{d}\) and obtain \(\vec{d'} = [[5, 5], [3, 0]]\); indeed \(5 + 5\cdot 2^{3} = 45\) and \(3 + 0\cdot 2^{3} = 3\).
As such, the address \(i\) with decomposition \([[5, 5], [3, 0]]\) will contain
\[ [[\textsf{GGSW}(X^{-5}), \textsf{GGSW}(X^{-5\cdot 2^{3}})], [\textsf{GGSW}(X^{-3}), \textsf{GGSW}(X^{-0\cdot 2^{3}})]] \]
(values are negated because we want to rotate to the left)
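The two-level decomposition can be reproduced with a small plain-Rust sketch (decompose is our own helper, not part of the library):

```rust
/// Decompose `i` into exactly `len` digits d_j < base (little-endian),
/// so that i = sum_j d_j * base^j.
fn decompose(mut i: u32, base: u32, len: usize) -> Vec<u32> {
    let mut digits = Vec::with_capacity(len);
    for _ in 0..len {
        digits.push(i % base);
        i /= base;
    }
    digits
}

fn main() {
    let (n, b) = (64u32, 8u32); // n = 2^6, B = 2^3
    // First level: 237 = 45 + 3 * 64 -> [45, 3].
    let d = decompose(237, n, 2);
    // Second level: each base-n digit re-decomposed into two base-B digits.
    let d2: Vec<Vec<u32>> = d.iter().map(|&x| decompose(x, b, 2)).collect();
    println!("{:?}", d);  // [45, 3]
    println!("{:?}", d2); // [[5, 5], [3, 0]]
}
```

Each inner digit then becomes one GGSW ciphertext of the corresponding (negated) monomial.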
let idx: u32 = source.next_u32() % params.max_addr() as u32;
let mut addr: Address = Address::alloc(&params);
addr.encrypt_sk(&params, idx, &sk);
Upon receiving a read request with an encrypted address
from the client, the server executes the read operation via:
pub fn read(
    &mut self,
    address: &Address,
    keys: &EvaluationKeys,
) -> Vec<GLWECiphertext<Vec<u8>>>;
let ct: Vec<GLWECiphertext<Vec<u8>>> = ram.read(&address, &keys);
The read method returns a Vec<GLWECiphertext<Vec<u8>>>, i.e. one GLWE ciphertext per SubRam. Each GLWE ciphertext encrypts the corresponding chunk of the value at the requested address in its constant coefficient.
At a high level, the read operation evaluates this homomorphic circuit:
Pack(... Pack(Pack(Vec<GLWE> x Address[0]) x Address[1]) x Address[2]) ...) x Address[last]

- Vec<GLWE> x Address[i]: rotates the coefficients of each GLWE by the \(i\)-th coordinate.
- Pack(Vec<GLWE>): extracts the constant coefficient of each GLWE and packs them into a new GLWE, reducing the number of GLWE ciphertexts by a factor of \(n\) at each step.

This process continues until one GLWE remains. That GLWE is then rotated by Address[last], and its constant coefficient is extracted and returned.
The read operation can also be visualized as a tree-like circuit; this perspective will be useful later. Here is a diagram for a SubRam of 8 GLWE ciphertexts (with PACK reducing by 4 for illustration):
GLWE0 GLWE1 GLWE2 GLWE3 GLWE4 GLWE5 GLWE6 GLWE7
│ │ │ │ │ │ │ │
mul mul mul mul mul mul mul mul <- Address[0]
│ │ │ │ │ │ │ │
GLWE0R GLWE1R GLWE2R GLWE3R GLWE4R GLWE5R GLWE6R GLWE7R
│┌─────┘ │ │ │┌─────┘ │ │
││┌───────────┘ │ ││┌───────────┘ │
│││┌─────────────────┘ │││┌─────────────────┘
PACK PACK
│ │
GLWE8 GLWE9
│ │
mul <- Address[1] mul <- Address[1]
│ │
GLWE8R GLWE9R
│ │
│┌──────────────────────────┘
PACK
│
GLWE10
│
mul <- Address[2]
│
GLWE10R
Let the data of the RAM be \(m_{0}, ..., m_{511}\), encrypted in batches of \(n=2^{6}\) coefficients per GLWE ciphertext, i.e. there are \(512 / n = 8\) GLWE ciphertexts:
\[ [\textsf{GLWE}(\sum_{i=0}^{n-1} m_{i}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{n+i}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{2n+i}X^{i}), ...] \]
Let's also take the address from the previous toy example. The address encrypts \(i=237\) decomposed with basis \([[2^{3}, 2^{3}], [2^{3}, 2^{3}]]\):
\[ [[\textsf{GGSW}(X^{-5}), \textsf{GGSW}(X^{-5\cdot 2^{3}})], [\textsf{GGSW}(X^{-3}), \textsf{GGSW}(X^{-0\cdot 2^{3}})]] \]
The read procedure proceeds as follows. Multiplying by the GGSW ciphertexts of the first coordinate rotates each GLWE to the left by \(45\):
\[ [\textsf{GLWE}(\sum_{i=0}^{n-1} m_{i+45}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{n+i+45}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{2n+i+45}X^{i}), ...] \]
Packing the constant coefficients then collapses the \(8\) ciphertexts into a single one:
\[ [\textsf{GLWE}(\sum_{i=0}^{7} m_{in+45}X^{i})] \]
Finally, multiplying by the GGSW ciphertexts of the second coordinate rotates it by \(3\):
\[ [\textsf{GLWE}(\sum_{i=0}^{7} m_{(i+3)n+45}X^{i})] \]
The constant coefficient of the result is \(m_{3n+45} = m_{237}\), as desired.
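The same walk can be checked on plaintext data. The following plain-Rust simulation (no encryption; all helper names are ours) applies the rotate-then-pack circuit to \(m_{0}, ..., m_{511}\) and recovers \(m_{237}\):

```rust
/// Constant coefficient of p(X) * X^{-i} mod X^n + 1: for 0 <= i < n this
/// is simply coefficient i of p (no sign flip reaches the constant slot).
fn rotated_const_coeff(block: &[u64], i: usize) -> u64 {
    block[i]
}

/// One rotate-then-pack layer: rotate every block by X^{-i} and pack the
/// resulting constant coefficients into a single new block.
fn rotate_and_pack(blocks: &[Vec<u64>], i: usize) -> Vec<u64> {
    blocks.iter().map(|b| rotated_const_coeff(b, i)).collect()
}

fn main() {
    let n = 64;
    // RAM contents m_0..m_511 split into 512 / n = 8 blocks, with m_j = j.
    let data: Vec<u64> = (0..512).collect();
    let blocks: Vec<Vec<u64>> = data.chunks(n).map(|c| c.to_vec()).collect();

    // First coordinate: rotate by 45 and pack -> [m_45, m_{n+45}, ...].
    let packed = rotate_and_pack(&blocks, 45);
    // Second coordinate: rotate by 3 and extract the constant coefficient.
    println!("{}", rotated_const_coeff(&packed, 3)); // m_{3n + 45} = m_237 = 237
}
```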
The read_prepare_write call is identical to read, except that it caches the generated tree in memory for use in a subsequent write.
pub fn read_prepare_write(
    &mut self,
    address: &Address,
    keys: &EvaluationKeys,
) -> Vec<GLWECiphertext<Vec<u8>>>;
It is necessary to call read_prepare_write at the target address before invoking write.
pub fn write<DataW: AsRef<[u8]>>(
    &mut self,
    w: &Vec<GLWECiphertext<DataW>>, // Each GLWE must encrypt [wi, 0, 0, ..., 0];
    address: &Address,              // Must be the same as the read_prepare_write
    keys: &EvaluationKeys,
)
ram.write(&ct_w, &addr, &keys);
Writing a value to the FHE-RAM is conceptually the inverse of the read operation — though slightly more complex.
First, assume we have GLWE(w) encrypting the value w in the constant term, along with access to the cached state from read_prepare_write and to the address.
At the root (GLWE10R), which holds the rotated value, compute:
GLWE10R' <- GLWE10R - TRACE(GLWE10R) + GLWE(w)
GLWE10' <- GLWE10R' x AUTO(Address[2], -1)
Next, update the previous layer:
GLWE8R' <- GLWE8R - TRACE(GLWE8R) + TRACE(GLWE10')
GLWE8' <- GLWE8R' x AUTO(Address[1], -1)
GLWE9R' <- GLWE9R - TRACE(GLWE9R) + TRACE(GLWE10' * X)
GLWE9' <- GLWE9R' x AUTO(Address[1], -1)
In general:
a_next[i] <- a_next[i] - TRACE(a_next[i]) + TRACE(a_prev * X^i)
a_next[i] <- a_next[i] x AUTO(Address[j], -1)
The process repeats, walking back the tree, until the original RAM data is updated.
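On plaintext coefficient vectors, one write-back step amounts to replacing the constant coefficient and undoing the read rotation. A minimal plain-Rust sketch (our own helper, assuming the block was rotated left by \(i\) during the read):

```rust
/// Plaintext analogue of one write-back step. `rotated` is a block that was
/// rotated left by `i` during the read, so its constant coefficient holds
/// the old value. block - TRACE(block) + w swaps in the new value `w`;
/// multiplying by X^{+i} then undoes the read rotation (negacyclically).
fn write_back(rotated: &[i64], w: i64, i: usize) -> Vec<i64> {
    let n = rotated.len();
    let mut block = rotated.to_vec();
    block[0] = w; // block - TRACE(block) + w
    let mut out = vec![0i64; n];
    for (j, &c) in block.iter().enumerate() {
        if j + i < n {
            out[j + i] = c; // X^j * X^i = X^{j+i}
        } else {
            out[j + i - n] = -c; // wrap-around negates, since X^n = -1
        }
    }
    out
}

fn main() {
    // n = 8, original block m_j = j; a read rotation by i = 3 produced:
    let rotated = [3i64, 4, 5, 6, 7, 0, -1, -2];
    // Write 99 at index 3 and rotate back:
    println!("{:?}", write_back(&rotated, 99, 3)); // [0, 1, 2, 99, 4, 5, 6, 7]
}
```

Only the coefficient at the target index changes; every other coefficient returns to its original position and sign.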
Let's follow up on the previous example, in which we read from a RAM of 512 elements at index \(i=237\). Assume we now want to write \(w\) at this index.
We have the intermediate states
\[ [[\textsf{GGSW}(X^{-5}), \textsf{GGSW}(X^{-5\cdot 2^{3}})], [\textsf{GGSW}(X^{-3}), \textsf{GGSW}(X^{-0\cdot 2^{3}})]] \]
As per the procedure described above, we first start with the state 4).
See code example here.
The following are off-the-shelf performance results for various RAM sizes, with the entry size fixed to a 32-bit value stored in 4 chunks (i.e. the RAM contains 4 SubRams). Benchmarks were obtained on an i9-12900K using a single core.
The main intention of the table below is to provide a rough estimate of the cost and of how performance scales with \(\#\mathcal{D}\). Precise performance depends heavily on the other factors mentioned later.
| \(\#\mathcal{D}\) | Read | Write |
|---|---|---|
| 1 KB | 6 ms | 7 ms |
| 4 KB | 6 ms | 7 ms |
| 16 KB | 6 ms | 7 ms |
| 64 KB | 38 ms | 64 ms |
| 256 KB | 102 ms | 228 ms |
| 1024 KB | 302 ms | 871 ms |
| 4096 KB | 930 ms | 3522 ms |
Until ~16 KB, the cost remains roughly constant. After that point, the cost grows superlinearly as:
\[ \mathcal{O}(\log_n(\#\mathcal{D}) \cdot \#\mathcal{D}) \]
This cutoff occurs because at larger sizes, \(\mathcal{D}\) no longer fits in a single ciphertext — the number of ciphertexts increases proportionally to \(\#\mathcal{D}\).
Parallelization: Parallelization can be exploited at multiple levels:

- Representing a u32 as (u8, u8, u8, u8) allows parallel processing across chunks.
- Within a SubRam, the homomorphic products at each layer of the read tree are independent of one another and can be evaluated in parallel.

Read/Write Budget: Each read or write operation introduces a small amount of additive noise to \(\mathcal{D}\). There is an upper limit on the number of operations that can be performed before this noise corrupts the encrypted data. A smaller noise budget allows for smaller parameters, and thus better performance.
Cryptographic Parameterization: Parameterizing lattice-based cryptographic schemes is a vast topic and beyond the scope of this post. The key point is that for any given security target, there are multiple parameter choices — each presenting trade-offs between noise growth, ciphertext size, and operational complexity.