June 17, 2025
For Phantom VM, we needed to design an efficient fully homomorphic encryption (FHE)-based random-access memory (RAM). Today, we are releasing its implementation, built using our FHE library Poulpy, available here.
An FHE-based RAM allows a user to store their RAM encrypted on a server. The user can then read from or write to any memory address while keeping the content of the RAM, the read/write address, and the write value private. The server only learns the total size of the RAM and that the user made a read/write call, nothing else.
With a 1MB RAM, our construction achieves a read latency of 302 ms and a write latency of 871 ms when running on a single core of an Intel i9-12900K processor. The system is also embarrassingly parallelizable, which can significantly reduce runtimes in practice.
In the following sections, we provide an overview of our FHE-RAM design, along with instructions for using it. We also present performance benchmarks for various RAM sizes and discuss the factors that influence performance.
Note: In this section we use terms like GLWE, GGSW, and automorphisms. If you are new to these concepts, you can think of:
- a GLWE ciphertext as an encryption of a polynomial \(\sum_{i=0}^{n-1} m_{i}X^{i}\),
- a GGSW ciphertext as an encryption of a polynomial that enables the homomorphic product GLWE x GGSW -> GLWE. In our use case we need GGSW ciphertexts to encrypt monomials (i.e. \(X^{\pm i}\)),
- an automorphism as a homomorphic permutation over the message space of a GLWE or GGSW ciphertext which applies the map \(X^{i} \rightarrow X^{ki} \mod (X^{n}+1)\).
- a trace operation as homomorphically zeroing all coefficients except the constant one. For example, the trace maps \(\sum_{i=0}^{n-1} m_{i}X^{i}\) to \(m_{0}\).
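To make these analogies concrete, here is a plaintext sketch (plain Rust, no encryption; the helper names are ours, not Poulpy's) of the two operations the RAM relies on most: multiplying by a monomial \(X^{-i}\) modulo \(X^{n}+1\), and the trace's effect of zeroing every coefficient except the constant one.

```rust
/// Multiply a plaintext polynomial (length-n coefficient vector) by X^{-i}
/// modulo X^n + 1. Coefficients that wrap around pick up a sign flip
/// because X^n = -1 (the rotation is negacyclic).
fn rotate_neg(coeffs: &[i64], i: usize) -> Vec<i64> {
    let n = coeffs.len();
    let mut out = vec![0i64; n];
    for (j, &c) in coeffs.iter().enumerate() {
        if j >= i {
            out[j - i] = c; // X^j * X^{-i} = X^{j-i}
        } else {
            out[j + n - i] = -c; // wraps around: X^{j-i+n} with a sign flip
        }
    }
    out
}

/// Plaintext effect of the trace: zero all coefficients except the constant one.
fn trace(coeffs: &[i64]) -> Vec<i64> {
    let mut out = vec![0i64; coeffs.len()];
    out[0] = coeffs[0];
    out
}

fn main() {
    let m: Vec<i64> = (0..8).collect(); // message m_j = j, with n = 8
    let r = rotate_neg(&m, 3); // brings m_3 into the constant slot
    println!("{:?}", r); // [3, 4, 5, 6, 7, 0, -1, -2]
    println!("{:?}", trace(&r)); // [3, 0, 0, 0, 0, 0, 0, 0]
}
```

On encrypted data these become a GLWE x GGSW product and a homomorphic trace, but the effect on the underlying message is exactly the one shown here.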
In the setup phase, the client and server first retrieve the public parameters. These include the cryptographic parameters as well as some information about the RAM instance, such as its maximum supported address.
pub struct Parameters {
    module: Module<FFT64>, // FFT/NTT tables.
    basek: usize,          // Torus 2^{-k} decomposition.
    rank: usize,           // GLWE/GGLWE/GGSW rank.
    k_pt: usize,           // Ram plaintext (GLWE) Torus precision.
    k_ct: usize,           // Ram ciphertext (GLWE) Torus precision.
    k_addr: usize,         // Ram address (GGSW) Torus precision.
    k_evk: usize,          // Ram evaluation keys (GGLWE) Torus precision.
    xs: f64,               // Secret-key distribution.
    xe: f64,               // Noise standard deviation.
    max_addr: usize,       // Maximum supported address.
    decomp_n: Vec<u8>,     // Digit decomposition of N.
    word_size: usize,      // Digit decomposition of a Ram word.
}
An instantiation of the default parameters can be obtained with:
let params = Parameters::new();
Next, the client generates a key pair using the following method:
pub fn gen_keys(params: &Parameters) -> (GLWESecret<Vec<u8>, FFT64>, EvaluationKeys)
The GLWESecret<Vec<u8>, FFT64> struct represents the client’s symmetric encryption key for GLWE (General Learning With Errors). The EvaluationKeys struct is defined as:
pub struct EvaluationKeys {
pub(crate) auto_keys: HashMap<i64, AutomorphismKey<Vec<u8>, FFT64>>,
pub(crate) tensor_key: TensorKey<Vec<u8>, FFT64>,
}
It contains a set of public evaluation keys, which are sent to the machine operating the FHE-RAM. These keys are required for some of the homomorphic operations described later.
Finally, the client chooses some initial data to encrypt into the RAM. Before doing so, let’s take a look at how the RAM is structured.
pub struct Ram {
pub(crate) params: Parameters, // Global public parameters
pub(crate) subrams: Vec<SubRam>, // One SubRam for each word chunk
pub(crate) scratch: ScratchOwned, // Scratch space
}
The Ram contains the public parameters, a vector of SubRam, and some pre-allocated scratch space.
Our design allows words to be split into chunks (for example, a u32 into 4 × u8 or 8 × u4). This is useful if homomorphic operations will be applied to the values read from or written to the RAM. Each chunk is encrypted in its own SubRam.
pub(crate) struct SubRam {
    data: Vec<GLWECiphertext<Vec<u8>>>,      // Actual encrypted data
    tree: Vec<Vec<GLWECiphertext<Vec<u8>>>>, // Temporary tree used by the read step
    packer: StreamPacker,                    // GLWE packing procedure
    state: bool,                             // Whether it is ready to be written to
}
Each SubRam stores a vector of GLWE ciphertexts, where each ciphertext encrypts the respective chunk of the value at the RAM address. The tree holds a temporary structure populated during a read call to prepare for a subsequent write. The state boolean indicates whether the SubRam is ready for a write. The packer provides a memory-efficient way to merge several GLWE ciphertexts.
The client instantiates the RAM by calling:
pub fn encrypt_sk(&mut self, data: &[u8], sk: &GLWESecret<Vec<u8>, FFT64>);
with their byte buffer data. The encrypt_sk method encrypts the client’s data into the RAM.
For example, when the number of chunks is set to 4 (i.e., 4 × u8), and the client’s byte buffer is:
[x00, x01, x02, x03, x10, x11, x12, x13, x20, x21, x22, x23, ...]
then the RAM will contain 4 SubRams as follows:
SubRam[0] <- Vec<GLWE([x00, x10, x20, ...]), GLWE(...), ...>
SubRam[1] <- Vec<GLWE([x01, x11, x21, ...]), GLWE(...), ...>
SubRam[2] <- Vec<GLWE([x02, x12, x22, ...]), GLWE(...), ...>
SubRam[3] <- Vec<GLWE([x03, x13, x23, ...]), GLWE(...), ...>
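This chunk layout can be sketched in a few lines of plain Rust (an illustration only; split_chunks is our own helper, not a library function):

```rust
/// Split a byte buffer into `word_size` interleaved chunk streams, one per
/// SubRam: stream k receives bytes k, k + word_size, k + 2 * word_size, ...
fn split_chunks(data: &[u8], word_size: usize) -> Vec<Vec<u8>> {
    let mut chunks = vec![Vec::new(); word_size];
    for (i, &b) in data.iter().enumerate() {
        chunks[i % word_size].push(b);
    }
    chunks
}

fn main() {
    // Two 4-byte words: [x00, x01, x02, x03] and [x10, x11, x12, x13].
    let data = [0x00u8, 0x01, 0x02, 0x03, 0x10, 0x11, 0x12, 0x13];
    for (k, chunk) in split_chunks(&data, 4).iter().enumerate() {
        println!("SubRam[{}] <- {:02x?}", k, chunk);
    }
}
```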
An address is an encryption of a monomial \(X^{-i} \equiv -X^{n-i} \mod (X^{n}+1)\), where \(0 \leq i < \text{maxaddr}\) for some power of two \(n\) (the GLWE degree). If the maximum supported address exceeds \(n\), \(X^{-i}\) cannot be directly encrypted. Instead, it is decomposed in base \(n\) as:
\[ X^{-i} = \prod_{j=0}^{\log_n(\text{maxaddr})-1} X^{-d_j} \quad \text{with} \quad |d_j| < n \]
pub struct Address {
    coordinates: Vec<Coordinate<Vec<u8>>>, // Base n decomposition of the index
    k: usize,                              // Torus precision
    rows: usize,                           // Digit decomposition
    base2d: Base2D,                        // Full address base
}
pub(crate) struct Base2D(pub Vec<Base1D>);
pub(crate) struct Base1D(pub Vec<u8>);
Each Coordinate stores the \(j\)-th digit of \(i\) in base \(n\) (i.e. GGSW(\(X^{-d_j}\))).
pub(crate) struct Coordinate<D> {
pub(crate) value: Vec<GGSWCiphertext<D, FFT64>>,
pub(crate) base1d: Base1D,
}
A Coordinate can store more than a single GGSW ciphertext because we allow a second level of decomposition over the base-\(n\) digits of the index \(i\). This lets the address digits have small values that do not depend on \(n\), which makes it easier to compute the address homomorphically when the FHE-RAM is composed with other protocols. However, the second decomposition is not strictly needed for basic FHE-RAM functionality.
Let \(i=237\) and \(n=2^{6}\); then the first digit decomposition of \(i\) (in base \(n\)) is \(\vec{d} = [45, 3]\). Indeed, \(45 + 3\cdot 2^{6} = 237\). Now let's assume that we want to bound the digits of \(\vec{d}\) by \(B < n\) for some \(B\) with \(B \mid n\) (since \(n\) is a power of two, this implies that \(B\) is too), for example \(B=2^{3}\). We apply a second decomposition in base \(B\) to \(\vec{d}\) and obtain \(\vec{d'} = [[5, 5], [3, 0]]\); indeed \(5 + 5\cdot 2^{3} = 45\) and \(3 + 0\cdot 2^{3} = 3\).
As such, the address \(i\) with decomposition \([[5, 5], [3, 0]]\) will contain
\[ [[\textsf{GGSW}(X^{-5}), \textsf{GGSW}(X^{-5\cdot 2^{3}})], [\textsf{GGSW}(X^{-3}), \textsf{GGSW}(X^{-0\cdot 2^{3}})]] \]
(values are negated because we want to rotate to the left)
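The two-level decomposition can be reproduced with a small plain-Rust sketch (decompose is our own helper, not part of the library):

```rust
/// Decompose `i` into exactly `len` digits d_j < base (little-endian),
/// so that i = sum_j d_j * base^j.
fn decompose(mut i: u32, base: u32, len: usize) -> Vec<u32> {
    let mut digits = Vec::with_capacity(len);
    for _ in 0..len {
        digits.push(i % base);
        i /= base;
    }
    digits
}

fn main() {
    let (n, b) = (64u32, 8u32); // n = 2^6, B = 2^3
    // First level: 237 = 45 + 3 * 64 -> [45, 3].
    let d = decompose(237, n, 2);
    // Second level: each base-n digit re-decomposed into two base-B digits.
    let d2: Vec<Vec<u32>> = d.iter().map(|&x| decompose(x, b, 2)).collect();
    println!("{:?}", d);  // [45, 3]
    println!("{:?}", d2); // [[5, 5], [3, 0]]
}
```

Each inner digit then becomes one GGSW ciphertext of the corresponding (negated) monomial.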
let idx: u32 = source.next_u32() % params.max_addr() as u32;
let mut addr: Address = Address::alloc(&params);
addr.encrypt_sk(&params, idx, &sk);
Upon receiving a read request with an encrypted address
from the client, the server executes the read operation via:
pub fn read(
    &mut self,
    address: &Address,
    keys: &EvaluationKeys,
) -> Vec<GLWECiphertext<Vec<u8>>>;
let ct: Vec<GLWECiphertext<Vec<u8>>> = ram.read(&address, &keys);
The read method returns a Vec<GLWECiphertext<Vec<u8>>>, i.e. one GLWE ciphertext per SubRam. Each GLWE ciphertext encrypts the corresponding chunk of the value at the requested address in its constant coefficient.
At a high level, the read operation evaluates this homomorphic circuit:
Pack(... Pack(Pack(Vec<GLWE> x Address[0]) x Address[1]) x Address[2]) ...) x Address[last]

- Vec<GLWE> x Address[i]: rotates the coefficients of each GLWE by the \(i\)-th coordinate.
- Pack(Vec<GLWE>): extracts the constant coefficient of each GLWE and packs them into a new GLWE, reducing the number of GLWE ciphertexts by a factor of \(n\) at each step.

This process continues until one GLWE remains. That GLWE is then rotated by Address[last], and its constant coefficient is extracted and returned.
The read operation can also be visualized as a tree-like circuit; this perspective will be useful later. Here is a diagram for a SubRam of 8 GLWE ciphertexts (with PACK reducing by 4 for illustration):
GLWE0 GLWE1 GLWE2 GLWE3 GLWE4 GLWE5 GLWE6 GLWE7
│ │ │ │ │ │ │ │
mul mul mul mul mul mul mul mul <- Address[0]
│ │ │ │ │ │ │ │
GLWE0R GLWE1R GLWE2R GLWE3R GLWE4R GLWE5R GLWE6R GLWE7R
│┌─────┘ │ │ │┌─────┘ │ │
││┌───────────┘ │ ││┌───────────┘ │
│││┌─────────────────┘ │││┌─────────────────┘
PACK PACK
│ │
GLWE8 GLWE9
│ │
mul <- Address[1] mul <- Address[1]
│ │
GLWE8R GLWE9R
│ │
│┌──────────────────────────┘
PACK
│
GLWE10
│
mul <- Address[2]
│
GLWE10R
Let the data of the RAM be \(m_{0}, ..., m_{511}\), encrypted in batches of \(n=2^{6}\) coefficients per GLWE ciphertext, i.e. there are \(512 / n = 8\) GLWE ciphertexts:
\[ [\textsf{GLWE}(\sum_{i=0}^{n-1} m_{i}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{n+i}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{2n+i}X^{i}), ...] \]
Let's also take the address from the previous toy example. The address encrypts \(i=237\) decomposed with basis \([[2^{3}, 2^{3}], [2^{3}, 2^{3}]]\):
\[ [[\textsf{GGSW}(X^{-5}), \textsf{GGSW}(X^{-5\cdot 2^{3}})], [\textsf{GGSW}(X^{-3}), \textsf{GGSW}(X^{-0\cdot 2^{3}})]] \]
The read procedure proceeds as follows. Multiplying by the GGSW ciphertexts of the first coordinate rotates each GLWE to the left by \(45\):
\[ [\textsf{GLWE}(\sum_{i=0}^{n-1} m_{i+45}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{n+i+45}X^{i}), \textsf{GLWE}(\sum_{i=0}^{n-1} m_{2n+i+45}X^{i}), ...] \]
Packing the constant coefficients then collapses the \(8\) ciphertexts into a single one:
\[ [\textsf{GLWE}(\sum_{i=0}^{7} m_{in+45}X^{i})] \]
Finally, multiplying by the GGSW ciphertexts of the second coordinate rotates it by \(3\):
\[ [\textsf{GLWE}(\sum_{i=0}^{7} m_{(i+3)n+45}X^{i})] \]
The constant coefficient of the result is \(m_{3n+45} = m_{237}\), as desired.
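The same walk can be checked on plaintext data. The following plain-Rust simulation (no encryption; all helper names are ours) applies the rotate-then-pack circuit to \(m_{0}, ..., m_{511}\) and recovers \(m_{237}\):

```rust
/// Constant coefficient of p(X) * X^{-i} mod X^n + 1: for 0 <= i < n this
/// is simply coefficient i of p (no sign flip reaches the constant slot).
fn rotated_const_coeff(block: &[u64], i: usize) -> u64 {
    block[i]
}

/// One rotate-then-pack layer: rotate every block by X^{-i} and pack the
/// resulting constant coefficients into a single new block.
fn rotate_and_pack(blocks: &[Vec<u64>], i: usize) -> Vec<u64> {
    blocks.iter().map(|b| rotated_const_coeff(b, i)).collect()
}

fn main() {
    let n = 64;
    // RAM contents m_0..m_511 split into 512 / n = 8 blocks, with m_j = j.
    let data: Vec<u64> = (0..512).collect();
    let blocks: Vec<Vec<u64>> = data.chunks(n).map(|c| c.to_vec()).collect();

    // First coordinate: rotate by 45 and pack -> [m_45, m_{n+45}, ...].
    let packed = rotate_and_pack(&blocks, 45);
    // Second coordinate: rotate by 3 and extract the constant coefficient.
    println!("{}", rotated_const_coeff(&packed, 3)); // m_{3n + 45} = m_237 = 237
}
```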
The read_prepare_write call is identical to read, except that it caches the generated tree in memory for use in a subsequent write.
pub fn read_prepare_write(
    &mut self,
    address: &Address,
    keys: &EvaluationKeys,
) -> Vec<GLWECiphertext<Vec<u8>>>;
It is necessary to call read_prepare_write at the target address before invoking write.
pub fn write<DataW: AsRef<[u8]>>(
    &mut self,
    w: &Vec<GLWECiphertext<DataW>>, // Each GLWE must encrypt [wi, 0, 0, ..., 0];
    address: &Address,              // Must be the same as the read_prepare_write
    keys: &EvaluationKeys,
)
ram.write(&ct_w, &addr, &keys);
Writing a value to the FHE-RAM is conceptually the inverse of the read operation — though slightly more complex.
First, assume we have GLWE(w) encrypting the value w in the constant term, along with access to the cached state from read_prepare_write and to the address.
At the root (GLWE10R), which holds the rotated value, compute:
GLWE10R' <- GLWE10R - TRACE(GLWE10R) + GLWE(w)
GLWE10' <- GLWE10R' x AUTO(Address[2], -1)
Next, update the previous layer:
GLWE8R' <- GLWE8R - TRACE(GLWE8R) + TRACE(GLWE10')
GLWE8' <- GLWE8R' x AUTO(Address[1], -1)
GLWE9R' <- GLWE9R - TRACE(GLWE9R) + TRACE(GLWE10' * X)
GLWE9' <- GLWE9R' x AUTO(Address[1], -1)
In general:
a_next[i] <- a_next[i] - TRACE(a_next[i]) + TRACE(a_prev * X^i)
a_next[i] <- a_next[i] x AUTO(Address[j], -1)
The process repeats, walking back the tree, until the original RAM data is updated.
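On plaintext coefficient vectors, one write-back step amounts to replacing the constant coefficient and undoing the read rotation. A minimal plain-Rust sketch (our own helper, assuming the block was rotated left by \(i\) during the read):

```rust
/// Plaintext analogue of one write-back step. `rotated` is a block that was
/// rotated left by `i` during the read, so its constant coefficient holds
/// the old value. block - TRACE(block) + w swaps in the new value `w`;
/// multiplying by X^{+i} then undoes the read rotation (negacyclically).
fn write_back(rotated: &[i64], w: i64, i: usize) -> Vec<i64> {
    let n = rotated.len();
    let mut block = rotated.to_vec();
    block[0] = w; // block - TRACE(block) + w
    let mut out = vec![0i64; n];
    for (j, &c) in block.iter().enumerate() {
        if j + i < n {
            out[j + i] = c; // X^j * X^i = X^{j+i}
        } else {
            out[j + i - n] = -c; // wrap-around negates, since X^n = -1
        }
    }
    out
}

fn main() {
    // n = 8, original block m_j = j; a read rotation by i = 3 produced:
    let rotated = [3i64, 4, 5, 6, 7, 0, -1, -2];
    // Write 99 at index 3 and rotate back:
    println!("{:?}", write_back(&rotated, 99, 3)); // [0, 1, 2, 99, 4, 5, 6, 7]
}
```

Only the coefficient at the target index changes; every other coefficient returns to its original position and sign.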
Let's follow up on the previous example, in which we read from a RAM of 512 elements at index \(i=237\). Assume we now want to write \(w\) at this index.
We have the intermediate states
\[ [[\textsf{GGSW}(X^{-5}), \textsf{GGSW}(X^{-5\cdot 2^{3}})], [\textsf{GGSW}(X^{-3}), \textsf{GGSW}(X^{-0\cdot 2^{3}})]] \]
As per the procedure described above, we first start with the state 4).
See code example here.
The following are off-the-shelf performance results for various RAM sizes, with the entry size fixed to a 32-bit value stored in 4 chunks (i.e. the RAM contains 4 SubRams). Benchmarks were obtained on an i9-12900K using a single core.
The main intention of the table below is to provide a rough estimate of the cost and of how performance scales with \(\#\mathcal{D}\). Precise performance depends heavily on the other factors mentioned later.
| \(\#\mathcal{D}\) | Read | Write |
|---|---|---|
| 1 KB | 6 ms | 7 ms |
| 4 KB | 6 ms | 7 ms |
| 16 KB | 6 ms | 7 ms |
| 64 KB | 38 ms | 64 ms |
| 256 KB | 102 ms | 228 ms |
| 1024 KB | 302 ms | 871 ms |
| 4096 KB | 930 ms | 3522 ms |
Until ~16 KB, the cost remains roughly constant. After that point, the cost grows superlinearly as:
\[ \mathcal{O}(\log_n(\#\mathcal{D}) \cdot \#\mathcal{D}) \]
This cutoff occurs because at larger sizes, \(\mathcal{D}\) no longer fits in a single ciphertext — the number of ciphertexts increases proportionally to \(\#\mathcal{D}\).
Parallelization: Parallelization can be exploited at multiple levels:

- Representing a u32 as (u8, u8, u8, u8) allows parallel processing across chunks.
- Within a SubRam, the homomorphic products at each layer of the read tree are independent of one another and can be evaluated in parallel.

Read/Write Budget: Each read or write operation introduces a small amount of additive noise to \(\mathcal{D}\). There is an upper limit on the number of operations that can be performed before this noise corrupts the encrypted data. A smaller noise budget allows for smaller parameters, and thus better performance.
Cryptographic Parameterization: Parameterizing lattice-based cryptographic schemes is a vast topic and beyond the scope of this post. The key point is that for any given security target, there are multiple parameter choices — each presenting trade-offs between noise growth, ciphertext size, and operational complexity.