In this post, I'd like to provide a technical introduction to
key encapsulation mechanisms (KEMs), with a focus on proving the
security of various constructions.
A key encapsulation mechanism is like a public key encryption
scheme, tailored to the specific use case of sending a key to the other
party.
This can be used to establish shared key between two parties.
One of them runs the KEM, getting a shared key and a ciphertext,
and then sends that ciphertext to the other person.
That person can unwrap the ciphertext to recover the shared key.
From that point on, the parties can use that shared key to communicate
privately.
In this way, you can think of KEMs as a generalization of key exchanges,
which don't have to be symmetric.
In fact, you can use a key exchange to construct a KEM, as we'll
see later.
In this post, we'll go over:
How to define KEMs.
How to capture the security of KEMs with games.
How these notions of security compare with other variations.
How to construct a secure KEM from RSA.
How to construct a secure KEM using Elliptic Curves.
There are other important topics around KEMs that I'd also like to touch
upon, but this sample is already quite a hefty serving
if you include all of the proofs; those will have to wait for a future post.
This post is more focused on the technical side of things, and on provable
security.
I'll be using state separable proofs
pervasively throughout this post, so I'd recommend
reading my post on the subject if you'd
like to get up to speed.
Defining KEMs
To begin, let's formally define what a KEM is.
A key encapsulation mechanism (KEM), is a scheme similar to public key encryption.
This scheme consists of three algorithms:
GenEncapDecap:()→(SK,PK):PK→(K,C):(SK,C)→K
The first algorithm, Gen, creates a new key pair, consisting
of a private key, and its corresponding public key.
The second algorithm, Encap, takes in a public key,
for the recipient, and returns a symmetric key and an encapsulation
(also called a ciphertext).
With Decap, a recipient can use their private key to extract
a symmetric key from the ciphertext.
This scheme also needs to satisfy some notion of correctness.
Intuitively, decapsulating should return the same key that was encapsulated.
More formally, the following procedure must always succeed:
No matter what key pair comes out of Gen, encapsulating
and then decapsulating must return the same key.
If this didn't hold, your KEM wouldn't be very useful for establishing
a shared key between two parties, or for public key encryption,
which are two of the main use cases for KEMs.
IND Security
Naturally, having a correct scheme is easy if you don't care about
security.
The usual notion of security is similar to that of public key encryption.
The basic idea is that you shouldn't be able to tell whether or not
a specific key is hidden inside of the ciphertext.
In particular, you shouldn't be able to distinguish between receiving
a random key, and receiving the key inside of a ciphertext.
We formalize this as a pair of games, in the state separable style:
In one version of the game, we get the key inside the encapsulation,
and in the other version of the game, we get a completely random key.
An adversary should not be able to distinguish between the two games.
This means that the adversary can't learn any information about the
key from the ciphertext.
[!note] Note:
In this variant, you can query the challenge multiple times.
This is equivalent to only being able to query once, although the security
gets worse as a linear function of the number of queries you do.
For simplicity, I'll stick with the multi-query variants for the rest
of this post.
This variant of security can also be referred to as IND-CPA.
This is because the adversary is able to create encapsulations themselves,
because they know the public key.
This kind of query is a chosen plaintext, hence the chosen plaintext attack (CPA) in the name.
With symmetric encryption, on the other hand, you need to have a secret key to even
encrypt data, so the CPA capability is a meaningful distinction.
IND-CCA Security
An even stronger variant of security also allows the adversary to make
decapsulation queries on ciphertexts of their choice.
We call these chosen ciphertext attacks (CCA).
This models situations where we might know the keys inside certain ciphertexts.
While in practice these leakages could be quite limited, having a more
expansive model of security covers many more situations,
and we can construct schemes which satisfy this general model.
This is the same as the previous IND game, except that we now
have the ability to make decapsulation queries.
In order to make the game not trivially easy to win, we keep track
of which challenge ciphertexts have been produced, and refuse decapsulation
queries for that set.
Otherwise, you could ask for the decapsulation for a ciphertext you saw
earlier, and compare the result with the key you received before.
Equivalence with Other Definitions
[!note] Note:
You can skip this section.
It's mainly concerned with resolving a small discrepancy in definitions
between this post and the broader literature.
In the games we've seen so far, the adversary only sees one of the keys.
We can also model a situation where the adversary sees both of the keys:
Instead of giving them one of the keys like before, we give them both,
but we swap their order.
The adversary has to tell which key is which.
One natural question is whether or not this new game is equivalent
to our previous definitions.
It is.
As a bit of a warmup, let's prove this equivalence.
IND-Both-CCA≤2⋅IND-CCA
The idea of this reduction is that we can replace k0 from the first
encapsulation with a random key, because of IND-CCA security.
This then allows us to swap k0 with k1, and then walk our way backwards,
giving us a bound for ϵ(IND-Both-CCAb).
We start by extracting out the encapsulation in IND-Both-CCAb,
using IND-CCA:
The idea of this proof is that we can emulate IND-CCA
using IND-Both-CCA by dropping one of the keys.
The question is: which key do we drop?
Because we can't distinguish between k0 and k1, it doesn't matter
which one we choose, essentially.
First, let's define the following wrapper package:
These definitions turned out to be equivalent, but both
fit into a kind of "real-or-random" paradigm: one of the keys
is randomly selected, while the other is related to the ciphertext.
This is a general kind of paradigm, which can be used to define the security of other schemes as well.
It's useful to stay within this paradigm, since phrasing all of
our security definitions in terms of real-or-random makes it easier
to do reductions.
One alternative paradigm, that gets used for public key encryption,
is "left-or-right".
With this approach, instead of having a random output and a real output,
you instead have two outputs, and you try and distinguish between them.
For the KEM case, this would give the following game:
Unfortunately, this definition fails to capture a useful notion.
The issue is that we have no guarantee that the key is sufficiently
random, which makes it not usable for encryption later.
This KEM always returns the same key for encapsulation and decapsulation,
which makes it completely useless to construct a secure public key
encryption scheme.
It does, however, satisfy the IND-LR definition of security above.
This is because (0,0,cb) contains no information about b.
Constructing KEMs
We've now seen a bit of how KEMs work in theory, but we've yet to actually
see an example of making one.
In this section, we go over a few potential constructions of KEMs,
and show that they're secure, under appropriate assumptions.
From RSA
Let's start with RSA.
There are thousands of good explanations of RSA, so I'll settle for
a bad, but short, explanation.
In RSA, your public key consists of a modulus N, and a number e.
Your secret key consists of a factorization p,q such that N=p⋅q,
along with a secret exponent d, such that
d⋅e≡1modφ(N)
These parameters give us a trapdoor permutation for the set Z/(N):
F(x)F−1(y):=xemodN:=ydmodN
The function F is a permutation, with F−1 its inverse.
Computing the inverse should be hard without knowing the secret exponent
d, or being able to factor N.
We make this security notion precise with the following game:
The adversary makes guesses for the pre-image of the permutation,
and it should be difficult for them to succeed in their guess.
If the adversary has a good strategy for guessing, then they'll be able
to figure out what b is.
We can use RSA to create a KEM.
The pre-image x will be our key, and then
y:=xemodN is our ciphertext, hiding the key.
The recipient can use their secret to unwrap x from this ciphertext.
Now, because Z/(N) may not be a very useful key by itself,
we also make use of a hash function H:Z/(N)→K,
so that we can derive a more useful key from x.
For example, if we wanted to use our key for encryption with AES,
or ChaCha20, we'd want to use a hash function from integers to 256 bit keys.
Because the RSA function is a permutation, correctness is satisfied.
As for security, this isn't a trivial matter.
First, we'll model our hash function H as a random oracle, where
the outputs will be generated perfectly at random, on demand.
Second, rather than considering normal IND-CCA security,
instead we'll consider IND-CCA-1 security, where the adversary
can only make a single challenge query.
Making Q challenge queries instead of just a single one only decreases
security by a factor of Q, as can be shown via a hybrid argument.
(My post on state-separable proofs contains a proof of this in the general case).
IND-CCA-1≤2⋅RSA
First, let's write down the IND-CCA-1 game in this context.
The (1) notation means that Challenge can only be queried
a single time.
We've also replaced the hash function with a lazily initialized table
of random values.
This is why we're doing our proof in the "random oracle model", since
the hash function is being modelled this way.
Next, we follow the same strategy as in Theorem 12.2 of
Boneh and Shoup.
Rather than having our table h be used for the pre-images x, instead
we'll setup a table h′ for the images y.
This will allow us to minimize our use of the secret key d in decapsulation
queries, which makes it easier to extract out the RSA functionality for our
package.
Since we only have a single challenge query, we can generate the instance
before Challenge is called.
We'll also be able to do a lot of tricks with the decapsulation
queries, but it's easier to explain those after seeing the new game:
The big trick we've pulled in this game is in our decapsulation function.
By having a hash table for the images y, we avoid the need to make use
of the secret key d when decapsulating.
Whereas we'd normally compute x←ydmodN, and then query
H(x), because our hash function then computes
xemodN to index into our table, we can avoid all of that,
and use the table ourselves.
Naturally, we have IND-CCA-1b=Γb0.
From this point on, it's relatively straight sailing.
Now we use the RSA0 game to determine if we need to return k0
in our hash function.
Next, note that if we switch to using RSA1, we get the following game:
Another way to construct a KEM is with a cryptographic group, like
a suitable elliptic curve.
This is a group G of prime order q, generated by G,
and with an associated field of scalars Fq.
A key pair for the KEM will be a scalar a, for the private key,
and a point A:=a⋅G, for the public key.
In order to encapsulate a key, we generate a random scalar b,
set B:=b⋅G as our encapsulation, and H(b⋅A) as the key.
The receiver can set H(a⋅B) as their key, arriving at the same
result because of commutativity:
a⋅B=a⋅b⋅G=b⋅a⋅G=b⋅A
More formally, given a cryptographic group G,
and a hash function H:G2→K, we define the KEM as follows:
One slight difference from the brief presentation above is that we hash
both B and C, rather than just C.
This makes the security proof easier.
To characterize the security of the KEM, we need to have some kind notion
of security for the group itself.
The notion we'll be relying on (at least, in our proof), is the
"interactive computational Diffie-Hellman" problem.
This states, essentially, that given two points A=a⋅G and
B=b⋅G, it should be hard to find C=ab⋅G, even given
access to an oracle which tells you whether or not two points B^
and C^ satisfy the relation a⋅B^=C^.
Next, like with the RSA kem proof, we need to minimize our use
of the secret key when decapsulating.
We follow the same strategy as in Theorem 12.4 of
Boneh and Shoup.
We replace our hash table for pairs (B,C) with a table
for B, when C=a⋅B, and a table for (B,C), when B and C
are not related.
Since we only have a single challenge query, we can also lift generation
of those values outside the challenge.
Basically, we maintain the invariant that a⋅B^=C^⟹h[B^,C^]=h′[B^].
this allows us to make decapsulation queries without making use of the secret
key.
This will make extracting out the ICDH game much easier.
This extraction is also made easier in that the query a⋅B^=C^ is precisely the kind of query we can do in this game.
Now, notice that in the game Γb1∘ICDH1, k0 isn't
used anywhere but inside of Challenge, because Guess
always returns 0.
This means that the difference between k0 and k1 becomes a matter
of naming, so:
Γ1_0∘ICDH_1=Γ1_1∘ICDH_1
This is enough to make our full walk, and tie everything together:
Another method that's going to be increasingly important is using
lattices to construct KEMs.
Both of the schemes I've mentioned in this post, using RSA,
and using elliptic curves, will be broken by future quantum computers.
This is why NIST started a competition in order to standardize
KEMs secure against quantum computers.
Many of the candidates were based on lattices, including the finalist,
Kyber.
Other candidates used different primitives, like isogenies, or
random codes.
Composing KEMs
Talking about the post-quantum KEMs segues nicely into this next topic.
Right now, we have very well tested and trusted KEMs, which aren't
secure against quantum computers.
We also have a new KEM which should be secure against quantum computers,
but given its novelty, you may not trust it that much yet.
Because of this, many people are trying to deploy both schemes together
in a hybrid fashion.
This means that you can get post-quantum security if the new KEM stands
the test of time, but you also don't sacrifice classical security
if the new KEM you include happens to be flawed.
To achieve this, what you want is a way to combine KEMs,
mixing together KEMs A and B to create a new KEM, which should
be secure as long as at least one of the two ingredients is.
If A is broken, that's fine as long as B is secure, and vice versa.
This construction is called a KEM combiner, and it can combine two
KEMs in this way, without having to inspect how the KEMs work internally
at all.
The paper [^[GHP18]] goes over this notion of combiners,
and presents a very elegant construction.
To encapsulate, you call each of the individual
KEMs first, giving you (kA,cA) and (kB,cB).
Next, (cA,cB) becomes your ciphertext.
Now, you need some way to combine all of this information to derive
a single key k.
One idea would be to simply xor the two keys, giving you k←kA⊕kB.
Unfortunately, this is not IND-CCA secure, because the resulting key is malleable.
The idea in this paper is instead to use a kind of pseudo-random function (PRF) to derive the result:
F(kA,kB,(cA,cB))
This is a special kind of PRF called a split-key PRF.
Split-Key PRFs in Theory
More formally, a split-key PRF is a function:
F:K0×K1×X→Y
We have two types for keys, K0 and K1, as well
as an input type X, and an output type Y.
The intuition for this function is that it should behave like
a random function X→Y as long as the adversary
doesn't know either of the keys.
Even if the adversary controls one of the keys, and can query
the function F on different values for this key, they still shouldn't
be able to distinguish this function from a random one.
One subtlety is that we don't allow the adversary to query the
same input point twice, even with different keys.
A stronger security notion would be to allow queries to the same input
with a different key, but we won't need that for proving the security
of our KEM combination scheme.
Split-Key PRFs are Secure PRFs
[!note] Note:
If you're convinced of this just from reading the section header,
feel free to skip this section.
With a split-key PRF, we model security in a situation where the adversary
can control one of the keys.
When looking at the security of a PRF, the adversary doesn't control
the key at all.
One interesting fact is that a split-key PRF is necessarily a secure
PRF, considering (k0,k1) as a single "key".
PRF≤SPLIT-PRF(σ)
Let's recall the PRF game (in this context), briefly:
It turns out that split-key PRFs precisely capture what we need
to compose KEMs together.
In this section, we formally define the composition of two KEMs using
such a PRF, and prove that it's IND-CCA secure, assuming the PRF is split-key secure,
and one of the underlying KEMs is IND-CCA secure.
We start with two KEMs, A, and B.
The composition will be like the informal idea we saw earlier:
in order to encapsulate, we use the encapsulation from each KEM,
and then combine them together as a pair (cA,cB).
When decapsulating, we end up with two keys kA and kB.
We'll use a split-key PRF to combine them, including the ciphertexts, to give us:
k←F(kA,kB,(cA,cB))
A bit more formally, given two KEMs A and B, and a function
In terms of encapsulation, this works in the way you'd expect for a product
construction.
The only real trick comes from decapsulation, where we use our function
F to combine both keys and the ciphertexts to produce a final key.
Security Proof
It turns out that if F is a split-key PRF, then the security of this
scheme reduces to that of either one of the KEMs.
Because our combined KEM definition is completely symmetric with respect
to A or B, we prove this, without loss of generality, just by treating
the case where A is IND-CCA secure, and show that:
IND-CCA-1(A×FB)≤2⋅IND-CCA-1(A)+2⋅SPLIT-PRF
Let's start, as we've done a few times in this post, by explicitly writing
out the IND-CCA-1 game using our KEM:
Next, we pull the usual trick of pulling out code from Challenge,
since it only gets called a single time.
We also want to change the decapsulation function to minimize the use
of skA, since we'll be replacing explicit calls to Decap
with oracle calls to IND-CCA-1(A).
The behavior of decapsulation is the same, we just make a few preparations
for later.
First, we cache the output of the decapsulation in d.
This is because the PRF oracle we'll use later can't be queried twice
on the same input, so we'll need to use this table to avoid doing that.
Second, we check whether c^A=cA in order to explictly
use the hardcoded kA.
This will make extracting out the A KEM much easier.
With this game, we've extracted out the A KEM completely.
One key thing which enabled us to do this was checking that
c^A=cA ourselves, to avoid hitting the assertion inside
of A's Decap oracle.
Naturally, we'll be able to make the jump from IND-CCA-10(A)
to IND-CCA-11(A), but that won't be enough to complete our proof.
We still need to make use of our PRF assumption.
To that effect, let's write a game which explicitly captures what
happens after making the switch to an ideal KEM:
At the start of the game, we can ignore whatever key we get from A's
encapsulation, and instead use a random kA.
At the end of the game, in decapsulation, we've shifted things around
to make how the PRF is queried more clear.
In one branch, we query the PRF using that initial kA key,
and so we want to make sure to cache results.
In the other branch, we use an unrelated key, and so we don't care.
Our next trick is going to be to replace calls to F(kA,…)
with oracle queries to the SPLIT-PRF game.
The rest of the game doesn't matter.
The important part is that now there's no difference between k0
and k1, and so we have Γ04=Γ14.
Now, we can do the walk backwards, chaining everything together to get
our result:
In the KEM combiner paper [^[GHP18]], they present many
examples of split-key PRFs.
As an illustration, I'll choose one of the simplest ones, which
can be proven secure even without random oracles.
I'd recommend checking out the papers for other more efficient constructions.
Xoring Two PRFs
One simple construction involves combining two PRFs F0 and F1 together:
F(k0,k1,x):=F0(k0,x)⊕F1(k1,x)
This PRF is split-key because even if you can influence one of the keys,
the other side looks completely random, and thus so is the final result.
Let's prove this a bit more formally.
Without loss of generality, we can simply prove that it's split
key for the first key, k0, since the function is symmetric.
SPLIT-PRF(0)≤PRF(F0)
The high level proof idea is that what happens with F1 can't affect
the outcome, because F0 being a PRF is enough to guarantee that the
output is random.
In our split-key game, we'll easily be able to
This is because xoring any value with a (uniformly) random value
returns another random value.
Thus, xoring F(k1,x) with a random y gives us a random result.
To summarize, we have:
SPLIT-PRF0=Γ∘PRF0≈ϵ1Γ∘PRF1=SPLIT-PRF1
which concludes our proof.
□
Other Methods
The kem combiner paper [^GHP18] also provides various
other methods for creating split-key PRFs.
One method is to do F(kA,cA)⊕F(kB,cB), including only
one of the ciphertexts.
Another method, which requires the random oracle model, is to do:
H(kA⊕kB,cA,cB).
And this is just a small sample; I really recommend checking out the paper
if you want to know more; it's a very well written paper!
Conclusion
While there weren't that many topics covered in thist post, I felt
that stopping it here made for a hefty post already, especially
with the focus on proofs.
I think state-separable proofs are quite fun to write, and hopefully
they were enjoyable to read as well.
A secret about this post is that KEMs are really just an excuse
to write more state-separable proofs, since that's what motivated
me to write about the topic.
There's still a lot more interesting things to say about KEMs though.
One notion I didn't touch on at all here is that of authenticated
KEMs, where you also want to authenticate the sender in some way.
Neil Madden wrote a post on this topic as well,
and you might want to check out [^ABHKLR20],
which analyzes the HPKE standard for authenticated KEMs, outlining
various notions of security for this construction.
Another thing that would be neat to go over is the notion of
deniable authenticated KEMs, wherein it's possible to forge ciphertexts
which claim to have been sent by another person to yourself.
This makes it so that KEM ciphertexts can't be used as evidence
of communication between two parties.
This also butts heads against the notion of insider security,
which Yolan Romailler brought to my attention on twitter.
See also his blog post touching on ephemeral keys and insider
security in key exchanges.