[06/10] PM: hibernate: Add kernel-based encryption

Message ID 20220504161439.6.Ifff11e11797a1bde0297577ecb2f7ebb3f9e2b04@changeid (mailing list archive)
State New, archived
Series Encrypted Hibernation

Commit Message

Evan Green May 4, 2022, 11:20 p.m. UTC
Enabling the kernel to be able to do encryption and integrity checks on
the hibernate image prevents a malicious userspace from escalating to
kernel execution via hibernation resume. As a first step toward this, add
the scaffolding needed for the kernel to do AEAD encryption on the
hibernate image, giving us both secrecy and integrity.

We currently hardwire the encryption to be gcm(aes) in 16-page chunks.
This strikes a balance between minimizing the authentication tag
overhead on storage, and keeping a modest sized staging buffer. With
this chunk size, we'd generate 2MB of authentication tag data on an 8GB
hibernation image.
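
As a rough check of that figure, the sketch below computes the tag overhead
for a given image size. It assumes 4 KiB pages and a full 16-byte GCM tag,
neither of which is stated above, and that a partial final chunk still
carries one tag. The names (PAGE_BYTES, tag_overhead_bytes, and so on) are
illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES     4096u  /* assumption: 4 KiB pages */
#define CHUNK_PAGES    16u    /* pages per AEAD chunk, per the text above */
#define GCM_TAG_BYTES  16u    /* assumption: full-length AES-GCM tag */

/* Total bytes of authentication tags needed for an image of image_bytes. */
static uint64_t tag_overhead_bytes(uint64_t image_bytes)
{
	const uint64_t chunk_bytes = (uint64_t)PAGE_BYTES * CHUNK_PAGES;
	uint64_t chunks;

	/* Guard the round-up below against wrapping. */
	assert(image_bytes <= UINT64_MAX - chunk_bytes);

	/* Round up: assume a partial final chunk still gets its own tag. */
	chunks = (image_bytes + chunk_bytes - 1) / chunk_bytes;

	return chunks * GCM_TAG_BYTES;
}
```

Under these assumptions an 8 GiB image splits into 131072 chunks of 64 KiB,
so tag_overhead_bytes(8ULL << 30) comes to 2097152 bytes, which matches the
2MB quoted above.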

The encryption currently sits on top of the core snapshot functionality,
wired up only if requested in the uswsusp path. This could potentially
be lowered into the common snapshot code given a mechanism to stitch the
key contents into the image itself.

To avoid forcing usermode to deal with sequencing the auth tags in with
the data, we stitch the auth tags in to the snapshot after each chunk of
pages. This complicates the read and write functions, as we roll through
the flow of (for read) 1) fill the staging buffer with encrypted data,
2) feed the data pages out to user mode, 3) feed the tag out to user
mode. To avoid having each syscall return a small and variable amount
of data, the encrypted versions of read and write operate in a loop,
allowing an arbitrary amount of data through per syscall.
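
To illustrate the interleaving described above, here is a minimal sketch
that maps a byte offset in the stream seen by usermode to either a data run
or the tag that follows it. It assumes 4 KiB pages, a 16-byte tag, and that
every chunk is a full 16 pages; the types and the locate() helper are
hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_DATA_BYTES (16u * 4096u) /* assumption: 16 pages of 4 KiB */
#define CHUNK_TAG_BYTES  16u           /* assumption: 16-byte auth tag */

static_assert(CHUNK_TAG_BYTES < CHUNK_DATA_BYTES,
	      "tag is expected to be much smaller than the data run");

enum region { REGION_DATA, REGION_TAG };

struct stream_pos {
	uint64_t chunk;     /* index of the chunk this offset falls in */
	enum region region; /* data pages or the trailing auth tag */
	uint32_t offset;    /* byte offset within that region */
};

/* Classify a byte offset in the interleaved data+tag stream. */
static struct stream_pos locate(uint64_t stream_off)
{
	const uint64_t stride = (uint64_t)CHUNK_DATA_BYTES + CHUNK_TAG_BYTES;
	const uint64_t within = stream_off % stride;
	struct stream_pos pos;

	pos.chunk = stream_off / stride;
	if (within < CHUNK_DATA_BYTES) {
		pos.region = REGION_DATA;
		pos.offset = (uint32_t)within;
	} else {
		pos.region = REGION_TAG;
		pos.offset = (uint32_t)(within - CHUNK_DATA_BYTES);
	}
	return pos;
}
```

A read or write that spans several such regions is what pushes the
encrypted paths to loop internally rather than return after each region.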

One alternative that would simplify things here would be a streaming
interface to AEAD. Then we could just stream the entire hibernate image
through directly, and handle a single tag at the end. However there is a
school of thought that suggests a streaming interface to AEAD represents
a loaded footgun, as it tempts the caller to act on the decrypted but
not yet verified data, defeating the purpose of AEAD.

With this change alone, we don't actually protect ourselves from
malicious userspace at all, since we kindly hand the key in plaintext
to usermode. In later changes, we'll seal the key with the TPM
before handing it back to usermode, so they can't decrypt or tamper with
the key themselves.

Signed-off-by: Evan Green <evgreen@chromium.org>
---

 Documentation/power/userland-swsusp.rst |   8 +
 include/uapi/linux/suspend_ioctls.h     |  15 +-
 kernel/power/Kconfig                    |  13 +
 kernel/power/Makefile                   |   1 +
 kernel/power/snapenc.c                  | 491 ++++++++++++++++++++++++
 kernel/power/user.c                     |  40 +-
 kernel/power/user.h                     | 101 +++++
 7 files changed, 657 insertions(+), 12 deletions(-)
 create mode 100644 kernel/power/snapenc.c
 create mode 100644 kernel/power/user.h

Comments

Ken Goldman Aug. 29, 2022, 9:45 p.m. UTC | #1
On 5/4/2022 7:20 PM, Evan Green wrote:
> Enabling the kernel to be able to do encryption and integrity checks on
> the hibernate image prevents a malicious userspace from escalating to
> kernel execution via hibernation resume.  [snip]

I have a related question.

When a TPM powers up from hibernation, PCR 10 is reset.  When a
hibernate image is restored:

1. Is there a design for how PCR 10 is restored?

2. How are /sys/kernel/security/ima/[pseudofiles] saved and
restored?
Matthew Garrett Aug. 29, 2022, 9:51 p.m. UTC | #2
On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
>
> On 5/4/2022 7:20 PM, Evan Green wrote:
> > Enabling the kernel to be able to do encryption and integrity checks on
> > the hibernate image prevents a malicious userspace from escalating to
> > kernel execution via hibernation resume.  [snip]
>
> I have a related question.
>
> When a TPM powers up from hibernation, PCR 10 is reset.  When a
> hibernate image is restored:
>
> 1. Is there a design for how PCR 10 is restored?

I don't see anything that does that at present.

> 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> restored?

They're part of the running kernel state, so should re-appear without
any special casing. However, in the absence of anything repopulating
PCR 10, they'll no longer match the in-TPM value.
Jarkko Sakkinen Aug. 31, 2022, 2:48 a.m. UTC | #3
On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> >
> > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > Enabling the kernel to be able to do encryption and integrity checks on
> > > the hibernate image prevents a malicious userspace from escalating to
> > > kernel execution via hibernation resume.  [snip]
> >
> > I have a related question.
> >
> > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > hibernate image is restored:
> >
> > 1. Is there a design for how PCR 10 is restored?
> 
> I don't see anything that does that at present.
> 
> > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > restored?
> 
> They're part of the running kernel state, so should re-appear without
> any special casing. However, in the absence of anything repopulating
> PCR 10, they'll no longer match the in-TPM value.

This feature could still be supported if IMA is disabled
in the kernel configuration, which I see as a non-issue as
long as config flag checks are there.

BR, Jarkko
Evan Green Sept. 7, 2022, 8:47 p.m. UTC | #4
On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > >
> > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > the hibernate image prevents a malicious userspace from escalating to
> > > > kernel execution via hibernation resume.  [snip]
> > >
> > > I have a related question.
> > >
> > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > hibernate image is restored:
> > >
> > > 1. Is there a design for how PCR 10 is restored?
> >
> > I don't see anything that does that at present.
> >
> > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > restored?
> >
> > They're part of the running kernel state, so should re-appear without
> > any special casing. However, in the absence of anything repopulating
> > PCR 10, they'll no longer match the in-TPM value.
>
> This feature could still be supported, if IMA is disabled
> in the kernel configuration, which I see a non-issue as
> long as config flag checks are there.

Right, from what I understand about IMA, the TPM's PCR getting out of
sync with the in-kernel measurement list across a hibernate (because
TPM is reset) or kexec() (because in-memory list gets reset) is
already a problem. This series doesn't really address that, in that it
doesn't really make that situation better or worse.

-Evan
Mimi Zohar Sept. 7, 2022, 11:57 p.m. UTC | #5
On Wed, 2022-09-07 at 13:47 -0700, Evan Green wrote:
> On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >
> > On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > > >
> > > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > > the hibernate image prevents a malicious userspace from escalating to
> > > > > kernel execution via hibernation resume.  [snip]
> > > >
> > > > I have a related question.
> > > >
> > > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > > hibernate image is restored:
> > > >
> > > > 1. Is there a design for how PCR 10 is restored?
> > >
> > > I don't see anything that does that at present.
> > >
> > > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > > restored?
> > >
> > > They're part of the running kernel state, so should re-appear without
> > > any special casing. However, in the absence of anything repopulating
> > > PCR 10, they'll no longer match the in-TPM value.
> >
> > This feature could still be supported, if IMA is disabled
> > in the kernel configuration, which I see a non-issue as
> > long as config flag checks are there.
> 
> Right, from what I understand about IMA, the TPM's PCR getting out of
> sync with the in-kernel measurement list across a hibernate (because
> TPM is reset) or kexec() (because in-memory list gets reset) is
> already a problem. This series doesn't really address that, in that it
> doesn't really make that situation better or worse.

For kexec, the PCRs are not reset, so the IMA measurement list needs to
be carried across kexec and restored.  This is now being done on most
architectures.  Afterwards, the IMA measurement list does match the
PCRs.

Hibernation introduces a different situation, where the PCRs are
reset, but the measurement list is restored, resulting in their not
matching.
Jarkko Sakkinen Sept. 8, 2022, 5:25 a.m. UTC | #6
On Wed, Sep 07, 2022 at 07:57:27PM -0400, Mimi Zohar wrote:
> On Wed, 2022-09-07 at 13:47 -0700, Evan Green wrote:
> > On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > >
> > > On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > > > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > > > >
> > > > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > > > the hibernate image prevents a malicious userspace from escalating to
> > > > > > kernel execution via hibernation resume.  [snip]
> > > > >
> > > > > I have a related question.
> > > > >
> > > > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > > > hibernate image is restored:
> > > > >
> > > > > 1. Is there a design for how PCR 10 is restored?
> > > >
> > > > I don't see anything that does that at present.
> > > >
> > > > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > > > restored?
> > > >
> > > > They're part of the running kernel state, so should re-appear without
> > > > any special casing. However, in the absence of anything repopulating
> > > > PCR 10, they'll no longer match the in-TPM value.
> > >
> > > This feature could still be supported, if IMA is disabled
> > > in the kernel configuration, which I see a non-issue as
> > > long as config flag checks are there.
> > 
> > Right, from what I understand about IMA, the TPM's PCR getting out of
> > sync with the in-kernel measurement list across a hibernate (because
> > TPM is reset) or kexec() (because in-memory list gets reset) is
> > already a problem. This series doesn't really address that, in that it
> > doesn't really make that situation better or worse.
> 
> For kexec, the PCRs are not reset, so the IMA measurement list needs to
> be carried across kexec and restored.  This is now being done on most
> architectures.  Afterwards, the IMA measurement list does match the
> PCRs.
> 
> Hibernation introduces a different situation, where the PCRs are
> reset, but the measurement list is restored, resulting in their not
> matching.

As I said earlier the feature still can be supported if
kernel does not use IMA but obviously needs to be flagged.

BR, Jarkko
Mimi Zohar Sept. 11, 2022, 2:40 a.m. UTC | #7
On Thu, 2022-09-08 at 08:25 +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 07, 2022 at 07:57:27PM -0400, Mimi Zohar wrote:
> > On Wed, 2022-09-07 at 13:47 -0700, Evan Green wrote:
> > > On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > >
> > > > On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > > > > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > > > > >
> > > > > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > > > > the hibernate image prevents a malicious userspace from escalating to
> > > > > > > kernel execution via hibernation resume.  [snip]
> > > > > >
> > > > > > I have a related question.
> > > > > >
> > > > > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > > > > hibernate image is restored:
> > > > > >
> > > > > > 1. Is there a design for how PCR 10 is restored?
> > > > >
> > > > > I don't see anything that does that at present.
> > > > >
> > > > > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > > > > restored?
> > > > >
> > > > > They're part of the running kernel state, so should re-appear without
> > > > > any special casing. However, in the absence of anything repopulating
> > > > > PCR 10, they'll no longer match the in-TPM value.
> > > >
> > > > This feature could still be supported, if IMA is disabled
> > > > in the kernel configuration, which I see a non-issue as
> > > > long as config flag checks are there.
> > > 
> > > Right, from what I understand about IMA, the TPM's PCR getting out of
> > > sync with the in-kernel measurement list across a hibernate (because
> > > TPM is reset) or kexec() (because in-memory list gets reset) is
> > > already a problem. This series doesn't really address that, in that it
> > > doesn't really make that situation better or worse.
> > 
> > For kexec, the PCRs are not reset, so the IMA measurement list needs to
> > be carried across kexec and restored.  This is now being done on most
> > architectures.  Afterwards, the IMA measurement list does match the
> > PCRs.
> > 
> > Hibernation introduces a different situation, where the PCRs are
> > reset, but the measurement list is restored, resulting in their not
> > matching.
> 
> As I said earlier the feature still can be supported if
> kernel does not use IMA but obviously needs to be flagged.

Jumping to the conclusion that "hibernate" is acceptable for non-IMA
enabled kernels misses the security implications of mixing (kexec) non-
IMA and IMA enabled kernels. 
I would prefer some sort of hibernate marker, the equivalent of a
"boot_aggregate" record.
Jarkko Sakkinen Sept. 20, 2022, 4:36 a.m. UTC | #8
On Sat, Sep 10, 2022 at 10:40:05PM -0400, Mimi Zohar wrote:
> On Thu, 2022-09-08 at 08:25 +0300, Jarkko Sakkinen wrote:
> > On Wed, Sep 07, 2022 at 07:57:27PM -0400, Mimi Zohar wrote:
> > > On Wed, 2022-09-07 at 13:47 -0700, Evan Green wrote:
> > > > On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > > >
> > > > > On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > > > > > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > > > > > >
> > > > > > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > > > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > > > > > the hibernate image prevents a malicious userspace from escalating to
> > > > > > > > kernel execution via hibernation resume.  [snip]
> > > > > > >
> > > > > > > I have a related question.
> > > > > > >
> > > > > > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > > > > > hibernate image is restored:
> > > > > > >
> > > > > > > 1. Is there a design for how PCR 10 is restored?
> > > > > >
> > > > > > I don't see anything that does that at present.
> > > > > >
> > > > > > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > > > > > restored?
> > > > > >
> > > > > > They're part of the running kernel state, so should re-appear without
> > > > > > any special casing. However, in the absence of anything repopulating
> > > > > > PCR 10, they'll no longer match the in-TPM value.
> > > > >
> > > > > This feature could still be supported, if IMA is disabled
> > > > > in the kernel configuration, which I see a non-issue as
> > > > > long as config flag checks are there.
> > > > 
> > > > Right, from what I understand about IMA, the TPM's PCR getting out of
> > > > sync with the in-kernel measurement list across a hibernate (because
> > > > TPM is reset) or kexec() (because in-memory list gets reset) is
> > > > already a problem. This series doesn't really address that, in that it
> > > > doesn't really make that situation better or worse.
> > > 
> > > For kexec, the PCRs are not reset, so the IMA measurement list needs to
> > > be carried across kexec and restored.  This is now being done on most
> > > architectures.  Afterwards, the IMA measurement list does match the
> > > PCRs.
> > > 
> > > Hibernation introduces a different situation, where the PCRs are
> > > reset, but the measurement list is restored, resulting in their not
> > > matching.
> > 
> > As I said earlier the feature still can be supported if
> > kernel does not use IMA but obviously needs to be flagged.
> 
> Jumping to the conclusion that "hibernate" is acceptable for non-IMA
> enabled kernels misses the security implications of mixing (kexec) non-
> IMA and IMA enabled kernels. 
> I would prefer some sort of hibernate marker, the equivalent of a
> "boot_aggregate" record.

Not sure if this matters. If you run a kernel which is not aware
of IMA, it's your choice. I don't understand why it is so important
here to protect the user from making illogical decisions.

If you want non-IMA kernels to support IMA, CONFIG_IMA should
probably not even exist, because you are essentially saying that any
kernel should play well with IMA.

BR, Jarkko
Mimi Zohar Sept. 21, 2022, 8:15 p.m. UTC | #9
On Tue, 2022-09-20 at 07:36 +0300, Jarkko Sakkinen wrote:
> On Sat, Sep 10, 2022 at 10:40:05PM -0400, Mimi Zohar wrote:
> > On Thu, 2022-09-08 at 08:25 +0300, Jarkko Sakkinen wrote:
> > > On Wed, Sep 07, 2022 at 07:57:27PM -0400, Mimi Zohar wrote:
> > > > On Wed, 2022-09-07 at 13:47 -0700, Evan Green wrote:
> > > > > On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > > > >
> > > > > > On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > > > > > > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > > > > > > >
> > > > > > > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > > > > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > > > > > > the hibernate image prevents a malicious userspace from escalating to
> > > > > > > > > kernel execution via hibernation resume.  [snip]
> > > > > > > >
> > > > > > > > I have a related question.
> > > > > > > >
> > > > > > > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > > > > > > hibernate image is restored:
> > > > > > > >
> > > > > > > > 1. Is there a design for how PCR 10 is restored?
> > > > > > >
> > > > > > > I don't see anything that does that at present.
> > > > > > >
> > > > > > > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > > > > > > restored?
> > > > > > >
> > > > > > > They're part of the running kernel state, so should re-appear without
> > > > > > > any special casing. However, in the absence of anything repopulating
> > > > > > > PCR 10, they'll no longer match the in-TPM value.
> > > > > >
> > > > > > This feature could still be supported, if IMA is disabled
> > > > > > in the kernel configuration, which I see a non-issue as
> > > > > > long as config flag checks are there.
> > > > > 
> > > > > Right, from what I understand about IMA, the TPM's PCR getting out of
> > > > > sync with the in-kernel measurement list across a hibernate (because
> > > > > TPM is reset) or kexec() (because in-memory list gets reset) is
> > > > > already a problem. This series doesn't really address that, in that it
> > > > > doesn't really make that situation better or worse.
> > > > 
> > > > For kexec, the PCRs are not reset, so the IMA measurement list needs to
> > > > be carried across kexec and restored.  This is now being done on most
> > > > architectures.  Afterwards, the IMA measurement list does match the
> > > > PCRs.
> > > > 
> > > > Hibernation introduces a different situation, where the PCRs are
> > > > reset, but the measurement list is restored, resulting in their not
> > > > matching.
> > > 
> > > As I said earlier the feature still can be supported if
> > > kernel does not use IMA but obviously needs to be flagged.
> > 
> > Jumping to the conclusion that "hibernate" is acceptable for non-IMA
> > enabled kernels misses the security implications of mixing (kexec) non-
> > IMA and IMA enabled kernels. 
> > I would prefer some sort of hibernate marker, the equivalent of a
> > "boot_aggregate" record.
> 
> Not sure if this matters. If you run a kernel, which is not aware
> of IMA, it's your choice. I don't understand why here is so important
> to protect user from doing illogical decisions.
> 
> If you want non-IMA kernels to support IMA, CONFIG_IMA should not
> probably even exist because you are essentially saying that any
> kernel play well with IMA.

That will never happen, nor am I suggesting it should.

Enabling hibernate or IMA shouldn't be an either-or decision, if at all
possible.  The main concern is that attestation servers be able to
detect hibernation and possibly the loss of measurement
history.  Luckily, although the PCRs are reset, the TPM
pcrUpdateCounter is not.

I would appreciate including a "hibernate" marker, similar to the
"boot_aggregate".

Mimi
Jarkko Sakkinen Sept. 23, 2022, 1:30 p.m. UTC | #10
On Wed, Sep 21, 2022 at 04:15:20PM -0400, Mimi Zohar wrote:
> On Tue, 2022-09-20 at 07:36 +0300, Jarkko Sakkinen wrote:
> > On Sat, Sep 10, 2022 at 10:40:05PM -0400, Mimi Zohar wrote:
> > > On Thu, 2022-09-08 at 08:25 +0300, Jarkko Sakkinen wrote:
> > > > On Wed, Sep 07, 2022 at 07:57:27PM -0400, Mimi Zohar wrote:
> > > > > On Wed, 2022-09-07 at 13:47 -0700, Evan Green wrote:
> > > > > > On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > > > > >
> > > > > > > On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > > > > > > > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > > > > > > > >
> > > > > > > > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > > > > > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > > > > > > > the hibernate image prevents a malicious userspace from escalating to
> > > > > > > > > > kernel execution via hibernation resume.  [snip]
> > > > > > > > >
> > > > > > > > > I have a related question.
> > > > > > > > >
> > > > > > > > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > > > > > > > hibernate image is restored:
> > > > > > > > >
> > > > > > > > > 1. Is there a design for how PCR 10 is restored?
> > > > > > > >
> > > > > > > > I don't see anything that does that at present.
> > > > > > > >
> > > > > > > > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > > > > > > > restored?
> > > > > > > >
> > > > > > > > They're part of the running kernel state, so should re-appear without
> > > > > > > > any special casing. However, in the absence of anything repopulating
> > > > > > > > PCR 10, they'll no longer match the in-TPM value.
> > > > > > >
> > > > > > > This feature could still be supported, if IMA is disabled
> > > > > > > in the kernel configuration, which I see a non-issue as
> > > > > > > long as config flag checks are there.
> > > > > > 
> > > > > > Right, from what I understand about IMA, the TPM's PCR getting out of
> > > > > > sync with the in-kernel measurement list across a hibernate (because
> > > > > > TPM is reset) or kexec() (because in-memory list gets reset) is
> > > > > > already a problem. This series doesn't really address that, in that it
> > > > > > doesn't really make that situation better or worse.
> > > > > 
> > > > > For kexec, the PCRs are not reset, so the IMA measurement list needs to
> > > > > be carried across kexec and restored.  This is now being done on most
> > > > > architectures.  Afterwards, the IMA measurement list does match the
> > > > > PCRs.
> > > > > 
> > > > > Hibernation introduces a different situation, where the PCRs are
> > > > > reset, but the measurement list is restored, resulting in their not
> > > > > matching.
> > > > 
> > > > As I said earlier the feature still can be supported if
> > > > kernel does not use IMA but obviously needs to be flagged.
> > > 
> > > Jumping to the conclusion that "hibernate" is acceptable for non-IMA
> > > enabled kernels misses the security implications of mixing (kexec) non-
> > > IMA and IMA enabled kernels. 
> > > I would prefer some sort of hibernate marker, the equivalent of a
> > > "boot_aggregate" record.
> > 
> > Not sure if this matters. If you run a kernel, which is not aware
> > of IMA, it's your choice. I don't understand why here is so important
> > to protect user from doing illogical decisions.
> > 
> > If you want non-IMA kernels to support IMA, CONFIG_IMA should not
> > probably even exist because you are essentially saying that any
> > kernel play well with IMA.
> 
> That will never happen, nor am I suggesting it should.
> 
> Enabling hibernate or IMA shouldn't be an either-or decision, if at all
> possible.  The main concern is that attestation servers be able to
> detect hibernation and possibly the loss of measurement
> history.  Luckily, although the PCRs are reset, the TPM
> pcrUpdateCounter is not.
> 
> I would appreciate including a "hibernate" marker, similar to the
> "boot_aggregate".

Yeah, I guess that would not do harm.

BR, Jarkko
Evan Green Sept. 27, 2022, 4:03 p.m. UTC | #11
On Fri, Sep 23, 2022 at 6:30 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> On Wed, Sep 21, 2022 at 04:15:20PM -0400, Mimi Zohar wrote:
> > On Tue, 2022-09-20 at 07:36 +0300, Jarkko Sakkinen wrote:
> > > On Sat, Sep 10, 2022 at 10:40:05PM -0400, Mimi Zohar wrote:
> > > > On Thu, 2022-09-08 at 08:25 +0300, Jarkko Sakkinen wrote:
> > > > > On Wed, Sep 07, 2022 at 07:57:27PM -0400, Mimi Zohar wrote:
> > > > > > On Wed, 2022-09-07 at 13:47 -0700, Evan Green wrote:
> > > > > > > On Tue, Aug 30, 2022 at 7:48 PM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, Aug 29, 2022 at 02:51:50PM -0700, Matthew Garrett wrote:
> > > > > > > > > On Mon, Aug 29, 2022 at 2:45 PM Ken Goldman <kgold@linux.ibm.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On 5/4/2022 7:20 PM, Evan Green wrote:
> > > > > > > > > > > Enabling the kernel to be able to do encryption and integrity checks on
> > > > > > > > > > > the hibernate image prevents a malicious userspace from escalating to
> > > > > > > > > > > kernel execution via hibernation resume.  [snip]
> > > > > > > > > >
> > > > > > > > > > I have a related question.
> > > > > > > > > >
> > > > > > > > > > When a TPM powers up from hibernation, PCR 10 is reset.  When a
> > > > > > > > > > hibernate image is restored:
> > > > > > > > > >
> > > > > > > > > > 1. Is there a design for how PCR 10 is restored?
> > > > > > > > >
> > > > > > > > > I don't see anything that does that at present.
> > > > > > > > >
> > > > > > > > > > 2. How are /sys/kernel/security/ima/[pseudofiles] saved and
> > > > > > > > > > restored?
> > > > > > > > >
> > > > > > > > > They're part of the running kernel state, so should re-appear without
> > > > > > > > > any special casing. However, in the absence of anything repopulating
> > > > > > > > > PCR 10, they'll no longer match the in-TPM value.
> > > > > > > >
> > > > > > > > This feature could still be supported, if IMA is disabled
> > > > > > > > in the kernel configuration, which I see a non-issue as
> > > > > > > > long as config flag checks are there.
> > > > > > >
> > > > > > > Right, from what I understand about IMA, the TPM's PCR getting out of
> > > > > > > sync with the in-kernel measurement list across a hibernate (because
> > > > > > > TPM is reset) or kexec() (because in-memory list gets reset) is
> > > > > > > already a problem. This series doesn't really address that, in that it
> > > > > > > doesn't really make that situation better or worse.
> > > > > >
> > > > > > For kexec, the PCRs are not reset, so the IMA measurement list needs to
> > > > > > be carried across kexec and restored.  This is now being done on most
> > > > > > architectures.  Afterwards, the IMA measurement list does match the
> > > > > > PCRs.
> > > > > >
> > > > > > Hibernation introduces a different situation, where the PCRs are
> > > > > > reset, but the measurement list is restored, resulting in their not
> > > > > > matching.
> > > > >
> > > > > As I said earlier the feature still can be supported if
> > > > > kernel does not use IMA but obviously needs to be flagged.
> > > >
> > > > Jumping to the conclusion that "hibernate" is acceptable for non-IMA
> > > > enabled kernels misses the security implications of mixing (kexec) non-
> > > > IMA and IMA enabled kernels.
> > > > I would prefer some sort of hibernate marker, the equivalent of a
> > > > "boot_aggregate" record.
> > >
> > > Not sure if this matters. If you run a kernel, which is not aware
> > > of IMA, it's your choice. I don't understand why here is so important
> > > to protect user from doing illogical decisions.
> > >
> > > If you want non-IMA kernels to support IMA, CONFIG_IMA should not
> > > probably even exist because you are essentially saying that any
> > > kernel play well with IMA.
> >
> > That will never happen, nor am I suggesting it should.
> >
> > Enabling hibernate or IMA shouldn't be an either-or decision, if at all
> > possible.  The main concern is that attestation servers be able to
> > detect hibernation and possibly the loss of measurement
> > history.  Luckily, although the PCRs are reset, the TPM
> > pcrUpdateCounter is not.
> >
> > I would appreciate including a "hibernate" marker, similar to the
> > "boot_aggregate".
>
> Yeah, I guess that would not do harm.

I think I understand it. It's pretty much exactly a boot_aggregate
marker that we want, correct?

Should it have its own name, or is it sufficient to simply infer that
a boot_aggregate marker that isn't the first item in the list must
come from hibernate resume?

Should it include PCR10, to essentially say "the resuming system may
have extended this, but we can't reason about it and simply treat it
as a starting value"?
-Evan

>
> BR, Jarkko
Jonathan McDowell Sept. 28, 2022, 9:42 a.m. UTC | #12
On Tue, Sep 27, 2022 at 09:03:21AM -0700, Evan Green wrote:
> On Fri, Sep 23, 2022 at 6:30 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >
> > On Wed, Sep 21, 2022 at 04:15:20PM -0400, Mimi Zohar wrote:
> >
> > > Enabling hibernate or IMA shouldn't be an either-or decision, if at all
> > > possible.  The main concern is that attestation servers be able to
> > > detect hibernation and possibly the loss of measurement
> > > history.  Luckily, although the PCRs are reset, the TPM
> > > pcrUpdateCounter is not.
> > >
> > > I would appreciate including a "hibernate" marker, similar to the
> > > "boot_aggregate".
> >
> > Yeah, I guess that would not do harm.
> 
> I think I understand it. It's pretty much exactly a boot_aggregate
> marker that we want, correct?
> 
> Should it have its own name, or is it sufficient to simply infer that
> a boot_aggregate marker that isn't the first item in the list must
> come from hibernate resume?

I think it should have its own name, because a subsequent boot_aggregate
is inserted when we kexec into a new kernel.


J.

Patch

diff --git a/Documentation/power/userland-swsusp.rst b/Documentation/power/userland-swsusp.rst
index 1cf62d80a9ca10..f759915a78ce98 100644
--- a/Documentation/power/userland-swsusp.rst
+++ b/Documentation/power/userland-swsusp.rst
@@ -115,6 +115,14 @@  SNAPSHOT_S2RAM
 	to resume the system from RAM if there's enough battery power or restore
 	its state on the basis of the saved suspend image otherwise)
 
+SNAPSHOT_ENABLE_ENCRYPTION
+	Enables encryption of the hibernate image within the kernel. Upon suspend
+	(i.e. when the snapshot device was opened for reading), returns a blob
+	representing the random encryption key the kernel created to encrypt the
+	hibernate image with. Upon resume (i.e. when the snapshot device was opened
+	for writing), receives a blob from usermode containing the key material
+	previously returned during hibernate.
+
 The device's read() operation can be used to transfer the snapshot image from
 the kernel.  It has the following limitations:
 
diff --git a/include/uapi/linux/suspend_ioctls.h b/include/uapi/linux/suspend_ioctls.h
index bcce04e21c0dce..b73026ef824bb9 100644
--- a/include/uapi/linux/suspend_ioctls.h
+++ b/include/uapi/linux/suspend_ioctls.h
@@ -13,6 +13,18 @@  struct resume_swap_area {
 	__u32 dev;
 } __attribute__((packed));
 
+#define USWSUSP_KEY_NONCE_SIZE 16
+
+/*
+ * This structure is used to pass the kernel's hibernate encryption key in
+ * either direction.
+ */
+struct uswsusp_key_blob {
+	__u32 blob_len;
+	__u8 blob[512];
+	__u8 nonce[USWSUSP_KEY_NONCE_SIZE];
+} __attribute__((packed));
+
 #define SNAPSHOT_IOC_MAGIC	'3'
 #define SNAPSHOT_FREEZE			_IO(SNAPSHOT_IOC_MAGIC, 1)
 #define SNAPSHOT_UNFREEZE		_IO(SNAPSHOT_IOC_MAGIC, 2)
@@ -29,6 +41,7 @@  struct resume_swap_area {
 #define SNAPSHOT_PREF_IMAGE_SIZE	_IO(SNAPSHOT_IOC_MAGIC, 18)
 #define SNAPSHOT_AVAIL_SWAP_SIZE	_IOR(SNAPSHOT_IOC_MAGIC, 19, __kernel_loff_t)
 #define SNAPSHOT_ALLOC_SWAP_PAGE	_IOR(SNAPSHOT_IOC_MAGIC, 20, __kernel_loff_t)
-#define SNAPSHOT_IOC_MAXNR	20
+#define SNAPSHOT_ENABLE_ENCRYPTION	_IOWR(SNAPSHOT_IOC_MAGIC, 21, struct uswsusp_key_blob)
+#define SNAPSHOT_IOC_MAXNR	21
 
 #endif /* _LINUX_SUSPEND_IOCTLS_H */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index a12779650f1529..8249968962bcd5 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -92,6 +92,19 @@  config HIBERNATION_SNAPSHOT_DEV
 
 	  If in doubt, say Y.
 
+config ENCRYPTED_HIBERNATION
+	bool "Encryption support for userspace snapshots"
+	depends on HIBERNATION_SNAPSHOT_DEV
+	depends on CRYPTO_AEAD2=y
+	default n
+	help
+	  Enable support for kernel-based encryption of hibernation snapshots
+	  created by uswsusp tools.
+
+	  Say N if userspace handles the image encryption.
+
+	  If in doubt, say N.
+
 config PM_STD_PARTITION
 	string "Default resume partition"
 	depends on HIBERNATION
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 874ad834dc8daf..7be08f2e0e3b68 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -16,6 +16,7 @@  obj-$(CONFIG_SUSPEND)		+= suspend.o
 obj-$(CONFIG_PM_TEST_SUSPEND)	+= suspend_test.o
 obj-$(CONFIG_HIBERNATION)	+= hibernate.o snapshot.o swap.o
 obj-$(CONFIG_HIBERNATION_SNAPSHOT_DEV) += user.o
+obj-$(CONFIG_ENCRYPTED_HIBERNATION) += snapenc.o
 obj-$(CONFIG_PM_AUTOSLEEP)	+= autosleep.o
 obj-$(CONFIG_PM_WAKELOCKS)	+= wakelock.o
 
diff --git a/kernel/power/snapenc.c b/kernel/power/snapenc.c
new file mode 100644
index 00000000000000..cb90692d6ab83a
--- /dev/null
+++ b/kernel/power/snapenc.c
@@ -0,0 +1,491 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* This file provides encryption support for system snapshots. */
+
+#include <linux/crypto.h>
+#include <crypto/aead.h>
+#include <crypto/gcm.h>
+#include <linux/random.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+#include "power.h"
+#include "user.h"
+
+/* Encrypt more data from the snapshot into the staging area. */
+static int snapshot_encrypt_refill(struct snapshot_data *data)
+{
+
+	u8 nonce[GCM_AES_IV_SIZE];
+	int pg_idx;
+	int res;
+	struct aead_request *req = data->aead_req;
+	DECLARE_CRYPTO_WAIT(wait);
+	size_t total = 0;
+
+	/*
+	 * The first buffer is the associated data, set to the offset to prevent
+	 * attacks that rearrange chunks.
+	 */
+	sg_set_buf(&data->sg[0], &data->crypt_total, sizeof(data->crypt_total));
+
+	/* Load the crypt buffer with snapshot pages. */
+	for (pg_idx = 0; pg_idx < CHUNK_SIZE; pg_idx++) {
+		void *buf = data->crypt_pages[pg_idx];
+
+		res = snapshot_read_next(&data->handle);
+		if (res < 0)
+			return res;
+		if (res == 0)
+			break;
+
+		WARN_ON(res != PAGE_SIZE);
+
+		/*
+		 * Copy the page into the staging area. A future optimization
+		 * could potentially skip this copy for lowmem pages.
+		 */
+		memcpy(buf, data_of(data->handle), PAGE_SIZE);
+		sg_set_buf(&data->sg[1 + pg_idx], buf, PAGE_SIZE);
+		total += PAGE_SIZE;
+	}
+
+	sg_set_buf(&data->sg[1 + pg_idx], &data->auth_tag, SNAPSHOT_AUTH_TAG_SIZE);
+	aead_request_set_callback(req, 0, crypto_req_done, &wait);
+	/*
+	 * Use incrementing nonces for each chunk, since a 64 bit value won't
+	 * roll into re-use for any given hibernate image.
+	 */
+	memcpy(&nonce[0], &data->nonce_low, sizeof(data->nonce_low));
+	memcpy(&nonce[sizeof(data->nonce_low)],
+	       &data->nonce_high,
+	       sizeof(nonce) - sizeof(data->nonce_low));
+
+	data->nonce_low += 1;
+	/* Total does not include AAD or the auth tag. */
+	aead_request_set_crypt(req, data->sg, data->sg, total, nonce);
+	res = crypto_wait_req(crypto_aead_encrypt(req), &wait);
+	if (res)
+		return res;
+
+	data->crypt_size = total;
+	data->crypt_total += total;
+	return 0;
+}
+
+/* Decrypt data from the staging area and push it to the snapshot. */
+static int snapshot_decrypt_drain(struct snapshot_data *data)
+{
+	u8 nonce[GCM_AES_IV_SIZE];
+	int page_count;
+	int pg_idx;
+	int res;
+	struct aead_request *req = data->aead_req;
+	DECLARE_CRYPTO_WAIT(wait);
+	size_t total;
+
+	/* Set up the associated data. */
+	sg_set_buf(&data->sg[0], &data->crypt_total, sizeof(data->crypt_total));
+
+	/*
+	 * Get the number of full pages, which could be short at the end. There
+	 * should also be a tag at the end, so the offset won't be an even page.
+	 */
+	page_count = data->crypt_offset >> PAGE_SHIFT;
+	total = page_count << PAGE_SHIFT;
+	if ((total == 0) || (total == data->crypt_offset))
+		return -EINVAL;
+
+	/*
+	 * Load the sg list with the crypt buffer. Inline decrypt back into the
+	 * staging buffer. A future optimization could decrypt directly into
+	 * lowmem pages.
+	 */
+	for (pg_idx = 0; pg_idx < page_count; pg_idx++)
+		sg_set_buf(&data->sg[1 + pg_idx], data->crypt_pages[pg_idx], PAGE_SIZE);
+
+	/*
+	 * It's possible this is the final decrypt, and there are fewer than
+	 * CHUNK_SIZE pages. If this is the case we would have just written the
+	 * auth tag into the first few bytes of a new page. Copy to the tag if
+	 * so.
+	 */
+	if ((page_count < CHUNK_SIZE) &&
+	    (data->crypt_offset - total) == sizeof(data->auth_tag)) {
+
+		memcpy(data->auth_tag,
+			data->crypt_pages[pg_idx],
+			sizeof(data->auth_tag));
+
+	} else if (data->crypt_offset !=
+		   ((CHUNK_SIZE << PAGE_SHIFT) + SNAPSHOT_AUTH_TAG_SIZE)) {
+
+		return -EINVAL;
+	}
+
+	sg_set_buf(&data->sg[1 + pg_idx], &data->auth_tag, SNAPSHOT_AUTH_TAG_SIZE);
+	aead_request_set_callback(req, 0, crypto_req_done, &wait);
+	memcpy(&nonce[0], &data->nonce_low, sizeof(data->nonce_low));
+	memcpy(&nonce[sizeof(data->nonce_low)],
+	       &data->nonce_high,
+	       sizeof(nonce) - sizeof(data->nonce_low));
+
+	data->nonce_low += 1;
+	aead_request_set_crypt(req, data->sg, data->sg, total + SNAPSHOT_AUTH_TAG_SIZE, nonce);
+	res = crypto_wait_req(crypto_aead_decrypt(req), &wait);
+	if (res)
+		return res;
+
+	data->crypt_size = 0;
+	data->crypt_offset = 0;
+
+	/* Push the decrypted pages further down the stack. */
+	total = 0;
+	for (pg_idx = 0; pg_idx < page_count; pg_idx++) {
+		void *buf = data->crypt_pages[pg_idx];
+
+		res = snapshot_write_next(&data->handle);
+		if (res < 0)
+			return res;
+		if (res == 0)
+			break;
+
+		if (!data_of(data->handle))
+			return -EINVAL;
+
+		WARN_ON(res != PAGE_SIZE);
+
+		/*
+		 * Copy the decrypted page out of the staging area and into
+		 * the page supplied by the snapshot core.
+		 */
+		memcpy(data_of(data->handle), buf, PAGE_SIZE);
+		total += PAGE_SIZE;
+	}
+
+	data->crypt_total += total;
+	return 0;
+}
+
+static ssize_t snapshot_read_next_encrypted(struct snapshot_data *data,
+					    void **buf)
+{
+	size_t tag_off;
+
+	/* Refill the encrypted buffer if it's empty. */
+	if ((data->crypt_size == 0) ||
+	    (data->crypt_offset >=
+	     (data->crypt_size + SNAPSHOT_AUTH_TAG_SIZE))) {
+
+		int rc;
+
+		data->crypt_size = 0;
+		data->crypt_offset = 0;
+		rc = snapshot_encrypt_refill(data);
+		if (rc < 0)
+			return rc;
+	}
+
+	/* Return data pages if the offset is in that region. */
+	if (data->crypt_offset < data->crypt_size) {
+		size_t pg_idx = data->crypt_offset >> PAGE_SHIFT;
+		size_t pg_off = data->crypt_offset & (PAGE_SIZE - 1);
+		*buf = data->crypt_pages[pg_idx] + pg_off;
+		return PAGE_SIZE - pg_off;
+	}
+
+	/* Use offsets just beyond the size to return the tag. */
+	tag_off = data->crypt_offset - data->crypt_size;
+	if (tag_off > SNAPSHOT_AUTH_TAG_SIZE)
+		tag_off = SNAPSHOT_AUTH_TAG_SIZE;
+
+	*buf = data->auth_tag + tag_off;
+	return SNAPSHOT_AUTH_TAG_SIZE - tag_off;
+}
+
+static ssize_t snapshot_write_next_encrypted(struct snapshot_data *data,
+					     void **buf)
+{
+	size_t tag_off;
+
+	/* Return data pages if the offset is in that region. */
+	if (data->crypt_offset < (PAGE_SIZE * CHUNK_SIZE)) {
+		size_t pg_idx = data->crypt_offset >> PAGE_SHIFT;
+		size_t pg_off = data->crypt_offset & (PAGE_SIZE - 1);
+		*buf = data->crypt_pages[pg_idx] + pg_off;
+		return PAGE_SIZE - pg_off;
+	}
+
+	/* Use offsets just beyond the size to return the tag. */
+	tag_off = data->crypt_offset - (PAGE_SIZE * CHUNK_SIZE);
+	if (tag_off > SNAPSHOT_AUTH_TAG_SIZE)
+		tag_off = SNAPSHOT_AUTH_TAG_SIZE;
+
+	*buf = data->auth_tag + tag_off;
+	return SNAPSHOT_AUTH_TAG_SIZE - tag_off;
+}
+
+ssize_t snapshot_read_encrypted(struct snapshot_data *data,
+	char __user *buf, size_t count, loff_t *offp)
+{
+	ssize_t total = 0;
+
+	/* Loop getting buffers of varying sizes and copying to userspace. */
+	while (count) {
+		size_t copy_size;
+		size_t not_done;
+		void *src;
+		ssize_t src_size = snapshot_read_next_encrypted(data, &src);
+
+		if (src_size <= 0) {
+			if (total == 0)
+				return src_size;
+
+			break;
+		}
+
+		copy_size = min(count, (size_t)src_size);
+		not_done = copy_to_user(buf + total, src, copy_size);
+		copy_size -= not_done;
+		total += copy_size;
+		count -= copy_size;
+		data->crypt_offset += copy_size;
+		if (copy_size == 0) {
+			if (total == 0)
+				return -EFAULT;
+
+			break;
+		}
+	}
+
+	*offp += total;
+	return total;
+}
+
+ssize_t snapshot_write_encrypted(struct snapshot_data *data,
+	const char __user *buf, size_t count, loff_t *offp)
+{
+	ssize_t total = 0;
+
+	/* Loop getting buffers of varying sizes and copying from userspace. */
+	while (count) {
+		size_t copy_size;
+		size_t not_done;
+		void *dst;
+		ssize_t dst_size = snapshot_write_next_encrypted(data, &dst);
+
+		if (dst_size <= 0) {
+			if (total == 0)
+				return dst_size;
+
+			break;
+		}
+
+		copy_size = min(count, (size_t)dst_size);
+		not_done = copy_from_user(dst, buf + total, copy_size);
+		copy_size -= not_done;
+		total += copy_size;
+		count -= copy_size;
+		data->crypt_offset += copy_size;
+		if (copy_size == 0) {
+			if (total == 0)
+				return -EFAULT;
+
+			break;
+		}
+
+		/* Drain the encrypted buffer if it's full. */
+		if ((data->crypt_offset >=
+		    ((PAGE_SIZE * CHUNK_SIZE) + SNAPSHOT_AUTH_TAG_SIZE))) {
+
+			int rc;
+
+			rc = snapshot_decrypt_drain(data);
+			if (rc < 0)
+				return rc;
+		}
+	}
+
+	*offp += total;
+	return total;
+}
+
+void snapshot_teardown_encryption(struct snapshot_data *data)
+{
+	int i;
+
+	if (data->aead_req) {
+		aead_request_free(data->aead_req);
+		data->aead_req = NULL;
+	}
+
+	if (data->aead_tfm) {
+		crypto_free_aead(data->aead_tfm);
+		data->aead_tfm = NULL;
+	}
+
+	for (i = 0; i < CHUNK_SIZE; i++) {
+		if (data->crypt_pages[i]) {
+			free_page((unsigned long)data->crypt_pages[i]);
+			data->crypt_pages[i] = NULL;
+		}
+	}
+}
+
+static int snapshot_setup_encryption_common(struct snapshot_data *data)
+{
+	int i, rc;
+
+	data->crypt_total = 0;
+	data->crypt_offset = 0;
+	data->crypt_size = 0;
+	memset(data->crypt_pages, 0, sizeof(data->crypt_pages));
+	/* This only works once per hibernate. */
+	if (data->aead_tfm)
+		return -EINVAL;
+
+	/* Set up the encryption transform */
+	data->aead_tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
+	if (IS_ERR(data->aead_tfm)) {
+		rc = PTR_ERR(data->aead_tfm);
+		data->aead_tfm = NULL;
+		return rc;
+	}
+
+	rc = -ENOMEM;
+	data->aead_req = aead_request_alloc(data->aead_tfm, GFP_KERNEL);
+	if (data->aead_req == NULL)
+		goto setup_fail;
+
+	/* Allocate the staging area */
+	for (i = 0; i < CHUNK_SIZE; i++) {
+		data->crypt_pages[i] = (void *)__get_free_page(GFP_ATOMIC);
+		if (data->crypt_pages[i] == NULL)
+			goto setup_fail;
+	}
+
+	sg_init_table(data->sg, CHUNK_SIZE + 2);
+
+	/*
+	 * The associated data will be the offset so that blocks can't be
+	 * rearranged.
+	 */
+	aead_request_set_ad(data->aead_req, sizeof(data->crypt_total));
+	rc = crypto_aead_setauthsize(data->aead_tfm, SNAPSHOT_AUTH_TAG_SIZE);
+	if (rc)
+		goto setup_fail;
+
+	return 0;
+
+setup_fail:
+	snapshot_teardown_encryption(data);
+	return rc;
+}
+
+int snapshot_get_encryption_key(struct snapshot_data *data,
+	struct uswsusp_key_blob __user *key)
+{
+	u8 aead_key[SNAPSHOT_ENCRYPTION_KEY_SIZE];
+	u8 nonce[USWSUSP_KEY_NONCE_SIZE];
+	int rc;
+	/* Don't pull a random key from a world that can be reset. */
+	if (data->ready)
+		return -EPIPE;
+
+	rc = snapshot_setup_encryption_common(data);
+	if (rc)
+		return rc;
+
+	/* Build a random starting nonce. */
+	get_random_bytes(nonce, sizeof(nonce));
+	memcpy(&data->nonce_low, &nonce[0], sizeof(data->nonce_low));
+	memcpy(&data->nonce_high, &nonce[8], sizeof(data->nonce_high));
+	/* Build a random key */
+	get_random_bytes(aead_key, sizeof(aead_key));
+	rc = crypto_aead_setkey(data->aead_tfm, aead_key, sizeof(aead_key));
+	if (rc)
+		goto fail;
+
+	/* Hand the key back to user mode (to be changed!) */
+	rc = put_user(sizeof(struct uswsusp_key_blob), &key->blob_len);
+	if (rc)
+		goto fail;
+
+	rc = copy_to_user(&key->blob, &aead_key, sizeof(aead_key)) ? -EFAULT : 0;
+	if (rc)
+		goto fail;
+
+	rc = copy_to_user(&key->nonce, &nonce, sizeof(nonce)) ? -EFAULT : 0;
+	if (rc)
+		goto fail;
+
+	return 0;
+
+fail:
+	snapshot_teardown_encryption(data);
+	return rc;
+}
+
+int snapshot_set_encryption_key(struct snapshot_data *data,
+	struct uswsusp_key_blob __user *key)
+{
+	struct uswsusp_key_blob blob;
+	int rc;
+
+	/* It's too late if data's been pushed in. */
+	if (data->handle.cur)
+		return -EPIPE;
+
+	rc = snapshot_setup_encryption_common(data);
+	if (rc)
+		return rc;
+
+	/* Load the key from user mode. */
+	rc = copy_from_user(&blob, key, sizeof(blob)) ? -EFAULT : 0;
+	if (rc)
+		goto crypto_setup_fail;
+
+	if (blob.blob_len != sizeof(struct uswsusp_key_blob)) {
+		rc = -EINVAL;
+		goto crypto_setup_fail;
+	}
+
+	rc = crypto_aead_setkey(data->aead_tfm,
+				blob.blob,
+				SNAPSHOT_ENCRYPTION_KEY_SIZE);
+
+	if (rc)
+		goto crypto_setup_fail;
+
+	/* Load the starting nonce. */
+	memcpy(&data->nonce_low, &blob.nonce[0], sizeof(data->nonce_low));
+	memcpy(&data->nonce_high, &blob.nonce[8], sizeof(data->nonce_high));
+	return 0;
+
+crypto_setup_fail:
+	snapshot_teardown_encryption(data);
+	return rc;
+}
+
+loff_t snapshot_get_encrypted_image_size(loff_t raw_size)
+{
+	loff_t pages = raw_size >> PAGE_SHIFT;
+	loff_t chunks = (pages + (CHUNK_SIZE - 1)) / CHUNK_SIZE;
+	/*
+	 * The encrypted size is the normal size, plus a stitched in
+	 * authentication tag for every chunk of pages.
+	 */
+	return raw_size + (chunks * SNAPSHOT_AUTH_TAG_SIZE);
+}
+
+int snapshot_finalize_decrypted_image(struct snapshot_data *data)
+{
+	int rc;
+
+	if (data->crypt_offset != 0) {
+		rc = snapshot_decrypt_drain(data);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
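
To make the stitched stream layout in snapenc.c easier to reason about,
here is a small userspace sketch. It repeats the arithmetic of
snapshot_get_encrypted_image_size() and maps a byte offset in the
encrypted stream back to a chunk and to either ciphertext or tag bytes.
The SK_* names, struct stream_pos and sk_locate() are made up for this
note, and the 4 KiB page size is an assumption. raw_size is expected to
be a whole number of pages.

```c
#include <assert.h>
#include <stdint.h>

/* Constants mirrored from the patch. 4 KiB pages are an assumption. */
#define SK_PAGE_SHIFT	12
#define SK_CHUNK_PAGES	16	/* CHUNK_SIZE in user.h */
#define SK_TAG_SIZE	16	/* SNAPSHOT_AUTH_TAG_SIZE in user.h */
#define SK_CHUNK_BYTES	((uint64_t)SK_CHUNK_PAGES << SK_PAGE_SHIFT)
#define SK_STRIDE	(SK_CHUNK_BYTES + SK_TAG_SIZE)

struct stream_pos {
	uint64_t chunk;		/* index of the AEAD chunk */
	uint64_t offset;	/* byte offset within the ciphertext or tag */
	int in_tag;		/* nonzero if the byte is part of the auth tag */
};

/* Same arithmetic as snapshot_get_encrypted_image_size(). */
static uint64_t sk_encrypted_size(uint64_t raw_size)
{
	uint64_t pages = raw_size >> SK_PAGE_SHIFT;
	uint64_t chunks = (pages + (SK_CHUNK_PAGES - 1)) / SK_CHUNK_PAGES;

	return raw_size + chunks * SK_TAG_SIZE;
}

/* Map an offset in the encrypted stream to its chunk and region. */
static struct stream_pos sk_locate(uint64_t stream_off, uint64_t raw_size)
{
	struct stream_pos pos;
	uint64_t in_chunk, data_len;

	assert(stream_off < sk_encrypted_size(raw_size));

	pos.chunk = stream_off / SK_STRIDE;
	in_chunk = stream_off % SK_STRIDE;

	/* Every chunk but the last carries SK_CHUNK_BYTES of ciphertext. */
	data_len = raw_size - pos.chunk * SK_CHUNK_BYTES;
	if (data_len > SK_CHUNK_BYTES)
		data_len = SK_CHUNK_BYTES;

	pos.in_tag = in_chunk >= data_len;
	pos.offset = pos.in_tag ? in_chunk - data_len : in_chunk;
	return pos;
}
```

With these numbers an 8 GiB image has 131072 chunks and therefore 2 MiB
of tags, which matches the figure in the commit message. A 17-page image
ends with a one-page chunk whose tag starts right after that page, which
is the short final chunk case that snapshot_decrypt_drain() handles.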
diff --git a/kernel/power/user.c b/kernel/power/user.c
index ad241b4ff64c58..52ad25df4518dc 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -25,18 +25,9 @@ 
 #include <linux/uaccess.h>
 
 #include "power.h"
+#include "user.h"
 
-
-static struct snapshot_data {
-	struct snapshot_handle handle;
-	int swap;
-	int mode;
-	bool frozen;
-	bool ready;
-	bool platform_support;
-	bool free_bitmaps;
-	dev_t dev;
-} snapshot_state;
+struct snapshot_data snapshot_state;
 
 int is_hibernate_resume_dev(dev_t dev)
 {
@@ -119,6 +110,7 @@  static int snapshot_release(struct inode *inode, struct file *filp)
 	} else if (data->free_bitmaps) {
 		free_basic_memory_bitmaps();
 	}
+	snapshot_teardown_encryption(data);
 	pm_notifier_call_chain(data->mode == O_RDONLY ?
 			PM_POST_HIBERNATION : PM_POST_RESTORE);
 	hibernate_release();
@@ -142,6 +134,12 @@  static ssize_t snapshot_read(struct file *filp, char __user *buf,
 		res = -ENODATA;
 		goto Unlock;
 	}
+
+	if (snapshot_encryption_enabled(data)) {
+		res = snapshot_read_encrypted(data, buf, count, offp);
+		goto Unlock;
+	}
+
 	if (!pg_offp) { /* on page boundary? */
 		res = snapshot_read_next(&data->handle);
 		if (res <= 0)
@@ -172,6 +170,11 @@  static ssize_t snapshot_write(struct file *filp, const char __user *buf,
 
 	data = filp->private_data;
 
+	if (snapshot_encryption_enabled(data)) {
+		res = snapshot_write_encrypted(data, buf, count, offp);
+		goto unlock;
+	}
+
 	if (!pg_offp) {
 		res = snapshot_write_next(&data->handle);
 		if (res <= 0)
@@ -302,6 +305,12 @@  static long snapshot_ioctl(struct file *filp, unsigned int cmd,
 		break;
 
 	case SNAPSHOT_ATOMIC_RESTORE:
+		if (snapshot_encryption_enabled(data)) {
+			error = snapshot_finalize_decrypted_image(data);
+			if (error)
+				break;
+		}
+
 		snapshot_write_finalize(&data->handle);
 		if (data->mode != O_WRONLY || !data->frozen ||
 		    !snapshot_image_loaded(&data->handle)) {
@@ -337,6 +346,8 @@  static long snapshot_ioctl(struct file *filp, unsigned int cmd,
 		}
 		size = snapshot_get_image_size();
 		size <<= PAGE_SHIFT;
+		if (snapshot_encryption_enabled(data))
+			size = snapshot_get_encrypted_image_size(size);
 		error = put_user(size, (loff_t __user *)arg);
 		break;
 
@@ -394,6 +405,13 @@  static long snapshot_ioctl(struct file *filp, unsigned int cmd,
 		error = snapshot_set_swap_area(data, (void __user *)arg);
 		break;
 
+	case SNAPSHOT_ENABLE_ENCRYPTION:
+		if (data->mode == O_RDONLY)
+			error = snapshot_get_encryption_key(data, (void __user *)arg);
+		else
+			error = snapshot_set_encryption_key(data, (void __user *)arg);
+		break;
+
 	default:
 		error = -ENOTTY;
 
diff --git a/kernel/power/user.h b/kernel/power/user.h
new file mode 100644
index 00000000000000..6823e2eba7ec53
--- /dev/null
+++ b/kernel/power/user.h
@@ -0,0 +1,101 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/crypto.h>
+#include <crypto/aead.h>
+#include <crypto/aes.h>
+
+#define SNAPSHOT_ENCRYPTION_KEY_SIZE AES_KEYSIZE_128
+#define SNAPSHOT_AUTH_TAG_SIZE 16
+
+/* Define the number of pages in a single AEAD encryption chunk. */
+#define CHUNK_SIZE 16
+
+struct snapshot_data {
+	struct snapshot_handle handle;
+	int swap;
+	int mode;
+	bool frozen;
+	bool ready;
+	bool platform_support;
+	bool free_bitmaps;
+	dev_t dev;
+
+#if defined(CONFIG_ENCRYPTED_HIBERNATION)
+	struct crypto_aead *aead_tfm;
+	struct aead_request *aead_req;
+	void *crypt_pages[CHUNK_SIZE];
+	u8 auth_tag[SNAPSHOT_AUTH_TAG_SIZE];
+	struct scatterlist sg[CHUNK_SIZE + 2]; /* Add room for AD and auth tag. */
+	size_t crypt_offset;
+	size_t crypt_size;
+	uint64_t crypt_total;
+	uint64_t nonce_low;
+	uint64_t nonce_high;
+#endif
+
+};
+
+extern struct snapshot_data snapshot_state;
+
+/* kernel/power/snapenc.c routines */
+#if defined(CONFIG_ENCRYPTED_HIBERNATION)
+
+ssize_t snapshot_read_encrypted(struct snapshot_data *data,
+	char __user *buf, size_t count, loff_t *offp);
+
+ssize_t snapshot_write_encrypted(struct snapshot_data *data,
+	const char __user *buf, size_t count, loff_t *offp);
+
+void snapshot_teardown_encryption(struct snapshot_data *data);
+int snapshot_get_encryption_key(struct snapshot_data *data,
+	struct uswsusp_key_blob __user *key);
+
+int snapshot_set_encryption_key(struct snapshot_data *data,
+	struct uswsusp_key_blob __user *key);
+
+loff_t snapshot_get_encrypted_image_size(loff_t raw_size);
+
+int snapshot_finalize_decrypted_image(struct snapshot_data *data);
+
+#define snapshot_encryption_enabled(data) (!!(data)->aead_tfm)
+
+#else
+
+static ssize_t snapshot_read_encrypted(struct snapshot_data *data,
+	char __user *buf, size_t count, loff_t *offp)
+{
+	return -ENOTTY;
+}
+
+static ssize_t snapshot_write_encrypted(struct snapshot_data *data,
+	const char __user *buf, size_t count, loff_t *offp)
+{
+	return -ENOTTY;
+}
+
+static void snapshot_teardown_encryption(struct snapshot_data *data) {}
+static int snapshot_get_encryption_key(struct snapshot_data *data,
+	struct uswsusp_key_blob __user *key)
+{
+	return -ENOTTY;
+}
+
+static int snapshot_set_encryption_key(struct snapshot_data *data,
+	struct uswsusp_key_blob __user *key)
+{
+	return -ENOTTY;
+}
+
+static loff_t snapshot_get_encrypted_image_size(loff_t raw_size)
+{
+	return raw_size;
+}
+
+static int snapshot_finalize_decrypted_image(struct snapshot_data *data)
+{
+	return -ENOTTY;
+}
+
+#define snapshot_encryption_enabled(data) (0)
+
+#endif
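
As a reading aid for the nonce_low/nonce_high fields, the per-chunk IV
construction used by snapshot_encrypt_refill() and
snapshot_decrypt_drain() can be sketched in userspace as below. struct
chunk_nonce and next_chunk_iv() are hypothetical names for this note;
GCM_AES_IV_SIZE is redefined locally with the kernel's value of 12.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GCM_AES_IV_SIZE 12	/* same value as the kernel's <crypto/gcm.h> */

/* Hypothetical stand-in for the two nonce fields in struct snapshot_data. */
struct chunk_nonce {
	uint64_t low;	/* incremented once per chunk */
	uint64_t high;	/* fixed for the lifetime of the image */
};

/*
 * Build the IV for the next chunk the way the patch does: all 8 bytes of
 * the low word, followed by the first 4 bytes of the high word as laid out
 * in memory, and then advance the counter.
 */
static void next_chunk_iv(struct chunk_nonce *n, uint8_t iv[GCM_AES_IV_SIZE])
{
	assert(n != NULL && iv != NULL);

	memcpy(&iv[0], &n->low, sizeof(n->low));
	memcpy(&iv[sizeof(n->low)], &n->high,
	       GCM_AES_IV_SIZE - sizeof(n->low));

	n->low += 1;
}
```

One thing this makes visible: only 12 of the 16 nonce bytes carried in
struct uswsusp_key_blob end up in the IV, so the remaining 4 bytes of
nonce_high appear to be unused by this version of the patch.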