diff mbox series

[v1,3/3] i386/sev: Add KVM_EXIT_SNP_REQ_CERTS support for certificate-fetching

Message ID 20241218154939.1114831-4-michael.roth@amd.com (mailing list archive)
State New
Headers show
Series SEV-SNP: Add support for SNP certificate fetching | expand

Commit Message

Michael Roth Dec. 18, 2024, 3:49 p.m. UTC
The GHCB specification[1] defines a VMGEXIT-based Guest Request
hypercall to allow an SNP guest to issue encrypted requests directly to
SNP firmware to do things like query the attestation report for the
guest. These are generally handled purely in the kernel.

In some some cases, it's useful for the host to be able to additionally
supply the certificate chain for the signing key that SNP firmware uses
to sign these attestation reports. To allow for this, the GHCB
specification defines an Extended Guest Request where this certificate
data can be provided in a special format described in the GHCB spec.
This certificate data may be global or guest-specific depending on how
the guest was configured. Rather than providing interfaces to manage
these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
to request the certificate contents from userspace. Implement support
for that here.

To synchronize delivery of the certificates to the guest in a way where
they will not be rendered invalid by updates to SNP firmware or
attestation singing/endorsement keys by management tools outside the
purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
obtain a shared/read lock on the certificate file prior to delivering
them back to KVM. Only after this will the attestation report be
retrieved from firmware and bundled with the certificate data, so QEMU
must continue to hold the file lock until KVM confirms that the
attestation report has been retrieved/bundled. This confirmation is done
by way of the kvm_immediate_exit callback infrastructure that was
introduced in a previous patch.

[1] "Guest Hypervisor Communication Block (GHCB) Standardization",
    https://www.amd.com/en/developer/sev.html

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 qapi/qom.json                 |  23 +++-
 target/i386/kvm/kvm.c         |  10 ++
 target/i386/sev-sysemu-stub.c |   5 +
 target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
 target/i386/sev.h             |   2 +
 5 files changed, 288 insertions(+), 1 deletion(-)

Comments

Markus Armbruster Dec. 18, 2024, 5:32 p.m. UTC | #1
Michael Roth <michael.roth@amd.com> writes:

> The GHCB specification[1] defines a VMGEXIT-based Guest Request
> hypercall to allow an SNP guest to issue encrypted requests directly to
> SNP firmware to do things like query the attestation report for the
> guest. These are generally handled purely in the kernel.
>
> In some some cases, it's useful for the host to be able to additionally
> supply the certificate chain for the signing key that SNP firmware uses
> to sign these attestation reports. To allow for this, the GHCB
> specification defines an Extended Guest Request where this certificate
> data can be provided in a special format described in the GHCB spec.
> This certificate data may be global or guest-specific depending on how
> the guest was configured. Rather than providing interfaces to manage
> these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> to request the certificate contents from userspace. Implement support
> for that here.
>
> To synchronize delivery of the certificates to the guest in a way where
> they will not be rendered invalid by updates to SNP firmware or
> attestation singing/endorsement keys by management tools outside the
> purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> obtain a shared/read lock on the certificate file prior to delivering
> them back to KVM. Only after this will the attestation report be
> retrieved from firmware and bundled with the certificate data, so QEMU
> must continue to hold the file lock until KVM confirms that the
> attestation report has been retrieved/bundled. This confirmation is done
> by way of the kvm_immediate_exit callback infrastructure that was
> introduced in a previous patch.

The "management tools outside the purview of QEMU" will all obtain the
same kind of file lock?

> [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
>     https://www.amd.com/en/developer/sev.html
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  qapi/qom.json                 |  23 +++-
>  target/i386/kvm/kvm.c         |  10 ++
>  target/i386/sev-sysemu-stub.c |   5 +
>  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
>  target/i386/sev.h             |   2 +
>  5 files changed, 288 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 28ce24cd8d..6eaf0e7721 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -1034,6 +1034,25 @@
>  #     firmware.  Set this to true to disable the use of VCEK.
>  #     (default: false) (since: 9.1)
>  #
> +# @certs-path: Path to certificate data that can be passed to guests via
> +#              SNP Extended Guest Requests. File should be in the format
> +#              described in the GHCB specification. (default: none)
> +#              (since: 10.0)

I prefer "filename" to "path".  We have many kinds of paths: pathnames
(denoting files), QOM paths (denoting objects), qdev paths, search
paths, ...  With "filename", your readers immediately know what you're
talking about.

SevGuestProperties has a member 'dh-cert-file'.  Whether that's related
to your file I can't tell from its documentation.

> +#
> +# @certs-timeout: Max time in milliseconds to wait to obtain a read lock

Please don't abbreviate "Maximum" here.

Confident millisecond granularity will suffice forever?

> +#                 on the certificate file specified by @certs-path. This
> +#                 is not a cumulative value and only affects how long
> +#                 QEMU waits before returning execution to the vCPU and
> +#                 informing the guest of the timeout, so the guest can
> +#                 still continuing retrying for as long as it likes
> +#                 (which will be about 60 seconds for linux guests at
> +#                 the time of this writing). If the guest-side timeout
> +#                 is insufficient, set this higher to allow more time to
> +#                 fetch the certificate. If the guest-side timeout is
> +#                 sufficient, set this lower to reduce the likelihood of
> +#                 soft lockups in the guest.
> +#                 (default: 100) (since: 10.0)
> +#
>  # Since: 9.1
>  ##

Please format like

   # @certs-path: Path to certificate data that can be passed to guests
   #     via SNP Extended Guest Requests.  File should be in the format
   #     described in the GHCB specification.
   #     (default: none) (since: 10.0)
   #
   # @certs-timeout: Max time in milliseconds to wait to obtain a read
   #     lock on the certificate file specified by @certs-path.  This is
   #     not a cumulative value and only affects how long QEMU waits
   #     before returning execution to the vCPU and informing the guest
   #     of the timeout, so the guest can still continuing retrying for
   #     as long as it likes (which will be about 60 seconds for linux
   #     guests at the time of this writing).  If the guest-side timeout
   #     is insufficient, set this higher to allow more time to fetch the
   #     certificate.  If the guest-side timeout is sufficient, set this
   #     lower to reduce the likelihood of soft lockups in the guest.
   #     (default: 100) (since: 10.0)

to blend in with commit a937b6aa739 (qapi: Reformat doc comments to
conform to current conventions).

>  { 'struct': 'SevSnpGuestProperties',
> @@ -1045,7 +1064,9 @@
>              '*id-auth': 'str',
>              '*author-key-enabled': 'bool',
>              '*host-data': 'str',
> -            '*vcek-disabled': 'bool' } }
> +            '*vcek-disabled': 'bool',
> +            '*certs-path': 'str',
> +            '*certs-timeout': 'uint32' } }
>  
>  ##
>  # @ThreadContextProperties:

[...]
Daniel P. Berrangé Dec. 18, 2024, 5:50 p.m. UTC | #2
On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote:
> The GHCB specification[1] defines a VMGEXIT-based Guest Request
> hypercall to allow an SNP guest to issue encrypted requests directly to
> SNP firmware to do things like query the attestation report for the
> guest. These are generally handled purely in the kernel.
> 
> In some some cases, it's useful for the host to be able to additionally
> supply the certificate chain for the signing key that SNP firmware uses
> to sign these attestation reports. To allow for this, the GHCB
> specification defines an Extended Guest Request where this certificate
> data can be provided in a special format described in the GHCB spec.
> This certificate data may be global or guest-specific depending on how
> the guest was configured. Rather than providing interfaces to manage
> these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> to request the certificate contents from userspace. Implement support
> for that here.
> 
> To synchronize delivery of the certificates to the guest in a way where
> they will not be rendered invalid by updates to SNP firmware or
> attestation singing/endorsement keys by management tools outside the
> purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> obtain a shared/read lock on the certificate file prior to delivering
> them back to KVM. Only after this will the attestation report be
> retrieved from firmware and bundled with the certificate data, so QEMU
> must continue to hold the file lock until KVM confirms that the
> attestation report has been retrieved/bundled. This confirmation is done
> by way of the kvm_immediate_exit callback infrastructure that was
> introduced in a previous patch.
> 
> [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
>     https://www.amd.com/en/developer/sev.html
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  qapi/qom.json                 |  23 +++-
>  target/i386/kvm/kvm.c         |  10 ++
>  target/i386/sev-sysemu-stub.c |   5 +
>  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
>  target/i386/sev.h             |   2 +
>  5 files changed, 288 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 28ce24cd8d..6eaf0e7721 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -1034,6 +1034,25 @@
>  #     firmware.  Set this to true to disable the use of VCEK.
>  #     (default: false) (since: 9.1)
>  #
> +# @certs-path: Path to certificate data that can be passed to guests via
> +#              SNP Extended Guest Requests. File should be in the format
> +#              described in the GHCB specification. (default: none)
> +#              (since: 10.0)

Can we document the required format here explicitly, rather than expecting
users to go searching for specs which are often practically impossible
to find, and even harder to read & interpret ?

> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index 1a4eb1ada6..2c41bdbccf 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -157,6 +157,9 @@ struct SevSnpGuestState {
>      char *id_auth_base64;
>      uint8_t *id_auth;
>      char *host_data;
> +    char *certs_path;
> +    int certs_fd;
> +    uint32_t certs_timeout;
>  
>      struct kvm_sev_snp_launch_start kvm_start_conf;
>      struct kvm_sev_snp_launch_finish kvm_finish_conf;
> @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common)
>      }
>  }
>  
> +static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
> +{
> +    int fd, ret;
> +
> +    if (sev_snp_guest->certs_fd != -1) {
> +        return 0;
> +    }
> +
> +    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
> +    if (fd == -1) {
> +        error_report("Unable to open certificate blob at path %s, ret %d",
> +                     sev_snp_guest->certs_path, fd);
> +        return fd;
> +    }
> +
> +    ret = qemu_lock_fd(fd, 0, 0, false);
> +    if (ret == -EAGAIN || ret == -EACCES) {
> +        ret = -EAGAIN;
> +        goto out_close;
> +    } else if (ret) {
> +        goto out_close;
> +    }

This locking scheme is likely unsafe. Consider this sequence

  * QEMU runs qemu_open(path)
  * External mgmt app runs unlink(path)
  * External mgmt app runs open(path)
  * External mgmt app runs lock(fd)
  * QEMU runs qemu_lock_fd(fd)

QEMU has successfully acquired a lock on an FD that corresponds to a
deleted file, not the current existing file.

Avoiding this problem requires either that the external mgmt app agrees
to *NEVER* unlink() the files under any circumstance, or for QEMU to
run its open + lock logic in a loop, checking 'stat' and 'fstat' before
opening and after locking, in order to detect a replaced file from its
changed inode.

I'm not inclined to rely on mgmt apps never unlink()ing as that's to
easy to mess up IMHO.

With regards,
Daniel
Michael Roth Dec. 18, 2024, 10:14 p.m. UTC | #3
On Wed, Dec 18, 2024 at 06:32:05PM +0100, Markus Armbruster wrote:
> Michael Roth <michael.roth@amd.com> writes:
> 
> > The GHCB specification[1] defines a VMGEXIT-based Guest Request
> > hypercall to allow an SNP guest to issue encrypted requests directly to
> > SNP firmware to do things like query the attestation report for the
> > guest. These are generally handled purely in the kernel.
> >
> > In some some cases, it's useful for the host to be able to additionally
> > supply the certificate chain for the signing key that SNP firmware uses
> > to sign these attestation reports. To allow for this, the GHCB
> > specification defines an Extended Guest Request where this certificate
> > data can be provided in a special format described in the GHCB spec.
> > This certificate data may be global or guest-specific depending on how
> > the guest was configured. Rather than providing interfaces to manage
> > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> > to request the certificate contents from userspace. Implement support
> > for that here.
> >
> > To synchronize delivery of the certificates to the guest in a way where
> > they will not be rendered invalid by updates to SNP firmware or
> > attestation singing/endorsement keys by management tools outside the
> > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> > obtain a shared/read lock on the certificate file prior to delivering
> > them back to KVM. Only after this will the attestation report be
> > retrieved from firmware and bundled with the certificate data, so QEMU
> > must continue to hold the file lock until KVM confirms that the
> > attestation report has been retrieved/bundled. This confirmation is done
> > by way of the kvm_immediate_exit callback infrastructure that was
> > introduced in a previous patch.
> 
> The "management tools outside the purview of QEMU" will all obtain the
> same kind of file lock?

Yes, this is meant for a cooperative environment where the cloud provider
has opted to provide guest certificates in this manner and needs a way
to synchronize guest attestation requests with other parts of their
stack handling management duties like updating firmware/endorsement
keys/etc. which might occur between the time a certificate is fetched
and the time the attestation report is generated by firmware.

The idea is to provide a path of least resistance using a common
framework like filesystem locks, then heavily suggest that approach on
the kernel documentation side, so that tools purposely or naturally opt
to use this mechanism rather than every service provider coming up with
some separate thing that they'll need to work into some custom QEMU/VMM
to solve the same problem.

Of course, userspace is free to implement their own completely separate
mechanism for handling all this and completely ignore file-locking. But
QEMU is only trying to play nice with this above-mentioned reference
implementation and cooperative management tools, and not trying to
profess to provide any sort of synchronization for cases where those
sorts of management-level updates are performed without utilizing this
reference implementation for synchronization.

I should have touched on this in the schema documentation. I'll make
sure to add that in the next spin.

> 
> > [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
> >     https://www.amd.com/en/developer/sev.html
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  qapi/qom.json                 |  23 +++-
> >  target/i386/kvm/kvm.c         |  10 ++
> >  target/i386/sev-sysemu-stub.c |   5 +
> >  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
> >  target/i386/sev.h             |   2 +
> >  5 files changed, 288 insertions(+), 1 deletion(-)
> >
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index 28ce24cd8d..6eaf0e7721 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -1034,6 +1034,25 @@
> >  #     firmware.  Set this to true to disable the use of VCEK.
> >  #     (default: false) (since: 9.1)
> >  #
> > +# @certs-path: Path to certificate data that can be passed to guests via
> > +#              SNP Extended Guest Requests. File should be in the format
> > +#              described in the GHCB specification. (default: none)
> > +#              (since: 10.0)
> 
> I prefer "filename" to "path".  We have many kinds of paths: pathnames
> (denoting files), QOM paths (denoting objects), qdev paths, search
> paths, ...  With "filename", your readers immediately know what you're
> talking about.
> 
> SevGuestProperties has a member 'dh-cert-file'.  Whether that's related
> to your file I can't tell from its documentation.
> 
> > +#
> > +# @certs-timeout: Max time in milliseconds to wait to obtain a read lock
> 
> Please don't abbreviate "Maximum" here.
> 
> Confident millisecond granularity will suffice forever?

I believe so.

There will likely be some form of rate-limiting added to KVM along the lines
of this patch, with proposals on the order of 2 requests per second for guest
requests to avoid DoS'ing each other:

  https://lore.kernel.org/all/20230119213426.379312-1-dionnaglaze@google.com/

I believe guest kernels already voluntarily limit themselves to
something similar before retrying requests.

So the only case I can think of where a sub-1ms interval would be
desired would be cases where the guest has very aggressive soft-lockup
detection settings, in which case we might as well just set the timeout
to 0 and not busy-wait at all, which should be allowed for here.
(There's more documentation in the related handling in
target/i386/sev.c, but in such cases the guest will get an EAGAIN and
generally retry the request later).

> 
> > +#                 on the certificate file specified by @certs-path. This
> > +#                 is not a cumulative value and only affects how long
> > +#                 QEMU waits before returning execution to the vCPU and
> > +#                 informing the guest of the timeout, so the guest can
> > +#                 still continuing retrying for as long as it likes
> > +#                 (which will be about 60 seconds for linux guests at
> > +#                 the time of this writing). If the guest-side timeout
> > +#                 is insufficient, set this higher to allow more time to
> > +#                 fetch the certificate. If the guest-side timeout is
> > +#                 sufficient, set this lower to reduce the likelihood of
> > +#                 soft lockups in the guest.
> > +#                 (default: 100) (since: 10.0)
> > +#
> >  # Since: 9.1
> >  ##
> 
> Please format like
> 
>    # @certs-path: Path to certificate data that can be passed to guests
>    #     via SNP Extended Guest Requests.  File should be in the format
>    #     described in the GHCB specification.
>    #     (default: none) (since: 10.0)
>    #
>    # @certs-timeout: Max time in milliseconds to wait to obtain a read
>    #     lock on the certificate file specified by @certs-path.  This is
>    #     not a cumulative value and only affects how long QEMU waits
>    #     before returning execution to the vCPU and informing the guest
>    #     of the timeout, so the guest can still continuing retrying for
>    #     as long as it likes (which will be about 60 seconds for linux
>    #     guests at the time of this writing).  If the guest-side timeout
>    #     is insufficient, set this higher to allow more time to fetch the
>    #     certificate.  If the guest-side timeout is sufficient, set this
>    #     lower to reduce the likelihood of soft lockups in the guest.
>    #     (default: 100) (since: 10.0)
> 
> to blend in with commit a937b6aa739 (qapi: Reformat doc comments to
> conform to current conventions).

Will do!

-Mike

> 
> >  { 'struct': 'SevSnpGuestProperties',
> > @@ -1045,7 +1064,9 @@
> >              '*id-auth': 'str',
> >              '*author-key-enabled': 'bool',
> >              '*host-data': 'str',
> > -            '*vcek-disabled': 'bool' } }
> > +            '*vcek-disabled': 'bool',
> > +            '*certs-path': 'str',
> > +            '*certs-timeout': 'uint32' } }
> >  
> >  ##
> >  # @ThreadContextProperties:
> 
> [...]
>
Michael Roth Dec. 18, 2024, 10:29 p.m. UTC | #4
On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote:
> On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote:
> > The GHCB specification[1] defines a VMGEXIT-based Guest Request
> > hypercall to allow an SNP guest to issue encrypted requests directly to
> > SNP firmware to do things like query the attestation report for the
> > guest. These are generally handled purely in the kernel.
> > 
> > In some some cases, it's useful for the host to be able to additionally
> > supply the certificate chain for the signing key that SNP firmware uses
> > to sign these attestation reports. To allow for this, the GHCB
> > specification defines an Extended Guest Request where this certificate
> > data can be provided in a special format described in the GHCB spec.
> > This certificate data may be global or guest-specific depending on how
> > the guest was configured. Rather than providing interfaces to manage
> > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> > to request the certificate contents from userspace. Implement support
> > for that here.
> > 
> > To synchronize delivery of the certificates to the guest in a way where
> > they will not be rendered invalid by updates to SNP firmware or
> > attestation singing/endorsement keys by management tools outside the
> > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> > obtain a shared/read lock on the certificate file prior to delivering
> > them back to KVM. Only after this will the attestation report be
> > retrieved from firmware and bundled with the certificate data, so QEMU
> > must continue to hold the file lock until KVM confirms that the
> > attestation report has been retrieved/bundled. This confirmation is done
> > by way of the kvm_immediate_exit callback infrastructure that was
> > introduced in a previous patch.
> > 
> > [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
> >     https://www.amd.com/en/developer/sev.html
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  qapi/qom.json                 |  23 +++-
> >  target/i386/kvm/kvm.c         |  10 ++
> >  target/i386/sev-sysemu-stub.c |   5 +
> >  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
> >  target/i386/sev.h             |   2 +
> >  5 files changed, 288 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index 28ce24cd8d..6eaf0e7721 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -1034,6 +1034,25 @@
> >  #     firmware.  Set this to true to disable the use of VCEK.
> >  #     (default: false) (since: 9.1)
> >  #
> > +# @certs-path: Path to certificate data that can be passed to guests via
> > +#              SNP Extended Guest Requests. File should be in the format
> > +#              described in the GHCB specification. (default: none)
> > +#              (since: 10.0)
> 
> Can we document the required format here explicitly, rather than expecting
> users to go searching for specs which are often practically impossible
> to find, and even harder to read & interpret ?

It'll be difficult to summarize in a way that will be self-reliant,
since knowing the certificate format is not sufficient to make sure
it coincides with the endorsement key being used by firmware. So I can't
promise to completely reduce reliance on external specs, but at least
give a better of the format and where those external specs will come
into play in filling out the data.

If it needs to be at least somewhat self-sufficient then that might
warrant a separate document in docs/system/i386/amd-memory-encryption.rst
or somewhere thereabouts that summarizes the whole attestation flow and
how certificates tie into that.

Any preferences?

> 
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 1a4eb1ada6..2c41bdbccf 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -157,6 +157,9 @@ struct SevSnpGuestState {
> >      char *id_auth_base64;
> >      uint8_t *id_auth;
> >      char *host_data;
> > +    char *certs_path;
> > +    int certs_fd;
> > +    uint32_t certs_timeout;
> >  
> >      struct kvm_sev_snp_launch_start kvm_start_conf;
> >      struct kvm_sev_snp_launch_finish kvm_finish_conf;
> > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common)
> >      }
> >  }
> >  
> > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
> > +{
> > +    int fd, ret;
> > +
> > +    if (sev_snp_guest->certs_fd != -1) {
> > +        return 0;
> > +    }
> > +
> > +    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
> > +    if (fd == -1) {
> > +        error_report("Unable to open certificate blob at path %s, ret %d",
> > +                     sev_snp_guest->certs_path, fd);
> > +        return fd;
> > +    }
> > +
> > +    ret = qemu_lock_fd(fd, 0, 0, false);
> > +    if (ret == -EAGAIN || ret == -EACCES) {
> > +        ret = -EAGAIN;
> > +        goto out_close;
> > +    } else if (ret) {
> > +        goto out_close;
> > +    }
> 
> This locking scheme is likely unsafe. Consider this sequence
> 
>   * QEMU runs qemu_open(path)
>   * External mgmt app runs unlink(path)
>   * External mgmt app runs open(path)
>   * External mgmt app runs lock(fd)
>   * QEMU runs qemu_lock_fd(fd)
> 
> QEMU has successfully acquired a lock on an FD that corresponds to a
> deleted file, not the current existing file.
> 
> Avoiding this problem requires either that the external mgmt app agrees
> to *NEVER* unlink() the files under any circumstance, or for QEMU to
> run its open + lock logic in a loop, checking 'stat' and 'fstat' before
> opening and after locking, in order to detect a replaced file from its
> changed inode.
> 
> I'm not inclined to rely on mgmt apps never unlink()ing as that's to
> easy to mess up IMHO.

Yah I went into more detail in my response to Markus, but long story
short is that we are assuming mgmt is cooperative in this case, and
so as you mentioned, it would never unlink files while SNP guests are
running, but instead take an exclusive lock on them and update them in
place with the understanding that doing anything otherwise would open
a race window where guests might get stale certificates.

-Mike

> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Daniel P. Berrangé Dec. 19, 2024, 8:13 a.m. UTC | #5
On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote:
> On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote:
> > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote:
> > > The GHCB specification[1] defines a VMGEXIT-based Guest Request
> > > hypercall to allow an SNP guest to issue encrypted requests directly to
> > > SNP firmware to do things like query the attestation report for the
> > > guest. These are generally handled purely in the kernel.
> > > 
> > > In some some cases, it's useful for the host to be able to additionally
> > > supply the certificate chain for the signing key that SNP firmware uses
> > > to sign these attestation reports. To allow for this, the GHCB
> > > specification defines an Extended Guest Request where this certificate
> > > data can be provided in a special format described in the GHCB spec.
> > > This certificate data may be global or guest-specific depending on how
> > > the guest was configured. Rather than providing interfaces to manage
> > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> > > to request the certificate contents from userspace. Implement support
> > > for that here.
> > > 
> > > To synchronize delivery of the certificates to the guest in a way where
> > > they will not be rendered invalid by updates to SNP firmware or
> > > attestation singing/endorsement keys by management tools outside the
> > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> > > obtain a shared/read lock on the certificate file prior to delivering
> > > them back to KVM. Only after this will the attestation report be
> > > retrieved from firmware and bundled with the certificate data, so QEMU
> > > must continue to hold the file lock until KVM confirms that the
> > > attestation report has been retrieved/bundled. This confirmation is done
> > > by way of the kvm_immediate_exit callback infrastructure that was
> > > introduced in a previous patch.
> > > 
> > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
> > >     https://www.amd.com/en/developer/sev.html
> > > 
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > >  qapi/qom.json                 |  23 +++-
> > >  target/i386/kvm/kvm.c         |  10 ++
> > >  target/i386/sev-sysemu-stub.c |   5 +
> > >  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
> > >  target/i386/sev.h             |   2 +
> > >  5 files changed, 288 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/qapi/qom.json b/qapi/qom.json
> > > index 28ce24cd8d..6eaf0e7721 100644
> > > --- a/qapi/qom.json
> > > +++ b/qapi/qom.json
> > > @@ -1034,6 +1034,25 @@
> > >  #     firmware.  Set this to true to disable the use of VCEK.
> > >  #     (default: false) (since: 9.1)
> > >  #
> > > +# @certs-path: Path to certificate data that can be passed to guests via
> > > +#              SNP Extended Guest Requests. File should be in the format
> > > +#              described in the GHCB specification. (default: none)
> > > +#              (since: 10.0)
> > 
> > Can we document the required format here explicitly, rather than expecting
> > users to go searching for specs which are often practically impossible
> > to find, and even harder to read & interpret ?
> 
> It'll be difficult to summarize in a way that will be self-reliant,
> since knowing the certificate format is not sufficient to make sure
> it coincides with the endorsement key being used by firmware. So I can't
> promise to completely reduce reliance on external specs, but at least
> give a better of the format and where those external specs will come
> into play in filling out the data.
> 
> If it needs to be at least somewhat self-sufficient then that might
> warrant a separate document in docs/system/i386/amd-memory-encryption.rst
> or somewhere thereabouts that summarizes the whole attestation flow and
> how certificates tie into that.
> 
> Any preferences?
> 
> > 
> > > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > > index 1a4eb1ada6..2c41bdbccf 100644
> > > --- a/target/i386/sev.c
> > > +++ b/target/i386/sev.c
> > > @@ -157,6 +157,9 @@ struct SevSnpGuestState {
> > >      char *id_auth_base64;
> > >      uint8_t *id_auth;
> > >      char *host_data;
> > > +    char *certs_path;
> > > +    int certs_fd;
> > > +    uint32_t certs_timeout;
> > >  
> > >      struct kvm_sev_snp_launch_start kvm_start_conf;
> > >      struct kvm_sev_snp_launch_finish kvm_finish_conf;
> > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common)
> > >      }
> > >  }
> > >  
> > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
> > > +{
> > > +    int fd, ret;
> > > +
> > > +    if (sev_snp_guest->certs_fd != -1) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
> > > +    if (fd == -1) {
> > > +        error_report("Unable to open certificate blob at path %s, ret %d",
> > > +                     sev_snp_guest->certs_path, fd);
> > > +        return fd;
> > > +    }
> > > +
> > > +    ret = qemu_lock_fd(fd, 0, 0, false);
> > > +    if (ret == -EAGAIN || ret == -EACCES) {
> > > +        ret = -EAGAIN;
> > > +        goto out_close;
> > > +    } else if (ret) {
> > > +        goto out_close;
> > > +    }
> > 
> > This locking scheme is likely unsafe. Consider this sequence
> > 
> >   * QEMU runs qemu_open(path)
> >   * External mgmt app runs unlink(path)
> >   * External mgmt app runs open(path)
> >   * External mgmt app runs lock(fd)
> >   * QEMU runs qemu_lock_fd(fd)
> > 
> > QEMU has successfully acquired a lock on an FD that corresponds to a
> > deleted file, not the current existing file.
> > 
> > Avoiding this problem requires either that the external mgmt app agrees
> > to *NEVER* unlink() the files under any circumstance, or for QEMU to
> > run its open + lock logic in a loop, checking 'stat' and 'fstat' before
> > opening and after locking, in order to detect a replaced file from its
> > changed inode.
> > 
> > I'm not inclined to rely on mgmt apps never unlink()ing as that's to
> > easy to mess up IMHO.
> 
> Yah I went into more detail in my response to Markus, but long story
> short is that we are assuming mgmt is cooperative in this case, and
> so as you mentioned, it would never unlink files while SNP guests are
> running, but instead take an exclusive lock on them and update them in
> place with the understanding that doing anything otherwise would open
> a race window where guests might get stale certificates.

If there's an expectation & requirement that no SNP guests are running,
then IMHO this whole thing is just over-engineered. Just remove all this
locking code entirely, and document that none of this must be changed
while QEMU is running - which is a common requirement for a great many
things on the host.

With regards,
Daniel
Michael Roth Dec. 19, 2024, 1:16 p.m. UTC | #6
On Thu, Dec 19, 2024 at 08:13:44AM +0000, Daniel P. Berrangé wrote:
> On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote:
> > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote:
> > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote:
> > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request
> > > > hypercall to allow an SNP guest to issue encrypted requests directly to
> > > > SNP firmware to do things like query the attestation report for the
> > > > guest. These are generally handled purely in the kernel.
> > > > 
> > > > In some some cases, it's useful for the host to be able to additionally
> > > > supply the certificate chain for the signing key that SNP firmware uses
> > > > to sign these attestation reports. To allow for this, the GHCB
> > > > specification defines an Extended Guest Request where this certificate
> > > > data can be provided in a special format described in the GHCB spec.
> > > > This certificate data may be global or guest-specific depending on how
> > > > the guest was configured. Rather than providing interfaces to manage
> > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> > > > to request the certificate contents from userspace. Implement support
> > > > for that here.
> > > > 
> > > > To synchronize delivery of the certificates to the guest in a way where
> > > > they will not be rendered invalid by updates to SNP firmware or
> > > > attestation singing/endorsement keys by management tools outside the
> > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> > > > obtain a shared/read lock on the certificate file prior to delivering
> > > > them back to KVM. Only after this will the attestation report be
> > > > retrieved from firmware and bundled with the certificate data, so QEMU
> > > > must continue to hold the file lock until KVM confirms that the
> > > > attestation report has been retrieved/bundled. This confirmation is done
> > > > by way of the kvm_immediate_exit callback infrastructure that was
> > > > introduced in a previous patch.
> > > > 
> > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
> > > >     https://www.amd.com/en/developer/sev.html
> > > > 
> > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > ---
> > > >  qapi/qom.json                 |  23 +++-
> > > >  target/i386/kvm/kvm.c         |  10 ++
> > > >  target/i386/sev-sysemu-stub.c |   5 +
> > > >  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
> > > >  target/i386/sev.h             |   2 +
> > > >  5 files changed, 288 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/qapi/qom.json b/qapi/qom.json
> > > > index 28ce24cd8d..6eaf0e7721 100644
> > > > --- a/qapi/qom.json
> > > > +++ b/qapi/qom.json
> > > > @@ -1034,6 +1034,25 @@
> > > >  #     firmware.  Set this to true to disable the use of VCEK.
> > > >  #     (default: false) (since: 9.1)
> > > >  #
> > > > +# @certs-path: Path to certificate data that can be passed to guests via
> > > > +#              SNP Extended Guest Requests. File should be in the format
> > > > +#              described in the GHCB specification. (default: none)
> > > > +#              (since: 10.0)
> > > 
> > > Can we document the required format here explicitly, rather than expecting
> > > users to go searching for specs which are often practically impossible
> > > to find, and even harder to read & interpret ?
> > 
> > It'll be difficult to summarize in a way that will be self-reliant,
> > since knowing the certificate format is not sufficient to make sure
> > it coincides with the endorsement key being used by firmware. So I can't
> > promise to completely reduce reliance on external specs, but at least
> > give a better of the format and where those external specs will come
> > into play in filling out the data.
> > 
> > If it needs to be at least somewhat self-sufficient then that might
> > warrant a separate document in docs/system/i386/amd-memory-encryption.rst
> > or somewhere thereabouts that summarizes the whole attestation flow and
> > how certificates tie into that.
> > 
> > Any preferences?
> > 
> > > 
> > > > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > > > index 1a4eb1ada6..2c41bdbccf 100644
> > > > --- a/target/i386/sev.c
> > > > +++ b/target/i386/sev.c
> > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState {
> > > >      char *id_auth_base64;
> > > >      uint8_t *id_auth;
> > > >      char *host_data;
> > > > +    char *certs_path;
> > > > +    int certs_fd;
> > > > +    uint32_t certs_timeout;
> > > >  
> > > >      struct kvm_sev_snp_launch_start kvm_start_conf;
> > > >      struct kvm_sev_snp_launch_finish kvm_finish_conf;
> > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common)
> > > >      }
> > > >  }
> > > >  
> > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
> > > > +{
> > > > +    int fd, ret;
> > > > +
> > > > +    if (sev_snp_guest->certs_fd != -1) {
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
> > > > +    if (fd == -1) {
> > > > +        error_report("Unable to open certificate blob at path %s, ret %d",
> > > > +                     sev_snp_guest->certs_path, fd);
> > > > +        return fd;
> > > > +    }
> > > > +
> > > > +    ret = qemu_lock_fd(fd, 0, 0, false);
> > > > +    if (ret == -EAGAIN || ret == -EACCES) {
> > > > +        ret = -EAGAIN;
> > > > +        goto out_close;
> > > > +    } else if (ret) {
> > > > +        goto out_close;
> > > > +    }
> > > 
> > > This locking scheme is likely unsafe. Consider this sequence
> > > 
> > >   * QEMU runs qemu_open(path)
> > >   * External mgmt app runs unlink(path)
> > >   * External mgmt app runs open(path)
> > >   * External mgmt app runs lock(fd)
> > >   * QEMU runs qemu_lock_fd(fd)
> > > 
> > > QEMU has successfully acquired a lock on an FD that corresponds to a
> > > deleted file, not the current existing file.
> > > 
> > > Avoiding this problem requires either that the external mgmt app agrees
> > > to *NEVER* unlink() the files under any circumstance, or for QEMU to
> > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before
> > > opening and after locking, in order to detect a replaced file from its
> > > changed inode.
> > > 
> > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to
> > > easy to mess up IMHO.
> > 
> > Yah I went into more detail in my response to Markus, but long story
> > short is that we are assuming mgmt is cooperative in this case, and
> > so as you mentioned, it would never unlink files while SNP guests are
> > running, but instead take an exclusive lock on them and update them in
> > place with the understanding that doing anything otherwise would open
> > a race window where guests might get stale certificates.
> 
> If there's an expectation & requirement that no SNP guests are running,
> then IMHO this whole thing is just over-engineered. Just remove all this
> locking code entirely, and document that none of this must be changed
> while QEMU is running - which is a common requirement for a great many
> things on the host.

VCEK endorsement keys can change as a result of SNP firmware updates,
which can be done while SNP guests are running and are often done in such
a way to patch bugs/security holes. VLEK endorsement keys can similarly be
updated on a live host. Both these sorts of interactions cannot be made
compatible with bundling certificates with attestation reports without some
orchestration in place to keep them atomic relative to the endorsement
key being used by firmware to sign attestation reports. Every CSP
implementing this will need to solve it in some way, and I'm sure some
will handle all this completely differently. But it will make
interoperative management/tooling a mess, and having a reference
implementation based around something common will make it easier to
steer CSPs to that common solution and give management tools authors
*some* reference approach to target rather than expecting to retrofit
some custom solution on top.

With these patches, you can update firmware and endorsement keys while
SNP guests are running, but it requires write locks on any active
certificates as defined here and in the kernel, and doing certificate
file updates in place while that write lock is still held. I don't really
think that's over-engineered. I think it's surprisingly simple given the
potential complexity of the above-mentioned requirements.

But yes if management tries to unlink certificates while SNP guests are
running, all bets are off. But at that point they are not cooperating
with kernel/QEMU, and we don't need to care about that. And if they
really do need to blow away certificates for a complete re-install
or data-wipe or whatever, at that point they'd just need to ensure
they stop all their SNP guests first.

-Mike

> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Daniel P. Berrangé Dec. 19, 2024, 1:37 p.m. UTC | #7
On Thu, Dec 19, 2024 at 07:16:01AM -0600, Michael Roth wrote:
> On Thu, Dec 19, 2024 at 08:13:44AM +0000, Daniel P. Berrangé wrote:
> > On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote:
> > > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote:
> > > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote:
> > > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request
> > > > > hypercall to allow an SNP guest to issue encrypted requests directly to
> > > > > SNP firmware to do things like query the attestation report for the
> > > > > guest. These are generally handled purely in the kernel.
> > > > > 
> > > > > In some some cases, it's useful for the host to be able to additionally
> > > > > supply the certificate chain for the signing key that SNP firmware uses
> > > > > to sign these attestation reports. To allow for this, the GHCB
> > > > > specification defines an Extended Guest Request where this certificate
> > > > > data can be provided in a special format described in the GHCB spec.
> > > > > This certificate data may be global or guest-specific depending on how
> > > > > the guest was configured. Rather than providing interfaces to manage
> > > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> > > > > to request the certificate contents from userspace. Implement support
> > > > > for that here.
> > > > > 
> > > > > To synchronize delivery of the certificates to the guest in a way where
> > > > > they will not be rendered invalid by updates to SNP firmware or
> > > > > attestation singing/endorsement keys by management tools outside the
> > > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> > > > > obtain a shared/read lock on the certificate file prior to delivering
> > > > > them back to KVM. Only after this will the attestation report be
> > > > > retrieved from firmware and bundled with the certificate data, so QEMU
> > > > > must continue to hold the file lock until KVM confirms that the
> > > > > attestation report has been retrieved/bundled. This confirmation is done
> > > > > by way of the kvm_immediate_exit callback infrastructure that was
> > > > > introduced in a previous patch.
> > > > > 
> > > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
> > > > >     https://www.amd.com/en/developer/sev.html
> > > > > 
> > > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > > ---
> > > > >  qapi/qom.json                 |  23 +++-
> > > > >  target/i386/kvm/kvm.c         |  10 ++
> > > > >  target/i386/sev-sysemu-stub.c |   5 +
> > > > >  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
> > > > >  target/i386/sev.h             |   2 +
> > > > >  5 files changed, 288 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/qapi/qom.json b/qapi/qom.json
> > > > > index 28ce24cd8d..6eaf0e7721 100644
> > > > > --- a/qapi/qom.json
> > > > > +++ b/qapi/qom.json
> > > > > @@ -1034,6 +1034,25 @@
> > > > >  #     firmware.  Set this to true to disable the use of VCEK.
> > > > >  #     (default: false) (since: 9.1)
> > > > >  #
> > > > > +# @certs-path: Path to certificate data that can be passed to guests via
> > > > > +#              SNP Extended Guest Requests. File should be in the format
> > > > > +#              described in the GHCB specification. (default: none)
> > > > > +#              (since: 10.0)
> > > > 
> > > > Can we document the required format here explicitly, rather than expecting
> > > > users to go searching for specs which are often practically impossible
> > > > to find, and even harder to read & interpret ?
> > > 
> > > It'll be difficult to summarize in a way that will be self-reliant,
> > > since knowing the certificate format is not sufficient to make sure
> > > it coincides with the endorsement key being used by firmware. So I can't
> > > promise to completely reduce reliance on external specs, but at least
> > > give a better of the format and where those external specs will come
> > > into play in filling out the data.
> > > 
> > > If it needs to be at least somewhat self-sufficient then that might
> > > warrant a separate document in docs/system/i386/amd-memory-encryption.rst
> > > or somewhere thereabouts that summarizes the whole attestation flow and
> > > how certificates tie into that.
> > > 
> > > Any preferences?
> > > 
> > > > 
> > > > > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > > > > index 1a4eb1ada6..2c41bdbccf 100644
> > > > > --- a/target/i386/sev.c
> > > > > +++ b/target/i386/sev.c
> > > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState {
> > > > >      char *id_auth_base64;
> > > > >      uint8_t *id_auth;
> > > > >      char *host_data;
> > > > > +    char *certs_path;
> > > > > +    int certs_fd;
> > > > > +    uint32_t certs_timeout;
> > > > >  
> > > > >      struct kvm_sev_snp_launch_start kvm_start_conf;
> > > > >      struct kvm_sev_snp_launch_finish kvm_finish_conf;
> > > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common)
> > > > >      }
> > > > >  }
> > > > >  
> > > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
> > > > > +{
> > > > > +    int fd, ret;
> > > > > +
> > > > > +    if (sev_snp_guest->certs_fd != -1) {
> > > > > +        return 0;
> > > > > +    }
> > > > > +
> > > > > +    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
> > > > > +    if (fd == -1) {
> > > > > +        error_report("Unable to open certificate blob at path %s, ret %d",
> > > > > +                     sev_snp_guest->certs_path, fd);
> > > > > +        return fd;
> > > > > +    }
> > > > > +
> > > > > +    ret = qemu_lock_fd(fd, 0, 0, false);
> > > > > +    if (ret == -EAGAIN || ret == -EACCES) {
> > > > > +        ret = -EAGAIN;
> > > > > +        goto out_close;
> > > > > +    } else if (ret) {
> > > > > +        goto out_close;
> > > > > +    }
> > > > 
> > > > This locking scheme is likely unsafe. Consider this sequence
> > > > 
> > > >   * QEMU runs qemu_open(path)
> > > >   * External mgmt app runs unlink(path)
> > > >   * External mgmt app runs open(path)
> > > >   * External mgmt app runs lock(fd)
> > > >   * QEMU runs qemu_lock_fd(fd)
> > > > 
> > > > QEMU has successfully acquired a lock on an FD that corresponds to a
> > > > deleted file, not the current existing file.
> > > > 
> > > > Avoiding this problem requires either that the external mgmt app agrees
> > > > to *NEVER* unlink() the files under any circumstance, or for QEMU to
> > > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before
> > > > opening and after locking, in order to detect a replaced file from its
> > > > changed inode.
> > > > 
> > > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to
> > > > easy to mess up IMHO.
> > > 
> > > Yah I went into more detail in my response to Markus, but long story
> > > short is that we are assuming mgmt is cooperative in this case, and
> > > so as you mentioned, it would never unlink files while SNP guests are
> > > running, but instead take an exclusive lock on them and update them in
> > > place with the understanding that doing anything otherwise would open
> > > a race window where guests might get stale certificates.
> > 
> > If there's an expectation & requirement that no SNP guests are running,
> > then IMHO this whole thing is just over-engineered. Just remove all this
> > locking code entirely, and document that none of this must be changed
> > while QEMU is running - which is a common requirement for a great many
> > things on the host.
> 
> VCEK endorsement keys can change as a result of SNP firmware updates,
> which can be done while SNP guests are running and are often done in such
> a way to patch bugs/security holes. VLEK endorsement keys can similarly be
> updated on a live host. Both these sorts of interactions cannot be made
> compatible with bundling certificates with attestation reports without some
> orchestration in place to keep them atomic relative to the endorsement
> key being used by firmware to sign attestation reports. Every CSP
> implementing this will need to solve it in some way, and I'm sure some
> will handle all this completely differently. But it will make
> interoperative management/tooling a mess, and having a reference
> implementation based around something common will make it easier to
> steer CSPs to that common solution and give management tools authors
> *some* reference approach to target rather than expecting to retrofit
> some custom solution on top.
> 
> With these patches, you can update firmware and endorsement keys while
> SNP guests are running, but it requires write locks on any active
> certificates as defined here and in the kernel, and doing certificate
> file updates in place while that write lock is still held. I don't really
> think that's over-engineered. I think it's surprisingly simple given the
> potential complexity of the above-mentioned requirements.
> 
> But yes if management tries to unlink certificates while SNP guests are
> running, all bets are off. But at that point they are not cooperating
> with kernel/QEMU, and we don't need to care about that. And if they
> really do need to blow away certificates for a complete re-install
> or data-wipe or whatever, at that point they'd just need to ensure
> they stop all their SNP guests first.

IMHO we msut consider unlink() to be a valid thing, because the right
way for apps to perform crash safe atomic updates of existing files,
is to use rename() from a temporary file, and the rename() in has an
implicit unlink as part of its operation. ie apps would be doing:

   fd = open("foo.tmp")
   write(fd, ...)
   fsync(fd)
   close(fd)
   rename("foo.tmp", "foo")

That final rename operation will have the same effect on the locks
as unlink(). To cope with this anything doing locking has to run
in a loop comparing the inode either side of acquiring the lock

TLDR: if we're going to do locking in QEMU, it needs to be done
robustly.

With regards,
Daniel
Michael Roth Dec. 19, 2024, 5:49 p.m. UTC | #8
On Thu, Dec 19, 2024 at 01:37:18PM +0000, Daniel P. Berrangé wrote:
> On Thu, Dec 19, 2024 at 07:16:01AM -0600, Michael Roth wrote:
> > On Thu, Dec 19, 2024 at 08:13:44AM +0000, Daniel P. Berrangé wrote:
> > > On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote:
> > > > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote:
> > > > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote:
> > > > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request
> > > > > > hypercall to allow an SNP guest to issue encrypted requests directly to
> > > > > > SNP firmware to do things like query the attestation report for the
> > > > > > guest. These are generally handled purely in the kernel.
> > > > > > 
> > > > > > In some some cases, it's useful for the host to be able to additionally
> > > > > > supply the certificate chain for the signing key that SNP firmware uses
> > > > > > to sign these attestation reports. To allow for this, the GHCB
> > > > > > specification defines an Extended Guest Request where this certificate
> > > > > > data can be provided in a special format described in the GHCB spec.
> > > > > > This certificate data may be global or guest-specific depending on how
> > > > > > the guest was configured. Rather than providing interfaces to manage
> > > > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> > > > > > to request the certificate contents from userspace. Implement support
> > > > > > for that here.
> > > > > > 
> > > > > > To synchronize delivery of the certificates to the guest in a way where
> > > > > > they will not be rendered invalid by updates to SNP firmware or
> > > > > > attestation singing/endorsement keys by management tools outside the
> > > > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> > > > > > obtain a shared/read lock on the certificate file prior to delivering
> > > > > > them back to KVM. Only after this will the attestation report be
> > > > > > retrieved from firmware and bundled with the certificate data, so QEMU
> > > > > > must continue to hold the file lock until KVM confirms that the
> > > > > > attestation report has been retrieved/bundled. This confirmation is done
> > > > > > by way of the kvm_immediate_exit callback infrastructure that was
> > > > > > introduced in a previous patch.
> > > > > > 
> > > > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
> > > > > >     https://www.amd.com/en/developer/sev.html
> > > > > > 
> > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > > > ---
> > > > > >  qapi/qom.json                 |  23 +++-
> > > > > >  target/i386/kvm/kvm.c         |  10 ++
> > > > > >  target/i386/sev-sysemu-stub.c |   5 +
> > > > > >  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
> > > > > >  target/i386/sev.h             |   2 +
> > > > > >  5 files changed, 288 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/qapi/qom.json b/qapi/qom.json
> > > > > > index 28ce24cd8d..6eaf0e7721 100644
> > > > > > --- a/qapi/qom.json
> > > > > > +++ b/qapi/qom.json
> > > > > > @@ -1034,6 +1034,25 @@
> > > > > >  #     firmware.  Set this to true to disable the use of VCEK.
> > > > > >  #     (default: false) (since: 9.1)
> > > > > >  #
> > > > > > +# @certs-path: Path to certificate data that can be passed to guests via
> > > > > > +#              SNP Extended Guest Requests. File should be in the format
> > > > > > +#              described in the GHCB specification. (default: none)
> > > > > > +#              (since: 10.0)
> > > > > 
> > > > > Can we document the required format here explicitly, rather than expecting
> > > > > users to go searching for specs which are often practically impossible
> > > > > to find, and even harder to read & interpret ?
> > > > 
> > > > It'll be difficult to summarize in a way that will be self-reliant,
> > > > since knowing the certificate format is not sufficient to make sure
> > > > it coincides with the endorsement key being used by firmware. So I can't
> > > > promise to completely reduce reliance on external specs, but at least
> > > > give a better of the format and where those external specs will come
> > > > into play in filling out the data.
> > > > 
> > > > If it needs to be at least somewhat self-sufficient then that might
> > > > warrant a separate document in docs/system/i386/amd-memory-encryption.rst
> > > > or somewhere thereabouts that summarizes the whole attestation flow and
> > > > how certificates tie into that.
> > > > 
> > > > Any preferences?
> > > > 
> > > > > 
> > > > > > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > > > > > index 1a4eb1ada6..2c41bdbccf 100644
> > > > > > --- a/target/i386/sev.c
> > > > > > +++ b/target/i386/sev.c
> > > > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState {
> > > > > >      char *id_auth_base64;
> > > > > >      uint8_t *id_auth;
> > > > > >      char *host_data;
> > > > > > +    char *certs_path;
> > > > > > +    int certs_fd;
> > > > > > +    uint32_t certs_timeout;
> > > > > >  
> > > > > >      struct kvm_sev_snp_launch_start kvm_start_conf;
> > > > > >      struct kvm_sev_snp_launch_finish kvm_finish_conf;
> > > > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common)
> > > > > >      }
> > > > > >  }
> > > > > >  
> > > > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
> > > > > > +{
> > > > > > +    int fd, ret;
> > > > > > +
> > > > > > +    if (sev_snp_guest->certs_fd != -1) {
> > > > > > +        return 0;
> > > > > > +    }
> > > > > > +
> > > > > > +    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
> > > > > > +    if (fd == -1) {
> > > > > > +        error_report("Unable to open certificate blob at path %s, ret %d",
> > > > > > +                     sev_snp_guest->certs_path, fd);
> > > > > > +        return fd;
> > > > > > +    }
> > > > > > +
> > > > > > +    ret = qemu_lock_fd(fd, 0, 0, false);
> > > > > > +    if (ret == -EAGAIN || ret == -EACCES) {
> > > > > > +        ret = -EAGAIN;
> > > > > > +        goto out_close;
> > > > > > +    } else if (ret) {
> > > > > > +        goto out_close;
> > > > > > +    }
> > > > > 
> > > > > This locking scheme is likely unsafe. Consider this sequence
> > > > > 
> > > > >   * QEMU runs qemu_open(path)
> > > > >   * External mgmt app runs unlink(path)
> > > > >   * External mgmt app runs open(path)
> > > > >   * External mgmt app runs lock(fd)
> > > > >   * QEMU runs qemu_lock_fd(fd)
> > > > > 
> > > > > QEMU has successfully acquired a lock on an FD that corresponds to a
> > > > > deleted file, not the current existing file.
> > > > > 
> > > > > Avoiding this problem requires either that the external mgmt app agrees
> > > > > to *NEVER* unlink() the files under any circumstance, or for QEMU to
> > > > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before
> > > > > opening and after locking, in order to detect a replaced file from its
> > > > > changed inode.
> > > > > 
> > > > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to
> > > > > easy to mess up IMHO.
> > > > 
> > > > Yah I went into more detail in my response to Markus, but long story
> > > > short is that we are assuming mgmt is cooperative in this case, and
> > > > so as you mentioned, it would never unlink files while SNP guests are
> > > > running, but instead take an exclusive lock on them and update them in
> > > > place with the understanding that doing anything otherwise would open
> > > > a race window where guests might get stale certificates.
> > > 
> > > If there's an expectation & requirement that no SNP guests are running,
> > > then IMHO this whole thing is just over-engineered. Just remove all this
> > > locking code entirely, and document that none of this must be changed
> > > while QEMU is running - which is a common requirement for a great many
> > > things on the host.
> > 
> > VCEK endorsement keys can change as a result of SNP firmware updates,
> > which can be done while SNP guests are running and are often done in such
> > a way to patch bugs/security holes. VLEK endorsement keys can similarly be
> > updated on a live host. Both these sorts of interactions cannot be made
> > compatible with bundling certificates with attestation reports without some
> > orchestration in place to keep them atomic relative to the endorsement
> > key being used by firmware to sign attestation reports. Every CSP
> > implementing this will need to solve it in some way, and I'm sure some
> > will handle all this completely differently. But it will make
> > interoperative management/tooling a mess, and having a reference
> > implementation based around something common will make it easier to
> > steer CSPs to that common solution and give management tools authors
> > *some* reference approach to target rather than expecting to retrofit
> > some custom solution on top.
> > 
> > With these patches, you can update firmware and endorsement keys while
> > SNP guests are running, but it requires write locks on any active
> > certificates as defined here and in the kernel, and doing certificate
> > file updates in place while that write lock is still held. I don't really
> > think that's over-engineered. I think it's surprisingly simple given the
> > potential complexity of the above-mentioned requirements.
> > 
> > But yes if management tries to unlink certificates while SNP guests are
> > running, all bets are off. But at that point they are not cooperating
> > with kernel/QEMU, and we don't need to care about that. And if they
> > really do need to blow away certificates for a complete re-install
> > or data-wipe or whatever, at that point they'd just need to ensure
> > they stop all their SNP guests first.
> 
> IMHO we msut consider unlink() to be a valid thing, because the right
> way for apps to perform crash safe atomic updates of existing files,
> is to use rename() from a temporary file, and the rename() in has an
> implicit unlink as part of its operation. ie apps would be doing:
> 
>    fd = open("foo.tmp")
>    write(fd, ...)
>    fsync(fd)
>    close(fd)
>    rename("foo.tmp", "foo")

If we still want to allow for this rather than enforcing in-place
update, one alternative would be to just allow a separate lock file
to be specified rather than locking the certificate file itself. That
would provide a bit more flexibility.

I can update the QEMU implementation to take -certs-lock-file in
addition to -certs-file so they can be specified separately. And if
-certs-lock-file is not specified then QEMU will just assume
management handles things different or has agreed to not do endorsement
key updates while SNP guests are running.

I think we'd considered something like that originally but the thinking
was that locking the certs themselves was more organic in terms of an
"obvious"/natural solution. But it does end up being a bit more
inflexible WRT how libraries/etc. might manage file updates underneath
the covers, so maybe a lock file is the better approach after all.

-Mike

> 
> That final rename operation will have the same effect on the locks
> as unlink(). To cope with this anything doing locking has to run
> in a loop comparing the inode either side of acquiring the lock
> 
> TLDR: if we're going to do locking in QEMU, it needs to be done
> robustly.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Daniel P. Berrangé Dec. 19, 2024, 6:01 p.m. UTC | #9
On Thu, Dec 19, 2024 at 11:49:49AM -0600, Michael Roth wrote:
> On Thu, Dec 19, 2024 at 01:37:18PM +0000, Daniel P. Berrangé wrote:
> > IMHO we msut consider unlink() to be a valid thing, because the right
> > way for apps to perform crash safe atomic updates of existing files,
> > is to use rename() from a temporary file, and the rename() in has an
> > implicit unlink as part of its operation. ie apps would be doing:
> > 
> >    fd = open("foo.tmp")
> >    write(fd, ...)
> >    fsync(fd)
> >    close(fd)
> >    rename("foo.tmp", "foo")
> 
> If we still want to allow for this rather than enforcing in-place
> update, one alternative would be to just allow a separate lock file
> to be specified rather than locking the certificate file itself. That
> would provide a bit more flexibility.
> 
> I can update the QEMU implementation to take -certs-lock-file in
> addition to -certs-file so they can be specified separately. And if
> -certs-lock-file is not specified then QEMU will just assume
> management handles things different or has agreed to not do endorsement
> key updates while SNP guests are running.
> 
> I think we'd considered something like that originally but the thinking
> was that locking the certs themselves was more organic in terms of an
> "obvious"/natural solution. But it does end up being a bit more
> inflexible WRT how libraries/etc. might manage file updates underneath
> the covers, so maybe a lock file is the better approach after all.

If we want locking, I think locking the certs file directly is a nicer
idea, as it avoids everyone having to agree on the location of the
lock file, relative to the certs file.

The current locking code just needs to go inside a while(1) loop and
have fstat + stat added to detect the recreation race. For example:

  https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virpidfile.c#L376

With regards,
Daniel
diff mbox series

Patch

diff --git a/qapi/qom.json b/qapi/qom.json
index 28ce24cd8d..6eaf0e7721 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1034,6 +1034,25 @@ 
 #     firmware.  Set this to true to disable the use of VCEK.
 #     (default: false) (since: 9.1)
 #
+# @certs-path: Path to certificate data that can be passed to guests via
+#              SNP Extended Guest Requests. File should be in the format
+#              described in the GHCB specification. (default: none)
+#              (since: 10.0)
+#
+# @certs-timeout: Max time in milliseconds to wait to obtain a read lock
+#                 on the certificate file specified by @certs-path. This
+#                 is not a cumulative value and only affects how long
+#                 QEMU waits before returning execution to the vCPU and
+#                 informing the guest of the timeout, so the guest can
+#                 still continuing retrying for as long as it likes
+#                 (which will be about 60 seconds for linux guests at
+#                 the time of this writing). If the guest-side timeout
+#                 is insufficient, set this higher to allow more time to
+#                 fetch the certificate. If the guest-side timeout is
+#                 sufficient, set this lower to reduce the likelihood of
+#                 soft lockups in the guest.
+#                 (default: 100) (since: 10.0)
+#
 # Since: 9.1
 ##
 { 'struct': 'SevSnpGuestProperties',
@@ -1045,7 +1064,9 @@ 
             '*id-auth': 'str',
             '*author-key-enabled': 'bool',
             '*host-data': 'str',
-            '*vcek-disabled': 'bool' } }
+            '*vcek-disabled': 'bool',
+            '*certs-path': 'str',
+            '*certs-timeout': 'uint32' } }
 
 ##
 # @ThreadContextProperties:
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8e17942c3b..cffc7ff33e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6107,6 +6107,16 @@  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 #endif
     case KVM_EXIT_HYPERCALL:
         ret = kvm_handle_hypercall(run);
+        break;
+    case KVM_EXIT_SNP_REQ_CERTS:
+        if (!sev_snp_enabled()) {
+            error_report("KVM: Encountered a certificate request exit for a "
+                         "non-SEV-SNP guest.");
+            ret = -1;
+        } else {
+            ret = kvm_handle_snp_req_certs(cs, run);
+        }
+
         break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
diff --git a/target/i386/sev-sysemu-stub.c b/target/i386/sev-sysemu-stub.c
index d5bf886e79..685c56f62c 100644
--- a/target/i386/sev-sysemu-stub.c
+++ b/target/i386/sev-sysemu-stub.c
@@ -71,3 +71,8 @@  void hmp_info_sev(Monitor *mon, const QDict *qdict)
 void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size)
 {
 }
+
+int kvm_handle_snp_req_certs(CPUState *cpu, struct kvm_run *run)
+{
+    g_assert_not_reached();
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1a4eb1ada6..2c41bdbccf 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -157,6 +157,9 @@  struct SevSnpGuestState {
     char *id_auth_base64;
     uint8_t *id_auth;
     char *host_data;
+    char *certs_path;
+    int certs_fd;
+    uint32_t certs_timeout;
 
     struct kvm_sev_snp_launch_start kvm_start_conf;
     struct kvm_sev_snp_launch_finish kvm_finish_conf;
@@ -1355,6 +1358,215 @@  sev_snp_launch_finish(SevCommonState *sev_common)
     }
 }
 
+static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
+{
+    int fd, ret;
+
+    if (sev_snp_guest->certs_fd != -1) {
+        return 0;
+    }
+
+    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
+    if (fd == -1) {
+        error_report("Unable to open certificate blob at path %s, ret %d",
+                     sev_snp_guest->certs_path, fd);
+        return fd;
+    }
+
+    ret = qemu_lock_fd(fd, 0, 0, false);
+    if (ret == -EAGAIN || ret == -EACCES) {
+        ret = -EAGAIN;
+        goto out_close;
+    } else if (ret) {
+        goto out_close;
+    }
+
+    sev_snp_guest->certs_fd = fd;
+    return 0;
+out_close:
+    close(fd);
+    return ret;
+}
+
+static void close_certs(SevSnpGuestState *sev_snp_guest)
+{
+    if (sev_snp_guest->certs_fd == -1) {
+        return;
+    }
+
+    qemu_unlock_fd(sev_snp_guest->certs_fd, 0, 0);
+    close(sev_snp_guest->certs_fd);
+    sev_snp_guest->certs_fd = -1;
+}
+
+static ssize_t get_certs_size(SevSnpGuestState *sev_snp_guest)
+{
+    ssize_t size;
+
+    size = lseek(sev_snp_guest->certs_fd, 0, SEEK_END);
+
+    if (size < 0)
+        return -errno;
+
+    return size;
+}
+
+static int read_certs(SevSnpGuestState *sev_snp_guest, void *buf, size_t buf_len)
+{
+    ssize_t n, len = 0;
+
+    n = lseek(sev_snp_guest->certs_fd, 0, SEEK_SET);
+    if (n) {
+        return n;
+    }
+
+    while ((n = read(sev_snp_guest->certs_fd, buf, buf_len)) != 0) {
+        if (n < 0) {
+            if (errno == EINTR) {
+                continue;
+            } else {
+                len = -errno;
+                break;
+            }
+        }
+        len += n;
+    }
+
+    return len;
+}
+
+static void snp_release_certs_lock(void *opaque)
+{
+    SevSnpGuestState *sev_snp_guest = opaque;
+
+    close_certs(sev_snp_guest);
+}
+
+static void certs_timeout(void *opaque)
+{
+    bool *timed_out = opaque;
+
+    *timed_out = true;
+}
+
+int kvm_handle_snp_req_certs(CPUState *cpu, struct kvm_run *run)
+{
+    struct kvm_exit_snp_req_certs *req_certs = &run->snp_req_certs;
+    g_autofree gchar *contents = NULL;
+    SevSnpGuestState *sev_snp_guest;
+    MemTxAttrs attrs = { 0 };
+    bool timed_out = false;
+    QEMUTimer *timer;
+    uint32_t npages;
+    void *guest_buf;
+    hwaddr buf_sz;
+    hwaddr gpa;
+    int ret;
+
+    req_certs->ret = EIO;
+
+    if (!sev_snp_enabled()) {
+        return -EIO;
+    }
+
+    gpa = req_certs->gfn << TARGET_PAGE_BITS;
+    npages = req_certs->npages;
+
+    sev_snp_guest = SEV_SNP_GUEST(MACHINE(qdev_get_machine())->cgs);
+    if (!sev_snp_guest->certs_path) {
+        req_certs->ret = 0;
+        return 0;
+    }
+
+    /*
+     * -EAGAIN from open_certs_locked() indicates that a lock could not be
+     * obtained on the certificate file. It would be possible to set
+     * req_certs->ret = EAGAIN to immediately inform the requesting vCPU of
+     * this condition so that it can retry again later, however, linux guests
+     * are currently hard-limited to a 60 second timeout, at which point they
+     * will assume misbehavior and refuse to continue issuing attestation
+     * requests from that point forward.
+     *
+     * Allowing for this guest timeout to be configured in some way that will
+     * be easier to coordinate with on the QEMU side will make that approach
+     * more viable, but until then just busy-wait on the QEMU side, which
+     * allows more flexibility in how long QEMU can wait for things like SNP
+     * firmware updates/etc. which may be holding an exclusive lock on the
+     * certificate.
+     *
+     * The down-side to doing things this way is that too long of a busy-wait
+     * will result in soft-lockups in the guest, but the guest will otherwise
+     * continue normally afterward.
+     *
+     * If the QEMU-side timeout is reached, then just go ahead and indicate
+     * EAGAIN to the guest, at which point the above-mentioned guest-side
+     * timeout will trigger if QEMU's timeout exceeds that of the guest. But
+     * that's better than killing the vCPU, which is the only viable
+     * alternative at that point.
+     */
+    timer = timer_new_ms(QEMU_CLOCK_REALTIME, certs_timeout, &timed_out);
+    timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+                     sev_snp_guest->certs_timeout);
+
+    do {
+        ret = open_certs_locked(sev_snp_guest);
+        g_usleep(1000);
+    } while (ret == -EAGAIN && !timed_out);
+
+    timer_del(timer);
+
+    if (ret) {
+        if (timed_out) {
+            req_certs->ret = EAGAIN;
+            return 0;
+        }
+
+        /*
+         * It's a bit ambiguous when to propagate a generic error to the guest
+         * rather than simply letting QEMU crash. The general methodology here
+         * is to let QEMU crash if there is a misconfiguration issue that the
+         * guest has absolutely no potential role in, but to otherwise notify
+         * the guest if there is some remote possibility that the issue is
+         * related to something on the guest side.
+         */
+        return ret;
+    }
+
+    buf_sz = npages * TARGET_PAGE_SIZE;
+    if (buf_sz < get_certs_size(sev_snp_guest)) {
+        req_certs->ret = ENOSPC;
+        req_certs->npages =
+            (get_certs_size(sev_snp_guest) + TARGET_PAGE_SIZE) / TARGET_PAGE_SIZE;
+        close_certs(sev_snp_guest);
+        goto out;
+    }
+
+    guest_buf = address_space_map(&address_space_memory, gpa, &buf_sz, true, attrs);
+    if (buf_sz < npages * TARGET_PAGE_SIZE) {
+        error_report_once("Unable to map entire guest buffer, mapped size %ld (expected %ld)",
+                          buf_sz, get_certs_size(sev_snp_guest));
+        close_certs(sev_snp_guest);
+        goto out_unmap;
+    }
+
+    ret = read_certs(sev_snp_guest, guest_buf, buf_sz);
+    if (ret < 0) {
+        error_report_once("Unable to read certificate data into guest buffer, ret %d",
+                          ret);
+        close_certs(sev_snp_guest);
+        goto out_unmap;
+
+    }
+
+    req_certs->ret = 0;
+
+    add_immediate_exit_callback(cpu, snp_release_certs_lock, sev_snp_guest);
+out_unmap:
+    address_space_unmap(&address_space_memory, guest_buf, buf_sz, true, buf_sz);
+
+out:
+    return ret;
+}
 
 static void
 sev_vm_state_change(void *opaque, bool running, RunState state)
@@ -1594,6 +1806,7 @@  static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 
 static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
+    SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(cgs);
     MachineState *ms = MACHINE(qdev_get_machine());
     X86MachineState *x86ms = X86_MACHINE(ms);
 
@@ -1604,6 +1817,13 @@  static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return -1;
     }
 
+    if (sev_snp_guest->certs_path &&
+        kvm_vm_enable_cap(kvm_state, KVM_CAP_EXIT_SNP_REQ_CERTS, 0, 1)) {
+        error_setg(errp, "Failed to enable support for SEV-SNP "
+                         "certificate-fetching requests.");
+        return -1;
+    }
+
     return 0;
 }
 
@@ -2393,6 +2613,26 @@  sev_snp_guest_set_host_data(Object *obj, const char *value, Error **errp)
     memcpy(finish->host_data, blob, len);
 }
 
+static char *
+sev_snp_guest_get_certs_path(Object *obj, Error **errp)
+{
+    SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj);
+
+    return g_strdup(sev_snp_guest->certs_path);
+}
+
+static void
+sev_snp_guest_set_certs_path(Object *obj, const char *value, Error **errp)
+{
+    SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj);
+
+    if (sev_snp_guest->certs_path) {
+        g_free(sev_snp_guest->certs_path);
+    }
+
+    sev_snp_guest->certs_path = value ? g_strdup(value) : NULL;
+}
+
 static void
 sev_snp_guest_class_init(ObjectClass *oc, void *data)
 {
@@ -2428,6 +2668,9 @@  sev_snp_guest_class_init(ObjectClass *oc, void *data)
     object_class_property_add_str(oc, "host-data",
                                   sev_snp_guest_get_host_data,
                                   sev_snp_guest_set_host_data);
+    object_class_property_add_str(oc, "certs-path",
+                                  sev_snp_guest_get_certs_path,
+                                  sev_snp_guest_set_certs_path);
 }
 
 static void
@@ -2440,6 +2683,12 @@  sev_snp_guest_instance_init(Object *obj)
 
     /* default init/start/finish params for kvm */
     sev_snp_guest->kvm_start_conf.policy = DEFAULT_SEV_SNP_POLICY;
+
+    sev_snp_guest->certs_fd = -1;
+    sev_snp_guest->certs_timeout = 100;
+    object_property_add_uint32_ptr(obj, "certs-timeout",
+                                   &sev_snp_guest->certs_timeout,
+                                   OBJ_PROP_FLAG_READWRITE);
 }
 
 /* guest info specific to sev-snp */
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 858005a119..91b9c38ac1 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -68,4 +68,6 @@  void sev_es_set_reset_vector(CPUState *cpu);
 
 void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size);
 
+int kvm_handle_snp_req_certs(CPUState *cpu, struct kvm_run *run);
+
 #endif