Message ID | 20241218154939.1114831-4-michael.roth@amd.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | SEV-SNP: Add support for SNP certificate fetching | expand |
Michael Roth <michael.roth@amd.com> writes: > The GHCB specification[1] defines a VMGEXIT-based Guest Request > hypercall to allow an SNP guest to issue encrypted requests directly to > SNP firmware to do things like query the attestation report for the > guest. These are generally handled purely in the kernel. > > In some some cases, it's useful for the host to be able to additionally > supply the certificate chain for the signing key that SNP firmware uses > to sign these attestation reports. To allow for this, the GHCB > specification defines an Extended Guest Request where this certificate > data can be provided in a special format described in the GHCB spec. > This certificate data may be global or guest-specific depending on how > the guest was configured. Rather than providing interfaces to manage > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > to request the certificate contents from userspace. Implement support > for that here. > > To synchronize delivery of the certificates to the guest in a way where > they will not be rendered invalid by updates to SNP firmware or > attestation singing/endorsement keys by management tools outside the > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > obtain a shared/read lock on the certificate file prior to delivering > them back to KVM. Only after this will the attestation report be > retrieved from firmware and bundled with the certificate data, so QEMU > must continue to hold the file lock until KVM confirms that the > attestation report has been retrieved/bundled. This confirmation is done > by way of the kvm_immediate_exit callback infrastructure that was > introduced in a previous patch. The "management tools outside the purview of QEMU" will all obtain the same kind of file lock? > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > https://www.amd.com/en/developer/sev.html > > Signed-off-by: Michael Roth <michael.roth@amd.com> > --- > qapi/qom.json | 23 +++- > target/i386/kvm/kvm.c | 10 ++ > target/i386/sev-sysemu-stub.c | 5 + > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > target/i386/sev.h | 2 + > 5 files changed, 288 insertions(+), 1 deletion(-) > > diff --git a/qapi/qom.json b/qapi/qom.json > index 28ce24cd8d..6eaf0e7721 100644 > --- a/qapi/qom.json > +++ b/qapi/qom.json > @@ -1034,6 +1034,25 @@ > # firmware. Set this to true to disable the use of VCEK. > # (default: false) (since: 9.1) > # > +# @certs-path: Path to certificate data that can be passed to guests via > +# SNP Extended Guest Requests. File should be in the format > +# described in the GHCB specification. (default: none) > +# (since: 10.0) I prefer "filename" to "path". We have many kinds of paths: pathnames (denoting files), QOM paths (denoting objects), qdev paths, search paths, ... With "filename", your readers immediately know what you're talking about. SevGuestProperties has a member 'dh-cert-file'. Whether that's related to your file I can't tell from its documentation. > +# > +# @certs-timeout: Max time in milliseconds to wait to obtain a read lock Please don't abbreviate "Maximum" here. Confident millisecond granularity will suffice forever? > +# on the certificate file specified by @certs-path. This > +# is not a cumulative value and only affects how long > +# QEMU waits before returning execution to the vCPU and > +# informing the guest of the timeout, so the guest can > +# still continuing retrying for as long as it likes > +# (which will be about 60 seconds for linux guests at > +# the time of this writing). If the guest-side timeout > +# is insufficient, set this higher to allow more time to > +# fetch the certificate. If the guest-side timeout is > +# sufficient, set this lower to reduce the likelihood of > +# soft lockups in the guest. > +# (default: 100) (since: 10.0) > +# > # Since: 9.1 > ## Please format like # @certs-path: Path to certificate data that can be passed to guests # via SNP Extended Guest Requests. File should be in the format # described in the GHCB specification. # (default: none) (since: 10.0) # # @certs-timeout: Max time in milliseconds to wait to obtain a read # lock on the certificate file specified by @certs-path. This is # not a cumulative value and only affects how long QEMU waits # before returning execution to the vCPU and informing the guest # of the timeout, so the guest can still continuing retrying for # as long as it likes (which will be about 60 seconds for linux # guests at the time of this writing). If the guest-side timeout # is insufficient, set this higher to allow more time to fetch the # certificate. If the guest-side timeout is sufficient, set this # lower to reduce the likelihood of soft lockups in the guest. # (default: 100) (since: 10.0) to blend in with commit a937b6aa739 (qapi: Reformat doc comments to conform to current conventions). > { 'struct': 'SevSnpGuestProperties', > @@ -1045,7 +1064,9 @@ > '*id-auth': 'str', > '*author-key-enabled': 'bool', > '*host-data': 'str', > - '*vcek-disabled': 'bool' } } > + '*vcek-disabled': 'bool', > + '*certs-path': 'str', > + '*certs-timeout': 'uint32' } } > > ## > # @ThreadContextProperties: [...]
On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote: > The GHCB specification[1] defines a VMGEXIT-based Guest Request > hypercall to allow an SNP guest to issue encrypted requests directly to > SNP firmware to do things like query the attestation report for the > guest. These are generally handled purely in the kernel. > > In some some cases, it's useful for the host to be able to additionally > supply the certificate chain for the signing key that SNP firmware uses > to sign these attestation reports. To allow for this, the GHCB > specification defines an Extended Guest Request where this certificate > data can be provided in a special format described in the GHCB spec. > This certificate data may be global or guest-specific depending on how > the guest was configured. Rather than providing interfaces to manage > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > to request the certificate contents from userspace. Implement support > for that here. > > To synchronize delivery of the certificates to the guest in a way where > they will not be rendered invalid by updates to SNP firmware or > attestation singing/endorsement keys by management tools outside the > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > obtain a shared/read lock on the certificate file prior to delivering > them back to KVM. Only after this will the attestation report be > retrieved from firmware and bundled with the certificate data, so QEMU > must continue to hold the file lock until KVM confirms that the > attestation report has been retrieved/bundled. This confirmation is done > by way of the kvm_immediate_exit callback infrastructure that was > introduced in a previous patch. > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > https://www.amd.com/en/developer/sev.html > > Signed-off-by: Michael Roth <michael.roth@amd.com> > --- > qapi/qom.json | 23 +++- > target/i386/kvm/kvm.c | 10 ++ > target/i386/sev-sysemu-stub.c | 5 + > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > target/i386/sev.h | 2 + > 5 files changed, 288 insertions(+), 1 deletion(-) > > diff --git a/qapi/qom.json b/qapi/qom.json > index 28ce24cd8d..6eaf0e7721 100644 > --- a/qapi/qom.json > +++ b/qapi/qom.json > @@ -1034,6 +1034,25 @@ > # firmware. Set this to true to disable the use of VCEK. > # (default: false) (since: 9.1) > # > +# @certs-path: Path to certificate data that can be passed to guests via > +# SNP Extended Guest Requests. File should be in the format > +# described in the GHCB specification. (default: none) > +# (since: 10.0) Can we document the required format here explicitly, rather than expecting users to go searching for specs which are often practically impossible to find, and even harder to read & interpret ? > diff --git a/target/i386/sev.c b/target/i386/sev.c > index 1a4eb1ada6..2c41bdbccf 100644 > --- a/target/i386/sev.c > +++ b/target/i386/sev.c > @@ -157,6 +157,9 @@ struct SevSnpGuestState { > char *id_auth_base64; > uint8_t *id_auth; > char *host_data; > + char *certs_path; > + int certs_fd; > + uint32_t certs_timeout; > > struct kvm_sev_snp_launch_start kvm_start_conf; > struct kvm_sev_snp_launch_finish kvm_finish_conf; > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common) > } > } > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest) > +{ > + int fd, ret; > + > + if (sev_snp_guest->certs_fd != -1) { > + return 0; > + } > + > + fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL); > + if (fd == -1) { > + error_report("Unable to open certificate blob at path %s, ret %d", > + sev_snp_guest->certs_path, fd); > + return fd; > + } > + > + ret = qemu_lock_fd(fd, 0, 0, false); > + if (ret == -EAGAIN || ret == -EACCES) { > + ret = -EAGAIN; > + goto out_close; > + } else if (ret) { > + goto out_close; > + } This locking scheme is likely unsafe. Consider this sequence * QEMU runs qemu_open(path) * External mgmt app runs unlink(path) * External mgmt app runs open(path) * External mgmt app runs lock(fd) * QEMU runs qemu_lock_fd(fd) QEMU has successfully acquired a lock on an FD that corresponds to a deleted file, not the current existing file. Avoiding this problem requires either that the external mgmt app agrees to *NEVER* unlink() the files under any circumstance, or for QEMU to run its open + lock logic in a loop, checking 'stat' and 'fstat' before opening and after locking, in order to detect a replaced file from its changed inode. I'm not inclined to rely on mgmt apps never unlink()ing as that's to easy to mess up IMHO. With regards, Daniel
On Wed, Dec 18, 2024 at 06:32:05PM +0100, Markus Armbruster wrote: > Michael Roth <michael.roth@amd.com> writes: > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request > > hypercall to allow an SNP guest to issue encrypted requests directly to > > SNP firmware to do things like query the attestation report for the > > guest. These are generally handled purely in the kernel. > > > > In some some cases, it's useful for the host to be able to additionally > > supply the certificate chain for the signing key that SNP firmware uses > > to sign these attestation reports. To allow for this, the GHCB > > specification defines an Extended Guest Request where this certificate > > data can be provided in a special format described in the GHCB spec. > > This certificate data may be global or guest-specific depending on how > > the guest was configured. Rather than providing interfaces to manage > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > > to request the certificate contents from userspace. Implement support > > for that here. > > > > To synchronize delivery of the certificates to the guest in a way where > > they will not be rendered invalid by updates to SNP firmware or > > attestation singing/endorsement keys by management tools outside the > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > > obtain a shared/read lock on the certificate file prior to delivering > > them back to KVM. Only after this will the attestation report be > > retrieved from firmware and bundled with the certificate data, so QEMU > > must continue to hold the file lock until KVM confirms that the > > attestation report has been retrieved/bundled. This confirmation is done > > by way of the kvm_immediate_exit callback infrastructure that was > > introduced in a previous patch. > > The "management tools outside the purview of QEMU" will all obtain the > same kind of file lock? Yes, this is meant for a cooperative environment where the cloud provider has opted to provide guest certificates in this manner and needs a way to synchronize guest attestation requests with other parts of their stack handling management duties like updating firmware/endorsement keys/etc. which might occur between the time a certificate is fetched and the time the attestation report is generated by firmware. The idea is to provide a path of least resistance using a common framework like filesystem locks, then heavily suggest that approach on the kernel documentation side, so that tools purposely or naturally opt to use this mechanism rather than every service provider coming up with some separate thing that they'll need to work into some custom QEMU/VMM to solve the same problem. Of course, userspace is free to implement their own completely separate mechanism for handling all this and completely ignore file-locking. But QEMU is only trying to play nice with this above-mentioned reference implementation and cooperative management tools, and not trying to profess to provide any sort of synchronization for cases where those sorts of management-level updates are performed without utilizing this reference implementation for synchronization. I should have touched on this in the schema documentation. I'll make sure to add that in the next spin. > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > > https://www.amd.com/en/developer/sev.html > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > --- > > qapi/qom.json | 23 +++- > > target/i386/kvm/kvm.c | 10 ++ > > target/i386/sev-sysemu-stub.c | 5 + > > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > > target/i386/sev.h | 2 + > > 5 files changed, 288 insertions(+), 1 deletion(-) > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > index 28ce24cd8d..6eaf0e7721 100644 > > --- a/qapi/qom.json > > +++ b/qapi/qom.json > > @@ -1034,6 +1034,25 @@ > > # firmware. Set this to true to disable the use of VCEK. > > # (default: false) (since: 9.1) > > # > > +# @certs-path: Path to certificate data that can be passed to guests via > > +# SNP Extended Guest Requests. File should be in the format > > +# described in the GHCB specification. (default: none) > > +# (since: 10.0) > > I prefer "filename" to "path". We have many kinds of paths: pathnames > (denoting files), QOM paths (denoting objects), qdev paths, search > paths, ... With "filename", your readers immediately know what you're > talking about. > > SevGuestProperties has a member 'dh-cert-file'. Whether that's related > to your file I can't tell from its documentation. > > > +# > > +# @certs-timeout: Max time in milliseconds to wait to obtain a read lock > > Please don't abbreviate "Maximum" here. > > Confident millisecond granularity will suffice forever? I believe so. There will likely be some form of rate-limiting added to KVM along the lines of this patch, with proposals on the order of 2 requests per second for guest requests to avoid DoS'ing each other: https://lore.kernel.org/all/20230119213426.379312-1-dionnaglaze@google.com/ I believe guest kernels already voluntarily limit themselves to something similar before retrying requests. So the only case I can think of where a sub-1ms interval would be desired would be cases where the guest has very aggressive soft-lockup detection settings, in which case we might as well just set the timeout to 0 and not busy-wait at all, which should be allowed for here. (There's more documentation in the related handling in target/i386/sev.c, but in such cases the guest will get an EAGAIN and generally retry the request later). > > > +# on the certificate file specified by @certs-path. This > > +# is not a cumulative value and only affects how long > > +# QEMU waits before returning execution to the vCPU and > > +# informing the guest of the timeout, so the guest can > > +# still continuing retrying for as long as it likes > > +# (which will be about 60 seconds for linux guests at > > +# the time of this writing). If the guest-side timeout > > +# is insufficient, set this higher to allow more time to > > +# fetch the certificate. If the guest-side timeout is > > +# sufficient, set this lower to reduce the likelihood of > > +# soft lockups in the guest. > > +# (default: 100) (since: 10.0) > > +# > > # Since: 9.1 > > ## > > Please format like > > # @certs-path: Path to certificate data that can be passed to guests > # via SNP Extended Guest Requests. File should be in the format > # described in the GHCB specification. > # (default: none) (since: 10.0) > # > # @certs-timeout: Max time in milliseconds to wait to obtain a read > # lock on the certificate file specified by @certs-path. This is > # not a cumulative value and only affects how long QEMU waits > # before returning execution to the vCPU and informing the guest > # of the timeout, so the guest can still continuing retrying for > # as long as it likes (which will be about 60 seconds for linux > # guests at the time of this writing). If the guest-side timeout > # is insufficient, set this higher to allow more time to fetch the > # certificate. If the guest-side timeout is sufficient, set this > # lower to reduce the likelihood of soft lockups in the guest. > # (default: 100) (since: 10.0) > > to blend in with commit a937b6aa739 (qapi: Reformat doc comments to > conform to current conventions). Will do! -Mike > > > { 'struct': 'SevSnpGuestProperties', > > @@ -1045,7 +1064,9 @@ > > '*id-auth': 'str', > > '*author-key-enabled': 'bool', > > '*host-data': 'str', > > - '*vcek-disabled': 'bool' } } > > + '*vcek-disabled': 'bool', > > + '*certs-path': 'str', > > + '*certs-timeout': 'uint32' } } > > > > ## > > # @ThreadContextProperties: > > [...] >
On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote: > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote: > > The GHCB specification[1] defines a VMGEXIT-based Guest Request > > hypercall to allow an SNP guest to issue encrypted requests directly to > > SNP firmware to do things like query the attestation report for the > > guest. These are generally handled purely in the kernel. > > > > In some some cases, it's useful for the host to be able to additionally > > supply the certificate chain for the signing key that SNP firmware uses > > to sign these attestation reports. To allow for this, the GHCB > > specification defines an Extended Guest Request where this certificate > > data can be provided in a special format described in the GHCB spec. > > This certificate data may be global or guest-specific depending on how > > the guest was configured. Rather than providing interfaces to manage > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > > to request the certificate contents from userspace. Implement support > > for that here. > > > > To synchronize delivery of the certificates to the guest in a way where > > they will not be rendered invalid by updates to SNP firmware or > > attestation singing/endorsement keys by management tools outside the > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > > obtain a shared/read lock on the certificate file prior to delivering > > them back to KVM. Only after this will the attestation report be > > retrieved from firmware and bundled with the certificate data, so QEMU > > must continue to hold the file lock until KVM confirms that the > > attestation report has been retrieved/bundled. This confirmation is done > > by way of the kvm_immediate_exit callback infrastructure that was > > introduced in a previous patch. > > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > > https://www.amd.com/en/developer/sev.html > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > --- > > qapi/qom.json | 23 +++- > > target/i386/kvm/kvm.c | 10 ++ > > target/i386/sev-sysemu-stub.c | 5 + > > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > > target/i386/sev.h | 2 + > > 5 files changed, 288 insertions(+), 1 deletion(-) > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > index 28ce24cd8d..6eaf0e7721 100644 > > --- a/qapi/qom.json > > +++ b/qapi/qom.json > > @@ -1034,6 +1034,25 @@ > > # firmware. Set this to true to disable the use of VCEK. > > # (default: false) (since: 9.1) > > # > > +# @certs-path: Path to certificate data that can be passed to guests via > > +# SNP Extended Guest Requests. File should be in the format > > +# described in the GHCB specification. (default: none) > > +# (since: 10.0) > > Can we document the required format here explicitly, rather than expecting > users to go searching for specs which are often practically impossible > to find, and even harder to read & interpret ? It'll be difficult to summarize in a way that will be self-reliant, since knowing the certificate format is not sufficient to make sure it coincides with the endorsement key being used by firmware. So I can't promise to completely reduce reliance on external specs, but at least give a better of the format and where those external specs will come into play in filling out the data. If it needs to be at least somewhat self-sufficient then that might warrant a separate document in docs/system/i386/amd-memory-encryption.rst or somewhere thereabouts that summarizes the whole attestation flow and how certificates tie into that. Any preferences? > > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > index 1a4eb1ada6..2c41bdbccf 100644 > > --- a/target/i386/sev.c > > +++ b/target/i386/sev.c > > @@ -157,6 +157,9 @@ struct SevSnpGuestState { > > char *id_auth_base64; > > uint8_t *id_auth; > > char *host_data; > > + char *certs_path; > > + int certs_fd; > > + uint32_t certs_timeout; > > > > struct kvm_sev_snp_launch_start kvm_start_conf; > > struct kvm_sev_snp_launch_finish kvm_finish_conf; > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common) > > } > > } > > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest) > > +{ > > + int fd, ret; > > + > > + if (sev_snp_guest->certs_fd != -1) { > > + return 0; > > + } > > + > > + fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL); > > + if (fd == -1) { > > + error_report("Unable to open certificate blob at path %s, ret %d", > > + sev_snp_guest->certs_path, fd); > > + return fd; > > + } > > + > > + ret = qemu_lock_fd(fd, 0, 0, false); > > + if (ret == -EAGAIN || ret == -EACCES) { > > + ret = -EAGAIN; > > + goto out_close; > > + } else if (ret) { > > + goto out_close; > > + } > > This locking scheme is likely unsafe. Consider this sequence > > * QEMU runs qemu_open(path) > * External mgmt app runs unlink(path) > * External mgmt app runs open(path) > * External mgmt app runs lock(fd) > * QEMU runs qemu_lock_fd(fd) > > QEMU has successfully acquired a lock on an FD that corresponds to a > deleted file, not the current existing file. > > Avoiding this problem requires either that the external mgmt app agrees > to *NEVER* unlink() the files under any circumstance, or for QEMU to > run its open + lock logic in a loop, checking 'stat' and 'fstat' before > opening and after locking, in order to detect a replaced file from its > changed inode. > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to > easy to mess up IMHO. Yah I went into more detail in my response to Markus, but long story short is that we are assuming mgmt is cooperative in this case, and so as you mentioned, it would never unlink files while SNP guests are running, but instead take an exclusive lock on them and update them in place with the understanding that doing anything otherwise would open a race window where guests might get stale certificates. -Mike > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote: > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote: > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote: > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request > > > hypercall to allow an SNP guest to issue encrypted requests directly to > > > SNP firmware to do things like query the attestation report for the > > > guest. These are generally handled purely in the kernel. > > > > > > In some some cases, it's useful for the host to be able to additionally > > > supply the certificate chain for the signing key that SNP firmware uses > > > to sign these attestation reports. To allow for this, the GHCB > > > specification defines an Extended Guest Request where this certificate > > > data can be provided in a special format described in the GHCB spec. > > > This certificate data may be global or guest-specific depending on how > > > the guest was configured. Rather than providing interfaces to manage > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > > > to request the certificate contents from userspace. Implement support > > > for that here. > > > > > > To synchronize delivery of the certificates to the guest in a way where > > > they will not be rendered invalid by updates to SNP firmware or > > > attestation singing/endorsement keys by management tools outside the > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > > > obtain a shared/read lock on the certificate file prior to delivering > > > them back to KVM. Only after this will the attestation report be > > > retrieved from firmware and bundled with the certificate data, so QEMU > > > must continue to hold the file lock until KVM confirms that the > > > attestation report has been retrieved/bundled. This confirmation is done > > > by way of the kvm_immediate_exit callback infrastructure that was > > > introduced in a previous patch. > > > > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > > > https://www.amd.com/en/developer/sev.html > > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > > --- > > > qapi/qom.json | 23 +++- > > > target/i386/kvm/kvm.c | 10 ++ > > > target/i386/sev-sysemu-stub.c | 5 + > > > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > > > target/i386/sev.h | 2 + > > > 5 files changed, 288 insertions(+), 1 deletion(-) > > > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > > index 28ce24cd8d..6eaf0e7721 100644 > > > --- a/qapi/qom.json > > > +++ b/qapi/qom.json > > > @@ -1034,6 +1034,25 @@ > > > # firmware. Set this to true to disable the use of VCEK. > > > # (default: false) (since: 9.1) > > > # > > > +# @certs-path: Path to certificate data that can be passed to guests via > > > +# SNP Extended Guest Requests. File should be in the format > > > +# described in the GHCB specification. (default: none) > > > +# (since: 10.0) > > > > Can we document the required format here explicitly, rather than expecting > > users to go searching for specs which are often practically impossible > > to find, and even harder to read & interpret ? > > It'll be difficult to summarize in a way that will be self-reliant, > since knowing the certificate format is not sufficient to make sure > it coincides with the endorsement key being used by firmware. So I can't > promise to completely reduce reliance on external specs, but at least > give a better of the format and where those external specs will come > into play in filling out the data. > > If it needs to be at least somewhat self-sufficient then that might > warrant a separate document in docs/system/i386/amd-memory-encryption.rst > or somewhere thereabouts that summarizes the whole attestation flow and > how certificates tie into that. > > Any preferences? > > > > > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > > index 1a4eb1ada6..2c41bdbccf 100644 > > > --- a/target/i386/sev.c > > > +++ b/target/i386/sev.c > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState { > > > char *id_auth_base64; > > > uint8_t *id_auth; > > > char *host_data; > > > + char *certs_path; > > > + int certs_fd; > > > + uint32_t certs_timeout; > > > > > > struct kvm_sev_snp_launch_start kvm_start_conf; > > > struct kvm_sev_snp_launch_finish kvm_finish_conf; > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common) > > > } > > > } > > > > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest) > > > +{ > > > + int fd, ret; > > > + > > > + if (sev_snp_guest->certs_fd != -1) { > > > + return 0; > > > + } > > > + > > > + fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL); > > > + if (fd == -1) { > > > + error_report("Unable to open certificate blob at path %s, ret %d", > > > + sev_snp_guest->certs_path, fd); > > > + return fd; > > > + } > > > + > > > + ret = qemu_lock_fd(fd, 0, 0, false); > > > + if (ret == -EAGAIN || ret == -EACCES) { > > > + ret = -EAGAIN; > > > + goto out_close; > > > + } else if (ret) { > > > + goto out_close; > > > + } > > > > This locking scheme is likely unsafe. Consider this sequence > > > > * QEMU runs qemu_open(path) > > * External mgmt app runs unlink(path) > > * External mgmt app runs open(path) > > * External mgmt app runs lock(fd) > > * QEMU runs qemu_lock_fd(fd) > > > > QEMU has successfully acquired a lock on an FD that corresponds to a > > deleted file, not the current existing file. > > > > Avoiding this problem requires either that the external mgmt app agrees > > to *NEVER* unlink() the files under any circumstance, or for QEMU to > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before > > opening and after locking, in order to detect a replaced file from its > > changed inode. > > > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to > > easy to mess up IMHO. > > Yah I went into more detail in my response to Markus, but long story > short is that we are assuming mgmt is cooperative in this case, and > so as you mentioned, it would never unlink files while SNP guests are > running, but instead take an exclusive lock on them and update them in > place with the understanding that doing anything otherwise would open > a race window where guests might get stale certificates. If there's an expectation & requirement that no SNP guests are running, then IMHO this whole thing is just over-engineered. Just remove all this locking code entirely, and document that none of this must be changed while QEMU is running - which is a common requirement for a great many things on the host. With regards, Daniel
On Thu, Dec 19, 2024 at 08:13:44AM +0000, Daniel P. Berrangé wrote: > On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote: > > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote: > > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote: > > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request > > > > hypercall to allow an SNP guest to issue encrypted requests directly to > > > > SNP firmware to do things like query the attestation report for the > > > > guest. These are generally handled purely in the kernel. > > > > > > > > In some some cases, it's useful for the host to be able to additionally > > > > supply the certificate chain for the signing key that SNP firmware uses > > > > to sign these attestation reports. To allow for this, the GHCB > > > > specification defines an Extended Guest Request where this certificate > > > > data can be provided in a special format described in the GHCB spec. > > > > This certificate data may be global or guest-specific depending on how > > > > the guest was configured. Rather than providing interfaces to manage > > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > > > > to request the certificate contents from userspace. Implement support > > > > for that here. > > > > > > > > To synchronize delivery of the certificates to the guest in a way where > > > > they will not be rendered invalid by updates to SNP firmware or > > > > attestation singing/endorsement keys by management tools outside the > > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > > > > obtain a shared/read lock on the certificate file prior to delivering > > > > them back to KVM. Only after this will the attestation report be > > > > retrieved from firmware and bundled with the certificate data, so QEMU > > > > must continue to hold the file lock until KVM confirms that the > > > > attestation report has been retrieved/bundled. This confirmation is done > > > > by way of the kvm_immediate_exit callback infrastructure that was > > > > introduced in a previous patch. > > > > > > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > > > > https://www.amd.com/en/developer/sev.html > > > > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > > > --- > > > > qapi/qom.json | 23 +++- > > > > target/i386/kvm/kvm.c | 10 ++ > > > > target/i386/sev-sysemu-stub.c | 5 + > > > > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > > > > target/i386/sev.h | 2 + > > > > 5 files changed, 288 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > > > index 28ce24cd8d..6eaf0e7721 100644 > > > > --- a/qapi/qom.json > > > > +++ b/qapi/qom.json > > > > @@ -1034,6 +1034,25 @@ > > > > # firmware. Set this to true to disable the use of VCEK. > > > > # (default: false) (since: 9.1) > > > > # > > > > +# @certs-path: Path to certificate data that can be passed to guests via > > > > +# SNP Extended Guest Requests. File should be in the format > > > > +# described in the GHCB specification. (default: none) > > > > +# (since: 10.0) > > > > > > Can we document the required format here explicitly, rather than expecting > > > users to go searching for specs which are often practically impossible > > > to find, and even harder to read & interpret ? > > > > It'll be difficult to summarize in a way that will be self-reliant, > > since knowing the certificate format is not sufficient to make sure > > it coincides with the endorsement key being used by firmware. So I can't > > promise to completely reduce reliance on external specs, but at least > > give a better of the format and where those external specs will come > > into play in filling out the data. > > > > If it needs to be at least somewhat self-sufficient then that might > > warrant a separate document in docs/system/i386/amd-memory-encryption.rst > > or somewhere thereabouts that summarizes the whole attestation flow and > > how certificates tie into that. > > > > Any preferences? > > > > > > > > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > > > index 1a4eb1ada6..2c41bdbccf 100644 > > > > --- a/target/i386/sev.c > > > > +++ b/target/i386/sev.c > > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState { > > > > char *id_auth_base64; > > > > uint8_t *id_auth; > > > > char *host_data; > > > > + char *certs_path; > > > > + int certs_fd; > > > > + uint32_t certs_timeout; > > > > > > > > struct kvm_sev_snp_launch_start kvm_start_conf; > > > > struct kvm_sev_snp_launch_finish kvm_finish_conf; > > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common) > > > > } > > > > } > > > > > > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest) > > > > +{ > > > > + int fd, ret; > > > > + > > > > + if (sev_snp_guest->certs_fd != -1) { > > > > + return 0; > > > > + } > > > > + > > > > + fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL); > > > > + if (fd == -1) { > > > > + error_report("Unable to open certificate blob at path %s, ret %d", > > > > + sev_snp_guest->certs_path, fd); > > > > + return fd; > > > > + } > > > > + > > > > + ret = qemu_lock_fd(fd, 0, 0, false); > > > > + if (ret == -EAGAIN || ret == -EACCES) { > > > > + ret = -EAGAIN; > > > > + goto out_close; > > > > + } else if (ret) { > > > > + goto out_close; > > > > + } > > > > > > This locking scheme is likely unsafe. Consider this sequence > > > > > > * QEMU runs qemu_open(path) > > > * External mgmt app runs unlink(path) > > > * External mgmt app runs open(path) > > > * External mgmt app runs lock(fd) > > > * QEMU runs qemu_lock_fd(fd) > > > > > > QEMU has successfully acquired a lock on an FD that corresponds to a > > > deleted file, not the current existing file. > > > > > > Avoiding this problem requires either that the external mgmt app agrees > > > to *NEVER* unlink() the files under any circumstance, or for QEMU to > > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before > > > opening and after locking, in order to detect a replaced file from its > > > changed inode. > > > > > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to > > > easy to mess up IMHO. > > > > Yah I went into more detail in my response to Markus, but long story > > short is that we are assuming mgmt is cooperative in this case, and > > so as you mentioned, it would never unlink files while SNP guests are > > running, but instead take an exclusive lock on them and update them in > > place with the understanding that doing anything otherwise would open > > a race window where guests might get stale certificates. > > If there's an expectation & requirement that no SNP guests are running, > then IMHO this whole thing is just over-engineered. Just remove all this > locking code entirely, and document that none of this must be changed > while QEMU is running - which is a common requirement for a great many > things on the host. VCEK endorsement keys can change as a result of SNP firmware updates, which can be done while SNP guests are running and are often done in such a way to patch bugs/security holes. VLEK endorsement keys can similarly be updated on a live host. Both these sorts of interactions cannot be made compatible with bundling certificates with attestation reports without some orchestration in place to keep them atomic relative to the endorsement key being used by firmware to sign attestation reports. Every CSP implementing this will need to solve it in some way, and I'm sure some will handle all this completely differently. But it will make interoperative management/tooling a mess, and having a reference implementation based around something common will make it easier to steer CSPs to that common solution and give management tools authors *some* reference approach to target rather than expecting to retrofit some custom solution on top. With these patches, you can update firmware and endorsement keys while SNP guests are running, but it requires write locks on any active certificates as defined here and in the kernel, and doing certificate file updates in place while that write lock is still held. I don't really think that's over-engineered. I think it's surprisingly simple given the potential complexity of the above-mentioned requirements. But yes if management tries to unlink certificates while SNP guests are running, all bets are off. But at that point they are not cooperating with kernel/QEMU, and we don't need to care about that. And if they really do need to blow away certificates for a complete re-install or data-wipe or whatever, at that point they'd just need to ensure they stop all their SNP guests first. -Mike > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
On Thu, Dec 19, 2024 at 07:16:01AM -0600, Michael Roth wrote: > On Thu, Dec 19, 2024 at 08:13:44AM +0000, Daniel P. Berrangé wrote: > > On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote: > > > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote: > > > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote: > > > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request > > > > > hypercall to allow an SNP guest to issue encrypted requests directly to > > > > > SNP firmware to do things like query the attestation report for the > > > > > guest. These are generally handled purely in the kernel. > > > > > > > > > > In some some cases, it's useful for the host to be able to additionally > > > > > supply the certificate chain for the signing key that SNP firmware uses > > > > > to sign these attestation reports. To allow for this, the GHCB > > > > > specification defines an Extended Guest Request where this certificate > > > > > data can be provided in a special format described in the GHCB spec. > > > > > This certificate data may be global or guest-specific depending on how > > > > > the guest was configured. Rather than providing interfaces to manage > > > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > > > > > to request the certificate contents from userspace. Implement support > > > > > for that here. > > > > > > > > > > To synchronize delivery of the certificates to the guest in a way where > > > > > they will not be rendered invalid by updates to SNP firmware or > > > > > attestation singing/endorsement keys by management tools outside the > > > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > > > > > obtain a shared/read lock on the certificate file prior to delivering > > > > > them back to KVM. Only after this will the attestation report be > > > > > retrieved from firmware and bundled with the certificate data, so QEMU > > > > > must continue to hold the file lock until KVM confirms that the > > > > > attestation report has been retrieved/bundled. This confirmation is done > > > > > by way of the kvm_immediate_exit callback infrastructure that was > > > > > introduced in a previous patch. > > > > > > > > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > > > > > https://www.amd.com/en/developer/sev.html > > > > > > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > > > > --- > > > > > qapi/qom.json | 23 +++- > > > > > target/i386/kvm/kvm.c | 10 ++ > > > > > target/i386/sev-sysemu-stub.c | 5 + > > > > > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > > > > > target/i386/sev.h | 2 + > > > > > 5 files changed, 288 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > > > > index 28ce24cd8d..6eaf0e7721 100644 > > > > > --- a/qapi/qom.json > > > > > +++ b/qapi/qom.json > > > > > @@ -1034,6 +1034,25 @@ > > > > > # firmware. Set this to true to disable the use of VCEK. > > > > > # (default: false) (since: 9.1) > > > > > # > > > > > +# @certs-path: Path to certificate data that can be passed to guests via > > > > > +# SNP Extended Guest Requests. File should be in the format > > > > > +# described in the GHCB specification. (default: none) > > > > > +# (since: 10.0) > > > > > > > > Can we document the required format here explicitly, rather than expecting > > > > users to go searching for specs which are often practically impossible > > > > to find, and even harder to read & interpret ? > > > > > > It'll be difficult to summarize in a way that will be self-reliant, > > > since knowing the certificate format is not sufficient to make sure > > > it coincides with the endorsement key being used by firmware. So I can't > > > promise to completely reduce reliance on external specs, but at least > > > give a better of the format and where those external specs will come > > > into play in filling out the data. > > > > > > If it needs to be at least somewhat self-sufficient then that might > > > warrant a separate document in docs/system/i386/amd-memory-encryption.rst > > > or somewhere thereabouts that summarizes the whole attestation flow and > > > how certificates tie into that. > > > > > > Any preferences? > > > > > > > > > > > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > > > > index 1a4eb1ada6..2c41bdbccf 100644 > > > > > --- a/target/i386/sev.c > > > > > +++ b/target/i386/sev.c > > > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState { > > > > > char *id_auth_base64; > > > > > uint8_t *id_auth; > > > > > char *host_data; > > > > > + char *certs_path; > > > > > + int certs_fd; > > > > > + uint32_t certs_timeout; > > > > > > > > > > struct kvm_sev_snp_launch_start kvm_start_conf; > > > > > struct kvm_sev_snp_launch_finish kvm_finish_conf; > > > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common) > > > > > } > > > > > } > > > > > > > > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest) > > > > > +{ > > > > > + int fd, ret; > > > > > + > > > > > + if (sev_snp_guest->certs_fd != -1) { > > > > > + return 0; > > > > > + } > > > > > + > > > > > + fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL); > > > > > + if (fd == -1) { > > > > > + error_report("Unable to open certificate blob at path %s, ret %d", > > > > > + sev_snp_guest->certs_path, fd); > > > > > + return fd; > > > > > + } > > > > > + > > > > > + ret = qemu_lock_fd(fd, 0, 0, false); > > > > > + if (ret == -EAGAIN || ret == -EACCES) { > > > > > + ret = -EAGAIN; > > > > > + goto out_close; > > > > > + } else if (ret) { > > > > > + goto out_close; > > > > > + } > > > > > > > > This locking scheme is likely unsafe. Consider this sequence > > > > > > > > * QEMU runs qemu_open(path) > > > > * External mgmt app runs unlink(path) > > > > * External mgmt app runs open(path) > > > > * External mgmt app runs lock(fd) > > > > * QEMU runs qemu_lock_fd(fd) > > > > > > > > QEMU has successfully acquired a lock on an FD that corresponds to a > > > > deleted file, not the current existing file. > > > > > > > > Avoiding this problem requires either that the external mgmt app agrees > > > > to *NEVER* unlink() the files under any circumstance, or for QEMU to > > > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before > > > > opening and after locking, in order to detect a replaced file from its > > > > changed inode. > > > > > > > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to > > > > easy to mess up IMHO. > > > > > > Yah I went into more detail in my response to Markus, but long story > > > short is that we are assuming mgmt is cooperative in this case, and > > > so as you mentioned, it would never unlink files while SNP guests are > > > running, but instead take an exclusive lock on them and update them in > > > place with the understanding that doing anything otherwise would open > > > a race window where guests might get stale certificates. > > > > If there's an expectation & requirement that no SNP guests are running, > > then IMHO this whole thing is just over-engineered. Just remove all this > > locking code entirely, and document that none of this must be changed > > while QEMU is running - which is a common requirement for a great many > > things on the host. > > VCEK endorsement keys can change as a result of SNP firmware updates, > which can be done while SNP guests are running and are often done in such > a way to patch bugs/security holes. VLEK endorsement keys can similarly be > updated on a live host. Both these sorts of interactions cannot be made > compatible with bundling certificates with attestation reports without some > orchestration in place to keep them atomic relative to the endorsement > key being used by firmware to sign attestation reports. Every CSP > implementing this will need to solve it in some way, and I'm sure some > will handle all this completely differently. But it will make > interoperative management/tooling a mess, and having a reference > implementation based around something common will make it easier to > steer CSPs to that common solution and give management tools authors > *some* reference approach to target rather than expecting to retrofit > some custom solution on top. > > With these patches, you can update firmware and endorsement keys while > SNP guests are running, but it requires write locks on any active > certificates as defined here and in the kernel, and doing certificate > file updates in place while that write lock is still held. I don't really > think that's over-engineered. I think it's surprisingly simple given the > potential complexity of the above-mentioned requirements. > > But yes if management tries to unlink certificates while SNP guests are > running, all bets are off. But at that point they are not cooperating > with kernel/QEMU, and we don't need to care about that. And if they > really do need to blow away certificates for a complete re-install > or data-wipe or whatever, at that point they'd just need to ensure > they stop all their SNP guests first. IMHO we msut consider unlink() to be a valid thing, because the right way for apps to perform crash safe atomic updates of existing files, is to use rename() from a temporary file, and the rename() in has an implicit unlink as part of its operation. ie apps would be doing: fd = open("foo.tmp") write(fd, ...) fsync(fd) close(fd) rename("foo.tmp", "foo") That final rename operation will have the same effect on the locks as unlink(). To cope with this anything doing locking has to run in a loop comparing the inode either side of acquiring the lock TLDR: if we're going to do locking in QEMU, it needs to be done robustly. With regards, Daniel
On Thu, Dec 19, 2024 at 01:37:18PM +0000, Daniel P. Berrangé wrote: > On Thu, Dec 19, 2024 at 07:16:01AM -0600, Michael Roth wrote: > > On Thu, Dec 19, 2024 at 08:13:44AM +0000, Daniel P. Berrangé wrote: > > > On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote: > > > > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote: > > > > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote: > > > > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request > > > > > > hypercall to allow an SNP guest to issue encrypted requests directly to > > > > > > SNP firmware to do things like query the attestation report for the > > > > > > guest. These are generally handled purely in the kernel. > > > > > > > > > > > > In some some cases, it's useful for the host to be able to additionally > > > > > > supply the certificate chain for the signing key that SNP firmware uses > > > > > > to sign these attestation reports. To allow for this, the GHCB > > > > > > specification defines an Extended Guest Request where this certificate > > > > > > data can be provided in a special format described in the GHCB spec. > > > > > > This certificate data may be global or guest-specific depending on how > > > > > > the guest was configured. Rather than providing interfaces to manage > > > > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit > > > > > > to request the certificate contents from userspace. Implement support > > > > > > for that here. > > > > > > > > > > > > To synchronize delivery of the certificates to the guest in a way where > > > > > > they will not be rendered invalid by updates to SNP firmware or > > > > > > attestation singing/endorsement keys by management tools outside the > > > > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to > > > > > > obtain a shared/read lock on the certificate file prior to delivering > > > > > > them back to KVM. Only after this will the attestation report be > > > > > > retrieved from firmware and bundled with the certificate data, so QEMU > > > > > > must continue to hold the file lock until KVM confirms that the > > > > > > attestation report has been retrieved/bundled. This confirmation is done > > > > > > by way of the kvm_immediate_exit callback infrastructure that was > > > > > > introduced in a previous patch. > > > > > > > > > > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization", > > > > > > https://www.amd.com/en/developer/sev.html > > > > > > > > > > > > Signed-off-by: Michael Roth <michael.roth@amd.com> > > > > > > --- > > > > > > qapi/qom.json | 23 +++- > > > > > > target/i386/kvm/kvm.c | 10 ++ > > > > > > target/i386/sev-sysemu-stub.c | 5 + > > > > > > target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ > > > > > > target/i386/sev.h | 2 + > > > > > > 5 files changed, 288 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/qapi/qom.json b/qapi/qom.json > > > > > > index 28ce24cd8d..6eaf0e7721 100644 > > > > > > --- a/qapi/qom.json > > > > > > +++ b/qapi/qom.json > > > > > > @@ -1034,6 +1034,25 @@ > > > > > > # firmware. Set this to true to disable the use of VCEK. > > > > > > # (default: false) (since: 9.1) > > > > > > # > > > > > > +# @certs-path: Path to certificate data that can be passed to guests via > > > > > > +# SNP Extended Guest Requests. File should be in the format > > > > > > +# described in the GHCB specification. (default: none) > > > > > > +# (since: 10.0) > > > > > > > > > > Can we document the required format here explicitly, rather than expecting > > > > > users to go searching for specs which are often practically impossible > > > > > to find, and even harder to read & interpret ? > > > > > > > > It'll be difficult to summarize in a way that will be self-reliant, > > > > since knowing the certificate format is not sufficient to make sure > > > > it coincides with the endorsement key being used by firmware. So I can't > > > > promise to completely reduce reliance on external specs, but at least > > > > give a better of the format and where those external specs will come > > > > into play in filling out the data. > > > > > > > > If it needs to be at least somewhat self-sufficient then that might > > > > warrant a separate document in docs/system/i386/amd-memory-encryption.rst > > > > or somewhere thereabouts that summarizes the whole attestation flow and > > > > how certificates tie into that. > > > > > > > > Any preferences? > > > > > > > > > > > > > > > diff --git a/target/i386/sev.c b/target/i386/sev.c > > > > > > index 1a4eb1ada6..2c41bdbccf 100644 > > > > > > --- a/target/i386/sev.c > > > > > > +++ b/target/i386/sev.c > > > > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState { > > > > > > char *id_auth_base64; > > > > > > uint8_t *id_auth; > > > > > > char *host_data; > > > > > > + char *certs_path; > > > > > > + int certs_fd; > > > > > > + uint32_t certs_timeout; > > > > > > > > > > > > struct kvm_sev_snp_launch_start kvm_start_conf; > > > > > > struct kvm_sev_snp_launch_finish kvm_finish_conf; > > > > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common) > > > > > > } > > > > > > } > > > > > > > > > > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest) > > > > > > +{ > > > > > > + int fd, ret; > > > > > > + > > > > > > + if (sev_snp_guest->certs_fd != -1) { > > > > > > + return 0; > > > > > > + } > > > > > > + > > > > > > + fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL); > > > > > > + if (fd == -1) { > > > > > > + error_report("Unable to open certificate blob at path %s, ret %d", > > > > > > + sev_snp_guest->certs_path, fd); > > > > > > + return fd; > > > > > > + } > > > > > > + > > > > > > + ret = qemu_lock_fd(fd, 0, 0, false); > > > > > > + if (ret == -EAGAIN || ret == -EACCES) { > > > > > > + ret = -EAGAIN; > > > > > > + goto out_close; > > > > > > + } else if (ret) { > > > > > > + goto out_close; > > > > > > + } > > > > > > > > > > This locking scheme is likely unsafe. Consider this sequence > > > > > > > > > > * QEMU runs qemu_open(path) > > > > > * External mgmt app runs unlink(path) > > > > > * External mgmt app runs open(path) > > > > > * External mgmt app runs lock(fd) > > > > > * QEMU runs qemu_lock_fd(fd) > > > > > > > > > > QEMU has successfully acquired a lock on an FD that corresponds to a > > > > > deleted file, not the current existing file. > > > > > > > > > > Avoiding this problem requires either that the external mgmt app agrees > > > > > to *NEVER* unlink() the files under any circumstance, or for QEMU to > > > > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before > > > > > opening and after locking, in order to detect a replaced file from its > > > > > changed inode. > > > > > > > > > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to > > > > > easy to mess up IMHO. > > > > > > > > Yah I went into more detail in my response to Markus, but long story > > > > short is that we are assuming mgmt is cooperative in this case, and > > > > so as you mentioned, it would never unlink files while SNP guests are > > > > running, but instead take an exclusive lock on them and update them in > > > > place with the understanding that doing anything otherwise would open > > > > a race window where guests might get stale certificates. > > > > > > If there's an expectation & requirement that no SNP guests are running, > > > then IMHO this whole thing is just over-engineered. Just remove all this > > > locking code entirely, and document that none of this must be changed > > > while QEMU is running - which is a common requirement for a great many > > > things on the host. > > > > VCEK endorsement keys can change as a result of SNP firmware updates, > > which can be done while SNP guests are running and are often done in such > > a way to patch bugs/security holes. VLEK endorsement keys can similarly be > > updated on a live host. Both these sorts of interactions cannot be made > > compatible with bundling certificates with attestation reports without some > > orchestration in place to keep them atomic relative to the endorsement > > key being used by firmware to sign attestation reports. Every CSP > > implementing this will need to solve it in some way, and I'm sure some > > will handle all this completely differently. But it will make > > interoperative management/tooling a mess, and having a reference > > implementation based around something common will make it easier to > > steer CSPs to that common solution and give management tools authors > > *some* reference approach to target rather than expecting to retrofit > > some custom solution on top. > > > > With these patches, you can update firmware and endorsement keys while > > SNP guests are running, but it requires write locks on any active > > certificates as defined here and in the kernel, and doing certificate > > file updates in place while that write lock is still held. I don't really > > think that's over-engineered. I think it's surprisingly simple given the > > potential complexity of the above-mentioned requirements. > > > > But yes if management tries to unlink certificates while SNP guests are > > running, all bets are off. But at that point they are not cooperating > > with kernel/QEMU, and we don't need to care about that. And if they > > really do need to blow away certificates for a complete re-install > > or data-wipe or whatever, at that point they'd just need to ensure > > they stop all their SNP guests first. > > IMHO we msut consider unlink() to be a valid thing, because the right > way for apps to perform crash safe atomic updates of existing files, > is to use rename() from a temporary file, and the rename() in has an > implicit unlink as part of its operation. ie apps would be doing: > > fd = open("foo.tmp") > write(fd, ...) > fsync(fd) > close(fd) > rename("foo.tmp", "foo") If we still want to allow for this rather than enforcing in-place update, one alternative would be to just allow a separate lock file to be specified rather than locking the certificate file itself. That would provide a bit more flexibility. I can update the QEMU implementation to take -certs-lock-file in addition to -certs-file so they can be specified separately. And if -certs-lock-file is not specified then QEMU will just assume management handles things different or has agreed to not do endorsement key updates while SNP guests are running. I think we'd considered something like that originally but the thinking was that locking the certs themselves was more organic in terms of an "obvious"/natural solution. But it does end up being a bit more inflexible WRT how libraries/etc. might manage file updates underneath the covers, so maybe a lock file is the better approach after all. -Mike > > That final rename operation will have the same effect on the locks > as unlink(). To cope with this anything doing locking has to run > in a loop comparing the inode either side of acquiring the lock > > TLDR: if we're going to do locking in QEMU, it needs to be done > robustly. > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
On Thu, Dec 19, 2024 at 11:49:49AM -0600, Michael Roth wrote: > On Thu, Dec 19, 2024 at 01:37:18PM +0000, Daniel P. Berrangé wrote: > > IMHO we msut consider unlink() to be a valid thing, because the right > > way for apps to perform crash safe atomic updates of existing files, > > is to use rename() from a temporary file, and the rename() in has an > > implicit unlink as part of its operation. ie apps would be doing: > > > > fd = open("foo.tmp") > > write(fd, ...) > > fsync(fd) > > close(fd) > > rename("foo.tmp", "foo") > > If we still want to allow for this rather than enforcing in-place > update, one alternative would be to just allow a separate lock file > to be specified rather than locking the certificate file itself. That > would provide a bit more flexibility. > > I can update the QEMU implementation to take -certs-lock-file in > addition to -certs-file so they can be specified separately. And if > -certs-lock-file is not specified then QEMU will just assume > management handles things different or has agreed to not do endorsement > key updates while SNP guests are running. > > I think we'd considered something like that originally but the thinking > was that locking the certs themselves was more organic in terms of an > "obvious"/natural solution. But it does end up being a bit more > inflexible WRT how libraries/etc. might manage file updates underneath > the covers, so maybe a lock file is the better approach after all. If we want locking, I think locking the certs file directly is a nicer idea, as it avoids everyone having to agree on the location of the lock file, relative to the certs file. The current locking code just needs to go inside a while(1) loop and have fstat + stat added to detect the recreation race. For example: https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virpidfile.c#L376 With regards, Daniel
diff --git a/qapi/qom.json b/qapi/qom.json index 28ce24cd8d..6eaf0e7721 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -1034,6 +1034,25 @@ # firmware. Set this to true to disable the use of VCEK. # (default: false) (since: 9.1) # +# @certs-path: Path to certificate data that can be passed to guests via +# SNP Extended Guest Requests. File should be in the format +# described in the GHCB specification. (default: none) +# (since: 10.0) +# +# @certs-timeout: Max time in milliseconds to wait to obtain a read lock +# on the certificate file specified by @certs-path. This +# is not a cumulative value and only affects how long +# QEMU waits before returning execution to the vCPU and +# informing the guest of the timeout, so the guest can +# still continuing retrying for as long as it likes +# (which will be about 60 seconds for linux guests at +# the time of this writing). If the guest-side timeout +# is insufficient, set this higher to allow more time to +# fetch the certificate. If the guest-side timeout is +# sufficient, set this lower to reduce the likelihood of +# soft lockups in the guest. +# (default: 100) (since: 10.0) +# # Since: 9.1 ## { 'struct': 'SevSnpGuestProperties', @@ -1045,7 +1064,9 @@ '*id-auth': 'str', '*author-key-enabled': 'bool', '*host-data': 'str', - '*vcek-disabled': 'bool' } } + '*vcek-disabled': 'bool', + '*certs-path': 'str', + '*certs-timeout': 'uint32' } } ## # @ThreadContextProperties: diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 8e17942c3b..cffc7ff33e 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -6107,6 +6107,16 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) #endif case KVM_EXIT_HYPERCALL: ret = kvm_handle_hypercall(run); + break; + case KVM_EXIT_SNP_REQ_CERTS: + if (!sev_snp_enabled()) { + error_report("KVM: Encountered a certificate request exit for a " + "non-SEV-SNP guest."); + ret = -1; + } else { + ret = kvm_handle_snp_req_certs(cs, run); + } + break; default: fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason); diff --git a/target/i386/sev-sysemu-stub.c b/target/i386/sev-sysemu-stub.c index d5bf886e79..685c56f62c 100644 --- a/target/i386/sev-sysemu-stub.c +++ b/target/i386/sev-sysemu-stub.c @@ -71,3 +71,8 @@ void hmp_info_sev(Monitor *mon, const QDict *qdict) void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size) { } + +int kvm_handle_snp_req_certs(CPUState *cpu, struct kvm_run *run) +{ + g_assert_not_reached(); +} diff --git a/target/i386/sev.c b/target/i386/sev.c index 1a4eb1ada6..2c41bdbccf 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -157,6 +157,9 @@ struct SevSnpGuestState { char *id_auth_base64; uint8_t *id_auth; char *host_data; + char *certs_path; + int certs_fd; + uint32_t certs_timeout; struct kvm_sev_snp_launch_start kvm_start_conf; struct kvm_sev_snp_launch_finish kvm_finish_conf; @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState *sev_common) } } +static int open_certs_locked(SevSnpGuestState *sev_snp_guest) +{ + int fd, ret; + + if (sev_snp_guest->certs_fd != -1) { + return 0; + } + + fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL); + if (fd == -1) { + error_report("Unable to open certificate blob at path %s, ret %d", + sev_snp_guest->certs_path, fd); + return fd; + } + + ret = qemu_lock_fd(fd, 0, 0, false); + if (ret == -EAGAIN || ret == -EACCES) { + ret = -EAGAIN; + goto out_close; + } else if (ret) { + goto out_close; + } + + sev_snp_guest->certs_fd = fd; + return 0; +out_close: + close(fd); + return ret; +} + +static void close_certs(SevSnpGuestState *sev_snp_guest) +{ + if (sev_snp_guest->certs_fd == -1) { + return; + } + + qemu_unlock_fd(sev_snp_guest->certs_fd, 0, 0); + close(sev_snp_guest->certs_fd); + sev_snp_guest->certs_fd = -1; +} + +static ssize_t get_certs_size(SevSnpGuestState *sev_snp_guest) +{ + ssize_t size; + + size = lseek(sev_snp_guest->certs_fd, 0, SEEK_END); + + if (size < 0) + return -errno; + + return size; +} + +static int read_certs(SevSnpGuestState *sev_snp_guest, void *buf, size_t buf_len) +{ + ssize_t n, len = 0; + + n = lseek(sev_snp_guest->certs_fd, 0, SEEK_SET); + if (n) { + return n; + } + + while ((n = read(sev_snp_guest->certs_fd, buf, buf_len)) != 0) { + if (n < 0) { + if (errno == EINTR) { + continue; + } else { + len = -errno; + break; + } + } + len += n; + } + + return len; +} + +static void snp_release_certs_lock(void *opaque) +{ + SevSnpGuestState *sev_snp_guest = opaque; + + close_certs(sev_snp_guest); +} + +static void certs_timeout(void *opaque) +{ + bool *timed_out = opaque; + + *timed_out = true; +} + +int kvm_handle_snp_req_certs(CPUState *cpu, struct kvm_run *run) +{ + struct kvm_exit_snp_req_certs *req_certs = &run->snp_req_certs; + g_autofree gchar *contents = NULL; + SevSnpGuestState *sev_snp_guest; + MemTxAttrs attrs = { 0 }; + bool timed_out = false; + QEMUTimer *timer; + uint32_t npages; + void *guest_buf; + hwaddr buf_sz; + hwaddr gpa; + int ret; + + req_certs->ret = EIO; + + if (!sev_snp_enabled()) { + return -EIO; + } + + gpa = req_certs->gfn << TARGET_PAGE_BITS; + npages = req_certs->npages; + + sev_snp_guest = SEV_SNP_GUEST(MACHINE(qdev_get_machine())->cgs); + if (!sev_snp_guest->certs_path) { + req_certs->ret = 0; + return 0; + } + + /* + * -EAGAIN from open_certs_locked() indicates that a lock could not be + * obtained on the certificate file. It would be possible to set + * req_certs->ret = EAGAIN to immediately inform the requesting vCPU of + * this condition so that it can retry again later, however, linux guests + * are currently hard-limited to a 60 second timeout, at which point they + * will assume misbehavior and refuse to continue issuing attestation + * requests from that point forward. + * + * Allowing for this guest timeout to be configured in some way that will + * be easier to coordinate with on the QEMU side will make that approach + * more viable, but until then just busy-wait on the QEMU side, which + * allows more flexibility in how long QEMU can wait for things like SNP + * firmware updates/etc. which may be holding an exclusive lock on the + * certificate. + * + * The down-side to doing things this way is that too long of a busy-wait + * will result in soft-lockups in the guest, but the guest will otherwise + * continue normally afterward. + * + * If the QEMU-side timeout is reached, then just go ahead and indicate + * EAGAIN to the guest, at which point the above-mentioned guest-side + * timeout will trigger if QEMU's timeout exceeds that of the guest. But + * that's better than killing the vCPU, which is the only viable + * alternative at that point. + */ + timer = timer_new_ms(QEMU_CLOCK_REALTIME, certs_timeout, &timed_out); + timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + + sev_snp_guest->certs_timeout); + + do { + ret = open_certs_locked(sev_snp_guest); + g_usleep(1000); + } while (ret == -EAGAIN && !timed_out); + + timer_del(timer); + + if (ret) { + if (timed_out) { + req_certs->ret = EAGAIN; + return 0; + } + + /* + * It's a bit ambiguous when to propagate a generic error to the guest + * rather than simply letting QEMU crash. The general methodology here + * is to let QEMU crash if there is a misconfiguration issue that the + * guest has absolutely no potential role in, but to otherwise notify + * the guest if there is some remote possibility that the issue is + * related to something on the guest side. + */ + return ret; + } + + buf_sz = npages * TARGET_PAGE_SIZE; + if (buf_sz < get_certs_size(sev_snp_guest)) { + req_certs->ret = ENOSPC; + req_certs->npages = + (get_certs_size(sev_snp_guest) + TARGET_PAGE_SIZE) / TARGET_PAGE_SIZE; + close_certs(sev_snp_guest); + goto out; + } + + guest_buf = address_space_map(&address_space_memory, gpa, &buf_sz, true, attrs); + if (buf_sz < npages * TARGET_PAGE_SIZE) { + error_report_once("Unable to map entire guest buffer, mapped size %ld (expected %ld)", + buf_sz, get_certs_size(sev_snp_guest)); + close_certs(sev_snp_guest); + goto out_unmap; + } + + ret = read_certs(sev_snp_guest, guest_buf, buf_sz); + if (ret < 0) { + error_report_once("Unable to read certificate data into guest buffer, ret %d", + ret); + close_certs(sev_snp_guest); + goto out_unmap; + + } + + req_certs->ret = 0; + + add_immediate_exit_callback(cpu, snp_release_certs_lock, sev_snp_guest); +out_unmap: + address_space_unmap(&address_space_memory, guest_buf, buf_sz, true, buf_sz); + +out: + return ret; +} static void sev_vm_state_change(void *opaque, bool running, RunState state) @@ -1594,6 +1806,7 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) { + SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(cgs); MachineState *ms = MACHINE(qdev_get_machine()); X86MachineState *x86ms = X86_MACHINE(ms); @@ -1604,6 +1817,13 @@ static int sev_snp_kvm_init(ConfidentialGuestSupport *cgs, Error **errp) return -1; } + if (sev_snp_guest->certs_path && + kvm_vm_enable_cap(kvm_state, KVM_CAP_EXIT_SNP_REQ_CERTS, 0, 1)) { + error_setg(errp, "Failed to enable support for SEV-SNP " + "certificate-fetching requests."); + return -1; + } + return 0; } @@ -2393,6 +2613,26 @@ sev_snp_guest_set_host_data(Object *obj, const char *value, Error **errp) memcpy(finish->host_data, blob, len); } +static char * +sev_snp_guest_get_certs_path(Object *obj, Error **errp) +{ + SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj); + + return g_strdup(sev_snp_guest->certs_path); +} + +static void +sev_snp_guest_set_certs_path(Object *obj, const char *value, Error **errp) +{ + SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj); + + if (sev_snp_guest->certs_path) { + g_free(sev_snp_guest->certs_path); + } + + sev_snp_guest->certs_path = value ? g_strdup(value) : NULL; +} + static void sev_snp_guest_class_init(ObjectClass *oc, void *data) { @@ -2428,6 +2668,9 @@ sev_snp_guest_class_init(ObjectClass *oc, void *data) object_class_property_add_str(oc, "host-data", sev_snp_guest_get_host_data, sev_snp_guest_set_host_data); + object_class_property_add_str(oc, "certs-path", + sev_snp_guest_get_certs_path, + sev_snp_guest_set_certs_path); } static void @@ -2440,6 +2683,12 @@ sev_snp_guest_instance_init(Object *obj) /* default init/start/finish params for kvm */ sev_snp_guest->kvm_start_conf.policy = DEFAULT_SEV_SNP_POLICY; + + sev_snp_guest->certs_fd = -1; + sev_snp_guest->certs_timeout = 100; + object_property_add_uint32_ptr(obj, "certs-timeout", + &sev_snp_guest->certs_timeout, + OBJ_PROP_FLAG_READWRITE); } /* guest info specific to sev-snp */ diff --git a/target/i386/sev.h b/target/i386/sev.h index 858005a119..91b9c38ac1 100644 --- a/target/i386/sev.h +++ b/target/i386/sev.h @@ -68,4 +68,6 @@ void sev_es_set_reset_vector(CPUState *cpu); void pc_system_parse_sev_metadata(uint8_t *flash_ptr, size_t flash_size); +int kvm_handle_snp_req_certs(CPUState *cpu, struct kvm_run *run); + #endif
The GHCB specification[1] defines a VMGEXIT-based Guest Request hypercall to allow an SNP guest to issue encrypted requests directly to SNP firmware to do things like query the attestation report for the guest. These are generally handled purely in the kernel. In some some cases, it's useful for the host to be able to additionally supply the certificate chain for the signing key that SNP firmware uses to sign these attestation reports. To allow for this, the GHCB specification defines an Extended Guest Request where this certificate data can be provided in a special format described in the GHCB spec. This certificate data may be global or guest-specific depending on how the guest was configured. Rather than providing interfaces to manage these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit to request the certificate contents from userspace. Implement support for that here. To synchronize delivery of the certificates to the guest in a way where they will not be rendered invalid by updates to SNP firmware or attestation singing/endorsement keys by management tools outside the purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to obtain a shared/read lock on the certificate file prior to delivering them back to KVM. Only after this will the attestation report be retrieved from firmware and bundled with the certificate data, so QEMU must continue to hold the file lock until KVM confirms that the attestation report has been retrieved/bundled. This confirmation is done by way of the kvm_immediate_exit callback infrastructure that was introduced in a previous patch. [1] "Guest Hypervisor Communication Block (GHCB) Standardization", https://www.amd.com/en/developer/sev.html Signed-off-by: Michael Roth <michael.roth@amd.com> --- qapi/qom.json | 23 +++- target/i386/kvm/kvm.c | 10 ++ target/i386/sev-sysemu-stub.c | 5 + target/i386/sev.c | 249 ++++++++++++++++++++++++++++++++++ target/i386/sev.h | 2 + 5 files changed, 288 insertions(+), 1 deletion(-)