Message ID | f8811b3768c4306af7fb2732b6b3755489832c55.1621020158.git.thomas.lendacky@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure | expand |
On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: > > Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT > exit code and parameters fail. Since the VMGEXIT instruction can be issued > from userspace, even though userspace (likely) can't update the GHCB, > don't allow userspace to be able to kill the guest. > > Return a #GP request through the GHCB when validation fails, rather than > terminating the guest. Is this a gap in the spec? I don't see anything that details what should happen if the correct fields for NAE are not set in the first couple paragraphs of section 4 'GHCB Protocol'.
On 5/14/21 6:06 PM, Peter Gonda wrote: > On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: >> >> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT >> exit code and parameters fail. Since the VMGEXIT instruction can be issued >> from userspace, even though userspace (likely) can't update the GHCB, >> don't allow userspace to be able to kill the guest. >> >> Return a #GP request through the GHCB when validation fails, rather than >> terminating the guest. > > Is this a gap in the spec? I don't see anything that details what > should happen if the correct fields for NAE are not set in the first > couple paragraphs of section 4 'GHCB Protocol'. No, I don't think the spec needs to spell out everything like this. The hypervisor is free to determine its course of action in this case. I suppose the spec could suggest a course of action, but I don't think the spec should require a specific course of action. Thanks, Tom >
On Mon, May 17, 2021, Tom Lendacky wrote: > On 5/14/21 6:06 PM, Peter Gonda wrote: > > On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: > >> > >> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT > >> exit code and parameters fail. Since the VMGEXIT instruction can be issued > >> from userspace, even though userspace (likely) can't update the GHCB, > >> don't allow userspace to be able to kill the guest. > >> > >> Return a #GP request through the GHCB when validation fails, rather than > >> terminating the guest. > > > > Is this a gap in the spec? I don't see anything that details what > > should happen if the correct fields for NAE are not set in the first > > couple paragraphs of section 4 'GHCB Protocol'. > > No, I don't think the spec needs to spell out everything like this. The > hypervisor is free to determine its course of action in this case. The hypervisor can decide whether to inject/return an error or kill the guest, but what errors can be returned and how they're returned absolutely needs to be ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec is the logical place to define said ABI. For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is completely nonsensical. As is, a Linux guest appears to blindly forward the #GP, which means if something does go awry KVM has just made debugging the guest that much harder, e.g. imagine the confusion that will ensue if the end result is a SIGBUS to userspace on CPUID. There needs to be an explicit error code for "you gave me bad data", otherwise we're signing ourselves up for future pain. > I suppose the spec could suggest a course of action, but I don't think the > spec should require a specific course of action. > > Thanks, > Tom > > >
On Thu, May 20, 2021, Sean Christopherson wrote: > On Mon, May 17, 2021, Tom Lendacky wrote: > > On 5/14/21 6:06 PM, Peter Gonda wrote: > > > On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: > > >> > > >> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT > > >> exit code and parameters fail. Since the VMGEXIT instruction can be issued > > >> from userspace, even though userspace (likely) can't update the GHCB, > > >> don't allow userspace to be able to kill the guest. > > >> > > >> Return a #GP request through the GHCB when validation fails, rather than > > >> terminating the guest. > > > > > > Is this a gap in the spec? I don't see anything that details what > > > should happen if the correct fields for NAE are not set in the first > > > couple paragraphs of section 4 'GHCB Protocol'. > > > > No, I don't think the spec needs to spell out everything like this. The > > hypervisor is free to determine its course of action in this case. > > The hypervisor can decide whether to inject/return an error or kill the guest, > but what errors can be returned and how they're returned absolutely needs to be > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec > is the logical place to define said ABI. > > For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is > completely nonsensical. As is, a Linux guest appears to blindly forward the #GP, > which means if something does go awry KVM has just made debugging the guest that > much harder, e.g. imagine the confusion that will ensue if the end result is a > SIGBUS to userspace on CPUID. > > There needs to be an explicit error code for "you gave me bad data", otherwise > we're signing ourselves up for future pain. More concretely, I think the best course of action is to define a new return code in SW_EXITINFO1[31:0], e.g. '2', with additional information in SW_EXITINFO2. In theory, an old-but-sane guest will interpret the unexpected return code as fatal to whatever triggered the #VMGEXIT, e.g. SIGBUS to userspace. Unfortunately Linux isn't sane because sev_es_ghcb_hv_call() assumes any non-'1' result means success, but that's trivial to fix and IMO should be fixed irrespective of where this goes.
On Thu, May 20, 2021, Sean Christopherson wrote: > On Thu, May 20, 2021, Sean Christopherson wrote: > > On Mon, May 17, 2021, Tom Lendacky wrote: > > > On 5/14/21 6:06 PM, Peter Gonda wrote: > > > > On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: > > > >> > > > >> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT > > > >> exit code and parameters fail. Since the VMGEXIT instruction can be issued > > > >> from userspace, even though userspace (likely) can't update the GHCB, > > > >> don't allow userspace to be able to kill the guest. > > > >> > > > >> Return a #GP request through the GHCB when validation fails, rather than > > > >> terminating the guest. > > > > > > > > Is this a gap in the spec? I don't see anything that details what > > > > should happen if the correct fields for NAE are not set in the first > > > > couple paragraphs of section 4 'GHCB Protocol'. > > > > > > No, I don't think the spec needs to spell out everything like this. The > > > hypervisor is free to determine its course of action in this case. > > > > The hypervisor can decide whether to inject/return an error or kill the guest, > > but what errors can be returned and how they're returned absolutely needs to be > > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec > > is the logical place to define said ABI. > > > > For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is > > completely nonsensical. As is, a Linux guest appears to blindly forward the #GP, > > which means if something does go awry KVM has just made debugging the guest that > > much harder, e.g. imagine the confusion that will ensue if the end result is a > > SIGBUS to userspace on CPUID. > > > > There needs to be an explicit error code for "you gave me bad data", otherwise > > we're signing ourselves up for future pain. > > More concretely, I think the best course of action is to define a new return code > in SW_EXITINFO1[31:0], e.g. '2', with additional information in SW_EXITINFO2. > > In theory, an old-but-sane guest will interpret the unexpected return code as > fatal to whatever triggered the #VMGEXIT, e.g. SIGBUS to userspace. Unfortunately > Linux isn't sane because sev_es_ghcb_hv_call() assumes any non-'1' result means > success, but that's trivial to fix and IMO should be fixed irrespective of where > this goes. One last thing (hopefully): Erdem pointed out that if the GCHB GPA (or any derferenced pointers within the GHCB) is invalid or is set to a private GPA (mostly in the context of SNP) then the VMM will likely have no choice but to kill the guest in response to #VMGEXIT. It's probably a good idea to add a blurb in one of the specs explicitly calling out that #VMGEXIT can be executed from userspace, and that before returning to uesrspace the guest kernel must always ensure that the GCHB points at a legal GPA _and_ all primary fields are marked invalid.
On 5/20/21 2:16 PM, Sean Christopherson wrote: > On Mon, May 17, 2021, Tom Lendacky wrote: >> On 5/14/21 6:06 PM, Peter Gonda wrote: >>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: >>>> >>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT >>>> exit code and parameters fail. Since the VMGEXIT instruction can be issued >>>> from userspace, even though userspace (likely) can't update the GHCB, >>>> don't allow userspace to be able to kill the guest. >>>> >>>> Return a #GP request through the GHCB when validation fails, rather than >>>> terminating the guest. >>> >>> Is this a gap in the spec? I don't see anything that details what >>> should happen if the correct fields for NAE are not set in the first >>> couple paragraphs of section 4 'GHCB Protocol'. >> >> No, I don't think the spec needs to spell out everything like this. The >> hypervisor is free to determine its course of action in this case. > > The hypervisor can decide whether to inject/return an error or kill the guest, > but what errors can be returned and how they're returned absolutely needs to be > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec > is the logical place to define said ABI. For now, that is all we have for versions 1 and 2 of the spec. We can certainly extend it in future versions if that is desired. I would suggest starting a thread on what we would like to see in the next version of the GHCB spec on the amd-sev-snp mailing list: amd-sev-snp@lists.suse.com > > For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is > completely nonsensical. As is, a Linux guest appears to blindly forward the #GP, > which means if something does go awry KVM has just made debugging the guest that > much harder, e.g. imagine the confusion that will ensue if the end result is a > SIGBUS to userspace on CPUID. I see the point you're making, but I would also say that we probably wouldn't even boot successfully if the kernel can't handle, e.g., a CPUID #VC properly. A lot of what could go wrong with required inputs, not the values, but the required state being communicated, should have already been ironed out during development of whichever OS is providing the SEV-ES support. > > There needs to be an explicit error code for "you gave me bad data", otherwise > we're signing ourselves up for future pain. I'll make note of that for the next update to the spec and we can work on it further during the spec review. Thanks, Tom > >> I suppose the spec could suggest a course of action, but I don't think the >> spec should require a specific course of action. >> >> Thanks, >> Tom >> >>>
On 5/20/21 3:22 PM, Sean Christopherson wrote: > On Thu, May 20, 2021, Sean Christopherson wrote: >> On Thu, May 20, 2021, Sean Christopherson wrote: >>> On Mon, May 17, 2021, Tom Lendacky wrote: >>>> On 5/14/21 6:06 PM, Peter Gonda wrote: >>>>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: >>>>>> >>>>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT >>>>>> exit code and parameters fail. Since the VMGEXIT instruction can be issued >>>>>> from userspace, even though userspace (likely) can't update the GHCB, >>>>>> don't allow userspace to be able to kill the guest. >>>>>> >>>>>> Return a #GP request through the GHCB when validation fails, rather than >>>>>> terminating the guest. >>>>> >>>>> Is this a gap in the spec? I don't see anything that details what >>>>> should happen if the correct fields for NAE are not set in the first >>>>> couple paragraphs of section 4 'GHCB Protocol'. >>>> >>>> No, I don't think the spec needs to spell out everything like this. The >>>> hypervisor is free to determine its course of action in this case. >>> >>> The hypervisor can decide whether to inject/return an error or kill the guest, >>> but what errors can be returned and how they're returned absolutely needs to be >>> ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec >>> is the logical place to define said ABI. >>> >>> For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is >>> completely nonsensical. As is, a Linux guest appears to blindly forward the #GP, >>> which means if something does go awry KVM has just made debugging the guest that >>> much harder, e.g. imagine the confusion that will ensue if the end result is a >>> SIGBUS to userspace on CPUID. >>> >>> There needs to be an explicit error code for "you gave me bad data", otherwise >>> we're signing ourselves up for future pain. >> >> More concretely, I think the best course of action is to define a new return code >> in SW_EXITINFO1[31:0], e.g. '2', with additional information in SW_EXITINFO2. >> >> In theory, an old-but-sane guest will interpret the unexpected return code as >> fatal to whatever triggered the #VMGEXIT, e.g. SIGBUS to userspace. Unfortunately >> Linux isn't sane because sev_es_ghcb_hv_call() assumes any non-'1' result means >> success, but that's trivial to fix and IMO should be fixed irrespective of where >> this goes. > > One last thing (hopefully): Erdem pointed out that if the GCHB GPA (or any > derferenced pointers within the GHCB) is invalid or is set to a private GPA > (mostly in the context of SNP) then the VMM will likely have no choice but to > kill the guest in response to #VMGEXIT. > > It's probably a good idea to add a blurb in one of the specs explicitly calling > out that #VMGEXIT can be executed from userspace, and that before returning to > uesrspace the guest kernel must always ensure that the GCHB points at a legal > GPA _and_ all primary fields are marked invalid. Yes, the spec can be updated to include a "best practices" section for OSes and Hypervisors to follow without actually having to update the version of the GHCB spec, so that should be doable. Thanks, Tom >
On Thu, May 20, 2021, Tom Lendacky wrote: > On 5/20/21 2:16 PM, Sean Christopherson wrote: > > On Mon, May 17, 2021, Tom Lendacky wrote: > >> On 5/14/21 6:06 PM, Peter Gonda wrote: > >>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: > >>>> > >>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT > >>>> exit code and parameters fail. Since the VMGEXIT instruction can be issued > >>>> from userspace, even though userspace (likely) can't update the GHCB, > >>>> don't allow userspace to be able to kill the guest. > >>>> > >>>> Return a #GP request through the GHCB when validation fails, rather than > >>>> terminating the guest. > >>> > >>> Is this a gap in the spec? I don't see anything that details what > >>> should happen if the correct fields for NAE are not set in the first > >>> couple paragraphs of section 4 'GHCB Protocol'. > >> > >> No, I don't think the spec needs to spell out everything like this. The > >> hypervisor is free to determine its course of action in this case. > > > > The hypervisor can decide whether to inject/return an error or kill the guest, > > but what errors can be returned and how they're returned absolutely needs to be > > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec > > is the logical place to define said ABI. > > For now, that is all we have for versions 1 and 2 of the spec. We can > certainly extend it in future versions if that is desired. > > I would suggest starting a thread on what we would like to see in the next > version of the GHCB spec on the amd-sev-snp mailing list: > > amd-sev-snp@lists.suse.com Will do, but in the meantime, I don't think we should merge a fix of any kind until there is consensus on what the VMM behavior will be. IMO, fixing this in upstream is not urgent; I highly doubt anyone is deploying SEV-ES in production using a bleeding edge KVM. > > For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is > > completely nonsensical. As is, a Linux guest appears to blindly forward the #GP, > > which means if something does go awry KVM has just made debugging the guest that > > much harder, e.g. imagine the confusion that will ensue if the end result is a > > SIGBUS to userspace on CPUID. > > I see the point you're making, but I would also say that we probably > wouldn't even boot successfully if the kernel can't handle, e.g., a CPUID > #VC properly. I agree that GHCB bugs in the guest will be fatal, but that doesn't give the VMM carte blanche to do whatever it wants given bad input. > A lot of what could go wrong with required inputs, not the values, but the > required state being communicated, should have already been ironed out during > development of whichever OS is providing the SEV-ES support. Yes, but better on the kernel never having a regression is a losing proposition. And it doesn't even necessarily require a regression, e.g. an existing memory corruption bug elsewhere in the guest kernel (that escaped qualification) could corrupt the GHCB. If the GHCB is corrupted at runtime, the guest needs well-defined semantics from the VMM so that the guest at least has a chance of sanely handling the error. Handling in this case would mean an oops/panic, but that's far, far better than a random pseudo-#GP that might not even be immediately logged as a failure.
Sean Christopherson <seanjc@google.com> writes: > On Thu, May 20, 2021, Tom Lendacky wrote: >> On 5/20/21 2:16 PM, Sean Christopherson wrote: >> > On Mon, May 17, 2021, Tom Lendacky wrote: >> >> On 5/14/21 6:06 PM, Peter Gonda wrote: >> >>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: >> >>>> >> >>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT >> >>>> exit code and parameters fail. Since the VMGEXIT instruction can be issued >> >>>> from userspace, even though userspace (likely) can't update the GHCB, >> >>>> don't allow userspace to be able to kill the guest. >> >>>> >> >>>> Return a #GP request through the GHCB when validation fails, rather than >> >>>> terminating the guest. >> >>> >> >>> Is this a gap in the spec? I don't see anything that details what >> >>> should happen if the correct fields for NAE are not set in the first >> >>> couple paragraphs of section 4 'GHCB Protocol'. >> >> >> >> No, I don't think the spec needs to spell out everything like this. The >> >> hypervisor is free to determine its course of action in this case. >> > >> > The hypervisor can decide whether to inject/return an error or kill the guest, >> > but what errors can be returned and how they're returned absolutely needs to be >> > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec >> > is the logical place to define said ABI. >> >> For now, that is all we have for versions 1 and 2 of the spec. We can >> certainly extend it in future versions if that is desired. >> >> I would suggest starting a thread on what we would like to see in the next >> version of the GHCB spec on the amd-sev-snp mailing list: >> >> amd-sev-snp@lists.suse.com > > Will do, but in the meantime, I don't think we should merge a fix of any kind > until there is consensus on what the VMM behavior will be. IMO, fixing this in > upstream is not urgent; I highly doubt anyone is deploying SEV-ES in production > using a bleeding edge KVM. Sorry for resurrecting this old thread but were there any deveopments here? I may have missed something but last time I've checked a single "rep; vmmcall" from userspace was still crashing the guest. The issue, however, doesn't seem to reproduce with Vmware ESXi which probably means they're just skipping the instruction and not even injecting #GP (AFAIR, I don't have an environment to re-test handy).
On Wed, Jul 21, 2021, Vitaly Kuznetsov wrote: > Sean Christopherson <seanjc@google.com> writes: > > > On Thu, May 20, 2021, Tom Lendacky wrote: > >> On 5/20/21 2:16 PM, Sean Christopherson wrote: > >> > On Mon, May 17, 2021, Tom Lendacky wrote: > >> >> On 5/14/21 6:06 PM, Peter Gonda wrote: > >> >>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@amd.com> wrote: > >> >>>> > >> >>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT > >> >>>> exit code and parameters fail. Since the VMGEXIT instruction can be issued > >> >>>> from userspace, even though userspace (likely) can't update the GHCB, > >> >>>> don't allow userspace to be able to kill the guest. > >> >>>> > >> >>>> Return a #GP request through the GHCB when validation fails, rather than > >> >>>> terminating the guest. > >> >>> > >> >>> Is this a gap in the spec? I don't see anything that details what > >> >>> should happen if the correct fields for NAE are not set in the first > >> >>> couple paragraphs of section 4 'GHCB Protocol'. > >> >> > >> >> No, I don't think the spec needs to spell out everything like this. The > >> >> hypervisor is free to determine its course of action in this case. > >> > > >> > The hypervisor can decide whether to inject/return an error or kill the guest, > >> > but what errors can be returned and how they're returned absolutely needs to be > >> > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec > >> > is the logical place to define said ABI. > >> > >> For now, that is all we have for versions 1 and 2 of the spec. We can > >> certainly extend it in future versions if that is desired. > >> > >> I would suggest starting a thread on what we would like to see in the next > >> version of the GHCB spec on the amd-sev-snp mailing list: > >> > >> amd-sev-snp@lists.suse.com > > > > Will do, but in the meantime, I don't think we should merge a fix of any kind > > until there is consensus on what the VMM behavior will be. IMO, fixing this in > > upstream is not urgent; I highly doubt anyone is deploying SEV-ES in production > > using a bleeding edge KVM. > > Sorry for resurrecting this old thread but were there any deveopments > here? I may have missed something but last time I've checked a single > "rep; vmmcall" from userspace was still crashing the guest. I don't think it went anywhere, I completely forgot about this. I'll bump this back to the top of my todo list, unless someone else wants the honors :-) > The issue, however, doesn't seem to reproduce with Vmware ESXi which probably > means they're just skipping the instruction and not even injecting #GP > (AFAIR, I don't have an environment to re-test handy).
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 5bc887e9a986..bc77f26f0880 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2075,7 +2075,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm) memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap)); } -static int sev_es_validate_vmgexit(struct vcpu_svm *svm) +static bool sev_es_validate_vmgexit(struct vcpu_svm *svm) { struct kvm_vcpu *vcpu; struct ghcb *ghcb; @@ -2174,7 +2174,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm) goto vmgexit_err; } - return 0; + return true; vmgexit_err: vcpu = &svm->vcpu; @@ -2188,13 +2188,16 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm) dump_ghcb(svm); } - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; - vcpu->run->internal.ndata = 2; - vcpu->run->internal.data[0] = exit_code; - vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; + /* Clear the valid entries fields */ + memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap)); - return -EINVAL; + ghcb_set_sw_exit_info_1(ghcb, 1); + ghcb_set_sw_exit_info_2(ghcb, + X86_TRAP_GP | + SVM_EVTINJ_TYPE_EXEPT | + SVM_EVTINJ_VALID); + + return false; } void sev_es_unmap_ghcb(struct vcpu_svm *svm) @@ -2459,9 +2462,8 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu) exit_code = ghcb_get_sw_exit_code(ghcb); - ret = sev_es_validate_vmgexit(svm); - if (ret) - return ret; + if (!sev_es_validate_vmgexit(svm)) + return 1; sev_es_sync_from_ghcb(svm); ghcb_set_sw_exit_info_1(ghcb, 0);
Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT exit code and parameters fail. Since the VMGEXIT instruction can be issued from userspace, even though userspace (likely) can't update the GHCB, don't allow userspace to be able to kill the guest. Return a #GP request through the GHCB when validation fails, rather than terminating the guest. Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> --- arch/x86/kvm/svm/sev.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)