Message ID | 6053e8eba1456e4c1bf667f38cc20a0ea05bc72c.1742232014.git.thomas.lendacky@amd.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | KVM: SVM: Fix SNP AP destroy race with VMRUN | expand |
On 3/17/25 12:20, Tom Lendacky wrote: > An AP destroy request for a target vCPU is typically followed by an > RMPADJUST to remove the VMSA attribute from the page currently being > used as the VMSA for the target vCPU. This can result in a vCPU that > is about to VMRUN to exit with #VMEXIT_INVALID. > > This usually does not happen as APs are typically sitting in HLT when > being destroyed and therefore the vCPU thread is not running at the time. > However, if HLT is allowed inside the VM, then the vCPU could be about to > VMRUN when the VMSA attribute is removed from the VMSA page, resulting in > a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the > guest to crash. An RMPADJUST against an in-use (already running) VMSA > results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA > attribute cannot be changed until the VMRUN for target vCPU exits. The > Qemu command line option '-overcommit cpu-pm=on' is an example of allowing > HLT inside the guest. > > Use kvm_test_request() to ensure that the target vCPU sees the AP destroy > request before returning to the initiating vCPU. > > Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event") > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Sean, If you're ok with this approach for the fix, this patch may need to be adjusted given your series around AP creation fixes, unless you want to put this as an early patch in your series. Let me know what you'd like to do. Thanks, Tom > --- > arch/x86/kvm/svm/sev.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 0d898d6b697f..a040f29bb07b 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -4071,6 +4071,16 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm) > if (kick) { > kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu); > kvm_vcpu_kick(target_vcpu); > + > + if (request == SVM_VMGEXIT_AP_DESTROY) { > + /* > + * A destroy is likely to be followed by an RMPADJUST > + * that will remove the VMSA flag, so be sure the vCPU > + * got the request in case it is on the way to a VMRUN. > + */ > + while (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu)) > + cond_resched(); > + } > } > > mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex); > > base-commit: f8d892c137f7448d7b49f5e3ad7aa7b5a48a64ed
On Mon, Mar 17, 2025, Tom Lendacky wrote: > On 3/17/25 12:20, Tom Lendacky wrote: > > An AP destroy request for a target vCPU is typically followed by an > > RMPADJUST to remove the VMSA attribute from the page currently being > > used as the VMSA for the target vCPU. This can result in a vCPU that > > is about to VMRUN to exit with #VMEXIT_INVALID. > > > > This usually does not happen as APs are typically sitting in HLT when > > being destroyed and therefore the vCPU thread is not running at the time. > > However, if HLT is allowed inside the VM, then the vCPU could be about to > > VMRUN when the VMSA attribute is removed from the VMSA page, resulting in > > a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the > > guest to crash. An RMPADJUST against an in-use (already running) VMSA > > results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA > > attribute cannot be changed until the VMRUN for target vCPU exits. The > > Qemu command line option '-overcommit cpu-pm=on' is an example of allowing > > HLT inside the guest. > > > > Use kvm_test_request() to ensure that the target vCPU sees the AP destroy > > request before returning to the initiating vCPU. > > > > Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event") > > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> > > Sean, > > If you're ok with this approach for the fix, this patch may need to be > adjusted given your series around AP creation fixes, unless you want to > put this as an early patch in your series. Let me know what you'd like > to do. This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get far enough along to consume the request. Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs to be annotated with KVM_REQUEST_WAIT. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 04e6c5604bc3..67abfe97c600 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -124,7 +124,8 @@ KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_HV_TLB_FLUSH \ KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34) +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \ + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT) #define CR0_RESERVED_BITS \ (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
On 3/17/25 12:28, Sean Christopherson wrote: > On Mon, Mar 17, 2025, Tom Lendacky wrote: >> On 3/17/25 12:20, Tom Lendacky wrote: >>> An AP destroy request for a target vCPU is typically followed by an >>> RMPADJUST to remove the VMSA attribute from the page currently being >>> used as the VMSA for the target vCPU. This can result in a vCPU that >>> is about to VMRUN to exit with #VMEXIT_INVALID. >>> >>> This usually does not happen as APs are typically sitting in HLT when >>> being destroyed and therefore the vCPU thread is not running at the time. >>> However, if HLT is allowed inside the VM, then the vCPU could be about to >>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in >>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the >>> guest to crash. An RMPADJUST against an in-use (already running) VMSA >>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA >>> attribute cannot be changed until the VMRUN for target vCPU exits. The >>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing >>> HLT inside the guest. >>> >>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy >>> request before returning to the initiating vCPU. >>> >>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event") >>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> >> >> Sean, >> >> If you're ok with this approach for the fix, this patch may need to be >> adjusted given your series around AP creation fixes, unless you want to >> put this as an early patch in your series. Let me know what you'd like >> to do. > > This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get > far enough along to consume the request. > > Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs > to be annotated with KVM_REQUEST_WAIT. Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here. This is much simpler. Let me test it out and resend if everything goes ok. Thanks, Tom > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 04e6c5604bc3..67abfe97c600 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -124,7 +124,8 @@ > KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) > #define KVM_REQ_HV_TLB_FLUSH \ > KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) > -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34) > +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \ > + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT) > > #define CR0_RESERVED_BITS \ > (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ > >
On 3/17/25 12:36, Tom Lendacky wrote: > On 3/17/25 12:28, Sean Christopherson wrote: >> On Mon, Mar 17, 2025, Tom Lendacky wrote: >>> On 3/17/25 12:20, Tom Lendacky wrote: >>>> An AP destroy request for a target vCPU is typically followed by an >>>> RMPADJUST to remove the VMSA attribute from the page currently being >>>> used as the VMSA for the target vCPU. This can result in a vCPU that >>>> is about to VMRUN to exit with #VMEXIT_INVALID. >>>> >>>> This usually does not happen as APs are typically sitting in HLT when >>>> being destroyed and therefore the vCPU thread is not running at the time. >>>> However, if HLT is allowed inside the VM, then the vCPU could be about to >>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in >>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the >>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA >>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA >>>> attribute cannot be changed until the VMRUN for target vCPU exits. The >>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing >>>> HLT inside the guest. >>>> >>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy >>>> request before returning to the initiating vCPU. >>>> >>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event") >>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> >>> >>> Sean, >>> >>> If you're ok with this approach for the fix, this patch may need to be >>> adjusted given your series around AP creation fixes, unless you want to >>> put this as an early patch in your series. Let me know what you'd like >>> to do. >> >> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get >> far enough along to consume the request. >> >> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs >> to be annotated with KVM_REQUEST_WAIT. > > Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here. > This is much simpler. Let me test it out and resend if everything goes ok. So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let me try to track down what is happening with this approach... Thanks, Tom > > Thanks, > Tom > >> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >> index 04e6c5604bc3..67abfe97c600 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -124,7 +124,8 @@ >> KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) >> #define KVM_REQ_HV_TLB_FLUSH \ >> KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) >> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34) >> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \ >> + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT) >> >> #define CR0_RESERVED_BITS \ >> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ >> >>
On 3/18/25 07:43, Tom Lendacky wrote: > On 3/17/25 12:36, Tom Lendacky wrote: >> On 3/17/25 12:28, Sean Christopherson wrote: >>> On Mon, Mar 17, 2025, Tom Lendacky wrote: >>>> On 3/17/25 12:20, Tom Lendacky wrote: >>>>> An AP destroy request for a target vCPU is typically followed by an >>>>> RMPADJUST to remove the VMSA attribute from the page currently being >>>>> used as the VMSA for the target vCPU. This can result in a vCPU that >>>>> is about to VMRUN to exit with #VMEXIT_INVALID. >>>>> >>>>> This usually does not happen as APs are typically sitting in HLT when >>>>> being destroyed and therefore the vCPU thread is not running at the time. >>>>> However, if HLT is allowed inside the VM, then the vCPU could be about to >>>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in >>>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the >>>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA >>>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA >>>>> attribute cannot be changed until the VMRUN for target vCPU exits. The >>>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing >>>>> HLT inside the guest. >>>>> >>>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy >>>>> request before returning to the initiating vCPU. >>>>> >>>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event") >>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> >>>> >>>> Sean, >>>> >>>> If you're ok with this approach for the fix, this patch may need to be >>>> adjusted given your series around AP creation fixes, unless you want to >>>> put this as an early patch in your series. Let me know what you'd like >>>> to do. >>> >>> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get >>> far enough along to consume the request. >>> >>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs >>> to be annotated with KVM_REQUEST_WAIT. >> >> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here. >> This is much simpler. Let me test it out and resend if everything goes ok. > > So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let > me try to track down what is happening with this approach... Looks like I need to use kvm_make_vcpus_request_mask() instead of just a plain kvm_make_request() followed by a kvm_vcpu_kick(). Let me try that and see how this works. Thanks, Tom > > Thanks, > Tom > >> >> Thanks, >> Tom >> >>> >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >>> index 04e6c5604bc3..67abfe97c600 100644 >>> --- a/arch/x86/include/asm/kvm_host.h >>> +++ b/arch/x86/include/asm/kvm_host.h >>> @@ -124,7 +124,8 @@ >>> KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) >>> #define KVM_REQ_HV_TLB_FLUSH \ >>> KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) >>> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34) >>> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \ >>> + KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT) >>> >>> #define CR0_RESERVED_BITS \ >>> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ >>> >>>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 0d898d6b697f..a040f29bb07b 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -4071,6 +4071,16 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm) if (kick) { kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu); kvm_vcpu_kick(target_vcpu); + + if (request == SVM_VMGEXIT_AP_DESTROY) { + /* + * A destroy is likely to be followed by an RMPADJUST + * that will remove the VMSA flag, so be sure the vCPU + * got the request in case it is on the way to a VMRUN. + */ + while (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu)) + cond_resched(); + } } mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
An AP destroy request for a target vCPU is typically followed by an RMPADJUST to remove the VMSA attribute from the page currently being used as the VMSA for the target vCPU. This can result in a vCPU that is about to VMRUN to exit with #VMEXIT_INVALID. This usually does not happen as APs are typically sitting in HLT when being destroyed and therefore the vCPU thread is not running at the time. However, if HLT is allowed inside the VM, then the vCPU could be about to VMRUN when the VMSA attribute is removed from the VMSA page, resulting in a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the guest to crash. An RMPADJUST against an in-use (already running) VMSA results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA attribute cannot be changed until the VMRUN for target vCPU exits. The Qemu command line option '-overcommit cpu-pm=on' is an example of allowing HLT inside the guest. Use kvm_test_request() to ensure that the target vCPU sees the AP destroy request before returning to the initiating vCPU. Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> --- arch/x86/kvm/svm/sev.c | 10 ++++++++++ 1 file changed, 10 insertions(+) base-commit: f8d892c137f7448d7b49f5e3ad7aa7b5a48a64ed