diff mbox series

KVM: SVM: Fix SNP AP destroy race with VMRUN

Message ID 6053e8eba1456e4c1bf667f38cc20a0ea05bc72c.1742232014.git.thomas.lendacky@amd.com (mailing list archive)
State New
Headers show
Series KVM: SVM: Fix SNP AP destroy race with VMRUN | expand

Commit Message

Tom Lendacky March 17, 2025, 5:20 p.m. UTC
An AP destroy request for a target vCPU is typically followed by an
RMPADJUST to remove the VMSA attribute from the page currently being
used as the VMSA for the target vCPU. This can result in a vCPU that
is about to VMRUN to exit with #VMEXIT_INVALID.

This usually does not happen as APs are typically sitting in HLT when
being destroyed and therefore the vCPU thread is not running at the time.
However, if HLT is allowed inside the VM, then the vCPU could be about to
VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
guest to crash. An RMPADJUST against an in-use (already running) VMSA
results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
attribute cannot be changed until the VMRUN for target vCPU exits. The
Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
HLT inside the guest.

Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
request before returning to the initiating vCPU.

Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)


base-commit: f8d892c137f7448d7b49f5e3ad7aa7b5a48a64ed

Comments

Tom Lendacky March 17, 2025, 5:23 p.m. UTC | #1
On 3/17/25 12:20, Tom Lendacky wrote:
> An AP destroy request for a target vCPU is typically followed by an
> RMPADJUST to remove the VMSA attribute from the page currently being
> used as the VMSA for the target vCPU. This can result in a vCPU that
> is about to VMRUN to exit with #VMEXIT_INVALID.
> 
> This usually does not happen as APs are typically sitting in HLT when
> being destroyed and therefore the vCPU thread is not running at the time.
> However, if HLT is allowed inside the VM, then the vCPU could be about to
> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
> guest to crash. An RMPADJUST against an in-use (already running) VMSA
> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
> attribute cannot be changed until the VMRUN for target vCPU exits. The
> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
> HLT inside the guest.
> 
> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
> request before returning to the initiating vCPU.
> 
> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

Sean,

If you're ok with this approach for the fix, this patch may need to be
adjusted given your series around AP creation fixes, unless you want to
put this as an early patch in your series. Let me know what you'd like
to do.

Thanks,
Tom

> ---
>  arch/x86/kvm/svm/sev.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0d898d6b697f..a040f29bb07b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4071,6 +4071,16 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
>  	if (kick) {
>  		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
>  		kvm_vcpu_kick(target_vcpu);
> +
> +		if (request == SVM_VMGEXIT_AP_DESTROY) {
> +			/*
> +			 * A destroy is likely to be followed by an RMPADJUST
> +			 * that will remove the VMSA flag, so be sure the vCPU
> +			 * got the request in case it is on the way to a VMRUN.
> +			 */
> +			while (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu))
> +				cond_resched();
> +		}
>  	}
>  
>  	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
> 
> base-commit: f8d892c137f7448d7b49f5e3ad7aa7b5a48a64ed
Sean Christopherson March 17, 2025, 5:28 p.m. UTC | #2
On Mon, Mar 17, 2025, Tom Lendacky wrote:
> On 3/17/25 12:20, Tom Lendacky wrote:
> > An AP destroy request for a target vCPU is typically followed by an
> > RMPADJUST to remove the VMSA attribute from the page currently being
> > used as the VMSA for the target vCPU. This can result in a vCPU that
> > is about to VMRUN to exit with #VMEXIT_INVALID.
> > 
> > This usually does not happen as APs are typically sitting in HLT when
> > being destroyed and therefore the vCPU thread is not running at the time.
> > However, if HLT is allowed inside the VM, then the vCPU could be about to
> > VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
> > a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
> > guest to crash. An RMPADJUST against an in-use (already running) VMSA
> > results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
> > attribute cannot be changed until the VMRUN for target vCPU exits. The
> > Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
> > HLT inside the guest.
> > 
> > Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
> > request before returning to the initiating vCPU.
> > 
> > Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
> > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Sean,
> 
> If you're ok with this approach for the fix, this patch may need to be
> adjusted given your series around AP creation fixes, unless you want to
> put this as an early patch in your series. Let me know what you'd like
> to do.

This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
far enough along to consume the request.

Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
to be annotated with KVM_REQUEST_WAIT.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 04e6c5604bc3..67abfe97c600 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -124,7 +124,8 @@
        KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
        KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE   KVM_ARCH_REQ(34)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
+       KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
 
 #define CR0_RESERVED_BITS                                               \
        (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
Tom Lendacky March 17, 2025, 5:36 p.m. UTC | #3
On 3/17/25 12:28, Sean Christopherson wrote:
> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>> On 3/17/25 12:20, Tom Lendacky wrote:
>>> An AP destroy request for a target vCPU is typically followed by an
>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>
>>> This usually does not happen as APs are typically sitting in HLT when
>>> being destroyed and therefore the vCPU thread is not running at the time.
>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>> HLT inside the guest.
>>>
>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>> request before returning to the initiating vCPU.
>>>
>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Sean,
>>
>> If you're ok with this approach for the fix, this patch may need to be
>> adjusted given your series around AP creation fixes, unless you want to
>> put this as an early patch in your series. Let me know what you'd like
>> to do.
> 
> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
> far enough along to consume the request.
> 
> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
> to be annotated with KVM_REQUEST_WAIT.

Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
This is much simpler. Let me test it out and resend if everything goes ok.

Thanks,
Tom

> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 04e6c5604bc3..67abfe97c600 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -124,7 +124,8 @@
>         KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_HV_TLB_FLUSH \
>         KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE   KVM_ARCH_REQ(34)
> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
> +       KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>  
>  #define CR0_RESERVED_BITS                                               \
>         (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> 
>
Tom Lendacky March 18, 2025, 12:43 p.m. UTC | #4
On 3/17/25 12:36, Tom Lendacky wrote:
> On 3/17/25 12:28, Sean Christopherson wrote:
>> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>>> On 3/17/25 12:20, Tom Lendacky wrote:
>>>> An AP destroy request for a target vCPU is typically followed by an
>>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>>
>>>> This usually does not happen as APs are typically sitting in HLT when
>>>> being destroyed and therefore the vCPU thread is not running at the time.
>>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>>> HLT inside the guest.
>>>>
>>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>>> request before returning to the initiating vCPU.
>>>>
>>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>
>>> Sean,
>>>
>>> If you're ok with this approach for the fix, this patch may need to be
>>> adjusted given your series around AP creation fixes, unless you want to
>>> put this as an early patch in your series. Let me know what you'd like
>>> to do.
>>
>> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
>> far enough along to consume the request.
>>
>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>> to be annotated with KVM_REQUEST_WAIT.
> 
> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
> This is much simpler. Let me test it out and resend if everything goes ok.

So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
me try to track down what is happening with this approach...

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 04e6c5604bc3..67abfe97c600 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -124,7 +124,8 @@
>>         KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>  #define KVM_REQ_HV_TLB_FLUSH \
>>         KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE   KVM_ARCH_REQ(34)
>> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>> +       KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>>  
>>  #define CR0_RESERVED_BITS                                               \
>>         (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>>
>>
Tom Lendacky March 18, 2025, 1:47 p.m. UTC | #5
On 3/18/25 07:43, Tom Lendacky wrote:
> On 3/17/25 12:36, Tom Lendacky wrote:
>> On 3/17/25 12:28, Sean Christopherson wrote:
>>> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>>>> On 3/17/25 12:20, Tom Lendacky wrote:
>>>>> An AP destroy request for a target vCPU is typically followed by an
>>>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>>>
>>>>> This usually does not happen as APs are typically sitting in HLT when
>>>>> being destroyed and therefore the vCPU thread is not running at the time.
>>>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>>>> HLT inside the guest.
>>>>>
>>>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>>>> request before returning to the initiating vCPU.
>>>>>
>>>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> Sean,
>>>>
>>>> If you're ok with this approach for the fix, this patch may need to be
>>>> adjusted given your series around AP creation fixes, unless you want to
>>>> put this as an early patch in your series. Let me know what you'd like
>>>> to do.
>>>
>>> This is unsafe as it requires userspace to do KVM_RUN _and_ for the vCPU to get
>>> far enough along to consume the request.
>>>
>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>>> to be annotated with KVM_REQUEST_WAIT.
>>
>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
>> This is much simpler. Let me test it out and resend if everything goes ok.
> 
> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
> me try to track down what is happening with this approach...

Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
plain kvm_make_request() followed by a kvm_vcpu_kick().

Let me try that and see how this works.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> Thanks,
>> Tom
>>
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 04e6c5604bc3..67abfe97c600 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -124,7 +124,8 @@
>>>         KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>>  #define KVM_REQ_HV_TLB_FLUSH \
>>>         KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE   KVM_ARCH_REQ(34)
>>> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>>> +       KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>>>  
>>>  #define CR0_RESERVED_BITS                                               \
>>>         (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>>>
>>>
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0d898d6b697f..a040f29bb07b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4071,6 +4071,16 @@  static int sev_snp_ap_creation(struct vcpu_svm *svm)
 	if (kick) {
 		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
 		kvm_vcpu_kick(target_vcpu);
+
+		if (request == SVM_VMGEXIT_AP_DESTROY) {
+			/*
+			 * A destroy is likely to be followed by an RMPADJUST
+			 * that will remove the VMSA flag, so be sure the vCPU
+			 * got the request in case it is on the way to a VMRUN.
+			 */
+			while (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu))
+				cond_resched();
+		}
 	}
 
 	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);