
[v2,5/8] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM

Message ID 20210713142023.106183-6-mlevitsk@redhat.com (mailing list archive)
State New, archived
Series My AVIC patch queue

Commit Message

Maxim Levitsky July 13, 2021, 2:20 p.m. UTC
Currently on SVM, kvm_request_apicv_update calls
pre_update_apicv_exec_ctrl without any synchronization,
and that function toggles the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.

If there is a mismatch between that memslot state and the AVIC state
on one of the vCPUs, an APIC MMIO write can be lost:

For example:

VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
VCPU1: write to an APIC MMIO register.

Since AVIC is still disabled on VCPU1, the access will not be intercepted
by it, nor will it cause an MMIO fault; it will just update
the dummy page mapped into the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.

Fix that by adding a lock guarding the AVIC state changes, and by carefully
ordering the operations of kvm_request_apicv_update to avoid this race:

1. Take the lock
2. Send KVM_REQ_APICV_UPDATE
3. Update the APIC inhibit reason
4. Release the lock

This ensures that at (2) all vCPUs are kicked out of guest mode,
but don't yet see the new AVIC state.
Then only after (4) can all other vCPUs update their AVIC state and resume.
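
Schematically, the two sides then serialize on the new mutex as follows
(a condensed sketch of the patch below; the no-change fast path and the
KVM_REQ_EVENT handling are elided):

	/* Requester side (kvm_request_apicv_update), condensed: */
	mutex_lock(&kvm->apicv_update_lock);
	kvm->arch.apicv_inhibit_reasons = new;
	kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
	static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
	mutex_unlock(&kvm->apicv_update_lock);

	/* vCPU side (kvm_vcpu_update_apicv), condensed: a kicked vCPU
	 * blocks on the same mutex until the memslot is consistent: */
	mutex_lock(&vcpu->kvm->apicv_update_lock);
	vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
	mutex_unlock(&vcpu->kvm->apicv_update_lock);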

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c       | 39 ++++++++++++++++++++++++---------------
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      |  1 +
 3 files changed, 26 insertions(+), 15 deletions(-)

Comments

Paolo Bonzini July 26, 2021, 10:34 p.m. UTC | #1
On 13/07/21 16:20, Maxim Levitsky wrote:
> +	mutex_lock(&vcpu->kvm->apicv_update_lock);
> +
>  	vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
>  	kvm_apic_update_apicv(vcpu);
>  	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
> @@ -9246,6 +9248,8 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  	 */
>  	if (!vcpu->arch.apicv_active)
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +
> +	mutex_unlock(&vcpu->kvm->apicv_update_lock);

Does this whole piece of code need the lock/unlock?  Does it work and/or 
make sense to do the unlock immediately after mutex_lock()?  This makes 
it clearer that the mutex is being used to synchronize against the requestor.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ed4d1581d502..ba5d5d9ebc64 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -943,6 +943,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>   	mutex_init(&kvm->irq_lock);
>   	mutex_init(&kvm->slots_lock);
>   	mutex_init(&kvm->slots_arch_lock);
> +	mutex_init(&kvm->apicv_update_lock);
>   	INIT_LIST_HEAD(&kvm->devices);
>   
>   	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
> 

Please add comments above fields that are protected by this lock 
(anything but apicv_inhibit_reasons?), and especially move it to kvm->arch.

Paolo
Maxim Levitsky July 27, 2021, 1:22 p.m. UTC | #2
On Tue, 2021-07-27 at 00:34 +0200, Paolo Bonzini wrote:
> On 13/07/21 16:20, Maxim Levitsky wrote:
> > +	mutex_lock(&vcpu->kvm->apicv_update_lock);
> > +
> >  	vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
> >  	kvm_apic_update_apicv(vcpu);
> >  	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
> > @@ -9246,6 +9248,8 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> >  	 */
> >  	if (!vcpu->arch.apicv_active)
> >  		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > +
> > +	mutex_unlock(&vcpu->kvm->apicv_update_lock);
> 
> Does this whole piece of code need the lock/unlock?  Does it work and/or 
> make sense to do the unlock immediately after mutex_lock()?  This makes 
> it clearer that the mutex is being used to synchronize against the requestor.

Yes, I do need to hold the mutex for the whole duration.

The requester does the following:

1. Take the mutex
 
2. Kick all the vCPUs out of guest mode with KVM_REQ_APICV_UPDATE
   (see the sketch after this list).
   At that point all these vCPUs will be (or soon will be) stuck on the mutex
   and guaranteed to be outside of guest mode,
   which is exactly what I need: they must not enter guest
   mode as long as the AVIC's memslot state is not up to date.

3. Update kvm->arch.apicv_inhibit_reasons. I removed the cmpxchg loop
   since it is now protected under the lock anyway.
   This step doesn't have to be done at this point, but it should be done
   while the mutex is held, so that there is no need for cmpxchg and such.

   This itself isn't the justification for the mutex.
 
4. Update the memslot
 
5. Release the mutex.
   Only now are all other vCPUs permitted to enter guest mode again
   (since only now the memslot is up to date),
   and they will also update their per-vCPU AVIC enablement prior to entering it.
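
As for why (2) is sufficient to get the vCPUs out of guest mode: it relies
on the usual request/kick mechanism. Roughly (a simplified sketch; the real
kvm_make_all_cpus_request in virt/kvm/kvm_main.c batches the kicks and
handles more cases):

	static void make_all_cpus_request_sketch(struct kvm *kvm, unsigned int req)
	{
		struct kvm_vcpu *vcpu;
		int i;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			kvm_make_request(req, vcpu);	/* set the request bit */
			kvm_vcpu_kick(vcpu);	/* IPI so a running vCPU exits guest mode */
		}
	}

vcpu_enter_guest checks the pending requests with interrupts disabled before
every entry, so a kicked vCPU cannot re-enter the guest without first handling
KVM_REQ_APICV_UPDATE, and with this patch that handling blocks on
apicv_update_lock until the requester releases it.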
 
 
I think it might be possible to avoid the mutex, but I am not sure if this is worth it:
 
First of all, the above sync sequence is only really needed when we enable AVIC.

(Because from the moment we enable the memslot to the moment the vCPU enables AVIC,
it must not be in guest mode, as otherwise it will access the dummy page in the memslot
without a VMexit, instead of going through AVIC vmexit/acceleration.)
 
The other way around is OK. If we disable the memslot while a vCPU still has AVIC enabled,
it will just get a page fault, which will be correctly emulated as an APIC read/write by
the MMIO page fault handling.
 
If I had a guarantee that enabling the memslot (either as it is done today, or
using kvm_zap_gfn_range, which I strongly prefer) would always raise a TLB
flush request on all vCPUs, then I could (ab)use that request to update the
local AVIC state.
 
Or I can just always check whether the local AVIC state matches the memslot, and
update it if it doesn't, prior to guest mode entry; see the sketch below.
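
Something like the following is what I have in mind here (a purely
hypothetical sketch, not part of this patch; it uses kvm_apicv_activated()
as a proxy for the memslot state, and where exactly it would be called in
vcpu_enter_guest is an open question):

	/* Hypothetical: re-sync the per-vCPU AVIC state with the global
	 * state just before each guest entry, instead of taking a mutex. */
	static void avic_resync_before_entry(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.apicv_active != kvm_apicv_activated(vcpu->kvm))
			kvm_vcpu_update_apicv(vcpu);
	}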
 
I still think I would prefer a mutex, to be 100% sure that there are no races,
since the whole AVIC disablement isn't something that is done often anyway.
 
Best regards,
	Maxim Levitsky

> 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index ed4d1581d502..ba5d5d9ebc64 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -943,6 +943,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
> >   	mutex_init(&kvm->irq_lock);
> >   	mutex_init(&kvm->slots_lock);
> >   	mutex_init(&kvm->slots_arch_lock);
> > +	mutex_init(&kvm->apicv_update_lock);
> >   	INIT_LIST_HEAD(&kvm->devices);
> >   
> >   	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
> > 
> 
> Please add comments above fields that are protected by this lock 
> (anything but apicv_inhibit_reasons?), and especially move it to kvm->arch.
I agree, I will do this.


> 
> Paolo

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29b92f6cbad4..a91e35b92447 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9180,6 +9180,8 @@  void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	if (!lapic_in_kernel(vcpu))
 		return;
 
+	mutex_lock(&vcpu->kvm->apicv_update_lock);
+
 	vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
 	kvm_apic_update_apicv(vcpu);
 	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
@@ -9192,6 +9194,8 @@  void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	 */
 	if (!vcpu->arch.apicv_active)
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
+	mutex_unlock(&vcpu->kvm->apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
 
@@ -9204,32 +9208,34 @@  EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
  */
 void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 {
-	unsigned long old, new, expected;
+	unsigned long old, new;
 
 	if (!kvm_x86_ops.check_apicv_inhibit_reasons ||
 	    !static_call(kvm_x86_check_apicv_inhibit_reasons)(bit))
 		return;
 
-	old = READ_ONCE(kvm->arch.apicv_inhibit_reasons);
-	do {
-		expected = new = old;
-		if (activate)
-			__clear_bit(bit, &new);
-		else
-			__set_bit(bit, &new);
-		if (new == old)
-			break;
-		old = cmpxchg(&kvm->arch.apicv_inhibit_reasons, expected, new);
-	} while (old != expected);
+	mutex_lock(&kvm->apicv_update_lock);
+
+	old = new = kvm->arch.apicv_inhibit_reasons;
+	if (activate)
+		__clear_bit(bit, &new);
+	else
+		__set_bit(bit, &new);
+
+	kvm->arch.apicv_inhibit_reasons = new;
 
 	if (!!old == !!new)
-		return;
+		goto out;
 
 	trace_kvm_apicv_update_request(activate, bit);
+
+	kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
+
 	if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
 		static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
 
-	kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
+out:
+	mutex_unlock(&kvm->apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
 
@@ -9436,8 +9442,11 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		 */
 		if (kvm_check_request(KVM_REQ_HV_STIMER, vcpu))
 			kvm_hv_process_stimers(vcpu);
-		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu)) {
+			srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 			kvm_vcpu_update_apicv(vcpu);
+			vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+		}
 		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
 			kvm_check_async_pf_completion(vcpu);
 		if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 37cbb56ccd09..0364d35d43dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -524,6 +524,7 @@  struct kvm {
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
 	struct mutex slots_lock;
+	struct mutex apicv_update_lock;
 
 	/*
 	 * Protects the arch-specific fields of struct kvm_memory_slots in
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ed4d1581d502..ba5d5d9ebc64 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -943,6 +943,7 @@  static struct kvm *kvm_create_vm(unsigned long type)
 	mutex_init(&kvm->irq_lock);
 	mutex_init(&kvm->slots_lock);
 	mutex_init(&kvm->slots_arch_lock);
+	mutex_init(&kvm->apicv_update_lock);
 	INIT_LIST_HEAD(&kvm->devices);
 
 	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);