
[3/6] KVM: SVM: fix AVIC race of host->guest IPI delivery vs AVIC inhibition

Message ID 20211209115440.394441-4-mlevitsk@redhat.com (mailing list archive)
State New, archived
Series RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting

Commit Message

Maxim Levitsky Dec. 9, 2021, 11:54 a.m. UTC
If svm_deliver_avic_intr() is called just after the target vCPU's AVIC was
inhibited, it might read a stale value of vcpu->arch.apicv_active,
which can lead to the target vCPU not noticing the interrupt.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
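To make the race in the commit message concrete, here is a minimal single-threaded replay in C. Everything in it is a hypothetical model (struct model_vcpu, model_deliver() and the other names are not KVM code), and it rests on one simplifying assumption, also stated in the comments: with AVIC inhibited, a pending vIRR bit is injected at the next guest entry only if event processing was requested. Under that assumption, a sender that acts on a stale apicv_active == true and only wakes the target leaves the interrupt unnoticed, while the patched else branch (KVM_REQ_EVENT plus a kick) gets it noticed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified model of the race in the commit message.
 * None of these names are KVM's; they only mirror the roles of vcpu->mode,
 * vcpu->arch.apicv_active, the vIRR and KVM_REQ_EVENT.
 */
enum model_mode { MODEL_OUTSIDE_GUEST_MODE, MODEL_IN_GUEST_MODE };

struct model_vcpu {
	enum model_mode mode;
	bool apicv_active; /* the target's real AVIC state */
	bool irr_set;      /* vector recorded in the vIRR */
	bool req_event;    /* stands in for a pending KVM_REQ_EVENT */
};

/*
 * Sender side. 'seen_active' is the (possibly stale) apicv_active value the
 * sender read; 'kick_with_request' selects the patched else branch
 * (KVM_REQ_EVENT + kick) instead of the original plain wake-up.
 */
static void model_deliver(struct model_vcpu *t, bool seen_active,
			  bool kick_with_request)
{
	t->irr_set = true;
	if (!seen_active) {
		/* Generic, non-AVIC path: always requests event processing. */
		t->req_event = true;
		return;
	}
	if (t->mode == MODEL_IN_GUEST_MODE)
		return; /* doorbell path, not modeled further */
	if (kick_with_request)
		t->req_event = true;
	/* A plain wake-up sets no request in this model. */
}

/*
 * Target's next guest entry. Simplifying assumption: with AVIC active the
 * hardware picks up the vIRR at VMRUN; with AVIC inhibited the vector is
 * injected only if event processing was requested.
 */
static bool model_entry_notices_irq(const struct model_vcpu *t)
{
	if (t->apicv_active)
		return t->irr_set;
	return t->irr_set && t->req_event;
}

/* AVIC already inhibited, but the sender still read apicv_active == true. */
static bool model_replay_race(bool kick_with_request)
{
	struct model_vcpu t = {
		.mode = MODEL_OUTSIDE_GUEST_MODE,
		.apicv_active = false,
	};

	model_deliver(&t, true, kick_with_request);
	return model_entry_notices_irq(&t);
}
```

The model deliberately ignores the IN_GUEST_MODE doorbell case; that is the part the review below argues about.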

Comments

Paolo Bonzini Dec. 9, 2021, 2:11 p.m. UTC | #1
On 12/9/21 12:54, Maxim Levitsky wrote:
> If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> inhibited, it might read a stale value of vcpu->arch.apicv_active
> which can lead to the target vCPU not noticing the interrupt.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
>   1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 859ad2dc50f1..8c1b934bfa9b 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>   	 * automatically process AVIC interrupts at VMRUN.
>   	 */
>   	if (vcpu->mode == IN_GUEST_MODE) {
> +
> +		/*
> +		 * At this point we had read the vcpu->arch.apicv_active == true
> +		 * and the vcpu->mode == IN_GUEST_MODE.
> +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> +		 * it ensures that AVIC inhibition is complete and thus
> +		 * the target is really running with AVIC enabled.
> +		 */
> +
>   		int cpu = READ_ONCE(vcpu->cpu);

I don't think it's correct.  The vCPU has apicv_active written (in 
kvm_vcpu_update_apicv) before vcpu->mode.

For the acquire/release pair to work properly you need to 1) read 
apicv_active *after* vcpu->mode here 2) use store_release and 
load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.

Paolo

>   		/*
> @@ -706,10 +715,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>   		put_cpu();
>   	} else {
>   		/*
> -		 * Wake the vCPU if it was blocking.  KVM will then detect the
> -		 * pending IRQ when checking if the vCPU has a wake event.
> +		 * Kick the target vCPU otherwise, to make sure
> +		 * it processes the interrupt even if its AVIC is inhibited.
>   		 */
> -		kvm_vcpu_wake_up(vcpu);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
>   	}
>   
>   	return 0;
>
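Paolo's two requirements describe a standard message-passing pairing. The sketch below models it with C11 atomics; the struct and function names are hypothetical, and in kernel code the corresponding primitives would be smp_store_release() and smp_load_acquire(). The target stores apicv_active and then release-stores mode; the sender acquire-loads mode and only then reads apicv_active. Reading apicv_active first, as the posted patch does via its early return, loses the guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the acquire/release pairing; not KVM code. */
enum { MODEL_OUTSIDE_GUEST_MODE, MODEL_IN_GUEST_MODE };

struct model_vcpu {
	atomic_int mode;
	atomic_bool apicv_active; /* accessed with relaxed ordering only */
};

static void model_vcpu_init(struct model_vcpu *v)
{
	atomic_init(&v->mode, MODEL_OUTSIDE_GUEST_MODE);
	atomic_init(&v->apicv_active, false);
}

/* Target, entering the guest: write apicv_active, then release-store mode. */
static void model_enter_guest(struct model_vcpu *v, bool apicv_active)
{
	atomic_store_explicit(&v->apicv_active, apicv_active,
			      memory_order_relaxed);
	/* Release: the apicv_active store cannot move past this store. */
	atomic_store_explicit(&v->mode, MODEL_IN_GUEST_MODE,
			      memory_order_release);
}

/*
 * Sender: acquire-load mode FIRST, then read apicv_active. If the acquire
 * observes the IN_GUEST_MODE written by model_enter_guest(), the later read
 * sees the apicv_active value published before it (or a newer one).
 */
static bool model_can_use_doorbell(struct model_vcpu *v)
{
	if (atomic_load_explicit(&v->mode, memory_order_acquire) !=
	    MODEL_IN_GUEST_MODE)
		return false;
	return atomic_load_explicit(&v->apicv_active, memory_order_relaxed);
}
```

The model covers a single guest entry; it says nothing about the target leaving guest mode again after the sender's check, which the else branch still has to handle.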
Maxim Levitsky Dec. 9, 2021, 2:26 p.m. UTC | #2
On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> I don't think it's correct.  The vCPU has apicv_active written (in 
> kvm_vcpu_update_apicv) before vcpu->mode.

I thought that we had a full memory barrier just prior to setting IN_GUEST_MODE,
so if I see vcpu->mode == IN_GUEST_MODE, then I'll see the correct apicv_active value.
But apparently the memory barrier comes after setting vcpu->mode.


> 
> For the acquire/release pair to work properly you need to 1) read 
> apicv_active *after* vcpu->mode here 2) use store_release and 
> load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.

store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
which I expected to be there.

And yes, I see now: I need a read barrier here as well. I am still learning this.

Best regards,
	Maxim Levitsky
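Maxim's reading of store_release as "a write barrier just before the store" corresponds to the fence-based form of the same pattern: a write barrier between the apicv_active store and the vcpu->mode store on the target, and a read barrier between the vcpu->mode load and the apicv_active load on the sender. A minimal C11 sketch with hypothetical globals follows. C11 has no store-only or load-only fence, so release and acquire fences stand in for smp_wmb() and smp_rmb(); they are stronger, not weaker. Likewise, a release store orders all earlier loads and stores, not only stores, so it is slightly more than a write barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical globals modeling one vCPU; not KVM code. */
enum { MODEL_OUTSIDE_GUEST_MODE, MODEL_IN_GUEST_MODE };

static atomic_bool model_apicv_active = false;
static atomic_int model_mode = MODEL_OUTSIDE_GUEST_MODE;

/* Target: data store, barrier, flag store. */
static void model_target_publish(bool active)
{
	atomic_store_explicit(&model_apicv_active, active, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* stands in for smp_wmb() */
	atomic_store_explicit(&model_mode, MODEL_IN_GUEST_MODE,
			      memory_order_relaxed);
}

/* Sender: flag load, barrier, data load. */
static bool model_sender_check(void)
{
	int mode = atomic_load_explicit(&model_mode, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire); /* stands in for smp_rmb() */
	return mode == MODEL_IN_GUEST_MODE &&
	       atomic_load_explicit(&model_apicv_active, memory_order_relaxed);
}
```

Both barriers are needed: the target's barrier alone does not stop the sender's two loads from being satisfied out of order.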

Sean Christopherson Dec. 9, 2021, 3:27 p.m. UTC | #3
On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> > I don't think it's correct.  The vCPU has apicv_active written (in 
> > kvm_vcpu_update_apicv) before vcpu->mode.
> 
> I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
> thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
> But apparently the memory barrier is after setting vcpu->mode.
> 
> 
> > 
> > For the acquire/release pair to work properly you need to 1) read 
> > apicv_active *after* vcpu->mode here 2) use store_release and 
> > load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.
> 
> store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
> which I expected to be there.
> 
> And yes I see now, I need a read barrier here as well. I am still learning this.

Sans barriers and comments, can't this be written as returning an "error" if the
vCPU is not IN_GUEST_MODE?  Effectively the same thing, but a little more precise
and it avoids duplicating the lapic.c code.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 26ed5325c593..cddf7a8da3ea 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -671,7 +671,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)

 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
-       if (!vcpu->arch.apicv_active)
+       if (vcpu->mode != IN_GUEST_MODE || !vcpu->arch.apicv_active)
                return -1;

        kvm_lapic_set_irr(vec, vcpu->arch.apic);
@@ -706,8 +706,9 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
                put_cpu();
        } else {
                /*
-                * Wake the vCPU if it was blocking.  KVM will then detect the
-                * pending IRQ when checking if the vCPU has a wake event.
+                * Wake the vCPU if it is blocking.  If the vCPU exited the
+                * guest since the previous vcpu->mode check, it's guaranteed
+                * to see the event before re-entering the guest.
                 */
                kvm_vcpu_wake_up(vcpu);
        }
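Sean's point is that returning -1 hands the interrupt to the generic path in lapic.c, which, judging from Maxim's else branch and his reply below, sets the vIRR bit, makes KVM_REQ_EVENT and kicks the target, so svm_deliver_avic_intr() need not duplicate it. The hypothetical model below (none of its names are KVM's) counts which actions each path takes. Because it is single-threaded, the second vcpu->mode read always succeeds, so the wake-up branch of Sean's diff is not exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; counters record which actions each path takes. */
enum model_mode { MODEL_OUTSIDE_GUEST_MODE, MODEL_IN_GUEST_MODE };

struct model_vcpu {
	enum model_mode mode;
	bool apicv_active;
	int irr_sets, doorbells, req_events, kicks;
};

/* Shape of Sean's proposal: bail out unless in guest mode with AVIC on. */
static int model_deliver_avic_intr(struct model_vcpu *v)
{
	if (v->mode != MODEL_IN_GUEST_MODE || !v->apicv_active)
		return -1; /* let the generic path handle it */
	v->irr_sets++;
	/*
	 * The real function re-reads vcpu->mode here and only wakes the vCPU
	 * if it has exited meanwhile; this model always rings the doorbell.
	 */
	v->doorbells++;
	return 0;
}

/* Stand-in for the generic fallback in the caller (assumed behavior). */
static void model_accept_irq(struct model_vcpu *v)
{
	if (model_deliver_avic_intr(v)) {
		v->irr_sets++;
		v->req_events++;
		v->kicks++;
	}
}
```

In both paths the vIRR bit is set exactly once, because the mode check happens before any IRR update.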
Maxim Levitsky Dec. 9, 2021, 3:33 p.m. UTC | #4
On Thu, 2021-12-09 at 15:27 +0000, Sean Christopherson wrote:
> Sans barriers and comments, can't this be written as returning an "error" if the
> vCPU is not IN_GUEST_MODE?  Effectively the same thing, but a little more precise
> and it avoids duplicating the lapic.c code.

Yes, besides the fact that we have already set the vIRR bit, so if I return -1 here, it will be set again
(and these bits are set using atomic ops).

I don't know how much that matters, except that while a vCPU runs a nested guest,
callers wishing to send an IPI to it will go through this code path a lot
(even once I implement nested AVIC, since that is a separate mechanism used only by L2).

Best regards,
	Maxim Levitsky
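On the double-set concern: setting a vIRR bit that is already set is idempotent, so at the bitmap level a second atomic OR only costs another read-modify-write. Also note that in Sean's diff the vcpu->mode check runs before kvm_lapic_set_irr(), so the -1 path does not set the bit first. A minimal sketch of an atomic 256-bit IRR split into eight 32-bit words, using hypothetical names (this is not kvm_lapic_set_irr()):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit vIRR as eight 32-bit words; illustrative only. */
static _Atomic uint32_t model_irr[8];

/* Atomically set the bit for 'vec' (0..255); return true if already set. */
static bool model_set_irr(unsigned int vec)
{
	uint32_t mask = UINT32_C(1) << (vec % 32);
	uint32_t old = atomic_fetch_or(&model_irr[vec / 32], mask);

	return (old & mask) != 0;
}

static bool model_irr_test(unsigned int vec)
{
	return (atomic_load(&model_irr[vec / 32]) >> (vec % 32)) & 1u;
}
```

Calling model_set_irr() twice with the same vector leaves the bitmap unchanged after the first call; the second call merely reports that the bit was already pending.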

Maxim Levitsky Dec. 9, 2021, 3:35 p.m. UTC | #5
On Thu, 2021-12-09 at 17:33 +0200, Maxim Levitsky wrote:
> Yes, beside the fact that we already set the vIRR bit so if I return -1 here, it will be set again..
> (and these are set using atomic ops)
> 
> I don't know how much that matters except the fact that while a vCPU runs a nested guest,
> callers wishing to send IPI to it, will go through this code path a lot 
> (even when I implement nested AVIC as it is a separate thing which is used by L2 only).

Ah, I hit send too soon; it makes sense to me now!
Best regards,
	Maxim Levitsky

Patch

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 859ad2dc50f1..8c1b934bfa9b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -691,6 +691,15 @@  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	 * automatically process AVIC interrupts at VMRUN.
 	 */
 	if (vcpu->mode == IN_GUEST_MODE) {
+
+		/*
+		 * At this point we had read the vcpu->arch.apicv_active == true
+		 * and the vcpu->mode == IN_GUEST_MODE.
+		 * Since we have a memory barrier after setting IN_GUEST_MODE,
+		 * it ensures that AVIC inhibition is complete and thus
+		 * the target is really running with AVIC enabled.
+		 */
+
 		int cpu = READ_ONCE(vcpu->cpu);
 
 		/*
@@ -706,10 +715,11 @@  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 		put_cpu();
 	} else {
 		/*
-		 * Wake the vCPU if it was blocking.  KVM will then detect the
-		 * pending IRQ when checking if the vCPU has a wake event.
+		 * Kick the target vCPU otherwise, to make sure
+		 * it processes the interrupt even if its AVIC is inhibited.
 		 */
-		kvm_vcpu_wake_up(vcpu);
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
 	}
 
 	return 0;