Message ID | 20211209115440.394441-4-mlevitsk@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting |
On 12/9/21 12:54, Maxim Levitsky wrote:
> If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> inhibited, it might read a stale value of vcpu->arch.apicv_active
> which can lead to the target vCPU not noticing the interrupt.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 859ad2dc50f1..8c1b934bfa9b 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  	 * automatically process AVIC interrupts at VMRUN.
>  	 */
>  	if (vcpu->mode == IN_GUEST_MODE) {
> +
> +		/*
> +		 * At this point we had read the vcpu->arch.apicv_active == true
> +		 * and the vcpu->mode == IN_GUEST_MODE.
> +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> +		 * it ensures that AVIC inhibition is complete and thus
> +		 * the target is really running with AVIC enabled.
> +		 */
> +
>  		int cpu = READ_ONCE(vcpu->cpu);

I don't think it's correct. The vCPU has apicv_active written (in
kvm_vcpu_update_apicv) before vcpu->mode.

For the acquire/release pair to work properly you need to 1) read
apicv_active *after* vcpu->mode here 2) use store_release and
load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.

Paolo

>
>  		/*
> @@ -706,10 +715,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  		put_cpu();
>  	} else {
>  		/*
> -		 * Wake the vCPU if it was blocking. KVM will then detect the
> -		 * pending IRQ when checking if the vCPU has a wake event.
> +		 * Kick the target vCPU otherwise, to make sure
> +		 * it processes the interrupt even if its AVIC is inhibited.
>  		 */
> -		kvm_vcpu_wake_up(vcpu);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
>  	}
>
>  	return 0;
>
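
The two requirements in the review above (publish vcpu->mode with a release store, and on the delivery side read vcpu->mode with an acquire load *before* reading apicv_active) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it uses C11 atomics as stand-ins for the kernel's smp_store_release()/smp_load_acquire(), and `struct vcpu_model`, `model_enter_guest()` and `model_can_ring_doorbell()` are hypothetical names that do not exist in KVM. The two functions are meant to be called from different threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the KVM fields discussed above. */
enum vcpu_mode_model { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

struct vcpu_model {
	bool apicv_active;	/* plain data, published by the release store */
	_Atomic int mode;	/* the flag that carries the ordering */
};

/*
 * Writer side, modelling the vCPU thread: apicv_active is written first,
 * then mode is published with release semantics, so the earlier write
 * cannot be reordered after the store to mode.
 */
static void model_enter_guest(struct vcpu_model *v, bool apicv_active)
{
	assert(v);
	v->apicv_active = apicv_active;
	atomic_store_explicit(&v->mode, IN_GUEST_MODE, memory_order_release);
}

/*
 * Reader side, modelling the interrupt sender: mode is read first with
 * acquire semantics, and apicv_active is read only afterwards. If the
 * load observes IN_GUEST_MODE, the value of apicv_active written before
 * the matching release store is guaranteed to be visible.
 */
static bool model_can_ring_doorbell(struct vcpu_model *v)
{
	assert(v);
	if (atomic_load_explicit(&v->mode, memory_order_acquire) != IN_GUEST_MODE)
		return false;
	return v->apicv_active;
}
```

The model covers a single publication only. It does not model the vCPU leaving guest mode or apicv_active changing again later, both of which the real code has to handle.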
On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> On 12/9/21 12:54, Maxim Levitsky wrote:
> > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > which can lead to the target vCPU not noticing the interrupt.
> >
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 859ad2dc50f1..8c1b934bfa9b 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  	 * automatically process AVIC interrupts at VMRUN.
> >  	 */
> >  	if (vcpu->mode == IN_GUEST_MODE) {
> > +
> > +		/*
> > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > +		 * it ensures that AVIC inhibition is complete and thus
> > +		 * the target is really running with AVIC enabled.
> > +		 */
> > +
> >  		int cpu = READ_ONCE(vcpu->cpu);
>
> I don't think it's correct. The vCPU has apicv_active written (in
> kvm_vcpu_update_apicv) before vcpu->mode.

I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
But apparently the memory barrier is after setting vcpu->mode.

>
> For the acquire/release pair to work properly you need to 1) read
> apicv_active *after* vcpu->mode here 2) use store_release and
> load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.

store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
which I expected to be there.

And yes I see now, I need a read barrier here as well. I am still learning this.

Best regards,
	Maxim Levitsky

>
> Paolo
>
> >
> >  		/*
> > @@ -706,10 +715,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  		put_cpu();
> >  	} else {
> >  		/*
> > -		 * Wake the vCPU if it was blocking. KVM will then detect the
> > -		 * pending IRQ when checking if the vCPU has a wake event.
> > +		 * Kick the target vCPU otherwise, to make sure
> > +		 * it processes the interrupt even if its AVIC is inhibited.
> >  		 */
> > -		kvm_vcpu_wake_up(vcpu);
> > +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > +		kvm_vcpu_kick(vcpu);
> >  	}
> >
> >  	return 0;
> >
On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > > which can lead to the target vCPU not noticing the interrupt.
> > >
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index 859ad2dc50f1..8c1b934bfa9b 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> > >  	 * automatically process AVIC interrupts at VMRUN.
> > >  	 */
> > >  	if (vcpu->mode == IN_GUEST_MODE) {
> > > +
> > > +		/*
> > > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > > +		 * it ensures that AVIC inhibition is complete and thus
> > > +		 * the target is really running with AVIC enabled.
> > > +		 */
> > > +
> > >  		int cpu = READ_ONCE(vcpu->cpu);
> >
> > I don't think it's correct. The vCPU has apicv_active written (in
> > kvm_vcpu_update_apicv) before vcpu->mode.
>
> I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
> thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
> But apparently the memory barrier is after setting vcpu->mode.
>
> >
> > For the acquire/release pair to work properly you need to 1) read
> > apicv_active *after* vcpu->mode here 2) use store_release and
> > load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.
>
> store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
> which I expected to be there.
>
> And yes I see now, I need a read barrier here as well. I am still learning this.

Sans barriers and comments, can't this be written as returning an "error" if the
vCPU is not IN_GUEST_MODE? Effectively the same thing, but a little more precise
and it avoids duplicating the lapic.c code.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 26ed5325c593..cddf7a8da3ea 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -671,7 +671,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 {
-	if (!vcpu->arch.apicv_active)
+	if (vcpu->mode != IN_GUEST_MODE || !vcpu->arch.apicv_active)
 		return -1;
 
 	kvm_lapic_set_irr(vec, vcpu->arch.apic);
@@ -706,8 +706,9 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 		put_cpu();
 	} else {
 		/*
-		 * Wake the vCPU if it was blocking. KVM will then detect the
-		 * pending IRQ when checking if the vCPU has a wake event.
+		 * Wake the vCPU if it is blocking. If the vCPU exited the
+		 * guest since the previous vcpu->mode check, it's guaranteed
+		 * to see the event before re-entering the guest.
 		 */
 		kvm_vcpu_wake_up(vcpu);
 	}
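
The control flow of the suggestion above can be modelled with a small single-threaded mock. It is a sketch, not KVM code: `struct mock_vcpu`, `mock_deliver_avic_intr()` and `mock_accept_irq()` are hypothetical names, the barriers are left out, and the fallback in `mock_accept_irq()` is an assumption based on how the thread describes the lapic.c path (set the IRR bit, request an event, kick the vCPU). What it shows is that with the early IN_GUEST_MODE check the IRR bit is set exactly once on either path, because the fast path bails out before touching it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the fields involved; not the real struct kvm_vcpu. */
enum mock_mode { MOCK_OUTSIDE_GUEST_MODE, MOCK_IN_GUEST_MODE };

struct mock_vcpu {
	enum mock_mode mode;
	bool apicv_active;
	unsigned int irr_sets;	/* how many times the IRR bit was set */
	bool doorbell;		/* AVIC doorbell was signalled */
	bool woken;		/* blocking vCPU was woken */
	bool kicked;		/* software fallback: event requested + kick */
};

/* Models svm_deliver_avic_intr() with the early IN_GUEST_MODE check. */
static int mock_deliver_avic_intr(struct mock_vcpu *v)
{
	if (v->mode != MOCK_IN_GUEST_MODE || !v->apicv_active)
		return -1;	/* refuse; the caller takes the software path */

	v->irr_sets++;

	/*
	 * Second check, as in the real function. In this single-threaded
	 * mock the else branch is never taken; in the kernel it covers a
	 * vCPU that exited the guest after the first check.
	 */
	if (v->mode == MOCK_IN_GUEST_MODE)
		v->doorbell = true;
	else
		v->woken = true;

	return 0;
}

/* Models the generic caller: fall back when fast delivery is refused. */
static void mock_accept_irq(struct mock_vcpu *v)
{
	assert(v);
	if (mock_deliver_avic_intr(v)) {
		v->irr_sets++;		/* IRR is set here instead */
		v->kicked = true;	/* stands in for request + kick */
	}
}
```

Calling `mock_accept_irq()` on a mock vCPU in guest mode with AVIC active signals the doorbell, while any other state ends in the kick path.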
On Thu, 2021-12-09 at 15:27 +0000, Sean Christopherson wrote:
> On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> > On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> > > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > > > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > > > which can lead to the target vCPU not noticing the interrupt.
> > > >
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > ---
> > > >  arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> > > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > > index 859ad2dc50f1..8c1b934bfa9b 100644
> > > > --- a/arch/x86/kvm/svm/avic.c
> > > > +++ b/arch/x86/kvm/svm/avic.c
> > > > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> > > >  	 * automatically process AVIC interrupts at VMRUN.
> > > >  	 */
> > > >  	if (vcpu->mode == IN_GUEST_MODE) {
> > > > +
> > > > +		/*
> > > > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > > > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > > > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > > > +		 * it ensures that AVIC inhibition is complete and thus
> > > > +		 * the target is really running with AVIC enabled.
> > > > +		 */
> > > > +
> > > >  		int cpu = READ_ONCE(vcpu->cpu);
> > >
> > > I don't think it's correct. The vCPU has apicv_active written (in
> > > kvm_vcpu_update_apicv) before vcpu->mode.
> >
> > I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
> > thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
> > But apparently the memory barrier is after setting vcpu->mode.
> >
> > >
> > > For the acquire/release pair to work properly you need to 1) read
> > > apicv_active *after* vcpu->mode here 2) use store_release and
> > > load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.
> >
> > store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
> > which I expected to be there.
> >
> > And yes I see now, I need a read barrier here as well. I am still learning this.
>
> Sans barriers and comments, can't this be written as returning an "error" if the
> vCPU is not IN_GUEST_MODE? Effectively the same thing, but a little more precise
> and it avoids duplicating the lapic.c code.

Yes, besides the fact that we already set the vIRR bit, so if I return -1 here, it will be set again.
(and these are set using atomic ops)

I don't know how much that matters, except for the fact that while a vCPU runs a nested guest,
callers wishing to send an IPI to it will go through this code path a lot
(even when I implement nested AVIC, as it is a separate thing which is used by L2 only).

Best regards,
	Maxim Levitsky

>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 26ed5325c593..cddf7a8da3ea 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -671,7 +671,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>
>  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  {
> -	if (!vcpu->arch.apicv_active)
> +	if (vcpu->mode != IN_GUEST_MODE || !vcpu->arch.apicv_active)
>  		return -1;
>
>  	kvm_lapic_set_irr(vec, vcpu->arch.apic);
> @@ -706,8 +706,9 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  		put_cpu();
>  	} else {
>  		/*
> -		 * Wake the vCPU if it was blocking. KVM will then detect the
> -		 * pending IRQ when checking if the vCPU has a wake event.
> +		 * Wake the vCPU if it is blocking. If the vCPU exited the
> +		 * guest since the previous vcpu->mode check, it's guaranteed
> +		 * to see the event before re-entering the guest.
>  		 */
>  		kvm_vcpu_wake_up(vcpu);
>  	}
>
On Thu, 2021-12-09 at 17:33 +0200, Maxim Levitsky wrote:
> On Thu, 2021-12-09 at 15:27 +0000, Sean Christopherson wrote:
> > On Thu, Dec 09, 2021, Maxim Levitsky wrote:
> > > On Thu, 2021-12-09 at 15:11 +0100, Paolo Bonzini wrote:
> > > > On 12/9/21 12:54, Maxim Levitsky wrote:
> > > > > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > > > > inhibited, it might read a stale value of vcpu->arch.apicv_active
> > > > > which can lead to the target vCPU not noticing the interrupt.
> > > > >
> > > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > ---
> > > > >  arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
> > > > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > > > index 859ad2dc50f1..8c1b934bfa9b 100644
> > > > > --- a/arch/x86/kvm/svm/avic.c
> > > > > +++ b/arch/x86/kvm/svm/avic.c
> > > > > @@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> > > > >  	 * automatically process AVIC interrupts at VMRUN.
> > > > >  	 */
> > > > >  	if (vcpu->mode == IN_GUEST_MODE) {
> > > > > +
> > > > > +		/*
> > > > > +		 * At this point we had read the vcpu->arch.apicv_active == true
> > > > > +		 * and the vcpu->mode == IN_GUEST_MODE.
> > > > > +		 * Since we have a memory barrier after setting IN_GUEST_MODE,
> > > > > +		 * it ensures that AVIC inhibition is complete and thus
> > > > > +		 * the target is really running with AVIC enabled.
> > > > > +		 */
> > > > > +
> > > > >  		int cpu = READ_ONCE(vcpu->cpu);
> > > >
> > > > I don't think it's correct. The vCPU has apicv_active written (in
> > > > kvm_vcpu_update_apicv) before vcpu->mode.
> > >
> > > I thought that we have a full memory barrier just prior to setting IN_GUEST_MODE
> > > thus if I see vcpu->mode == IN_GUEST_MODE then I'll see correct apicv_active value.
> > > But apparently the memory barrier is after setting vcpu->mode.
> > >
> > > >
> > > > For the acquire/release pair to work properly you need to 1) read
> > > > apicv_active *after* vcpu->mode here 2) use store_release and
> > > > load_acquire for vcpu->mode, respectively in vcpu_enter_guest and here.
> > >
> > > store_release for vcpu->mode in vcpu_enter_guest means a write barrier just before setting it,
> > > which I expected to be there.
> > >
> > > And yes I see now, I need a read barrier here as well. I am still learning this.
> >
> > Sans barriers and comments, can't this be written as returning an "error" if the
> > vCPU is not IN_GUEST_MODE? Effectively the same thing, but a little more precise
> > and it avoids duplicating the lapic.c code.
>
> Yes, besides the fact that we already set the vIRR bit, so if I return -1 here, it will be set again.
> (and these are set using atomic ops)
>
> I don't know how much that matters, except for the fact that while a vCPU runs a nested guest,
> callers wishing to send an IPI to it will go through this code path a lot
> (even when I implement nested AVIC, as it is a separate thing which is used by L2 only).

Ah, hit send too soon, makes sense now to me!

Best regards,
	Maxim Levitsky

>
> Best regards,
> 	Maxim Levitsky
>
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 26ed5325c593..cddf7a8da3ea 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -671,7 +671,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> >
> >  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  {
> > -	if (!vcpu->arch.apicv_active)
> > +	if (vcpu->mode != IN_GUEST_MODE || !vcpu->arch.apicv_active)
> >  		return -1;
> >
> >  	kvm_lapic_set_irr(vec, vcpu->arch.apic);
> > @@ -706,8 +706,9 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  		put_cpu();
> >  	} else {
> >  		/*
> > -		 * Wake the vCPU if it was blocking. KVM will then detect the
> > -		 * pending IRQ when checking if the vCPU has a wake event.
> > +		 * Wake the vCPU if it is blocking. If the vCPU exited the
> > +		 * guest since the previous vcpu->mode check, it's guaranteed
> > +		 * to see the event before re-entering the guest.
> >  		 */
> >  		kvm_vcpu_wake_up(vcpu);
> >  	}
> >
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 859ad2dc50f1..8c1b934bfa9b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -691,6 +691,15 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	 * automatically process AVIC interrupts at VMRUN.
 	 */
 	if (vcpu->mode == IN_GUEST_MODE) {
+
+		/*
+		 * At this point we had read the vcpu->arch.apicv_active == true
+		 * and the vcpu->mode == IN_GUEST_MODE.
+		 * Since we have a memory barrier after setting IN_GUEST_MODE,
+		 * it ensures that AVIC inhibition is complete and thus
+		 * the target is really running with AVIC enabled.
+		 */
+
 		int cpu = READ_ONCE(vcpu->cpu);
 
 		/*
@@ -706,10 +715,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 		put_cpu();
 	} else {
 		/*
-		 * Wake the vCPU if it was blocking. KVM will then detect the
-		 * pending IRQ when checking if the vCPU has a wake event.
+		 * Kick the target vCPU otherwise, to make sure
+		 * it processes the interrupt even if its AVIC is inhibited.
 		 */
-		kvm_vcpu_wake_up(vcpu);
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
 	}
 
 	return 0;
If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active
which can lead to the target vCPU not noticing the interrupt.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)