Message ID: 20200812175129.12172-1-sean.j.christopherson@intel.com
State: New, archived
Series: KVM: nVMX: Morph notification vector IRQ on nested VM-Enter to pending PI
On Wed, Aug 12, 2020 at 10:51 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>
> On successful nested VM-Enter, check for pending interrupts and convert
> the highest priority interrupt to a pending posted interrupt if it
> matches L2's notification vector. If the vCPU receives a notification
> interrupt before nested VM-Enter (assuming L1 disables IRQs before doing
> VM-Enter), the pending interrupt (for L1) should be recognized and
> processed as a posted interrupt when interrupts become unblocked after
> VM-Enter to L2.
>
> This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is
> trying to inject an interrupt into L2 by setting the appropriate bit in
> L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's
> method of manually moving the vector from PIR->vIRR/RVI). KVM will
> observe the IPI while the vCPU is in L1 context and so won't immediately
> morph it to a posted interrupt for L2. The pending interrupt will be
> seen by vmx_check_nested_events(), cause KVM to force an immediate exit
> after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit.
> After handling the VM-Exit, L1 will see that L2 has a pending interrupt
> in PIR, send another IPI, and repeat until L2 is killed.
>
> Note, posted interrupts require virtual interrupt delivery, and virtual
> interrupt delivery requires exit-on-interrupt, ergo interrupts will be
> unconditionally unmasked on VM-Enter if posted interrupts are enabled.
>
> Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
> Cc: stable@vger.kernel.org
> Cc: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---

I don't think this is the best fix. I believe the real problem is the way that external and posted interrupts are handled in vmx_check_nested_events().
First of all, I believe that the existing call to vmx_complete_nested_posted_interrupt() at the end of vmx_check_nested_events() is far too aggressive. Unless I am missing something in the SDM, posted interrupt processing is *only* triggered when the notification vector is received in VMX non-root mode. It is not triggered on VM-entry.

Looking back one block, we have:

	if (kvm_cpu_has_interrupt(vcpu) && !vmx_interrupt_blocked(vcpu)) {
		if (block_nested_events)
			return -EBUSY;
		if (!nested_exit_on_intr(vcpu))
			goto no_vmexit;
		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
		return 0;
	}

If nested_exit_on_intr() is true, we should first check to see if "acknowledge interrupt on exit" is set. If so, we should acknowledge the interrupt right here, with a call to kvm_cpu_get_interrupt(), rather than deep in the guts of nested_vmx_vmexit(). If the vector we get is the notification vector from VMCS12, then we should call vmx_complete_nested_posted_interrupt(). Otherwise, we should call nested_vmx_vmexit(EXIT_REASON_EXTERNAL_INTERRUPT) as we do now.

Furthermore, vmx_complete_nested_posted_interrupt() should write to the L1 EOI register, as indicated in step 4 of the 7-step sequence detailed in section 29.6 of the SDM, volume 3. It skips this step today.
On Tue, Oct 06, 2020 at 10:36:09AM -0700, Jim Mattson wrote:
> On Wed, Aug 12, 2020 at 10:51 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > [...]
>
> I don't think this is the best fix.
I agree, even without any more explanation :-)

> I believe the real problem is the way that external and posted
> interrupts are handled in vmx_check_nested_events().
>
> First of all, I believe that the existing call to
> vmx_complete_nested_posted_interrupt() at the end of
> vmx_check_nested_events() is far too aggressive. Unless I am missing
> something in the SDM, posted interrupt processing is *only* triggered
> when the notification vector is received in VMX non-root mode. It is
> not triggered on VM-entry.

That's my understanding as well. Virtual interrupt delivery is evaluated on VM-Enter, but not posted interrupts.

  Evaluation of pending virtual interrupts is caused only by VM entry, TPR
  virtualization, EOI virtualization, self-IPI virtualization, and posted-
  interrupt processing.

> Looking back one block, we have:
>
>	if (kvm_cpu_has_interrupt(vcpu) && !vmx_interrupt_blocked(vcpu)) {
>		if (block_nested_events)
>			return -EBUSY;
>		if (!nested_exit_on_intr(vcpu))
>			goto no_vmexit;
>		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>		return 0;
>	}
>
> If nested_exit_on_intr() is true, we should first check to see if
> "acknowledge interrupt on exit" is set. If so, we should acknowledge
> the interrupt right here, with a call to kvm_cpu_get_interrupt(),
> rather than deep in the guts of nested_vmx_vmexit(). If the vector we
> get is the notification vector from VMCS12, then we should call
> vmx_complete_nested_posted_interrupt(). Otherwise, we should call
> nested_vmx_vmexit(EXIT_REASON_EXTERNAL_INTERRUPT) as we do now.

That makes sense. And we can pass in exit_intr_info instead of computing it in nested_vmx_vmexit() since this is the only path that does a nested exit with EXIT_REASON_EXTERNAL_INTERRUPT.

> Furthermore, vmx_complete_nested_posted_interrupt() should write to
> the L1 EOI register, as indicated in step 4 of the 7-step sequence
> detailed in section 29.6 of the SDM, volume 3. It skips this step
> today.

Yar.

Thanks Jim! I'll get a series out.
On Tue, Oct 6, 2020 at 11:35 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>
> [...]
>
> Thanks Jim! I'll get a series out.

Hey Sean,

I actually ran into this issue as well before noticing your patch. I have a repro kvm-unit-test that I'll send out shortly.

Thanks for looking into this!

--
Oliver
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5ccbee7165a21..ce37376bc96ec 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -488,6 +488,12 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
 	}
 }
 
+void kvm_apic_clear_irr(struct kvm_vcpu *vcpu, int vec)
+{
+	apic_clear_irr(vec, vcpu->arch.apic);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_clear_irr);
+
 static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
 {
 	struct kvm_vcpu *vcpu;
@@ -2461,6 +2467,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	__apic_update_ppr(apic, &ppr);
 	return apic_has_interrupt_for_ppr(apic, ppr);
 }
+EXPORT_SYMBOL_GPL(kvm_apic_has_interrupt);
 
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 754f29beb83e3..4fb86e3a9dd3d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -89,6 +89,7 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 			 int shorthand, unsigned int dest, int dest_mode);
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
+void kvm_apic_clear_irr(struct kvm_vcpu *vcpu, int vec);
 bool __kvm_apic_update_irr(u32 *pir, void *regs, int *max_irr);
 bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir, int *max_irr);
 void kvm_apic_update_ppr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 23b58c28a1c92..2acf33b110b5c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3528,6 +3528,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (unlikely(status != NVMX_VMENTRY_SUCCESS))
 		goto vmentry_failed;
 
+	/* Emulate processing of posted interrupts on VM-Enter. */
+	if (nested_cpu_has_posted_intr(vmcs12) &&
+	    kvm_apic_has_interrupt(vcpu) == vmx->nested.posted_intr_nv) {
+		vmx->nested.pi_pending = true;
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_apic_clear_irr(vcpu, vmx->nested.posted_intr_nv);
+	}
+
 	/* Hide L1D cache contents from the nested guest. */
 	vmx->vcpu.arch.l1tf_flush_l1d = true;
On successful nested VM-Enter, check for pending interrupts and convert the highest priority interrupt to a pending posted interrupt if it matches L2's notification vector. If the vCPU receives a notification interrupt before nested VM-Enter (assuming L1 disables IRQs before doing VM-Enter), the pending interrupt (for L1) should be recognized and processed as a posted interrupt when interrupts become unblocked after VM-Enter to L2.

This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is trying to inject an interrupt into L2 by setting the appropriate bit in L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's method of manually moving the vector from PIR->vIRR/RVI). KVM will observe the IPI while the vCPU is in L1 context and so won't immediately morph it to a posted interrupt for L2. The pending interrupt will be seen by vmx_check_nested_events(), cause KVM to force an immediate exit after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit. After handling the VM-Exit, L1 will see that L2 has a pending interrupt in PIR, send another IPI, and repeat until L2 is killed.

Note, posted interrupts require virtual interrupt delivery, and virtual interrupt delivery requires exit-on-interrupt, ergo interrupts will be unconditionally unmasked on VM-Enter if posted interrupts are enabled.

Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
I am by no means 100% confident this is a complete, correct fix. I also don't like exporting two low-level LAPIC functions. But the fix does appear to work as intended.

 arch/x86/kvm/lapic.c      | 7 +++++++
 arch/x86/kvm/lapic.h      | 1 +
 arch/x86/kvm/vmx/nested.c | 8 ++++++++
 3 files changed, 16 insertions(+)