Message ID | 20180903122022.47000-1-liran.alon@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2 | expand |
On Mon, Sep 3, 2018 at 5:20 AM Liran Alon <liran.alon@oracle.com> wrote: > > Consider the case L1 had a IRQ/NMI event until it executed > VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed > (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME, > L0 needs to evaluate if this pending event should cause an exit from > L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept > EXTERNAL_INTERRUPT). > > Usually this would be handled by L0 requesting a IRQ/NMI window > by setting VMCS accordingly. However, this setting was done on > VMCS01 and now VMCS02 is active instead. Thus, when L1 executes > VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by > requesting a KVM_REQ_EVENT. > > Note that above scenario exists when L1 KVM is about to enter L2 but > requests an "immediate-exit". As in this case, L1 will > disable-interrupts and then send a self-IPI before entering L2. What about the case where the pending IRQ is sitting in L1's posted-interrupt descriptor, and L2 takes an external-interrupt VM-exit for L1's posted-interrupt notification vector (whether or not external-interrupt exiting is enabled in vmcs12)?
> On 4 Jan 2019, at 0:52, Jim Mattson <jmattson@google.com> wrote: > > On Mon, Sep 3, 2018 at 5:20 AM Liran Alon <liran.alon@oracle.com> wrote: >> >> Consider the case L1 had a IRQ/NMI event until it executed >> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed >> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME, >> L0 needs to evaluate if this pending event should cause an exit from >> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept >> EXTERNAL_INTERRUPT). >> >> Usually this would be handled by L0 requesting a IRQ/NMI window >> by setting VMCS accordingly. However, this setting was done on >> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes >> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by >> requesting a KVM_REQ_EVENT. >> >> Note that above scenario exists when L1 KVM is about to enter L2 but >> requests an "immediate-exit". As in this case, L1 will >> disable-interrupts and then send a self-IPI before entering L2. > > What about the case where the pending IRQ is sitting in L1's > posted-interrupt descriptor, and L2 takes an external-interrupt > VM-exit for L1's posted-interrupt notification vector (whether or not > external-interrupt exiting is enabled in vmcs12)? If I understood you correctly, this case is handled by extra code I have added to vmx_sync_pir_to_irr() in commits f27a85c4988d ("KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt”) and 851c1a18c541 ("KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts”) i.e. When L2 takes an external-interrupt VMExit for L1’s posted-interrupt notification vector, L0 will eventually run vcpu_enter_guest() and before re-entering L2, it will call vmx_sync_pir_to_irr() which will update L1’s vAPIC IRR and because vCPU is in guest-mode, It will either kvm_vcpu_exiting_guest_mode() or kvm_make_request(KVM_REQ_EVENT) (To either cause exit from L2 to L1 because of vcpu_run() calling vmx_check_nested_events() or directly inject interrupt to L2 by inject_pending_event()). However, I believe your question have led me to found another issue in nVMX event-injection… Consider the following scenario: 1) vCPU A: L1 disables interrupts 2) vCPU B: Sends IPI to vCPU A. ==> L0 sends IPI by writing to vCPU A vmcs01->pi_desc->PIR and send physical pi_notification_vector. 3) Physical pi_notification_vector doesn’t trigger virtual-interrupt-delivery as currently vmcs01->RFLAGS.IF==0. 4) vCPU A: L1 executes VMRESUME to enter L2. ==> L0 will nested_vmx_enter_non_root_mode() which will switch from vmcs01 to vmcs02. 5) vCPU A: Before entering L2, vcpu_enter_guest() will calls vmx_sync_pir_to_irr() which will notice (is_guest_mode(vcpu) && max_irr_updated). 6) Assuming that vmcs12 intercepts external-interrupts, kvm_vcpu_exiting_guest_mode() will be called to trigger another iteration of vcpu_run(). 7) vcpu_run() will call kvm_vcpu_running() which will call vmx_check_nested_events() which will notice L1’s vAPIC IRR has pending interrupt but because nested_run_pending==true then exit from L2 to L1 will be blocked and vmx_check_nested_events() will just return -EBUSY. 8) However, because kvm_vcpu_running() doesn’t check vmx_check_nested_events() return-value, this will be silently ignore and vcpu_run() will continue to just run another iteration of vcpu_enter_guest() which will now just enter L2 (with vmcs02) without requesting an immediate-exit as it should… I will write a kvm-unit-tests to trigger this scenario and a patch for KVM to fix this issue. What do you think? -Liran
On Sun, Jan 13, 2019 at 4:42 AM Liran Alon <liran.alon@oracle.com> wrote: > > > > > On 4 Jan 2019, at 0:52, Jim Mattson <jmattson@google.com> wrote: > > > > On Mon, Sep 3, 2018 at 5:20 AM Liran Alon <liran.alon@oracle.com> wrote: > >> > >> Consider the case L1 had a IRQ/NMI event until it executed > >> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed > >> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME, > >> L0 needs to evaluate if this pending event should cause an exit from > >> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept > >> EXTERNAL_INTERRUPT). > >> > >> Usually this would be handled by L0 requesting a IRQ/NMI window > >> by setting VMCS accordingly. However, this setting was done on > >> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes > >> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by > >> requesting a KVM_REQ_EVENT. > >> > >> Note that above scenario exists when L1 KVM is about to enter L2 but > >> requests an "immediate-exit". As in this case, L1 will > >> disable-interrupts and then send a self-IPI before entering L2. > > > > What about the case where the pending IRQ is sitting in L1's > > posted-interrupt descriptor, and L2 takes an external-interrupt > > VM-exit for L1's posted-interrupt notification vector (whether or not > > external-interrupt exiting is enabled in vmcs12)? > > If I understood you correctly, this case is handled by extra code I have added to vmx_sync_pir_to_irr() in commits > f27a85c4988d ("KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt”) and > 851c1a18c541 ("KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts”) > i.e. When L2 takes an external-interrupt VMExit for L1’s posted-interrupt notification vector, L0 will eventually run vcpu_enter_guest() > and before re-entering L2, it will call vmx_sync_pir_to_irr() which will update L1’s vAPIC IRR and because vCPU is in guest-mode, > It will either kvm_vcpu_exiting_guest_mode() or kvm_make_request(KVM_REQ_EVENT) (To either cause exit from L2 to L1 because > of vcpu_run() calling vmx_check_nested_events() or directly inject interrupt to L2 by inject_pending_event()). > > However, I believe your question have led me to found another issue in nVMX event-injection… Consider the following scenario: > 1) vCPU A: L1 disables interrupts > 2) vCPU B: Sends IPI to vCPU A. ==> L0 sends IPI by writing to vCPU A vmcs01->pi_desc->PIR and send physical pi_notification_vector. > 3) Physical pi_notification_vector doesn’t trigger virtual-interrupt-delivery as currently vmcs01->RFLAGS.IF==0. > 4) vCPU A: L1 executes VMRESUME to enter L2. ==> L0 will nested_vmx_enter_non_root_mode() which will switch from vmcs01 to vmcs02. > 5) vCPU A: Before entering L2, vcpu_enter_guest() will calls vmx_sync_pir_to_irr() which will notice (is_guest_mode(vcpu) && max_irr_updated). > 6) Assuming that vmcs12 intercepts external-interrupts, kvm_vcpu_exiting_guest_mode() will be called to trigger another iteration of vcpu_run(). > 7) vcpu_run() will call kvm_vcpu_running() which will call vmx_check_nested_events() which will notice L1’s vAPIC IRR has pending interrupt but > because nested_run_pending==true then exit from L2 to L1 will be blocked and vmx_check_nested_events() will just return -EBUSY. > 8) However, because kvm_vcpu_running() doesn’t check vmx_check_nested_events() return-value, this will be silently ignore and vcpu_run() > will continue to just run another iteration of vcpu_enter_guest() which will now just enter L2 (with vmcs02) without requesting an immediate-exit as it should… > > I will write a kvm-unit-tests to trigger this scenario and a patch for KVM to fix this issue. > What do you think? Yes, it sounds like this should be fixed. Thanks for looking into it.
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1d26f3c4985b..d687c20e7634 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -12537,8 +12537,11 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual) struct vmcs12 *vmcs12 = get_vmcs12(vcpu); bool from_vmentry = !!exit_qual; u32 dummy_exit_qual; + u32 vmcs01_cpu_exec_ctrl; int r = 0; + vmcs01_cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); + enter_guest_mode(vcpu); if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual) kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu); } + /* + * If L1 had a pending IRQ/NMI until it executed + * VMLAUNCH/VMRESUME which wasn't delivered because it was + * disallowed (e.g. interrupts disabled), L0 needs to + * evaluate if this pending event should cause an exit from L2 + * to L1 or delivered directly to L2 (e.g. In case L1 don't + * intercept EXTERNAL_INTERRUPT). + * + * Usually this would be handled by L0 requesting a + * IRQ/NMI window by setting VMCS accordingly. However, + * this setting was done on VMCS01 and now VMCS02 is active + * instead. Thus, we force L0 to perform pending event + * evaluation by requesting a KVM_REQ_EVENT. + */ + if (vmcs01_cpu_exec_ctrl & + (CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) { + kvm_make_request(KVM_REQ_EVENT, vcpu); + } + /* * Note no nested_vmx_succeed or nested_vmx_fail here. At this point * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet