diff mbox series

[v3] KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2

Message ID 20180903122022.47000-1-liran.alon@oracle.com (mailing list archive)
State New, archived
Headers show
Series [v3] KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2 | expand

Commit Message

Liran Alon Sept. 3, 2018, 12:20 p.m. UTC
Consider the case L1 had a IRQ/NMI event until it executed
VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
(e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
L0 needs to evaluate if this pending event should cause an exit from
L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
EXTERNAL_INTERRUPT).

Usually this would be handled by L0 requesting a IRQ/NMI window
by setting VMCS accordingly. However, this setting was done on
VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
requesting a KVM_REQ_EVENT.

Note that above scenario exists when L1 KVM is about to enter L2 but
requests an "immediate-exit". As in this case, L1 will
disable-interrupts and then send a self-IPI before entering L2.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 arch/x86/kvm/vmx.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Jim Mattson Jan. 3, 2019, 10:52 p.m. UTC | #1
On Mon, Sep 3, 2018 at 5:20 AM Liran Alon <liran.alon@oracle.com> wrote:
>
> Consider the case L1 had a IRQ/NMI event until it executed
> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
> L0 needs to evaluate if this pending event should cause an exit from
> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
> EXTERNAL_INTERRUPT).
>
> Usually this would be handled by L0 requesting a IRQ/NMI window
> by setting VMCS accordingly. However, this setting was done on
> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
> requesting a KVM_REQ_EVENT.
>
> Note that above scenario exists when L1 KVM is about to enter L2 but
> requests an "immediate-exit". As in this case, L1 will
> disable-interrupts and then send a self-IPI before entering L2.

What about the case where the pending IRQ is sitting in L1's
posted-interrupt descriptor, and L2 takes an external-interrupt
VM-exit for L1's posted-interrupt notification vector (whether or not
external-interrupt exiting is enabled in vmcs12)?
Liran Alon Jan. 13, 2019, 12:42 p.m. UTC | #2
> On 4 Jan 2019, at 0:52, Jim Mattson <jmattson@google.com> wrote:
> 
> On Mon, Sep 3, 2018 at 5:20 AM Liran Alon <liran.alon@oracle.com> wrote:
>> 
>> Consider the case L1 had a IRQ/NMI event until it executed
>> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
>> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
>> L0 needs to evaluate if this pending event should cause an exit from
>> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
>> EXTERNAL_INTERRUPT).
>> 
>> Usually this would be handled by L0 requesting a IRQ/NMI window
>> by setting VMCS accordingly. However, this setting was done on
>> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
>> requesting a KVM_REQ_EVENT.
>> 
>> Note that above scenario exists when L1 KVM is about to enter L2 but
>> requests an "immediate-exit". As in this case, L1 will
>> disable-interrupts and then send a self-IPI before entering L2.
> 
> What about the case where the pending IRQ is sitting in L1's
> posted-interrupt descriptor, and L2 takes an external-interrupt
> VM-exit for L1's posted-interrupt notification vector (whether or not
> external-interrupt exiting is enabled in vmcs12)?

If I understood you correctly, this case is handled by extra code I have added to vmx_sync_pir_to_irr() in commits
f27a85c4988d ("KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt”) and
851c1a18c541 ("KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts”)
i.e. When L2 takes an external-interrupt VMExit for L1’s posted-interrupt notification vector, L0 will eventually run vcpu_enter_guest()
and before re-entering L2, it will call vmx_sync_pir_to_irr() which will update L1’s vAPIC IRR and because vCPU is in guest-mode,
It will either kvm_vcpu_exiting_guest_mode() or kvm_make_request(KVM_REQ_EVENT) (To either cause exit from L2 to L1 because
of vcpu_run() calling vmx_check_nested_events() or directly inject interrupt to L2 by inject_pending_event()).

However, I believe your question have led me to found another issue in nVMX event-injection… Consider the following scenario:
1) vCPU A: L1 disables interrupts
2) vCPU B: Sends IPI to vCPU A. ==> L0 sends IPI by writing to vCPU A vmcs01->pi_desc->PIR and send physical pi_notification_vector.
3) Physical pi_notification_vector doesn’t trigger virtual-interrupt-delivery as currently vmcs01->RFLAGS.IF==0.
4) vCPU A: L1 executes VMRESUME to enter L2. ==> L0 will nested_vmx_enter_non_root_mode() which will switch from vmcs01 to vmcs02.
5) vCPU A: Before entering L2, vcpu_enter_guest() will calls vmx_sync_pir_to_irr() which will notice (is_guest_mode(vcpu) && max_irr_updated).
6) Assuming that vmcs12 intercepts external-interrupts, kvm_vcpu_exiting_guest_mode() will be called to trigger another iteration of vcpu_run().
7) vcpu_run() will call kvm_vcpu_running() which will call vmx_check_nested_events() which will notice L1’s vAPIC IRR has pending interrupt but
because nested_run_pending==true then exit from L2 to L1 will be blocked and vmx_check_nested_events() will just return -EBUSY.
8) However, because kvm_vcpu_running() doesn’t check vmx_check_nested_events() return-value, this will be silently ignore and vcpu_run()
will continue to just run another iteration of vcpu_enter_guest() which will now just enter L2 (with vmcs02) without requesting an immediate-exit as it should…

I will write a kvm-unit-tests to trigger this scenario and a patch for KVM to fix this issue.
What do you think?

-Liran
Jim Mattson Jan. 14, 2019, 10:23 p.m. UTC | #3
On Sun, Jan 13, 2019 at 4:42 AM Liran Alon <liran.alon@oracle.com> wrote:
>
>
>
> > On 4 Jan 2019, at 0:52, Jim Mattson <jmattson@google.com> wrote:
> >
> > On Mon, Sep 3, 2018 at 5:20 AM Liran Alon <liran.alon@oracle.com> wrote:
> >>
> >> Consider the case L1 had a IRQ/NMI event until it executed
> >> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
> >> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
> >> L0 needs to evaluate if this pending event should cause an exit from
> >> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
> >> EXTERNAL_INTERRUPT).
> >>
> >> Usually this would be handled by L0 requesting a IRQ/NMI window
> >> by setting VMCS accordingly. However, this setting was done on
> >> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> >> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
> >> requesting a KVM_REQ_EVENT.
> >>
> >> Note that above scenario exists when L1 KVM is about to enter L2 but
> >> requests an "immediate-exit". As in this case, L1 will
> >> disable-interrupts and then send a self-IPI before entering L2.
> >
> > What about the case where the pending IRQ is sitting in L1's
> > posted-interrupt descriptor, and L2 takes an external-interrupt
> > VM-exit for L1's posted-interrupt notification vector (whether or not
> > external-interrupt exiting is enabled in vmcs12)?
>
> If I understood you correctly, this case is handled by extra code I have added to vmx_sync_pir_to_irr() in commits
> f27a85c4988d ("KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt”) and
> 851c1a18c541 ("KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts”)
> i.e. When L2 takes an external-interrupt VMExit for L1’s posted-interrupt notification vector, L0 will eventually run vcpu_enter_guest()
> and before re-entering L2, it will call vmx_sync_pir_to_irr() which will update L1’s vAPIC IRR and because vCPU is in guest-mode,
> It will either kvm_vcpu_exiting_guest_mode() or kvm_make_request(KVM_REQ_EVENT) (To either cause exit from L2 to L1 because
> of vcpu_run() calling vmx_check_nested_events() or directly inject interrupt to L2 by inject_pending_event()).
>
> However, I believe your question have led me to found another issue in nVMX event-injection… Consider the following scenario:
> 1) vCPU A: L1 disables interrupts
> 2) vCPU B: Sends IPI to vCPU A. ==> L0 sends IPI by writing to vCPU A vmcs01->pi_desc->PIR and send physical pi_notification_vector.
> 3) Physical pi_notification_vector doesn’t trigger virtual-interrupt-delivery as currently vmcs01->RFLAGS.IF==0.
> 4) vCPU A: L1 executes VMRESUME to enter L2. ==> L0 will nested_vmx_enter_non_root_mode() which will switch from vmcs01 to vmcs02.
> 5) vCPU A: Before entering L2, vcpu_enter_guest() will calls vmx_sync_pir_to_irr() which will notice (is_guest_mode(vcpu) && max_irr_updated).
> 6) Assuming that vmcs12 intercepts external-interrupts, kvm_vcpu_exiting_guest_mode() will be called to trigger another iteration of vcpu_run().
> 7) vcpu_run() will call kvm_vcpu_running() which will call vmx_check_nested_events() which will notice L1’s vAPIC IRR has pending interrupt but
> because nested_run_pending==true then exit from L2 to L1 will be blocked and vmx_check_nested_events() will just return -EBUSY.
> 8) However, because kvm_vcpu_running() doesn’t check vmx_check_nested_events() return-value, this will be silently ignore and vcpu_run()
> will continue to just run another iteration of vcpu_enter_guest() which will now just enter L2 (with vmcs02) without requesting an immediate-exit as it should…
>
> I will write a kvm-unit-tests to trigger this scenario and a patch for KVM to fix this issue.
> What do you think?

Yes, it sounds like this should be fixed. Thanks for looking into it.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1d26f3c4985b..d687c20e7634 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -12537,8 +12537,11 @@  static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	bool from_vmentry = !!exit_qual;
 	u32 dummy_exit_qual;
+	u32 vmcs01_cpu_exec_ctrl;
 	int r = 0;
 
+	vmcs01_cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+
 	enter_guest_mode(vcpu);
 
 	if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
@@ -12574,6 +12577,25 @@  static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
 		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
 	}
 
+	/*
+	 * If L1 had a pending IRQ/NMI until it executed
+	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
+	 * disallowed (e.g. interrupts disabled), L0 needs to
+	 * evaluate if this pending event should cause an exit from L2
+	 * to L1 or delivered directly to L2 (e.g. In case L1 don't
+	 * intercept EXTERNAL_INTERRUPT).
+	 *
+	 * Usually this would be handled by L0 requesting a
+	 * IRQ/NMI window by setting VMCS accordingly. However,
+	 * this setting was done on VMCS01 and now VMCS02 is active
+	 * instead. Thus, we force L0 to perform pending event
+	 * evaluation by requesting a KVM_REQ_EVENT.
+	 */
+	if (vmcs01_cpu_exec_ctrl &
+		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+	}
+
 	/*
 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet