
[v2] KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2

Message ID 20180830095732.143592-1-liran.alon@oracle.com (mailing list archive)
State New, archived
Series [v2] KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2

Commit Message

Liran Alon Aug. 30, 2018, 9:57 a.m. UTC
Consider the case where L1 has a pending IRQ/NMI at the time it executes
VMLAUNCH/VMRESUME, and the event was not delivered because delivery was
disallowed (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
L0 needs to evaluate whether this pending event should cause an exit from
L2 to L1 or be delivered directly to L2 (e.g. in case L1 does not
intercept EXTERNAL_INTERRUPT).

Usually this would be handled by L0 requesting an IRQ/NMI window by
setting the VMCS accordingly. However, that setting was done on VMCS01,
and now VMCS02 is active instead. Thus, when L1 executes
VMLAUNCH/VMRESUME, we force L0 to re-evaluate pending events by
requesting KVM_REQ_EVENT.

Note that the above scenario occurs when L1 KVM is about to enter L2 but
requests an "immediate-exit": in this case, L1 disables interrupts and
then sends a self-IPI before entering L2.
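
For illustration, the sequence on the L1 side is roughly the following
(simplified sketch, not the literal L1 KVM code; the reschedule self-IPI
is just one way L1 can make an IRQ pending while interrupts are disabled):

	/* L1 KVM, about to enter L2 with an "immediate exit" requested */
	local_irq_disable();                      /* IRQs now blocked in L1 */
	smp_send_reschedule(smp_processor_id());  /* IRQ pending, not delivered */
	/*
	 * VMRESUME: trapped by L0, which switches to vmcs02 and enters L2.
	 * The IRQ/NMI window was only armed in vmcs01, so without this patch
	 * the pending IRQ is never re-evaluated against vmcs02.
	 */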

Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Comments

Radim Krčmář Aug. 30, 2018, 11:50 a.m. UTC | #1
2018-08-30 12:57+0300, Liran Alon:
> Consider the case L1 had a IRQ/NMI event until it executed
> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
> L0 needs to evaluate if this pending event should cause an exit from
> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
> EXTERNAL_INTERRUPT).
> 
> Usually this would be handled by L0 requesting a IRQ/NMI window
> by setting VMCS accordingly. However, this setting was done on
> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
> requesting a KVM_REQ_EVENT.
> 
> Note that above scenario exists when L1 KVM is about to enter L2 but
> requests an "immediate-exit". As in this case, L1 will
> disable-interrupts and then send a self-IPI before entering L2.
> 
> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
>  		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
>  	}
>  
> +	/*
> +	 * If L1 had a pending IRQ/NMI until it executed
> +	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
> +	 * disallowed (e.g. interrupts disabled), L0 needs to
> +	 * evaluate if this pending event should cause an exit from L2
> +	 * to L1 or delivered directly to L2 (e.g. In case L1 don't
> +	 * intercept EXTERNAL_INTERRUPT).
> +	 *
> +	 * Usually this would be handled by L0 requesting a
> +	 * IRQ/NMI window by setting VMCS accordingly. However,
> +	 * this setting was done on VMCS01 and now VMCS02 is active
> +	 * instead. Thus, we force L0 to perform pending event
> +	 * evaluation by requesting a KVM_REQ_EVENT.
> +	 */
> +	if (vmcs01_cpu_exec_ctrl &
> +		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {

Looks good, pending nested interrupts will be handled on the actual VM
entry, so we can ignore them here.

> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +	}
> +
>  	/*
>  	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
>  	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	 * by event injection, halt vcpu.
>  	 */
>  	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {

What is the purpose of this check?  I think the event is recognized
when checking for runnability and will resume the VCPU,

thanks.

>  		vmx->nested.nested_run_pending = 0;
>  		return kvm_vcpu_halt(vcpu);
>  	}
> -- 
> 2.16.1
>
Liran Alon Aug. 30, 2018, 12:39 p.m. UTC | #2
> On 30 Aug 2018, at 14:50, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 
> 2018-08-30 12:57+0300, Liran Alon:
>> Consider the case L1 had a IRQ/NMI event until it executed
>> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
>> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
>> L0 needs to evaluate if this pending event should cause an exit from
>> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
>> EXTERNAL_INTERRUPT).
>> 
>> Usually this would be handled by L0 requesting a IRQ/NMI window
>> by setting VMCS accordingly. However, this setting was done on
>> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
>> requesting a KVM_REQ_EVENT.
>> 
>> Note that above scenario exists when L1 KVM is about to enter L2 but
>> requests an "immediate-exit". As in this case, L1 will
>> disable-interrupts and then send a self-IPI before entering L2.
>> 
>> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
>> 		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
>> 	}
>> 
>> +	/*
>> +	 * If L1 had a pending IRQ/NMI until it executed
>> +	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
>> +	 * disallowed (e.g. interrupts disabled), L0 needs to
>> +	 * evaluate if this pending event should cause an exit from L2
>> +	 * to L1 or delivered directly to L2 (e.g. In case L1 don't
>> +	 * intercept EXTERNAL_INTERRUPT).
>> +	 *
>> +	 * Usually this would be handled by L0 requesting a
>> +	 * IRQ/NMI window by setting VMCS accordingly. However,
>> +	 * this setting was done on VMCS01 and now VMCS02 is active
>> +	 * instead. Thus, we force L0 to perform pending event
>> +	 * evaluation by requesting a KVM_REQ_EVENT.
>> +	 */
>> +	if (vmcs01_cpu_exec_ctrl &
>> +		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
> 
> Looks good, pending nested interrupts will be handled on the actual VM
> entry, so we can ignore them here.

Can you clarify?

> 
>> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +	}
>> +
>> 	/*
>> 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
>> 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
>> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>> 	 * by event injection, halt vcpu.
>> 	 */
>> 	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
>> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
>> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
>> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
> 
> What is the purpose of this check?  I think the event is recognized in
> when checking for runnability and will resume the VCPU,

That’s actually not accurate.

Without this change, kvm_vcpu_halt() will set mp_state = KVM_MP_STATE_HALTED.
Later, vcpu_run() will call kvm_vcpu_running() which will return false.
(Note that vmx_check_nested_events() won’t exit to L1 because nested_run_pending is set).
Therefore, vcpu_block() will be called which will call
kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events()
which does check for pending interrupts using kvm_cpu_has_interrupt().
However, because nested_run_pending is set, vmx_interrupt_allowed() returns false and
therefore kvm_vcpu_has_events() doesn’t return “true” for the pending interrupt…
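
Condensed into code, the check I am referring to is roughly the following
(simplified sketch; sketch_vcpu_has_events() is a made-up name standing in
for the relevant part of kvm_vcpu_has_events()):

	static bool sketch_vcpu_has_events(struct kvm_vcpu *vcpu)
	{
		/*
		 * A pending external interrupt only makes the halted vCPU
		 * runnable if it can currently be injected...
		 */
		if (kvm_cpu_has_interrupt(vcpu) &&
		    kvm_x86_ops->interrupt_allowed(vcpu)) /* vmx_interrupt_allowed() */
			return true;

		/*
		 * ...and vmx_interrupt_allowed() returns false while
		 * nested_run_pending is set, so the wakeup never happens.
		 */
		return false;
	}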

Therefore, I believe this change is required.

In addition, I have a unit-test for this scenario as well; it fails without this change and passes with it.
I will submit it soon (it's currently under internal review).

-Liran

> 
> thanks.
> 
>> 		vmx->nested.nested_run_pending = 0;
>> 		return kvm_vcpu_halt(vcpu);
>> 	}
>> -- 
>> 2.16.1
>>
Radim Krčmář Aug. 30, 2018, 1:57 p.m. UTC | #3
2018-08-30 15:39+0300, Liran Alon:
> 
> 
> > On 30 Aug 2018, at 14:50, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 
> > 2018-08-30 12:57+0300, Liran Alon:
> >> Consider the case L1 had a IRQ/NMI event until it executed
> >> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
> >> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
> >> L0 needs to evaluate if this pending event should cause an exit from
> >> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
> >> EXTERNAL_INTERRUPT).
> >> 
> >> Usually this would be handled by L0 requesting a IRQ/NMI window
> >> by setting VMCS accordingly. However, this setting was done on
> >> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> >> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
> >> requesting a KVM_REQ_EVENT.
> >> 
> >> Note that above scenario exists when L1 KVM is about to enter L2 but
> >> requests an "immediate-exit". As in this case, L1 will
> >> disable-interrupts and then send a self-IPI before entering L2.
> >> 
> >> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> >> ---
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
> >> 		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
> >> 	}
> >> 
> >> +	/*
> >> +	 * If L1 had a pending IRQ/NMI until it executed
> >> +	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
> >> +	 * disallowed (e.g. interrupts disabled), L0 needs to
> >> +	 * evaluate if this pending event should cause an exit from L2
> >> +	 * to L1 or delivered directly to L2 (e.g. In case L1 don't
> >> +	 * intercept EXTERNAL_INTERRUPT).
> >> +	 *
> >> +	 * Usually this would be handled by L0 requesting a
> >> +	 * IRQ/NMI window by setting VMCS accordingly. However,
> >> +	 * this setting was done on VMCS01 and now VMCS02 is active
> >> +	 * instead. Thus, we force L0 to perform pending event
> >> +	 * evaluation by requesting a KVM_REQ_EVENT.
> >> +	 */
> >> +	if (vmcs01_cpu_exec_ctrl &
> >> +		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
> > 
> > Looks good, pending nested interrupts will be handled on the actual VM
> > entry, so we can ignore them here.
> 
> Can you clarify?

An interrupt could have been pending in PIR (received after the last VM
entry) and we only look at posted interrupts after disabling interrupts
before VMRESUME, which means that L1 might have a pending IRQ and KVM is
not aware of it at this point.

sync_pir_to_irr() will notice the interrupt later and probably do the
right thing.

(It was never a problem for the immediate-exit use case as that already
 had the interrupt synced, but missing posted interrupts could have
 delayed them after the VM entry and the slightly different case of
 posted interrupts was not mentioned in the comment.)

> >> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> >> +	}
> >> +
> >> 	/*
> >> 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
> >> 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
> >> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
> >> 	 * by event injection, halt vcpu.
> >> 	 */
> >> 	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
> >> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
> >> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
> >> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
> > 
> > What is the purpose of this check?  I think the event is recognized in
> > when checking for runnability and will resume the VCPU,
> 
> That’s actually not accurate.
> 
> Without this change, kvm_vcpu_halt() will set mp_state = KVM_MP_STATE_HALTED.
> Later, vcpu_run() will call kvm_vcpu_running() which will return false.
> (Note that vmx_check_nested_events() won’t exit to L1 because nested_run_pending is set).
> Therefore, vcpu_block() will be called which will call
> kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events()
> which does check for pending interrupts using kvm_cpu_has_interrupt().
> However, because nested_run_pending is set, vmx_interrupt_allowed() returns false and
> therefore kvm_vcpu_has_events() doesn’t return “true” for the pending interrupt…

We clear nested_run_pending just before kvm_vcpu_halt(), which is why I
think the interrupt should be noticed.

> Therefore, I believe this change is required.
> 
> In addition, I have a unit-test for this scenario as-well and it fails without this change and passes with it.

A bug elsewhere would not surprise me. :)

> Will submit it soon (It’s now on internal review).
> 
> -Liran
> 
> > 
> > thanks.
> > 
> >> 		vmx->nested.nested_run_pending = 0;
> >> 		return kvm_vcpu_halt(vcpu);
> >> 	}
> >> -- 
> >> 2.16.1
> >> 
>
Sean Christopherson Aug. 30, 2018, 6 p.m. UTC | #4
On Thu, Aug 30, 2018 at 12:57:32PM +0300, Liran Alon wrote:
> Consider the case L1 had a IRQ/NMI event until it executed
> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
> L0 needs to evaluate if this pending event should cause an exit from
> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
> EXTERNAL_INTERRUPT).
> 
> Usually this would be handled by L0 requesting a IRQ/NMI window
> by setting VMCS accordingly. However, this setting was done on
> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
> requesting a KVM_REQ_EVENT.
> 
> Note that above scenario exists when L1 KVM is about to enter L2 but
> requests an "immediate-exit". As in this case, L1 will
> disable-interrupts and then send a self-IPI before entering L2.
> 
> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>

The tag you're looking for is "Co-developed-by" and requires my sob.
Thanks for the recognition!

Documentation/process/5.Posting.rst:

 - Co-developed-by: states that the patch was also created by another developer
   along with the original author.  This is useful at times when multiple
   people work on a single patch.  Note, this person also needs to have a
   Signed-off-by: line in the patch as well.


Without further ado:

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

> Signed-off-by: Liran Alon <liran.alon@oracle.com>
Liran Alon Sept. 2, 2018, 3:49 p.m. UTC | #5
> On 30 Aug 2018, at 16:57, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 
> 2018-08-30 15:39+0300, Liran Alon:
>> 
>> 
>>> On 30 Aug 2018, at 14:50, Radim Krčmář <rkrcmar@redhat.com> wrote:
>>> 
>>> 2018-08-30 12:57+0300, Liran Alon:
>>>> Consider the case L1 had a IRQ/NMI event until it executed
>>>> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
>>>> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
>>>> L0 needs to evaluate if this pending event should cause an exit from
>>>> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
>>>> EXTERNAL_INTERRUPT).
>>>> 
>>>> Usually this would be handled by L0 requesting a IRQ/NMI window
>>>> by setting VMCS accordingly. However, this setting was done on
>>>> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
>>>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
>>>> requesting a KVM_REQ_EVENT.
>>>> 
>>>> Note that above scenario exists when L1 KVM is about to enter L2 but
>>>> requests an "immediate-exit". As in this case, L1 will
>>>> disable-interrupts and then send a self-IPI before entering L2.
>>>> 
>>>> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
>>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>>>> ---
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> @@ -12574,6 +12577,25 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
>>>> 		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
>>>> 	}
>>>> 
>>>> +	/*
>>>> +	 * If L1 had a pending IRQ/NMI until it executed
>>>> +	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
>>>> +	 * disallowed (e.g. interrupts disabled), L0 needs to
>>>> +	 * evaluate if this pending event should cause an exit from L2
>>>> +	 * to L1 or delivered directly to L2 (e.g. In case L1 don't
>>>> +	 * intercept EXTERNAL_INTERRUPT).
>>>> +	 *
>>>> +	 * Usually this would be handled by L0 requesting a
>>>> +	 * IRQ/NMI window by setting VMCS accordingly. However,
>>>> +	 * this setting was done on VMCS01 and now VMCS02 is active
>>>> +	 * instead. Thus, we force L0 to perform pending event
>>>> +	 * evaluation by requesting a KVM_REQ_EVENT.
>>>> +	 */
>>>> +	if (vmcs01_cpu_exec_ctrl &
>>>> +		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
>>> 
>>> Looks good, pending nested interrupts will be handled on the actual VM
>>> entry, so we can ignore them here.
>> 
>> Can you clarify?
> 
> An interrupt could have been pending in PIR (received after the last VM
> entry) and we only look at posted interrupts in after disabling
> interrupts before VMRESUME, which means that L1 might have a pending IRQ
> and KVM is not aware of it at this point.
> 
> sync_pir_to_irr() will notice the interrupt later and probably do the
> right thing.
> 
> (It was never a problem for the immediate-exit use case as that already
> had the interrupt synced, but missing posted interrupts could have
> delayed them after the VM entry and the slightly different case of
> posted interrupts was not mentioned in the comment.)
> 
>>>> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>> +	}
>>>> +
>>>> 	/*
>>>> 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
>>>> 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
>>>> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>>>> 	 * by event injection, halt vcpu.
>>>> 	 */
>>>> 	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
>>>> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
>>>> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
>>>> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
>>> 
>>> What is the purpose of this check?  I think the event is recognized in
>>> when checking for runnability and will resume the VCPU,
>> 
>> That’s actually not accurate.
>> 
>> Without this change, kvm_vcpu_halt() will set mp_state = KVM_MP_STATE_HALTED.
>> Later, vcpu_run() will call kvm_vcpu_running() which will return false.
>> (Note that vmx_check_nested_events() won’t exit to L1 because nested_run_pending is set).
>> Therefore, vcpu_block() will be called which will call
>> kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events()
>> which does check for pending interrupts using kvm_cpu_has_interrupt().
>> However, because nested_run_pending is set, vmx_interrupt_allowed() returns false and
>> therefore kvm_vcpu_has_events() doesn’t return “true” for the pending interrupt…
> 
> We clear nested_run_pending just before kvm_vcpu_halt(), which is why I
> think the interrupt should be noticed.

Oops. Yes you are right.

> 
>> Therefore, I believe this change is required.
>> 
>> In addition, I have a unit-test for this scenario as-well and it fails without this change and passes with it.
> 
> A bug elsewhere would not surprise me. :)

I thought so as well, but after some digging I found that this was an issue in my unit-test...
We should apply the patch I submitted here without this "!kvm_test_request(KVM_REQ_EVENT, vcpu)”.
Now it passes all my unit-tests.
Do you want me to submit a new patch or will you remove this while applying?

Thanks,
-Liran

> 
>> Will submit it soon (It’s now on internal review).
>> 
>> -Liran
>> 
>>> 
>>> thanks.
>>> 
>>>> 		vmx->nested.nested_run_pending = 0;
>>>> 		return kvm_vcpu_halt(vcpu);
>>>> 	}
>>>> -- 
>>>> 2.16.1
>>>> 
>>
Paolo Bonzini Sept. 11, 2018, 1:16 p.m. UTC | #6
On 30/08/2018 11:57, Liran Alon wrote:
> Consider the case L1 had a IRQ/NMI event until it executed
> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
> L0 needs to evaluate if this pending event should cause an exit from
> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
> EXTERNAL_INTERRUPT).
> 
> Usually this would be handled by L0 requesting a IRQ/NMI window
> by setting VMCS accordingly. However, this setting was done on
> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
> requesting a KVM_REQ_EVENT.
> 
> Note that above scenario exists when L1 KVM is about to enter L2 but
> requests an "immediate-exit". As in this case, L1 will
> disable-interrupts and then send a self-IPI before entering L2.
> 
> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>  arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)

Any chance you can write a testcase for selftests/kvm?  The framework
should be more or less stable by now.

Paolo
Liran Alon Sept. 11, 2018, 4:50 p.m. UTC | #7
> On 11 Sep 2018, at 16:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 30/08/2018 11:57, Liran Alon wrote:
>> Consider the case L1 had a IRQ/NMI event until it executed
>> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
>> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
>> L0 needs to evaluate if this pending event should cause an exit from
>> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
>> EXTERNAL_INTERRUPT).
>> 
>> Usually this would be handled by L0 requesting a IRQ/NMI window
>> by setting VMCS accordingly. However, this setting was done on
>> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
>> requesting a KVM_REQ_EVENT.
>> 
>> Note that above scenario exists when L1 KVM is about to enter L2 but
>> requests an "immediate-exit". As in this case, L1 will
>> disable-interrupts and then send a self-IPI before entering L2.
>> 
>> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> ---
>> arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
>> 1 file changed, 24 insertions(+), 1 deletion(-)
> 
> Any chance you can write a testcase for selftests/kvm?  The framework
> should be more or less stable by now.
> 
> Paolo

Actually, I have already written one and submitted it a week ago to the mailing list.
You can find the relevant unit-tests here:
https://patchwork.kernel.org/project/kvm/list/?series=14789

(There are also other patches I submitted to KVM nVMX which are pending to be reviewed and queued)

Regards,
-Liran
Paolo Bonzini Sept. 11, 2018, 5:24 p.m. UTC | #8
On 11/09/2018 18:50, Liran Alon wrote:
> 
> 
>> On 11 Sep 2018, at 16:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 30/08/2018 11:57, Liran Alon wrote:
>>> Consider the case L1 had a IRQ/NMI event until it executed
>>> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
>>> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
>>> L0 needs to evaluate if this pending event should cause an exit from
>>> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
>>> EXTERNAL_INTERRUPT).
>>>
>>> Usually this would be handled by L0 requesting a IRQ/NMI window
>>> by setting VMCS accordingly. However, this setting was done on
>>> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
>>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
>>> requesting a KVM_REQ_EVENT.
>>>
>>> Note that above scenario exists when L1 KVM is about to enter L2 but
>>> requests an "immediate-exit". As in this case, L1 will
>>> disable-interrupts and then send a self-IPI before entering L2.
>>>
>>> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>>> ---
>>> arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
>>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> Any chance you can write a testcase for selftests/kvm?  The framework
>> should be more or less stable by now.
>>
>> Paolo
> 
> Actually, I have already written one and submitted it a week ago to the mailing list.
> You can find the relevant unit-tests here:
> https://patchwork.kernel.org/project/kvm/list/?series=14789
> 
> (There are also other patches I submitted to KVM nVMX which are pending to be reviewed and queued)

Oh, that's great.  I'm still catching up with email.

Paolo
Paolo Bonzini Sept. 20, 2018, 4:47 p.m. UTC | #9
On 11/09/2018 18:50, Liran Alon wrote:
> 
> 
>> On 11 Sep 2018, at 16:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 30/08/2018 11:57, Liran Alon wrote:
>>> Consider the case L1 had a IRQ/NMI event until it executed
>>> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
>>> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
>>> L0 needs to evaluate if this pending event should cause an exit from
>>> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
>>> EXTERNAL_INTERRUPT).
>>>
>>> Usually this would be handled by L0 requesting a IRQ/NMI window
>>> by setting VMCS accordingly. However, this setting was done on
>>> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
>>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
>>> requesting a KVM_REQ_EVENT.
>>>
>>> Note that above scenario exists when L1 KVM is about to enter L2 but
>>> requests an "immediate-exit". As in this case, L1 will
>>> disable-interrupts and then send a self-IPI before entering L2.
>>>
>>> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>>> ---
>>> arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
>>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> Any chance you can write a testcase for selftests/kvm?  The framework
>> should be more or less stable by now.
>>
>> Paolo
> 
> Actually, I have already written one and submitted it a week ago to the mailing list.
> You can find the relevant unit-tests here:
> https://patchwork.kernel.org/project/kvm/list/?series=14789

The test doesn't pass with the new patch.

Also, the patch that was applied lacks the final hunk:

@@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	 * by event injection, halt vcpu.
 	 */
 	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
-	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
+	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
+	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
 		vmx->nested.nested_run_pending = 0;
 		return kvm_vcpu_halt(vcpu);
 	}

and the test doesn't pass even if I add this hunk.
(The v1 patch works, but it is a much bigger hammer.)  Does anybody have
time to take a look?  In the meantime I'm pushing the failing test to
kvm-unit-tests so that we don't forget.

Paolo
Nikita Leshenko Sept. 23, 2018, 2:19 p.m. UTC | #10
On Sept. 20, 2018, Paolo Bonzini wrote:

> On 11/09/2018 18:50, Liran Alon wrote:
> 
> >
> >
> >> On 11 Sep 2018, at 16:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 30/08/2018 11:57, Liran Alon wrote:
> >>> Consider the case L1 had a IRQ/NMI event until it executed
> >>> VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
> >>> (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
> >>> L0 needs to evaluate if this pending event should cause an exit from
> >>> L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
> >>> EXTERNAL_INTERRUPT).
> >>>
> >>> Usually this would be handled by L0 requesting a IRQ/NMI window
> >>> by setting VMCS accordingly. However, this setting was done on
> >>> VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
> >>> VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
> >>> requesting a KVM_REQ_EVENT.
> >>>
> >>> Note that above scenario exists when L1 KVM is about to enter L2 but
> >>> requests an "immediate-exit". As in this case, L1 will
> >>> disable-interrupts and then send a self-IPI before entering L2.
> >>>
> >>> Co-authored-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> >>> ---
> >>> arch/x86/kvm/vmx.c | 25 ++++++++++++++++++++++++-
> >>> 1 file changed, 24 insertions(+), 1 deletion(-)
> >>
> >> Any chance you can write a testcase for selftests/kvm?  The framework
> >> should be more or less stable by now.
> >>
> >> Paolo
> >
> > Actually, I have already written one and submitted it a week ago to the mailing list.
> > You can find the relevant unit-tests here:
> > https://patchwork.kernel.org/project/kvm/list/?series=14789
> 
> 
> The test doesn't pass with the new patch.
> 
> Also, the patch that was applied lacks the final hunk:
> 
> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	 * by event injection, halt vcpu.
>  	 */
>  	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
>  		vmx->nested.nested_run_pending = 0;
>  		return kvm_vcpu_halt(vcpu);
>  	}
> 
> and the test doesn't pass even if I add this hunk.
> (The v1 patch works, but it is a much bigger hammer).  Anybody has time
> to take a look?

I tried to reproduce the failure - I ran both linked tests on Linux master
(a83f87c1d2a93) but they pass.

Can you post your failure log?

Upstream kernel on Ubuntu 18.04 with QEMU 2.11.1. The tests I ran:
- vmx_pending_event_test
- vmx_pending_event_hlt_test

Nikita
Paolo Bonzini Sept. 24, 2018, 3:21 p.m. UTC | #11
On 23/09/2018 16:19, Nikita Leshenko wrote:
>> @@ -12702,7 +12724,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>>  	 * by event injection, halt vcpu.
>>  	 */
>>  	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
>> -	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
>> +	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
>> +	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
>>  		vmx->nested.nested_run_pending = 0;
>>  		return kvm_vcpu_halt(vcpu);
>>  	}
>>
>> and the test doesn't pass even if I add this hunk.
>> (The v1 patch works, but it is a much bigger hammer).  Anybody has time
>> to take a look?
> 
> I tried to reproduce the failure - I ran both linked tests on Linux master
> (a83f87c1d2a93) but they pass.
> 
> Can you post your failure log?
> 
> Upstream kernel on an Ubuntu 18.04 with QEMU 2.11.1. The tests I ran:
> - vmx_pending_event_test
> - vmx_pending_event_hlt_test

My failure log is the following:

Test suite: vmx_pending_event_test
FAIL: x86/vmx_tests.c:2133: Assertion failed: (expected) == (actual)
	LHS: 0x0000000000000001 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001 - 1
	RHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 - 18
Expected VMX_EXTINT, got VMX_VMCALL.
	STACK: 40579c 405919 4059a3 401d4e 4038ee 400312
SUMMARY: 7 tests, 1 unexpected failures

This is on a Haswell-EP (Xeon E5 v3) machine.  I can try tomorrow on a Kaby Lake
laptop too (I'm not very well equipped to do kernel development there so I'd
rather wait for Fedora's 4.19-rc5 kernel to be available).

Paolo
Nikita Leshenko Sept. 26, 2018, 9:41 a.m. UTC | #12
> On 24 Sep 2018, at 17:21, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 23/09/2018 16:19, Nikita Leshenko wrote:
>> 
>> 
>> I tried to reproduce the failure - I ran both linked tests on Linux master
>> (a83f87c1d2a93) but they pass.
>> 
>> Can you post your failure log?
>> 
>> Upstream kernel on an Ubuntu 18.04 with QEMU 2.11.1. The tests I ran:
>> - vmx_pending_event_test
>> - vmx_pending_event_hlt_test
> 
> My failure log is the following:
> 
> Test suite: vmx_pending_event_test
> FAIL: x86/vmx_tests.c:2133: Assertion failed: (expected) == (actual)
> 	LHS: 0x0000000000000001 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001 - 1
> 	RHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 - 18
> Expected VMX_EXTINT, got VMX_VMCALL.
> 	STACK: 40579c 405919 4059a3 401d4e 4038ee 400312
> SUMMARY: 7 tests, 1 unexpected failures
> 
> This is on a Haswell-EP (Xeon E5 v3) machine.  I can try tomorrow on a Kaby Lake
> laptop too (I'm not very well equipped to do kernel development there so I'd
> rather wait for Fedora's 4.19-rc5 kernel to be available).

Could the failure be related to commit d264ee0c2 (KVM: VMX: use preemption timer
to force immediate VMExit)?

Given the multiple errata that exist for the VMX preemption timer, and because
this test requires an immediate exit, I think it's worth checking out commit
b5861e5cf (KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2) directly
(before the preemption timer changes are present) and running the tests again.
They still pass on my Skylake-SP (Xeon Platinum 8167M) and I wonder if the
results on your CPU will be different.
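
For reference, that commit essentially replaced the self-IPI with arming the
preemption timer at zero, roughly like this (sketch from memory;
req_immediate_exit and use_timer are illustrative names here, not the exact
variables in vmx.c):

	/* Forcing a VMExit immediately after VM entry (illustrative only) */
	if (req_immediate_exit) {
		if (use_timer)
			/* d264ee0c2: a zero-valued preemption timer should
			 * expire as soon as VM entry completes. */
			vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, 0);
		else
			/* previous approach: self-IPI with IRQs disabled, so
			 * the pending IRQ forces the exit right after entry. */
			smp_send_reschedule(smp_processor_id());
	}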

Nikita
Paolo Bonzini Oct. 1, 2018, 5:06 p.m. UTC | #13
On 26/09/2018 11:41, Nikita Leshenko wrote:
>> On 24 Sep 2018, at 17:21, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 23/09/2018 16:19, Nikita Leshenko wrote:
>>> I tried to reproduce the failure - I ran both linked tests on Linux master
>>> (a83f87c1d2a93) but they pass.
>>>
>>> Can you post your failure log?
>>>
>>> Upstream kernel on an Ubuntu 18.04 with QEMU 2.11.1. The tests I ran:
>>> - vmx_pending_event_test
>>> - vmx_pending_event_hlt_test
>>
>> My failure log is the following:
>>
>> Test suite: vmx_pending_event_test
>> FAIL: x86/vmx_tests.c:2133: Assertion failed: (expected) == (actual)
>> 	LHS: 0x0000000000000001 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001 - 1
>> 	RHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 - 18
>> Expected VMX_EXTINT, got VMX_VMCALL.
>> 	STACK: 40579c 405919 4059a3 401d4e 4038ee 400312
>> SUMMARY: 7 tests, 1 unexpected failures
>>
>> This is on a Haswell-EP (Xeon E5 v3) machine.  I can try tomorrow on a Kaby Lake
>> laptop too (I'm not very well equipped to do kernel development there so I'd
>> rather wait for Fedora's 4.19-rc5 kernel to be available).
> 
> Could the failure be related to commit d264ee0c2 (KVM: VMX: use preemption timer
> to force immediate VMExit)?
> 
> Given the multiple erratas that exist on VMX preemption timer and because this
> test requires immediate exit, I think it's worth doing a checkout of commit
> b5861e5cf (KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2) directly
> (before the preemption timer changes are present) and running the tests again.
> They still pass on my Skylake-SP (Xeon Platinum 8167M) and I wonder if the
> results on your CPU will be different.

Yeah, they pass on Kaby Lake too (Core i7-7600U), so I think we should
re-enable smp_send_reschedule on pre-Skylake processors.  Sean, what do
you think?

(And, thanks for the nice testcase finding bugs in other patches too.  :))

Paolo
Sean Christopherson Oct. 1, 2018, 6:40 p.m. UTC | #14
On Mon, 2018-10-01 at 19:06 +0200, Paolo Bonzini wrote:
> On 26/09/2018 11:41, Nikita Leshenko wrote:
> > 
> > > 
> > > On 24 Sep 2018, at 17:21, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > 
> > > On 23/09/2018 16:19, Nikita Leshenko wrote:
> > > > 
> > > > I tried to reproduce the failure - I ran both linked tests on Linux master
> > > > (a83f87c1d2a93) but they pass.
> > > > 
> > > > Can you post your failure log?
> > > > 
> > > > Upstream kernel on an Ubuntu 18.04 with QEMU 2.11.1. The tests I ran:
> > > > - vmx_pending_event_test
> > > > - vmx_pending_event_hlt_test
> > > My failure log is the following:
> > > 
> > > Test suite: vmx_pending_event_test
> > > FAIL: x86/vmx_tests.c:2133: Assertion failed: (expected) == (actual)
> > > 	LHS: 0x0000000000000001 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001 - 1
> > > 	RHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 - 18
> > > Expected VMX_EXTINT, got VMX_VMCALL.
> > > 	STACK: 40579c 405919 4059a3 401d4e 4038ee 400312
> > > SUMMARY: 7 tests, 1 unexpected failures
> > > 
> > > This is on a Haswell-EP (Xeon E5 v3) machine.  I can try tomorrow on a Kaby Lake
> > > laptop too (I'm not very well equipped to do kernel development there so I'd
> > > rather wait for Fedora's 4.19-rc5 kernel to be available).
> > Could the failure be related to commit d264ee0c2 (KVM: VMX: use preemption timer
> > to force immediate VMExit)?
> > 
> > Given the multiple erratas that exist on VMX preemption timer and because this
> > test requires immediate exit, I think it's worth doing a checkout of commit
> > b5861e5cf (KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2) directly
> > (before the preemption timer changes are present) and running the tests again.
> > They still pass on my Skylake-SP (Xeon Platinum 8167M) and I wonder if the
> > results on your CPU will be different.
> Yeah, they pass on Kaby Lake too (Core i7-7600U) so I think we should
> re-enable smp_send_reschedule on pre-Skylake processor.  Sean, what do
> you think?

That's not good.  The errata I'm aware of relate to the timer counting
at the wrong frequency.  A timer value of zero is (supposed to be)
special cased by the VMEnter ucode and shouldn't be subject to the
known errata.  In other words, this would be a new bug/errata if the
timer is armed with a value of zero and you're not seeing a VMExit.
Either that or the existing errata is *really* poorly worded.

What happens if you change the test to loop indefinitely instead of
doing VMCALL?  Do you ever see a timer VMExit or does it hang?

> (And, thanks for the nice testcase finding bugs in other patches too.  :))
> 
> Paolo
Paolo Bonzini Oct. 3, 2018, 11:51 a.m. UTC | #15
On 01/10/2018 20:40, Sean Christopherson wrote:
>>> Given the multiple erratas that exist on VMX preemption timer and because this
>>> test requires immediate exit, I think it's worth doing a checkout of commit
>>> b5861e5cf (KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2) directly
>>> (before the preemption timer changes are present) and running the tests again.
>>> They still pass on my Skylake-SP (Xeon Platinum 8167M) and I wonder if the
>>> results on your CPU will be different.
>>
>> Yeah, they pass on Kaby Lake too (Core i7-7600U) so I think we should
>> re-enable smp_send_reschedule on pre-Skylake processor.  Sean, what do
>> you think?
> 
> That's not good.  The errata I'm aware of relate to the timer counting
> at the wrong frequency.

Indeed so did I.  Now, I'm not sure why the test would pass on Nikita's
Skylake-SP, but one difference between my Kaby Lake and Haswell is that
the former is a consumer part that does not have APICv...

... and if I disable APICv the test starts to pass on the Haswell---with
APICv the KVM_REQ_EVENT is never requested because KVM does not need the
interrupt window.  I've just sent a fix ("kvm: nVMX: fix entry with
pending interrupt if APICv is enabled"); all CCed people are welcome to
review it so that I can include it in the next pull request to Greg.
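
In code terms, the check added by this patch only fires when an interrupt/NMI
window was armed in vmcs01, which never happens when APICv is handling the
pending interrupt (sketch of the effect only, not the new fix):

	/* With APICv, a pending L1 interrupt is tracked via RVI/PIR and no
	 * interrupt window is requested, so these bits are never set ... */
	if (vmcs01_cpu_exec_ctrl &
	    (CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING))
		kvm_make_request(KVM_REQ_EVENT, vcpu);	/* ... and this is skipped */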

Paolo

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1d26f3c4985b..3348bc7ac566 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -12537,8 +12537,11 @@  static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	bool from_vmentry = !!exit_qual;
 	u32 dummy_exit_qual;
+	u32 vmcs01_cpu_exec_ctrl;
 	int r = 0;
 
+	vmcs01_cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+
 	enter_guest_mode(vcpu);
 
 	if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
@@ -12574,6 +12577,25 @@  static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, u32 *exit_qual)
 		kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
 	}
 
+	/*
+	 * If L1 had a pending IRQ/NMI until it executed
+	 * VMLAUNCH/VMRESUME which wasn't delivered because it was
+	 * disallowed (e.g. interrupts disabled), L0 needs to
+	 * evaluate if this pending event should cause an exit from L2
+	 * to L1 or delivered directly to L2 (e.g. In case L1 don't
+	 * intercept EXTERNAL_INTERRUPT).
+	 *
+	 * Usually this would be handled by L0 requesting a
+	 * IRQ/NMI window by setting VMCS accordingly. However,
+	 * this setting was done on VMCS01 and now VMCS02 is active
+	 * instead. Thus, we force L0 to perform pending event
+	 * evaluation by requesting a KVM_REQ_EVENT.
+	 */
+	if (vmcs01_cpu_exec_ctrl &
+		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING)) {
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+	}
+
 	/*
 	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
 	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
@@ -12702,7 +12724,8 @@  static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	 * by event injection, halt vcpu.
 	 */
 	if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
-	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK)) {
+	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
+	    !kvm_test_request(KVM_REQ_EVENT, vcpu)) {
 		vmx->nested.nested_run_pending = 0;
 		return kvm_vcpu_halt(vcpu);
 	}