
[2/2] KVM: nVMX: fix for disappearing L1->L2 event injection on L1 migration

Message ID 20210106105306.450602-3-mlevitsk@redhat.com (mailing list archive)
State New, archived
Series RFC: VMX: fix for disappearing L1->L2 event injection on L1 migration

Commit Message

Maxim Levitsky Jan. 6, 2021, 10:53 a.m. UTC
If migration happens while an L2 entry with an injected event is pending,
the event is not included in the migration state and is lost, leading to
an L2 hang.

Fix this by queueing the injected event in a similar manner to how we queue
interrupted injections.

This can be reproduced by running an I/O-intensive task in L2
and repeatedly migrating L1.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Sean Christopherson Jan. 6, 2021, 6:17 p.m. UTC | #1
On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> If migration happens while an L2 entry with an injected event is pending,
> the event is not included in the migration state and is lost, leading to
> an L2 hang.

But the injected event should still be in vmcs12 and KVM_STATE_NESTED_RUN_PENDING
should be set in the migration state, i.e. it should naturally be copied to
vmcs02 and thus (re)injected by vmx_set_nested_state().  Is nested_run_pending
not set?  Is the info in vmcs12 somehow lost?  Or am I off in left field...
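
For reference, a minimal sketch of the restore flow being described here (heavily
simplified, with a hypothetical helper name; the real logic is spread across
vmx_set_nested_state() and prepare_vmcs02_early()):

/*
 * Sketch only: if userspace saved KVM_STATE_NESTED_RUN_PENDING together
 * with vmcs12, restoring both is enough to replay the injected event.
 * The next vcpu_run() re-executes the pending L1->L2 entry, and
 * prepare_vmcs02_early() copies vmcs12->vm_entry_intr_info_field into
 * vmcs02, so nothing beyond vmcs12 has to be migrated for the event.
 */
static int restore_nested_state_sketch(struct kvm_vcpu *vcpu,
				       const struct kvm_nested_state *kvm_state,
				       const struct vmcs12 *user_vmcs12)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/* vmcs12 (including VM_ENTRY_INTR_INFO_FIELD) comes from userspace */
	memcpy(get_vmcs12(vcpu), user_vmcs12, sizeof(struct vmcs12));

	/* re-arm the L1->L2 entry that was interrupted by the migration */
	vmx->nested.nested_run_pending =
		!!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);

	return 0;
}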
 
Maxim Levitsky Jan. 7, 2021, 2:38 a.m. UTC | #2
On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > If migration happens while an L2 entry with an injected event is pending,
> > the event is not included in the migration state and is lost, leading to
> > an L2 hang.
> 
> But the injected event should still be in vmcs12 and KVM_STATE_NESTED_RUN_PENDING
> should be set in the migration state, i.e. it should naturally be copied to
> vmcs02 and thus (re)injected by vmx_set_nested_state().  Is nested_run_pending
> not set?  Is the info in vmcs12 somehow lost?  Or am I off in left field...


You are completely right.
The injected event can indeed be carried over that way, since the vmc(b|s)12 is migrated.

We can safely disregard both of these patches and the two parallel patches for SVM.
I am almost sure that the real root cause of this bug is that we weren't
restoring the nested-run-pending flag, and I happened to fix exactly that
in this patch series.

This is the trace of the bug (I removed the timestamps to make it easier to read):


kvm_exit:             vcpu 0 reason vmrun rip 0xffffffffa0688ffa info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
kvm_nested_vmrun:     rip: 0xffffffffa0688ffa vmcb: 0x0000000103594000 nrip: 0xffffffff814b3b01 int_ctl: 0x01000001 event_inj: 0x80000036 npt: on
																^^^ this is the injection
kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercepts: bc4c8027 00006e7f 00000000
kvm_fpu:              unload
kvm_userspace_exit:   reason KVM_EXIT_INTR (10)

============================================================================
migration happens here
============================================================================

...
kvm_async_pf_ready:   token 0xffffffff gva 0
kvm_apic_accept_irq:  apicid 0 vec 243 (Fixed|edge)

kvm_nested_intr_vmexit: rip: 0x000000000000fff0

^^^^^ this is the nested vmexit that shouldn't have happened, since a nested run is pending;
it erased the eventinj field, which was migrated correctly, just as you say.

kvm_nested_vmexit_inject: reason: interrupt ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
...


We did notice that this vmexit had a weird RIP, and I later explained it to myself:
this is the default RIP that we put into the vmcb, and it hadn't been updated yet,
since it is only updated just prior to VM entry.
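
For reference, a rough paraphrase of the check that the nested-run-pending flag
drives (simplified, hypothetical function name, VMX names used to match the patch;
the SVM side has an analogous nested_run_pending check): while a nested run is
pending, an arriving interrupt has to be held back rather than causing a nested
vmexit, which is what protects the not-yet-consumed eventinj / VM_ENTRY_INTR_INFO
contents.

/*
 * Rough paraphrase, not the actual upstream function: with the
 * pending-run flag restored after migration, a freshly arrived
 * interrupt is deferred (-EBUSY) until the L1->L2 entry completes;
 * with the flag lost, the interrupt forces a nested vmexit that
 * clobbers the event queued for injection into L2.
 */
static int check_nested_events_sketch(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (kvm_cpu_has_interrupt(vcpu)) {
		if (vmx->nested.nested_run_pending)
			return -EBUSY;

		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
	}

	return 0;
}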

My test has already survived about 170 iterations (usually it crashes after 20-40 iterations).
I am leaving the stress test running all night; let's see if it survives.

V2 of the patches is on the way.

Thanks again for the help!

Best regards,
	Maxim Levitsky

Maxim Levitsky Jan. 7, 2021, 9:41 a.m. UTC | #3
On Thu, 2021-01-07 at 04:38 +0200, Maxim Levitsky wrote:
> [...]
> 
> My test has already survived about 170 iterations (usually it crashes after 20-40 iterations).
> I am leaving the stress test running all night; let's see if it survives.

And after leaving it overnight, the test survived about 1000 iterations.

Thanks again!

Best regards,
	Maxim Levitsky



Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e2f26564a12de..2ea0bb14f385f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2355,12 +2355,12 @@  static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 	 * Interrupt/Exception Fields
 	 */
 	if (vmx->nested.nested_run_pending) {
-		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
-			     vmcs12->vm_entry_intr_info_field);
-		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
-			     vmcs12->vm_entry_exception_error_code);
-		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
-			     vmcs12->vm_entry_instruction_len);
+		if ((vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK))
+			vmx_process_injected_event(&vmx->vcpu,
+						   vmcs12->vm_entry_intr_info_field,
+						   vmcs12->vm_entry_instruction_len,
+						   vmcs12->vm_entry_exception_error_code);
+
 		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
 			     vmcs12->guest_interruptibility_info);
 		vmx->loaded_vmcs->nmi_known_unmasked =