[0/4] KVM: nVMX: Fix migration of nested guests when eVMCS is in use

Message ID 20210503150854.1144255-1-vkuznets@redhat.com (mailing list archive)

Message

Vitaly Kuznetsov May 3, 2021, 3:08 p.m. UTC
Win10 guests with WSL2 enabled sometimes crash on migration when
enlightened VMCS was used. The condition seems to be induced by the
situation when L2->L1 exit is caused immediately after migration and
before L2 gets a chance to run (e.g. when there's an interrupt pending).
The issue was introduced by commit f2c7ef3ba955 ("KVM: nSVM: cancel 
KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") and the first patch
of the series addresses the immediate issue. The eVMCS mapping restoration
path, however, seems to be fragile and the rest of the series tries to
make it more future-proof by including the eVMCS GPA in the migration data.

Vitaly Kuznetsov (4):
  KVM: nVMX: Always make an attempt to map eVMCS after migration
  KVM: nVMX: Properly pad 'struct kvm_vmx_nested_state_hdr'
  KVM: nVMX: Introduce __nested_vmx_handle_enlightened_vmptrld()
  KVM: nVMX: Map enlightened VMCS upon restore when possible

 arch/x86/include/uapi/asm/kvm.h |  4 ++
 arch/x86/kvm/vmx/nested.c       | 82 +++++++++++++++++++++++----------
 2 files changed, 61 insertions(+), 25 deletions(-)
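
To illustrate why explicit padding matters when a UAPI-style header is extended, here is a minimal sketch. The structs and helpers below (`demo_hdr_implicit`, `demo_hdr_explicit`, `demo_hdr_init`, `demo_hdr_valid`, and every field name) are hypothetical and do not reproduce the real layout of 'struct kvm_vmx_nested_state_hdr'. The sketch assumes a common ABI where `uint32_t` is 4-byte aligned and `uint64_t` is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header with a hole the compiler fills silently. */
struct demo_hdr_implicit {
	uint64_t base_pa;
	uint16_t small_flags;
	/* 2 unnamed bytes of compiler padding live here */
	uint32_t flags;
	uint64_t deadline;
};

/* Same layout, but the hole is named and a new field is appended. */
struct demo_hdr_explicit {
	uint64_t base_pa;
	uint16_t small_flags;
	uint16_t pad;       /* names the hole so it can be zeroed and checked */
	uint32_t flags;
	uint64_t deadline;
	uint64_t extra_gpa; /* hypothetical field added at the end */
};

/* Naming the hole must not move any existing field. */
_Static_assert(offsetof(struct demo_hdr_implicit, flags) ==
	       offsetof(struct demo_hdr_explicit, flags),
	       "explicit pad changed the layout");
_Static_assert(offsetof(struct demo_hdr_implicit, deadline) ==
	       offsetof(struct demo_hdr_explicit, deadline),
	       "explicit pad changed the layout");

/* Zero the whole header first so the pad field never carries garbage. */
static void demo_hdr_init(struct demo_hdr_explicit *h,
			  uint64_t base_pa, uint64_t extra_gpa)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h));
	h->base_pa = base_pa;
	h->extra_gpa = extra_gpa;
}

/* A consumer can reject headers whose reserved bytes are non-zero. */
static int demo_hdr_valid(const struct demo_hdr_explicit *h)
{
	return h->pad == 0;
}
```

With the hole named, the size and field offsets are spelled out in the declaration itself, and appending a field such as `extra_gpa` does not disturb the existing ones.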

Comments

Paolo Bonzini May 3, 2021, 3:43 p.m. UTC | #1
On 03/05/21 17:08, Vitaly Kuznetsov wrote:
> Win10 guests with WSL2 enabled sometimes crash on migration when
> enlightened VMCS was used. The condition seems to be induced by the
> situation when L2->L1 exit is caused immediately after migration and
> before L2 gets a chance to run (e.g. when there's an interrupt pending).

Interesting, I think it gets to nested_vmx_vmexit before

                 if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
                         if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
                                 r = 0;
                                 goto out;
                         }
                 }

due to the infamous calls to check_nested_events that are scattered
through KVM?

Paolo

Vitaly Kuznetsov May 3, 2021, 3:52 p.m. UTC | #2
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/05/21 17:08, Vitaly Kuznetsov wrote:
>> Win10 guests with WSL2 enabled sometimes crash on migration when
>> enlightened VMCS was used. The condition seems to be induced by the
>> situation when L2->L1 exit is caused immediately after migration and
>> before L2 gets a chance to run (e.g. when there's an interrupt pending).
>
> Interesting, I think it gets to nested_vmx_vmexit before
>
>                  if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
>                          if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
>                                  r = 0;
>                                  goto out;
>                          }
>                  }
>
> due to the infamous calls to check_nested_events that are scattered
> through KVM?

Yea,

vcpu_run() -> kvm_vcpu_running() -> vmx_check_nested_events() if I
remember it correctly.
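
The ordering discussed in this thread can be modeled outside the kernel. What follows is a toy sketch, not KVM code: every name in it (`toy_vcpu`, `toy_restore`, `toy_service_request`, `toy_nested_exit`, `toy_first_iteration`) is hypothetical. It assumes only what the thread states: the mapping is deferred to a pending request after restore, a nested vmexit cancels that request, and an event check can force the exit before the request is serviced. The `map_on_exit` flag shows one possible way to close the gap under these assumptions, and is not claimed to match the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in KVM. */
struct toy_vcpu {
	bool map_request_pending; /* models KVM_REQ_GET_NESTED_STATE_PAGES */
	bool evmcs_mapped;        /* models a usable eVMCS mapping */
	bool in_l2;               /* vCPU was migrated while running L2 */
};

/* Models state restore on the destination: the mapping is deferred. */
static void toy_restore(struct toy_vcpu *v)
{
	assert(v != NULL);
	v->map_request_pending = true;
	v->evmcs_mapped = false;
	v->in_l2 = true;
}

/* Models servicing the deferred request before entering the guest. */
static void toy_service_request(struct toy_vcpu *v)
{
	if (v->map_request_pending) {
		v->map_request_pending = false;
		v->evmcs_mapped = true;
	}
}

/*
 * Models an L2->L1 exit. Returns true if a valid mapping was available
 * at exit time. With map_on_exit set, a still-pending request is
 * serviced first instead of being dropped.
 */
static bool toy_nested_exit(struct toy_vcpu *v, bool map_on_exit)
{
	if (map_on_exit)
		toy_service_request(v);
	/* The exit cancels whatever request is still pending. */
	v->map_request_pending = false;
	v->in_l2 = false;
	return v->evmcs_mapped;
}

/* First loop iteration after restore: events are checked before requests. */
static bool toy_first_iteration(bool irq_pending, bool map_on_exit)
{
	struct toy_vcpu v;

	toy_restore(&v);
	if (irq_pending)
		return toy_nested_exit(&v, map_on_exit);
	toy_service_request(&v);
	return v.evmcs_mapped;
}
```

In this model, `toy_first_iteration(true, false)` reports a missing mapping, while the other two combinations exercised (`false, false` and `true, true`) end with the mapping in place.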