[v3,00/11] KVM: nVMX: Fixes for nested state migration when eVMCS is in use

Message ID 20210526132026.270394-1-vkuznets@redhat.com (mailing list archive)

Message

Vitaly Kuznetsov May 26, 2021, 1:20 p.m. UTC
Changes since v2:
- "KVM: nVMX: Use '-1' in 'hv_evmcs_vmptr' to indicate that eVMCS is not in
  use" and "KVM: nVMX: Introduce 'EVMPTR_MAP_PENDING' post-migration state"
  patches replace "KVM: nVMX: Introduce nested_evmcs_is_used()" [Paolo]
- "KVM: nVMX: Don't set 'dirty_vmcs12' flag on enlightened VMPTRLD" patch
  added [Max]
- "KVM: nVMX: Release eVMCS when enlightened VMENTRY was disabled" patch
  added.
- "KVM: nVMX: Make copy_vmcs12_to_enlightened()/copy_enlightened_to_vmcs12()
  return 'void'" patch added [Paolo]
- R-b tags added [Max]
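
For readers unfamiliar with the sentinel approach named in the first item,
here is a minimal standalone sketch. It is not code from the patches: the
numeric encodings and the helper names are assumptions made for
illustration, and only 'hv_evmcs_vmptr' and 'EVMPTR_MAP_PENDING' are names
that appear in the patch titles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings for illustration; not copied from the patches. */
#define EVMPTR_INVALID     (~0ULL)      /* '-1': eVMCS is not in use */
#define EVMPTR_MAP_PENDING (~0ULL - 1)  /* in use, mapping deferred after migration */

/* Hypothetical helper: true only for a real guest physical address. */
static inline bool evmptr_is_valid(uint64_t evmptr)
{
	return evmptr != EVMPTR_INVALID && evmptr != EVMPTR_MAP_PENDING;
}

/* Hypothetical helper: true when eVMCS is in use, mapped or not. */
static inline bool evmptr_in_use(uint64_t evmptr)
{
	return evmptr != EVMPTR_INVALID;
}

/* Walk one variable through the three states the field can be in. */
static inline void evmptr_selfcheck(void)
{
	uint64_t hv_evmcs_vmptr = EVMPTR_INVALID;   /* eVMCS never enabled */
	assert(!evmptr_in_use(hv_evmcs_vmptr));

	hv_evmcs_vmptr = EVMPTR_MAP_PENDING;        /* restored, not mapped yet */
	assert(evmptr_in_use(hv_evmcs_vmptr));
	assert(!evmptr_is_valid(hv_evmcs_vmptr));

	hv_evmcs_vmptr = 0x1000;                    /* mapped at a guest address */
	assert(evmptr_is_valid(hv_evmcs_vmptr));
}
```

Keeping "not in use" and "in use but not mapped yet" as distinct values of
one field lets a single comparison answer either question without an extra
flag.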

Original description:

Commit f5c7e8425f18 ("KVM: nVMX: Always make an attempt to map eVMCS after
migration") fixed the most obvious reason why Hyper-V on KVM (e.g. Win10 +
WSL2) was crashing immediately after migration. It was also reported that
more issues remain: while the failure rate dropped significantly, crashes
could still be observed after several dozen migrations. It turns out the
issue arises when we manage to issue KVM_GET_NESTED_STATE right after an
L2->L1 VMEXIT but before L1 gets a chance to run. This state is tracked
with the 'need_vmcs12_to_shadow_sync' flag, but the flag itself is not part
of the saved nested state. A few other less significant issues are fixed
along the way.
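
The failure mode described above can be modelled in a few lines of plain C.
This is a toy sketch, not KVM code: the structs and the 'save_state()',
'restore_state()', 'run_l1()' and 'migrate_after_exit()' helpers are
hypothetical, and only the flag name comes from the description. It shows a
pending sync being dropped because the flag is not serialized, and one
possible way to compensate by requesting a sync unconditionally on restore.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU view: cached vmcs12 plus a pending-sync flag. */
struct vcpu_sketch {
	int vmcs12_field;                 /* value L0 holds after the exit */
	int evmcs_field;                  /* value L1 will read from eVMCS */
	bool need_vmcs12_to_shadow_sync;  /* sync still owed to L1 */
};

/* Hypothetical migration blob: note that the flag is not part of it. */
struct saved_state_sketch {
	int vmcs12_field;
	int evmcs_field;
};

static inline struct saved_state_sketch save_state(const struct vcpu_sketch *v)
{
	struct saved_state_sketch s = { v->vmcs12_field, v->evmcs_field };
	return s;
}

static inline struct vcpu_sketch restore_state(struct saved_state_sketch s,
					       bool request_sync)
{
	struct vcpu_sketch v = { s.vmcs12_field, s.evmcs_field, request_sync };
	return v;
}

/* Before L1 runs, flush vmcs12 into eVMCS if a sync is pending. */
static inline void run_l1(struct vcpu_sketch *v)
{
	if (v->need_vmcs12_to_shadow_sync) {
		v->evmcs_field = v->vmcs12_field;
		v->need_vmcs12_to_shadow_sync = false;
	}
}

/* Returns the value L1 observes when migrated right after a nested exit. */
static inline int migrate_after_exit(bool request_sync_on_restore)
{
	/* Nested exit just happened: vmcs12 is fresh (2), eVMCS is stale (1). */
	struct vcpu_sketch src = { 2, 1, true };
	struct vcpu_sketch dst = restore_state(save_state(&src),
					       request_sync_on_restore);

	run_l1(&dst);
	return dst.evmcs_field;
}
```

In this model 'migrate_after_exit(false)' leaves L1 reading the stale value,
while 'migrate_after_exit(true)' delivers the fresh one, which is the
behaviour the series aims for on the destination host.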

While there's no proof this series fixes all eVMCS-related problems,
Win10+WSL2 was able to survive 3333 (thanks, Max!) migrations without
crashing in testing.

Patches are based on the current kvm/next tree.

Vitaly Kuznetsov (11):
  KVM: nVMX: Use '-1' in 'hv_evmcs_vmptr' to indicate that eVMCS is not
    in use
  KVM: nVMX: Don't set 'dirty_vmcs12' flag on enlightened VMPTRLD
  KVM: nVMX: Release eVMCS when enlightened VMENTRY was disabled
  KVM: nVMX: Make
    copy_vmcs12_to_enlightened()/copy_enlightened_to_vmcs12() return
    'void'
  KVM: nVMX: Introduce 'EVMPTR_MAP_PENDING' post-migration state
  KVM: nVMX: Release enlightened VMCS on VMCLEAR
  KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in
    vmx_get_nested_state()
  KVM: nVMX: Force enlightened VMCS sync from nested_vmx_failValid()
  KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02()
  KVM: nVMX: Request to sync eVMCS from VMCS12 after migration
  KVM: selftests: evmcs_test: Test that KVM_STATE_NESTED_EVMCS is never
    lost

 arch/x86/kvm/vmx/evmcs.c                      |   3 +
 arch/x86/kvm/vmx/evmcs.h                      |   8 +
 arch/x86/kvm/vmx/nested.c                     | 144 +++++++++++-------
 arch/x86/kvm/vmx/nested.h                     |  11 +-
 arch/x86/kvm/vmx/vmx.c                        |   1 +
 .../testing/selftests/kvm/x86_64/evmcs_test.c |  64 ++++----
 6 files changed, 140 insertions(+), 91 deletions(-)

Comments

Vitaly Kuznetsov June 10, 2021, 2:29 p.m. UTC | #1
Paolo, Max,

Just to double-check: are we good here? I know there are more
improvements/ideas to explore but I'd like to treat this patchset as a
set of fixes; it would be unfortunate if we miss 5.14.

Paolo Bonzini June 10, 2021, 3:31 p.m. UTC | #2
On 10/06/21 16:29, Vitaly Kuznetsov wrote:
> Paolo, Max,
> 
> Just to double-check: are we good here? I know there are more
> improvements/ideas to explore but I'd like to treat this patchset as a
> set of fixes; it would be unfortunate if we miss 5.14.
> 

Yes, I was busy the last couple of weeks but I am back now.

Paolo