[00/15] KVM: x86/mmu: TLB fixes and related cleanups

Message ID 20210609234235.1244004-1-seanjc@google.com (mailing list archive)

Message

Sean Christopherson June 9, 2021, 11:42 p.m. UTC
Fixes for two (very) theoretical TLB flushing bugs (patches 1 and 4),
and cleanups on top to (hopefully) consolidate and simplify the TLB
flushing logic.

The basic gist of the TLB flush and MMU sync code shuffling is to stop
relying on the logic in __kvm_mmu_new_pgd() (but keep it for forced
flushing), and instead handle the flush+sync logic in the caller
independent from whether or not the "fast" switch occurs.
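
As a rough illustration of that split, here is a user-space toy model, not
KVM code.  Every name in it (toy_vcpu, toy_new_pgd, toy_mov_cr3, the
noflush parameter) is made up for this sketch.  The point is only that the
helper reports whether the switch was "fast", while the caller decides on
flush+sync from its own rules and ignores that result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for vCPU MMU state; not a KVM structure. */
struct toy_vcpu {
	uint64_t cur_pgd;	/* root currently in use */
	uint64_t cached_pgd;	/* one previously used root */
	bool flush_requested;
	bool sync_requested;
};

/* Switch roots.  Returns true if the cached root was reused ("fast"). */
static bool toy_new_pgd(struct toy_vcpu *v, uint64_t new_pgd)
{
	bool fast = (new_pgd == v->cached_pgd);

	v->cached_pgd = v->cur_pgd;
	v->cur_pgd = new_pgd;
	return fast;
}

/*
 * Caller-side rule for a MOV CR3 style switch: request flush+sync unless
 * the guest asked for no flush, regardless of what toy_new_pgd() returned.
 */
static void toy_mov_cr3(struct toy_vcpu *v, uint64_t new_pgd, bool noflush)
{
	(void)toy_new_pgd(v, new_pgd);

	if (!noflush) {
		v->flush_requested = true;
		v->sync_requested = true;
	}
}
```

A different caller (e.g. a nested transition) would wrap toy_new_pgd()
with its own rule instead of pushing that rule down into the helper.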

I spent a fair bit of time trying to shove the necessary logic down into
__kvm_mmu_new_pgd(), but it always ended up a complete mess because the
requirements and contextual information are always different.  The rules
for MOV CR3 are different from nVMX transitions (and those vary based on
EPT+VPID), and nSVM will be different still (once it adds proper TLB
handling).  In particular, I like that nVMX no longer has special code
for synchronizing the MMU when using shadow paging and instead relies
on the common rules for TLB flushing.

Note, this series (indirectly) relies heavily on commit b53e84eed08b
("KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU
sync"), as it uses KVM_REQ_TLB_FLUSH_GUEST (was KVM_REQ_HV_TLB_FLUSH)
to do the TLB flush _and_ the MMU sync in non-PV code.

Tested all combinations for i386, EPT, NPT, and shadow paging. I think...

The EPTP switching and INVPCID single-context changes in particular lack
meaningful coverage in kvm-unit-tests+Linux.  Long term it's on my todo
list to remedy that, but realistically I doubt I'll get it done anytime
soon.

To test EPTP switching, I hacked L1 to set up a duplicate top-level EPT
table, copy the "real" table to the duplicate table on EPT violation,
populate VMFUNC.EPTP_LIST with the two EPTPs, and expose
VMFUNC.EPTP_SWITCH to L2.  I then hacked L2 to do an EPTP switch to a
random (valid) EPTP index on every task switch.
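
The "random (valid) index" selection can be sketched roughly as below.
This is a user-space approximation, not the actual hack: the function
name is hypothetical, and treating a zero entry as "not populated" is an
assumption of the sketch.  The real L2 would follow this with VMFUNC
(EAX=0, ECX=index), which cannot be executed outside of VMX non-root
mode and so is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define EPTP_LIST_ENTRIES 512	/* one 4 KiB page of 8-byte EPTPs */

/*
 * Return the index of the (rnd % nr_valid)-th populated entry in the
 * EPTP list, or -1 if the list has no populated entries.
 */
static int pick_valid_eptp_index(const uint64_t *list, uint32_t rnd)
{
	size_t i, nr_valid = 0;

	for (i = 0; i < EPTP_LIST_ENTRIES; i++) {
		if (list[i])
			nr_valid++;
	}
	if (!nr_valid)
		return -1;

	rnd %= nr_valid;
	for (i = 0; i < EPTP_LIST_ENTRIES; i++) {
		if (!list[i])
			continue;
		if (rnd-- == 0)
			return (int)i;
	}
	return -1;
}
```

With only entries 0 and 1 populated, as in the test setup described
above, the result is always 0 or 1.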

To test INVPCID single-context I modified L1 to iterate over all possible
PCIDs using INVPCID single-context in native_flush_tlb_global().  I also
verified that the guest crashed if it didn't do any INVPCID at all
(interestingly, the guest made it through boot without the flushes when
EPT was enabled, which implies the missing MMU sync on INVPCID was the
source of the crash, not a stale TLB entry).
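
For reference, the per-PCID loop looks roughly like the sketch below.
It is a user-space approximation of the hack, not the code that was
tested: INVPCID is privileged, so the instruction is replaced by a
callback, and all names here (invpcid_desc, invpcid_fn,
flush_all_pcids_one_by_one, count_invpcid) are hypothetical.  What is
architectural is that PCIDs are 12 bits wide (4096 values), that
INVPCID type 1 is single-context, and that the descriptor is 128 bits
with the PCID in the low 12 bits.

```c
#include <stdint.h>

#define NR_PCIDS		4096	/* PCID is a 12-bit field */
#define INVPCID_TYPE_SINGLE	1	/* single-context invalidation */

/* 128-bit INVPCID descriptor: PCID in bits 11:0, address in bits 127:64. */
struct invpcid_desc {
	uint64_t pcid;
	uint64_t addr;	/* not used for single-context invalidation */
};

/* Stand-in for the privileged INVPCID instruction. */
typedef void (*invpcid_fn)(const struct invpcid_desc *desc,
			   unsigned long type);

/* Issue one single-context INVPCID per possible PCID; returns the count. */
static unsigned int flush_all_pcids_one_by_one(invpcid_fn do_invpcid)
{
	unsigned int pcid;

	for (pcid = 0; pcid < NR_PCIDS; pcid++) {
		struct invpcid_desc desc = { .pcid = pcid, .addr = 0 };

		do_invpcid(&desc, INVPCID_TYPE_SINGLE);
	}
	return pcid;
}

/* Counting stub so the loop can be exercised outside of ring 0. */
static unsigned int nr_invpcid_calls;

static void count_invpcid(const struct invpcid_desc *desc, unsigned long type)
{
	if (type == INVPCID_TYPE_SINGLE && desc->pcid < NR_PCIDS)
		nr_invpcid_calls++;
}
```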

Sean Christopherson (15):
  KVM: nVMX: Sync all PGDs on nested transition with shadow paging
  KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap
  KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch
  KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush
  KVM: x86: Uncondtionally skip MMU sync/TLB flush in MOV CR3's PGD
    switch
  KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated
    helper
  KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD"
    helpers
  KVM: nVMX: Consolidate VM-Enter/VM-Exit TLB flush and MMU sync logic
  KVM: nVMX: Free only guest_mode (L2) roots on INVVPID w/o EPT
  KVM: x86: Use KVM_REQ_TLB_FLUSH_GUEST to handle INVPCID(ALL) emulation
  KVM: nVMX: Use fast PGD switch when emulating VMFUNC[EPTP_SWITCH]
  KVM: x86: Defer MMU sync on PCID invalidation
  KVM: x86: Drop pointless @reset_roots from kvm_init_mmu()
  KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occur
  KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation

 arch/x86/include/asm/kvm_host.h |   6 +-
 arch/x86/kvm/hyperv.c           |   2 +-
 arch/x86/kvm/mmu.h              |   2 +-
 arch/x86/kvm/mmu/mmu.c          |  57 ++++++++-----
 arch/x86/kvm/svm/nested.c       |  40 ++++++---
 arch/x86/kvm/vmx/nested.c       | 139 ++++++++++++--------------------
 arch/x86/kvm/x86.c              |  75 ++++++++++-------
 7 files changed, 169 insertions(+), 152 deletions(-)

Comments

Paolo Bonzini June 10, 2021, 4:10 p.m. UTC | #1
I tried this a couple times but was blocked on what is essentially your 
first patch, so thanks!  Patches queued for 5.14.

Paolo