[v2,00/24] TDX MMU Part 2

Message ID 20241112073327.21979-1-yan.y.zhao@intel.com (mailing list archive)

Yan Zhao Nov. 12, 2024, 7:33 a.m. UTC
Hi,

Here is v2 of the TDX “MMU part 2” series.
As discussed earlier, the non-nit feedback from v1 [0] has been applied.
- Among the changes, the patch "KVM: TDX: MTRR: implement get_mt_mask() for
  TDX" was dropped. Self-snoop was not made a dependency for enabling TDX,
  because kvm_mmu_may_ignore_guest_pat() in the base code does not check
  for self-snoop. So, strictly speaking, the current code would incorrectly
  zap the mirrored root if non-coherent DMA devices were hot-plugged.

I also noticed and fixed a few minor issues without internal discussion
(noted in each patch's version log).

It’s now ready to hand off to Paolo/kvm-coco-queue.


One remaining item that requires further discussion is "How to handle
the TDX module lock contention (i.e. SEAMCALL retry replacements)".
The basis for future discussions includes:
(1) TDH.MEM.TRACK can contend with TDH.VP.ENTER on the TD epoch lock.
(2) TDH.VP.ENTER contends with TDH.MEM* on S-EPT tree lock when 0-stepping
    mitigation is triggered.
    - The zero-step mitigation threshold is counted per vCPU: the
      mitigation triggers when the TDX module finds that EPT violations
      are caused by the same RIP as in the last TDH.VP.ENTER 6 consecutive
      times. The threshold value 6 is explained as:
      "There can be at most 2 mapping faults on instruction fetch
       (x86 macro-instructions length is at most 15 bytes) when the
       instruction crosses page boundary; then there can be at most 2
       mapping faults for each memory operand, when the operand crosses
       page boundary. For most of x86 macro-instructions, there are up to 2
       memory operands and each one of them is small, which brings us to
       maximum 2+2*2 = 6 legal mapping faults."
    - If the EPT violations received by KVM are caused by
      TDG.MEM.PAGE.ACCEPT, they will not trigger 0-stepping mitigation.
      Since a TD is required to call TDG.MEM.PAGE.ACCEPT before accessing
      private memory when configured with pending_ve_disable=Y, 0-stepping
      mitigation is not expected to occur in such a TD.
(3) TDG.MEM.PAGE.ACCEPT can contend with SEAMCALLs TDH.MEM*.
    (Actually, TDG.MEM.PAGE.ATTR.RD or TDG.MEM.PAGE.ATTR.WR can also
     contend with SEAMCALLs TDH.MEM*. Although we don't need to consider
     these two TDCALLs when enabling basic TDX, they are allowed by the
     TDX module, and we can't control whether a TD invokes a TDCALL or
     not).
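
To make the threshold arithmetic in (2) concrete, here is a toy userspace
model of a per-vCPU counter. It is only a sketch of the description above,
not the TDX module's implementation: struct zero_step_state and
zero_step_record_fault() are hypothetical names, and whether the module acts
on the sixth fault or on the one after it is an assumption made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Worst-case legal mapping faults for one instruction, per the quoted text. */
enum {
	FETCH_FAULTS        = 2, /* instruction fetch crossing a page boundary */
	FAULTS_PER_OPERAND  = 2, /* memory operand crossing a page boundary */
	MAX_MEM_OPERANDS    = 2, /* most x86 instructions have at most two */
	ZERO_STEP_THRESHOLD = FETCH_FAULTS + FAULTS_PER_OPERAND * MAX_MEM_OPERANDS,
};

/* Hypothetical per-vCPU state; not the TDX module's real layout. */
struct zero_step_state {
	uint64_t last_rip;            /* RIP of the previous EPT violation */
	unsigned int same_rip_faults; /* consecutive violations at last_rip */
};

/*
 * Record one EPT violation taken at guest RIP 'rip'.
 * Returns true once the same RIP has faulted ZERO_STEP_THRESHOLD times in
 * a row (assumption: the threshold is reached on the sixth fault).
 */
static bool zero_step_record_fault(struct zero_step_state *s, uint64_t rip)
{
	assert(s);

	if (s->same_rip_faults && s->last_rip == rip) {
		s->same_rip_faults++;
	} else {
		/* A different RIP restarts the count. */
		s->last_rip = rip;
		s->same_rip_faults = 1;
	}

	return s->same_rip_faults >= ZERO_STEP_THRESHOLD;
}
```

With a zero-initialized state, five faults at the same RIP stay below the
threshold, the sixth reaches it, and a fault at any other RIP resets the
count.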

The patch "KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT"
is still in place in this series (at the tail), but it should be dropped
once we settle on the real solution.
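
For readers unfamiliar with the retry approach, the general shape of a
bounded "retry while busy" loop can be sketched in userspace C as below.
This is not the code from the patch: the status values, the callback type,
the retry bound and fake_seamcall() are all placeholders invented for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder status values for this sketch; not the real TDX encodings. */
#define SKETCH_SUCCESS      0ULL
#define SKETCH_OPERAND_BUSY 1ULL

/* Hypothetical stand-in for one SEAMCALL invocation. */
typedef uint64_t (*seamcall_fn)(void *arg);

/*
 * Call 'fn' and repeat it while it reports a busy operand, allowing at
 * most 'max_retries' additional attempts. Returns the last status seen.
 */
static uint64_t seamcall_retry_busy(seamcall_fn fn, void *arg,
				    unsigned int max_retries)
{
	unsigned int tries = 0;
	uint64_t err;

	assert(fn);

	do {
		err = fn(arg);
	} while (err == SKETCH_OPERAND_BUSY && tries++ < max_retries);

	return err;
}

/* Fake SEAMCALL: reports busy until the counter behind 'arg' hits zero. */
static uint64_t fake_seamcall(void *arg)
{
	unsigned int *busy_left = arg;

	if (*busy_left) {
		(*busy_left)--;
		return SKETCH_OPERAND_BUSY;
	}

	return SKETCH_SUCCESS;
}
```

A bounded loop like this only papers over the contention listed in (1)-(3);
it does not remove it, which is why the retry patch is a stopgap rather
than the real solution.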


This series has 5 commits intended to collect Acks from x86 maintainers.
These commits introduce and export SEAMCALL wrappers to allow KVM to manage
the S-EPT (the EPT that maps private memory and is protected by the TDX
module):

  x86/virt/tdx: Add SEAMCALL wrapper tdh_mem_sept_add() to add SEPT
    pages
  x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages
  x86/virt/tdx: Add SEAMCALL wrappers to manage TDX TLB tracking
  x86/virt/tdx: Add SEAMCALL wrappers to remove a TD private page
  x86/virt/tdx: Add SEAMCALL wrappers for TD measurement of initial
    contents
 
This series is based on a kvm-coco-queue commit and some prerequisite
series:
1. commit ee69eb746754 ("KVM: x86/mmu: Prevent aliased memslot GFNs") (in
   kvm-coco-queue).
2. v7 of "TDX host: metadata reading tweaks, bug fix and info dump" [1].
3. v1 of "KVM: VMX: Initialize TDX when loading KVM module" [2], with some
   new feedback from Sean.
4. v2 of “TDX vCPU/VM creation” [3].
 
The series requires TDX module 1.5.06.00.0744 [4] or later, because the
workarounds for the lack of the NO_RBP_MOD feature required by the kernel
have been removed. NO_RBP_MOD is now enabled (in the VM/vCPU creation
patches), and this version of the TDX module contains a required
NO_RBP_MOD-related bug fix.
A working edk2 commit is 95d8a1c ("UnitTestFrameworkPkg: Use TianoCore
mirror of subhook submodule").

 
The series has been tested as part of the development branch for the TDX
base series. Testing consisted of the TDX kvm-unit-tests, booting a Linux
TD, and the TDX-enhanced KVM selftests.

The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2024-11-11.3

Matching QEMU:
https://github.com/intel-staging/qemu-tdx/commits/tdx-qemu-upstream-v6.1/

[0] https://lore.kernel.org/kvm/20240904030751.117579-1-rick.p.edgecombe@intel.com/
[1] https://lore.kernel.org/kvm/cover.1731318868.git.kai.huang@intel.com/#t
[2] https://lore.kernel.org/kvm/cover.1730120881.git.kai.huang@intel.com/
[3] https://lore.kernel.org/kvm/20241030190039.77971-1-rick.p.edgecombe@intel.com/
[4] https://github.com/intel/tdx-module/releases/tag/TDX_1.5.06


Isaku Yamahata (17):
  KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU
  KVM: TDX: Add accessors VMX VMCS helpers
  KVM: TDX: Set gfn_direct_bits to shared bit
  x86/virt/tdx: Add SEAMCALL wrapper tdh_mem_sept_add() to add SEPT
    pages
  x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages
  x86/virt/tdx: Add SEAMCALL wrappers to manage TDX TLB tracking
  x86/virt/tdx: Add SEAMCALL wrappers to remove a TD private page
  x86/virt/tdx: Add SEAMCALL wrappers for TD measurement of initial
    contents
  KVM: TDX: Require TDP MMU and mmio caching for TDX
  KVM: x86/mmu: Add setter for shadow_mmio_value
  KVM: TDX: Set per-VM shadow_mmio_value to 0
  KVM: TDX: Handle TLB tracking for TDX
  KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page
    table
  KVM: TDX: Implement hook to get max mapping level of private pages
  KVM: TDX: Add an ioctl to create initial guest memory
  KVM: TDX: Finalize VM initialization
  KVM: TDX: Handle vCPU dissociation

Rick Edgecombe (3):
  KVM: x86/mmu: Implement memslot deletion for TDX
  KVM: VMX: Teach EPT violation helper about private mem
  KVM: x86/mmu: Export kvm_tdp_map_page()

Sean Christopherson (2):
  KVM: VMX: Split out guts of EPT violation to common/exposed function
  KVM: TDX: Add load_mmu_pgd method for TDX

Yan Zhao (1):
  KVM: x86/mmu: Do not enable page track for TD guest

Yuan Yao (1):
  [HACK] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand
    SEPT

 arch/x86/include/asm/tdx.h      |   9 +
 arch/x86/include/asm/vmx.h      |   1 +
 arch/x86/include/uapi/asm/kvm.h |  10 +
 arch/x86/kvm/mmu.h              |   4 +
 arch/x86/kvm/mmu/mmu.c          |   7 +-
 arch/x86/kvm/mmu/page_track.c   |   3 +
 arch/x86/kvm/mmu/spte.c         |   8 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  37 +-
 arch/x86/kvm/vmx/common.h       |  43 ++
 arch/x86/kvm/vmx/main.c         | 104 ++++-
 arch/x86/kvm/vmx/tdx.c          | 727 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h          |  93 ++++
 arch/x86/kvm/vmx/tdx_arch.h     |  23 +
 arch/x86/kvm/vmx/vmx.c          |  25 +-
 arch/x86/kvm/vmx/x86_ops.h      |  51 +++
 arch/x86/virt/vmx/tdx/tdx.c     | 176 ++++++++
 arch/x86/virt/vmx/tdx/tdx.h     |   8 +
 virt/kvm/kvm_main.c             |   1 +
 18 files changed, 1278 insertions(+), 52 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/common.h