[0/7] KVM: x86: Introduce new ioctl KVM_HYPERV_SET_TLB_FLUSH_INHIBIT

Message ID 20241004140810.34231-1-nikwip@amazon.de (mailing list archive)

Message

Nikolas Wipper Oct. 4, 2024, 2:08 p.m. UTC
This series introduces a new ioctl KVM_HYPERV_SET_TLB_FLUSH_INHIBIT. It
allows hypervisors to inhibit remote TLB flushing of a vCPU coming from
Hyper-V hyper-calls (namely HvFlushVirtualAddressSpace(Ex) and
HvFlushVirtualAddressList(Ex)). It is required to implement the
HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
emulate VSM within KVM and QEMU. The hyper-call requires several new KVM
APIs, one of which is KVM_HYPERV_SET_TLB_FLUSH_INHIBIT.
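
To illustrate the intended usage, below is a rough sketch of how a VMM might
drive the new vCPU ioctl from user space. The struct and field names are
assumptions made for the sake of illustration; see patches 4-5 for the actual
UAPI definitions.

  /*
   * Hypothetical sketch of VMM usage, assuming headers from this series.
   * The struct layout and field names are assumptions, not the final UAPI.
   */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int set_tlb_flush_inhibit(int vcpu_fd, int inhibit)
  {
          struct kvm_hyperv_tlb_flush_inhibit args = {
                  .inhibit = inhibit ? 1 : 0,     /* assumed field name */
          };

          /* vCPU ioctl introduced by this series */
          return ioctl(vcpu_fd, KVM_HYPERV_SET_TLB_FLUSH_INHIBIT, &args);
  }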

Once the inhibit flag is set, any processor attempting to flush the TLB on
the marked vCPU, with a Hyper-V hyper-call, will be suspended until the
flag is cleared again. During the suspension the vCPU will not run at all,
neither receiving events nor running other code. It will wake up from
suspension once the vCPU it is waiting on clears the inhibit flag. This
behaviour is specified in Microsoft's "Hypervisor Top Level Functional
Specification" (TLFS).

The vCPU will block execution during the suspension, making it transparent
to the hypervisor. An alternative design to what is proposed here would be
to exit to user space from the Hyper-V hyper-call upon finding an inhibited
vCPU. We decided against it to allow for a simpler and more performant
implementation. Exiting to user space would create an additional
synchronisation burden and make the resulting code more complex.
Additionally, since the suspension is specific to Hyper-V events, exiting
wouldn't provide any functional benefits.

The TLFS specifies that the instruction pointer is not moved during the
suspension, so upon unsuspending the hyper-call is re-executed. This
means that, if the vCPU encounters another inhibited vCPU and is
suspended again, any pending events and interrupts are still delivered. This is
identical to the vCPU receiving such events right before the hyper-call.

This inhibiting of TLB flushes is necessary to securely implement
intercepts. These allow a higher "Virtual Trust Level" (VTL) to react to
a lower VTL accessing restricted memory. In such an intercept the VTL may
want to emulate a memory access in software; however, if another processor
flushes the TLB during that operation, incorrect behaviour can result.

The patch series includes basic testing of the ioctl and suspension state.
All previously passing KVM selftests and KVM unit tests still pass.

Series overview:
- 1: Document the new ioctl
- 2: Implement the suspension state
- 3: Update TLB flush hyper-call in preparation
- 4-5: Implement the ioctl
- 6: Add traces
- 7: Implement testing

As the suspension state is transparent to the hypervisor, testing is
complicated. The current version uses a set time interval to give
the vCPU time to enter the hyper-call and get suspended. Ideas for
improvement on this are very welcome.

This series, alongside my series [1] implementing KVM_TRANSLATE2, the
series by Nicolas Saenz Julienne [2] implementing the core building blocks
for VSM, and the accompanying QEMU implementation [3], is capable of
booting Windows Server 2019 with VSM/CredentialGuard enabled.

All three series are also available on GitHub [4].

[1] https://lore.kernel.org/linux-kernel/20240910152207.38974-1-nikwip@amazon.de/
[2] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.com/
[3] https://github.com/vianpl/qemu/tree/vsm/next
[4] https://github.com/vianpl/linux/tree/vsm/next

Best,
Nikolas

Nikolas Wipper (7):
  KVM: Add API documentation for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
  KVM: x86: Implement Hyper-V's vCPU suspended state
  KVM: x86: Check vCPUs before enqueuing TLB flushes in
    kvm_hv_flush_tlb()
  KVM: Introduce KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
  KVM: x86: Implement KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
  KVM: x86: Add trace events to track Hyper-V suspensions
  KVM: selftests: Add tests for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT

 Documentation/virt/kvm/api.rst                |  41 +++
 arch/x86/include/asm/kvm_host.h               |   5 +
 arch/x86/kvm/hyperv.c                         |  86 +++++-
 arch/x86/kvm/hyperv.h                         |  17 ++
 arch/x86/kvm/trace.h                          |  39 +++
 arch/x86/kvm/x86.c                            |  41 ++-
 include/uapi/linux/kvm.h                      |  15 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/hyperv_tlb_flush_inhibit.c     | 274 ++++++++++++++++++
 9 files changed, 503 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush_inhibit.c

Comments

Sean Christopherson Oct. 14, 2024, 11:36 p.m. UTC | #1
On Fri, Oct 04, 2024, Nikolas Wipper wrote:
> This series introduces a new ioctl KVM_HYPERV_SET_TLB_FLUSH_INHIBIT. It
> allows hypervisors to inhibit remote TLB flushing of a vCPU coming from
> Hyper-V hyper-calls (namely HvFlushVirtualAddressSpace(Ex) and
> HvFlushVirtualAddressList(Ex)). It is required to implement the
> HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
> emulate VSM within KVM and QEMU. The hyper-call requires several new KVM
> APIs, one of which is KVM_HYPERV_SET_TLB_FLUSH_INHIBIT.
> 
> Once the inhibit flag is set, any processor attempting to flush the TLB on
> the marked vCPU, with a Hyper-V hyper-call, will be suspended until the
> flag is cleared again. During the suspension the vCPU will not run at all,
> neither receiving events nor running other code. It will wake up from
> suspension once the vCPU it is waiting on clears the inhibit flag. This
> behaviour is specified in Microsoft's "Hypervisor Top Level Functional
> Specification" (TLFS).
> 
> The vCPU will block execution during the suspension, making it transparent
> to the hypervisor.
 
s/hypervisor/VMM.  In the world of KVM, the typical terminology is that KVM itself
is the hypervisor, and the userspace side is the VMM.  It's not perfect, but it's
good enough and fairly ubiquitous at this point, and thus many readers will be
quite confused as to how a vCPU blocking is transparent to KVM :-)