mbox series

[00/16] KVM: TDX: TDX interrupts

Message ID 20241209010734.3543481-1-binbin.wu@linux.intel.com (mailing list archive)
Headers show
Series KVM: TDX: TDX interrupts | expand

Message

Binbin Wu Dec. 9, 2024, 1:07 a.m. UTC
Hi,

This patch series introduces the support of interrupt handling for TDX
guests, including virtual interrupt injection and VM-Exits caused by
vectored events.

This patch set is one of several patch sets that are all needed to provide
the ability to run a functioning TD VM. It is extracted from the section
"TD vcpu exits/interrupts/hypercalls" of the extensive 130-patch TDX base
enabling series[0]. We would like get maintainers/reviewers feedback of the
implementation of interrupt handling, especially the open about the virtual
NMI injection.

Base of this series
===================
This series is based off of a kvm-coco-queue commit and some pre-req series:
1. commit ee69eb746754 ("KVM: x86/mmu: Prevent aliased memslot GFNs") (in
   kvm-coco-queue).
2. v7 of "TDX host: metadata reading tweaks and bug fixes and info dump" [1]
3. v1 of "KVM: VMX: Initialize TDX when loading KVM module" [2]
4. v2 of “TDX vCPU/VM creation” [3]
5. v2 of "TDX KVM MMU part 2" [4]
6  v1 of "TDX vcpu enter/exit" [5] with a few fixups based on review feedbacks.
7. v4 of "KVM: x86: Prep KVM hypercall handling for TDX" [6]
8. v1 of "KVM: TDX: TDX hypercalls may exit to userspace" [7]

Opens to discuss
================
- NMI injection: VMM can't inject another pending NMI to TDX guest when
  there is already one pending NMI in the TDX module in a back-to-back
  way. The solution adopted in this patch series is if there is a further
  pending NMI in KVM, collapse it to the one pending in the TDX module.
  Refer to patch "KVM: TDX: Implement methods to inject NMI" for more
  details.
- NMI VM-Exit handling: Based on current TDX module, NMI is unblocked after
  SEAMRET from SEAM root mode to VMX root mode due to NMI VM-EXit. This is
  a TDX module bug, it could cause NMI handling order issue [8][9], which
  could lead to unknown NMI warning message or nested NMI. There is a
  internal discussion ongoing about the change of TDX module based on Sean's
  suggestion [8].
  This patch series puts the NMI VM-Exit handling in noinstr section, it
  can't completely prevent the order issue from happening, but if the code
  can be instrumented, the probability of the order issue could be bigger.
  Also, no code change needed if the NMIs are blocked after TD exit to KVM.

Virtual interrupt injection
===========================

Non-NMI Interrupt
-----------------
TDX supports non-NMI interrupt injection only by posted interrupt. Posted
interrupt descriptors (PIDs) are allocated in shared memory, KVM can update
them directly. To post pending interrupts in the PID, KVM can generate a
self-IPI with notification vector prior to TD entry.
TDX guest status is protected, KVM can't get the interrupt status of TDX
guest. In this series, assumes interrupt is always allowed. A later patch
set will have the support for TDX guest to call TDVMCALL with HLT, which
passes the interrupt block flag, so that whether interrupt is allowed in
HLT will checked against the interrupt block flag.

NMI
---
KVM can request the TDX module to inject a NMI into a TDX vCPU by setting
the PEND_NMI TDVPS field to 1. Following that, KVM can call TDH.VP.ENTER to
run the vCPU and the TDX module will attempt to inject the NMI as soon as
possible.
PEND_NMI TDVPS field is a 1-bit filed, i.e. KVM can only pend one NMI in
the TDX module. Also, TDX doesn't allow KVM to request NMI-window exit
directly. When there is already one NMI pending in the TDX module, i.e. it
has not been delivered to TDX guest yet, if there is NMI pending in KVM,
collapse the pending NMI in KVM into the one pending in the TDX module.
Such collapse is OK considering on X86 bare metal, multiple NMIs could
collapse into one NMI, e.g. when NMI is blocked by SMI.  It's OS's
responsibility to poll all NMI sources in the NMI handler to avoid missing
handling of some NMI events. More details can be found in the changelog of
the patch "KVM: TDX: Implement methods to inject NMI".

SMI
---
TDX doesn't support system-management mode (SMM) and system-management
interrupt (SMI) in guest TDs because TDX module doesn't provide a way for
VMM to inject SMI into guest TD or switch guest vCPU mode into SMM. Handle
SMI request as what KVM does for CONFIG_KVM_SMM=n, i.e. return -ENOTTY,
and add KVM_BUG_ON() to SMI related OPs for TD.

INIT/SIPI event
----------------
TDX defines its own vCPU creation and initialization sequence including
multiple SEAMCALLs.  Also, it's only allowed during TD build time. Always
block INIT and SIPI events for the TDX guest.

VM-Exits caused by vectored event
=================================

NMI
---
Similar to the VMX case, handle VM-Exit caused by NMIs within
tdx_vcpu_enter_exit(), i.e., handled before leaving the safety of noinstr.

External Interrupt
------------------
Similar to the VMX case, external interrupts are handled in
.handle_exit_irqoff() callback.

Exception
---------
Machine check, which is handled in the .handle_exit_irqoff() callback, is
the only exception type KVM handles for TDX guests. For other exceptions,
because TDX guest state is protected, exceptions in TDX guests can't be
intercepted. TDX VMM isn't supposed to handle these exceptions. Exit to
userspace with KVM_EXIT_EXCEPTION If unexpected exception occurs.

SMI
---
In SEAM root mode (TDX module), all interrupts are blocked. If an SMI occurs
in SEAM non-root mode (TD guest), the SMI causes VM exit to TDX module, then
SEAMRET to KVM. Once it exits to KVM, SMI is delivered and handled by kernel
handler right away.
An SMI can be "I/O SMI" or "other SMI".  For TDX, there will be no I/O SMI
because I/O instructions inside TDX guest trigger #VE and TDX guest needs to
use TDVMCALL to request VMM to do I/O emulation.
For "other SMI", there are two cases:
- MSMI case.  When BIOS eMCA MCE-SMI morphing is enabled, the #MC occurs in
  TDX guest will be delivered as an MSMI.  It causes an EXIT_REASON_OTHER_SMI
  VM exit with MSMI (bit 0) set in the exit qualification.  On VM exit, TDX
  module checks whether the "other SMI" is caused by an MSMI or not.  If so,
  TDX module marks TD as fatal, preventing further TD entries, and then
  completes the TD exit flow to KVM with the TDH.VP.ENTER outputs indicating
  TDX_NON_RECOVERABLE_TD.  After TD exit, the MSMI is delivered and eventually
  handled by the kernel machine check handler (7911f14 x86/mce: Implement
  recovery for errors in TDX/SEAM non-root mode), i.e., the memory page is
  marked as poisoned and it won't be freed to the free list when the TDX
  guest is terminated.  Since the TDX guest is dead, follow other
  non-recoverable cases, exit to userspace.
- For non-MSMI case, KVM doesn't need to do anything, just continue TDX vCPU
  execution.

Notable changes since v19
=========================

NMI injection
-------------
If there is a further pending NMI in KVM, collapse it to the one pending in
the TDX module.

Always block INIT/SIPI
----------------------
INIT/SIPI events are always blocked for TDX, so this patch series removes the
unnecessary interfaces and helpers for INIT/SIPI delivery.

Check LVTT vector before waiting for lapic expiration
-----------------------------------------------------
To avoid unnecessary wait calls, only call kvm_wait_lapic_expire() when a
timer IRQ was injected, i.e., POSTED_INTR_ON is set and the vector for
LVTT is set in PIR.

Repos
=====
The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2024-11-20-exits-userspace

A matching QEMU is here:
https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-upstream-v6.1

Testing 
======= 
It has been tested as part of the development branch for the TDX base
series. The testing consisted of TDX kvm-unit-tests and booting a Linux
TD, and TDX enhanced KVM selftests.

[0] https://lore.kernel.org/kvm/cover.1708933498.git.isaku.yamahata@intel.com
[1] https://lore.kernel.org/kvm/cover.1731318868.git.kai.huang@intel.com
[2] https://lore.kernel.org/kvm/cover.1730120881.git.kai.huang@intel.com
[3] https://lore.kernel.org/kvm/20241030190039.77971-1-rick.p.edgecombe@intel.com
[4] https://lore.kernel.org/kvm/20241112073327.21979-1-yan.y.zhao@intel.com
[5] https://lore.kernel.org/kvm/20241121201448.36170-1-adrian.hunter@intel.com
[6] https://lore.kernel.org/kvm/20241128004344.4072099-1-seanjc@google.com
[7] https://lore.kernel.org/kvm/20241201035358.2193078-1-binbin.wu@linux.intel.com
[8] https://lore.kernel.org/kvm/Z0T_iPdmtpjrc14q@google.com
[9] https://lore.kernel.org/kvm/57aab3bf-4bae-4956-a3b7-d42e810556e3@linux.intel.com

Isaku Yamahata (13):
  KVM: VMX: Remove use of struct vcpu_vmx from posted_intr.c
  KVM: TDX: Disable PI wakeup for IPIv
  KVM: VMX: Move posted interrupt delivery code to common header
  KVM: TDX: Implement non-NMI interrupt injection
  KVM: TDX: Wait lapic expire when timer IRQ was injected
  KVM: TDX: Implement methods to inject NMI
  KVM: TDX: Complete interrupts after TD exit
  KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM
  KVM: TDX: Always block INIT/SIPI
  KVM: TDX: Inhibit APICv for TDX guest
  KVM: TDX: Add methods to ignore virtual apic related operation
  KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
  KVM: TDX: Handle EXIT_REASON_OTHER_SMI

Sean Christopherson (3):
  KVM: TDX: Add support for find pending IRQ in a protected local APIC
  KVM: x86: Assume timer IRQ was injected if APIC state is protected
  KVM: VMX: Move NMI/exception handler to common helper

 arch/x86/include/asm/kvm-x86-ops.h |   1 +
 arch/x86/include/asm/kvm_host.h    |  13 +-
 arch/x86/include/asm/posted_intr.h |   5 +
 arch/x86/include/uapi/asm/vmx.h    |   1 +
 arch/x86/kvm/irq.c                 |   3 +
 arch/x86/kvm/lapic.c               |  16 +-
 arch/x86/kvm/lapic.h               |   2 +
 arch/x86/kvm/smm.h                 |   3 +
 arch/x86/kvm/vmx/common.h          | 143 +++++++++++++++
 arch/x86/kvm/vmx/main.c            | 286 ++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/posted_intr.c     |  51 +++--
 arch/x86/kvm/vmx/posted_intr.h     |  13 ++
 arch/x86/kvm/vmx/tdx.c             | 137 +++++++++++++-
 arch/x86/kvm/vmx/tdx.h             |  16 ++
 arch/x86/kvm/vmx/vmx.c             | 144 ++-------------
 arch/x86/kvm/vmx/vmx.h             |  14 +-
 arch/x86/kvm/vmx/x86_ops.h         |  13 +-
 17 files changed, 677 insertions(+), 184 deletions(-)

Comments

Paolo Bonzini Dec. 10, 2024, 6:24 p.m. UTC | #1
Applied to kvm-coco-queue, thanks.

Paolo