[v19,078/130] KVM: TDX: Implement TDX vcpu enter/exit path

Message ID dbaa6b1a6c4ebb1400be5f7099b4b9e3b54431bb.1708933498.git.isaku.yamahata@intel.com
State New, archived
Series [v19,001/130] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro

Commit Message

Isaku Yamahata Feb. 26, 2024, 8:26 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Implement running a TDX vcpu.  Once the vcpu runs on a logical
processor (LP), the TDX vcpu is associated with that LP.  When the TDX
vcpu moves to another LP, its state on the previous LP needs to be
flushed.  When destroying the TDX vcpu, the flush must be completed and
the CPU's memory cache flushed as well.  Track which LP the TDX vcpu
last ran on and flush as necessary.
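
As a rough illustration of that tracking (a hypothetical sketch, not
code from this patch; tdx->lp and tdx_flush_vp() are made-up stand-ins
for whatever the series actually uses):

	/* Hypothetical: flush the vcpu's state on the old LP before
	 * associating it with the current one.  TDH.VP.FLUSH must run
	 * on the LP the vcpu is currently associated with.
	 */
	if (tdx->lp != raw_smp_processor_id()) {
		if (tdx->lp >= 0)
			smp_call_function_single(tdx->lp, tdx_flush_vp,
						 vcpu, 1);
		tdx->lp = raw_smp_processor_id();
	}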

Do nothing on the sched_in event, as TDX doesn't support pause-loop
exiting.

TDX vcpu execution requires the PMU debug store to be restored after
returning to KVM, because the TDX module unconditionally resets its
value.  To reuse the existing code, export perf_restore_debug_store.
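
For reference, a minimal sketch of where that restore lands
(hypothetical placement; perf_restore_debug_store() is the existing
perf helper being exported here):

	/* After TDH.VP.ENTER returns, the TDX module has cleared
	 * IA32_DS_AREA; put the host value back before any host
	 * PEBS/BTS user runs again.
	 */
	perf_restore_debug_store();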

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>

---
v19:
- Moved EXPORT_SYMBOL_GPL(host_xcr0) to the patch that uses it

Changes v15 -> v16:
- use __seamcall_saved_ret()
- As struct tdx_module_args doesn't match with vcpu.arch.regs, copy regs
  before/after calling __seamcall_saved_ret().

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/vmx/main.c    | 21 +++++++++-
 arch/x86/kvm/vmx/tdx.c     | 84 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h     | 33 +++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h |  2 +
 4 files changed, 138 insertions(+), 2 deletions(-)

Comments

Sean Christopherson March 15, 2024, 5:26 p.m. UTC | #1
On Mon, Feb 26, 2024, isaku.yamahata@intel.com wrote:
> +static noinstr void tdx_vcpu_enter_exit(struct vcpu_tdx *tdx)
> +{
> +	struct tdx_module_args args;
> +
> +	/*
> +	 * Avoid section mismatch with to_tdx() with KVM_VM_BUG().  The caller
> +	 * should call to_tdx().

C'mon.  I don't think it's unreasonable to expect that at least one of the many
people working on TDX would figure out why to_vmx() is __always_inline.

> +	 */
> +	struct kvm_vcpu *vcpu = &tdx->vcpu;
> +
> +	guest_state_enter_irqoff();
> +
> +	/*
> +	 * TODO: optimization:
> +	 * - Eliminate copy between args and vcpu->arch.regs.
> +	 * - copyin/copyout registers only if (tdx->tdvmcall.regs_mask != 0)
> +	 *   which means TDG.VP.VMCALL.
> +	 */
> +	args = (struct tdx_module_args) {
> +		.rcx = tdx->tdvpr_pa,
> +#define REG(reg, REG)	.reg = vcpu->arch.regs[VCPU_REGS_ ## REG]

Organizing tdx_module_args's registers by volatile vs. non-volatile is asinine.
This code should not need to exist.

> +	WARN_ON_ONCE(!kvm_rebooting &&
> +		     (tdx->exit_reason.full & TDX_SW_ERROR) == TDX_SW_ERROR);
> +
> +	guest_state_exit_irqoff();
> +}
> +
> +fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> +
> +	if (unlikely(!tdx->initialized))
> +		return -EINVAL;
> +	if (unlikely(vcpu->kvm->vm_bugged)) {
> +		tdx->exit_reason.full = TDX_NON_RECOVERABLE_VCPU;
> +		return EXIT_FASTPATH_NONE;
> +	}
> +
> +	trace_kvm_entry(vcpu);
> +
> +	tdx_vcpu_enter_exit(tdx);
> +
> +	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
> +	trace_kvm_exit(vcpu, KVM_ISA_VMX);
> +
> +	return EXIT_FASTPATH_NONE;
> +}
> +
>  void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
>  {
>  	WARN_ON_ONCE(root_hpa & ~PAGE_MASK);
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index d822e790e3e5..81d301fbe638 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -27,6 +27,37 @@ struct kvm_tdx {
>  	struct page *source_page;
>  };
>  
> +union tdx_exit_reason {
> +	struct {
> +		/* 31:0 mirror the VMX Exit Reason format */

Then use "union vmx_exit_reason"; having to maintain duplicate copies of the same
union is not something I want to do.

I'm honestly not even convinced that "union tdx_exit_reason" needs to exist.  I
added vmx_exit_reason because we kept having bugs where KVM would fail to strip
bits 31:16, and because nested VMX needs to stuff failed_vmentry, but I don't
see a similar need for TDX.

I would even go so far as to say the vcpu_tdx field shouldn't be exit_reason,
and instead should be "return_code" or something.  E.g. if the TDX module refuses
to run the vCPU, there's no VM-Enter and thus no VM-Exit (unless you count the
SEAMCALL itself, har har).  Ditto for #GP or #UD on the SEAMCALL (or any other
reason that generates TDX_SW_ERROR).

Ugh, I'm doubling down on that suggestion.  This:

	WARN_ON_ONCE(!kvm_rebooting &&
		     (tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR);

	if ((u16)tdx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
	    is_nmi(tdexit_intr_info(vcpu))) {
		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
		vmx_do_nmi_irqoff();
		kvm_after_interrupt(vcpu);
	}

is heinous.  If there's an error that leaves bits 15:0 zero, KVM will synthesize
a spurious NMI.  I don't know whether or not that can happen, but it's not
something that should even be possible in KVM, i.e. the exit reason should be
processed if and only if KVM *knows* there was a sane VM-Exit from non-root mode.

tdx_vcpu_run() has a similar issue, though it's probably benign.  If there's an
error in bits 15:0 that happens to collide with EXIT_REASON_TDCALL, weird things
will happen.

	if (tdx->exit_reason.basic == EXIT_REASON_TDCALL)
		tdx->tdvmcall.rcx = vcpu->arch.regs[VCPU_REGS_RCX];
	else
		tdx->tdvmcall.rcx = 0;

I vote for something like the below, with much more robust checking of vp_enter_ret
before it's converted to a VMX exit reason.

static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu)
{
	return (union vmx_exit_reason)(u32)to_tdx(vcpu)->vp_enter_ret;
}

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index af3a2b8afee8..b9b40b2eaccb 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -43,37 +43,6 @@ struct kvm_tdx {
        struct page *source_page;
 };
 
-union tdx_exit_reason {
-       struct {
-               /* 31:0 mirror the VMX Exit Reason format */
-               u64 basic               : 16;
-               u64 reserved16          : 1;
-               u64 reserved17          : 1;
-               u64 reserved18          : 1;
-               u64 reserved19          : 1;
-               u64 reserved20          : 1;
-               u64 reserved21          : 1;
-               u64 reserved22          : 1;
-               u64 reserved23          : 1;
-               u64 reserved24          : 1;
-               u64 reserved25          : 1;
-               u64 bus_lock_detected   : 1;
-               u64 enclave_mode        : 1;
-               u64 smi_pending_mtf     : 1;
-               u64 smi_from_vmx_root   : 1;
-               u64 reserved30          : 1;
-               u64 failed_vmentry      : 1;
-
-               /* 63:32 are TDX specific */
-               u64 details_l1          : 8;
-               u64 class               : 8;
-               u64 reserved61_48       : 14;
-               u64 non_recoverable     : 1;
-               u64 error               : 1;
-       };
-       u64 full;
-};
-
 struct vcpu_tdx {
        struct kvm_vcpu vcpu;
 
@@ -103,7 +72,8 @@ struct vcpu_tdx {
                };
                u64 rcx;
        } tdvmcall;
-       union tdx_exit_reason exit_reason;
+
+       u64 vp_enter_ret;
 
        bool initialized;
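
A minimal sketch of the kind of robust check being suggested
(hypothetical helper; TDX_ERROR is the existing bit-63 error flag from
<asm/tdx.h>):

	static __always_inline bool tdx_vp_enter_succeeded(struct vcpu_tdx *tdx)
	{
		/* Interpret vp_enter_ret as a VMX exit reason only when
		 * TDH.VP.ENTER completed without the error bit set.
		 */
		return !(tdx->vp_enter_ret & TDX_ERROR);
	}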
Isaku Yamahata March 15, 2024, 8:42 p.m. UTC | #2
On Fri, Mar 15, 2024 at 10:26:30AM -0700,
Sean Christopherson <seanjc@google.com> wrote:

> > diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> > index d822e790e3e5..81d301fbe638 100644
> > --- a/arch/x86/kvm/vmx/tdx.h
> > +++ b/arch/x86/kvm/vmx/tdx.h
> > @@ -27,6 +27,37 @@ struct kvm_tdx {
> >  	struct page *source_page;
> >  };
> >  
> > +union tdx_exit_reason {
> > +	struct {
> > +		/* 31:0 mirror the VMX Exit Reason format */
> 
> Then use "union vmx_exit_reason"; having to maintain duplicate copies of the same
> union is not something I want to do.
> 
> I'm honestly not even convinced that "union tdx_exit_reason" needs to exist.  I
> added vmx_exit_reason because we kept having bugs where KVM would fail to strip
> bits 31:16, and because nested VMX needs to stuff failed_vmentry, but I don't
> see a similar need for TDX.
> 
> I would even go so far as to say the vcpu_tdx field shouldn't be exit_reason,
> and instead should be "return_code" or something.  E.g. if the TDX module refuses
> to run the vCPU, there's no VM-Enter and thus no VM-Exit (unless you count the
> SEAMCALL itself, har har).  Ditto for #GP or #UD on the SEAMCALL (or any other
> reason that generates TDX_SW_ERROR).
> 
> Ugh, I'm doubling down on that suggestion.  This:
> 
> 	WARN_ON_ONCE(!kvm_rebooting &&
> 		     (tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR);
> 
> 	if ((u16)tdx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> 	    is_nmi(tdexit_intr_info(vcpu))) {
> 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> 		vmx_do_nmi_irqoff();
> 		kvm_after_interrupt(vcpu);
> 	}
> 
> is heinous.  If there's an error that leaves bits 15:0 zero, KVM will synthesize
> a spurious NMI.  I don't know whether or not that can happen, but it's not
> something that should even be possible in KVM, i.e. the exit reason should be
> processed if and only if KVM *knows* there was a sane VM-Exit from non-root mode.
> 
> tdx_vcpu_run() has a similar issue, though it's probably benign.  If there's an
> error in bits 15:0 that happens to collide with EXIT_REASON_TDCALL, weird things
> will happen.
> 
> 	if (tdx->exit_reason.basic == EXIT_REASON_TDCALL)
> 		tdx->tdvmcall.rcx = vcpu->arch.regs[VCPU_REGS_RCX];
> 	else
> 		tdx->tdvmcall.rcx = 0;
> 
> I vote for something like the below, with much more robust checking of vp_enter_ret
> before it's converted to a VMX exit reason.
> 
> static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu)
> {
> 	return (union vmx_exit_reason)(u32)to_tdx(vcpu)->vp_enter_ret;
> }

Thank you for the concrete suggestion.  Let me explore what safeguard
checks can be done to make the exit path robust.
Edgecombe, Rick P March 18, 2024, 9:01 p.m. UTC | #3
On Mon, 2024-02-26 at 00:26 -0800, isaku.yamahata@intel.com wrote:
> +fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
> +{
> +       struct vcpu_tdx *tdx = to_tdx(vcpu);
> +
> +       if (unlikely(!tdx->initialized))
> +               return -EINVAL;
> +       if (unlikely(vcpu->kvm->vm_bugged)) {
> +               tdx->exit_reason.full = TDX_NON_RECOVERABLE_VCPU;
> +               return EXIT_FASTPATH_NONE;
> +       }
> +

Isaku, can you elaborate on why this needs special handling? There is a
check in vcpu_enter_guest() like:
	if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
		r = -EIO;
		goto out;
	}

Instead it returns a SEAM error code for something actuated by KVM. But
can it even be reached, given the other check? Not sure if there is
a problem, it just sticks out to me and I'm wondering what's going on.
Isaku Yamahata March 18, 2024, 11:40 p.m. UTC | #4
On Mon, Mar 18, 2024 at 09:01:05PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Mon, 2024-02-26 at 00:26 -0800, isaku.yamahata@intel.com wrote:
> > +fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
> > +{
> > +       struct vcpu_tdx *tdx = to_tdx(vcpu);
> > +
> > +       if (unlikely(!tdx->initialized))
> > +               return -EINVAL;
> > +       if (unlikely(vcpu->kvm->vm_bugged)) {
> > +               tdx->exit_reason.full = TDX_NON_RECOVERABLE_VCPU;
> > +               return EXIT_FASTPATH_NONE;
> > +       }
> > +
> 
> Isaku, can you elaborate on why this needs special handling? There is a
> check in vcpu_enter_guest() like:
> 	if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
> 		r = -EIO;
> 		goto out;
> 	}
> 
> Instead it returns a SEAM error code for something actuated by KVM. But
> can it even be reached because of the other check? Not sure if there is
> a problem, just sticks out to me and wondering whats going on.

The original intention is to get out of the inner loop.  As Sean pointed
out, the current code does a poor job of checking the error from
__seamcall_saved_ret(TDH_VP_ENTER), so it fails to call KVM_BUG_ON()
when an unexpected error is returned.

The right fix is to properly check the error from TDH_VP_ENTER and call
KVM_BUG_ON().  Then the check you pointed out should go away.

  for (;;) {                                    /* outer loop */
          if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
                  break;
          for (;;) {                            /* inner loop */
                  vcpu_run();
                  kvm_vcpu_exit_request(vcpu);
          }
  }
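
A hypothetical shape of that fix (sketch only; KVM_BUG_ON() marks the
VM dead and evaluates to the condition, so the vm_bugged special case
before entry can go away):

	if (KVM_BUG_ON(tdx->vp_enter_ret & TDX_ERROR, vcpu->kvm))
		return EXIT_FASTPATH_NONE;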
Kirill A. Shutemov April 4, 2024, 1:22 p.m. UTC | #5
On Mon, Feb 26, 2024 at 12:26:20AM -0800, isaku.yamahata@intel.com wrote:
> @@ -491,6 +494,87 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	 */
>  }
>  
> +static noinstr void tdx_vcpu_enter_exit(struct vcpu_tdx *tdx)
> +{

...

> +	tdx->exit_reason.full = __seamcall_saved_ret(TDH_VP_ENTER, &args);

Call to __seamcall_saved_ret() leaves noinstr section.

__seamcall_saved_ret() has to be moved:

diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
index e32cf82ed47e..6b434ab12db6 100644
--- a/arch/x86/virt/vmx/tdx/seamcall.S
+++ b/arch/x86/virt/vmx/tdx/seamcall.S
@@ -44,6 +44,8 @@ SYM_FUNC_START(__seamcall_ret)
 SYM_FUNC_END(__seamcall_ret)
 EXPORT_SYMBOL_GPL(__seamcall_ret);
 
+.section .noinstr.text, "ax"
+
 /*
  * __seamcall_saved_ret() - Host-side interface functions to SEAM software
  * (the P-SEAMLDR or the TDX module), with saving output registers to the
Huang, Kai April 4, 2024, 9:51 p.m. UTC | #6
On Thu, 2024-04-04 at 16:22 +0300, Kirill A. Shutemov wrote:
> On Mon, Feb 26, 2024 at 12:26:20AM -0800, isaku.yamahata@intel.com wrote:
> > @@ -491,6 +494,87 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> >  	 */
> >  }
> >  
> > +static noinstr void tdx_vcpu_enter_exit(struct vcpu_tdx *tdx)
> > +{
> 
> ...
> 
> > +	tdx->exit_reason.full = __seamcall_saved_ret(TDH_VP_ENTER, &args);
> 
> Call to __seamcall_saved_ret() leaves noinstr section.
> 
> __seamcall_saved_ret() has to be moved:
> 
> diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
> index e32cf82ed47e..6b434ab12db6 100644
> --- a/arch/x86/virt/vmx/tdx/seamcall.S
> +++ b/arch/x86/virt/vmx/tdx/seamcall.S
> @@ -44,6 +44,8 @@ SYM_FUNC_START(__seamcall_ret)
>  SYM_FUNC_END(__seamcall_ret)
>  EXPORT_SYMBOL_GPL(__seamcall_ret);
>  
> +.section .noinstr.text, "ax"
> +
>  /*
>   * __seamcall_saved_ret() - Host-side interface functions to SEAM software
>   * (the P-SEAMLDR or the TDX module), with saving output registers to the

Alternatively, I think we can explicitly use instrumentation_begin()/end()
around __seamcall_saved_ret() here.

__seamcall_saved_ret() could be used in the future for new SEAMCALLs (e.g.,
TDH.MEM.IMPORT) for TDX guest live migration, and I don't think those
callers will be tagged noinstr.
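
A sketch of that alternative (hypothetical placement inside the noinstr
tdx_vcpu_enter_exit()):

	/* __seamcall_saved_ret() is instrumentable, so explicitly mark
	 * the call site as an instrumentation-safe region.
	 */
	instrumentation_begin();
	tdx->exit_reason.full = __seamcall_saved_ret(TDH_VP_ENTER, &args);
	instrumentation_end();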
Sean Christopherson April 4, 2024, 10:45 p.m. UTC | #7
On Thu, Apr 04, 2024, Kai Huang wrote:
> On Thu, 2024-04-04 at 16:22 +0300, Kirill A. Shutemov wrote:
> > On Mon, Feb 26, 2024 at 12:26:20AM -0800, isaku.yamahata@intel.com wrote:
> > > @@ -491,6 +494,87 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > >  	 */
> > >  }
> > >  
> > > +static noinstr void tdx_vcpu_enter_exit(struct vcpu_tdx *tdx)
> > > +{
> > 
> > ...
> > 
> > > +	tdx->exit_reason.full = __seamcall_saved_ret(TDH_VP_ENTER, &args);
> > 
> > Call to __seamcall_saved_ret() leaves noinstr section.
> > 
> > __seamcall_saved_ret() has to be moved:
> > 
> > diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
> > index e32cf82ed47e..6b434ab12db6 100644
> > --- a/arch/x86/virt/vmx/tdx/seamcall.S
> > +++ b/arch/x86/virt/vmx/tdx/seamcall.S
> > @@ -44,6 +44,8 @@ SYM_FUNC_START(__seamcall_ret)
> >  SYM_FUNC_END(__seamcall_ret)
> >  EXPORT_SYMBOL_GPL(__seamcall_ret);
> >  
> > +.section .noinstr.text, "ax"
> > +
> >  /*
> >   * __seamcall_saved_ret() - Host-side interface functions to SEAM software
> >   * (the P-SEAMLDR or the TDX module), with saving output registers to the
> 
> Alternatively, I think we can explicitly use instrumentation_begin()/end()
> around __seamcall_saved_ret() here.

No, that will just paper over the complaint.  Dang it, I was going to say that
I called out earlier that tdx_vcpu_enter_exit() doesn't need to be noinstr, but
it looks like my brain and fingers didn't connect.

So I'll say it now :-)

I don't think tdx_vcpu_enter_exit() needs to be noinstr, because the SEAMCALL is
functionally a VM-Exit, and so all host state is saved/restored "atomically"
across the SEAMCALL (some by hardware, some by software (TDX-module)).

The reason the VM-Enter flows for VMX and SVM need to be noinstr is they do things
like load the guest's CR2, and handle NMI VM-Exits with NMIs blocked.  None of
that applies to TDX.  Either that, or there are some massive bugs lurking due to
missing code.
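
For comparison, the sort of work that forces VMX's entry path to be
noinstr (paraphrased from vmx_vcpu_enter_exit(); shown for illustration
only):

	/* Guest CR2 must be loaded with instrumentation disabled: any
	 * instrumented code that faults in between would clobber CR2.
	 */
	if (vcpu->arch.cr2 != native_read_cr2())
		native_write_cr2(vcpu->arch.cr2);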
Huang, Kai April 4, 2024, 11:28 p.m. UTC | #8
On Thu, 2024-04-04 at 15:45 -0700, Sean Christopherson wrote:
> On Thu, Apr 04, 2024, Kai Huang wrote:
> > On Thu, 2024-04-04 at 16:22 +0300, Kirill A. Shutemov wrote:
> > > On Mon, Feb 26, 2024 at 12:26:20AM -0800, isaku.yamahata@intel.com wrote:
> > > > @@ -491,6 +494,87 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > > >  	 */
> > > >  }
> > > >  
> > > > +static noinstr void tdx_vcpu_enter_exit(struct vcpu_tdx *tdx)
> > > > +{
> > > 
> > > ...
> > > 
> > > > +	tdx->exit_reason.full = __seamcall_saved_ret(TDH_VP_ENTER, &args);
> > > 
> > > Call to __seamcall_saved_ret() leaves noinstr section.
> > > 
> > > __seamcall_saved_ret() has to be moved:
> > > 
> > > diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
> > > index e32cf82ed47e..6b434ab12db6 100644
> > > --- a/arch/x86/virt/vmx/tdx/seamcall.S
> > > +++ b/arch/x86/virt/vmx/tdx/seamcall.S
> > > @@ -44,6 +44,8 @@ SYM_FUNC_START(__seamcall_ret)
> > >  SYM_FUNC_END(__seamcall_ret)
> > >  EXPORT_SYMBOL_GPL(__seamcall_ret);
> > >  
> > > +.section .noinstr.text, "ax"
> > > +
> > >  /*
> > >   * __seamcall_saved_ret() - Host-side interface functions to SEAM software
> > >   * (the P-SEAMLDR or the TDX module), with saving output registers to the
> > 
> > Alternatively, I think we can explicitly use instrumentation_begin()/end()
> > around __seamcall_saved_ret() here.
> 
> No, that will just paper over the complaint.  Dang it, I was going to say that
> I called out earlier that tdx_vcpu_enter_exit() doesn't need to be noinstr, but
> it looks like my brain and fingers didn't connect.
> 
> So I'll say it now :-)
> 
> I don't think tdx_vcpu_enter_exit() needs to be noinstr, because the SEAMCALL is
> functionally a VM-Exit, and so all host state is saved/restored "atomically"
> across the SEAMCALL (some by hardware, some by software (TDX-module)).
> 
> The reason the VM-Enter flows for VMX and SVM need to be noinstr is they do things
> like load the guest's CR2, and handle NMI VM-Exits with NMIs blocked.  None of
> that applies to TDX.  Either that, or there are some massive bugs lurking due to
> missing code.

Ah right.  That's even better :-)

Thanks for jumping in and pointing out!
Binbin Wu April 7, 2024, 1:42 a.m. UTC | #9
On 3/16/2024 1:26 AM, Sean Christopherson wrote:
> On Mon, Feb 26, 2024, isaku.yamahata@intel.com wrote:
>> +	 */
>> +	struct kvm_vcpu *vcpu = &tdx->vcpu;
>> +
>> +	guest_state_enter_irqoff();
>> +
>> +	/*
>> +	 * TODO: optimization:
>> +	 * - Eliminate copy between args and vcpu->arch.regs.
>> +	 * - copyin/copyout registers only if (tdx->tdvmcall.regs_mask != 0)
>> +	 *   which means TDG.VP.VMCALL.
>> +	 */
>> +	args = (struct tdx_module_args) {
>> +		.rcx = tdx->tdvpr_pa,
>> +#define REG(reg, REG)	.reg = vcpu->arch.regs[VCPU_REGS_ ## REG]
> Organizing tdx_module_args's registers by volatile vs. non-volatile is asinine.
> This code should not need to exist.

Did you suggest aligning tdx_module_args with enum kvm_reg for the GP
registers, so the copy can be done with a simple memcpy?

>
>> +	WARN_ON_ONCE(!kvm_rebooting &&
>> +		     (tdx->exit_reason.full & TDX_SW_ERROR) == TDX_SW_ERROR);
>> +
>> +	guest_state_exit_irqoff();
>> +}
>> +
>>
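
Presumably the idea is something like the following (hypothetical
sketch; args.gprs is a made-up field that assumes tdx_module_args were
reordered to match enum kvm_reg, which is not the case today):

	/* Hypothetical: with a gprs[] array laid out in enum kvm_reg
	 * order, the per-register REG() expansion collapses to one copy.
	 */
	memcpy(args.gprs, vcpu->arch.regs, sizeof(args.gprs));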

Patch

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7258a6304b4b..d72651ce99ac 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -158,6 +158,23 @@  static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx_vcpu_reset(vcpu, init_event);
 }
 
+static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		/* Unconditionally continue to vcpu_run(). */
+		return 1;
+
+	return vmx_vcpu_pre_run(vcpu);
+}
+
+static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_run(vcpu);
+
+	return vmx_vcpu_run(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -343,8 +360,8 @@  struct kvm_x86_ops vt_x86_ops __initdata = {
 	.flush_tlb_gva = vt_flush_tlb_gva,
 	.flush_tlb_guest = vt_flush_tlb_guest,
 
-	.vcpu_pre_run = vmx_vcpu_pre_run,
-	.vcpu_run = vmx_vcpu_run,
+	.vcpu_pre_run = vt_vcpu_pre_run,
+	.vcpu_run = vt_vcpu_run,
 	.handle_exit = vmx_handle_exit,
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6aff3f7e2488..fdf9196cb592 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -11,6 +11,9 @@ 
 #include "vmx.h"
 #include "x86.h"
 
+#include <trace/events/kvm.h>
+#include "trace.h"
+
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
@@ -491,6 +494,87 @@  void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	 */
 }
 
+static noinstr void tdx_vcpu_enter_exit(struct vcpu_tdx *tdx)
+{
+	struct tdx_module_args args;
+
+	/*
+	 * Avoid section mismatch with to_tdx() with KVM_VM_BUG().  The caller
+	 * should call to_tdx().
+	 */
+	struct kvm_vcpu *vcpu = &tdx->vcpu;
+
+	guest_state_enter_irqoff();
+
+	/*
+	 * TODO: optimization:
+	 * - Eliminate copy between args and vcpu->arch.regs.
+	 * - copyin/copyout registers only if (tdx->tdvmcall.regs_mask != 0)
+	 *   which means TDG.VP.VMCALL.
+	 */
+	args = (struct tdx_module_args) {
+		.rcx = tdx->tdvpr_pa,
+#define REG(reg, REG)	.reg = vcpu->arch.regs[VCPU_REGS_ ## REG]
+		REG(rdx, RDX),
+		REG(r8,  R8),
+		REG(r9,  R9),
+		REG(r10, R10),
+		REG(r11, R11),
+		REG(r12, R12),
+		REG(r13, R13),
+		REG(r14, R14),
+		REG(r15, R15),
+		REG(rbx, RBX),
+		REG(rdi, RDI),
+		REG(rsi, RSI),
+#undef REG
+	};
+
+	tdx->exit_reason.full = __seamcall_saved_ret(TDH_VP_ENTER, &args);
+
+#define REG(reg, REG)	vcpu->arch.regs[VCPU_REGS_ ## REG] = args.reg
+		REG(rcx, RCX);
+		REG(rdx, RDX);
+		REG(r8,  R8);
+		REG(r9,  R9);
+		REG(r10, R10);
+		REG(r11, R11);
+		REG(r12, R12);
+		REG(r13, R13);
+		REG(r14, R14);
+		REG(r15, R15);
+		REG(rbx, RBX);
+		REG(rdi, RDI);
+		REG(rsi, RSI);
+#undef REG
+
+	WARN_ON_ONCE(!kvm_rebooting &&
+		     (tdx->exit_reason.full & TDX_SW_ERROR) == TDX_SW_ERROR);
+
+	guest_state_exit_irqoff();
+}
+
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (unlikely(!tdx->initialized))
+		return -EINVAL;
+	if (unlikely(vcpu->kvm->vm_bugged)) {
+		tdx->exit_reason.full = TDX_NON_RECOVERABLE_VCPU;
+		return EXIT_FASTPATH_NONE;
+	}
+
+	trace_kvm_entry(vcpu);
+
+	tdx_vcpu_enter_exit(tdx);
+
+	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+	trace_kvm_exit(vcpu, KVM_ISA_VMX);
+
+	return EXIT_FASTPATH_NONE;
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	WARN_ON_ONCE(root_hpa & ~PAGE_MASK);
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index d822e790e3e5..81d301fbe638 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -27,6 +27,37 @@  struct kvm_tdx {
 	struct page *source_page;
 };
 
+union tdx_exit_reason {
+	struct {
+		/* 31:0 mirror the VMX Exit Reason format */
+		u64 basic		: 16;
+		u64 reserved16		: 1;
+		u64 reserved17		: 1;
+		u64 reserved18		: 1;
+		u64 reserved19		: 1;
+		u64 reserved20		: 1;
+		u64 reserved21		: 1;
+		u64 reserved22		: 1;
+		u64 reserved23		: 1;
+		u64 reserved24		: 1;
+		u64 reserved25		: 1;
+		u64 bus_lock_detected	: 1;
+		u64 enclave_mode	: 1;
+		u64 smi_pending_mtf	: 1;
+		u64 smi_from_vmx_root	: 1;
+		u64 reserved30		: 1;
+		u64 failed_vmentry	: 1;
+
+		/* 63:32 are TDX specific */
+		u64 details_l1		: 8;
+		u64 class		: 8;
+		u64 reserved61_48	: 14;
+		u64 non_recoverable	: 1;
+		u64 error		: 1;
+	};
+	u64 full;
+};
+
 struct vcpu_tdx {
 	struct kvm_vcpu	vcpu;
 
@@ -34,6 +65,8 @@  struct vcpu_tdx {
 	unsigned long *tdvpx_pa;
 	bool td_vcpu_created;
 
+	union tdx_exit_reason exit_reason;
+
 	bool initialized;
 
 	/*
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 191f2964ec8e..3e29a6fe28ef 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -150,6 +150,7 @@  int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);
 u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -184,6 +185,7 @@  static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
+static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }
 static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { return 0; }
 
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }