From patchwork Sun Dec 1 03:53:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13889433 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B84F4207A; Sun, 1 Dec 2024 03:52:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025123; cv=none; b=YQbSoNimKjA9ncZuyK59t2lXduUXqVeVrjBsxpjVrG6YuE7r22DlQBikwI5okq/9f/B+Wljw3LWs6jem1Ix90Fnmh/3qoSnSdbHDA7xshhpqXBPSP2945G38h0MDhexvgP2N8C8im+CzcF0/v1DsP1unyB1kFZyLCGnsvopTHuw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025123; c=relaxed/simple; bh=hkw9rG4cKA1zT1DbXIMaMT1/XactRoQ3dJeEtIMgxu4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ilZ29Tm79akCAxWb2f7rB1ohyqOIH+GTUUCOpfNNopXO5aoIV+ni9wCnLNaBVKnZjgO5GGjSKbDbxiGQABXdI9OnuzVyUqJOgfjSq+V/i+n3+w7BQWSCQI41dnpTx+E3EzMsMbLl+k+pbc3tEcRB8B1W2GHuc7izKlmF30TMxPg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=diMdgVRU; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="diMdgVRU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733025121; x=1764561121; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hkw9rG4cKA1zT1DbXIMaMT1/XactRoQ3dJeEtIMgxu4=; b=diMdgVRUEpKtsQa8mk4m5Jt/nm8Hg7Wce8B2fe7gIDc7Q6SdfU+8x7sG 4MB0zYAPyhBICiuUS+CvaDoQB+T/Nss8WfGVGqbnzE1VHp3+uOg2DJQJo vdajsTO1BIQKyfZwJwt+wwOQu0aHJrhqLQXdxk309mN0MTVQ2aaqGY+H0 SnZBoc0vp3lxHBm/oIJBPXZJVrldnmNK9M5qlsAh8e1ehTIFQj7XruX8E oxgUVUYRB8iofKIWAUH8Fr0hUQp28WCh/JSYCePmWxSc81NYouvZvZxC+ lTqR+S3IBwNstQJIy1Xkj3xY/RG5pSVfqZOzReRMXchdPAhFqn5U+ydV7 g==; X-CSE-ConnectionGUID: 7dH7l/ytQLS+sML98QTYGw== X-CSE-MsgGUID: 6oRVV6dnSyeG2jN+hAX61w== X-IronPort-AV: E=McAfee;i="6700,10204,11272"; a="50725096" X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="50725096" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:00 -0800 X-CSE-ConnectionGUID: OBbNCpiVSoqAoz5k45kAUg== X-CSE-MsgGUID: X/VkhGZXQOiNHwDaQODcmw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="93257481" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:51:57 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit Date: Sun, 1 Dec 2024 11:53:50 +0800 Message-ID: <20241201035358.2193078-2-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241201035358.2193078-1-binbin.wu@linux.intel.com> References: <20241201035358.2193078-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org 
List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Introduce the wiring for handling TDX VM exits by implementing the callbacks .get_exit_info(), and .handle_exit(). Additionally, add error handling during the TDX VM exit flow, and add a place holder to handle various exit reasons. Add helper functions to retrieve exit information, exit qualifications, and more. Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.* operations for secure EPT or TD EPOCH. If contention occurs, the return value will have TDX_OPERAND_BUSY set with operand type, prompting the vCPU to attempt re-entry into the guest via the fast path. Error Handling: The following scenarios will return to userspace with KVM_EXIT_INTERNAL_ERROR. - TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM firmware is not loaded or disabled. - TDX_ERROR: This indicates some check failed in the TDX module, preventing the vCPU from running. - TDX_NON_RECOVERABLE: Set by the TDX module when the error is non-recoverable, indicating that the TDX guest is dead or the vCPU is disabled. This also covers failed_vmentry case, which must have TDX_NON_RECOVERABLE set since off-TD debug feature has not been enabled. An exception is the triple fault, which also sets TDX_NON_RECOVERABLE but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX case. - Any unhandled VM exit reason will also return to userspace with KVM_EXIT_INTERNAL_ERROR. Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu Reviewed-by: Chao Gao --- Hypercalls exit to userspace breakout: - Dropped Paolo's Reviewed-by since the change is not subtle. - Mention addition of .get_exit_info() handler in changelog. (Binbin) - tdh_sept_seamcall() -> tdx_seamcall_sept() in comments. 
(Binbin) - Do not open code TDX_ERROR_SEPT_BUSY. (Binbin) - "TDH.VP.ENTRY" -> "TDH.VP.ENTER". (Binbin) - Remove the use of union tdx_exit_reason. (Sean) https://lore.kernel.org/kvm/ZfSExlemFMKjBtZb@google.com/ - Add tdx_check_exit_reason() to check a VMX exit reason against the status code of TDH.VP.ENTER. - Move the handling of TDX_ERROR_SEPT_BUSY and (TDX_OPERAND_BUSY | TDX_OPERAND_ID_TD_EPOCH) into fast path, and add a helper function tdx_exit_handlers_fastpath(). - Remove the warning on TDX_SW_ERROR in fastpath, but return without further handling. - Call kvm_machine_check() for EXIT_REASON_MCE_DURING_VMENTRY, align with VMX case. - On failed_vmentry in fast path, return without further handling. - Exit to userspace for #UD and #GP. - Fix whitespace in tdx_get_exit_info() - Add a comment in tdx_handle_exit() to describe failed_vmentry case is handled by TDX_NON_RECOVERABLE handling. - Move the code of handling NMI, exception and external interrupts out of the patch, i.e., the NMI handling in tdx_vcpu_enter_exit() and the wiring of .handle_exit_irqoff() are removed. - Drop the check for VCPU_TD_STATE_INITIALIZED in tdx_handle_exit() because it has been checked in tdx_vcpu_pre_run(). - Update changelog. --- arch/x86/include/asm/tdx.h | 1 + arch/x86/kvm/vmx/main.c | 25 +++++- arch/x86/kvm/vmx/tdx.c | 164 ++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx_errno.h | 3 + arch/x86/kvm/vmx/x86_ops.h | 8 ++ 5 files changed, 198 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 77477b905dca..01409a59224d 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -18,6 +18,7 @@ * TDX module. 
*/ #define TDX_ERROR _BITUL(63) +#define TDX_NON_RECOVERABLE _BITUL(62) #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40)) #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index f8acb1dc7c10..4f6faeb6e8e5 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -165,6 +165,15 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) return vmx_vcpu_run(vcpu, force_immediate_exit); } +static int vt_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) +{ + if (is_td_vcpu(vcpu)) + return tdx_handle_exit(vcpu, fastpath); + + return vmx_handle_exit(vcpu, fastpath); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -212,6 +221,18 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); } +static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + if (is_td_vcpu(vcpu)) { + tdx_get_exit_info(vcpu, reason, info1, info2, intr_info, + error_code); + return; + } + + vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -305,7 +326,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_pre_run = vt_vcpu_pre_run, .vcpu_run = vt_vcpu_run, - .handle_exit = vmx_handle_exit, + .handle_exit = vt_handle_exit, .skip_emulated_instruction = vmx_skip_emulated_instruction, .update_emulated_instruction = vmx_update_emulated_instruction, .set_interrupt_shadow = vmx_set_interrupt_shadow, @@ -340,7 +361,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .set_identity_map_addr = vmx_set_identity_map_addr, .get_mt_mask = vmx_get_mt_mask, - .get_exit_info = vmx_get_exit_info, + .get_exit_info = vt_get_exit_info, .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, diff --git 
a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f975bb323f60..3dcbdb5a7bf8 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -186,6 +186,54 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); } +static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu) +{ + return (union vmx_exit_reason)(u32)(to_tdx(vcpu)->vp_enter_ret); +} + +/* + * There is no simple way to check some bit(s) to decide whether the return + * value of TDH.VP.ENTER has a VMX exit reason or not. E.g., + * TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE has exit reason but with error bit + * (bit 63) set, TDX_NON_RECOVERABLE_TD_CORRUPTED_MD has no exit reason but with + * error bit cleared. + */ +static __always_inline bool tdx_has_exit_reason(struct kvm_vcpu *vcpu) +{ + u64 status = to_tdx(vcpu)->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK; + + return status == TDX_SUCCESS || status == TDX_NON_RECOVERABLE_VCPU || + status == TDX_NON_RECOVERABLE_TD || + status == TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE || + status == TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE; +} + +static __always_inline bool tdx_check_exit_reason(struct kvm_vcpu *vcpu, u16 reason) +{ + return tdx_has_exit_reason(vcpu) && + (u16)tdexit_exit_reason(vcpu).basic == reason; +} + +static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcpu) +{ + return kvm_rcx_read(vcpu); +} + +static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu *vcpu) +{ + return kvm_rdx_read(vcpu); +} + +static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu) +{ + return kvm_r8_read(vcpu); +} + +static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu) +{ + return kvm_r9_read(vcpu); +} + static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) { tdx_guest_keyid_free(kvm_tdx->hkid); @@ -824,6 +872,21 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu) guest_state_exit_irqoff(); 
} +static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu) +{ + u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret; + + /* See the comment of tdx_seamcall_sept(). */ + if (unlikely(vp_enter_ret == TDX_ERROR_SEPT_BUSY)) + return EXIT_FASTPATH_REENTER_GUEST; + + /* TDH.VP.ENTER checks TD EPOCH which can contend with TDH.MEM.TRACK. */ + if (unlikely(vp_enter_ret == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_TD_EPOCH))) + return EXIT_FASTPATH_REENTER_GUEST; + + return EXIT_FASTPATH_NONE; +} + fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) { struct vcpu_tdx *tdx = to_tdx(vcpu); @@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED; vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET; + + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) + return EXIT_FASTPATH_NONE; + + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY))) + kvm_machine_check(); + trace_kvm_exit(vcpu, KVM_ISA_VMX); - return EXIT_FASTPATH_NONE; + if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry)) + return EXIT_FASTPATH_NONE; + + return tdx_exit_handlers_fastpath(vcpu); +} + +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) +{ + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; + vcpu->mmio_needed = 0; + return 0; } void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); } +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + u64 vp_enter_ret = tdx->vp_enter_ret; + union vmx_exit_reason exit_reason; + + if (fastpath != EXIT_FASTPATH_NONE) + return 1; + + /* + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and + * TDX_SEAMCALL_VMFAILINVALID. 
+ */ + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) { + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); + goto unhandled_exit; + } + + /* + * Without off-TD debug enabled, failed_vmentry case must have + * TDX_NON_RECOVERABLE set. + */ + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) { + /* Triple fault is non-recoverable. */ + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT))) + return tdx_handle_triple_fault(vcpu); + + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n", + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid, + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); + goto unhandled_exit; + } + + /* From now, the seamcall status should be TDX_SUCCESS. */ + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS); + exit_reason = tdexit_exit_reason(vcpu); + + switch (exit_reason.basic) { + default: + break; + } + +unhandled_exit: + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON; + vcpu->run->internal.ndata = 2; + vcpu->run->internal.data[0] = vp_enter_ret; + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu; + return 0; +} + +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (tdx_has_exit_reason(vcpu)) { + /* + * Encode some useful info from the 64 bit return code + * into the 32 bit exit 'reason'. If the VMX exit reason is + * valid, just set it to those bits. + */ + *reason = (u32)tdx->vp_enter_ret; + *info1 = tdexit_exit_qual(vcpu); + *info2 = tdexit_ext_exit_qual(vcpu); + } else { + /* + * When the VMX exit reason in vp_enter_ret is not valid, + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to + * mean the vmexit code is not valid. Set the other bits to + * try to avoid picking a value that may someday be a valid + * VMX exit code.
+ */ + *reason = 0xFFFFFFFF; + *info1 = 0; + *info2 = 0; + } + + *intr_info = tdexit_intr_info(vcpu); + *error_code = 0; +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h index f9dbb3a065cc..6ff4672c4181 100644 --- a/arch/x86/kvm/vmx/tdx_errno.h +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -10,6 +10,9 @@ * TDX SEAMCALL Status Codes (returned in RAX) */ #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL +#define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL +#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL +#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE 0x6000000700000000ULL #define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL #define TDX_OPERAND_INVALID 0xC000010000000000ULL #define TDX_OPERAND_BUSY 0x8000020000000000ULL diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 02b33390e1bf..1c18943e0e1d 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -133,6 +133,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu); fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); +int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath); +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -167,6 +171,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat } static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) { return 0; } +static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, 
u32 *reason, u64 *info1, + u64 *info2, u32 *intr_info, u32 *error_code) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } From patchwork Sun Dec 1 03:53:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13889434 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CEDE5DF42; Sun, 1 Dec 2024 03:52:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025126; cv=none; b=qoFmQ14xc0LYloO8+CgRCZbF54629R1R7oBM9fRieAzEsNuKr0tc9DS1THzvLpXrfUQ8qt2i0cQ+zD+TPUNbxnbJNh5QsP2sJJmjJivnczwWXNC3XZMEiUUOweQr3+h4PdDiN/dI/wODNhRoJSynQbCuxphCQ9mRw99Dw9OVE6M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025126; c=relaxed/simple; bh=tYJ6iEY5/6H5eegsNo+KnyVIV+yMoWZ7KNPYBExv3AM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kF8CSxNqYTJjPPECWmfuvKPpIGAlktZ+T9Xno2u1e8nHnFfbgEmzvxBJXhewb7m+a1RYOWinOiYXWDnWkaE5BVTvmkhUpEdp+/o9h+RBiB82zd1jFqtNxGlrMqCV2fhJp2JiTaZUf1aRO9qXWnO9yZSo7IYOb/eKNEChSuFmdrU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=UqbevA0Q; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
header.b="UqbevA0Q" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733025125; x=1764561125; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tYJ6iEY5/6H5eegsNo+KnyVIV+yMoWZ7KNPYBExv3AM=; b=UqbevA0QPby00DdUp2VBWCAqAgWIzlyH0peOotQq0l2/c9BeOH3psDap war5wdkgxWpIH1Rblbb6iiI8cW43JUmmAeWIehfYhp4sVu4cU1VG45xu7 f/aZKbDJBBf0KPXhKT4V/CclShAsU892ro67m6f1/NKraw0eXwMXrSciE NanajWUlWDMsmNCW/9cc/G78zvqWcW9U0NOrZCDWgl166NULzFzsUQ2BL 71Jn5C9SUqQu2mBa4yiO+p9gXoAPZzIdqk76YG78tuBAP2oktiGLF8qXx 7LehZ2WKuWE0oxuqNBw26HCu3dePkv2JhSwWYyFmnOLy6b9KDwhE4mhGW A==; X-CSE-ConnectionGUID: 9lgVy8jcQUax/aWTguDM6Q== X-CSE-MsgGUID: uaNBXAzIQF+LXJHIi6TYoA== X-IronPort-AV: E=McAfee;i="6700,10204,11272"; a="50725101" X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="50725101" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:04 -0800 X-CSE-ConnectionGUID: AMJl2am2TKCBjTdRM1IBkg== X-CSE-MsgGUID: 0QDAGGB7TgaTmgBjAamxkg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="93257485" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:01 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 2/7] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL) Date: Sun, 1 Dec 2024 11:53:51 +0800 Message-ID: <20241201035358.2193078-3-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: 
<20241201035358.2193078-1-binbin.wu@linux.intel.com> References: <20241201035358.2193078-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add a place holder and related helper functions in preparation for TDG.VP.VMCALL handling. The TDX module specification defines the TDG.VP.VMCALL API (TDVMCALL for short), which the guest TD uses to issue hypercalls to the VMM. When the guest TD issues a TDVMCALL, the guest TD exits to the VMM with a new exit reason. The arguments from the guest TD and the values returned by the VMM are passed in the guest registers. The guest RCX register indicates which registers are used. Define helper functions to access those registers. A new VMX exit reason, TDCALL, is added to indicate that the exit is due to a TDVMCALL from the guest TD. Define the TDCALL exit reason and add a place holder to handle such exits. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu Reviewed-by: Chao Gao --- Hypercalls exit to userspace breakout: - Update changelog. - Drop the unused tdx->tdvmcall.
(Chao) - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) --- arch/x86/include/uapi/asm/vmx.h | 4 ++- arch/x86/kvm/vmx/tdx.c | 48 +++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h index a5faf6d88f1b..6a9f268a2d2c 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -92,6 +92,7 @@ #define EXIT_REASON_TPAUSE 68 #define EXIT_REASON_BUS_LOCK 74 #define EXIT_REASON_NOTIFY 75 +#define EXIT_REASON_TDCALL 77 #define VMX_EXIT_REASONS \ { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ @@ -155,7 +156,8 @@ { EXIT_REASON_UMWAIT, "UMWAIT" }, \ { EXIT_REASON_TPAUSE, "TPAUSE" }, \ { EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \ - { EXIT_REASON_NOTIFY, "NOTIFY" } + { EXIT_REASON_NOTIFY, "NOTIFY" }, \ + { EXIT_REASON_TDCALL, "TDCALL" } #define VMX_EXIT_REASON_FLAGS \ { VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 3dcbdb5a7bf8..19fd8a5dabd0 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -234,6 +234,41 @@ static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu) return kvm_r9_read(vcpu); } +#define BUILD_TDVMCALL_ACCESSORS(param, gpr) \ +static __always_inline \ +unsigned long tdvmcall_##param##_read(struct kvm_vcpu *vcpu) \ +{ \ + return kvm_##gpr##_read(vcpu); \ +} \ +static __always_inline void tdvmcall_##param##_write(struct kvm_vcpu *vcpu, \ + unsigned long val) \ +{ \ + kvm_##gpr##_write(vcpu, val); \ +} +BUILD_TDVMCALL_ACCESSORS(a0, r12); +BUILD_TDVMCALL_ACCESSORS(a1, r13); +BUILD_TDVMCALL_ACCESSORS(a2, r14); +BUILD_TDVMCALL_ACCESSORS(a3, r15); + +static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *vcpu) +{ + return kvm_r10_read(vcpu); +} +static __always_inline unsigned long tdvmcall_leaf(struct kvm_vcpu *vcpu) +{ + return kvm_r11_read(vcpu); +} +static __always_inline void 
tdvmcall_set_return_code(struct kvm_vcpu *vcpu, + long val) +{ + kvm_r10_write(vcpu, val); +} +static __always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu, + unsigned long val) +{ + kvm_r11_write(vcpu, val); +} + static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) { tdx_guest_keyid_free(kvm_tdx->hkid); @@ -922,6 +957,17 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) return 0; } +static int handle_tdvmcall(struct kvm_vcpu *vcpu) +{ + switch (tdvmcall_leaf(vcpu)) { + default: + break; + } + + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + return 1; +} + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { u64 shared_bit = (pgd_level == 5) ? TDX_SHARED_BIT_PWL_5 : @@ -1253,6 +1299,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) exit_reason = tdexit_exit_reason(vcpu); switch (exit_reason.basic) { + case EXIT_REASON_TDCALL: + return handle_tdvmcall(vcpu); default: break; } From patchwork Sun Dec 1 03:53:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13889435 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBC567DA6C; Sun, 1 Dec 2024 03:52:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025130; cv=none; b=Y3b1kDAgyUpE6LYpxDeYPXMqwCTSuupvKpQoFtPFSiKwUWZtX1tBSrjiG/gU5IrtvH/9qD60SBNqN59o5n/gSf1HCCAHXDOdOb9zTZCtgQC2+ZMhtpFxIhvhsS3DmENWkLfTffr+v/LY0dONoW+FkTG7OQsqzUCyqFn0vYZbtx4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025130; c=relaxed/simple; bh=oKmFGpwaYPYX2xRtD/D2+9iqTkBry/YGOf7oF2I0Ijg=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Rf47CKBS3COHA+FYGgne8Bgw4SZmXttGbc5CUfZOM/sS5QV84f40rbbOmWSz8TJRX8G0pYKm7JSFkBoIP8e8D777laI9+0PqSngXjTub8NlctGIOxOZEzbIroAJ+jtm+KxXxE1rc1t/rlRGEMJElVb7DAVXGy/usDVPlFKzZb4w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=J7NWc9FG; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="J7NWc9FG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733025128; x=1764561128; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=oKmFGpwaYPYX2xRtD/D2+9iqTkBry/YGOf7oF2I0Ijg=; b=J7NWc9FG+4t6dWCbi1Jn2dVKT3SOxZTxfMssgSWaEmVZFKx6a3M2vcYO 1vR6sUzGDpZJESyJcaW8dMptNdFcCwSXtqwwlyuC5YgbbX//RnLSpb6DR WpYaewzoqvM9OzfQUKO2vmx06alLbimo7ASuU0YY9tShBTcKOcpzO78SY fZ7nQ3qLWJTbbGY5RHI2rsXjCbxtNYFiRhre5RRBVp1gTt1lD+gPCeqwh IMg6JZ8+L4Hkh/sTFmyG/rgl+pc4x4bk53+vsM0Nk6zLr4/16VD8Y4nkN jhVj93DkWPlUx92dkItJQbDW6z03re5ul/jxCUchY/1X5I8G0Oj5MWPy1 g==; X-CSE-ConnectionGUID: j44x9Nk3QjCF9pqUqwmIEA== X-CSE-MsgGUID: Ti/LWCa7S6qhv7AbUa7Z+A== X-IronPort-AV: E=McAfee;i="6700,10204,11272"; a="50725105" X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="50725105" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:08 -0800 X-CSE-ConnectionGUID: EMEAzHAOTvyKOk3wFgyJIg== X-CSE-MsgGUID: QB/iv6xjQDm1kIesk5yytQ== X-ExtLoop1: 1 X-IronPort-AV: 
E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="93257488" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:05 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 3/7] KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL Date: Sun, 1 Dec 2024 11:53:52 +0800 Message-ID: <20241201035358.2193078-4-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241201035358.2193078-1-binbin.wu@linux.intel.com> References: <20241201035358.2193078-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Handle KVM hypercall for TDX according to the TDX Guest-Hypervisor Communication Interface (GHCI) specification. The TDX GHCI specification defines the ABI for the guest TD to issue hypercalls. When R10 is non-zero, it indicates the TDG.VP.VMCALL is vendor-specific. KVM uses R10 as the KVM hypercall number and R11-R14 as the four arguments, while the error code is returned in R10. Follow the ABI and handle the KVM hypercall for TDX. Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu --- Hypercalls exit to userspace breakout: - Renamed from "KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL" to "KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL". - Update the change log. - Rebased on Sean's "Prep KVM hypercall handling for TDX" patch set. https://lore.kernel.org/kvm/20241128004344.4072099-1-seanjc@google.com - Use the right register (i.e.
R10) to set the return code after returning back from userspace. --- arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 19fd8a5dabd0..4cc55b120ab0 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -957,8 +957,39 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) return 0; } + +static int complete_hypercall_exit(struct kvm_vcpu *vcpu) +{ + kvm_r10_write(vcpu, vcpu->run->hypercall.ret); + return 1; +} + +static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu) +{ + int r; + + /* + * ABI for KVM tdvmcall argument: + * In Guest-Hypervisor Communication Interface(GHCI) specification, + * Non-zero leaf number (R10 != 0) is defined to indicate + * vendor-specific. KVM uses this for KVM hypercall. NOTE: KVM + * hypercall number starts from one. Zero isn't used for KVM hypercall + * number. + * + * R10: KVM hypercall number + * arguments: R11, R12, R13, R14. + */ + r = __kvm_emulate_hypercall(vcpu, r10, r11, r12, r13, r14, true, 0, + complete_hypercall_exit); + + return r > 0; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { + if (tdvmcall_exit_type(vcpu)) + return tdx_emulate_vmcall(vcpu); + switch (tdvmcall_leaf(vcpu)) { default: break; From patchwork Sun Dec 1 03:53:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13889436 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 121054087C; Sun, 1 Dec 2024 03:52:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025134; cv=none; 
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
Date: Sun, 1 Dec 2024 11:53:53 +0800
Message-ID: <20241201035358.2193078-5-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
References: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling.

MapGPA is used by a TDX guest to request that a GPA range be mapped as private or shared memory. The request must exit to userspace for handling. KVM already implements a similar hypercall, KVM_HC_MAP_GPA_RANGE, which exits to userspace with exit reason KVM_EXIT_HYPERCALL. Do sanity checks, convert TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE and forward the request to userspace.

To prevent a TDG.VP.VMCALL<MapGPA> call from taking too long, split the MapGPA range into 2MB chunks and check for pending interrupts between chunks.
This allows for timely injection of interrupts and prevents issues with guest lockup detection. The TDX guest should retry the operation for the GPA starting at the address specified in R11 when the TDVMCALL returns TDVMCALL_STATUS_RETRY as the status code.

Note that userspace needs to enable KVM_CAP_EXIT_HYPERCALL with the KVM_HC_MAP_GPA_RANGE bit set for a TD VM.

Suggested-by: Sean Christopherson
Signed-off-by: Binbin Wu
---
Hypercalls exit to userspace breakout:
- Newly added. Implement one of the hypercalls that need to exit to userspace for handling after dropping "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to resolve Sean's comment.
  https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Check interrupt pending between chunks suggested by Sean.
  https://lore.kernel.org/kvm/ZleJvmCawKqmpFIa@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Use vt_is_tdx_private_gpa()
---
 arch/x86/include/asm/shared/tdx.h |   1 +
 arch/x86/kvm/vmx/tdx.c            | 110 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h            |   3 +
 3 files changed, 114 insertions(+)

diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 620327f0161f..a602d081cf1c 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -32,6 +32,7 @@
 #define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
 #define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
 #define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
+#define TDVMCALL_STATUS_ALIGN_ERROR 0x8000000000000002ULL
 
 /*
  * Bitmasks of exposed registers (with VMM).
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4cc55b120ab0..553f4cbe0693 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -985,12 +985,122 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
 	return r > 0;
 }
 
+/*
+ * Split into chunks and check interrupt pending between chunks. This allows
+ * for timely injection of interrupts to prevent issues with guest lockup
+ * detection.
+ */
+#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024)
+static void __tdx_map_gpa(struct vcpu_tdx *tdx);
+
+static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->run->hypercall.ret) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+		kvm_r11_write(vcpu, tdx->map_gpa_next);
+		return 1;
+	}
+
+	tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
+	if (tdx->map_gpa_next >= tdx->map_gpa_end) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+		return 1;
+	}
+
+	/*
+	 * Stop processing the remaining part if there is pending interrupt.
+	 * Skip checking pending virtual interrupt (reflected by
+	 * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a seamcall because
+	 * if guest disabled interrupt, it's OK not returning back to guest
+	 * due to non-NMI interrupt. Also it's rare to TDVMCALL_MAP_GPA
+	 * immediately after STI or MOV/POP SS.
+	 */
+	if (pi_has_pending_interrupt(vcpu) ||
+	    kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
+		kvm_r11_write(vcpu, tdx->map_gpa_next);
+		return 1;
+	}
+
+	__tdx_map_gpa(tdx);
+	/* Forward request to userspace. */
+	return 0;
+}
+
+static void __tdx_map_gpa(struct vcpu_tdx *tdx)
+{
+	u64 gpa = tdx->map_gpa_next;
+	u64 size = tdx->map_gpa_end - tdx->map_gpa_next;
+
+	if (size > TDX_MAP_GPA_MAX_LEN)
+		size = TDX_MAP_GPA_MAX_LEN;
+
+	tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
+	tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+	tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
+	tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
+	tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ?
+					   KVM_MAP_GPA_RANGE_ENCRYPTED :
+					   KVM_MAP_GPA_RANGE_DECRYPTED;
+	tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
+
+	tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa;
+}
+
+static int tdx_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u64 gpa = tdvmcall_a0_read(vcpu);
+	u64 size = tdvmcall_a1_read(vcpu);
+	u64 ret;
+
+	/*
+	 * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
+	 * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
+	 * bit set. If not, the error code is not defined in GHCI for TDX, use
+	 * TDVMCALL_STATUS_INVALID_OPERAND for this case.
+	 */
+	if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
+		ret = TDVMCALL_STATUS_INVALID_OPERAND;
+		goto error;
+	}
+
+	if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
+	    !kvm_vcpu_is_legal_gpa(vcpu, gpa + size - 1) ||
+	    (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
+	     vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))) {
+		ret = TDVMCALL_STATUS_INVALID_OPERAND;
+		goto error;
+	}
+
+	if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
+		ret = TDVMCALL_STATUS_ALIGN_ERROR;
+		goto error;
+	}
+
+	tdx->map_gpa_end = gpa + size;
+	tdx->map_gpa_next = gpa;
+
+	__tdx_map_gpa(tdx);
+	/* Forward request to userspace.
*/ + return 0; + +error: + tdvmcall_set_return_code(vcpu, ret); + kvm_r11_write(vcpu, gpa); + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) return tdx_emulate_vmcall(vcpu); switch (tdvmcall_leaf(vcpu)) { + case TDVMCALL_MAP_GPA: + return tdx_map_gpa(vcpu); default: break; } diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 1abc94b046a0..bfae70887695 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -71,6 +71,9 @@ struct vcpu_tdx { enum tdx_prepare_switch_state prep_switch_state; u64 msr_host_kernel_gs_base; + + u64 map_gpa_next; + u64 map_gpa_end; }; void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err); From patchwork Sun Dec 1 03:53:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13889437 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9F66F4207F; Sun, 1 Dec 2024 03:52:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025137; cv=none; b=GG1tWzBZmI81q23F3VGmTW7J6WNyleIhXBYZ/HeIpKwtDMj0iIDw69Q/VFebFV4nhH1Y1omB+dCd1yckkZ/WZvomhJNmSmMmkQerMAv8neADS5C7+MX01RT74WDIK5yM21h7Strv/cpuVq4zLUfCh+jh2Q5vSUDcJTYGnEYb6Xk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025137; c=relaxed/simple; bh=4S8FG+ypR0Ue5mxpFrXQ+K7j574sBZ4M9ovBehlBhdY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kT5wC+Vz3XqHq0lKrwfnU0h7WB9vlSDKHdo7dR4+6AULEoUIWaIYUOSg5WTLOIACsLgncGKQLMuS9SAyFHIJtQjWr+FW82rEVa7d/bWZVPo2P8ms+4AqAho3yimzf4KQmKv2te8zSlzuZed3CqnevYUwPkW8C8nZWlJ2Wk1acr8= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hv/RO5kv; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hv/RO5kv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733025135; x=1764561135; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4S8FG+ypR0Ue5mxpFrXQ+K7j574sBZ4M9ovBehlBhdY=; b=hv/RO5kvn15+2vODfyggu2N+bF8CMdYpO4/UBRy6o/QPvVu1ydJ+AiGJ 0tjyz618miUeO60iQkBS+8ae0wdWmurxFlNTLxw0/uvsnnx6vdpWos56M N8dTuqmHYUVUsCQXhsqCGNEhXCl5OXdhWvxaVg3GsMXopc7rM6MAdzdhr TpQxhPUP0/A4A22Xm6tb8PjznCKniVtB9kKhlacdh20LBJaJ2iA4qiTAQ jNHgoEG08b0+BirPiMw06827VAIyK0c9zOzBqg8la3RJf/XtTElHyx39B pGJBekNzlLIfGDdwlNfwvLQNABXYQwSqNhN5vKYIfjWjfTQEbpBmc8wos A==; X-CSE-ConnectionGUID: N8ZoiI1SSZeNkFeKmVCJ3g== X-CSE-MsgGUID: m9HUiSVVQ2m0fDEIt0xjog== X-IronPort-AV: E=McAfee;i="6700,10204,11272"; a="50725115" X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="50725115" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:15 -0800 X-CSE-ConnectionGUID: aQKE0rxWQGOtlw2ZxwcAGg== X-CSE-MsgGUID: DAWP3CAcQSW/lKIjVfoffg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="93257502" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:12 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, 
kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
Date: Sun, 1 Dec 2024 11:53:54 +0800
Message-ID: <20241201035358.2193078-6-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
References: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with a new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for handling.

A TD guest can use TDG.VP.VMCALL<ReportFatalError> to report a fatal error it has experienced. This hypercall is special because the TD guest is requesting termination and supplying the error information. KVM has to forward the hypercall to userspace regardless, so it does no sanity checks and lets userspace decide what to do.

Signed-off-by: Binbin Wu
---
Hypercalls exit to userspace breakout:
- Newly added. Implement one of the hypercalls that need to exit to userspace for handling after reverting "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to resolve Sean's comment.
  https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
---
 Documentation/virt/kvm/api.rst |  8 ++++++
 arch/x86/kvm/vmx/tdx.c         | 50 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 3 files changed, 59 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index edc070c6e19b..bb39da72c647 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6815,6 +6815,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
   #define KVM_SYSTEM_EVENT_WAKEUP 4
   #define KVM_SYSTEM_EVENT_SUSPEND 5
   #define KVM_SYSTEM_EVENT_SEV_TERM 6
+  #define KVM_SYSTEM_EVENT_TDX_FATAL 7
 			__u32 type;
 			__u32 ndata;
 			__u64 data[16];
@@ -6841,6 +6842,13 @@ Valid values for 'type' are:
    reset/shutdown of the VM.
  - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
    The guest physical address of the guest's GHCB is stored in `data[0]`.
+ - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest requested termination.
+   The error code reported via GHCI is stored in `data[0]`.
+   If bit 63 of `data[0]` is set, the TD has provided additional
+   information in a page of shared memory, and the guest physical
+   address of that information page is stored in `data[1]`.
+   An optional error message is provided in `data[2]` ~ `data[9]` as a
+   byte sequence, LSB filled first. Typically it holds ASCII (0x20-0x7e).
  - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
    KVM has recognized a wakeup event. Userspace may honor this event by
    marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 553f4cbe0693..a79f9ca962d1 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1093,6 +1093,54 @@ static int tdx_map_gpa(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
+{
+	u64 reg_mask = kvm_rcx_read(vcpu);
+	u64 *opt_regs;
+
+	/*
+	 * Skip sanity checks and let userspace decide what to do if sanity
+	 * checks fail.
+	 */
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
+	vcpu->run->system_event.ndata = 10;
+	/* Error codes. */
+	vcpu->run->system_event.data[0] = tdvmcall_a0_read(vcpu);
+	/* GPA of additional information page. */
+	vcpu->run->system_event.data[1] = tdvmcall_a1_read(vcpu);
+	/* Information passed via registers (up to 64 bytes). */
+	opt_regs = &vcpu->run->system_event.data[2];
+
+#define COPY_REG(REG, MASK)						\
+	do {								\
+		if (reg_mask & MASK)					\
+			*opt_regs = kvm_ ## REG ## _read(vcpu);		\
+		else							\
+			*opt_regs = 0;					\
+		opt_regs++;						\
+	} while (0)
+
+	/* The order is defined in GHCI. */
+	COPY_REG(r14, BIT_ULL(14));
+	COPY_REG(r15, BIT_ULL(15));
+	COPY_REG(rbx, BIT_ULL(3));
+	COPY_REG(rdi, BIT_ULL(7));
+	COPY_REG(rsi, BIT_ULL(6));
+	COPY_REG(r8, BIT_ULL(8));
+	COPY_REG(r9, BIT_ULL(9));
+	COPY_REG(rdx, BIT_ULL(2));
+
+	/*
+	 * Set the status code according to GHCI spec, although the vCPU may
+	 * not return back to guest.
+	 */
+	tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+
+	/* Forward request to userspace.
*/ + return 0; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1101,6 +1149,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) switch (tdvmcall_leaf(vcpu)) { case TDVMCALL_MAP_GPA: return tdx_map_gpa(vcpu); + case TDVMCALL_REPORT_FATAL_ERROR: + return tdx_report_fatal_error(vcpu); default: break; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 637efc055145..c173c8dfcf83 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -375,6 +375,7 @@ struct kvm_run { #define KVM_SYSTEM_EVENT_WAKEUP 4 #define KVM_SYSTEM_EVENT_SUSPEND 5 #define KVM_SYSTEM_EVENT_SEV_TERM 6 +#define KVM_SYSTEM_EVENT_TDX_FATAL 7 __u32 type; __u32 ndata; union { From patchwork Sun Dec 1 03:53:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13889438 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 706091465A0; Sun, 1 Dec 2024 03:52:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025141; cv=none; b=eZFiB9QR4MTrfLJH+Lj2h4wv9MY95ZKXNrEuJrob3GehIBs6kpmE3kfqQo3hseKfDURuTVBotNmrCqj8TenDlkVtMUMo5COXBI1tYF9I0ZxJfrSKg69sETjjeoLpDP+gPgQ6ulsZUzNJcWSBcruTpgjicwfaism4T0IwtyLAe6Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025141; c=relaxed/simple; bh=b3wjXe6z8VDuBOGKCupN/4dLqZbbP9sG+mhG8jZ+nlM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Bvp57J08rH8IiV3iHuIfx/5a8lQ4GYj607jhECH0bKJwrF+fFmfkeCy8BM0DriKtRcRPfXHjVRkf0lwb3Jn4zhJDSCBLoOH+3MuV6CxapENbGBHIJqfpsBtTaJR8mhJnQ86E4+nKzhChkOPMmo7+7T6DjJPVQ8WXIYKyEbwWBBM= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=YUzSXC8G; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="YUzSXC8G" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733025139; x=1764561139; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=b3wjXe6z8VDuBOGKCupN/4dLqZbbP9sG+mhG8jZ+nlM=; b=YUzSXC8GbuuuL0ICq9KxT9CJ2eKHLMq+djIdWw/3K7bFhBpnrw6pGrsH xpsdVhdWayzJJoXrb5xhgbTEzoWNww59KpjqJ7LWaH1UbM/0hREvv8KxP GvIP+ZNmlglnqHplA2/hsnHh8tzp04vGQfrMJxqOy476uw+BE/QaF9HY5 f8C653pYYCUYZd0h0XcMbTebeomP/LiWBVLJ4rS+Gv/sNGZp/AzrfIbTY Z+x23qtJoaG13IrXGCiFQYwDGoPK+GpWqRmPPVn/0k/daCxGewraeww8G scQDZYx3rLprwdfWJtwxInagSRrIkX/BASTFHMGJ+7IokWR7g6SgxhJ7g w==; X-CSE-ConnectionGUID: LIjbDL0qRn2b5+0krBSaiQ== X-CSE-MsgGUID: DWcdMBFDSruU+ZrpBT6lrw== X-IronPort-AV: E=McAfee;i="6700,10204,11272"; a="50725120" X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="50725120" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:19 -0800 X-CSE-ConnectionGUID: loy/5G7mRzajYPf+7efXGA== X-CSE-MsgGUID: B2Qqfq5HTZe+ajZHINK2dg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="93257518" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:16 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, 
kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall
Date: Sun, 1 Dec 2024 11:53:55 +0800
Message-ID: <20241201035358.2193078-7-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
References: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Emulate port I/O requested by a TDX guest via TDVMCALL with leaf Instruction.IO (same value as EXIT_REASON_IO_INSTRUCTION), as defined in the TDX Guest-Hypervisor Communication Interface (GHCI) specification.

All port I/O instructions inside a TDX guest trigger a #VE exception. On a #VE triggered by an I/O instruction, the TDX guest can call TDVMCALL with leaf Instruction.IO to request that the VMM emulate the instruction.

As with normal port I/O emulation, try to handle the port I/O in the kernel first; if the kernel cannot handle it, forward the request to userspace.

Note that string I/O operations are not supported in TDX. The guest should unroll them before calling the TDVMCALL.

Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Paolo Bonzini
---
Hypercalls exit to userspace breakout:
- Renamed from "KVM: TDX: Handle TDX PV port io hypercall" to "KVM: TDX: Handle TDX PV port I/O hypercall".
- Update changelog.
- Add missing curly brackets.
- Move reset of pio.count to tdx_complete_pio_out() and remove the stale comment. (binbin)
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Set status code to TDVMCALL_STATUS_SUCCESS when PIO is handled in kernel.
- Don't write to R11 when it is a write operation for output. v18: - Fix out case to set R10 and R11 correctly when user space handled port out. --- arch/x86/kvm/vmx/tdx.c | 66 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index a79f9ca962d1..495991407a95 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1141,6 +1141,70 @@ static int tdx_report_fatal_error(struct kvm_vcpu *vcpu) return 0; } +static int tdx_complete_pio_out(struct kvm_vcpu *vcpu) +{ + vcpu->arch.pio.count = 0; + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + return 1; +} + +static int tdx_complete_pio_in(struct kvm_vcpu *vcpu) +{ + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + unsigned long val = 0; + int ret; + + ret = ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size, + vcpu->arch.pio.port, &val, 1); + + WARN_ON_ONCE(!ret); + + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + tdvmcall_set_return_val(vcpu, val); + + return 1; +} + +static int tdx_emulate_io(struct kvm_vcpu *vcpu) +{ + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + unsigned long val = 0; + unsigned int port; + int size, ret; + bool write; + + ++vcpu->stat.io_exits; + + size = tdvmcall_a0_read(vcpu); + write = tdvmcall_a1_read(vcpu); + port = tdvmcall_a2_read(vcpu); + + if (size != 1 && size != 2 && size != 4) { + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + return 1; + } + + if (write) { + val = tdvmcall_a3_read(vcpu); + ret = ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1); + } else { + ret = ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1); + } + + if (ret) { + if (!write) + tdvmcall_set_return_val(vcpu, val); + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + } else { + if (write) + vcpu->arch.complete_userspace_io = tdx_complete_pio_out; + else + vcpu->arch.complete_userspace_io = tdx_complete_pio_in; + } + + return ret; +} + 
static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1151,6 +1215,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_map_gpa(vcpu); case TDVMCALL_REPORT_FATAL_ERROR: return tdx_report_fatal_error(vcpu); + case EXIT_REASON_IO_INSTRUCTION: + return tdx_emulate_io(vcpu); default: break; } From patchwork Sun Dec 1 03:53:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13889439 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18E4714A4F7; Sun, 1 Dec 2024 03:52:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.8 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025144; cv=none; b=j911RVWQUKgmDT5Bph3GVRf0hN3J6zrl9ugDq+iOvsnER4/Sikurq73fsj3KoxlJKfv9wTfvfscmflRVEaPJaEOJ9MYvy3S0yFOWzvRFvFPQFkwgLDi8ZhA+V8FiR7N4qhk2XQi3EBEj6d9dlGP4YGjeGZpkvvVGYXIjMetDCoY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733025144; c=relaxed/simple; bh=yf44A6rzGuTuDWAveUzXPHC/R2LKy+bxmXhtN5uqMNo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SS6N7KN9durPXgVdBJm9esnhbfN/XKzlfqjXeuhBAC5a2TDRimpkCjTuhUJ8JE+KU3QVvjqB0DauG92wVlzOZLi2T/MYwhfZbl+rryO9Gsd+taRxKpqEnmEMgyPv4ck6uRp54p0wCXhTiTRbSvmlAS5YxrB9y+PtIV70jWBquYE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=UMceMb5w; arc=none smtp.client-ip=192.198.163.8 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: 
smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="UMceMb5w" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733025143; x=1764561143; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yf44A6rzGuTuDWAveUzXPHC/R2LKy+bxmXhtN5uqMNo=; b=UMceMb5wwn84BGoIWWxQhv5KOir0lJ3DDY9YNnWPt9tORAC2Qzuc1ImB CulWxDKXMCT3v/JjsCIAxk9b0SiiKx0MZihljf/j0nqqo4d+DCMUbKdD7 /rbbs5LVNzVp3i1UEZSXuHvoeG2FDwo3FnrUDrQ7VStyb0CyoSb1HsXHc 2oxmlcQu0ChOvkaf3OA2zplW/iec2ZjuyieaD8sMbhx9PFN8pDutQeVV5 YKmjWUh5unEsx+EhCagWx13rA8VXOF3JP4nVWIaGXtwDY9Fb+9P2VHE1Q FLOGncTTCVNphtJnNSR20g+CD9+GemK///AfrlOBek+FgxrOe0R5Nr38Y w==; X-CSE-ConnectionGUID: g0URkc8SRCeIgQbwYmjKqw== X-CSE-MsgGUID: lsKbN4KgRsySC9z7MJ3pXQ== X-IronPort-AV: E=McAfee;i="6700,10204,11272"; a="50725125" X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="50725125" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:22 -0800 X-CSE-ConnectionGUID: YNRCVP2bSNSSLuSGTyocDg== X-CSE-MsgGUID: Sj3gc2scQwWhpPyfCJyN2w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,199,1728975600"; d="scan'208";a="93257538" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Nov 2024 19:52:19 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH 7/7] KVM: TDX: Handle TDX PV MMIO hypercall Date: 
Sun, 1 Dec 2024 11:53:56 +0800
Message-ID: <20241201035358.2193078-8-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
References: <20241201035358.2193078-1-binbin.wu@linux.intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Sean Christopherson

Handle the TDX PV MMIO hypercall, issued when a TDX guest calls TDVMCALL with the leaf #VE.RequestMMIO (same value as EXIT_REASON_EPT_VIOLATION), according to the TDX Guest-Hypervisor Communication Interface (GHCI) spec.

For TDX, the VMM is not allowed to access vCPU registers or the TD's private memory, where all executed instructions reside. So the MMIO emulation implemented for non-TDX VMs is not possible for a TDX guest.

In TDX, the MMIO regions are instead configured by the VMM to trigger a #VE exception in the guest. The #VE handler is supposed to emulate the MMIO instruction inside the guest and convert it into a TDVMCALL with the leaf #VE.RequestMMIO, which equals EXIT_REASON_EPT_VIOLATION.

The requested MMIO address must be in shared GPA space. The shared bit is stripped after the check because the existing MMIO emulation code is not aware of the shared bit.

The MMIO GPA shouldn't have a valid memslot, and the attribute of the GPA should be shared. KVM could do these checks before exiting to userspace; however, even if KVM does, there would still be race conditions between the check in KVM and the emulation of the MMIO access in userspace due to a memslot hotplug or a memory attribute conversion. If userspace doesn't check the attribute of the GPA and the attribute happens to be private, it will not pose a security risk or cause an MCE, but it can lead to another issue. E.g., in QEMU, treating a GPA with private attribute as shared when it falls within RAM's range can result in extra memory consumption while emulating the access to the HVA of the GPA.
There are two options: 1) Do the check both in KVM and userspace. 2) Do the check only in QEMU. This patch chooses option 2, i.e. KVM omits the memslot and attribute checks, and expects userspace to do the checks.

As with normal MMIO emulation, try to handle the MMIO in the kernel first; if the kernel cannot handle it, forward the request to userspace. Export the symbols needed for MMIO handling.

Fragment handling is not needed for TDX PV MMIO because the GPA is provided; if an MMIO access crosses a page boundary, it is contiguous in GPA. Also, the size is limited to 1, 2, 4 or 8 bytes, so no further split is needed. Allow cross-page accesses because no extra handling is needed after checking that both the start and end GPAs are shared GPAs.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Paolo Bonzini
---
Hypercalls exit to userspace breakout:
- Update the changelog.
- Remove the check of memslot for GPA.
- Allow MMIO access crossing page boundary.
- Move the tracepoint for KVM_TRACE_MMIO_WRITE earlier so the tracepoint handles the cases both for kernel and userspace. (Isaku)
- Set TDVMCALL return code when back from userspace, which is missing in v19.
- Move fast MMIO write into tdx_mmio_write()
- Check GPA is shared GPA. (Binbin)
- Remove extra check for size > 8u. (Binbin)
- Removed KVM_BUG_ON() in tdx_complete_mmio() and tdx_emulate_mmio()
- Removed vcpu->mmio_needed code since it's not used after removing KVM_BUG_ON().
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Use vt_is_tdx_private_gpa()
---
 arch/x86/kvm/vmx/tdx.c | 109 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c     |   1 +
 virt/kvm/kvm_main.c    |   1 +
 3 files changed, 111 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 495991407a95..50cfc795f01f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1205,6 +1205,113 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+static int tdx_complete_mmio(struct kvm_vcpu *vcpu)
+{
+	unsigned long val = 0;
+	gpa_t gpa;
+	int size;
+
+	if (!vcpu->mmio_is_write) {
+		gpa = vcpu->mmio_fragments[0].gpa;
+		size = vcpu->mmio_fragments[0].len;
+
+		memcpy(&val, vcpu->run->mmio.data, size);
+		tdvmcall_set_return_val(vcpu, val);
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	}
+
+	tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+	return 1;
+}
+
+static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size,
+				 unsigned long val)
+{
+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+		trace_kvm_fast_mmio(gpa);
+		return 0;
+	}
+
+	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val);
+	if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+	    kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size)
+{
+	unsigned long val;
+
+	if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+	    kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	tdvmcall_set_return_val(vcpu, val);
+	trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	return 0;
+}
+
+static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
+{
+	int size, write, r;
+	unsigned long val;
+	gpa_t gpa;
+
+	size = tdvmcall_a0_read(vcpu);
+	write = tdvmcall_a1_read(vcpu);
+	gpa = tdvmcall_a2_read(vcpu);
+	val = write ? tdvmcall_a3_read(vcpu) : 0;
+
+	if (size != 1 && size != 2 && size != 4 && size != 8)
+		goto error;
+	if (write != 0 && write != 1)
+		goto error;
+
+	/*
+	 * TDG.VP.VMCALL allows only shared GPA, it makes no sense to
+	 * do MMIO emulation for private GPA.
+	 */
+	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa) ||
+	    vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))
+		goto error;
+
+	gpa = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
+
+	if (write)
+		r = tdx_mmio_write(vcpu, gpa, size, val);
+	else
+		r = tdx_mmio_read(vcpu, gpa, size);
+	if (!r) {
+		/* Kernel completed device emulation. */
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+		return 1;
+	}
+
+	/* Request the device emulation to userspace device model. */
+	vcpu->mmio_is_write = write;
+	vcpu->arch.complete_userspace_io = tdx_complete_mmio;
+
+	vcpu->run->mmio.phys_addr = gpa;
+	vcpu->run->mmio.len = size;
+	vcpu->run->mmio.is_write = write;
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+	if (write) {
+		memcpy(vcpu->run->mmio.data, &val, size);
+	} else {
+		vcpu->mmio_fragments[0].gpa = gpa;
+		vcpu->mmio_fragments[0].len = size;
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL);
+	}
+	return 0;
+
+error:
+	tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1217,6 +1324,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_report_fatal_error(vcpu);
 	case EXIT_REASON_IO_INSTRUCTION:
 		return tdx_emulate_io(vcpu);
+	case EXIT_REASON_EPT_VIOLATION:
+		return tdx_emulate_mmio(vcpu);
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2eb660fed754..e155ae90e9fa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13987,6 +13987,7 @@ EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5901d03e372c..dc735d7b511b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5803,6 +5803,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
 	r = __kvm_io_bus_read(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
+EXPORT_SYMBOL_GPL(kvm_io_bus_read);
 
 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 			    int len, struct kvm_io_device *dev)
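
For readers following the operand checks in tdx_emulate_mmio(), here is a
minimal user-space sketch of the same validation order: size, direction,
shared-GPA check on both ends of the access, then stripping the shared
bit. It is not kernel code. SHARED_BIT, is_private_gpa() and
check_mmio_request() are hypothetical names used only for illustration,
and bit 51 is only an assumed position; the real shared bit depends on
the TD's configured GPA width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the shared bit is GPA bit 51 (illustrative only). */
#define SHARED_BIT (1ULL << 51)

/*
 * Hypothetical stand-in for vt_is_tdx_private_gpa(): in this sketch a
 * GPA is private when the assumed shared bit is clear.
 */
static bool is_private_gpa(uint64_t gpa)
{
	return !(gpa & SHARED_BIT);
}

/*
 * Validate the operands of an MMIO request in the same order as the
 * patch. On success, store the GPA with the shared bit stripped in
 * *plain_gpa and return 0. Return -1 for an invalid operand.
 */
static int check_mmio_request(uint64_t size, uint64_t write, uint64_t gpa,
			      uint64_t *plain_gpa)
{
	assert(plain_gpa != NULL);

	/* Only naturally sized accesses are accepted. */
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return -1;
	/* The direction operand is a strict boolean. */
	if (write != 0 && write != 1)
		return -1;
	/* Both the first and the last byte must be in shared GPA space. */
	if (is_private_gpa(gpa) || is_private_gpa(gpa + size - 1))
		return -1;

	/* The emulation code downstream is unaware of the shared bit. */
	*plain_gpa = gpa & ~SHARED_BIT;
	return 0;
}
```

Under these assumptions, a 2-byte read at SHARED_BIT | 0xfff passes even
though it crosses a page boundary, while an access whose last byte no
longer has the shared bit set is rejected.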