From patchwork Mon Aug 12 22:47:56 2024
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761099
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
 Isaku Yamahata
Subject: [PATCH 01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures
Date: Mon, 12 Aug 2024 15:47:56 -0700
Message-Id: <20240812224820.34826-2-rick.p.edgecombe@intel.com>
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
X-Mailing-List: kvm@vger.kernel.org

From: Isaku Yamahata

Add TDX's own VM and vCPU structures as placeholders to manage and run
TDX guests.  Also add helper functions to check whether a VM/vCPU is a
TDX one or a normal VMX one, and add helpers to convert between TDX
VM/vCPU and KVM VM/vCPU.

TDX protects guest VMs from a malicious host.  Unlike VMX guests, TDX
guests are crypto-protected.  KVM cannot access TDX guests' memory and
vCPU states directly.  Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a. TDX module SEAMCALLs) to
manage and run TDX guests.

In fact, the ways to manage and run TDX guests and normal VMX guests are
quite different.  Because of that, the current structures ('struct
kvm_vmx' and 'struct vcpu_vmx') used to manage VMX guests are not quite
suitable for TDX guests.  E.g., the majority of the members of 'struct
vcpu_vmx' don't apply to TDX guests.

Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests.  And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.
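
Editorial illustration, not part of the patch: a minimal user-space
sketch of the "embed the generic structure, convert with container_of()"
pattern described above.  The 'struct kvm' here is a tiny hypothetical
stand-in, the 'demo_' names are invented for the example, and the
container_of() macro is a simplified version without the kernel's type
checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real 'struct kvm' is far larger. */
enum demo_vm_type { DEMO_VM_VMX, DEMO_VM_TDX };

struct kvm {
	enum demo_vm_type vm_type;
};

struct kvm_tdx {
	struct kvm kvm;			/* generic part, embedded by value */
	unsigned long demo_private;	/* placeholder for TDX-specific state */
};

/*
 * In this sketch the embedded struct comes first, so both views share one
 * address.  container_of() would also work at a nonzero offset.
 */
static_assert(offsetof(struct kvm_tdx, kvm) == 0, "kvm is the first member");

static inline bool is_td(const struct kvm *kvm)
{
	return kvm->vm_type == DEMO_VM_TDX;
}

static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm)
{
	return container_of(kvm, struct kvm_tdx, kvm);
}

/* Round trip: hand out the generic pointer, then recover the wrapper. */
static inline bool demo_round_trip(void)
{
	struct kvm_tdx td = {
		.kvm = { .vm_type = DEMO_VM_TDX },
		.demo_private = 42,
	};
	struct kvm *generic = &td.kvm;

	return is_td(generic) &&
	       to_kvm_tdx(generic) == &td &&
	       to_kvm_tdx(generic)->demo_private == 42;
}
```

Common code only ever sees the generic pointer; TDX-specific code checks
is_td() first and only then converts, since the conversion itself cannot
tell whether the pointer really belongs to a TDX structure.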

As a result, TDX and VMX guests will have different VM sizes and vCPU
sizes/alignments.

Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating a guest.  With TDX
guests, ideally, KVM should allocate the VM structure based on the VM
type so that the precise size can be allocated for VMX and TDX guests.
But this requires more extensive code changes.  For now, simply choose
the maximum size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM
structure allocation for both VMX and TDX guests.  This results in a
small memory waste for each VM whose VM structure is the smaller one,
but this is acceptable.

For simplicity, use the same approach for vCPU allocation too.
Otherwise KVM would need to maintain a separate 'kvm_vcpu_cache' for
each VM type.

Note, updating 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops.  However, this
happens before TDX module initialization.  Therefore it is theoretically
possible that 'kvm_x86_ops::vm_size' is set to the size of 'struct
kvm_tdx' (when it's larger) but TDX actually fails to initialize at a
later time.

Again, the worst case of this is wasting a couple of bytes of memory for
each VM.  KVM could choose to update 'kvm_x86_ops::vm_size' at a later
time depending on TDX's status, but that would require the base KVM
module to export either kvm_x86_ops or kvm_ops_update().

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
uAPI breakout v1:
 - Re-add __always_inline to to_kvm_tdx(), to_tdx(). (Sean)
 - Fix bisectability issues in headers (Kai)
 - Add a comment around updating vt_x86_ops.vm_size.
 - Update the comment around updating vcpu_size/align:
   https://lore.kernel.org/kvm/25d2bf93854ae7410d82119227be3cb2ce47c4f2.camel@intel.com/
 - Refine changelog:
   https://lore.kernel.org/kvm/9c592801471a137c51f583065764fbfc3081c016.camel@intel.com/

v19:
 - correctly update ops.vm_size, vcpu_size and, vcpu_align by Xiaoyao

v14 -> v15:
 - use KVM_X86_TDX_VM
---
 arch/x86/kvm/vmx/main.c | 53 ++++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/tdx.c  |  2 +-
 arch/x86/kvm/vmx/tdx.h  | 49 +++++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1f821539910..21fae631c775 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -8,6 +8,39 @@
 #include "posted_intr.h"
 #include "tdx.h"
 
+static __init int vt_hardware_setup(void)
+{
+	int ret;
+
+	ret = vmx_hardware_setup();
+	if (ret)
+		return ret;
+
+	/*
+	 * Update vt_x86_ops::vm_size here so it is ready before
+	 * kvm_ops_update() is called in kvm_x86_vendor_init().
+	 *
+	 * Note, the actual bringing up of TDX must be done after
+	 * kvm_ops_update() because enabling TDX requires enabling
+	 * hardware virtualization first, i.e., all online CPUs must
+	 * be in post-VMXON state.  This means the @vm_size here
+	 * may be updated to TDX's size but TDX may fail to enable
+	 * at later time.
+	 *
+	 * The VMX/VT code could update kvm_x86_ops::vm_size again
+	 * after bringing up TDX, but this would require exporting
+	 * either kvm_x86_ops or kvm_ops_update() from the base KVM
+	 * module, which looks overkill.  Anyway, the worst case here
+	 * is KVM may allocate a couple more bytes than needed for
+	 * each VM.
+	 */
+	if (enable_tdx)
+		vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size,
+					   sizeof(struct kvm_tdx));
+
+	return 0;
+}
+
 #define VMX_REQUIRED_APICV_INHIBITS \
 	(BIT(APICV_INHIBIT_REASON_DISABLED) | \
 	 BIT(APICV_INHIBIT_REASON_ABSENT) | \
@@ -159,7 +192,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
-	.hardware_setup = vmx_hardware_setup,
+	.hardware_setup = vt_hardware_setup,
 	.handle_intel_pt_intr = NULL,
 
 	.runtime_ops = &vt_x86_ops,
@@ -176,6 +209,7 @@ module_exit(vt_exit);
 
 static int __init vt_init(void)
 {
+	unsigned vcpu_size, vcpu_align;
 	int r;
 
 	r = vmx_init();
@@ -185,12 +219,25 @@ static int __init vt_init(void)
 	/* tdx_init() has been taken */
 	tdx_bringup();
 
+	/*
+	 * TDX and VMX have different vCPU structures.  Calculate the
+	 * maximum size/align so that kvm_init() can use the larger
+	 * values to create the kvm_vcpu_cache.
+	 */
+	vcpu_size = sizeof(struct vcpu_vmx);
+	vcpu_align = __alignof__(struct vcpu_vmx);
+	if (enable_tdx) {
+		vcpu_size = max_t(unsigned, vcpu_size,
+				  sizeof(struct vcpu_tdx));
+		vcpu_align = max_t(unsigned, vcpu_align,
+				   __alignof__(struct vcpu_tdx));
+	}
+
 	/*
 	 * Common KVM initialization _must_ come last, after this, /dev/kvm is
 	 * exposed to userspace!
 	 */
-	r = kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx),
-		     THIS_MODULE);
+	r = kvm_init(vcpu_size, vcpu_align, THIS_MODULE);
 	if (r)
 		goto err_kvm_init;
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 99f579329de9..dbcc1ed80efa 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -7,7 +7,7 @@
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-static bool enable_tdx __ro_after_init;
+bool enable_tdx __ro_after_init;
 module_param_named(tdx, enable_tdx, bool, 0444);
 
 static enum cpuhp_state tdx_cpuhp_state;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 766a6121f670..e6a232d58e6a 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -4,9 +4,58 @@
 #ifdef CONFIG_INTEL_TDX_HOST
 void tdx_bringup(void);
 void tdx_cleanup(void);
+
+extern bool enable_tdx;
+
+struct kvm_tdx {
+	struct kvm kvm;
+	/* TDX specific members follow. */
+};
+
+struct vcpu_tdx {
+	struct kvm_vcpu vcpu;
+	/* TDX specific members follow. */
+};
+
+static inline bool is_td(struct kvm *kvm)
+{
+	return kvm->arch.vm_type == KVM_X86_TDX_VM;
+}
+
+static inline bool is_td_vcpu(struct kvm_vcpu *vcpu)
+{
+	return is_td(vcpu->kvm);
+}
+
+static __always_inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm)
+{
+	return container_of(kvm, struct kvm_tdx, kvm);
+}
+
+static __always_inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu)
+{
+	return container_of(vcpu, struct vcpu_tdx, vcpu);
+}
+
 #else
 static inline void tdx_bringup(void) {}
 static inline void tdx_cleanup(void) {}
+
+#define enable_tdx	0
+
+struct kvm_tdx {
+	struct kvm kvm;
+};
+
+struct vcpu_tdx {
+	struct kvm_vcpu vcpu;
+};
+
+static inline bool is_td(struct kvm *kvm) { return false; }
+static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; }
+static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) { return NULL; }
+static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) { return NULL; }
+
 #endif
 
 #endif

From patchwork Mon Aug 12 22:47:57 2024
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761098
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
 Isaku Yamahata, Sean Christopherson
Subject: [PATCH 02/25] KVM: TDX: Define TDX architectural definitions
Date: Mon, 12 Aug 2024 15:47:57 -0700
Message-Id: <20240812224820.34826-3-rick.p.edgecombe@intel.com>
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
X-Mailing-List: kvm@vger.kernel.org

From: Isaku Yamahata

Define architectural definitions for KVM to issue the TDX SEAMCALLs.

These are structures and values that are architecturally defined in the
ABI Reference chapter of the TDX module specifications.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
Reviewed-by: Xiaoyao Li
---
uAPI breakout v1:
 - Remove macros no longer needed due to reading metadata done in TDX
   host code:
   - Metadata field ID macros, bit definitions
   - TDX_MAX_NR_CPUID_CONFIGS
 - Drop unused defines (Kai)
 - Fix bisectability issues in headers (Kai)
 - Remove TDX_MAX_VCPUS define (Kai)
 - Remove unused TD_EXIT_OTHER_SMI_IS_MSMI define.
 - Move TDX vm type to separate patch
 - Move unions in tdx_arch.h to where they are introduced (Sean)

v19:
 - drop tdvmcall constants by Xiaoyao

v18:
 - Add metadata field id
---
 arch/x86/kvm/vmx/tdx.h      |   2 +
 arch/x86/kvm/vmx/tdx_arch.h | 158 ++++++++++++++++++++++++++++++++++++
 2 files changed, 160 insertions(+)
 create mode 100644 arch/x86/kvm/vmx/tdx_arch.h

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index e6a232d58e6a..1d6fa81a072d 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -1,6 +1,8 @@
 #ifndef __KVM_X86_VMX_TDX_H
 #define __KVM_X86_VMX_TDX_H
 
+#include "tdx_arch.h"
+
 #ifdef CONFIG_INTEL_TDX_HOST
 void tdx_bringup(void);
 void tdx_cleanup(void);
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
new file mode 100644
index 000000000000..413619dd92ef
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural constants/data definitions for TDX SEAMCALLs */
+
+#ifndef __KVM_X86_TDX_ARCH_H
+#define __KVM_X86_TDX_ARCH_H
+
+#include
+
+#define TDX_VERSION_SHIFT		16
+
+/*
+ * TDX SEAMCALL API function leaves
+ */
+#define TDH_VP_ENTER			0
+#define TDH_MNG_ADDCX			1
+#define TDH_MEM_PAGE_ADD		2
+#define TDH_MEM_SEPT_ADD		3
+#define TDH_VP_ADDCX			4
+#define TDH_MEM_PAGE_AUG		6
+#define TDH_MEM_RANGE_BLOCK		7
+#define TDH_MNG_KEY_CONFIG		8
+#define TDH_MNG_CREATE			9
+#define TDH_VP_CREATE			10
+#define TDH_MNG_RD			11
+#define TDH_MR_EXTEND			16
+#define TDH_MR_FINALIZE			17
+#define TDH_VP_FLUSH			18
+#define TDH_MNG_VPFLUSHDONE		19
+#define TDH_MNG_KEY_FREEID		20
+#define TDH_MNG_INIT			21
+#define TDH_VP_INIT			22
+#define TDH_VP_RD			26
+#define TDH_MNG_KEY_RECLAIMID		27
+#define TDH_PHYMEM_PAGE_RECLAIM		28
+#define TDH_MEM_PAGE_REMOVE		29
+#define TDH_MEM_SEPT_REMOVE		30
+#define TDH_SYS_RD			34
+#define TDH_MEM_TRACK			38
+#define TDH_MEM_RANGE_UNBLOCK		39
+#define TDH_PHYMEM_CACHE_WB		40
+#define TDH_PHYMEM_PAGE_WBINVD		41
+#define TDH_VP_WR			43
+
+/* TDX control structure (TDR/TDCS/TDVPS) field access codes */
+#define TDX_NON_ARCH			BIT_ULL(63)
+#define TDX_CLASS_SHIFT			56
+#define TDX_FIELD_MASK			GENMASK_ULL(31, 0)
+
+#define __BUILD_TDX_FIELD(non_arch, class, field)	\
+	(((non_arch) ? TDX_NON_ARCH : 0) |		\
+	 ((u64)(class) << TDX_CLASS_SHIFT) |		\
+	 ((u64)(field) & TDX_FIELD_MASK))
+
+#define BUILD_TDX_FIELD(class, field)			\
+	__BUILD_TDX_FIELD(false, (class), (field))
+
+#define BUILD_TDX_FIELD_NON_ARCH(class, field)		\
+	__BUILD_TDX_FIELD(true, (class), (field))
+
+
+/* Class code for TD */
+#define TD_CLASS_EXECUTION_CONTROLS	17ULL
+
+/* Class code for TDVPS */
+#define TDVPS_CLASS_VMCS		0ULL
+#define TDVPS_CLASS_GUEST_GPR		16ULL
+#define TDVPS_CLASS_OTHER_GUEST		17ULL
+#define TDVPS_CLASS_MANAGEMENT		32ULL
+
+enum tdx_tdcs_execution_control {
+	TD_TDCS_EXEC_TSC_OFFSET = 10,
+};
+
+/* @field is any of enum tdx_tdcs_execution_control */
+#define TDCS_EXEC(field)		BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (field))
+
+/* @field is the VMCS field encoding */
+#define TDVPS_VMCS(field)		BUILD_TDX_FIELD(TDVPS_CLASS_VMCS, (field))
+
+/* @field is any of enum tdx_guest_other_state */
+#define TDVPS_STATE(field)		BUILD_TDX_FIELD(TDVPS_CLASS_OTHER_GUEST, (field))
+#define TDVPS_STATE_NON_ARCH(field)	BUILD_TDX_FIELD_NON_ARCH(TDVPS_CLASS_OTHER_GUEST, (field))
+
+/* Management class fields */
+enum tdx_vcpu_guest_management {
+	TD_VCPU_PEND_NMI = 11,
+};
+
+/* @field is any of enum tdx_vcpu_guest_management */
+#define TDVPS_MANAGEMENT(field)		BUILD_TDX_FIELD(TDVPS_CLASS_MANAGEMENT, (field))
+
+#define TDX_EXTENDMR_CHUNKSIZE		256
+
+struct tdx_cpuid_value {
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+	u32 edx;
+} __packed;
+
+#define TDX_TD_ATTR_DEBUG		BIT_ULL(0)
+#define TDX_TD_ATTR_SEPT_VE_DISABLE	BIT_ULL(28)
+#define TDX_TD_ATTR_PKS			BIT_ULL(30)
+#define TDX_TD_ATTR_KL			BIT_ULL(31)
+#define TDX_TD_ATTR_PERFMON		BIT_ULL(63)
+
+/*
+ * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
+ */
+struct td_params {
+	u64 attributes;
+	u64 xfam;
+	u16 max_vcpus;
+	u8 reserved0[6];
+
+	u64 eptp_controls;
+	u64 exec_controls;
+	u16 tsc_frequency;
+	u8 reserved1[38];
+
+	u64 mrconfigid[6];
+	u64 mrowner[6];
+	u64 mrownerconfig[6];
+	u64 reserved2[4];
+
+	union {
+		DECLARE_FLEX_ARRAY(struct tdx_cpuid_value, cpuid_values);
+		u8 reserved3[768];
+	};
+} __packed __aligned(1024);
+
+/*
+ * Guest uses MAX_PA for GPAW when set.
+ * 0: GPA.SHARED bit is GPA[47]
+ * 1: GPA.SHARED bit is GPA[51]
+ */
+#define TDX_EXEC_CONTROL_MAX_GPAW	BIT_ULL(0)
+
+/*
+ * TDH.VP.ENTER, TDG.VP.VMCALL preserves RBP
+ * 0: RBP can be used for TDG.VP.VMCALL input. RBP is clobbered.
+ * 1: RBP can't be used for TDG.VP.VMCALL input. RBP is preserved.
+ */
+#define TDX_CONTROL_FLAG_NO_RBP_MOD	BIT_ULL(2)
+
+
+/*
+ * TDX requires the frequency to be defined in units of 25MHz, which is the
+ * frequency of the core crystal clock on TDX-capable platforms, i.e. the TDX
+ * module can only program frequencies that are multiples of 25MHz.  The
+ * frequency must be between 100MHz and 10GHz (inclusive).
+ */
+#define TDX_TSC_KHZ_TO_25MHZ(tsc_in_khz)	((tsc_in_khz) / (25 * 1000))
+#define TDX_TSC_25MHZ_TO_KHZ(tsc_in_25mhz)	((tsc_in_25mhz) * (25 * 1000))
+#define TDX_MIN_TSC_FREQUENCY_KHZ		(100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ		(10 * 1000 * 1000)
+
+#endif /* __KVM_X86_TDX_ARCH_H */

From patchwork Mon Aug 12 22:47:58 2024
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761100
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
 Sean Christopherson, Isaku Yamahata, Yuan Yao
Subject: [PATCH 03/25] KVM: TDX: Add TDX "architectural" error codes
Date: Mon, 12 Aug 2024 15:47:58 -0700
Message-Id: <20240812224820.34826-4-rick.p.edgecombe@intel.com>
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
X-Mailing-List: kvm@vger.kernel.org

From: Sean Christopherson

Add error codes for the TDX SEAMCALLs, both for the TDX VMM side (TDH
SEAMCALLs) and for the TDX guest side (TDG.VP.VMCALL).  KVM issues the
TDX SEAMCALLs and checks their error codes.  KVM also handles hypercalls
from the TDX guest and may return an error, so error codes for the TDX
guest are needed as well.

TDX SEAMCALLs use bits 31:0 to return more information, so these error
codes only exactly match RAX[63:32].  Error codes for TDG.VP.VMCALL are
defined by the TDX Guest-Host-Communication Interface spec.
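
Editorial illustration, not part of the patch: a minimal user-space
sketch of what "only exactly match RAX[63:32]" means for a caller.  The
three constants are the values this patch adds in tdx_errno.h; the
'demo_' helpers are hypothetical names invented here, and treating the
whole low 32 bits as the operand ID is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined by the tdx_errno.h added in this patch. */
#define TDX_SEAMCALL_STATUS_MASK	0xFFFFFFFF00000000ULL
#define TDX_OPERAND_BUSY		0x8000020000000000ULL
#define TDX_OPERAND_ID_TDR		0x80

/* The status constants leave bits 31:0 clear, so masking is lossless. */
static_assert((TDX_OPERAND_BUSY & ~TDX_SEAMCALL_STATUS_MASK) == 0,
	      "status code must not use the detail bits");

/* Hypothetical helper: keep only RAX[63:32], the part the codes match. */
static inline uint64_t demo_status_code(uint64_t rax)
{
	return rax & TDX_SEAMCALL_STATUS_MASK;
}

/*
 * Hypothetical helper: RAX[31:0] carries the extra detail.  Assumption
 * for this sketch only: the operand ID is the entire low half.
 */
static inline uint32_t demo_status_detail(uint64_t rax)
{
	return (uint32_t)(rax & ~TDX_SEAMCALL_STATUS_MASK);
}

/* "Operand busy, and the busy operand is the TDR". */
static inline bool demo_is_busy_on_tdr(uint64_t rax)
{
	return demo_status_code(rax) == TDX_OPERAND_BUSY &&
	       demo_status_detail(rax) == TDX_OPERAND_ID_TDR;
}
```

Comparing the raw RAX value against TDX_OPERAND_BUSY would miss this
case, because the operand ID in the low bits makes the two values differ.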

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
Reviewed-by: Yuan Yao
Reviewed-by: Xiaoyao Li
---
v19:
 - Drop TDX_EPT_WALK_FAILED, TDX_EPT_ENTRY_NOT_FREE
 - Rename TDG_VP_VMCALL_ => TDVMCALL_ to match the existing code
 - Move TDVMCALL error codes to shared/tdx.h
 - Added TDX_OPERAND_ID_TDR
 - Fix bisectability issues in headers (Kai)
---
 arch/x86/include/asm/shared/tdx.h |  6 ++++++
 arch/x86/kvm/vmx/tdx.h            |  1 +
 arch/x86/kvm/vmx/tdx_errno.h      | 36 +++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)
 create mode 100644 arch/x86/kvm/vmx/tdx_errno.h

diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index fdfd41511b02..6ebbf8ee80b3 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -28,6 +28,12 @@
 
 #define TDVMCALL_STATUS_RETRY		1
 
+/*
+ * TDG.VP.VMCALL Status Codes (returned in R10)
+ */
+#define TDVMCALL_SUCCESS		0x0000000000000000ULL
+#define TDVMCALL_INVALID_OPERAND	0x8000000000000000ULL
+
 /*
  * Bitmasks of exposed registers (with VMM).
  */
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 1d6fa81a072d..faed454385ca 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -2,6 +2,7 @@
 #define __KVM_X86_VMX_TDX_H
 
 #include "tdx_arch.h"
+#include "tdx_errno.h"
 
 #ifdef CONFIG_INTEL_TDX_HOST
 void tdx_bringup(void);
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
new file mode 100644
index 000000000000..dc3fa2a58c2c
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural status code for SEAMCALL */
+
+#ifndef __KVM_X86_TDX_ERRNO_H
+#define __KVM_X86_TDX_ERRNO_H
+
+#define TDX_SEAMCALL_STATUS_MASK		0xFFFFFFFF00000000ULL
+
+/*
+ * TDX SEAMCALL Status Codes (returned in RAX)
+ */
+#define TDX_NON_RECOVERABLE_VCPU		0x4000000100000000ULL
+#define TDX_INTERRUPTED_RESUMABLE		0x8000000300000000ULL
+#define TDX_OPERAND_INVALID			0xC000010000000000ULL
+#define TDX_OPERAND_BUSY			0x8000020000000000ULL
+#define TDX_PREVIOUS_TLB_EPOCH_BUSY		0x8000020100000000ULL
+#define TDX_PAGE_METADATA_INCORRECT		0xC000030000000000ULL
+#define TDX_VCPU_NOT_ASSOCIATED			0x8000070200000000ULL
+#define TDX_KEY_GENERATION_FAILED		0x8000080000000000ULL
+#define TDX_KEY_STATE_INCORRECT			0xC000081100000000ULL
+#define TDX_KEY_CONFIGURED			0x0000081500000000ULL
+#define TDX_NO_HKID_READY_TO_WBCACHE		0x0000082100000000ULL
+#define TDX_FLUSHVP_NOT_DONE			0x8000082400000000ULL
+#define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
+#define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
+
+/*
+ * TDX module operand ID, appears in 31:0 part of error code as
+ * detail information
+ */
+#define TDX_OPERAND_ID_RCX			0x01
+#define TDX_OPERAND_ID_TDR			0x80
+#define TDX_OPERAND_ID_SEPT			0x92
+#define TDX_OPERAND_ID_TD_EPOCH			0xa9
+
+#endif /* __KVM_X86_TDX_ERRNO_H */

From patchwork Mon Aug 12 22:47:59 2024
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761101
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com,
 Isaku Yamahata, Sean Christopherson, Binbin Wu, Yuan Yao
Subject: [PATCH 04/25] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module
Date: Mon, 12 Aug 2024 15:47:59 -0700
Message-Id: <20240812224820.34826-5-rick.p.edgecombe@intel.com>
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
X-Mailing-List: kvm@vger.kernel.org

From: Isaku Yamahata

A VMM interacts with the TDX module using a new instruction (SEAMCALL).
For instance, a TDX VMM does not have full access to the VM control structure corresponding to the VMX VMCS. Instead, a VMM induces the TDX module to act on its behalf via SEAMCALLs. Define C wrapper functions for SEAMCALLs for readability. Some SEAMCALL APIs donate host pages to the TDX module or a guest TD, and the donated pages are encrypted. Those APIs require the VMM to flush the cache lines to avoid cache line aliasing. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu Reviewed-by: Yuan Yao --- uAPI breakout v1: - Make argument to C wrapper function struct kvm_tdx * or struct vcpu_tdx *. (Sean) - Drop unused helpers (Kai) - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - Update the commit message to match the patch by Yuan - Use seamcall() and seamcall_ret() by Paolo v18: - removed stub functions for __seamcall{,_ret}() - Added Reviewed-by Binbin - Make tdx_seamcall() use struct tdx_module_args instead of taking each input. v15 -> v16: - use struct tdx_module_args instead of struct tdx_module_output - Add tdh_mem_sept_rd() for SEPT_VE_DISABLE=1. --- arch/x86/kvm/vmx/tdx.h | 14 +- arch/x86/kvm/vmx/tdx_ops.h | 387 +++++++++++++++++++++++++++++++++++++ 2 files changed, 399 insertions(+), 2 deletions(-) create mode 100644 arch/x86/kvm/vmx/tdx_ops.h diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index faed454385ca..78f84c53a948 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -12,12 +12,14 @@ extern bool enable_tdx; struct kvm_tdx { struct kvm kvm; - /* TDX specific members follow. */ + + unsigned long tdr_pa; }; struct vcpu_tdx { struct kvm_vcpu vcpu; - /* TDX specific members follow.
*/ + + unsigned long tdvpr_pa; }; static inline bool is_td(struct kvm *kvm) @@ -40,6 +42,14 @@ static __always_inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_tdx, vcpu); } +/* + * SEAMCALL wrappers + * + * Put it here as most of those wrappers need declaration of + * 'struct kvm_tdx' and 'struct vcpu_tdx'. + */ +#include "tdx_ops.h" + #else static inline void tdx_bringup(void) {} static inline void tdx_cleanup(void) {} diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h new file mode 100644 index 000000000000..a9b9ad15f6a8 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -0,0 +1,387 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Constants/data definitions for TDX SEAMCALLs + * + * This file is included by "tdx.h" after declarations of 'struct + * kvm_tdx' and 'struct vcpu_tdx'. C file should never include + * this header directly. + */ + +#ifndef __KVM_X86_TDX_OPS_H +#define __KVM_X86_TDX_OPS_H + +#include +#include +#include + +#include "x86.h" + +static inline u64 tdh_mng_addcx(struct kvm_tdx *kvm_tdx, hpa_t addr) +{ + struct tdx_module_args in = { + .rcx = addr, + .rdx = kvm_tdx->tdr_pa, + }; + + clflush_cache_range(__va(addr), PAGE_SIZE); + return seamcall(TDH_MNG_ADDCX, &in); +} + +static inline u64 tdh_mem_page_add(struct kvm_tdx *kvm_tdx, gpa_t gpa, + hpa_t hpa, hpa_t source, + u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa, + .rdx = kvm_tdx->tdr_pa, + .r8 = hpa, + .r9 = source, + }; + u64 ret; + + clflush_cache_range(__va(hpa), PAGE_SIZE); + ret = seamcall_ret(TDH_MEM_PAGE_ADD, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_mem_sept_add(struct kvm_tdx *kvm_tdx, gpa_t gpa, + int level, hpa_t page, + u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa | level, + .rdx = kvm_tdx->tdr_pa, + .r8 = page, + }; + u64 ret; + + clflush_cache_range(__va(page), PAGE_SIZE); + + ret = seamcall_ret(TDH_MEM_SEPT_ADD, &in); + + *rcx = 
in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_mem_sept_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa, + int level, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa | level, + .rdx = kvm_tdx->tdr_pa, + }; + u64 ret; + + ret = seamcall_ret(TDH_MEM_SEPT_REMOVE, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_vp_addcx(struct vcpu_tdx *tdx, hpa_t addr) +{ + struct tdx_module_args in = { + .rcx = addr, + .rdx = tdx->tdvpr_pa, + }; + + clflush_cache_range(__va(addr), PAGE_SIZE); + return seamcall(TDH_VP_ADDCX, &in); +} + +static inline u64 tdh_mem_page_aug(struct kvm_tdx *kvm_tdx, gpa_t gpa, hpa_t hpa, + u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa, + .rdx = kvm_tdx->tdr_pa, + .r8 = hpa, + }; + u64 ret; + + clflush_cache_range(__va(hpa), PAGE_SIZE); + ret = seamcall_ret(TDH_MEM_PAGE_AUG, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_mem_range_block(struct kvm_tdx *kvm_tdx, gpa_t gpa, + int level, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa | level, + .rdx = kvm_tdx->tdr_pa, + }; + u64 ret; + + ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_mng_key_config(struct kvm_tdx *kvm_tdx) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + }; + + return seamcall(TDH_MNG_KEY_CONFIG, &in); +} + +static inline u64 tdh_mng_create(struct kvm_tdx *kvm_tdx, int hkid) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + .rdx = hkid, + }; + + clflush_cache_range(__va(kvm_tdx->tdr_pa), PAGE_SIZE); + return seamcall(TDH_MNG_CREATE, &in); +} + +static inline u64 tdh_vp_create(struct vcpu_tdx *tdx) +{ + struct tdx_module_args in = { + .rcx = tdx->tdvpr_pa, + .rdx = to_kvm_tdx(tdx->vcpu.kvm)->tdr_pa, + }; + + clflush_cache_range(__va(tdx->tdvpr_pa), PAGE_SIZE); + return seamcall(TDH_VP_CREATE, &in); +} + +static inline u64 
tdh_mng_rd(struct kvm_tdx *kvm_tdx, u64 field, u64 *data) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + .rdx = field, + }; + u64 ret; + + ret = seamcall_ret(TDH_MNG_RD, &in); + + *data = in.r8; + + return ret; +} + +static inline u64 tdh_mr_extend(struct kvm_tdx *kvm_tdx, gpa_t gpa, + u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa, + .rdx = kvm_tdx->tdr_pa, + }; + u64 ret; + + ret = seamcall_ret(TDH_MR_EXTEND, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_mr_finalize(struct kvm_tdx *kvm_tdx) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + }; + + return seamcall(TDH_MR_FINALIZE, &in); +} + +static inline u64 tdh_vp_flush(struct vcpu_tdx *tdx) +{ + struct tdx_module_args in = { + .rcx = tdx->tdvpr_pa, + }; + + return seamcall(TDH_VP_FLUSH, &in); +} + +static inline u64 tdh_mng_vpflushdone(struct kvm_tdx *kvm_tdx) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + }; + + return seamcall(TDH_MNG_VPFLUSHDONE, &in); +} + +static inline u64 tdh_mng_key_freeid(struct kvm_tdx *kvm_tdx) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + }; + + return seamcall(TDH_MNG_KEY_FREEID, &in); +} + +static inline u64 tdh_mng_init(struct kvm_tdx *kvm_tdx, hpa_t td_params, + u64 *rcx) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + .rdx = td_params, + }; + u64 ret; + + ret = seamcall_ret(TDH_MNG_INIT, &in); + + *rcx = in.rcx; + + return ret; +} + +static inline u64 tdh_vp_init(struct vcpu_tdx *tdx, u64 rcx) +{ + struct tdx_module_args in = { + .rcx = tdx->tdvpr_pa, + .rdx = rcx, + }; + + return seamcall(TDH_VP_INIT, &in); +} + +static inline u64 tdh_vp_init_apicid(struct vcpu_tdx *tdx, u64 rcx, u32 x2apicid) +{ + struct tdx_module_args in = { + .rcx = tdx->tdvpr_pa, + .rdx = rcx, + .r8 = x2apicid, + }; + + /* apicid requires version == 1. 
*/ + return seamcall(TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT), &in); +} + +static inline u64 tdh_vp_rd(struct vcpu_tdx *tdx, u64 field, u64 *data) +{ + struct tdx_module_args in = { + .rcx = tdx->tdvpr_pa, + .rdx = field, + }; + u64 ret; + + ret = seamcall_ret(TDH_VP_RD, &in); + + *data = in.r8; + + return ret; +} + +static inline u64 tdh_mng_key_reclaimid(struct kvm_tdx *kvm_tdx) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + }; + + return seamcall(TDH_MNG_KEY_RECLAIMID, &in); +} + +static inline u64 tdh_phymem_page_reclaim(hpa_t page, u64 *rcx, u64 *rdx, + u64 *r8) +{ + struct tdx_module_args in = { + .rcx = page, + }; + u64 ret; + + ret = seamcall_ret(TDH_PHYMEM_PAGE_RECLAIM, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + *r8 = in.r8; + + return ret; +} + +static inline u64 tdh_mem_page_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa, + int level, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa | level, + .rdx = kvm_tdx->tdr_pa, + }; + u64 ret; + + ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_mem_track(struct kvm_tdx *kvm_tdx) +{ + struct tdx_module_args in = { + .rcx = kvm_tdx->tdr_pa, + }; + + return seamcall(TDH_MEM_TRACK, &in); +} + +static inline u64 tdh_mem_range_unblock(struct kvm_tdx *kvm_tdx, gpa_t gpa, + int level, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args in = { + .rcx = gpa | level, + .rdx = kvm_tdx->tdr_pa, + }; + u64 ret; + + ret = seamcall_ret(TDH_MEM_RANGE_UNBLOCK, &in); + + *rcx = in.rcx; + *rdx = in.rdx; + + return ret; +} + +static inline u64 tdh_phymem_cache_wb(bool resume) +{ + struct tdx_module_args in = { + .rcx = resume ? 
1 : 0, + }; + + return seamcall(TDH_PHYMEM_CACHE_WB, &in); +} + +static inline u64 tdh_phymem_page_wbinvd(hpa_t page) +{ + struct tdx_module_args in = { + .rcx = page, + }; + + return seamcall(TDH_PHYMEM_PAGE_WBINVD, &in); +} + +static inline u64 tdh_vp_wr(struct vcpu_tdx *tdx, u64 field, u64 val, u64 mask) +{ + struct tdx_module_args in = { + .rcx = tdx->tdvpr_pa, + .rdx = field, + .r8 = val, + .r9 = mask, + }; + + return seamcall(TDH_VP_WR, &in); +} + +#endif /* __KVM_X86_TDX_OPS_H */ From patchwork Mon Aug 12 22:48:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761102 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 47FA319AD94; Mon, 12 Aug 2024 22:48:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502915; cv=none; b=Fb6P1h7uIQKDTvH9VnKIphxjzlze4vT+TsTE3s/WMIWywex5Ck8sRKAB2e7YkikEDt4Sd22G9IDk2PJuQhK6VayS2fGyshQyUPblSN3CHOSHqm7SUsawiXU+HqlbJRvrKdXfu7mKXHbinD+gmgU8ZzU3fDfLAPvZtsNyxZRIoWY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502915; c=relaxed/simple; bh=VT61EAOEqeW+bKLIJKXCdw1CBLyvciFpNO6T7mZ4HlA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=g8EVUieiytb4Mg/QpxVQwPJqPj/knELiXpMgjq2Hx9dUMpVX5MGZwJ24SJKsxdN9yavsW7c3ofMdK45GaHMAjXHKJNx0dGBMs3ckF6nlYokDxGh51ruo/sPlfeRPLDVaccIhOoscSl3emysm0vWThapWqnbaePHumPjLUC2s9N4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=g5vsVRUt; arc=none 
smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="g5vsVRUt" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502913; x=1755038913; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VT61EAOEqeW+bKLIJKXCdw1CBLyvciFpNO6T7mZ4HlA=; b=g5vsVRUtnre2WpH9S8mssYXLH9En9XwMuSEXVLd84YiZ8sg5nvkwl4up 8mSUBBLLyS9nes9AYAklkUTkOklxJ1iJh+jfpKLYLuZ5NlVD1fLaebko+ Tn2cYKoQi+2hNMu86n5glrhV+vp0XQFU9GDa7PK5VqLZEAciAWE1yTMEm R7kXQeTTFs3tAhJh8op6CLaitfxrUBQfNIv0yFzpD1FYQ6XapG+hlcQwJ xz/YED4WdvYWP91St0rH7IdHMVkEHk6l8iI48LO5OUDjiqRqldmNdKgsj iS1qrktD4Dfu1SmgZ0K2Tf/nCMjDpWwWGdAv/X8bPBC7TH7xne+bbC0nF Q==; X-CSE-ConnectionGUID: /lWNKRrLQqO8dELR37VNyQ== X-CSE-MsgGUID: RTVlqtBMQgSdkE8BB9dm9Q== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041352" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041352" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:29 -0700 X-CSE-ConnectionGUID: CDc8z0VuRoWMRQKexj6BbA== X-CSE-MsgGUID: +hwN9Uu7Si2edeKMx6kh/Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008352" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:28 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata , Binbin Wu , Yuan Yao Subject: [PATCH 05/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error Date: Mon, 12 Aug 2024 15:48:00 -0700 Message-Id: <20240812224820.34826-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add helper functions to print out errors from the TDX module in a uniform manner. Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu Reviewed-by: Yuan Yao --- uAPI breakout v1: - Update for the wrapper functions for SEAMCALLs. (Sean) - Reorder header file include to adjust argument change of the C wrapper. - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - dropped unnecessary include v18: - Added Reviewed-by Binbin. --- arch/x86/kvm/vmx/tdx_ops.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index a9b9ad15f6a8..3f64c871a3f2 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -16,6 +16,21 @@ #include "x86.h" +#define pr_tdx_error(__fn, __err) \ + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err) + +#define pr_tdx_error_N(__fn, __err, __fmt, ...) 
\ + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx, " __fmt, #__fn, __err, __VA_ARGS__) + +#define pr_tdx_error_1(__fn, __err, __rcx) \ + pr_tdx_error_N(__fn, __err, "rcx 0x%llx\n", __rcx) + +#define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \ + pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx) + +#define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \ + pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8) + static inline u64 tdh_mng_addcx(struct kvm_tdx *kvm_tdx, hpa_t addr) { struct tdx_module_args in = { From patchwork Mon Aug 12 22:48:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761103 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C5D4919ADBE; Mon, 12 Aug 2024 22:48:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502915; cv=none; b=r4C/doqC+UIzmuFMJ7shK01ROPzQpqgHh9imr834GsRorze/wDUK+JSVE1Ogjo0Cglkt2+s3yP950Iqb8u5ASW9SlpmvQzxsp6Oo0mnhBGeaKwBM6DmHDJfggJLGTSVM0/CZANoJZk/3FwWVHKqxnaBTi6aZZjtWC/+h2PBzDzc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502915; c=relaxed/simple; bh=Aa0gvO6cAMnN8Becpb5toQUl9zgCOERQkYWJR9wRO2k=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=qR43YRwIcQMBg+OS75FV+LFx/FYD6a5jPeJPDsG1Huoe02EHtjYF2G7Y958cNHdln42xGGfpER5KCUHr0tIwS1ODFh9NPRWTgGRtg14DRAnVPNQKNvfMBsmNnjZdO41Dq0uuQyZcWJH2B96tVfWld1+7hzZ4iR2vI6ly88S4Gmg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) 
header.d=intel.com header.i=@intel.com header.b=P9XROs6P; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="P9XROs6P" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502914; x=1755038914; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Aa0gvO6cAMnN8Becpb5toQUl9zgCOERQkYWJR9wRO2k=; b=P9XROs6PgZam1ffjYKPdpR+PCcMu9wYycBDCnmphyF3XCA539HmBZdf/ yc2P0g6mEFh8Kc3KMVGA1eaBkIQ3HK7YbFm3LaWJ7XJPv8nW6nbf94I7B fzOK6rTMQcSrHOAFs70883HBQEsH2JSzJvO7PWXgXNbc1/uitx8f8Nh0m 5i8U8v7zIE4Q1Kcvv+5lCnB3zY89WE2qfKo8JvbxBxexNoFgseFy7fOgV wgHLApOXTt9vpxsfwfLFyiklV9kAVL9pDMoKDLt9eNu5UccBxpDMkmaa4 ndmO8kIj3vCWS3YVwDC0HDFcFb3wdZhhtfKpg/lxDaiTZKtyh/Fj5n1oN g==; X-CSE-ConnectionGUID: Kd3/HCQfRGiSNsuiwoX5Ww== X-CSE-MsgGUID: tEnlHu2SRK6S/obgOkx/Zw== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041360" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041360" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:29 -0700 X-CSE-ConnectionGUID: /OYfftySRbGn/UY7hpPNMA== X-CSE-MsgGUID: R7WjzlVjSvKSUI9RKYjVgQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008360" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:29 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com Subject: [PATCH 06/25] x86/virt/tdx: Export TDX KeyID information Date: Mon, 12 Aug 2024 15:48:01 -0700 Message-Id: <20240812224820.34826-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Kai Huang Each TDX guest must be protected by its own unique TDX KeyID. KVM will need to tell the TDX module the unique KeyID for a TDX guest when KVM creates it. Export the range of TDX KeyIDs available to TDX guests so that KVM can use it. KVM can then manage these KeyIDs and assign one to each TDX guest when it is created. Each TDX guest has a root control structure called "Trust Domain Root" (TDR). Unlike the rest of the TDX guest, the TDR is protected by the TDX global KeyID. When tearing down the TDR, KVM will need to pass the TDX global KeyID explicitly to the TDX module to flush the cache associated with the TDR. Also export the TDX global KeyID for KVM to tear down the TDR.
Signed-off-by: Kai Huang Signed-off-by: Rick Edgecombe --- uAPI breakout v1: - New patch --- arch/x86/include/asm/tdx.h | 4 ++++ arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++--- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 56c3a5512c22..8e0eef4f74f5 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -176,6 +176,10 @@ struct tdx_sysinfo { const struct tdx_sysinfo *tdx_get_sysinfo(void); +extern u32 tdx_global_keyid; +extern u32 tdx_guest_keyid_start; +extern u32 tdx_nr_guest_keyids; + u64 __seamcall(u64 fn, struct tdx_module_args *args); u64 __seamcall_ret(u64 fn, struct tdx_module_args *args); u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 478d894f46a2..96640cfb1830 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -39,9 +39,14 @@ #include #include "tdx.h" -static u32 tdx_global_keyid __ro_after_init; -static u32 tdx_guest_keyid_start __ro_after_init; -static u32 tdx_nr_guest_keyids __ro_after_init; +u32 tdx_global_keyid __ro_after_init; +EXPORT_SYMBOL_GPL(tdx_global_keyid); + +u32 tdx_guest_keyid_start __ro_after_init; +EXPORT_SYMBOL_GPL(tdx_guest_keyid_start); + +u32 tdx_nr_guest_keyids __ro_after_init; +EXPORT_SYMBOL_GPL(tdx_nr_guest_keyids); static DEFINE_PER_CPU(bool, tdx_lp_initialized); From patchwork Mon Aug 12 22:48:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761104 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D0C5319B3C4; Mon, 12 Aug 2024 22:48:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502916; cv=none; b=QDdGYdvMv5nfR7t97TbnDwWHkVeChvvClFnrGXh0UkzbZookAnyR2zcO6Ex+dKCYScp68srCivVi7RaFF0I0rmSZEtzyJp9KDH4pG2VUpakSCDvnSYg/gGr8rw3xUvSQ8xqpOs5C177HDmPsPezytVGy0hu07+xGGWzbP4gsskI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502916; c=relaxed/simple; bh=t1hPqRdy3HbA5wWiotcscwrx1WvzkmojXvu4BmnMQ9o=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=C1+GY0bOTBkQmpuc66xHI5axyIGvRsnx8BH8+Tn5uGnp0PNK7HM//2ZBxF8CCFCoy+mZAScds0XSOsXm0pX3/GcdRFLNcBWrc7TGTCPvX8Po+lGQOpwf75usDtTFtwbP1eQUriuRtm33vPN1UPOzNRXlYCchA+70xMMA7SWF5Fk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Tohj3idk; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Tohj3idk" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502914; x=1755038914; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=t1hPqRdy3HbA5wWiotcscwrx1WvzkmojXvu4BmnMQ9o=; b=Tohj3idkKlKCCVu8mR7zc/XOwcbzaZ1v1tvvhv9vIhphizRbyhmuBZJQ WWjETSzCke+IRy9jVWQ1UmqD6vAsP30zjX7OASN8GVtkT26QgD3lnYAsn Qgy9ZYjV2oEnNmrttqae5S2f90cKQIZuwvpmpQSZ9QZNfmNMORRWt45cy aM2c0XuTw35hz7AbU8pzuM+XLbS4oztxHATtmS8rsIPiyNiHzrRX6vcXy 8BKVsdDe0DpOaA4HOfPQXf/oVxNqzVxoUgsUD9vS0dZgvnR4LvwpfUkd3 sFvlihUykOb3g0FyCu/hVEkrIRWKESN/FacrkEa8SZtkpPbcoYuYq3G+c w==; X-CSE-ConnectionGUID: tCaosbcRRnSBFe7euyWJ6g== X-CSE-MsgGUID: 
L2cieGPyQPKN5kpxX7fmbg== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041369" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041369" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:30 -0700 X-CSE-ConnectionGUID: mU5j0ngTSNikdasniSbsHg== X-CSE-MsgGUID: U0Hxk3AwQOSh/3v3YTcnxw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008375" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:30 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata Subject: [PATCH 07/25] KVM: TDX: Add helper functions to allocate/free TDX private host key id Date: Mon, 12 Aug 2024 15:48:02 -0700 Message-Id: <20240812224820.34826-8-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add helper functions to allocate and free a TDX private host key ID (HKID). The memory controller encrypts TDX memory with the assigned HKIDs. Each TDX guest must be protected by its own unique TDX HKID. The hardware has a fixed set of HKIDs. Some of them are set aside for use by other TDX components, but most are reserved for guest use. The code that does this partitioning records the range available for guest use in the tdx_guest_keyid_start and tdx_nr_guest_keyids variables.
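The guest KeyID range described above can be modeled in userspace as a small sketch. This is an illustration only, not the kernel code: struct keyid_pool, keyid_pool_init(), keyid_pool_alloc(), keyid_pool_free() and KEYID_POOL_MAX are hypothetical names, a fixed-size array stands in for whatever allocator the kernel uses, and the start and count passed in are made-up values. The point it shows is that a (start, nr) pair describes the inclusive range [start, start + nr - 1] and that each ID is handed out to at most one owner at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on the pool size, chosen only for this sketch. */
#define KEYID_POOL_MAX 64

/* Userspace stand-in for a guest KeyID pool; not a kernel structure. */
struct keyid_pool {
	uint32_t start;            /* first KeyID usable by guests */
	uint32_t nr;               /* number of guest KeyIDs */
	bool used[KEYID_POOL_MAX]; /* used[i] tracks KeyID start + i */
};

static void keyid_pool_init(struct keyid_pool *pool, uint32_t start, uint32_t nr)
{
	assert(nr <= KEYID_POOL_MAX);
	memset(pool, 0, sizeof(*pool));
	pool->start = start;
	pool->nr = nr;
}

/* Return the lowest free KeyID in [start, start + nr - 1], or -1 if none. */
static int keyid_pool_alloc(struct keyid_pool *pool)
{
	for (uint32_t i = 0; i < pool->nr; i++) {
		if (!pool->used[i]) {
			pool->used[i] = true;
			return (int)(pool->start + i);
		}
	}
	return -1;
}

/* Release a KeyID; return 0 on success, -1 if it is not an allocated ID of this pool. */
static int keyid_pool_free(struct keyid_pool *pool, int keyid)
{
	uint32_t idx;

	if (keyid < 0 || (uint32_t)keyid < pool->start)
		return -1;
	idx = (uint32_t)keyid - pool->start;
	if (idx >= pool->nr || !pool->used[idx])
		return -1;
	pool->used[idx] = false;
	return 0;
}
```

With a pool initialized as keyid_pool_init(&pool, 32, 3), successive allocations return 32, 33 and 34, and a fourth fails until one of the IDs is freed.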
Use this range of HKIDs reserved for guest use with the kernel's IDA allocator to create a mini TDX HKID allocator that can be called when setting up a TD. This gives each TD an exclusive HKID, as is required. This allocator will be used in future changes. Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- uAPI breakout v1: - Update the commit message - Delete stale comment on global hkid - Deleted WARN_ON_ONCE() as it didn't seem very useful v19: - Removed stale comment in tdx_guest_keyid_alloc() by Binbin - Update sanity check in tdx_guest_keyid_free() by Binbin v18: - Moved the functions to kvm tdx from arch/x86/virt/vmx/tdx/ - Drop exporting symbols as the host tdx does. --- arch/x86/kvm/vmx/tdx.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index dbcc1ed80efa..b1c885ce8c9c 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -14,6 +14,21 @@ static enum cpuhp_state tdx_cpuhp_state; static const struct tdx_sysinfo *tdx_sysinfo; +/* TDX KeyID pool */ +static DEFINE_IDA(tdx_guest_keyid_pool); + +static int __used tdx_guest_keyid_alloc(void) +{ + return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start, + tdx_guest_keyid_start + tdx_nr_guest_keyids - 1, + GFP_KERNEL); +} + +static void __used tdx_guest_keyid_free(int keyid) +{ + ida_free(&tdx_guest_keyid_pool, keyid); +} + static int tdx_online_cpu(unsigned int cpu) { unsigned long flags; From patchwork Mon Aug 12 22:48:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761106 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 63B4F19D06E; Mon, 12 Aug 2024 22:48:36 +0000 (UTC) Authentication-Results:
smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502918; cv=none; b=cIoRiTV0Ipm3JYdPMCFPXHm+P8Tl43fUmHIaKiMv9WLyygqC+kZuJN8IvdqPyCEKT4cPXRNX1Dv8Gp5Sc5Y/nfshO6gxGygQSgydKdQQ2IsYDCHYn9IDPvWih2fWk780x4m0XollzCVXqQ7bZo0V1+KVEH9h1q2w2tQcIUVK9vE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502918; c=relaxed/simple; bh=Ip2xcWFpC3D5OZKePr1UW34/GHSzjwOxZgWLMHIv1ZY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ondjujQFE+KyrpNSueVEaq+92kYwKehkHmC3HxsGq0gPJHMakgkPKpIPXC/DzFhjQ2/TEJ7Jiqg8dVxJ4LKzDm+/jd2bb5NNCWOK5KrVBbtO0KY6gMK4fAo0XMDpXeEWLJaKvPAe8evxowaPgb0pGb9fJTcKJ5j5kJn4RfoU5X4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ANzz2u77; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ANzz2u77" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502916; x=1755038916; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ip2xcWFpC3D5OZKePr1UW34/GHSzjwOxZgWLMHIv1ZY=; b=ANzz2u77S9vZVIvh81qtr4EdCwvpG4yboul+E2zGwnFl+5WTld5VrvCI 7Kr5d8CIWH/7iD295tnu0mAiFoyOTGk3phkwn9YKlCrusq4hDgCsLM9YQ AxK9DsGwzlFKFQV8UULks/AmT++QqCeo75dNJGCJUsQ/1VZ4C16ha0EGZ SvBtNNFDETHvs9KQQ1kbb3Xd2h46T8QFmxJ8PTElUoFNgWUICc06GpWpm FPnPD6jnEJz2wPExsXcN8jhN27pBnGnMpRUKjxvSjYRaX361Zx9JwNY1E mMr3lqukYST+FyQIynW8aprNwM5GJ2NEA3D6Ivb424pOnVlrDteltf6pD Q==; X-CSE-ConnectionGUID: 
sKSj8v3xSWafKkC2LrIclA== X-CSE-MsgGUID: JXZv1tU2R7yZwglnaB5FPw== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041379" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041379" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:30 -0700 X-CSE-ConnectionGUID: /zEa/GbWSKqZgVKsrusMSg== X-CSE-MsgGUID: r87SrUY4TVuwoapcXFtd8w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008385" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:30 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata Subject: [PATCH 08/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Date: Mon, 12 Aug 2024 15:48:03 -0700 Message-Id: <20240812224820.34826-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata KVM_MEMORY_ENCRYPT_OP was introduced for VM-scoped operations specific to guest state-protected VMs. It defines subcommands for technology-specific operations. Despite its name, the subcommands are not limited to memory encryption; various technology-specific operations are defined under it. It's natural to repurpose KVM_MEMORY_ENCRYPT_OP for TDX-specific operations and define subcommands for them. Add a placeholder function for the TDX-specific VM-scoped ioctl as mem_enc_op.
TDX specific sub-commands will be added to retrieve/pass TDX specific parameters. Make mem_enc_ioctl non-optional as it's always filled. Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- uAPI breakout v1: - rename error->hw_error (Kai) - Include "x86_ops.h" in tdx.c as the patch to initialize TDX module doesn't include it anymore. - Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h - Drop middle paragraph in the commit log (Tony) v15: - change struct kvm_tdx_cmd to drop unused member. --- arch/x86/include/asm/kvm-x86-ops.h | 2 +- arch/x86/include/uapi/asm/kvm.h | 26 ++++++++++++++++++++++++ arch/x86/kvm/vmx/main.c | 10 ++++++++++ arch/x86/kvm/vmx/tdx.c | 32 ++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 6 ++++++ arch/x86/kvm/x86.c | 4 ---- 6 files changed, 75 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index af58cabcf82f..538f50eee86d 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -123,7 +123,7 @@ KVM_X86_OP(leave_smm) KVM_X86_OP(enable_smi_window) #endif KVM_X86_OP_OPTIONAL(dev_get_attr) -KVM_X86_OP_OPTIONAL(mem_enc_ioctl) +KVM_X86_OP(mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index cba4351b3091..d91f1bad800e 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -926,4 +926,30 @@ struct kvm_hyperv_eventfd { #define KVM_X86_SNP_VM 4 #define KVM_X86_TDX_VM 5 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES = 0, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* flags for sub-command. If sub-command doesn't use this, set zero. */ + __u32 flags; + /* + * data for each sub-command.
An immediate or a pointer to the actual
+	 * data in process virtual address.  If sub-command doesn't use it,
+	 * set zero.
+	 */
+	__u64 data;
+	/*
+	 * Auxiliary error code.  The sub-command may return TDX SEAMCALL
+	 * status code in addition to -Exxx.
+	 * Defined for consistency with struct kvm_sev_cmd.
+	 */
+	__u64 hw_error;
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 21fae631c775..59f4d2d42620 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -41,6 +41,14 @@ static __init int vt_hardware_setup(void)
 	return 0;
 }
 
+static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
+{
+	if (!is_td(kvm))
+		return -ENOTTY;
+
+	return tdx_vm_ioctl(kvm, argp);
+}
+
 #define VMX_REQUIRED_APICV_INHIBITS \
 	(BIT(APICV_INHIBIT_REASON_DISABLED) | \
 	 BIT(APICV_INHIBIT_REASON_ABSENT) | \
@@ -189,6 +197,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
 
 	.get_untagged_addr = vmx_get_untagged_addr,
+
+	.mem_enc_ioctl = vt_mem_enc_ioctl,
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b1c885ce8c9c..de14e80d8f3a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include "capabilities.h"
+#include "x86_ops.h"
 #include "tdx.h"
 
 #undef pr_fmt
@@ -29,6 +30,37 @@ static void __used tdx_guest_keyid_free(int keyid)
 	ida_free(&tdx_guest_keyid_pool, keyid);
 }
 
+int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_tdx_cmd tdx_cmd;
+	int r;
+
+	if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd)))
+		return -EFAULT;
+
+	/*
+	 * Userspace should never set @hw_error.  It is used to fill
+	 * hardware-defined error by the kernel.
+	 */
+	if (tdx_cmd.hw_error)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	switch (tdx_cmd.id) {
+	default:
+		r = -EINVAL;
+		goto out;
+	}
+
+	if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd)))
+		r = -EFAULT;
+
+out:
+	mutex_unlock(&kvm->lock);
+	return r;
+}
+
 static int tdx_online_cpu(unsigned int cpu)
 {
 	unsigned long flags;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 133afc4d196e..c69ca640abe6 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -118,4 +118,10 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
 #endif
 void vmx_setup_mce(struct kvm_vcpu *vcpu);
 
+#ifdef CONFIG_INTEL_TDX_HOST
+int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
+#else
+static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
+#endif
+
 #endif /* __KVM_X86_VMX_X86_OPS_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a8944266a54d..7914ea50fd04 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7313,10 +7313,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		goto out;
 	}
 	case KVM_MEMORY_ENCRYPT_OP: {
-		r = -ENOTTY;
-		if (!kvm_x86_ops.mem_enc_ioctl)
-			goto out;
-
 		r = kvm_x86_call(mem_enc_ioctl)(kvm, argp);
 		break;
 	}

From patchwork Mon Aug 12 22:48:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761105
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata, Binbin Wu
Subject: [PATCH 09/25] KVM: TDX: Get system-wide info about TDX module on initialization
Date: Mon, 12 Aug 2024 15:48:04 -0700
Message-Id: <20240812224820.34826-10-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Isaku Yamahata

TDX KVM needs system-wide information about the TDX module.  Store it in
struct tdx_info, and release the allocated memory on module unloading via
the hardware_unsetup() callback.

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Binbin Wu
---
uAPI breakout v1:
 - Mention about hardware_unsetup(). (Binbin)
 - Added Reviewed-by. (Binbin)
 - Eliminated tdx_md_read(). (Kai)
 - Include "x86_ops.h" to tdx.c as the patch to initialize TDX module
   doesn't include it anymore.
 - Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h

v19:
 - Added features0
 - Use tdx_sys_metadata_read()
 - Fix error recovery path by Yuan

Change v18:
 - Newly Added
---
 arch/x86/include/uapi/asm/kvm.h | 28 +++++++++++++
 arch/x86/kvm/vmx/tdx.c          | 70 +++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d91f1bad800e..47caf508cca7 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -952,4 +952,32 @@ struct kvm_tdx_cmd {
 	__u64 hw_error;
 };
 
+#define KVM_TDX_CPUID_NO_SUBLEAF	((__u32)-1)
+
+struct kvm_tdx_cpuid_config {
+	__u32 leaf;
+	__u32 sub_leaf;
+	__u32 eax;
+	__u32 ebx;
+	__u32 ecx;
+	__u32 edx;
+};
+
+/* supported_gpaw */
+#define TDX_CAP_GPAW_48	(1 << 0)
+#define TDX_CAP_GPAW_52	(1 << 1)
+
+struct kvm_tdx_capabilities {
+	__u64 attrs_fixed0;
+	__u64 attrs_fixed1;
+	__u64 xfam_fixed0;
+	__u64 xfam_fixed1;
+	__u32 supported_gpaw;
+	__u32 padding;
+	__u64 reserved[251];
+
+	__u32 nr_cpuid_configs;
+	struct kvm_tdx_cpuid_config cpuid_configs[];
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index de14e80d8f3a..90b44ebaf864 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3,6 +3,7 @@
 #include
 #include "capabilities.h"
 #include "x86_ops.h"
+#include "mmu.h"
 #include "tdx.h"
 
 #undef pr_fmt
@@ -30,6 +31,72 @@ static void __used tdx_guest_keyid_free(int keyid)
 	ida_free(&tdx_guest_keyid_pool, keyid);
 }
 
+static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
+{
+	const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+	struct kvm_tdx_capabilities __user *user_caps;
+	struct kvm_tdx_capabilities *caps = NULL;
+	int i, ret = 0;
+
+	/* flags is reserved for future use */
+	if (cmd->flags)
+		return -EINVAL;
+
+	caps = kmalloc(sizeof(*caps), GFP_KERNEL);
+	if (!caps)
+		return -ENOMEM;
+
+	user_caps = u64_to_user_ptr(cmd->data);
+	if (copy_from_user(caps,
user_caps, sizeof(*caps))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (caps->nr_cpuid_configs < td_conf->num_cpuid_config) {
+		ret = -E2BIG;
+		goto out;
+	}
+
+	*caps = (struct kvm_tdx_capabilities) {
+		.attrs_fixed0 = td_conf->attributes_fixed0,
+		.attrs_fixed1 = td_conf->attributes_fixed1,
+		.xfam_fixed0 = td_conf->xfam_fixed0,
+		.xfam_fixed1 = td_conf->xfam_fixed1,
+		.supported_gpaw = TDX_CAP_GPAW_48 |
+		((kvm_host.maxphyaddr >= 52 &&
+		  cpu_has_vmx_ept_5levels()) ? TDX_CAP_GPAW_52 : 0),
+		.nr_cpuid_configs = td_conf->num_cpuid_config,
+		.padding = 0,
+	};
+
+	if (copy_to_user(user_caps, caps, sizeof(*caps))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	for (i = 0; i < td_conf->num_cpuid_config; i++) {
+		struct kvm_tdx_cpuid_config cpuid_config = {
+			.leaf = (u32)td_conf->cpuid_config_leaves[i],
+			.sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
+			.eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
+			.ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
+			.ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
+			.edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
+		};
+
+		if (copy_to_user(&(user_caps->cpuid_configs[i]), &cpuid_config,
+				 sizeof(struct kvm_tdx_cpuid_config))) {
+			ret = -EFAULT;
+			break;
+		}
+	}
+
+out:
+	/* kfree() accepts NULL.
*/
+	kfree(caps);
+	return ret;
+}
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_tdx_cmd tdx_cmd;
@@ -48,6 +115,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 	mutex_lock(&kvm->lock);
 
 	switch (tdx_cmd.id) {
+	case KVM_TDX_CAPABILITIES:
+		r = tdx_get_capabilities(&tdx_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;

From patchwork Mon Aug 12 22:48:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761107
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com
Subject: [PATCH 10/25] KVM: TDX: Initialize KVM supported capabilities when module setup
Date: Mon, 12 Aug 2024 15:48:05 -0700
Message-Id: <20240812224820.34826-11-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Xiaoyao Li

While the TDX module reports a set of capabilities/features that it
supports, what KVM currently supports might be a subset of them.  E.g.,
DEBUG and PERFMON are supported by the TDX module but currently not
supported by KVM.

Introduce a new struct kvm_tdx_caps to store KVM's capabilities of TDX.
supported_attrs and supported_xfam are validated against the fixed0/1
values enumerated by the TDX module.  Configurable CPUID bits derive from
the TDX module plus applying KVM's capabilities (KVM_GET_SUPPORTED_CPUID),
i.e., mask off the bits that are configurable in the view of the TDX module
but not supported by KVM yet.

KVM_TDX_CPUID_NO_SUBLEAF is a TDX module concept; switch it to 0 and use
KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which is the KVM concept.

Signed-off-by: Xiaoyao Li
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - Change setup_kvm_tdx_caps() to use the exported 'struct tdx_sysinfo'
   pointer.
 - Change how to copy 'kvm_tdx_cpuid_config' since 'struct tdx_sysinfo'
   doesn't have 'kvm_tdx_cpuid_config'.
 - Updates for uAPI changes
---
 arch/x86/include/uapi/asm/kvm.h |  2 -
 arch/x86/kvm/vmx/tdx.c          | 81 +++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 47caf508cca7..c9eb2e2f5559 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -952,8 +952,6 @@ struct kvm_tdx_cmd {
 	__u64 hw_error;
 };
 
-#define KVM_TDX_CPUID_NO_SUBLEAF	((__u32)-1)
-
 struct kvm_tdx_cpuid_config {
 	__u32 leaf;
 	__u32 sub_leaf;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 90b44ebaf864..d89973e554f6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -31,6 +31,19 @@ static void __used tdx_guest_keyid_free(int keyid)
 	ida_free(&tdx_guest_keyid_pool, keyid);
 }
 
+#define KVM_TDX_CPUID_NO_SUBLEAF	((__u32)-1)
+
+struct kvm_tdx_caps {
+	u64 supported_attrs;
+	u64 supported_xfam;
+
+	u16 num_cpuid_config;
+	/* This must be the last member. */
+	DECLARE_FLEX_ARRAY(struct kvm_tdx_cpuid_config, cpuid_configs);
+};
+
+static struct kvm_tdx_caps *kvm_tdx_caps;
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
@@ -131,6 +144,68 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 	return r;
 }
 
+#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
+
+static int __init setup_kvm_tdx_caps(void)
+{
+	const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+	u64 kvm_supported;
+	int i;
+
+	kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
+			       sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
+			       GFP_KERNEL);
+	if (!kvm_tdx_caps)
+		return -ENOMEM;
+
+	kvm_supported = KVM_SUPPORTED_TD_ATTRS;
+	if ((kvm_supported & td_conf->attributes_fixed1) != td_conf->attributes_fixed1)
+		goto err;
+
+	kvm_tdx_caps->supported_attrs = kvm_supported & td_conf->attributes_fixed0;
+
+	kvm_supported = kvm_caps.supported_xcr0 | kvm_caps.supported_xss;
+
+	/*
+	 *
PT and CET can be exposed to TD guest regardless of KVM's XSS, PT,
+	 * and CET support.
+	 */
+	kvm_supported |= XFEATURE_MASK_PT | XFEATURE_MASK_CET_USER |
+			 XFEATURE_MASK_CET_KERNEL;
+	if ((kvm_supported & td_conf->xfam_fixed1) != td_conf->xfam_fixed1)
+		goto err;
+
+	kvm_tdx_caps->supported_xfam = kvm_supported & td_conf->xfam_fixed0;
+
+	kvm_tdx_caps->num_cpuid_config = td_conf->num_cpuid_config;
+	for (i = 0; i < td_conf->num_cpuid_config; i++) {
+		struct kvm_tdx_cpuid_config source = {
+			.leaf = (u32)td_conf->cpuid_config_leaves[i],
+			.sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
+			.eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
+			.ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
+			.ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
+			.edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
+		};
+		struct kvm_tdx_cpuid_config *dest =
+			&kvm_tdx_caps->cpuid_configs[i];
+
+		memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
+		if (dest->sub_leaf == KVM_TDX_CPUID_NO_SUBLEAF)
+			dest->sub_leaf = 0;
+	}
+
+	return 0;
+err:
+	kfree(kvm_tdx_caps);
+	return -EIO;
+}
+
+static void free_kvm_tdx_cap(void)
+{
+	kfree(kvm_tdx_caps);
+}
+
 static int tdx_online_cpu(unsigned int cpu)
 {
 	unsigned long flags;
@@ -217,11 +292,16 @@ static int __init __tdx_bringup(void)
 		goto get_sysinfo_err;
 	}
 
+	r = setup_kvm_tdx_caps();
+	if (r)
+		goto get_sysinfo_err;
+
 	/*
 	 * Leave hardware virtualization enabled after TDX is enabled
 	 * successfully.  TDX CPU hotplug depends on this.
 	 */
 	return 0;
+
 get_sysinfo_err:
 	__do_tdx_cleanup();
 tdx_bringup_err:
@@ -232,6 +312,7 @@ static int __init __tdx_bringup(void)
 void tdx_cleanup(void)
 {
 	if (enable_tdx) {
+		free_kvm_tdx_cap();
 		__do_tdx_cleanup();
 		kvm_disable_virtualization();
 	}

From patchwork Mon Aug 12 22:48:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761108
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com
Subject: [PATCH 11/25] KVM: TDX: Report kvm_tdx_caps in KVM_TDX_CAPABILITIES
Date: Mon, 12 Aug 2024 15:48:06 -0700
Message-Id: <20240812224820.34826-12-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Xiaoyao Li

Reporting the raw capabilities of the TDX module to userspace is not very
useful and is incorrect, because some of the capabilities might not be
supported by KVM.  Instead, report the KVM-capped capabilities to
userspace.

Remove the supported_gpaw field, because CPUID.0x80000008.EAX[23:16] of
KVM_SUPPORTED_CPUID enumerates 5-level EPT support, i.e., whether GPAW52 is
supported or not.  Note that GPAW48 should always be supported, so there is
no need for explicit enumeration.

Signed-off-by: Xiaoyao Li
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - Code change due to previous patches changed to use exported
   'struct tdx_sysinfo' pointer.
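
Reader note (not part of the patch): a minimal userspace sketch of how a
caller might size the buffer passed to KVM_TDX_CAPABILITIES, assuming the
struct layout this patch introduces.  The *_sketch types and
alloc_caps_sketch() are hypothetical local mirrors used only so the example
is self-contained; real code would presumably take the definitions from the
uAPI header once the series is applied.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical local mirror of struct kvm_tdx_cpuid_config. */
struct tdx_cpuid_config_sketch {
	uint32_t leaf;
	uint32_t sub_leaf;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
};

/* Hypothetical local mirror of struct kvm_tdx_capabilities after this patch. */
struct tdx_capabilities_sketch {
	uint64_t supported_attrs;
	uint64_t supported_xfam;
	uint64_t reserved[254];

	uint32_t nr_cpuid_configs;
	struct tdx_cpuid_config_sketch cpuid_configs[];
};

/*
 * Allocate a zeroed buffer with room for nr CPUID entries and record nr in
 * nr_cpuid_configs, which the kernel side compares against its own count.
 * The total size is stored in *size_out.  Returns NULL if allocation fails.
 */
struct tdx_capabilities_sketch *alloc_caps_sketch(uint32_t nr, size_t *size_out)
{
	struct tdx_capabilities_sketch *caps;
	size_t size;

	assert(size_out != NULL);

	/* Fixed header plus nr trailing flexible-array entries. */
	size = sizeof(*caps) + (size_t)nr * sizeof(caps->cpuid_configs[0]);
	caps = calloc(1, size);
	if (!caps)
		return NULL;

	caps->nr_cpuid_configs = nr;
	*size_out = size;
	return caps;
}
```

The caller would then point kvm_tdx_cmd.data at this buffer, leave flags and
hw_error zero, and issue KVM_MEMORY_ENCRYPT_OP.  Per the handler in this
series, the call fails with -E2BIG when nr_cpuid_configs is smaller than the
number of entries KVM holds, so retrying with a larger count seems to be the
expected pattern.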
---
 arch/x86/include/uapi/asm/kvm.h | 14 +++----------
 arch/x86/kvm/vmx/tdx.c          | 36 ++++++++-------------------------
 2 files changed, 11 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index c9eb2e2f5559..2e3caa5a58fd 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -961,18 +961,10 @@ struct kvm_tdx_cpuid_config {
 	__u32 edx;
 };
 
-/* supported_gpaw */
-#define TDX_CAP_GPAW_48	(1 << 0)
-#define TDX_CAP_GPAW_52	(1 << 1)
-
 struct kvm_tdx_capabilities {
-	__u64 attrs_fixed0;
-	__u64 attrs_fixed1;
-	__u64 xfam_fixed0;
-	__u64 xfam_fixed1;
-	__u32 supported_gpaw;
-	__u32 padding;
-	__u64 reserved[251];
+	__u64 supported_attrs;
+	__u64 supported_xfam;
+	__u64 reserved[254];
 
 	__u32 nr_cpuid_configs;
 	struct kvm_tdx_cpuid_config cpuid_configs[];
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d89973e554f6..f9faec217ea9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -49,7 +49,7 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 	const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
 	struct kvm_tdx_capabilities __user *user_caps;
 	struct kvm_tdx_capabilities *caps = NULL;
-	int i, ret = 0;
+	int ret = 0;
 
 	/* flags is reserved for future use */
 	if (cmd->flags)
@@ -70,39 +70,19 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 		goto out;
 	}
 
-	*caps = (struct kvm_tdx_capabilities) {
-		.attrs_fixed0 = td_conf->attributes_fixed0,
-		.attrs_fixed1 = td_conf->attributes_fixed1,
-		.xfam_fixed0 = td_conf->xfam_fixed0,
-		.xfam_fixed1 = td_conf->xfam_fixed1,
-		.supported_gpaw = TDX_CAP_GPAW_48 |
-		((kvm_host.maxphyaddr >= 52 &&
-		  cpu_has_vmx_ept_5levels()) ?
TDX_CAP_GPAW_52 : 0),
-		.nr_cpuid_configs = td_conf->num_cpuid_config,
-		.padding = 0,
-	};
+	caps->supported_attrs = kvm_tdx_caps->supported_attrs;
+	caps->supported_xfam = kvm_tdx_caps->supported_xfam;
+	caps->nr_cpuid_configs = kvm_tdx_caps->num_cpuid_config;
 
 	if (copy_to_user(user_caps, caps, sizeof(*caps))) {
 		ret = -EFAULT;
 		goto out;
 	}
 
-	for (i = 0; i < td_conf->num_cpuid_config; i++) {
-		struct kvm_tdx_cpuid_config cpuid_config = {
-			.leaf = (u32)td_conf->cpuid_config_leaves[i],
-			.sub_leaf = td_conf->cpuid_config_leaves[i] >> 32,
-			.eax = (u32)td_conf->cpuid_config_values[i].eax_ebx,
-			.ebx = td_conf->cpuid_config_values[i].eax_ebx >> 32,
-			.ecx = (u32)td_conf->cpuid_config_values[i].ecx_edx,
-			.edx = td_conf->cpuid_config_values[i].ecx_edx >> 32,
-		};
-
-		if (copy_to_user(&(user_caps->cpuid_configs[i]), &cpuid_config,
-				 sizeof(struct kvm_tdx_cpuid_config))) {
-			ret = -EFAULT;
-			break;
-		}
-	}
+	if (copy_to_user(user_caps->cpuid_configs, &kvm_tdx_caps->cpuid_configs,
+			 kvm_tdx_caps->num_cpuid_config *
+			 sizeof(kvm_tdx_caps->cpuid_configs[0])))
+		ret = -EFAULT;
 
 out:
 	/* kfree() accepts NULL.
*/

From patchwork Mon Aug 12 22:48:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761109
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata
Subject: [PATCH 12/25] KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests
Date: Mon, 12 Aug 2024 15:48:07 -0700
Message-Id: <20240812224820.34826-13-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Isaku Yamahata

TDX has its own mechanism to control the maximum number of vCPUs that a
TDX guest can use.  When creating a TDX guest, the maximum number of vCPUs
of the guest needs to be passed to the TDX module as part of the
measurement of the guest.  Depending on the TDX module's version, it may
also report the maximum number of vCPUs it can support for all TDX guests.

Because the maximum number of vCPUs is part of the measurement, and thus
part of attestation, it is better to allow userspace to configure it.
E.g., users may want to precisely control the maximum number of vCPUs
their precious VMs can use.

The actual control itself must be done via the TDH.MNG.INIT SEAMCALL,
where the maximum number of vCPUs is part of the input to the TDX module,
but KVM needs to support the "per-VM maximum number of vCPUs" and reflect
that in KVM_CAP_MAX_VCPUS.

Currently, KVM x86 always reports KVM_MAX_VCPUS for all VMs but doesn't
allow enabling KVM_CAP_MAX_VCPUS to configure the maximum number of vCPUs
on a per-VM basis.

Add "per-VM maximum number of vCPUs" to KVM x86/TDX to accommodate TDX's
needs.

Specifically, use KVM's existing KVM_ENABLE_CAP IOCTL() to allow
userspace to configure the maximum number of vCPUs, by making KVM x86
support enabling the KVM_CAP_MAX_VCPUS cap on a per-VM basis.

For that, add a new 'kvm_x86_ops::vm_enable_cap()' callback and call it
from kvm_vm_ioctl_enable_cap() as a placeholder to handle
KVM_CAP_MAX_VCPUS for TDX guests (and other KVM_CAP_xx for TDX and/or
other VMs if needed in the future).

Implement the callback for TDX guests to check whether the maximum number
of vCPUs passed from userspace can be supported by TDX, and if it can,
override 'struct kvm::max_vcpus'. Leave VMX guests and all AMD guests
unsupported to avoid any side effects for those VMs.

Accordingly, in the KVM_CHECK_EXTENSION IOCTL(), return
'struct kvm::max_vcpus' for a given VM for KVM_CAP_MAX_VCPUS.

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - Change to use exported 'struct tdx_sysinfo' pointer.
 - Remove the code to read 'max_vcpus_per_td' since it is now done in
   TDX host code.
 - Drop max_vcpu ops to use kvm.max_vcpus
 - Remove TDX_MAX_VCPUS (Kai)
 - Use type cast (u16) instead of calling memcpy() when reading the
   'max_vcpus_per_td' (Kai)
 - Improve change log and change patch title from "KVM: TDX: Make
   KVM_CAP_MAX_VCPUS backend specific" (Kai)
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/vmx/main.c            | 10 ++++++++++
 arch/x86/kvm/vmx/tdx.c             | 29 +++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h         |  5 +++++
 arch/x86/kvm/x86.c                 |  4 ++++
 6 files changed, 50 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 538f50eee86d..bd7434fe5d37 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -19,6 +19,7 @@ KVM_X86_OP(hardware_disable)
 KVM_X86_OP(hardware_unsetup)
 KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
+KVM_X86_OP_OPTIONAL(vm_enable_cap)
 KVM_X86_OP(vm_init)
 KVM_X86_OP_OPTIONAL(vm_destroy)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c754183e0932..9d15f810f046 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1648,6 +1648,7 @@ struct kvm_x86_ops {
 	void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
 
 	unsigned int vm_size;
+	int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
 	int (*vm_init)(struct kvm *kvm);
 	void (*vm_destroy)(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 59f4d2d42620..cd53091ddaab 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -7,6 +7,7 @@
 #include "pmu.h"
 #include "posted_intr.h"
 #include "tdx.h"
+#include "tdx_arch.h"
 
 static __init int vt_hardware_setup(void)
 {
@@ -41,6 +42,14 @@ static __init int vt_hardware_setup(void)
 	return 0;
 }
 
+static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	if (is_td(kvm))
+		return tdx_vm_enable_cap(kvm, cap);
+
+	return -EINVAL;
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -72,6 +81,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.has_emulated_msr = vmx_has_emulated_msr,
 
 	.vm_size = sizeof(struct kvm_vmx),
+	.vm_enable_cap = vt_vm_enable_cap,
 	.vm_init = vmx_vm_init,
 	.vm_destroy = vmx_vm_destroy,
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f9faec217ea9..84cd9b4f90b5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -44,6 +44,35 @@ struct kvm_tdx_caps {
 
 static struct kvm_tdx_caps *kvm_tdx_caps;
 
+int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	int r;
+
+	switch (cap->cap) {
+	case KVM_CAP_MAX_VCPUS: {
+		if (cap->flags || cap->args[0] == 0)
+			return -EINVAL;
+		if (cap->args[0] > KVM_MAX_VCPUS ||
+		    cap->args[0] > tdx_sysinfo->td_conf.max_vcpus_per_td)
+			return -E2BIG;
+
+		mutex_lock(&kvm->lock);
+		if (kvm->created_vcpus)
+			r = -EBUSY;
+		else {
+			kvm->max_vcpus = cap->args[0];
+			r = 0;
+		}
+		mutex_unlock(&kvm->lock);
+		break;
+	}
+	default:
+		r = -EINVAL;
+		break;
+	}
+	return r;
+}
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index c69ca640abe6..c1bdf7d8fee3 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -119,8 +119,13 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
 void vmx_setup_mce(struct kvm_vcpu *vcpu);
 
 #ifdef CONFIG_INTEL_TDX_HOST
+int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 #else
+static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	return -EINVAL;
+};
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 #endif
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7914ea50fd04..751b3841c48f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4754,6 +4754,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		break;
 	case KVM_CAP_MAX_VCPUS:
 		r = KVM_MAX_VCPUS;
+		if (kvm)
+			r = kvm->max_vcpus;
 		break;
 	case KVM_CAP_MAX_VCPU_ID:
 		r = KVM_MAX_VCPU_IDS;
@@ -6772,6 +6774,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	}
 	default:
 		r = -EINVAL;
+		if (kvm_x86_ops.vm_enable_cap)
+			r = static_call(kvm_x86_vm_enable_cap)(kvm, cap);
 		break;
 	}
 	return r;

From patchwork Mon Aug 12 22:48:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761110
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3CDAA19DF6D; Mon, 12 Aug 2024 22:48:39 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502922; cv=none; b=TIT3SEzQKjkD4iAJsrrtIEa8Q2YqDmyY6mNm71nV1YAWOjtugdsBGJ+Fk0HPT6Q9HO1kl8evt++OX04awLW0Wn3oVHiYEqBAzrlfFJ3xUIfikN6jzCl+h4aX+1EtVC7ErO8960L1DA385U37ByEHDLoW5mUwsbptpWuVNAs6suc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502922; c=relaxed/simple; bh=UziL5guC6X8gak1C38TVA9qj3fY3wzi7e1GU2TW3/Ko=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=NAzt/DdFir5kbfgxFaljtQw/YJ8p67MwKIVAM0nrFfB0KujgjFU8Z8DxXc7EX3shRTDCQatWJvENa+JJP15lfo6i7VGaiNt6A4xsbEvikxtXc0UqOWWR5jemc1hXaW/CGXJMngOX74uVviWDZR9SCt7IQ+SATmnuh9heRYd2Cs0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=j3Jf8tkY; arc=none smtp.client-ip=192.198.163.10
Authentication-Results: smtp.subspace.kernel.org;
dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="j3Jf8tkY" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502919; x=1755038919; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UziL5guC6X8gak1C38TVA9qj3fY3wzi7e1GU2TW3/Ko=; b=j3Jf8tkYYSVsb/0N3FMlbV9K3mo55DsS+JzWgEaNQlwP6nxaOMJ2boiL 3a+veIXDGOqzJlfKl19rDx0m88Q58XNGrELAm4ZpvmM0untia6pXkIh8u Vx/CBxSSl8a3M0tloSf7xMwtNKn9eIGWOwP/FL26A3HuhVE459mcmrSkY qIKnq2N0UFVVZsIp4xmVr13w8JYzPOv9+9zfD5p2jnDScspdQ+6tmXho9 LNnzhB0lGueTcyU3lJgO+2Y3slQzhgxPYHGpaKzWZ7y663hmVDz5wpml1 zvpEFpJehi6FUQe80hq7YV/wU0pqZc4e+r24hN7+LP0klgqCRgp9cuFju Q==; X-CSE-ConnectionGUID: bZfPeI9aSB+bDkkX5MEWqw== X-CSE-MsgGUID: 1up8BalfSOKMUMltiqMaoA== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041411" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041411" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:34 -0700 X-CSE-ConnectionGUID: 4Tu19MY8R0KiRQC8HD1h1g== X-CSE-MsgGUID: Jcbte1KZTHG15FYE26Z9dg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008412" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:33 -0700
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata, Sean Christopherson, Yan Zhao
Subject: [PATCH 13/25] KVM: TDX: create/destroy VM structure
Date: Mon, 12 Aug 2024 15:48:08 -0700
Message-Id: <20240812224820.34826-14-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Implement TDX private KeyID management so that a TDX guest can be
created, destroyed and freed.

When creating a TDX guest, assign a TDX private KeyID to the guest for
memory encryption, and allocate pages for the guest. These pages are used
for the Trust Domain Root (TDR) and Trust Domain Control Structure (TDCS).

On destruction, free the allocated pages and the KeyID.

Before tearing down the private page tables, TDX requires the guest TD to
be destroyed by reclaiming the KeyID. Do it in the vm_destroy()
kvm_x86_ops hook. Add a call to vm_free() at the end of
kvm_arch_destroy_vm() because the per-VM TDR needs to be freed after the
KeyID.
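
The ordering constraint described above (release the KeyID in vm_destroy(),
free the TDR only afterwards in vm_free()) can be illustrated with a toy
state model. Everything here is an assumption made for illustration:
'struct td_model', model_release_hkid() and model_vm_free() are
hypothetical names, and the SEAMCALL outcome is reduced to a boolean. It
sketches the ordering only and is not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical model of the per-TD state relevant to teardown. */
struct td_model {
	int hkid;         /* > 0 while a private KeyID is assigned, -1 once freed */
	bool tdr_present; /* the TDR page is still owned by the TD */
	bool tdr_leaked;  /* set when the TDR had to be abandoned */
};

/* vm_destroy() step: hand the KeyID back; 'freeid_ok' models the SEAMCALL result. */
static void model_release_hkid(struct td_model *td, bool freeid_ok)
{
	if (td->hkid <= 0)
		return;
	if (freeid_ok)
		td->hkid = -1;
	/* On failure the KeyID stays assigned, i.e. it is leaked. */
}

/* vm_free() step: the TDR may only be reclaimed once the KeyID is gone. */
static void model_vm_free(struct td_model *td)
{
	if (td->hkid > 0) {
		/* Give up on the pages rather than reuse them unsafely. */
		td->tdr_leaked = td->tdr_present;
		return;
	}
	td->tdr_present = false;
}
```

Calling model_vm_free() before a successful model_release_hkid() leaves the
TDR in place, which mirrors the early return in tdx_vm_free() when the HKID
is still assigned.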

Signed-off-by: Isaku Yamahata
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Co-developed-by: Kai Huang
Signed-off-by: Kai Huang
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - Fix unnecessary include re-ordering (Chao)
 - Fix the unpaired curly brackets (Chao)
 - Drop the tdx_mng_key_config_lock (Chao)
 - Drop unnecessary is_hkid_assigned() check (Chao)
 - Use KVM_GENERIC_PRIVATE_MEM and undo the removal of EXPERT (Binbin)
 - Drop the word typically from comments (Binbin)
 - Clarify comments for the need of global tdx_lock mutex (Kai)
 - Add function comments for tdx_clear_page() (Kai)
 - Clarify comments for tdx_clear_page() poisoned page (Kai)
 - Move and update comments for limitations of __tdx_reclaim_page() (Kai)
 - Drop comment related to "rare to contend" (Kai)
 - Drop comment related to TDR and target page (Tony)
 - Make code easier to read with line breaks between paragraphs (Kai)
 - Use cond_resched() retry (Kai)
 - Use for loop for retries (Tony)
 - Use switch to handle errors (Tony)
 - Drop loop for tdh_mng_key_config() (Tony)
 - Rename tdx_reclaim_control_page() td_page_pa to ctrl_page_pa (Kai)
 - Reorganize comments for tdx_reclaim_control_page() (Kai)
 - Use smp_func_do_phymem_cache_wb() naming to indicate SMP (Kai)
 - Use bool resume in smp_func_do_phymem_cache_wb() (Kai)
 - Add comment on retrying to smp_func_do_phymem_cache_wb() (Kai)
 - Move the code change made to tdx_module_setup() into __tdx_bringup(),
   since initialization is now done after hardware_setup() and
   tdx_module_setup() is removed. Use the exported 'struct tdx_sysinfo'
   pointer instead of the API for reading global metadata.
 - Replace 'tdx_info->nr_tdcs_pages' with a wrapper
   tdx_sysinfo_nr_tdcs_pages() because the 'struct tdx_sysinfo' doesn't
   have nr_tdcs_pages directly.
 - Replace tdx_info->max_vcpus_per_td with the new exported pointer in
   tdx_vm_init().
 - Add comment to tdx_mmu_release_hkid() on KeyID allocated (Kai)
 - Update comments for tdx_mmu_release_hkid() for locking (Kai)
 - Clarify tdx_mmu_release_hkid() comments for freeing HKID (Kai)
 - Use KVM_BUG_ON() for SEAMCALLs in tdx_mmu_release_hkid() (Kai)
 - Use continue for loop in tdx_vm_free() (Kai)
 - Clarify comments in tdx_vm_free() for reclaiming TDCS (Kai)
 - Use KVM_BUG_ON() for tdx_vm_free()
 - Prettify format with line breaks in tdx_vm_free() (Tony)
 - Prettify formatting for __tdx_td_init() with line breaks (Kai)
 - Simplify comments for __tdx_td_init() locking (Kai)
 - Update patch description (Kai)
---
 arch/x86/include/asm/kvm-x86-ops.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   1 +
 arch/x86/kvm/Kconfig               |   2 +
 arch/x86/kvm/vmx/main.c            |  27 +-
 arch/x86/kvm/vmx/tdx.c             | 482 ++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h             |   3 +
 arch/x86/kvm/vmx/x86_ops.h         |   6 +
 arch/x86/kvm/x86.c                 |   1 +
 8 files changed, 519 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index bd7434fe5d37..12ee66bc9026 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -22,6 +22,7 @@ KVM_X86_OP(vcpu_after_set_cpuid)
 KVM_X86_OP_OPTIONAL(vm_enable_cap)
 KVM_X86_OP(vm_init)
 KVM_X86_OP_OPTIONAL(vm_destroy)
+KVM_X86_OP_OPTIONAL(vm_free)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
 KVM_X86_OP(vcpu_create)
 KVM_X86_OP(vcpu_free)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d15f810f046..188cd684bffb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1651,6 +1651,7 @@ struct kvm_x86_ops {
 	int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
 	int (*vm_init)(struct kvm *kvm);
 	void (*vm_destroy)(struct kvm *kvm);
+	void (*vm_free)(struct kvm *kvm);
 
 	/* Create, but do not attach this VCPU */
 	int (*vcpu_precreate)(struct kvm *kvm);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 472a1537b7a9..49f83564ed30 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -90,6 +90,8 @@ config KVM_SW_PROTECTED_VM
 config KVM_INTEL
 	tristate "KVM for Intel (and compatible) processors support"
 	depends on KVM && IA32_FEAT_CTL
+	select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
+	select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
 	help
 	  Provides support for KVM on processors equipped with Intel's VT
 	  extensions, a.k.a. Virtual Machine Extensions (VMX).
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index cd53091ddaab..c079a5b057d8 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -50,6 +50,28 @@ static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 	return -EINVAL;
 }
 
+static int vt_vm_init(struct kvm *kvm)
+{
+	if (is_td(kvm))
+		return tdx_vm_init(kvm);
+
+	return vmx_vm_init(kvm);
+}
+
+static void vt_vm_destroy(struct kvm *kvm)
+{
+	if (is_td(kvm))
+		return tdx_mmu_release_hkid(kvm);
+
+	vmx_vm_destroy(kvm);
+}
+
+static void vt_vm_free(struct kvm *kvm)
+{
+	if (is_td(kvm))
+		tdx_vm_free(kvm);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -82,8 +104,9 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.vm_size = sizeof(struct kvm_vmx),
 	.vm_enable_cap = vt_vm_enable_cap,
-	.vm_init = vmx_vm_init,
-	.vm_destroy = vmx_vm_destroy,
+	.vm_init = vt_vm_init,
+	.vm_destroy = vt_vm_destroy,
+	.vm_free = vt_vm_free,
 
 	.vcpu_precreate = vmx_vcpu_precreate,
 	.vcpu_create = vmx_vcpu_create,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 84cd9b4f90b5..a0954c3928e2 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -5,6 +5,7 @@
 #include "x86_ops.h"
 #include "mmu.h"
 #include "tdx.h"
+#include "tdx_ops.h"
 
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -19,14 +20,14 @@ static const struct tdx_sysinfo *tdx_sysinfo;
 /* TDX KeyID pool */
 static DEFINE_IDA(tdx_guest_keyid_pool);
 
-static int __used tdx_guest_keyid_alloc(void)
+static int tdx_guest_keyid_alloc(void)
 {
 	return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start,
 			       tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
 			       GFP_KERNEL);
 }
 
-static void __used tdx_guest_keyid_free(int keyid)
+static void tdx_guest_keyid_free(int keyid)
 {
 	ida_free(&tdx_guest_keyid_pool, keyid);
 }
@@ -73,6 +74,305 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 	return r;
 }
 
+/*
+ * Some SEAMCALLs acquire the TDX module globally, and can fail with
+ * TDX_OPERAND_BUSY. Use a global mutex to serialize these SEAMCALLs.
+ */
+static DEFINE_MUTEX(tdx_lock);
+
+/* Maximum number of retries to attempt for SEAMCALLs. */
+#define TDX_SEAMCALL_RETRIES	10000
+
+static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
+{
+	return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
+}
+
+static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
+{
+	return kvm_tdx->tdr_pa;
+}
+
+static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
+{
+	tdx_guest_keyid_free(kvm_tdx->hkid);
+	kvm_tdx->hkid = -1;
+}
+
+static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
+{
+	return kvm_tdx->hkid > 0;
+}
+
+static void tdx_clear_page(unsigned long page_pa)
+{
+	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
+	void *page = __va(page_pa);
+	unsigned long i;
+
+	/*
+	 * The page could have been poisoned. MOVDIR64B also clears
+	 * the poison bit so the kernel can safely use the page again.
+	 */
+	for (i = 0; i < PAGE_SIZE; i += 64)
+		movdir64b(page + i, zero_page);
+	/*
+	 * MOVDIR64B store uses WC buffer. Prevent following memory reads
+	 * from seeing potentially poisoned cache.
+	 */
+	__mb();
+}
+
+static u64 ____tdx_reclaim_page(hpa_t pa, u64 *rcx, u64 *rdx, u64 *r8)
+{
+	u64 err;
+	int i;
+
+	for (i = TDX_SEAMCALL_RETRIES; i > 0; i--) {
+		err = tdh_phymem_page_reclaim(pa, rcx, rdx, r8);
+		switch (err) {
+		case TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX:
+		case TDX_OPERAND_BUSY | TDX_OPERAND_ID_TDR:
+			cond_resched();
+			continue;
+		default:
+			goto out;
+		}
+	}
+
+out:
+	return err;
+}
+
+/* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. */
+static int __tdx_reclaim_page(hpa_t pa)
+{
+	u64 err, rcx, rdx, r8;
+
+	err = ____tdx_reclaim_page(pa, &rcx, &rdx, &r8);
+	if (WARN_ON_ONCE(err)) {
+		pr_tdx_error_3(TDH_PHYMEM_PAGE_RECLAIM, err, rcx, rdx, r8);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int tdx_reclaim_page(hpa_t pa)
+{
+	int r;
+
+	r = __tdx_reclaim_page(pa);
+	if (!r)
+		tdx_clear_page(pa);
+	return r;
+}
+
+
+/*
+ * Reclaim the TD control page(s) which are crypto-protected by TDX guest's
+ * private KeyID. Assume the cache associated with the TDX private KeyID has
+ * been flushed.
+ */
+static void tdx_reclaim_control_page(unsigned long ctrl_page_pa)
+{
+	/*
+	 * Leak the page if the kernel failed to reclaim the page.
+	 * The kernel cannot use it safely anymore.
+	 */
+	if (tdx_reclaim_page(ctrl_page_pa))
+		return;
+
+	free_page((unsigned long)__va(ctrl_page_pa));
+}
+
+static void smp_func_do_phymem_cache_wb(void *unused)
+{
+	u64 err = 0;
+	bool resume;
+	int i;
+
+	/*
+	 * TDH.PHYMEM.CACHE.WB flushes caches associated with any TDX private
+	 * KeyID on the package or core. The TDX module may not finish the
+	 * cache flush but return TDX_INTERRUPTED_RESUMABLE instead. The
+	 * kernel should retry it until it returns success w/o rescheduling.
+	 */
+	for (i = TDX_SEAMCALL_RETRIES; i > 0; i--) {
+		resume = !!err;
+		err = tdh_phymem_cache_wb(resume);
+		switch (err) {
+		case TDX_INTERRUPTED_RESUMABLE:
+			continue;
+		case TDX_NO_HKID_READY_TO_WBCACHE:
+			err = TDX_SUCCESS; /* Already done by other thread */
+			fallthrough;
+		default:
+			goto out;
+		}
+	}
+
+out:
+	if (WARN_ON_ONCE(err))
+		pr_tdx_error(TDH_PHYMEM_CACHE_WB, err);
+}
+
+void tdx_mmu_release_hkid(struct kvm *kvm)
+{
+	bool packages_allocated, targets_allocated;
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	cpumask_var_t packages, targets;
+	u64 err;
+	int i;
+
+	if (!is_hkid_assigned(kvm_tdx))
+		return;
+
+	/* KeyID has been allocated but guest is not yet configured */
+	if (!is_td_created(kvm_tdx)) {
+		tdx_hkid_free(kvm_tdx);
+		return;
+	}
+
+	packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
+	targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL);
+	cpus_read_lock();
+
+	/*
+	 * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock
+	 * and can fail with TDX_OPERAND_BUSY when it fails to get the lock.
+	 * Multiple TDX guests can be destroyed simultaneously. Take the
+	 * mutex to prevent it from getting error.
+	 */
+	mutex_lock(&tdx_lock);
+
+	/*
+	 * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(),
+	 * and TDH.MNG.KEY.FREEID() to free the HKID. When the HKID is assigned,
+	 * we need to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). When
+	 * the HKID is free, we need to use TDH.PHYMEM.PAGE.RECLAIM(). Get lock
+	 * to not present transient state of HKID.
+	 */
+	write_lock(&kvm->mmu_lock);
+
+	for_each_online_cpu(i) {
+		if (packages_allocated &&
+		    cpumask_test_and_set_cpu(topology_physical_package_id(i),
+					     packages))
+			continue;
+		if (targets_allocated)
+			cpumask_set_cpu(i, targets);
+	}
+	if (targets_allocated)
+		on_each_cpu_mask(targets, smp_func_do_phymem_cache_wb, NULL, true);
+	else
+		on_each_cpu(smp_func_do_phymem_cache_wb, NULL, true);
+	/*
+	 * In the case of error in smp_func_do_phymem_cache_wb(), the following
+	 * tdh_mng_key_freeid() will fail.
+	 */
+	err = tdh_mng_key_freeid(kvm_tdx);
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error(TDH_MNG_KEY_FREEID, err);
+		pr_err("tdh_mng_key_freeid() failed. HKID %d is leaked.\n",
+		       kvm_tdx->hkid);
+	} else {
+		tdx_hkid_free(kvm_tdx);
+	}
+
+	write_unlock(&kvm->mmu_lock);
+	mutex_unlock(&tdx_lock);
+	cpus_read_unlock();
+	free_cpumask_var(targets);
+	free_cpumask_var(packages);
+}
+
+static inline u8 tdx_sysinfo_nr_tdcs_pages(void)
+{
+	return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE;
+}
+
+void tdx_vm_free(struct kvm *kvm)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	u64 err;
+	int i;
+
+	/*
+	 * tdx_mmu_release_hkid() failed to reclaim HKID. Something went wrong
+	 * heavily with TDX module. Give up freeing TD pages. As the function
+	 * already warned, don't warn it again.
+	 */
+	if (is_hkid_assigned(kvm_tdx))
+		return;
+
+	if (kvm_tdx->tdcs_pa) {
+		for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+			if (!kvm_tdx->tdcs_pa[i])
+				continue;
+
+			tdx_reclaim_control_page(kvm_tdx->tdcs_pa[i]);
+		}
+		kfree(kvm_tdx->tdcs_pa);
+		kvm_tdx->tdcs_pa = NULL;
+	}
+
+	if (!kvm_tdx->tdr_pa)
+		return;
+
+	if (__tdx_reclaim_page(kvm_tdx->tdr_pa))
+		return;
+
+	/*
+	 * Use a SEAMCALL to ask the TDX module to flush the cache based on the
+	 * KeyID. TDX module may access TDR while operating on TD (Especially
+	 * when it is reclaiming TDCS).
+	 */
+	err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(kvm_tdx->tdr_pa,
+						     tdx_global_keyid));
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);
+		return;
+	}
+	tdx_clear_page(kvm_tdx->tdr_pa);
+
+	free_page((unsigned long)__va(kvm_tdx->tdr_pa));
+	kvm_tdx->tdr_pa = 0;
+}
+
+static int tdx_do_tdh_mng_key_config(void *param)
+{
+	struct kvm_tdx *kvm_tdx = param;
+	u64 err;
+
+	/* TDX_RND_NO_ENTROPY related retries are handled by sc_retry() */
+	err = tdh_mng_key_config(kvm_tdx);
+
+	if (KVM_BUG_ON(err, &kvm_tdx->kvm)) {
+		pr_tdx_error(TDH_MNG_KEY_CONFIG, err);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int __tdx_td_init(struct kvm *kvm);
+
+int tdx_vm_init(struct kvm *kvm)
+{
+	kvm->arch.has_private_mem = true;
+
+	/*
+	 * TDX has its own limit of the number of vcpus in addition to
+	 * KVM_MAX_VCPUS.
+	 */
+	kvm->max_vcpus = min(kvm->max_vcpus,
+			     tdx_sysinfo->td_conf.max_vcpus_per_td);
+
+	/* Place holder for TDX specific logic. */
+	return __tdx_td_init(kvm);
+}
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
@@ -119,6 +419,179 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 	return ret;
 }
 
+static int __tdx_td_init(struct kvm *kvm)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	cpumask_var_t packages;
+	unsigned long *tdcs_pa = NULL;
+	unsigned long tdr_pa = 0;
+	unsigned long va;
+	int ret, i;
+	u64 err;
+
+	ret = tdx_guest_keyid_alloc();
+	if (ret < 0)
+		return ret;
+	kvm_tdx->hkid = ret;
+
+	va = __get_free_page(GFP_KERNEL_ACCOUNT);
+	if (!va)
+		goto free_hkid;
+	tdr_pa = __pa(va);
+
+	tdcs_pa = kcalloc(tdx_sysinfo_nr_tdcs_pages(), sizeof(*kvm_tdx->tdcs_pa),
+			  GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!tdcs_pa)
+		goto free_tdr;
+
+	for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+		va = __get_free_page(GFP_KERNEL_ACCOUNT);
+		if (!va)
+			goto free_tdcs;
+		tdcs_pa[i] = __pa(va);
+	}
+
+	if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto free_tdcs;
+	}
+
+	cpus_read_lock();
+
+	/*
+	 * Need at least one CPU of the package to be online in order to
+	 * program all packages for host key id. Check it.
+	 */
+	for_each_present_cpu(i)
+		cpumask_set_cpu(topology_physical_package_id(i), packages);
+	for_each_online_cpu(i)
+		cpumask_clear_cpu(topology_physical_package_id(i), packages);
+	if (!cpumask_empty(packages)) {
+		ret = -EIO;
+		/*
+		 * Because it's hard for human operator to figure out the
+		 * reason, warn it.
+		 */
+#define MSG_ALLPKG	"All packages need to have online CPU to create TD. Online CPU and retry.\n"
+		pr_warn_ratelimited(MSG_ALLPKG);
+		goto free_packages;
+	}
+
+	/*
+	 * TDH.MNG.CREATE tries to grab the global TDX module and fails
+	 * with TDX_OPERAND_BUSY when it fails to grab. Take the global
+	 * lock to prevent it from failure.
+	 */
+	mutex_lock(&tdx_lock);
+	kvm_tdx->tdr_pa = tdr_pa;
+	err = tdh_mng_create(kvm_tdx, kvm_tdx->hkid);
+	mutex_unlock(&tdx_lock);
+
+	if (err == TDX_RND_NO_ENTROPY) {
+		kvm_tdx->tdr_pa = 0;
+		ret = -EAGAIN;
+		goto free_packages;
+	}
+
+	if (WARN_ON_ONCE(err)) {
+		kvm_tdx->tdr_pa = 0;
+		pr_tdx_error(TDH_MNG_CREATE, err);
+		ret = -EIO;
+		goto free_packages;
+	}
+
+	for_each_online_cpu(i) {
+		int pkg = topology_physical_package_id(i);
+
+		if (cpumask_test_and_set_cpu(pkg, packages))
+			continue;
+
+		/*
+		 * Program the memory controller in the package with an
+		 * encryption key associated to a TDX private host key id
+		 * assigned to this TDR. Concurrent operations on same memory
+		 * controller results in TDX_OPERAND_BUSY. No locking needed
+		 * beyond the cpus_read_lock() above as it serializes against
+		 * hotplug and the first online CPU of the package is always
+		 * used. We never have two CPUs in the same socket trying to
+		 * program the key.
+		 */
+		ret = smp_call_on_cpu(i, tdx_do_tdh_mng_key_config,
+				      kvm_tdx, true);
+		if (ret)
+			break;
+	}
+	cpus_read_unlock();
+	free_cpumask_var(packages);
+	if (ret) {
+		i = 0;
+		goto teardown;
+	}
+
+	kvm_tdx->tdcs_pa = tdcs_pa;
+	for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+		err = tdh_mng_addcx(kvm_tdx, tdcs_pa[i]);
+		if (err == TDX_RND_NO_ENTROPY) {
+			/* Here it's hard to allow userspace to retry. */
+			ret = -EBUSY;
+			goto teardown;
+		}
+		if (WARN_ON_ONCE(err)) {
+			pr_tdx_error(TDH_MNG_ADDCX, err);
+			ret = -EIO;
+			goto teardown;
+		}
+	}
+
+	/*
+	 * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated
+	 * ioctl() to define the configure CPUID values for the TD.
+	 */
+	return 0;
+
+	/*
+	 * The sequence for freeing resources from a partially initialized TD
+	 * varies based on where in the initialization flow failure occurred.
+	 * Simply use the full teardown and destroy, which naturally play nice
+	 * with partial initialization.
+	 */
+teardown:
+	for (; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+		if (tdcs_pa[i]) {
+			free_page((unsigned long)__va(tdcs_pa[i]));
+			tdcs_pa[i] = 0;
+		}
+	}
+	if (!kvm_tdx->tdcs_pa)
+		kfree(tdcs_pa);
+	tdx_mmu_release_hkid(kvm);
+	tdx_vm_free(kvm);
+
+	return ret;
+
+free_packages:
+	cpus_read_unlock();
+	free_cpumask_var(packages);
+
+free_tdcs:
+	for (i = 0; i < tdx_sysinfo_nr_tdcs_pages(); i++) {
+		if (tdcs_pa[i])
+			free_page((unsigned long)__va(tdcs_pa[i]));
+	}
+	kfree(tdcs_pa);
+	kvm_tdx->tdcs_pa = NULL;
+
+free_tdr:
+	if (tdr_pa)
+		free_page((unsigned long)__va(tdr_pa));
+	kvm_tdx->tdr_pa = 0;
+
+free_hkid:
+	tdx_hkid_free(kvm_tdx);
+
+	return ret;
+}
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_tdx_cmd tdx_cmd;
@@ -274,6 +747,11 @@ static int __init __tdx_bringup(void)
 {
 	int r;
 
+	if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
+		pr_warn("MOVDIR64B is required for TDX\n");
+		return -EOPNOTSUPP;
+	}
+
 	if (!enable_ept) {
 		pr_err("Cannot enable TDX with EPT disabled.\n");
 		return -EINVAL;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 78f84c53a948..268959d0f74f 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -14,6 +14,9 @@ struct kvm_tdx {
 	struct kvm kvm;
 
 	unsigned long tdr_pa;
+	unsigned long *tdcs_pa;
+
+	int hkid;
 };
 
 struct vcpu_tdx {
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index c1bdf7d8fee3..96c74880bd36 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -120,12 +120,18 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu);
 
 #ifdef CONFIG_INTEL_TDX_HOST
 int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
+int tdx_vm_init(struct kvm *kvm);
+void tdx_mmu_release_hkid(struct kvm *kvm);
+void tdx_vm_free(struct kvm *kvm);
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 #else
 static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 {
 	return -EINVAL;
 };
+static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
+static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
+static inline void tdx_vm_free(struct kvm *kvm) {}
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 #endif
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 751b3841c48f..ce2ef63f30f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12852,6 +12852,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_page_track_cleanup(kvm);
 	kvm_xen_destroy_vm(kvm);
 	kvm_hv_destroy_vm(kvm);
+	static_call_cond(kvm_x86_vm_free)(kvm);
 }
 
 static void memslot_rmap_free(struct kvm_memory_slot *slot)

From patchwork Mon Aug 12 22:48:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761111
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org
(Postfix) with ESMTPS id 2D40F19E811; Mon, 12 Aug 2024 22:48:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502923; cv=none; b=aoA+y6HMY/Kote1l3ogeI9l3r4xElL+SctFHMG2fTaYMNQvXecncY7zg9aoGHjm1q55yhrNahOH8Nj4uQPkziRvGvFmDqiIf9SVHqAijDA1I2LgOSjkYME2tEKTmlMxEpr+d0WRoiMSO2LLaKIyDOdW6asEs/INeZLNwA63ux+I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502923; c=relaxed/simple; bh=ODvXJRuT8dd+sZxLAOTQBkCIn9z8TdIvs/2HprMYjag=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=G6IIMWDqQOWhiKxCV9AAKb7jhUSkj1lup2dHBs+hl/D+mC4X+9+/BFOqchxl7lnvbgAYseCpHwoJnBs7BFZlIkUhHaNn4+ocpzSZvoQE8EO4zR4+qgjqB+Q1x8xIqsMh7bLVH1MDbWdIXuVglKbXiV/GGJFUPBmIJU72Lu0POMc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=BKAGlP5d; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="BKAGlP5d" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502921; x=1755038921; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ODvXJRuT8dd+sZxLAOTQBkCIn9z8TdIvs/2HprMYjag=; b=BKAGlP5dhASKbgV+PKxVJ+k2AQfhUcOV2d0UeESvS6YpkFqRE/73yqur XKdaxcd5b4idMkRIdl7AMPIleAB181Y4TD2gHXeOTFDoSuI8hMGLz8Q9b 2HkLieiyNqKOb8D/xtsYD51VARi+AfYcDWXAD8KlcXFqfnlMhDWkuyyPd eFJTe7IH+TU9RwTtZ/H3IC250lmD7YrBHSelCa+/ofnn683LvuBAxT2Cf of7rY/LSzeFabF45ik7NoCezuKSA3aFg+JLL7uIqw/WQnvsUxEO7vFeFr 
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata
Subject: [PATCH 14/25] KVM: TDX: initialize VM with TDX specific parameters
Date: Mon, 12 Aug 2024 15:48:09 -0700
Message-Id: <20240812224820.34826-15-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Isaku Yamahata

After the crypto-protection key has been configured, TDX requires a
VM-scope initialization as a step of creating the TDX guest. This
"per-VM" TDX initialization sets up the global configuration and features
that the TDX guest can support, such as the guest's CPUIDs (emulated by
the TDX module) and the maximum number of vcpus.

This "per-VM" TDX initialization must be done before any "vcpu-scope" TDX
initialization.
To enforce this ordering, require the KVM_TDX_INIT_VM ioctl() to be issued
before KVM creates any vcpus.

Co-developed-by: Xiaoyao Li
Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - Drop TDX_TD_XFAM_CET and use XFEATURE_MASK_CET_{USER, KERNEL}.
 - Update for the wrapper functions for SEAMCALLs. (Sean)
 - Move gfn_shared_mask settings into this patch due to MMU section move
 - Fix bisectability issues in headers (Kai)
 - Updates from seamcall overhaul (Kai)
 - Allow userspace to configure xfam directly
 - Check if user sets non-configurable bits in CPUIDs
 - Rename error->hw_error
 - Move the code change made in tdx_module_setup() to __tdx_bringup(),
   because initialization is now done after hardware_setup() and
   tdx_module_setup() is removed. Stop using the API to read global
   metadata; use the exported 'struct tdx_sysinfo' pointer instead.
 - Replace 'tdx_info->nr_tdcs_pages' with a wrapper
   tdx_sysinfo_nr_tdcs_pages() because 'struct tdx_sysinfo' doesn't have
   nr_tdcs_pages directly.
 - Replace tdx_info->max_vcpus_per_td with the new exported pointer in
   tdx_vm_init().
 - Decrease the reserved space for struct kvm_tdx_init_vm (Kai)
 - Use sizeof_field() for struct kvm_tdx_init_vm cpuids (Tony)
 - No need to init init_vm, it gets copied over in tdx_td_init() (Chao)
 - Use kmalloc() instead of kzalloc() for init_vm in tdx_td_init() (Chao)
 - Add more line breaks to tdx_td_init() to make code easier to read (Tony)
 - Clarify patch description (Kai)

v19:
 - Check NO_RBP_MOD of feature0 and set it
 - Update the comment for PT and CET

v18:
 - Remove the change of tools/arch/x86/include/uapi/asm/kvm.h
 - Fix typo in comment: sha348 => sha384
 - Update comment in setup_tdparams_xfam()
 - Fix setup_tdparams_xfam() to use init_vm instead of td_params

v16:
 - Removed AMX check as the KVM upstream supports AMX.
- Added CET flag to guest supported xss --- arch/x86/include/uapi/asm/kvm.h | 24 ++++ arch/x86/kvm/cpuid.c | 7 + arch/x86/kvm/cpuid.h | 2 + arch/x86/kvm/vmx/tdx.c | 237 ++++++++++++++++++++++++++++++-- arch/x86/kvm/vmx/tdx.h | 4 + arch/x86/kvm/vmx/tdx_ops.h | 12 ++ 6 files changed, 276 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 2e3caa5a58fd..95ae2d4a4697 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -929,6 +929,7 @@ struct kvm_hyperv_eventfd { /* Trust Domain eXtension sub-ioctl() commands. */ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES = 0, + KVM_TDX_INIT_VM, KVM_TDX_CMD_NR_MAX, }; @@ -970,4 +971,27 @@ struct kvm_tdx_capabilities { struct kvm_tdx_cpuid_config cpuid_configs[]; }; +struct kvm_tdx_init_vm { + __u64 attributes; + __u64 xfam; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha384 digest */ + + /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */ + __u64 reserved[12]; + + /* + * Call KVM_TDX_INIT_VM before vcpu creation, thus before + * KVM_SET_CPUID2. + * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the + * TDX module directly virtualizes those CPUIDs without VMM. The user + * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with + * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of + * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX + * module doesn't virtualize. 
+ */ + struct kvm_cpuid2 cpuid; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 2617be544480..7310d8a8a503 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1487,6 +1487,13 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, return r; } +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2( + struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index) +{ + return cpuid_entry2_find(entries, nent, function, index); +} +EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2); + struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, u32 function, u32 index) { diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 41697cca354e..00570227e2ae 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -13,6 +13,8 @@ void kvm_set_cpu_caps(void); void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu); void kvm_update_pv_runtime(struct kvm_vcpu *vcpu); +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries, + int nent, u32 function, u64 index); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, u32 function, u32 index); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index a0954c3928e2..a6c711715a4a 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -7,6 +7,7 @@ #include "tdx.h" #include "tdx_ops.h" + #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -356,12 +357,16 @@ static int tdx_do_tdh_mng_key_config(void *param) return 0; } -static int __tdx_td_init(struct kvm *kvm); - int tdx_vm_init(struct kvm *kvm) { kvm->arch.has_private_mem = true; + /* + * This function initializes only KVM software construct. It doesn't + * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc. + * It is handled by KVM_TDX_INIT_VM, __tdx_td_init(). + */ + /* * TDX has its own limit of the number of vcpus in addition to * KVM_MAX_VCPUS. 
@@ -369,8 +374,7 @@ int tdx_vm_init(struct kvm *kvm) kvm->max_vcpus = min(kvm->max_vcpus, tdx_sysinfo->td_conf.max_vcpus_per_td); - /* Place holder for TDX specific logic. */ - return __tdx_td_init(kvm); + return 0; } static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) @@ -419,7 +423,123 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) return ret; } -static int __tdx_td_init(struct kvm *kvm) +static int setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid, + struct td_params *td_params) +{ + const struct kvm_cpuid_entry2 *entry; + int max_pa = 36; + + entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, 0); + if (entry) + max_pa = entry->eax & 0xff; + + td_params->eptp_controls = VMX_EPTP_MT_WB; + /* + * No CPU supports 4-level && max_pa > 48. + * "5-level paging and 5-level EPT" section 4.1 4-level EPT + * "4-level EPT is limited to translating 48-bit guest-physical + * addresses." + * cpu_has_vmx_ept_5levels() check is just in case. + */ + if (!cpu_has_vmx_ept_5levels() && max_pa > 48) + return -EINVAL; + if (cpu_has_vmx_ept_5levels() && max_pa > 48) { + td_params->eptp_controls |= VMX_EPTP_PWL_5; + td_params->exec_controls |= TDX_EXEC_CONTROL_MAX_GPAW; + } else { + td_params->eptp_controls |= VMX_EPTP_PWL_4; + } + + return 0; +} + +static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid, + struct td_params *td_params) +{ + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf; + const struct kvm_tdx_cpuid_config *c; + const struct kvm_cpuid_entry2 *entry; + struct tdx_cpuid_value *value; + int i; + + /* + * td_params.cpuid_values: The number and the order of cpuid_value must + * be same to the one of struct tdsysinfo.{num_cpuid_config, cpuid_configs} + * It's assumed that td_params was zeroed. 
+ */ + for (i = 0; i < td_conf->num_cpuid_config; i++) { + c = &kvm_tdx_caps->cpuid_configs[i]; + entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, + c->leaf, c->sub_leaf); + if (!entry) + continue; + + /* + * Check the user input value doesn't set any non-configurable + * bits reported by kvm_tdx_caps. + */ + if ((entry->eax & c->eax) != entry->eax || + (entry->ebx & c->ebx) != entry->ebx || + (entry->ecx & c->ecx) != entry->ecx || + (entry->edx & c->edx) != entry->edx) + return -EINVAL; + + value = &td_params->cpuid_values[i]; + value->eax = entry->eax; + value->ebx = entry->ebx; + value->ecx = entry->ecx; + value->edx = entry->edx; + } + + return 0; +} + +static int setup_tdparams(struct kvm *kvm, struct td_params *td_params, + struct kvm_tdx_init_vm *init_vm) +{ + const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf; + struct kvm_cpuid2 *cpuid = &init_vm->cpuid; + int ret; + + if (kvm->created_vcpus) + return -EBUSY; + + if (init_vm->attributes & ~kvm_tdx_caps->supported_attrs) + return -EINVAL; + + if (init_vm->xfam & ~kvm_tdx_caps->supported_xfam) + return -EINVAL; + + td_params->max_vcpus = kvm->max_vcpus; + td_params->attributes = init_vm->attributes | td_conf->attributes_fixed1; + td_params->xfam = init_vm->xfam | td_conf->xfam_fixed1; + + /* td_params->exec_controls = TDX_CONTROL_FLAG_NO_RBP_MOD; */ + td_params->tsc_frequency = TDX_TSC_KHZ_TO_25MHZ(kvm->arch.default_tsc_khz); + + ret = setup_tdparams_eptp_controls(cpuid, td_params); + if (ret) + return ret; + + ret = setup_tdparams_cpuids(cpuid, td_params); + if (ret) + return ret; + +#define MEMCPY_SAME_SIZE(dst, src) \ + do { \ + BUILD_BUG_ON(sizeof(dst) != sizeof(src)); \ + memcpy((dst), (src), sizeof(dst)); \ + } while (0) + + MEMCPY_SAME_SIZE(td_params->mrconfigid, init_vm->mrconfigid); + MEMCPY_SAME_SIZE(td_params->mrowner, init_vm->mrowner); + MEMCPY_SAME_SIZE(td_params->mrownerconfig, init_vm->mrownerconfig); + + return 0; +} + +static int __tdx_td_init(struct kvm *kvm, 
struct td_params *td_params, + u64 *seamcall_err) { struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); cpumask_var_t packages; @@ -427,8 +547,9 @@ static int __tdx_td_init(struct kvm *kvm) unsigned long tdr_pa = 0; unsigned long va; int ret, i; - u64 err; + u64 err, rcx; + *seamcall_err = 0; ret = tdx_guest_keyid_alloc(); if (ret < 0) return ret; @@ -543,10 +664,23 @@ static int __tdx_td_init(struct kvm *kvm) } } - /* - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a dedicated - * ioctl() to define the configure CPUID values for the TD. - */ + err = tdh_mng_init(kvm_tdx, __pa(td_params), &rcx); + if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) { + /* + * Because a user gives operands, don't warn. + * Return a hint to the user because it's sometimes hard for the + * user to figure out which operand is invalid. SEAMCALL status + * code includes which operand caused invalid operand error. + */ + *seamcall_err = err; + ret = -EINVAL; + goto teardown; + } else if (WARN_ON_ONCE(err)) { + pr_tdx_error_1(TDH_MNG_INIT, err, rcx); + ret = -EIO; + goto teardown; + } + return 0; /* @@ -592,6 +726,86 @@ static int __tdx_td_init(struct kvm *kvm) return ret; } +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct kvm_tdx_init_vm *init_vm; + struct td_params *td_params = NULL; + int ret; + + BUILD_BUG_ON(sizeof(*init_vm) != 256 + sizeof_field(struct kvm_tdx_init_vm, cpuid)); + BUILD_BUG_ON(sizeof(struct td_params) != 1024); + + if (is_hkid_assigned(kvm_tdx)) + return -EINVAL; + + if (cmd->flags) + return -EINVAL; + + init_vm = kmalloc(sizeof(*init_vm) + + sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES, + GFP_KERNEL); + if (!init_vm) + return -ENOMEM; + + if (copy_from_user(init_vm, u64_to_user_ptr(cmd->data), sizeof(*init_vm))) { + ret = -EFAULT; + goto out; + } + + if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) { + ret = -E2BIG; + goto out; + } + + if 
(copy_from_user(init_vm->cpuid.entries, + u64_to_user_ptr(cmd->data) + sizeof(*init_vm), + flex_array_size(init_vm, cpuid.entries, init_vm->cpuid.nent))) { + ret = -EFAULT; + goto out; + } + + if (memchr_inv(init_vm->reserved, 0, sizeof(init_vm->reserved))) { + ret = -EINVAL; + goto out; + } + + if (init_vm->cpuid.padding) { + ret = -EINVAL; + goto out; + } + + td_params = kzalloc(sizeof(struct td_params), GFP_KERNEL); + if (!td_params) { + ret = -ENOMEM; + goto out; + } + + ret = setup_tdparams(kvm, td_params, init_vm); + if (ret) + goto out; + + ret = __tdx_td_init(kvm, td_params, &cmd->hw_error); + if (ret) + goto out; + + kvm_tdx->tsc_offset = td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFFSET); + kvm_tdx->attributes = td_params->attributes; + kvm_tdx->xfam = td_params->xfam; + + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW) + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51)); + else + kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47)); + +out: + /* kfree() accepts NULL. 
*/ + kfree(init_vm); + kfree(td_params); + + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -613,6 +827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_CAPABILITIES: r = tdx_get_capabilities(&tdx_cmd); break; + case KVM_TDX_INIT_VM: + r = tdx_td_init(kvm, &tdx_cmd); + break; default: r = -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 268959d0f74f..8912cb6d5bc2 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -16,7 +16,11 @@ struct kvm_tdx { unsigned long tdr_pa; unsigned long *tdcs_pa; + u64 attributes; + u64 xfam; int hkid; + + u64 tsc_offset; }; struct vcpu_tdx { diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 3f64c871a3f2..0363d8544f42 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -399,4 +399,16 @@ static inline u64 tdh_vp_wr(struct vcpu_tdx *tdx, u64 field, u64 val, u64 mask) return seamcall(TDH_VP_WR, &in); } +static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field) +{ + u64 err, data; + + err = tdh_mng_rd(kvm_tdx, TDCS_EXEC(field), &data); + if (unlikely(err)) { + pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err); + return 0; + } + return data; +} + #endif /* __KVM_X86_TDX_OPS_H */ From patchwork Mon Aug 12 22:48:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761112 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9836F19E83E; Mon, 12 Aug 2024 22:48:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502924; cv=none; 
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata
Subject: [PATCH 15/25] KVM: TDX: Make pmu_intel.c ignore guest TD case
Date: Mon, 12 Aug 2024 15:48:10 -0700
Message-Id: <20240812224820.34826-16-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Isaku Yamahata

TDX KVM doesn't support the PMU yet (that is future work, to be posted as
a separate patch series), but pmu_intel.c touches VMX-specific structures
during vcpu initialization. As a workaround, add a dummy struct lbr_desc
to struct vcpu_tdx so that pmu_intel.c can ignore the TDX case.

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - Fix bisectability issues in headers (Kai)
 - Fix rebase error from v19 (Chao Gao)
 - Make helpers static (Tony Lindgren)
 - Improve whitespace (Tony Lindgren)

v18:
 - Removed unnecessary change to vmx.c which caused kernel warning.
--- arch/x86/kvm/vmx/pmu_intel.c | 45 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/pmu_intel.h | 28 ++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 8 +++++++ arch/x86/kvm/vmx/vmx.h | 34 +-------------------------- 4 files changed, 81 insertions(+), 34 deletions(-) create mode 100644 arch/x86/kvm/vmx/pmu_intel.h diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 83382a4d1d66..e4ae76d5d424 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -19,6 +19,7 @@ #include "lapic.h" #include "nested.h" #include "pmu.h" +#include "tdx.h" /* * Perf's "BASE" is wildly misleading, architectural PMUs use bits 31:16 of ECX @@ -34,6 +35,26 @@ #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0) +static struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc; +#endif + + return &to_vmx(vcpu)->lbr_desc; +} + +static struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc.records; +#endif + + return &to_vmx(vcpu)->lbr_desc.records; +} + static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data) { struct kvm_pmc *pmc; @@ -129,6 +150,22 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr) return get_gp_pmc(pmu, msr, MSR_IA32_PMC0); } +static bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return cpuid_model_is_consistent(vcpu); +} + +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return !!vcpu_to_lbr_records(vcpu)->nr; +} + static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index) { struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu); @@ -194,6 +231,9 @@ static inline void intel_pmu_release_guest_lbr_event(struct kvm_vcpu *vcpu) { struct lbr_desc *lbr_desc = 
vcpu_to_lbr_desc(vcpu); + if (is_td_vcpu(vcpu)) + return; + if (lbr_desc->event) { perf_event_release_kernel(lbr_desc->event); lbr_desc->event = NULL; @@ -235,6 +275,9 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu) PERF_SAMPLE_BRANCH_USER, }; + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return 0; + if (unlikely(lbr_desc->event)) { __set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use); return 0; @@ -542,7 +585,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters); perf_capabilities = vcpu_get_perf_capabilities(vcpu); - if (cpuid_model_is_consistent(vcpu) && + if (intel_pmu_lbr_is_compatible(vcpu) && (perf_capabilities & PMU_CAP_LBR_FMT)) memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps)); else diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h new file mode 100644 index 000000000000..5620d0882cdc --- /dev/null +++ b/arch/x86/kvm/vmx/pmu_intel.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_VMX_PMU_INTEL_H +#define __KVM_X86_VMX_PMU_INTEL_H + +#include + +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu); +int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); + +struct lbr_desc { + /* Basic info about guest LBR records. */ + struct x86_pmu_lbr records; + + /* + * Emulate LBR feature via passthrough LBR registers when the + * per-vcpu guest LBR event is scheduled on the current pcpu. + * + * The records may be inaccurate if the host reclaims the LBR. 
+ */ + struct perf_event *event; + + /* True if LBRs are marked as not intercepted in the MSR bitmap */ + bool msr_passthrough; +}; + +extern struct x86_pmu_lbr vmx_lbr_caps; + +#endif /* __KVM_X86_VMX_PMU_INTEL_H */ diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 8912cb6d5bc2..ca948f26b755 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -10,6 +10,8 @@ void tdx_cleanup(void); extern bool enable_tdx; +#include "pmu_intel.h" + struct kvm_tdx { struct kvm kvm; @@ -27,6 +29,12 @@ struct vcpu_tdx { struct kvm_vcpu vcpu; unsigned long tdvpr_pa; + + /* + * Dummy to make pmu_intel not corrupt memory. + * TODO: Support PMU for TDX. Future work. + */ + struct lbr_desc lbr_desc; }; static inline bool is_td(struct kvm *kvm) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index d91c778affd4..07c64731eb37 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -11,6 +11,7 @@ #include "capabilities.h" #include "../kvm_cache_regs.h" +#include "pmu_intel.h" #include "vmcs.h" #include "vmx_ops.h" #include "../cpuid.h" @@ -94,24 +95,6 @@ union vmx_exit_reason { u32 full; }; -struct lbr_desc { - /* Basic info about guest LBR records. */ - struct x86_pmu_lbr records; - - /* - * Emulate LBR feature via passthrough LBR registers when the - * per-vcpu guest LBR event is scheduled on the current pcpu. - * - * The records may be inaccurate if the host reclaims the LBR. - */ - struct perf_event *event; - - /* True if LBRs are marked as not intercepted in the MSR bitmap */ - bool msr_passthrough; -}; - -extern struct x86_pmu_lbr vmx_lbr_caps; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. 
@@ -665,21 +648,6 @@ static __always_inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_vmx, vcpu); } -static inline struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) -{ - return &to_vmx(vcpu)->lbr_desc; -} - -static inline struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) -{ - return &vcpu_to_lbr_desc(vcpu)->records; -} - -static inline bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) -{ - return !!vcpu_to_lbr_records(vcpu)->nr; -} - void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu); int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu); From patchwork Mon Aug 12 22:48:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761114 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D142C19FA6B; Mon, 12 Aug 2024 22:48:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502925; cv=none; b=SB5udNhU0MxEfRF3ynHwhLVZVy2b7BRfeZ2VXiBJ5BlJig1E79IgdpGU2HT1TNKbwz3WJFWRx2WBYw7nPF+s3GtQl8DytYrWqA0NfxC85OGoP1ZkllYQiTwZsVFlZI5FYmT5ff0aKQyJWptgMzPxNqrnHxQ4w1jGPQBIKLbVmwg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502925; c=relaxed/simple; bh=IeFL19qbpFuEDdYy6LJ6GkBeYN7B0Q4mmhVbQUcOmjk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Df3ZEoOx4J46GMww4n935G+ue8pQ6IjUNoVUDcOEx0QIb9Ui08/vAwNQyxpcrWv9dFCu6BxV0hynMk9ULLiEpcl1pAegdjsi9kfiUZHmgyEkgtwyg3REjn66qd86MDbHlOhQRzfhTLKGD0JufYgXgYfphUOps0omRBC4hS1VFF0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none 
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata, Binbin Wu
Subject: [PATCH 16/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest
Date: Mon, 12 Aug 2024 15:48:11 -0700
Message-Id: <20240812224820.34826-17-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Isaku Yamahata

Destroying a TDX guest requires at least one online cpu on each package,
because reclaiming the guest's TDX KeyID (as part of the teardown process)
requires calling a SEAMCALL on every package (on any cpu of that package).

Do not offline the last cpu of a package while any TDX guest is running;
otherwise KVM may be unable to tear down the TDX guest, leaking the TDX
KeyID and other resources such as the TDX guest control structure pages.

Implement tdx_offline_cpu() and register it as the teardown callback of
TDX's own CPU hotplug state, so that a cpu is prevented from going offline
if it is the last online cpu on its package.

Co-developed-by: Kai Huang Signed-off-by: Kai Huang Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu --- uAPI breakout v1: - Remove nr_configured_keyid, use ida_is_empty() instead (Chao) - Change to use a simpler way to check whether the to-go-offline cpu is the last online cpu on the package. (Chao) - Improve the changelog (Kai) - Improve the patch title to call out "when there's TDX guest". (Kai) - Significantly reduce the code by using TDX's own CPUHP callback, instead of hooking into KVM's. - Update changelog to reflect the change. v18: - Added reviewed-by BinBin --- arch/x86/kvm/vmx/tdx.c | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index a6c711715a4a..531e87983b90 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -921,6 +921,42 @@ static int tdx_online_cpu(unsigned int cpu) return r; } +static int tdx_offline_cpu(unsigned int cpu) +{ + int i; + + /* No TD is running. Allow any cpu to be offline. */ + if (ida_is_empty(&tdx_guest_keyid_pool)) + return 0; + + /* + * In order to reclaim TDX HKID, (i.e. when deleting guest TD), need to + * call TDH.PHYMEM.PAGE.WBINVD on all packages to program all memory + * controller with pconfig. If we have active TDX HKID, refuse to + * offline the last online cpu. + */ + for_each_online_cpu(i) { + /* + * Found another online cpu on the same package. + * Allow to offline. + */ + if (i != cpu && topology_physical_package_id(i) == + topology_physical_package_id(cpu)) + return 0; + } + + /* + * This is the last cpu of this package. Don't offline it. + * + * Because it's hard for human operator to understand the + * reason, warn it. + */ +#define MSG_ALLPKG_ONLINE \ + "TDX requires all packages to have an online CPU. 
Delete all TDs in order to offline all CPUs of a package.\n" + pr_warn_ratelimited(MSG_ALLPKG_ONLINE); + return -EBUSY; +} + static void __do_tdx_cleanup(void) { /* @@ -946,7 +982,7 @@ static int __init __do_tdx_bringup(void) */ r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN, "kvm/cpu/tdx:online", - tdx_online_cpu, NULL); + tdx_online_cpu, tdx_offline_cpu); if (r < 0) return r; From patchwork Mon Aug 12 22:48:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761113 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B010D19DF70; Mon, 12 Aug 2024 22:48:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502924; cv=none; b=c1Kd7hcsiCIrMmcSCK2m7rUFJ9XXQ/efTRDQ1LxvPrikJtx0yeyYH22uHaSVW5cvDn6HMlI8wYjTNqWOTa34LWI+bEvIgp3AzdXmds8Z48J6k9QhKr6P8kOjmxcEgKh+hxV1bWGAdYmII3Qh2PA6eRiZaly75vWTHAJ7SO5fRSk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502924; c=relaxed/simple; bh=YGleTpb6mmW/tsezi3q9ZjZv54BNdmNHXPDTlZ3oEC0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=SK9IMWHHVDZ1EZQkOtF41TKU1AMrI+DHku5L+3Ouq7XegIOg9wIRGBKFUPo+T/NCuW3cI8IkeGDzVg3s09xkcknqkyPlMIdN/IucZ9JeUogC0IibLvGJbgjT+U4iusl89n7BUoGQpOiN0KNsMKqLPqOit634LmauREenRsnALDE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=WfYJl4zj; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="WfYJl4zj" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502922; x=1755038922; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YGleTpb6mmW/tsezi3q9ZjZv54BNdmNHXPDTlZ3oEC0=; b=WfYJl4zjT+470TEsU8XnH09yOcgv3UhI6tlDaf5Xh8V6pdOVmZo8bhAz oYw/i1L0WhqEdjYp0SyOm/ba5j48qUDwvXLj+KMluuX+AEBlFHUOX0NSO wZFC93ZovW/kfazRbKYLGpmqwHnMYvYNJBvT8PzYxMAU8WhLMAmd9FJy/ 4sV4B7Tbf80W0NPT/dU1ybIgFdX5nAHCgKex7TBNQX2Ke/4xTzXafl7kY z4h1MWwSbzTs2uR/I+hVWLR0XAYYL7c1IEFObwuTK1vDWYaZQue7UU94/ ESQgtdaQGXoKjXrzCxWTk2hwHbbq/sIYlWkjecE1iIAxocCSOp7u2FpGH Q==; X-CSE-ConnectionGUID: g5sR3GEVQe2TK6/dfk84EQ== X-CSE-MsgGUID: 0Qr6qR/lShy9x7fEWwJ8Bw== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041440" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041440" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:36 -0700 X-CSE-ConnectionGUID: IWA4nJpmTB+hmY4EPMaRDQ== X-CSE-MsgGUID: ocSyheWqTIeg27GFZdFlkQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008432" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:36 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata Subject: [PATCH 17/25] KVM: TDX: create/free TDX vcpu structure Date: Mon, 12 Aug 2024 15:48:12 -0700 Message-Id: <20240812224820.34826-18-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Implement the vcpu-related stubs for TDX: create, reset and free. For now, set up only the state that does not require a TDX SEAMCALL. The TDX-specific vcpu initialization will be handled by KVM_TDX_INIT_VCPU. Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- uAPI breakout v1: - Dropped unnecessary WARN_ON_ONCE() in tdx_vcpu_create(). WARN_ON_ONCE(vcpu->arch.cpuid_entries), WARN_ON_ONCE(vcpu->arch.cpuid_nent) - Use kvm_tdx instead of to_kvm_tdx() in tdx_vcpu_create() (Chao) v19: - Removed stale comment in tdx_vcpu_create(). v18: - Update commit log to use "create" instead of "allocate" because the patch doesn't newly allocate memory for the TDX vcpu. v16: - Add AMX support, as upstream KVM supports it. 
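The AMX note in the changelog corresponds to the XFAM test in tdx_vcpu_create() in this patch, which compares the masked value against the full mask rather than against zero. A minimal userspace sketch of that idiom follows. The bit positions are the architectural XSAVE component numbers for AMX (XTILECFG is bit 17, XTILEDATA is bit 18); xfam_has_amx() is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state-component bits for AMX (x86 architectural values). */
#define XFEATURE_MASK_XTILE_CFG		(UINT64_C(1) << 17)
#define XFEATURE_MASK_XTILE_DATA	(UINT64_C(1) << 18)
#define XFEATURE_MASK_XTILE \
	(XFEATURE_MASK_XTILE_CFG | XFEATURE_MASK_XTILE_DATA)

static_assert(XFEATURE_MASK_XTILE == UINT64_C(0x60000),
	      "AMX uses XSAVE components 17 and 18");

/*
 * Hypothetical helper: true only when BOTH AMX components are enabled.
 * Testing "!= 0" instead would also accept a half-enabled configuration.
 */
static bool xfam_has_amx(uint64_t xfam)
{
	return (xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE;
}
```

An XFAM with only XTILECFG set is rejected by this test, which is why the full-mask comparison is used before enabling behaviour that assumes complete AMX state.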
-- 2.46.0 --- arch/x86/kvm/vmx/main.c | 44 ++++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 41 +++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 10 +++++++++ arch/x86/kvm/x86.c | 2 ++ 4 files changed, 93 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c079a5b057d8..d40de73d2bd3 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -72,6 +72,42 @@ static void vt_vm_free(struct kvm *kvm) tdx_vm_free(kvm); } +static int vt_vcpu_precreate(struct kvm *kvm) +{ + if (is_td(kvm)) + return 0; + + return vmx_vcpu_precreate(kvm); +} + +static int vt_vcpu_create(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_create(vcpu); + + return vmx_vcpu_create(vcpu); +} + +static void vt_vcpu_free(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_vcpu_free(vcpu); + return; + } + + vmx_vcpu_free(vcpu); +} + +static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + if (is_td_vcpu(vcpu)) { + tdx_vcpu_reset(vcpu, init_event); + return; + } + + vmx_vcpu_reset(vcpu, init_event); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -108,10 +144,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vm_destroy = vt_vm_destroy, .vm_free = vt_vm_free, - .vcpu_precreate = vmx_vcpu_precreate, - .vcpu_create = vmx_vcpu_create, - .vcpu_free = vmx_vcpu_free, - .vcpu_reset = vmx_vcpu_reset, + .vcpu_precreate = vt_vcpu_precreate, + .vcpu_create = vt_vcpu_create, + .vcpu_free = vt_vcpu_free, + .vcpu_reset = vt_vcpu_reset, .prepare_switch_to_guest = vmx_prepare_switch_to_guest, .vcpu_load = vmx_vcpu_load, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 531e87983b90..18738cacbc87 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -377,6 +377,47 @@ int tdx_vm_init(struct kvm *kvm) return 0; } +int tdx_vcpu_create(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); + + /* TDX 
only supports x2APIC, which requires an in-kernel local APIC. */ + if (!vcpu->arch.apic) + return -EINVAL; + + fpstate_set_confidential(&vcpu->arch.guest_fpu); + + vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; + + vcpu->arch.cr0_guest_owned_bits = -1ul; + vcpu->arch.cr4_guest_owned_bits = -1ul; + + vcpu->arch.tsc_offset = kvm_tdx->tsc_offset; + vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset; + vcpu->arch.guest_state_protected = + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG); + + if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE) + vcpu->arch.xfd_no_write_intercept = true; + + return 0; +} + +void tdx_vcpu_free(struct kvm_vcpu *vcpu) +{ + /* This is stub for now. More logic will come. */ +} + +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + + /* Ignore INIT silently because TDX doesn't support INIT event. */ + if (init_event) + return; + + /* This is stub for now. More logic will come here. */ +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 96c74880bd36..e1d3276b0f60 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -123,7 +123,12 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap); int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); void tdx_vm_free(struct kvm *kvm); + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); + +int tdx_vcpu_create(struct kvm_vcpu *vcpu); +void tdx_vcpu_free(struct kvm_vcpu *vcpu); +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); #else static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { @@ -132,7 +137,12 @@ static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm 
*kvm) {} static inline void tdx_vm_free(struct kvm *kvm) {} + static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; } + +static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; } +static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} +static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} #endif #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ce2ef63f30f2..9cee326f5e7a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -488,6 +488,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info) kvm_recalculate_apic_map(vcpu->kvm); return 0; } +EXPORT_SYMBOL_GPL(kvm_set_apic_base); /* * Handle a fault on a hardware virtualization (VMX or SVM) instruction. @@ -12630,6 +12631,7 @@ bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu) { return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id; } +EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp); bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) { From patchwork Mon Aug 12 22:48:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761116 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6EDB319FA90; Mon, 12 Aug 2024 22:48:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502927; cv=none; b=E7PKhKOBAnyNL+j/xn4f/4bOP4PfFZRiadg394OHGuF5Ddudiik9I5jTKf10OfxkqE84ict2wbZHLCvSJZOJHF/SYyyoeg2f1BCoLGywKTBOOa3Zl1O6yGkSYf4cWp1TEbPjOknAmeBGIE623y7E8sx4wxAgdC7E1Ed0UpvHk1E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502927; c=relaxed/simple; 
bh=sC6J6CI2XS86ggwRoFtulON3VAw5DfajrsLyb0a/qnc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jyhZw5uCdeU9EXzbR5DrQ62cK62FAe0hFEjjXVyPPRbHh9U4YZqKsvqcuXHIQGDm8KzQW0ptjFQSAIilh9xcQIlUnGPGqrtpzNFFtI0S+YGa4QWhxhxEuv+iP6YShQ7/z/vLxE1FdA9jaA4q/wmvYKr+b0P9P0HgsMrJ7NmXFLE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=WZsjHl73; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="WZsjHl73" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502924; x=1755038924; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sC6J6CI2XS86ggwRoFtulON3VAw5DfajrsLyb0a/qnc=; b=WZsjHl73FAIzTOAjaNoBpWi88cYcP+Wkw6cb1stIv0l+rL4ezRZFM/dR gXjBakM5TIjUJDwP3vZD+UyHvHQ54HUaOpAB5WX49Gk1J988gl6Amw3ws f15p5TxyRYHXMQvGES6vZm9vHllYwXTyNgjqzn6DzhKffrZ0xeqezpnJ5 3KFd4woR+Id9Tu58mPJlzbl9ubpoUBv6klCMx5sr7nQnZNN3ILyMp2T2Y uSp7M67favx3qucNzEC+GZSXaigknK1Oo5+WDIhH7MQPRLzggU3WQhCST 6Ly/4Mpzbg8lAj1kjfV5xl4EoSk6zr6uAHlowVkPKUbsMWuEibuc3a97C Q==; X-CSE-ConnectionGUID: EMSi8XcNTxewdw1a1SUmBg== X-CSE-MsgGUID: kRl4hp2hTTyrJAxInF1wyQ== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041451" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041451" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:37 -0700 X-CSE-ConnectionGUID: br9ZI5RnT5KhUNLOyu/gcA== X-CSE-MsgGUID: sTMfmqavSh6vUOVSGzbHYg== X-ExtLoop1: 1 
X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008435" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:36 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com, Isaku Yamahata , Sean Christopherson Subject: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization Date: Mon, 12 Aug 2024 15:48:13 -0700 Message-Id: <20240812224820.34826-19-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata A TD guest vcpu needs TDX-specific initialization before it can run. Extend KVM_MEMORY_ENCRYPT_OP to vcpu scope, add a new sub-command, KVM_TDX_INIT_VCPU, and implement the callback for it. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- uAPI breakout v1: - Support FEATURES0_TOPOLOGY_ENUM - Update for the wrapper functions for SEAMCALLs. (Sean) - Remove WARN_ON_ONCE() in tdx_vcpu_free(). WARN_ON_ONCE(vcpu->cpu != -1), WARN_ON_ONCE(tdx->tdvpx_pa), WARN_ON_ONCE(tdx->tdvpr_pa) - Remove KVM_BUG_ON() in tdx_vcpu_reset(). - Remove duplicate "tdx->tdvpr_pa=" lines - Rename tdvpx to tdcx as it is confusing, follow spec change for same reason (Isaku) - Updates from seamcall overhaul (Kai) - Rename error->hw_error - Change using tdx_info to using exported 'tdx_sysinfo' pointer in tdx_td_vcpu_init(). - Remove code to the old (non-existing) tdx_module_setup(). 
- Use a new wrapper tdx_sysinfo_nr_tdcx_pages() to replace tdx_info->nr_tdcx_pages. - Combine the two for loops in tdx_td_vcpu_init() (Chao) - Add more line breaks into tdx_td_vcpu_init() for readability (Tony) - Drop local tdcx_pa in tdx_td_vcpu_init() (Rick) - Drop local tdvpr_pa in tdx_td_vcpu_init() (Rick) v18: - Use tdh_sys_rd() instead of struct tdsysinfo_struct. - Rename tdx_reclaim_td_page() => tdx_reclaim_control_page() - Remove the change of tools/arch/x86/include/uapi/asm/kvm.h. --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/main.c | 9 ++ arch/x86/kvm/vmx/tdx.c | 193 ++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 6 + arch/x86/kvm/vmx/tdx_arch.h | 2 + arch/x86/kvm/vmx/x86_ops.h | 4 + arch/x86/kvm/x86.c | 6 + 9 files changed, 221 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 12ee66bc9026..5dd7955376e3 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -126,6 +126,7 @@ KVM_X86_OP(enable_smi_window) #endif KVM_X86_OP_OPTIONAL(dev_get_attr) KVM_X86_OP(mem_enc_ioctl) +KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 188cd684bffb..e3094c843556 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1829,6 +1829,7 @@ struct kvm_x86_ops { int (*dev_get_attr)(u32 group, u64 attr, u64 *val); int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); + int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp); int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp); int 
(*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 95ae2d4a4697..b4f12997052d 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -930,6 +930,7 @@ struct kvm_hyperv_eventfd { enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES = 0, KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index d40de73d2bd3..e34cb476cc78 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -116,6 +116,14 @@ static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) return tdx_vm_ioctl(kvm, argp); } +static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + if (!is_td_vcpu(vcpu)) + return -EINVAL; + + return tdx_vcpu_ioctl(vcpu, argp); +} + #define VMX_REQUIRED_APICV_INHIBITS \ (BIT(APICV_INHIBIT_REASON_DISABLED) | \ BIT(APICV_INHIBIT_REASON_ABSENT) | \ @@ -268,6 +276,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .get_untagged_addr = vmx_get_untagged_addr, .mem_enc_ioctl = vt_mem_enc_ioctl, + .vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl, }; struct kvm_x86_init_ops vt_init_ops __initdata = { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 18738cacbc87..ba7b436fae86 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -89,6 +89,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); } +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) +{ + return tdx->td_vcpu_created; +} + static inline bool is_td_created(struct kvm_tdx *kvm_tdx) { return kvm_tdx->tdr_pa; @@ -105,6 +110,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) return kvm_tdx->hkid > 0; } +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->finalized; +} + static void tdx_clear_page(unsigned long page_pa) { const void *zero_page 
= (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -293,6 +303,15 @@ static inline u8 tdx_sysinfo_nr_tdcs_pages(void) return tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZE; } +static inline u8 tdx_sysinfo_nr_tdcx_pages(void) +{ + /* + * TDVPS = TDVPR(4K page) + TDCX(multiple 4K pages). + * -1 for TDVPR. + */ + return tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE_SIZE - 1; +} + void tdx_vm_free(struct kvm *kvm) { struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); @@ -405,7 +424,29 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) void tdx_vcpu_free(struct kvm_vcpu *vcpu) { - /* This is stub for now. More logic will come. */ + struct vcpu_tdx *tdx = to_tdx(vcpu); + int i; + + /* + * This method can be called when vcpu allocation/initialization + * failed, so it's possible that hkid, tdcx and tdvpr are not assigned + * yet. + */ + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) + return; + + if (tdx->tdcx_pa) { + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) { + if (tdx->tdcx_pa[i]) + tdx_reclaim_control_page(tdx->tdcx_pa[i]); + } + kfree(tdx->tdcx_pa); + tdx->tdcx_pa = NULL; + } + if (tdx->tdvpr_pa) { + tdx_reclaim_control_page(tdx->tdvpr_pa); + tdx->tdvpr_pa = 0; + } } void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) @@ -414,8 +455,13 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) /* Ignore INIT silently because TDX doesn't support INIT event. */ if (init_event) return; + if (is_td_vcpu_created(to_tdx(vcpu))) + return; - /* This is stub for now. More logic will come here. */ + /* + * Don't update mp_state to runnable because more initialization + * is still required via KVM_TDX_INIT_VCPU. + */ } static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) @@ -884,6 +930,149 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) return r; } +/* The VMM can pass one 64-bit auxiliary value to the vcpu via RCX for the guest BIOS. 
*/ +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx) +{ + const struct tdx_sysinfo_module_info *modinfo = &tdx_sysinfo->module_info; + struct vcpu_tdx *tdx = to_tdx(vcpu); + unsigned long va; + int ret, i; + u64 err; + + if (is_td_vcpu_created(tdx)) + return -EINVAL; + + /* + * The vcpu_free method frees allocated pages. Avoid leaving a partial + * setup that the method can't handle. + */ + va = __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) + return -ENOMEM; + tdx->tdvpr_pa = __pa(va); + + tdx->tdcx_pa = kcalloc(tdx_sysinfo_nr_tdcx_pages(), sizeof(*tdx->tdcx_pa), + GFP_KERNEL_ACCOUNT); + if (!tdx->tdcx_pa) { + ret = -ENOMEM; + goto free_tdvpr; + } + + err = tdh_vp_create(tdx); + if (KVM_BUG_ON(err, vcpu->kvm)) { + tdx->tdvpr_pa = 0; + ret = -EIO; + pr_tdx_error(TDH_VP_CREATE, err); + goto free_tdvpx; + } + + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) { + va = __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) { + ret = -ENOMEM; + goto free_tdvpx; + } + tdx->tdcx_pa[i] = __pa(va); + + err = tdh_vp_addcx(tdx, tdx->tdcx_pa[i]); + if (KVM_BUG_ON(err, vcpu->kvm)) { + pr_tdx_error(TDH_VP_ADDCX, err); + /* vcpu_free method frees TDCX and TDR donated to TDX */ + return -EIO; + } + } + + if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM) + err = tdh_vp_init_apicid(tdx, vcpu_rcx, vcpu->vcpu_id); + else + err = tdh_vp_init(tdx, vcpu_rcx); + + if (KVM_BUG_ON(err, vcpu->kvm)) { + pr_tdx_error(TDH_VP_INIT, err); + return -EIO; + } + + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; + tdx->td_vcpu_created = true; + + return 0; + +free_tdvpx: + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) { + if (tdx->tdcx_pa[i]) + free_page((unsigned long)__va(tdx->tdcx_pa[i])); + tdx->tdcx_pa[i] = 0; + } + kfree(tdx->tdcx_pa); + tdx->tdcx_pa = NULL; + +free_tdvpr: + if (tdx->tdvpr_pa) + free_page((unsigned long)__va(tdx->tdvpr_pa)); + tdx->tdvpr_pa = 0; + + return ret; +} + +static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) +{ + struct 
msr_data apic_base_msr; + struct vcpu_tdx *tdx = to_tdx(vcpu); + int ret; + + if (cmd->flags) + return -EINVAL; + if (tdx->initialized) + return -EINVAL; + + /* + * As TDX requires X2APIC, set local apic mode to X2APIC. User space + * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by + * KVM_SET_CPUID2. Otherwise kvm_set_apic_base() will fail. + */ + apic_base_msr = (struct msr_data) { + .host_initiated = true, + .data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC | + (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0), + }; + if (kvm_set_apic_base(vcpu, &apic_base_msr)) + return -EINVAL; + + ret = tdx_td_vcpu_init(vcpu, (u64)cmd->data); + if (ret) + return ret; + + tdx->initialized = true; + return 0; +} + +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); + struct kvm_tdx_cmd cmd; + int ret; + + if (!is_hkid_assigned(kvm_tdx) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + + if (cmd.hw_error) + return -EINVAL; + + switch (cmd.id) { + case KVM_TDX_INIT_VCPU: + ret = tdx_vcpu_init(vcpu, &cmd); + break; + default: + ret = -EINVAL; + break; + } + + return ret; +} + #define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE) static int __init setup_kvm_tdx_caps(void) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index ca948f26b755..8349b542836e 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -22,6 +22,8 @@ struct kvm_tdx { u64 xfam; int hkid; + bool finalized; + u64 tsc_offset; }; @@ -29,6 +31,10 @@ struct vcpu_tdx { struct kvm_vcpu vcpu; unsigned long tdvpr_pa; + unsigned long *tdcx_pa; + bool td_vcpu_created; + + bool initialized; /* * Dummy to make pmu_intel not corrupt memory. 
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 413619dd92ef..d2d7f9cab740 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -155,4 +155,6 @@ struct td_params { #define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) #define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) +#define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20) + #endif /* __KVM_X86_TDX_ARCH_H */ diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index e1d3276b0f60..55fd17fbfd19 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -129,6 +129,8 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_create(struct kvm_vcpu *vcpu); void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); + +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); #else static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { @@ -143,6 +145,8 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; } static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} + +static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } #endif #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9cee326f5e7a..3d43fa84c2b4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6314,6 +6314,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_SET_DEVICE_ATTR: r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); break; + case KVM_MEMORY_ENCRYPT_OP: + r = -ENOTTY; + if (!kvm_x86_ops.vcpu_mem_enc_ioctl) + goto out; + r = kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp); + break; default: r = -EINVAL; } From patchwork Mon Aug 12 22:48:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761117 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D00821A01B8; Mon, 12 Aug 2024 22:48:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502927; cv=none; b=C6Jxpa6fRqOdYHn17vlxxQ7ud2arHONafor/phZLZYYcxq1dQcs9yQGR7AgxX69WFEOStuYbjmukbO5wKpupNTcKPCW62Ksr58t1w8tlbxVoFLBNz9My7s5mn//lzVu3BtBH/htKlUEPe8hWMiOYLsJaIT3HuP1GDRTLLaqKJ5k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502927; c=relaxed/simple; bh=dzTCJEwxQIcUPZxgaA3bFSKSbYZ9P5mbXKvGZcycYO4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ZpqeDx3YCwWgzpzdBLT9/t3+xVkJ0WUj7PHNK3byzWMXo2zzhzdjDWQsntBFEyVDGKpB26A3zOZvw1enc/NPfgTNGCvBt3tI784r7QyWwm1dDMnuiQ2Xef1ZGjI67QD8Jg5+Gv2c8KdJ3eWDsrN88qImTIrNT+JrD6dhS4y3KxU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=HVOCe3wl; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="HVOCe3wl" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1723502926; x=1755038926; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=dzTCJEwxQIcUPZxgaA3bFSKSbYZ9P5mbXKvGZcycYO4=; b=HVOCe3wldGQUcM+7opWdnsLlgoJFu4NRdL7c5GfwLiZeBVLlwu8e0Wqj 7AwB/YjGWnfzsYhHHOMapdRYS+GHM5qjIRdhBQoGRUIq4/dhTdpFpxKQL o0naTtGyP52Rxe7aT1zsh5hA9rI2ZgpCDEjsw0ZUqqpAdC9wUAAYqRVqa 3FHCZLvymOedSnYzy2qIb6TMX7MdLOktYOc0TTLEWQS8Ab4F0RF2oX0jG 4CFNqDmVGN0dSmN+FVvI0jKTlaDFV5/viSP/bPZE/18WG0sd/afjJqwBB ymrbKwYhOl1Rgybg1Vk33FQvfrpgG0uv58KT9SQpAZf55FvOKXs/b8SRF w==; X-CSE-ConnectionGUID: WJ8JmIR5Q0u1coycXNBaEQ== X-CSE-MsgGUID: L70OmvONQ5GuOT1UEtoKIQ== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041462" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041462" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:37 -0700 X-CSE-ConnectionGUID: vJReIfazTm2eq16RzlEVXw== X-CSE-MsgGUID: uiUN2SLKQBmKeki1qsp0vQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008439" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:37 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com Subject: [PATCH 19/25] KVM: X86: Introduce kvm_get_supported_cpuid_internal() Date: Mon, 12 Aug 2024 15:48:14 -0700 Message-Id: <20240812224820.34826-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Xiaoyao Li TDX module reports a set of configurable CPUIDs. 
Reporting these bits directly to userspace and allowing them to be set is not right. If a bit is unknown to or unsupported by KVM, it should be reported as unsupported and therefore not configurable by userspace. Introduce and export kvm_get_supported_cpuid_internal() so that TDX can get KVM's supported CPUID list and use it to cap the configurable CPUID list reported by the TDX module. Signed-off-by: Xiaoyao Li Signed-off-by: Rick Edgecombe --- uAPI breakout v1: - New patch --- arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++ arch/x86/kvm/cpuid.h | 2 ++ 2 files changed, 27 insertions(+) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 7310d8a8a503..499479c769d8 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1487,6 +1487,31 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, return r; } +int kvm_get_supported_cpuid_internal(struct kvm_cpuid2 *cpuid, const u32 *funcs, + int funcs_len) +{ + struct kvm_cpuid_array array = { + .nent = 0, + }; + int i, r; + + if (cpuid->nent < 1 || cpuid->nent > KVM_MAX_CPUID_ENTRIES) + return -E2BIG; + + array.maxnent = cpuid->nent; + array.entries = cpuid->entries; + + for (i = 0; i < funcs_len; i++) { + r = get_cpuid_func(&array, funcs[i], KVM_GET_SUPPORTED_CPUID); + if (r) + return r; + } + + cpuid->nent = array.nent; + return 0; +} +EXPORT_SYMBOL_GPL(kvm_get_supported_cpuid_internal); + struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2( struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index) { diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 00570227e2ae..5cc13d1b7991 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -13,6 +13,8 @@ void kvm_set_cpu_caps(void); void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu); void kvm_update_pv_runtime(struct kvm_vcpu *vcpu); +int kvm_get_supported_cpuid_internal(struct kvm_cpuid2 *cpuid, const u32 *funcs, + int funcs_len); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries, int nent, u32 
function, u64 index); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, From patchwork Mon Aug 12 22:48:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13761115 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DEB8D1A00D7; Mon, 12 Aug 2024 22:48:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502926; cv=none; b=CONCFm7XNHza5lE/Lv+cSrMpRkv2SPfbSaKquTHf/gbeNJTFs1I7PVlA3/+hb1GiC9htN3ndlFq7EWc5XpbiHVnWgdIXab4ahbjxNjeurdEtiy4n02ygJE7dpTUdPyjpNUYO8HAKX2Eu3BZjh/wcj2kta/lVF28ohciWgDByeuo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723502926; c=relaxed/simple; bh=1FN9h/71MwOxJhAZ/a05tDf76AvluKRe9GPevp3JzsQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tknlVWF9nUV/HAply6oY4X8mU21Dgs1h3/YP7Y5nWV4RzjAX2Kj5uj6wmFxRw/JLd0uicPhjwSuv3zh8f+lJJTb1UZR0QImLP32mrNmJyX69P+JB8HDt4Fc8ovMCR0ftQVcTtf527D0aEw+j9dx3NWfvkw9grPrLq6TTbBdujAU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZJSVWoMU; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZJSVWoMU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; 
i=@intel.com; q=dns/txt; s=Intel; t=1723502925; x=1755038925; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1FN9h/71MwOxJhAZ/a05tDf76AvluKRe9GPevp3JzsQ=; b=ZJSVWoMUs893Ankbqd+4Y80gcophYZ0Z6twnYZueyX/+Vb+FnHp0wCJA KOnifz73FoCTv3jPz9soScD7HB485lyfrYaOzjp8dsWJz+4HBkg0FM6eg T0E6ppAEmtYnOYS6Cz/7FfV4hjCWA9helQv+lo+CiBP+kutLOdeI/Em7x SDSXyltf6aSJ6+vs/EjNAchpw1n8ehHU8zzsZCBrw/Dm93wLQoaiK2yC3 p7tawHRZozzAZmtJ15Ia7AeuuJgQatBssGIEOHrKaIaL9+PP34Qgf4fCm iJ2A3aU5016fX0m0QoM06M53ZMIeuiE0L2phIDURxKcKHsVVCHH41ri/y w==; X-CSE-ConnectionGUID: Vnd5zBH1TramLTMmtLut6g== X-CSE-MsgGUID: Ml12hal/TtCb6oTEQPkYcw== X-IronPort-AV: E=McAfee;i="6700,10204,11162"; a="33041470" X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="33041470" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:38 -0700 X-CSE-ConnectionGUID: HmcPPDCuTFW8zo7niScD5g== X-CSE-MsgGUID: oLMK0bC9RNSd2YSILDXI1w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,284,1716274800"; d="scan'208";a="59008442" Received: from jdoman-desk1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.222.53]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Aug 2024 15:48:38 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, isaku.yamahata@gmail.com, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com Subject: [PATCH 20/25] KVM: X86: Introduce tdx_get_kvm_supported_cpuid() Date: Mon, 12 Aug 2024 15:48:15 -0700 Message-Id: <20240812224820.34826-21-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com> References: <20240812224820.34826-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Two future TDX ioctl's will want to filter output by supported CPUID Add a helper in TDX code instead of using kvm_get_supported_cpuid_internal() directly for two reasons: 1. Logic around which CPUID leaf ranges to query would need to be duplicated. 2. Future patches will add TDX specific fixups to the CPUID data provided by kvm_get_supported_cpuid_internal(). 
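
As a rough userspace illustration of the helper pattern described above
(not the kernel code itself): allocate a maximally sized table, let a query
routine fill it for a fixed list of base leaves, and hand back NULL on
failure so the caller can free unconditionally. The names cpuid_entry,
cpuid_table, MAX_ENTRIES, query_fn, query_ok and query_fail are hypothetical
stand-ins invented for this sketch; 0x40000000 is the value of
KVM_CPUID_SIGNATURE.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ENTRIES 256	/* hypothetical stand-in for KVM_MAX_CPUID_ENTRIES */

struct cpuid_entry {
	uint32_t function, index, eax, ebx, ecx, edx;
};

struct cpuid_table {
	uint32_t nent;			/* in: capacity, out: entries filled */
	struct cpuid_entry entries[];	/* flexible array member */
};

/* Hypothetical query callback: fills the table, returns 0 or -errno. */
typedef int (*query_fn)(struct cpuid_table *t, const uint32_t *funcs,
			int funcs_len);

/* Allocate a full-size table and fill it; *out is NULL on any failure. */
static int get_supported_table(struct cpuid_table **out, query_fn query)
{
	/* Base leaves of the basic, extended and KVM paravirt ranges. */
	static const uint32_t funcs[] = { 0, 0x80000000u, 0x40000000u };
	int r;

	*out = calloc(1, sizeof(**out) +
			 sizeof(struct cpuid_entry) * MAX_ENTRIES);
	if (!*out)
		return -ENOMEM;

	(*out)->nent = MAX_ENTRIES;
	r = query(*out, funcs, (int)(sizeof(funcs) / sizeof(funcs[0])));
	if (r) {
		free(*out);
		*out = NULL;
	}
	return r;
}

/* Toy callback: records one entry per requested base leaf. */
static int query_ok(struct cpuid_table *t, const uint32_t *funcs,
		    int funcs_len)
{
	int i;

	for (i = 0; i < funcs_len; i++)
		t->entries[i].function = funcs[i];
	t->nent = (uint32_t)funcs_len;
	return 0;
}

/* Toy callback: always fails, to exercise the cleanup path. */
static int query_fail(struct cpuid_table *t, const uint32_t *funcs,
		      int funcs_len)
{
	(void)t;
	(void)funcs;
	(void)funcs_len;
	return -E2BIG;
}
```

Keeping the list of base leaves inside the helper is what avoids the
duplication mentioned in reason 1: each caller only sees the filled table.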

Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - New patch
---
 arch/x86/kvm/vmx/tdx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ba7b436fae86..b2ed031ac0d6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1014,6 +1014,30 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
 	return ret;
 }
 
+static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
+{
+	int r;
+	static const u32 funcs[] = {
+		0, 0x80000000, KVM_CPUID_SIGNATURE,
+	};
+
+	*cpuid = kzalloc(sizeof(struct kvm_cpuid2) +
+			sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES,
+			GFP_KERNEL);
+	if (!*cpuid)
+		return -ENOMEM;
+	(*cpuid)->nent = KVM_MAX_CPUID_ENTRIES;
+	r = kvm_get_supported_cpuid_internal(*cpuid, funcs, ARRAY_SIZE(funcs));
+	if (r)
+		goto err;
+
+	return 0;
+err:
+	kfree(*cpuid);
+	*cpuid = NULL;
+	return r;
+}
+
 static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
 {
 	struct msr_data apic_base_msr;

From patchwork Mon Aug 12 22:48:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761118
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com
Subject: [PATCH 21/25] KVM: x86: Introduce KVM_TDX_GET_CPUID
Date: Mon, 12 Aug 2024 15:48:16 -0700
Message-Id: <20240812224820.34826-22-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Xiaoyao Li

Implement an IOCTL to allow userspace to read the CPUID bit values for a
configured TD.

The TDX module doesn't provide the ability to set all CPUID bits. Instead
some are configured indirectly, or have fixed values. But it does allow
for the final resulting CPUID bits to be read. This information will be
useful for userspace to understand the configuration of the TD, and set
KVM's copy via KVM_SET_CPUID2.

To prevent userspace from starting to use features that might not have
KVM support yet, filter the reported values by KVM's supported CPUID bits.
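
The read-then-filter step described above can be sketched in plain C as
follows. This is only an illustration, assuming the field ID bit layout
documented in this patch's tdx_mask_cpuid() comment and that the TD values
arrive as two 64-bit words (EBX:EAX, then EDX:ECX). The names cpuid_regs,
cpuid_field_id(), mask_by_td_values() and CPUID_VALUES_BASE are made up
for the sketch; the base constant is the TD_MD_FIELD_ID_CPUID_VALUES value
from the patch.

```c
#include <stdint.h>

/* Same value as TD_MD_FIELD_ID_CPUID_VALUES in this patch. */
#define CPUID_VALUES_BASE 0x9410000300000000ULL

struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Build the metadata field ID for a leaf/sub-leaf pair.
 * Returns 0 on success, -1 if the pair cannot be encoded
 * (leaf bits 30:7 or sub-leaf bits 31:7 set).
 */
static int cpuid_field_id(uint32_t leaf, uint32_t subleaf, int has_subleaf,
			  uint64_t *id)
{
	if ((leaf & 0x7fffff80u) || (subleaf & 0xffffff80u))
		return -1;

	*id = CPUID_VALUES_BASE;
	*id |= (uint64_t)(leaf >> 31) << 16;		/* LEAF_31 */
	*id |= (uint64_t)(leaf & 0x7f) << 9;		/* LEAF_6_0 */
	if (has_subleaf)
		*id |= (uint64_t)(subleaf & 0x7f) << 1;	/* SUBLEAF_6_0 */
	else
		*id |= 0x1feULL;	/* SUBLEAF_NA set, SUBLEAF_6_0 all ones */
	return 0;
}

/* Keep only the bits set both in KVM's supported view and in the TD. */
static void mask_by_td_values(struct cpuid_regs *regs, uint64_t ebx_eax,
			      uint64_t edx_ecx)
{
	regs->eax &= (uint32_t)ebx_eax;
	regs->ebx &= (uint32_t)(ebx_eax >> 32);
	regs->ecx &= (uint32_t)edx_ecx;
	regs->edx &= (uint32_t)(edx_ecx >> 32);
}
```

Bit 0 of the field ID selects the element, so adding one to the ID returned
for EBX:EAX addresses the EDX:ECX word of the same leaf.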

Signed-off-by: Xiaoyao Li
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - New patch
---
 arch/x86/include/uapi/asm/kvm.h |   1 +
 arch/x86/kvm/vmx/tdx.c          | 131 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h          |   5 ++
 arch/x86/kvm/vmx/tdx_arch.h     |   5 ++
 arch/x86/kvm/vmx/tdx_errno.h    |   1 +
 5 files changed, 143 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index b4f12997052d..39636be5c891 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_CAPABILITIES = 0,
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
+	KVM_TDX_GET_CPUID,
 
 	KVM_TDX_CMD_NR_MAX,
 };
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b2ed031ac0d6..fe2bbc2ced41 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -813,6 +813,76 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
 	return ret;
 }
 
+static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id,
+				      u64 *data)
+{
+	u64 err;
+
+	err = tdh_mng_rd(tdx, field_id, data);
+
+	return err;
+}
+
+#define TDX_MD_UNREADABLE_LEAF_MASK	GENMASK(30, 7)
+#define TDX_MD_UNREADABLE_SUBLEAF_MASK	GENMASK(31, 7)
+
+static int tdx_mask_cpuid(struct kvm_tdx *tdx, struct kvm_cpuid_entry2 *entry)
+{
+	u64 field_id = TD_MD_FIELD_ID_CPUID_VALUES;
+	u64 ebx_eax, edx_ecx;
+	u64 err = 0;
+
+	if (entry->function & TDX_MD_UNREADABLE_LEAF_MASK ||
+	    entry->index & TDX_MD_UNREADABLE_SUBLEAF_MASK)
+		return -EINVAL;
+
+	/*
+	 * bit 23:17, RESERVED: reserved, must be 0;
+	 * bit 16, LEAF_31: leaf number bit 31;
+	 * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are
+	 *           implicitly 0;
+	 * bit 8, SUBLEAF_NA: sub-leaf not applicable flag;
+	 * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1,
+	 *          the SUBLEAF_6_0 is all-1.
+	 *          sub-leaf bits 31:7 are implicitly 0;
+	 * bit 0, ELEMENT_I: Element index within field;
+	 */
+	field_id |= ((entry->function & 0x80000000) ? 1 : 0) << 16;
+	field_id |= (entry->function & 0x7f) << 9;
+	if (entry->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX)
+		field_id |= (entry->index & 0x7f) << 1;
+	else
+		field_id |= 0x1fe;
+
+	err = tdx_td_metadata_field_read(tdx, field_id, &ebx_eax);
+	if (err) //TODO check for specific errors
+		goto err_out;
+
+	entry->eax &= (u32) ebx_eax;
+	entry->ebx &= (u32) (ebx_eax >> 32);
+
+	field_id++;
+	err = tdx_td_metadata_field_read(tdx, field_id, &edx_ecx);
+	/*
+	 * It's weird that reading edx_ecx fails while reading ebx_eax
+	 * succeeded.
+	 */
+	if (WARN_ON_ONCE(err))
+		goto err_out;
+
+	entry->ecx &= (u32) edx_ecx;
+	entry->edx &= (u32) (edx_ecx >> 32);
+	return 0;
+
+err_out:
+	entry->eax = 0;
+	entry->ebx = 0;
+	entry->ecx = 0;
+	entry->edx = 0;
+
+	return -EIO;
+}
+
 static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
@@ -1038,6 +1108,64 @@ static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
 	return r;
 }
 
+static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
+{
+	struct kvm_cpuid2 __user *output, *td_cpuid;
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
+	struct kvm_cpuid2 *supported_cpuid;
+	int r = 0, i, j = 0;
+
+	output = u64_to_user_ptr(cmd->data);
+	td_cpuid = kzalloc(sizeof(*td_cpuid) +
+			sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES,
+			GFP_KERNEL);
+	if (!td_cpuid)
+		return -ENOMEM;
+
+	r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
+	if (r)
+		goto out;
+
+	for (i = 0; i < supported_cpuid->nent; i++) {
+		struct kvm_cpuid_entry2 *supported = &supported_cpuid->entries[i];
+		struct kvm_cpuid_entry2 *output_e = &td_cpuid->entries[j];
+
+		*output_e = *supported;
+
+		/* Only allow values of bits that KVM supports to be exposed */
+		if (tdx_mask_cpuid(kvm_tdx, output_e))
+			continue;
+
+		/*
+		 * Work around missing support on old TDX modules, fetch
+		 * guest maxpa from gfn_direct_bits.
+		 */
+		if (output_e->function == 0x80000008) {
+			gpa_t gpa_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
+			unsigned int g_maxpa = __ffs(gpa_bits) + 1;
+
+			output_e->eax &= ~0x00ff0000;
+			output_e->eax |= g_maxpa << 16;
+		}
+
+		j++;
+	}
+	td_cpuid->nent = j;
+
+	if (copy_to_user(output, td_cpuid, sizeof(*output))) {
+		r = -EFAULT;
+		goto out;
+	}
+	if (copy_to_user(output->entries, td_cpuid->entries,
+			 td_cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
+		r = -EFAULT;
+
+out:
+	kfree(td_cpuid);
+	kfree(supported_cpuid);
+	return r;
+}
+
 static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
 {
 	struct msr_data apic_base_msr;
@@ -1089,6 +1217,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	case KVM_TDX_INIT_VCPU:
 		ret = tdx_vcpu_init(vcpu, &cmd);
 		break;
+	case KVM_TDX_GET_CPUID:
+		ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 8349b542836e..7eeb54fbcae1 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -25,6 +25,11 @@ struct kvm_tdx {
 	bool finalized;
 
 	u64 tsc_offset;
+
+	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
+	atomic64_t nr_premapped;
+
+	struct kvm_cpuid2 *cpuid;
 };
 
 struct vcpu_tdx {
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index d2d7f9cab740..815e74408a34 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -157,4 +157,9 @@ struct td_params {
 
 #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM	BIT_ULL(20)
 
+/*
+ * TD scope metadata field ID.
+ */
+#define TD_MD_FIELD_ID_CPUID_VALUES		0x9410000300000000ULL
+
 #endif /* __KVM_X86_TDX_ARCH_H */
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
index dc3fa2a58c2c..f9dbb3a065cc 100644
--- a/arch/x86/kvm/vmx/tdx_errno.h
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -23,6 +23,7 @@
 #define TDX_FLUSHVP_NOT_DONE			0x8000082400000000ULL
 #define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
 #define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
+#define TDX_METADATA_FIELD_NOT_READABLE		0xC0000C0200000000ULL
 
 /*
  * TDX module operand ID, appears in 31:0 part of error code as

From patchwork Mon Aug 12 22:48:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761119
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com
Subject: [PATCH 22/25] KVM: TDX: Use guest physical address to configure EPT level and GPAW
Date: Mon, 12 Aug 2024 15:48:17 -0700
Message-Id: <20240812224820.34826-23-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Xiaoyao Li

KVM reports the guest physical address width in
CPUID.0x80000008.EAX[23:16], which is similar to TDX's GPAW. Use this
field as the interface for userspace to configure the GPAW and EPT level
for TDs.

Note,

1. Only the values 48 and 52 are supported. 52 means GPAW-52 and EPT
   level 5, and 48 means GPAW-48 and EPT level 4.
2. Value 48, i.e., GPAW-48, is always supported. Value 52 is only
   supported when the platform supports 5 level EPT.

The current TDX module doesn't support max_gpa configuration. However, the
current implementation relies on max_gpa to configure EPT level and GPAW.
Hack KVM to make it work.
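
The two rules in the note above amount to a small decision function. A
minimal standalone sketch, where ept_level_from_eax() and its
has_5level_ept parameter are hypothetical names standing in for the
checks done in setup_tdparams_eptp_controls():

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Map the guest physical address width in CPUID.0x80000008.EAX[23:16]
 * to an EPT level: 48 -> 4, 52 -> 5 (only with 5-level EPT support).
 * Any other width, or 52 without 5-level EPT, is rejected.
 */
static int ept_level_from_eax(uint32_t eax, bool has_5level_ept)
{
	unsigned int guest_pa = (eax >> 16) & 0xff;

	if (guest_pa == 48)
		return 4;
	if (guest_pa == 52 && has_5level_ept)
		return 5;
	return -EINVAL;
}
```

Note that EAX[7:0] (the native maxpa) plays no part in the decision; only
bits 23:16 are examined.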

Signed-off-by: Xiaoyao Li
Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - New patch
---
 arch/x86/kvm/vmx/tdx.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index fe2bbc2ced41..c6bfeb0b3cc9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -514,23 +514,22 @@ static int setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid,
 					struct td_params *td_params)
 {
 	const struct kvm_cpuid_entry2 *entry;
-	int max_pa = 36;
+	int guest_pa;
 
 	entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, 0);
-	if (entry)
-		max_pa = entry->eax & 0xff;
+	if (!entry)
+		return -EINVAL;
+
+	guest_pa = (entry->eax >> 16) & 0xff;
+
+	if (guest_pa != 48 && guest_pa != 52)
+		return -EINVAL;
+
+	if (guest_pa == 52 && !cpu_has_vmx_ept_5levels())
+		return -EINVAL;
 
 	td_params->eptp_controls = VMX_EPTP_MT_WB;
-	/*
-	 * No CPU supports 4-level && max_pa > 48.
-	 * "5-level paging and 5-level EPT" section 4.1 4-level EPT
-	 * "4-level EPT is limited to translating 48-bit guest-physical
-	 *  addresses."
-	 * cpu_has_vmx_ept_5levels() check is just in case.
-	 */
-	if (!cpu_has_vmx_ept_5levels() && max_pa > 48)
-		return -EINVAL;
-	if (cpu_has_vmx_ept_5levels() && max_pa > 48) {
+	if (guest_pa == 52) {
 		td_params->eptp_controls |= VMX_EPTP_PWL_5;
 		td_params->exec_controls |= TDX_EXEC_CONTROL_MAX_GPAW;
 	} else {
@@ -576,6 +575,9 @@ static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid,
 		value->ebx = entry->ebx;
 		value->ecx = entry->ecx;
 		value->edx = entry->edx;
+
+		if (c->leaf == 0x80000008)
+			value->eax &= 0xff00ffff;
 	}
 
 	return 0;
@@ -1277,6 +1279,10 @@ static int __init setup_kvm_tdx_caps(void)
 		memcpy(dest, &source, sizeof(struct kvm_tdx_cpuid_config));
 		if (dest->sub_leaf == KVM_TDX_CPUID_NO_SUBLEAF)
 			dest->sub_leaf = 0;
+
+		/* Work around missing support on old TDX modules */
+		if (dest->leaf == 0x80000008)
+			dest->eax |= 0x00ff0000;
 	}
 
 	return 0;

From patchwork Mon Aug 12 22:48:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761121
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com
Subject: [PATCH 23/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level
Date: Mon, 12 Aug 2024 15:48:18 -0700
Message-Id: <20240812224820.34826-24-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Xiaoyao Li

For TDX, the maxpa (CPUID.0x80000008.EAX[7:0]) is fixed as native and
the max_gpa (CPUID.0x80000008.EAX[23:16]) is configurable and used
to configure the EPT level and GPAW.

Use max_gpa to determine the TDP level.
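
A minimal sketch of the selection logic described above, written as a pure
function so it can be exercised outside the kernel. The function tdp_level()
and its parameters are hypothetical; in the patch the inputs come from
tdp_root_level, max_tdp_level, the VM type and the vCPU's CPUID.

```c
#include <stdbool.h>

/*
 * Pick the TDP level. For a TDX VM the guest width (EAX[23:16]) is used
 * when it is nonzero; otherwise fall back to maxpa (EAX[7:0]).
 * forced_level, when nonzero, overrides everything.
 */
static int tdp_level(int forced_level, int max_tdp_level, bool is_tdx,
		     int maxpa, int guest_maxpa)
{
	int width = is_tdx ? guest_maxpa : 0;

	if (!width)
		width = maxpa;

	if (forced_level)
		return forced_level;

	/* 5-level paging is only useful for widths above 48 bits. */
	if (max_tdp_level == 5 && width <= 48)
		return 4;

	return max_tdp_level;
}
```

With a native maxpa of 52, a TD configured for a 48-bit guest width gets a
4-level TDP here, whereas a non-TDX VM with the same CPUID stays at 5 levels.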
Signed-off-by: Xiaoyao Li Signed-off-by: Rick Edgecombe Reviewed-by: Paolo Bonzini --- uAPI breakout v1: - New patch --- arch/x86/kvm/cpuid.c | 14 ++++++++++++++ arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/mmu/mmu.c | 10 +++++++++- 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 499479c769d8..ebebff0dbd3b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -423,6 +423,20 @@ int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu) return 36; } +int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 0x80000000); + if (!best || best->eax < 0x80000008) + goto not_found; + best = kvm_find_cpuid_entry(vcpu, 0x80000008); + if (best) + return (best->eax >> 16) & 0xff; +not_found: + return 0; +} + /* * This "raw" version returns the reserved GPA bits without any adjustments for * encryption technologies that usurp bits. The raw mask should be used if and diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 5cc13d1b7991..2db458e4c450 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -39,6 +39,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 xstate_required_size(u64 xstate_bv, bool compacted); int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu); +int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu); u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu); static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3a00bf062a46..694edcb7ef46 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5440,12 +5440,20 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu, static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu) { + int maxpa = 0; + + if (vcpu->kvm->arch.vm_type == KVM_X86_TDX_VM) + maxpa = cpuid_query_maxguestphyaddr(vcpu); + + if (!maxpa) + maxpa = cpuid_maxphyaddr(vcpu); + /* tdp_root_level 
is architecture forced level, use it if nonzero */
 	if (tdp_root_level)
 		return tdp_root_level;
 
 	/* Use 5-level TDP if and only if it's useful/necessary. */
-	if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
+	if (max_tdp_level == 5 && maxpa <= 48)
 		return 4;
 
 	return max_tdp_level;

From patchwork Mon Aug 12 22:48:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761120
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com
Subject: [PATCH 24/25] KVM: x86: Filter directly configurable TDX CPUID bits
Date: Mon, 12 Aug 2024 15:48:19 -0700
Message-Id: <20240812224820.34826-25-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Future TDX modules may provide support for future HW features, but run
with KVM versions that lack support for them. In this case, userspace may
try to use features that KVM does not support, and develop assumptions
around KVM's behavior. Then KVM would have to deal with not breaking such
userspace.

Simplify KVM's job by preventing userspace from configuring any
unsupported CPUID feature bits.
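For illustration only, the filtering described above can be sketched in
standalone C. This is not the kernel code: struct cpuid_regs, find_entry()
and filter_configurable() are hypothetical stand-ins, assumed here to play
the roles of struct kvm_tdx_cpuid_config, kvm_find_cpuid_entry2() and the
per-leaf loop in setup_kvm_tdx_caps().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPUID leaf/sub-leaf and its registers. */
struct cpuid_regs {
	uint32_t leaf, sub_leaf;
	uint32_t eax, ebx, ecx, edx;
};

/* Linear search for a leaf/sub-leaf pair; returns NULL when absent. */
static const struct cpuid_regs *find_entry(const struct cpuid_regs *tbl,
					   size_t n, uint32_t leaf,
					   uint32_t sub_leaf)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].leaf == leaf && tbl[i].sub_leaf == sub_leaf)
			return &tbl[i];
	return NULL;
}

/*
 * Clear every configurable bit that the supported table does not also
 * set. A leaf that is missing from the supported table is cleared
 * entirely, so nothing in it can be configured.
 */
void filter_configurable(struct cpuid_regs *dest,
			 const struct cpuid_regs *supported, size_t n)
{
	const struct cpuid_regs *e;

	assert(dest != NULL);

	e = find_entry(supported, n, dest->leaf, dest->sub_leaf);
	if (!e) {
		dest->eax = dest->ebx = dest->ecx = dest->edx = 0;
		return;
	}

	dest->eax &= e->eax;
	dest->ebx &= e->ebx;
	dest->ecx &= e->ecx;
	dest->edx &= e->edx;
}
```

In this sketch, a caller would run filter_configurable() once per
configurable leaf reported by the TDX module. The plain AND only works for
single-bit feature flags; multi-bit fields need the supported table to
hold a mask rather than a value.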

Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - New patch
---
 arch/x86/kvm/vmx/tdx.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c6bfeb0b3cc9..d45b4f7b69ba 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1086,8 +1086,9 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
 	return ret;
 }
 
-static int __maybe_unused tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
+static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
 {
+	int r;
 	static const u32 funcs[] = {
 		0, 0x80000000, KVM_CPUID_SIGNATURE,
@@ -1235,8 +1236,10 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 static int __init setup_kvm_tdx_caps(void)
 {
 	const struct tdx_sysinfo_td_conf *td_conf = &tdx_sysinfo->td_conf;
+	struct kvm_cpuid_entry2 *cpuid_e;
+	struct kvm_cpuid2 *supported_cpuid;
 	u64 kvm_supported;
-	int i;
+	int i, r = -EIO;
 
 	kvm_tdx_caps = kzalloc(sizeof(*kvm_tdx_caps) +
 			       sizeof(struct kvm_tdx_cpuid_config) * td_conf->num_cpuid_config,
@@ -1263,6 +1266,10 @@ static int __init setup_kvm_tdx_caps(void)
 
 	kvm_tdx_caps->supported_xfam = kvm_supported & td_conf->xfam_fixed0;
 
+	r = tdx_get_kvm_supported_cpuid(&supported_cpuid);
+	if (r)
+		goto err;
+
 	kvm_tdx_caps->num_cpuid_config = td_conf->num_cpuid_config;
 	for (i = 0; i < td_conf->num_cpuid_config; i++) {
 		struct kvm_tdx_cpuid_config source = {
@@ -1283,12 +1290,24 @@ static int __init setup_kvm_tdx_caps(void)
 		/* Work around missing support on old TDX modules */
 		if (dest->leaf == 0x80000008)
 			dest->eax |= 0x00ff0000;
+
+		cpuid_e = kvm_find_cpuid_entry2(supported_cpuid->entries, supported_cpuid->nent,
+						dest->leaf, dest->sub_leaf);
+		if (!cpuid_e) {
+			dest->eax = dest->ebx = dest->ecx = dest->edx = 0;
+		} else {
+			dest->eax &= cpuid_e->eax;
+			dest->ebx &= cpuid_e->ebx;
+			dest->ecx &= cpuid_e->ecx;
+			dest->edx &= cpuid_e->edx;
+		}
 	}
 
+	kfree(supported_cpuid);
 	return 0;
 err:
 	kfree(kvm_tdx_caps);
-	return -EIO;
+	return r;
 }
 
 static void free_kvm_tdx_cap(void)

From patchwork Mon Aug 12 22:48:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13761122
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, isaku.yamahata@gmail.com,
 tony.lindgren@linux.intel.com, xiaoyao.li@intel.com,
 linux-kernel@vger.kernel.org, rick.p.edgecombe@intel.com
Subject: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID
Date: Mon, 12 Aug 2024 15:48:20 -0700
Message-Id: <20240812224820.34826-26-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
References: <20240812224820.34826-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Originally, the plan was to filter the directly configurable CPUID bits
exposed by KVM_TDX_CAPABILITIES, and the final configured bit values
provided by KVM_TDX_GET_CPUID. However, several issues were found with
this approach.

The filtering done for both KVM_TDX_CAPABILITIES and KVM_TDX_GET_CPUID had
the issue that get_supported_cpuid() provides default values instead of
supported masks for multi-bit fields (i.e. those encoding a multi-bit
number).

For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
actually supported by KVM, but missing from get_supported_cpuid() for one
reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in QEMU by
adjusting which features are expected. Some of these are going to be added
to get_supported_cpuid(), and that is probably the right long-term fix.

For KVM_TDX_GET_CPUID, there is another problem. Some CPUID bits are fixed
on by the TDX module, but unsupported by KVM. This means that the TD will
have them set, but KVM and userspace won't know about them. This class of
bits is dealt with by having QEMU expect not to see them.
The bits include: X86_FEATURE_HYPERVISOR. The proper fix for this
specifically is probably to change KVM to show it as supported (currently
a patch exists). But this scenario could be expected in the event of the
TDX module ever setting any default 1 or fixed 1 bits. It would be good to
discuss whether the KVM community should mandate that this doesn't happen.

Signed-off-by: Rick Edgecombe
---
uAPI breakout v1:
 - New patch
---
 arch/x86/kvm/vmx/tdx.c | 96 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d45b4f7b69ba..34e838d8f7fd 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1086,13 +1086,24 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
 	return ret;
 }
 
+/*
+ * This function is used in two cases:
+ * 1. To mask KVM unsupported/unknown bits from the configurable CPUIDs
+ *    reported by the TDX module, in setup_kvm_tdx_caps().
+ * 2. To mask KVM unsupported/unknown bits from the actual CPUID values of
+ *    the TD read from the TDX module, in tdx_vcpu_get_cpuid().
+ *
+ * Both cases need a fixup for fields that consist of multiple bits. A
+ * multi-bit field needs a mask, but what
+ * kvm_get_supported_cpuid_internal() returns is just a default value.
+ */
 static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
 {
-	int r;
 	static const u32 funcs[] = {
 		0, 0x80000000, KVM_CPUID_SIGNATURE,
 	};
+	struct kvm_cpuid_entry2 *entry;
 
 	*cpuid = kzalloc(sizeof(struct kvm_cpuid2) +
 			 sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES,
@@ -1104,6 +1115,89 @@ static int tdx_get_kvm_supported_cpuid(struct kvm_cpuid2 **cpuid)
 	if (r)
 		goto err;
 
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x0, 0);
+	if (WARN_ON(!entry))
+		goto err;
+	/* Fixup of maximum basic leaf */
+	entry->eax |= 0x000000FF;
+
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x1, 0);
+	if (WARN_ON(!entry))
+		goto err;
+	/* Fixup of FMS */
+	entry->eax |= 0x0fff3fff;
+	/* Fixup of maximum logical processors per package */
+	entry->ebx |= 0x00ff0000;
+
+	/*
+	 * Fixup of CPUID leaf 4, which enumerates cache info. All of the
+	 * non-reserved fields except EBX[11:0] (System Coherency Line Size)
+	 * are configurable for TDs.
+	 */
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 0);
+	if (WARN_ON(!entry))
+		goto err;
+	entry->eax |= 0xffffc3ff;
+	entry->ebx |= 0xfffff000;
+	entry->ecx |= 0xffffffff;
+	entry->edx |= 0x00000007;
+
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 1);
+	if (WARN_ON(!entry))
+		goto err;
+	entry->eax |= 0xffffc3ff;
+	entry->ebx |= 0xfffff000;
+	entry->ecx |= 0xffffffff;
+	entry->edx |= 0x00000007;
+
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 2);
+	if (WARN_ON(!entry))
+		goto err;
+	entry->eax |= 0xffffc3ff;
+	entry->ebx |= 0xfffff000;
+	entry->ecx |= 0xffffffff;
+	entry->edx |= 0x00000007;
+
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x4, 3);
+	if (WARN_ON(!entry))
+		goto err;
+	entry->eax |= 0xffffc3ff;
+	entry->ebx |= 0xfffff000;
+	entry->ecx |= 0xffffffff;
+	entry->edx |= 0x00000007;
+
+	/* Fixup of CPUID leaf 0xB */
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0xb, 0);
+	if (WARN_ON(!entry))
+		goto err;
+	entry->eax = 0x0000001f;
+	entry->ebx = 0x0000ffff;
+	entry->ecx = 0x0000ffff;
+
+	/*
+	 * Fixup of CPUID leaf 0x1f, which is totally configurable for TDs.
+	 */
+	entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x1f, 0);
+	if (WARN_ON(!entry))
+		goto err;
+	entry->eax = 0x0000001f;
+	entry->ebx = 0x0000ffff;
+	entry->ecx = 0x0000ffff;
+
+	for (int i = 1; i <= 5; i++) {
+		entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 0x1f, i);
+		if (!entry) {
+			entry = &(*cpuid)->entries[(*cpuid)->nent];
+			entry->function = 0x1f;
+			entry->index = i;
+			entry->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+			(*cpuid)->nent++;
+		}
+		entry->eax = 0x0000001f;
+		entry->ebx = 0x0000ffff;
+		entry->ecx = 0x0000ffff;
+	}
+
 	return 0;
 err:
 	kfree(*cpuid);