From patchwork Wed Nov 27 20:10:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13887350 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 51F3E201277 for ; Wed, 27 Nov 2024 20:10:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732738229; cv=none; b=pBOxMnMSzNMbrBKUpuCsomqExHHI+vkjbTKMZuEHKIHOqEZIbmVjEO98beJtWRRgM/4+HnMNLmiQnnRL/iaf+t5ATJMy30YWrDHqdo88sBIx0npW4GNplf1JdSB5NfcWGEsGpVDqbUzRHGTdX1wHPeONZiT9tP5XJpNumESsSZk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732738229; c=relaxed/simple; bh=r6aFeq6PS837UNC9SFHrRCf/th1nqfXjv/BwAMo2NCY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nw1mIQFVlYPBFk4B8bUsCkfglouxndHvnkimTizD8+h9BIXHvyNVkqFiLvcQrBTOcuyf/0jKjUoAgDQRAIRZH7YsvoEye64+HcK3dj/Mnio50wi77t4cLKB6l3nMQl0WEal0ro78isfysb6QhEY+NC5XRXojsRBdNTB6EVqFNk8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=EgUKVVD+; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="EgUKVVD+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732738226; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I+M825//4HSdqz1oxZiOBdhTSc7nT0ze0JjqDLRc138=; b=EgUKVVD+1ecgG5FLbumhBLqpcE5eJ4mQ7JdAKqaDI1Tq3PeCNI45mz2iLZqFMhOP/TizPS vCkaC4NJljAtPxq5QETEuQSB6Mc2xO8jFr8maRUj8gBStd/+3mBP8kvZDczOUfLVJid8Pg iAGPe1FGtHO2PUPqvGMjdqCxN5KQF3M= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-227-hLSMnDmFNByN6JM9_Fq3UA-1; Wed, 27 Nov 2024 15:10:22 -0500 X-MC-Unique: hLSMnDmFNByN6JM9_Fq3UA-1 X-Mimecast-MFC-AGG-ID: hLSMnDmFNByN6JM9_Fq3UA Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9F4B619540F0; Wed, 27 Nov 2024 20:10:21 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 0868B19560A3; Wed, 27 Nov 2024 20:10:20 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: kai.huang@intel.com Subject: [PATCH 1/3] KVM: Export hardware virtualization enabling/disabling functions Date: Wed, 27 Nov 2024 15:10:17 -0500 Message-ID: <20241127201019.136086-2-pbonzini@redhat.com> In-Reply-To: <20241127201019.136086-1-pbonzini@redhat.com> References: <20241127201019.136086-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 From: Kai Huang To support TDX, KVM will need to enable TDX during KVM module loading time. Enabling TDX requires enabling hardware virtualization first so that all online CPUs (and the new CPU going online) are in post-VMXON state. KVM by default enables hardware virtualization but that is done in kvm_init(), which must be the last step after all initialization is done thus is too late for enabling TDX. Export functions to enable/disable hardware virtualization so that TDX code can use them to handle hardware virtualization enabling before kvm_init(). Signed-off-by: Kai Huang Message-ID: Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 8 ++++++++ virt/kvm/kvm_main.c | 18 ++++-------------- 2 files changed, 12 insertions(+), 14 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9db304d14c46..3fcee4209120 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2559,4 +2559,12 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, struct kvm_pre_fault_memory *range); #endif +#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING +int kvm_enable_virtualization(void); +void kvm_disable_virtualization(void); +#else +static inline int kvm_enable_virtualization(void) { return 0; } +static inline void kvm_disable_virtualization(void) { } +#endif + #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c5e7782fbd4a..8214efa76a6f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -143,8 +143,6 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file) #define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \ .open = kvm_no_compat_open #endif -static int kvm_enable_virtualization(void); -static void kvm_disable_virtualization(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); @@ -5552,7 +5550,7 @@ static struct syscore_ops kvm_syscore_ops = { .shutdown = kvm_shutdown, }; -static int kvm_enable_virtualization(void) +int kvm_enable_virtualization(void) { int r; @@ -5597,8 +5595,9 @@ static int kvm_enable_virtualization(void) --kvm_usage_count; return r; } +EXPORT_SYMBOL_GPL(kvm_enable_virtualization); -static void kvm_disable_virtualization(void) +void kvm_disable_virtualization(void) { guard(mutex)(&kvm_usage_lock); @@ -5609,6 +5608,7 @@ static void kvm_disable_virtualization(void) cpuhp_remove_state(CPUHP_AP_KVM_ONLINE); kvm_arch_disable_virtualization(); } +EXPORT_SYMBOL_GPL(kvm_disable_virtualization); static int kvm_init_virtualization(void) { @@ -5624,21 +5624,11 @@ static void kvm_uninit_virtualization(void) kvm_disable_virtualization(); } #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */ -static int kvm_enable_virtualization(void) -{ - return 0; -} - static int kvm_init_virtualization(void) { return 0; } -static void kvm_disable_virtualization(void) -{ - -} - static void kvm_uninit_virtualization(void) { From patchwork Wed Nov 27 20:10:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13887352 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ECE6A202F71 for ; Wed, 27 Nov 2024 20:10:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732738229; cv=none; b=N9GI+m9f/eHXi6mv8jZwb2ieox/6SJtpHAlpCTvVyTM7q32muzizU+hPEcsiBN/az6l3mZ4LQF3dVaB2CKJ3TDDrAsDc72OJhvUUoLDTEnZL3F85ttZ5D8Z71aF47luKY9bzAY7LchE4kLJwrsQDY76/ltTiGs7q0LOJWsdpecg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732738229; c=relaxed/simple; bh=52pdzx8rTe5dCMj07CjQEs6GrtRLXgVxwx4XbmkuAng=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=dJPPBi5IV6lqiV2yEFJ4n1Wgl36J24A0ctAnnQYg4qyMJ10x/ScW7kf0j2shYIXsliqko6WiJEVpOKw4b4EVhkl7Pa0cZWk4ygMWyxtZoxauAl8Dv6rGCPGJrniwlrv7CzoFidMxv76Wj9lozJqHRsj5F2euXsmD3vubFeV4P/Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Cmthb+Si; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Cmthb+Si" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732738227; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Osu+JlXIFqkDN7a1RzYyy5wX2zZ0R8KOgTVlLLKwRTM=; b=Cmthb+SiyzlGuLOvY+Y6Aq7QH7AQtgiSqOdqKQKeSYOetwW25rAzdjStGr2iRfNYvMedb8 6ovjVZDZDE0hjB+TZAHX+w0GIFWwG92v8l9AtFCev08fUb3Wzfp/JHrVykseGH4Oy/JNTG V4MlBSgcpCk2LdbLH2tlWI229JrnHB8= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-34-_USOMyVePUWlQgj6h0nTeQ-1; Wed, 27 Nov 2024 15:10:23 -0500 X-MC-Unique: _USOMyVePUWlQgj6h0nTeQ-1 X-Mimecast-MFC-AGG-ID: _USOMyVePUWlQgj6h0nTeQ Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 6E57F195419D; Wed, 27 Nov 2024 20:10:22 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CFD2F1955F43; Wed, 27 Nov 2024 20:10:21 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: kai.huang@intel.com Subject: [PATCH 2/3] KVM: VMX: Refactor VMX module init/exit functions Date: Wed, 27 Nov 2024 15:10:18 -0500 Message-ID: <20241127201019.136086-3-pbonzini@redhat.com> In-Reply-To: <20241127201019.136086-1-pbonzini@redhat.com> References: <20241127201019.136086-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 From: Kai Huang Add vt_init() and vt_exit() as the new module init/exit functions and refactor existing vmx_init()/vmx_exit() as helper to make room for TDX specific initialization and teardown. To support TDX, KVM will need to enable TDX during KVM module loading time. Enabling TDX requires enabling hardware virtualization first so that all online CPUs (and the new CPU going online) are in post-VMXON state. Currently, the vmx_init() flow is: 1) hv_init_evmcs(), 2) kvm_x86_vendor_init(), 3) Other VMX specific initialization, 4) kvm_init() The kvm_x86_vendor_init() invokes kvm_x86_init_ops::hardware_setup() to do VMX specific hardware setup and calls kvm_update_ops() to initialize kvm_x86_ops to VMX's version. TDX will have its own version for most of kvm_x86_ops callbacks. It would be nice if kvm_x86_init_ops::hardware_setup() could also be used for TDX, but in practice it cannot. The reason is, as mentioned above, TDX initialization requires hardware virtualization having been enabled, which must happen after kvm_update_ops(), but hardware_setup() is done before that. Also, TDX is based on VMX, and it makes sense to only initialize TDX after VMX has been initialized. If VMX fails to initialize, TDX is likely broken anyway. So the new flow of KVM module init function will be: 1) Current VMX initialization code in vmx_init() before kvm_init(), 2) TDX initialization, 3) kvm_init() Split vmx_init() into two parts based on above 1) and 3) so that TDX initialization can fit in between. Make part 1) as the new helper vmx_init(). Introduce vt_init() as the new module init function which calls vmx_init() and kvm_init(). TDX initialization will be added later. Do the same thing for vmx_exit()/vt_exit(). Signed-off-by: Kai Huang Message-ID: <3f23f24098bdcf42e213798893ffff7cdc7103be.1731664295.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 23 ++--------------------- arch/x86/kvm/vmx/vmx.h | 3 +++ 3 files changed, 37 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 92d35cc6cd15..6772e560ac7b 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -167,3 +167,35 @@ struct kvm_x86_init_ops vt_init_ops __initdata = { .runtime_ops = &vt_x86_ops, .pmu_ops = &intel_pmu_ops, }; + +static void vt_exit(void) +{ + kvm_exit(); + vmx_exit(); +} +module_exit(vt_exit); + +static int __init vt_init(void) +{ + int r; + + r = vmx_init(); + if (r) + return r; + + /* + * Common KVM initialization _must_ come last, after this, /dev/kvm is + * exposed to userspace! + */ + r = kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), + THIS_MODULE); + if (r) + goto err_kvm_init; + + return 0; + +err_kvm_init: + vmx_exit(); + return r; +} +module_init(vt_init); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 893366e53732..6793a8cc5c60 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8590,23 +8590,16 @@ static void vmx_cleanup_l1d_flush(void) l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; } -static void __vmx_exit(void) +void vmx_exit(void) { allow_smaller_maxphyaddr = false; vmx_cleanup_l1d_flush(); -} -static void vmx_exit(void) -{ - kvm_exit(); - __vmx_exit(); kvm_x86_vendor_exit(); - } -module_exit(vmx_exit); -static int __init vmx_init(void) +int __init vmx_init(void) { int r, cpu; @@ -8650,21 +8643,9 @@ static int __init vmx_init(void) if (!enable_ept) allow_smaller_maxphyaddr = true; - /* - * Common KVM initialization _must_ come last, after this, /dev/kvm is - * exposed to userspace! - */ - r = kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), - THIS_MODULE); - if (r) - goto err_kvm_init; - return 0; -err_kvm_init: - __vmx_exit(); err_l1d_flush: kvm_x86_vendor_exit(); return r; } -module_init(vmx_init); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 43f573f6ca46..1813caac1cea 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -756,4 +756,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx) vmx->segment_cache.bitmask = 0; } +int vmx_init(void); +void vmx_exit(void); + #endif /* __KVM_X86_VMX_H */ From patchwork Wed Nov 27 20:10:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13887351 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 39EF12036E6 for ; Wed, 27 Nov 2024 20:10:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732738229; cv=none; b=Qe7RQfWfl0x9N3wCUQcaeIf25c24AbubYHAYMz72FWz8Ruvd9ZCc3jEPNsmoUIhTmzbW7vDaziQF9JUpJpPSzzJ2tMlfs95H4U7w0ThYmMReQS6GaluHGNdHxLUCUDTB50RqWm25ZEQoQDbqEsamvj/K+oli8+dYuLiNFlWv2Qg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732738229; c=relaxed/simple; bh=GjTTqn2whow5AQGXruMUfYhh0oUV4AeJJk0c33RICus=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=O74ntZLr0r0+wPpSIK2Y6xvLZfEwaI+P7tzV7X1ItxkD7io/4/srs+PN0es2xDI+zmjYLI2p6L0FaBNIxsXMTUle0t/kCzlKVDm2ZMNzdBfYH8ifRGIWeLC8s0aUlW6gDJ1+vX8X4e4B77gj6CdrHiWWi+o4D3qwgH9aMuSkOJU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Nj9pXLfn; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Nj9pXLfn" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732738225; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vSsdGrzP8QTWyIieNbBesE4v5M+34faZuSvZl5ZTcvI=; b=Nj9pXLfne1K8jqK7dUZgON5r2NqvrA+YsuQpjMuRyWbd1+QYBsR0dSB7QRkqP1d27RnpYd RX4XJCnzugiTXA4oODXAnrFfiDJYeC3tQcnAdtfzc1+zfBcj2X8nzs48v7tFlG3UXQ0ut0 SQf4XqDf1l2Bovk0TLRY1d0fVR+nHj0= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-113-u85DGN12OKCSNcmV8A27Tw-1; Wed, 27 Nov 2024 15:10:24 -0500 X-MC-Unique: u85DGN12OKCSNcmV8A27Tw-1 X-Mimecast-MFC-AGG-ID: u85DGN12OKCSNcmV8A27Tw Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 412101955DC4; Wed, 27 Nov 2024 20:10:23 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id A0D1C19560A3; Wed, 27 Nov 2024 20:10:22 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: kai.huang@intel.com Subject: [PATCH 3/3] KVM: VMX: Initialize TDX during KVM module load Date: Wed, 27 Nov 2024 15:10:19 -0500 Message-ID: <20241127201019.136086-4-pbonzini@redhat.com> In-Reply-To: <20241127201019.136086-1-pbonzini@redhat.com> References: <20241127201019.136086-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 From: Kai Huang Before KVM can use TDX to create and run TDX guests, TDX needs to be initialized from two perspectives: 1) TDX module must be initialized properly to a working state; 2) A per-cpu TDX initialization, a.k.a the TDH.SYS.LP.INIT SEAMCALL must be done on any logical cpu before it can run any other TDX SEAMCALLs. The TDX host core-kernel provides two functions to do the above two respectively: tdx_enable() and tdx_cpu_enable(). There are two options in terms of when to initialize TDX: initialize TDX at KVM module loading time, or when creating the first TDX guest. Choose to initialize TDX during KVM module loading time: Initializing TDX module is both memory and CPU time consuming: 1) the kernel needs to allocate a non-trivial size(~1/256) of system memory as metadata used by TDX module to track each TDX-usable memory page's status; 2) the TDX module needs to initialize this metadata, one entry for each TDX-usable memory page. Also, the kernel uses alloc_contig_pages() to allocate those metadata chunks, because they are large and need to be physically contiguous. alloc_contig_pages() can fail. If initializing TDX when creating the first TDX guest, then there's chance that KVM won't be able to run any TDX guests albeit KVM _declares_ to be able to support TDX. This isn't good for the user. On the other hand, initializing TDX at KVM module loading time can make sure KVM is providing a consistent view of whether KVM can support TDX to the user. Always only try to initialize TDX after VMX has been initialized. TDX is based on VMX, and if VMX fails to initialize then TDX is likely to be broken anyway. Also, in practice, supporting TDX will require part of VMX and common x86 infrastructure in working order, so TDX cannot be enabled alone w/o VMX support. There are two cases that can result in failure to initialize TDX: 1) TDX cannot be supported (e.g., because of TDX is not supported or enabled by hardware, or module is not loaded, or missing some dependency in KVM's configuration); 2) Any unexpected error during TDX bring-up. For the first case only mark TDX is disabled but still allow KVM module to be loaded. For the second case just fail to load the KVM module so that the user can be aware. Because TDX costs additional memory, don't enable TDX by default. Add a new module parameter 'enable_tdx' to allow the user to opt-in. Note, the name tdx_init() has already been taken by the early boot code. Use tdx_bringup() for initializing TDX (and tdx_cleanup() since KVM doesn't actually teardown TDX). They don't match vt_init()/vt_exit(), vmx_init()/vmx_exit() etc but it's not end of the world. Also, once initialized, the TDX module cannot be disabled and enabled again w/o the TDX module runtime update, which isn't supported by the kernel. After TDX is enabled, nothing needs to be done when KVM disables hardware virtualization, e.g., when offlining CPU, or during suspend/resume. TDX host core-kernel code internally tracks TDX status and can handle "multiple enabling" scenario. Similar to KVM_AMD_SEV, add a new KVM_INTEL_TDX Kconfig to guide KVM TDX code. Make it depend on INTEL_TDX_HOST but not replace INTEL_TDX_HOST because in the longer term there's a use case that requires making SEAMCALLs w/o KVM as mentioned by Dan [1]. Link: https://lore.kernel.org/6723fc2070a96_60c3294dc@dwillia2-mobl3.amr.corp.intel.com.notmuch/ [1] Signed-off-by: Kai Huang Message-ID: <162f9dee05c729203b9ad6688db1ca2960b4b502.1731664295.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/Kconfig | 10 +++ arch/x86/kvm/Makefile | 1 + arch/x86/kvm/vmx/main.c | 9 +++ arch/x86/kvm/vmx/tdx.c | 160 +++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 13 ++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 3 +- 7 files changed, 196 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kvm/vmx/tdx.c create mode 100644 arch/x86/kvm/vmx/tdx.h diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index ea2c4f21c1ca..fe8cbee6f614 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -128,6 +128,16 @@ config X86_SGX_KVM If unsure, say N. +config KVM_INTEL_TDX + bool "Intel Trust Domain Extensions (TDX) support" + default y + depends on INTEL_TDX_HOST + help + Provides support for launching Intel Trust Domain Extensions (TDX) + confidential VMs on Intel processors. + + If unsure, say N. + config KVM_AMD tristate "KVM for AMD processors support" depends on KVM && (CPU_SUP_AMD || CPU_SUP_HYGON) diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index f9dddb8cb466..a5d362c7b504 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -20,6 +20,7 @@ kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o kvm-intel-$(CONFIG_KVM_HYPERV) += vmx/hyperv.o vmx/hyperv_evmcs.o +kvm-intel-$(CONFIG_KVM_INTEL_TDX) += vmx/tdx.o kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 6772e560ac7b..bc690c63e511 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -6,6 +6,7 @@ #include "nested.h" #include "pmu.h" #include "posted_intr.h" +#include "tdx.h" #define VMX_REQUIRED_APICV_INHIBITS \ (BIT(APICV_INHIBIT_REASON_DISABLED) | \ @@ -171,6 +172,7 @@ struct kvm_x86_init_ops vt_init_ops __initdata = { static void vt_exit(void) { kvm_exit(); + tdx_cleanup(); vmx_exit(); } module_exit(vt_exit); @@ -183,6 +185,11 @@ static int __init vt_init(void) if (r) return r; + /* tdx_init() has been taken */ + r = tdx_bringup(); + if (r) + goto err_tdx_bringup; + /* * Common KVM initialization _must_ come last, after this, /dev/kvm is * exposed to userspace! @@ -195,6 +202,8 @@ static int __init vt_init(void) return 0; err_kvm_init: + tdx_cleanup(); +err_tdx_bringup: vmx_exit(); return r; } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c new file mode 100644 index 000000000000..0666dfbe0bc0 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx.c @@ -0,0 +1,160 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include "capabilities.h" +#include "tdx.h" + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +static bool enable_tdx __ro_after_init; +module_param_named(tdx, enable_tdx, bool, 0444); + +static enum cpuhp_state tdx_cpuhp_state; + +static int tdx_online_cpu(unsigned int cpu) +{ + unsigned long flags; + int r; + + /* Sanity check CPU is already in post-VMXON */ + WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE)); + + local_irq_save(flags); + r = tdx_cpu_enable(); + local_irq_restore(flags); + + return r; +} + +static void __do_tdx_cleanup(void) +{ + /* + * Once TDX module is initialized, it cannot be disabled and + * re-initialized again w/o runtime update (which isn't + * supported by kernel). Only need to remove the cpuhp here. + * The TDX host core code tracks TDX status and can handle + * 'multiple enabling' scenario. + */ + WARN_ON_ONCE(!tdx_cpuhp_state); + cpuhp_remove_state_nocalls(tdx_cpuhp_state); + tdx_cpuhp_state = 0; +} + +static int __init __do_tdx_bringup(void) +{ + int r; + + /* + * TDX-specific cpuhp callback to call tdx_cpu_enable() on all + * online CPUs before calling tdx_enable(), and on any new + * going-online CPU to make sure it is ready for TDX guest. + */ + r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN, + "kvm/cpu/tdx:online", + tdx_online_cpu, NULL); + if (r < 0) + return r; + + tdx_cpuhp_state = r; + + r = tdx_enable(); + if (r) + __do_tdx_cleanup(); + + return r; +} + +static bool __init kvm_can_support_tdx(void) +{ + return cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM); +} + +static int __init __tdx_bringup(void) +{ + int r; + + /* + * Enabling TDX requires enabling hardware virtualization first, + * as making SEAMCALLs requires CPU being in post-VMXON state. + */ + r = kvm_enable_virtualization(); + if (r) + return r; + + cpus_read_lock(); + r = __do_tdx_bringup(); + cpus_read_unlock(); + + if (r) + goto tdx_bringup_err; + + /* + * Leave hardware virtualization enabled after TDX is enabled + * successfully. TDX CPU hotplug depends on this. + */ + return 0; +tdx_bringup_err: + kvm_disable_virtualization(); + return r; +} + +void tdx_cleanup(void) +{ + if (enable_tdx) { + __do_tdx_cleanup(); + kvm_disable_virtualization(); + } +} + +int __init tdx_bringup(void) +{ + int r; + + if (!enable_tdx) + return 0; + + if (!kvm_can_support_tdx()) { + pr_err("tdx: no TDX private KeyIDs available\n"); + goto success_disable_tdx; + } + + if (!enable_virt_at_load) { + pr_err("tdx: tdx requires kvm.enable_virt_at_load=1\n"); + goto success_disable_tdx; + } + + /* + * Ideally KVM should probe whether TDX module has been loaded + * first and then try to bring it up. But TDX needs to use SEAMCALL + * to probe whether the module is loaded (there is no CPUID or MSR + * for that), and making SEAMCALL requires enabling virtualization + * first, just like the rest steps of bringing up TDX module. + * + * So, for simplicity do everything in __tdx_bringup(); the first + * SEAMCALL will return -ENODEV when the module is not loaded. The + * only complication is having to make sure that initialization + * SEAMCALLs don't return TDX_SEAMCALL_VMFAILINVALID in other + * cases. + */ + r = __tdx_bringup(); + if (r) { + /* + * Disable TDX only but don't fail to load module if + * the TDX module could not be loaded. No need to print + * message saying "module is not loaded" because it was + * printed when the first SEAMCALL failed. + */ + if (r == -ENODEV) + goto success_disable_tdx; + + enable_tdx = 0; + } + + return r; + +success_disable_tdx: + enable_tdx = 0; + return 0; +} diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h new file mode 100644 index 000000000000..8aee938a968f --- /dev/null +++ b/arch/x86/kvm/vmx/tdx.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_VMX_TDX_H +#define __KVM_X86_VMX_TDX_H + +#ifdef CONFIG_INTEL_TDX_HOST +int tdx_bringup(void); +void tdx_cleanup(void); +#else +static inline int tdx_bringup(void) { return 0; } +static inline void tdx_cleanup(void) {} +#endif + +#endif diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3fcee4209120..0a141685872d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2272,6 +2272,7 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu) } #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING +extern bool enable_virt_at_load; extern bool kvm_rebooting; #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8214efa76a6f..265b687fc5d6 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5440,8 +5440,9 @@ static struct miscdevice kvm_dev = { }; #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING -static bool enable_virt_at_load = true; +bool enable_virt_at_load = true; module_param(enable_virt_at_load, bool, 0444); +EXPORT_SYMBOL_GPL(enable_virt_at_load); __visible bool kvm_rebooting; EXPORT_SYMBOL_GPL(kvm_rebooting);