From patchwork Mon Feb 13 11:59:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 13138322
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, peterz@infradead.org,
    tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com,
    dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
    kirill.shutemov@linux.intel.com, ying.huang@intel.com,
    reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com,
    ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
    sathyanarayanan.kuppuswamy@linux.intel.com, david@redhat.com,
    bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com,
    kai.huang@intel.com
Subject: [PATCH v9 07/18] x86/virt/tdx: Do TDX module per-cpu initialization
Date: Tue, 14 Feb 2023 00:59:14 +1300
Message-Id: <557c526a1190903d11d67c4e2c76e01f67f6eb15.1676286526.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.39.1
X-Mailing-List: kvm@vger.kernel.org

After the SEAMCALL to do TDX module global initialization, a SEAMCALL to
do per-cpu initialization (TDH.SYS.LP.INIT) must be done on one logical
cpu before any other SEAMCALLs can be made on that cpu, including those
involved in the future steps of the module initialization.

To keep things simple, this implementation just chooses to guarantee all
online cpus are "TDX-runnable" (TDH.SYS.LP.INIT has been successfully
done on them). If the kernel were to allow one cpu to be online while
TDH.SYS.LP.INIT failed on it, the kernel would need to track a cpumask of
"TDX-runnable" cpus, know which task is "TDX workload" and guarantee such
task can only be scheduled to "TDX-runnable" cpus.
For example, the kernel would need to reject the request in
sched_setaffinity() if userspace tries to bind a TDX task to any
"non-TDX-runnable" cpu.

To guarantee all online cpus are "TDX-runnable", disable CPU hotplug
during module initialization and do TDH.SYS.LP.INIT for all online cpus
before any further steps of module initialization. On CPU hotplug, do
TDH.SYS.LP.INIT in the CPU online callback when TDX has been enabled,
and refuse to bring the cpu online if the SEAMCALL fails.

Currently only KVM handles VMXON. As with tdx_enable(), provide only a
new helper, tdx_cpu_online(), and make KVM itself responsible for doing
VMXON and calling tdx_cpu_online() in its own CPU online callback.

Note that tdx_enable() can be called multiple times by KVM because the
KVM module can be unloaded and reloaded. New cpus may come online while
KVM is unloaded, and in this case TDH.SYS.LP.INIT won't be called for
those newly onlined cpus because KVM's CPU online callback is removed
when KVM is unloaded. To make sure all online cpus are "TDX-runnable",
always do the per-cpu initialization for all online cpus in tdx_enable(),
even if the module has already been initialized.

Similar to the per-cpu module initialization, a later step that
configures the key for the global KeyID needs to make a SEAMCALL on one
cpu of each CPU package. The difference is that this later SEAMCALL
cannot run in parallel on different cpus, whereas TDH.SYS.LP.INIT can.

To avoid duplicated code, add a helper that makes a SEAMCALL on all
online cpus one by one, with a skip function that decides whether to
skip certain cpus, and use that helper to do the per-cpu initialization.

Signed-off-by: Kai Huang
---

v8 -> v9:
 - Added this patch back.
 - Handled the relaxed new behaviour of TDH.SYS.LP.INIT

---
 arch/x86/include/asm/tdx.h  |   2 +
 arch/x86/virt/vmx/tdx/tdx.c | 210 +++++++++++++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.h |   1 +
 3 files changed, 208 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 5c5ecfddb15b..2b2efaa4bc0e 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -107,9 +107,11 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
 int tdx_enable(void);
+int tdx_cpu_online(unsigned int cpu);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
 static inline int tdx_enable(void) { return -EINVAL; }
+static inline int tdx_cpu_online(unsigned int cpu) { return 0; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 79cee28c51b5..23b2db28726f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -26,6 +28,10 @@ static enum tdx_module_status_t tdx_module_status;
 /* Prevent concurrent attempts on TDX module initialization */
 static DEFINE_MUTEX(tdx_module_lock);
 
+/* TDX-runnable cpus. Protected by cpu_hotplug_lock. */
+static cpumask_t __cpu_tdx_mask;
+static cpumask_t *cpu_tdx_mask = &__cpu_tdx_mask;
+
 /*
  * Use tdx_global_keyid to indicate that TDX is uninitialized.
  * This is used in TDX initialization error paths to take it from
@@ -170,6 +176,63 @@ static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	return ret;
 }
 
+/*
+ * Call @func on all online cpus one by one but skip those cpus
+ * when @skip_func is valid and returns true for them.
+ */
+static int tdx_on_each_cpu_cond(int (*func)(void *), void *func_data,
+				bool (*skip_func)(int cpu, void *),
+				void *skip_data)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		int ret;
+
+		if (skip_func && skip_func(cpu, skip_data))
+			continue;
+
+		/*
+		 * SEAMCALL can be time consuming. Call the @func on
+		 * remote cpu via smp_call_on_cpu() instead of
+		 * smp_call_function_single() to avoid busy waiting.
+		 */
+		ret = smp_call_on_cpu(cpu, func, func_data, true);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int seamcall_lp_init(void)
+{
+	/* All '0's are just unused parameters */
+	return seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
+}
+
+static int smp_func_module_lp_init(void *data)
+{
+	int ret, cpu = smp_processor_id();
+
+	ret = seamcall_lp_init();
+	if (!ret)
+		cpumask_set_cpu(cpu, cpu_tdx_mask);
+
+	return ret;
+}
+
+static bool skip_func_module_lp_init_done(int cpu, void *data)
+{
+	return cpumask_test_cpu(cpu, cpu_tdx_mask);
+}
+
+static int module_lp_init_online_cpus(void)
+{
+	return tdx_on_each_cpu_cond(smp_func_module_lp_init, NULL,
+			skip_func_module_lp_init_done, NULL);
+}
+
 static int init_tdx_module(void)
 {
 	int ret;
@@ -182,10 +245,26 @@ static int init_tdx_module(void)
 	if (ret)
 		return ret;
 
+	/*
+	 * TDX module per-cpu initialization SEAMCALL must be done on
+	 * one cpu before any other SEAMCALLs can be made on that cpu,
+	 * including those involved in further steps to initialize the
+	 * TDX module.
+	 *
+	 * To make sure further SEAMCALLs can be done successfully w/o
+	 * having to consider preemption, disable CPU hotplug during
+	 * rest of module initialization and do per-cpu initialization
+	 * for all online cpus.
+	 */
+	cpus_read_lock();
+
+	ret = module_lp_init_online_cpus();
+	if (ret)
+		goto out;
+
 	/*
 	 * TODO:
 	 *
-	 * - TDX module per-cpu initialization.
 	 * - Get TDX module information and TDX-capable memory regions.
 	 * - Build the list of TDX-usable memory regions.
 	 * - Construct a list of "TD Memory Regions" (TDMRs) to cover
@@ -196,7 +275,17 @@ static int init_tdx_module(void)
 	 *
 	 * Return error before all steps are done.
 	 */
-	return -EINVAL;
+	ret = -EINVAL;
+out:
+	/*
+	 * Clear @cpu_tdx_mask if module initialization fails before
+	 * CPU hotplug is re-enabled. tdx_cpu_online() uses it to check
+	 * whether the initialization has been successful or not.
+	 */
+	if (ret)
+		cpumask_clear(cpu_tdx_mask);
+	cpus_read_unlock();
+	return ret;
 }
 
 static int __tdx_enable(void)
@@ -220,13 +309,72 @@ static int __tdx_enable(void)
 	return 0;
 }
 
+/*
+ * Disable TDX module after it has been initialized successfully.
+ */
+static void disable_tdx_module(void)
+{
+	/*
+	 * TODO: module clean up in reverse to steps in
+	 * init_tdx_module(). Remove this comment after
+	 * all steps are done.
+	 */
+	cpumask_clear(cpu_tdx_mask);
+}
+
+static int tdx_module_init_online_cpus(void)
+{
+	int ret;
+
+	/*
+	 * Make sure no cpu can become online to prevent
+	 * race against tdx_cpu_online().
+	 */
+	cpus_read_lock();
+
+	/*
+	 * Do per-cpu initialization for any new online cpus.
+	 * If any fails, disable TDX.
+	 */
+	ret = module_lp_init_online_cpus();
+	if (ret)
+		disable_tdx_module();
+
+	cpus_read_unlock();
+
+	return ret;
+
+}
+static int __tdx_enable_online_cpus(void)
+{
+	if (tdx_module_init_online_cpus()) {
+		/*
+		 * SEAMCALL failure has already printed
+		 * meaningful error message.
+		 */
+		tdx_module_status = TDX_MODULE_ERROR;
+
+		/*
+		 * Just return one universal error code.
+		 * For now the caller cannot recover anyway.
+		 */
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * tdx_enable - Enable TDX to be ready to run TDX guests
  *
  * Initialize the TDX module to enable TDX. After this function, the TDX
- * module is ready to create and run TDX guests.
+ * module is ready to create and run TDX guests on all online cpus.
+ *
+ * This function internally calls cpus_read_lock()/unlock() to prevent
+ * any cpu from going online and offline.
  *
  * This function assumes all online cpus are already in VMX operation.
+ *
  * This function can be called in parallel by multiple callers.
  *
  * Return 0 if TDX is enabled successfully, otherwise error.
@@ -247,8 +395,17 @@ int tdx_enable(void)
 		ret = __tdx_enable();
 		break;
 	case TDX_MODULE_INITIALIZED:
-		/* Already initialized, great, tell the caller. */
-		ret = 0;
+		/*
+		 * The previous call of __tdx_enable() may only have
+		 * initialized part of present cpus during module
+		 * initialization, and new cpus may have become online
+		 * since then.
+		 *
+		 * To make sure all online cpus are TDX-runnable, always
+		 * do per-cpu initialization for all online cpus here
+		 * even the module has been initialized.
+		 */
+		ret = __tdx_enable_online_cpus();
 		break;
 	default:
 		/* Failed to initialize in the previous attempts */
@@ -261,3 +418,46 @@ int tdx_enable(void)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(tdx_enable);
+
+/**
+ * tdx_cpu_online - Enable TDX on a hotplugged local cpu
+ *
+ * @cpu: the cpu to be brought up.
+ *
+ * Do TDX module per-cpu initialization for a hotplugged cpu to make
+ * it TDX-runnable. All online cpus are initialized during module
+ * initialization.
+ *
+ * This function must be called from CPU hotplug callback which holds
+ * write lock of cpu_hotplug_lock.
+ *
+ * This function assumes local cpu is already in VMX operation.
+ */
+int tdx_cpu_online(unsigned int cpu)
+{
+	int ret;
+
+	/*
+	 * @cpu_tdx_mask is updated in tdx_enable() and is protected
+	 * by cpus_read_lock()/unlock(). If it is empty, TDX module
+	 * either hasn't been initialized, or TDX didn't get enabled
+	 * successfully.
+	 *
+	 * In either case, do nothing but return success.
+	 */
+	if (cpumask_empty(cpu_tdx_mask))
+		return 0;
+
+	WARN_ON_ONCE(cpu != smp_processor_id());
+
+	/* Already done */
+	if (cpumask_test_cpu(cpu, cpu_tdx_mask))
+		return 0;
+
+	ret = seamcall_lp_init();
+	if (!ret)
+		cpumask_set_cpu(cpu, cpu_tdx_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_cpu_online);
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 55472e085bc8..30413d7fbee8 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,6 +15,7 @@
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_SYS_INIT	33
+#define TDH_SYS_LP_INIT	35
 
 /* Kernel defined TDX module status during module initialization. */
 enum tdx_module_status_t {
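
The skip-and-stop-on-first-error loop that tdx_on_each_cpu_cond() performs
can be tried outside the kernel with a rough userspace sketch like the one
below. It is not part of the patch and makes several assumptions:
NR_SIM_CPUS, sim_online[], sim_tdx_ready[], sim_lp_init(), sim_skip_done()
and on_each_cpu_cond() are hypothetical stand-ins for the online cpu mask,
cpu_tdx_mask, the TDH.SYS.LP.INIT SEAMCALL and the kernel helpers. The
callback also takes the cpu number as an argument and runs in the calling
thread, whereas the kernel code runs it on the target cpu via
smp_call_on_cpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SIM_CPUS 8	/* hypothetical; size of the simulated cpu set */

/* Hypothetical stand-ins for the online cpu mask and cpu_tdx_mask. */
static bool sim_online[NR_SIM_CPUS];
static bool sim_tdx_ready[NR_SIM_CPUS];

/*
 * Simulated per-cpu init. Fails with -1 on the cpu that @data points to
 * (if @data is non-NULL); otherwise marks the cpu ready and returns 0.
 */
static int sim_lp_init(int cpu, void *data)
{
	const int *bad_cpu = data;

	assert(cpu >= 0 && cpu < NR_SIM_CPUS);
	if (bad_cpu && *bad_cpu == cpu)
		return -1;
	sim_tdx_ready[cpu] = true;
	return 0;
}

/* Skip predicate: true when the cpu has already been initialized. */
static bool sim_skip_done(int cpu, void *data)
{
	(void)data;
	return sim_tdx_ready[cpu];
}

/*
 * Call @func for each simulated online cpu in order, skipping cpus for
 * which @skip_func returns true, and stop at the first non-zero return.
 */
static int on_each_cpu_cond(int (*func)(int cpu, void *), void *func_data,
			    bool (*skip_func)(int cpu, void *),
			    void *skip_data)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
		int ret;

		if (!sim_online[cpu])
			continue;
		if (skip_func && skip_func(cpu, skip_data))
			continue;

		ret = func(cpu, func_data);
		if (ret)
			return ret;
	}

	return 0;
}
```

In this sketch, a second call to on_each_cpu_cond() only visits cpus that
are not yet marked ready, which is the property that lets tdx_enable() redo
the per-cpu step cheaply after KVM has been reloaded.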