From patchwork Thu Aug  3 04:27:14 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 01/19] x86/cpufeatures: Add CPU feature flags for shadow stacks
Date: Thu, 3 Aug 2023 00:27:14 -0400
Message-Id: <20230803042732.88515-2-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339442

From: Rick Edgecombe

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add a CPU feature flag for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the
registers are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK
depend on XSAVES.
The shadow stack feature, enumerated by the CPUID bit described above,
encompasses both supervisor and userspace support for shadow stack. In
near-future patches, only userspace shadow stack will be enabled. In
expectation of future supervisor shadow stack support, create a software
CPU capability to enumerate kernel utilization of userspace shadow stack
support. This user shadow stack bit should depend on the HW "shstk"
capability and that logic will be implemented in future patches.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Signed-off-by: Dave Hansen
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Link: https://lore.kernel.org/all/20230613001108.3040476-9-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/cpufeatures.h       | 2 ++
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 31c862d79fae..8ea5c290259c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
 #define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */
+#define X86_FEATURE_USER_SHSTK		(11*32+23) /* Shadow stack support for user mode applications */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
@@ -380,6 +381,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* "" Shadow stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..b9c7eae2e70f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+#define DISABLE_USER_SHSTK	0
+#else
+#define DISABLE_USER_SHSTK	(1 << (X86_FEATURE_USER_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -120,7 +126,7 @@
 #define DISABLED_MASK9	(DISABLE_SGX)
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
-			 DISABLE_CALL_DEPTH_TRACKING)
+			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
 #define DISABLED_MASK12	(DISABLE_LAM)
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..e462c1d3800a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,		X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,		X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,		X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,		X86_FEATURE_XSAVES    },
 	{}
 };
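[Editorial aside: a minimal userspace sketch, not part of this patch, that
probes the bit named above, CPUID.(EAX=7,ECX=0):ECX[bit 7]; the macro name
and output strings are illustrative only.]

#include <cpuid.h>
#include <stdio.h>

#define SHSTK_CPUID_BIT (1u << 7)	/* CPUID.(EAX=7,ECX=0):ECX[7] */

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Structured extended feature leaf 7, sub-leaf 0 */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 1;

	printf("CET shadow stack: %s\n",
	       (ecx & SHSTK_CPUID_BIT) ? "enumerated" : "not enumerated");
	return 0;
}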
From patchwork Thu Aug  3 04:27:15 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 02/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
Date: Thu, 3 Aug 2023 00:27:15 -0400
Message-Id: <20230803042732.88515-3-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339443

From: Rick Edgecombe

Shadow stack register state can be managed with XSAVE. The registers
can logically be separated into two groups:
 * Registers controlling user-mode operation
 * Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for each of
those groups of registers. This lets an OS manage them separately if it
chooses. Future patches for host userspace and KVM guests will only
utilize the user-mode registers, so only configure XSAVE to save
user-mode registers.
This state will add 16 bytes to the xsave buffer size. Future patches
will use the user-mode XSAVE area to save guest user-mode CET state.
However, the VMCS includes new fields for guest CET supervisor states.
KVM can use these to save and restore guest supervisor state, so host
supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in
check_xstate_against_struct() that handles warning about unimplemented
xfeatures. So refactor these checks by having XCHECK_SZ() evaluate to a
bool when it actually checks the xfeature, and return the result
directly from the new switch statement. This ends up exceeding 80
chars, but was better on balance than other options explored.

While configuring user-mode XSAVE, clarify that kernel-mode registers
are not managed by XSAVE by defining the xfeature in
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT.
This serves more of a documentation-as-code purpose and, functionally,
only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state
controlling user-mode operation. This is a departure from earlier
features like protection keys where the PKRU state is a normal user
(non-supervisor) state. Having the user state be supervisor-managed
ensures there is no direct, unprivileged access to it, making it harder
for an attacker to subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs,
and the bits defined in those MSRs relevant to future shadow stack
enablement patches.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Signed-off-by: Dave Hansen
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Link: https://lore.kernel.org/all/20230613001108.3040476-25-rick.p.edgecombe%40intel.com
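[Editorial aside on the 16 bytes: XSAVE component sizes are enumerated via
CPUID leaf 0xD. A hedged userspace sketch, assuming the SDM semantics for
sub-leaf 11 (EAX = component size, ECX[0] = supervisor state); it is not
taken from this patch.]

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* XSAVE enumeration leaf 0xD, sub-leaf 11 (XFEATURE_CET_USER) */
	if (!__get_cpuid_count(0xD, 11, &eax, &ebx, &ecx, &edx) || !eax)
		return 1;

	/* Expect 16 bytes: u64 user_cet + u64 user_ssp */
	printf("CET_USER size: %u bytes\n", eax);
	/* ECX[0] set => supervisor state, saved only via XSAVES/IA32_XSS */
	printf("supervisor state: %s\n", (ecx & 1) ? "yes" : "no");
	return 0;
}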
---
 arch/x86/include/asm/fpu/types.h  | 16 +++++-
 arch/x86/include/asm/fpu/xstate.h |  6 ++-
 arch/x86/kernel/fpu/xstate.c      | 90 +++++++++++++++----------------
 3 files changed, 61 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 7f6d858ff47a..eb810074f1e7 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL_UNUSED,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNUSED)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,16 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	/* user control-flow settings */
+	u64 user_cet;
+	/* user shadow stack pointer */
+	u64 user_ssp;
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497c9436..4fa4751912d9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -39,26 +39,26 @@
  */
 static const char *xfeature_names[] =
 {
-	"x87 floating point registers"	,
-	"SSE registers"			,
-	"AVX registers"			,
-	"MPX bounds registers"		,
-	"MPX CSR"			,
-	"AVX-512 opmask"		,
-	"AVX-512 Hi256"			,
-	"AVX-512 ZMM_Hi256"		,
-	"Processor Trace (unused)"	,
+	"x87 floating point registers",
+	"SSE registers",
+	"AVX registers",
+	"MPX bounds registers",
+	"MPX CSR",
+	"AVX-512 opmask",
+	"AVX-512 Hi256",
+	"AVX-512 ZMM_Hi256",
+	"Processor Trace (unused)",
 	"Protection Keys User registers",
 	"PASID state",
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"AMX Tile config"		,
-	"AMX Tile data"			,
-	"unknown xstate feature"	,
+	"Control-flow User registers",
+	"Control-flow Kernel registers (unused)",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"AMX Tile config",
+	"AMX Tile data",
+	"unknown xstate feature",
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -276,6 +277,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDREGS |		\
	 XFEATURE_MASK_BNDCSR |			\
	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_CET_USER |		\
	 XFEATURE_MASK_XTILE)
 
 /*
@@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void)
 	}						\
 } while (0)
 
-#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {	\
-	if ((nr == nr_macro) &&				\
-	    WARN_ONCE(sz != sizeof(__struct),		\
-		"%s: struct is %zu bytes, cpu state %d bytes\n",	\
-		__stringify(nr_macro), sizeof(__struct), sz)) {	\
+#define XCHECK_SZ(sz, nr, __struct) ({			\
+	if (WARN_ONCE(sz != sizeof(__struct),		\
+	    "[%s]: struct is %zu bytes, cpu state %d bytes\n",	\
+	    xfeature_names[nr], sizeof(__struct), sz)) {	\
 		__xstate_dump_leaves();			\
 	}						\
-} while (0)
+	true;						\
+})
+
 
 /**
  * check_xtile_data_against_struct - Check tile data state size.
@@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr)
 	 * Ask the CPU for the size of the state.
 	 */
 	int sz = xfeature_size(nr);
+
 	/*
 	 * Match each CPU state with the corresponding software
 	 * structure.
 	 */
-	XCHECK_SZ(sz, nr, XFEATURE_YMM,       struct ymmh_struct);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDREGS,   struct mpx_bndreg_state);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDCSR,    struct mpx_bndcsr_state);
-	XCHECK_SZ(sz, nr, XFEATURE_OPMASK,    struct avx_512_opmask_state);
-	XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
-	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
-	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
-
-	/* The tile data size varies between implementations. */
-	if (nr == XFEATURE_XTILE_DATA)
-		check_xtile_data_against_struct(sz);
-
-	/*
-	 * Make *SURE* to add any feature numbers in below if
-	 * there are "holes" in the xsave state component
-	 * numbers.
-	 */
-	if ((nr < XFEATURE_YMM) ||
-	    (nr >= XFEATURE_MAX) ||
-	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	switch (nr) {
+	case XFEATURE_YMM:	  return XCHECK_SZ(sz, nr, struct ymmh_struct);
+	case XFEATURE_BNDREGS:	  return XCHECK_SZ(sz, nr, struct mpx_bndreg_state);
+	case XFEATURE_BNDCSR:	  return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state);
+	case XFEATURE_OPMASK:	  return XCHECK_SZ(sz, nr, struct avx_512_opmask_state);
+	case XFEATURE_ZMM_Hi256:  return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state);
+	case XFEATURE_Hi16_ZMM:	  return XCHECK_SZ(sz, nr, struct avx_512_hi16_state);
+	case XFEATURE_PKRU:	  return XCHECK_SZ(sz, nr, struct pkru_state);
+	case XFEATURE_PASID:	  return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
+	case XFEATURE_XTILE_CFG:  return XCHECK_SZ(sz, nr, struct xtile_cfg);
+	case XFEATURE_CET_USER:	  return XCHECK_SZ(sz, nr, struct cet_user_state);
+	case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+	default:
+		XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
+		return false;
 	}
+
 	return true;
 }
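[Editorial aside: the new XCHECK_SZ() leans on GNU C statement expressions
so a macro invocation can yield a value. A standalone illustration of the
idiom, assuming GCC/Clang; the names and messages are illustrative.]

#include <stdio.h>

/* A statement expression ({ ... }) evaluates to its last statement,
 * which is how the new XCHECK_SZ() can both warn and yield `true`. */
#define CHECK_EQ(a, b) ({					\
	int __ok = ((a) == (b));				\
	if (!__ok)						\
		fprintf(stderr, "mismatch: %d != %d\n", (a), (b)); \
	__ok;							\
})

int main(void)
{
	/* Usable directly in expression context, e.g. a return statement */
	return CHECK_EQ(2 + 2, 4) ? 0 : 1;
}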
From patchwork Thu Aug  3 04:27:16 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 03/19] KVM:x86: Report XSS as to-be-saved if there are supported features
Date: Thu, 3 Aug 2023 00:27:16 -0400
Message-Id: <20230803042732.88515-4-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339444

From: Sean Christopherson

Add MSR_IA32_XSS to the list of MSRs reported to userspace if
supported_xss is non-zero, i.e. KVM supports at least one XSS-based
feature.
Signed-off-by: Sean Christopherson
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/x86.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6b9bea62fb8..0b9033551d8c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1451,6 +1451,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XSS,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7172,6 +7173,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
+	case MSR_IA32_XSS:
+		if (!kvm_caps.supported_xss)
+			return;
+		break;
 	default:
 		break;
 	}
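[Editorial aside: userspace discovers the to-be-saved list through the
KVM_GET_MSR_INDEX_LIST ioctl. A rough consumer sketch with error handling
trimmed; the 0xda0 index for IA32_XSS is the architectural number from the
SDM, stated here as an assumption rather than taken from the patch.]

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	struct kvm_msr_list *list;
	unsigned int i, n = 1024;	/* assume 1024 slots is enough */

	list = malloc(sizeof(*list) + n * sizeof(list->indices[0]));
	list->nmsrs = n;
	if (kvm < 0 || ioctl(kvm, KVM_GET_MSR_INDEX_LIST, list) < 0)
		return 1;

	for (i = 0; i < list->nmsrs; i++) {
		if (list->indices[i] == 0xda0)	/* MSR_IA32_XSS */
			printf("KVM reports IA32_XSS as to-be-saved\n");
	}
	return 0;
}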
From patchwork Thu Aug  3 04:27:17 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
Date: Thu, 3 Aug 2023 00:27:17 -0400
Message-Id: <20230803042732.88515-5-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339445

Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of all
enabled xstate features in XCR0 | XSS. The guest can allocate a
sufficient xsave buffer based on this info.

Note, KVM does not yet support any XSS-based features, i.e.
supported_xss is guaranteed to be zero at this time.

Co-developed-by: Zhang Yi Z
Signed-off-by: Zhang Yi Z
Signed-off-by: Yang Weijiang
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            | 20 ++++++++++++++++++--
 arch/x86/kvm/x86.c              |  8 +++++---
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28bd38303d70..20bbcd95511f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -804,6 +804,7 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 guest_supported_xss;
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7f4d13383cf2..0338316b827c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -249,6 +249,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static u64 cpuid_get_supported_xss(struct kvm_cpuid_entry2 *entries, int nent)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = cpuid_entry2_find(entries, nent, 0xd, 1);
+	if (!best)
+		return 0;
+
+	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
 static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
 				       int nent)
 {
@@ -276,8 +287,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
-		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		     cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
+		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
+
+		best->ebx = xstate_required_size(xstate, true);
+	}
 
 	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
 	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
@@ -325,6 +339,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+	vcpu->arch.guest_supported_xss =
+		cpuid_get_supported_xss(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 
 	/*
 	 * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b9033551d8c..5d6d6fa33e5b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3780,10 +3780,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
 		 * XSAVES/XRSTORS to save/restore PT MSRs.
 		 */
-		if (data & ~kvm_caps.supported_xss)
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
-		vcpu->arch.ia32_xss = data;
-		kvm_update_cpuid_runtime(vcpu);
+		if (vcpu->arch.ia32_xss != data) {
+			vcpu->arch.ia32_xss = data;
+			kvm_update_cpuid_runtime(vcpu);
+		}
 		break;
 	case MSR_SMI_COUNT:
 		if (!msr_info->host_initiated)
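[Editorial aside: a hedged userspace approximation of what the recomputed
CPUID(EAX=0DH,ECX=1).EBX means. The per-component size in EAX and the
64-byte alignment flag in ECX[1] follow the SDM's leaf 0xD description;
this is a sketch, not KVM's xstate_required_size().]

#include <cpuid.h>
#include <stdint.h>

/* Approximate the size CPUID(0xD, 1).EBX reports for a compacted XSAVE
 * area covering the components enabled in xcr0 | xss. */
static unsigned int xstate_size_model(uint64_t xcr0_xss)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned int ret = 512 + 64;	/* legacy region + XSAVE header */
	int i;

	for (i = 2; i < 63; i++) {
		if (!(xcr0_xss & (1ULL << i)))
			continue;
		__get_cpuid_count(0xD, i, &eax, &ebx, &ecx, &edx);
		if (ecx & 2)			/* component is 64-byte aligned */
			ret = (ret + 63) & ~63u;
		ret += eax;			/* EAX: size of component i */
	}
	return ret;
}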
From patchwork Thu Aug  3 04:27:18 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 05/19] KVM:x86: Initialize kvm_caps.supported_xss
Date: Thu, 3 Aug 2023 00:27:18 -0400
Message-Id: <20230803042732.88515-6-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339446

Set kvm_caps.supported_xss to the bitwise AND of host_xss and the KVM
XSS mask. host_xss contains the host-supported xstate feature bits for
thread context switch; KVM_SUPPORTED_XSS includes all KVM-enabled XSS
feature bits; the result of the AND represents all feature bits that
KVM supports.
Since the result is a subset of host_xss, the related XSAVE-managed MSRs
are automatically swapped for guest and host when the vCPU exits to
userspace.

Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/vmx/vmx.c | 1 -
 arch/x86/kvm/x86.c     | 6 +++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..c8d9870cfecb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7849,7 +7849,6 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d6d6fa33e5b..e9f3627d5fdd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -225,6 +225,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
+#define KVM_SUPPORTED_XSS	0
+
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
 
@@ -9498,8 +9500,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	rdmsrl_safe(MSR_EFER, &host_efer);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		rdmsrl(MSR_IA32_XSS, host_xss);
+		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+	}
 
 	kvm_init_pmu_capability(ops->pmu_ops);
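[Editorial aside: a small illustration of the derivation with hypothetical
values; KVM_SUPPORTED_XSS is 0 at this point in the series, so
supported_xss stays 0 until later patches add bits.]

#include <stdint.h>
#include <stdio.h>

#define KVM_SUPPORTED_XSS	0ULL	/* per this patch; later patches add bits */

int main(void)
{
	uint64_t host_xss = 0x800;	/* hypothetical: host enabled bit 11 (CET user) */
	uint64_t supported_xss = host_xss & KVM_SUPPORTED_XSS;

	/* 0 here: host bits are masked off until KVM claims support for them */
	printf("kvm_caps.supported_xss = %#llx\n",
	       (unsigned long long)supported_xss);
	return 0;
}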
From patchwork Thu Aug  3 04:27:19 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 06/19] KVM:x86: Load guest FPU state when access XSAVE-managed MSRs
Date: Thu, 3 Aug 2023 00:27:19 -0400
Message-Id: <20230803042732.88515-7-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339448

From: Sean Christopherson

Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES. Two MSR access helpers, i.e.
kvm_{get,set}_xsave_msr(), are introduced by a later patch to facilitate
access to this kind of MSR.

If MSRs supported in kvm_caps.supported_xss are passed through to the
guest, the guest MSRs are swapped with host contents before the vCPU
exits to userspace and after it re-enters the kernel.

Because the modified code is also used for the KVM_GET_MSRS device
ioctl(), explicitly check that @vcpu is non-null before attempting to
load guest state. The XSS-supporting MSRs cannot be retrieved via the
device ioctl() without loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed
to access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

Signed-off-by: Sean Christopherson
Co-developed-by: Yang Weijiang
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9f3627d5fdd..015fb0ef102c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -132,6 +132,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func)					     \
@@ -4345,6 +4348,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+static const u32 xstate_msrs[] = {
+	MSR_IA32_U_CET, MSR_IA32_PL3_SSP,
+};
+
+static bool is_xstate_msr(u32 index)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(xstate_msrs); i++) {
+		if (index == xstate_msrs[i])
+			return true;
+	}
+	return false;
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
 *
 * @return number of msrs set successfully.
 */
@@ -4355,11 +4373,20 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xstate_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
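[Editorial aside: from the consumer side, the flow this patch enables looks
roughly like the sketch below. Assumptions: the vCPU fd was set up in the
usual way, and 0x6a7 is MSR_IA32_PL3_SSP's architectural index per the SDM;
error handling omitted.]

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Read a guest XSAVE-managed CET MSR via KVM_GET_MSRS; KVM loads the
 * guest FPU around the access, as added by this patch. */
static int read_pl3_ssp(int vcpu_fd, unsigned long long *value)
{
	struct kvm_msrs *msrs;
	int ret = -1;

	msrs = calloc(1, sizeof(*msrs) + sizeof(struct kvm_msr_entry));
	msrs->nmsrs = 1;
	msrs->entries[0].index = 0x6a7;	/* MSR_IA32_PL3_SSP (SDM) */

	if (ioctl(vcpu_fd, KVM_GET_MSRS, msrs) == 1) {
		*value = msrs->entries[0].data;
		ret = 0;
	}
	free(msrs);
	return ret;
}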
From patchwork Thu Aug  3 04:27:20 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 07/19] KVM:x86: Add fault checks for guest CR4.CET setting
Date: Thu, 3 Aug 2023 00:27:20 -0400
Message-Id: <20230803042732.88515-8-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339449

Check potential faults for CR4.CET setting per the Intel SDM. CET can be
enabled if and only if CR0.WP==1, i.e. setting CR4.CET=1 faults if
CR0.WP==0, and setting CR0.WP=0 fails if CR4.CET==1.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Yang Weijiang
Reviewed-by: Chao Gao
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 015fb0ef102c..82b9f14990da 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -993,6 +993,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+		return 1;
+
 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
 
 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1204,6 +1207,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+		return 1;
+
 	static_call(kvm_x86_set_cr4)(vcpu, cr4);
 
 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
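[Editorial aside: expressed as a standalone predicate, the SDM rule being
enforced is the sketch below; the bit positions are the architectural ones,
assumed here rather than taken from this diff.]

#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_WP	(1ul << 16)	/* CR0.WP, per the SDM */
#define X86_CR4_CET	(1ul << 23)	/* CR4.CET, per the SDM */

/* CR4.CET may be set only while CR0.WP is set; clearing CR0.WP is
 * legal only while CR4.CET is clear. */
static bool cr_state_valid(uint64_t cr0, uint64_t cr4)
{
	return !(cr4 & X86_CR4_CET) || (cr0 & X86_CR0_WP);
}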
From patchwork Thu Aug  3 04:27:21 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
Date: Thu, 3 Aug 2023 00:27:21 -0400
Message-Id: <20230803042732.88515-9-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339450

Add all CET MSRs, including the synthesized GUEST_SSP, to the report
list. PL{0,1,2}_SSP are made independent of host XSAVE management by
later patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on
the host side. MSR_IA32_S_CET, MSR_IA32_INT_SSP_TAB and
MSR_KVM_GUEST_SSP are not XSAVE-managed.

When CET IBT/SHSTK are enumerated to the guest, both user and supervisor
modes should be supported for architectural integrity, i.e., the two
modes are supported as both or neither.

Signed-off-by: Yang Weijiang
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/x86.c                   | 10 ++++++++++
 arch/x86/kvm/x86.h                   | 10 ++++++++++
 3 files changed, 21 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..7af465e4e0bd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -58,6 +58,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_GUEST_SSP	0x4b564d09
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82b9f14990da..d68ef87fe007 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1463,6 +1463,9 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 	MSR_IA32_XSS,
+	MSR_IA32_U_CET, MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!kvm_caps.supported_xss)
 			return;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!kvm_is_cet_supported())
+			return;
+		break;
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..6e6292915f8c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 }
 
+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+/*
+ * Shadow Stack and Indirect Branch Tracking feature enabling depends on
+ * whether the host side CET user xstate bit is supported or not.
+ */
+static inline bool kvm_is_cet_supported(void)
+{
+	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
+}
+
 extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;
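[Editorial aside: for reference, a sketch of the index behind each reported
name. The 0x6a0-0x6a8 values are the architectural numbers from the SDM,
stated here as an assumption; MSR_KVM_GUEST_SSP's index comes from this
patch.]

/* Architectural CET MSR indices (Intel SDM), assumed for reference */
#define MSR_IA32_U_CET		0x000006a0	/* user-mode CET settings */
#define MSR_IA32_S_CET		0x000006a2	/* supervisor-mode CET settings */
#define MSR_IA32_PL0_SSP	0x000006a4	/* CPL0 shadow stack pointer */
#define MSR_IA32_PL1_SSP	0x000006a5	/* CPL1 shadow stack pointer */
#define MSR_IA32_PL2_SSP	0x000006a6	/* CPL2 shadow stack pointer */
#define MSR_IA32_PL3_SSP	0x000006a7	/* CPL3 shadow stack pointer */
#define MSR_IA32_INT_SSP_TAB	0x000006a8	/* interrupt SSP table base */
/* KVM-synthesized, from this patch: current guest SSP */
#define MSR_KVM_GUEST_SSP	0x4b564d09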
From patchwork Thu Aug  3 04:27:22 2023
From: Yang Weijiang <weijiang.yang@intel.com>
Subject: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
Date: Thu, 3 Aug 2023 00:27:22 -0400
Message-Id: <20230803042732.88515-10-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
X-Patchwork-Id: 13339451

Save guest CET supervisor states, i.e., PL{0,1,2}_SSP, when the vCPU is
exiting to userspace or being preempted. Reload the MSRs before
VM-entry. Embed the helpers in {vmx,svm}_prepare_switch_to_guest() and
vmx_prepare_switch_to_host()/svm_prepare_host_switch() to employ the
existing guest state management and optimize the invocation of the
helpers.
CET supervisor state management is enabled in KVM itself rather than in
the host FPU framework because the FPU approach would:
- introduce unnecessary XSAVE operations when switching to non-vCPU
  userspace within the current FPU framework;
- force allocating additional space for CET supervisor states in each
  thread context, regardless of whether it is a vCPU thread or not.

Suggested-by: Chao Gao
Signed-off-by: Yang Weijiang
Reviewed-by: Chao Gao
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.h            | 11 +++++++++++
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 arch/x86/kvm/x86.c              | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |  3 +++
 6 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20bbcd95511f..69cbc9d9b277 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,6 +805,7 @@ struct kvm_vcpu_arch {
 	u64 xcr0;
 	u64 guest_supported_xcr0;
 	u64 guest_supported_xss;
+	u64 cet_s_ssp[3];
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..b221a663de4c 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -232,4 +232,15 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
 }
 
+/*
+ * FIXME: When the "KVM-governed" enabling patchset is merged, rebase this
+ * series on top of that and add a new patch for CET to replace this helper
+ * with the qualified one.
+ */
+static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
+					  unsigned int feature)
+{
+	return kvm_cpu_cap_has(feature) && guest_cpuid_has(vcpu, feature);
+}
+
 #endif
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1bc0936bbd51..8652e86fbfb2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1515,11 +1515,13 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	if (likely(tsc_aux_uret_slot >= 0))
 		kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
 
+	reload_cet_supervisor_ssp(vcpu);
 	svm->guest_state_loaded = true;
 }
 
 static void svm_prepare_host_switch(struct kvm_vcpu *vcpu)
 {
+	save_cet_supervisor_ssp(vcpu);
 	to_svm(vcpu)->guest_state_loaded = false;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c8d9870cfecb..6aa76124e81e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1323,6 +1323,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	gs_base = segment_base(gs_sel);
 #endif
 
+	reload_cet_supervisor_ssp(vcpu);
 	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
 	vmx->guest_state_loaded = true;
 }
@@ -1362,6 +1363,7 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
 		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
 	load_fixmap_gdt(raw_smp_processor_id());
+	save_cet_supervisor_ssp(&vmx->vcpu);
 	vmx->guest_state_loaded = false;
 	vmx->guest_uret_msrs_loaded = false;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d68ef87fe007..5b63441fd2d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11128,6 +11128,31 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	trace_kvm_fpu(0);
 }
 
+void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+		/*
+		 * Omit reset to host PL{1,2}_SSP because Linux will never use
+		 * these MSRs.
+		 */
+		wrmsrl(MSR_IA32_PL0_SSP, 0);
+	}
+}
+EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
+
+void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+	}
+}
+EXPORT_SYMBOL_GPL(reload_cet_supervisor_ssp);
+
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
 	struct kvm_queued_exception *ex = &vcpu->arch.exception;
@@ -12133,6 +12158,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vcpu->arch.cr3 = 0;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+	memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));
 
 	/*
 	 * CR0.CD/NW are set on RESET, preserved on INIT. Note, some versions
@@ -12313,6 +12339,7 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 		pmu->need_cleanup = true;
 		kvm_make_request(KVM_REQ_PMU, vcpu);
 	}
+
 	static_call(kvm_x86_sched_in)(vcpu, cpu);
 }
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6e6292915f8c..c69fc027f5ec 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -501,6 +501,9 @@ static inline void kvm_machine_check(void)
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 
+void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu);
+void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu);
+
 int kvm_spec_ctrl_test_value(u64 value);
 bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
E=McAfee;i="6600,9927,10790"; a="794888495" X-IronPort-AV: E=Sophos;i="6.01,251,1684825200"; d="scan'208";a="794888495" Received: from embargo.jf.intel.com ([10.165.9.183]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Aug 2023 00:32:16 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com, Zhang Yi Z Subject: [PATCH v5 10/19] KVM:VMX: Introduce CET VMCS fields and control bits Date: Thu, 3 Aug 2023 00:27:23 -0400 Message-Id: <20230803042732.88515-11-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Control-flow Enforcement Technology (CET) is a kind of CPU feature used to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two sub-features(SHSTK,IBT) to defend against ROP/COP/JOP style control-flow subversion attacks. Shadow Stack (SHSTK): A shadow stack is a second stack used exclusively for control transfer operations. The shadow stack is separate from the data/normal stack and can be enabled individually in user and kernel mode. When shadow stack is enabled, CALL pushes the return address on both the data and shadow stack. RET pops the return address from both stacks and compares them. If the return addresses from the two stacks do not match, the processor generates a #CP. Indirect Branch Tracking (IBT): IBT introduces new instruction(ENDBRANCH)to mark valid target addresses of indirect branches (CALL, JMP etc...). If an indirect branch is executed and the next instruction is _not_ an ENDBRANCH, the processor generates a #CP. These instruction behaves as a NOP on platforms that doesn't support CET. Several new CET MSRs are defined to support CET: MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} mode respectively. MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}. MSR_IA32_INT_SSP_TAB: Linear address of SHSTK table,the entry is indexed by IST of interrupt gate desc. Two XSAVES state bits are introduced for CET: IA32_XSS:[bit 11]: Control saving/restoring user mode CET states IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states. Six VMCS fields are introduced for CET: {HOST,GUEST}_S_CET: Stores CET settings for kernel mode. {HOST,GUEST}_SSP: Stores shadow stack pointer of current active task/thread. {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB. 
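For orientation, the two XSAVES state bits called out above correspond to the following masks (a minimal sketch; the XFEATURE_MASK_* names follow the kernel's existing xstate convention, and the helper is illustrative rather than code from this series):

#define XFEATURE_MASK_CET_USER		(1ULL << 11)	/* IA32_U_CET + IA32_PL3_SSP */
#define XFEATURE_MASK_CET_KERNEL	(1ULL << 12)	/* IA32_PL{0,1,2}_SSP */

static inline bool xss_enables_cet(u64 xss)
{
	/* true if either CET state component is XSAVES/XRSTORS-managed */
	return xss & (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL);
}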
On Intel platforms, two additional bits are defined in the VM_EXIT and VM_ENTRY control fields: If VM_EXIT_LOAD_CET_STATE = 1, the host CET states are restored from the following VMCS fields at VM-Exit: HOST_S_CET HOST_SSP HOST_INTR_SSP_TABLE If VM_ENTRY_LOAD_CET_STATE = 1, the guest CET states are loaded from the following VMCS fields at VM-Entry: GUEST_S_CET GUEST_SSP GUEST_INTR_SSP_TABLE Co-developed-by: Zhang Yi Z Signed-off-by: Zhang Yi Z Signed-off-by: Yang Weijiang Reviewed-by: Chao Gao --- arch/x86/include/asm/vmx.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0d02c4aafa6f..db7f93307349 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -104,6 +104,7 @@ #define VM_EXIT_CLEAR_BNDCFGS 0x00800000 #define VM_EXIT_PT_CONCEAL_PIP 0x01000000 #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000 +#define VM_EXIT_LOAD_CET_STATE 0x10000000 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff @@ -117,6 +118,7 @@ #define VM_ENTRY_LOAD_BNDCFGS 0x00010000 #define VM_ENTRY_PT_CONCEAL_PIP 0x00020000 #define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000 +#define VM_ENTRY_LOAD_CET_STATE 0x00100000 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff @@ -345,6 +347,9 @@ enum vmcs_field { GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822, GUEST_SYSENTER_ESP = 0x00006824, GUEST_SYSENTER_EIP = 0x00006826, + GUEST_S_CET = 0x00006828, + GUEST_SSP = 0x0000682a, + GUEST_INTR_SSP_TABLE = 0x0000682c, HOST_CR0 = 0x00006c00, HOST_CR3 = 0x00006c02, HOST_CR4 = 0x00006c04, @@ -357,6 +362,9 @@ enum vmcs_field { HOST_IA32_SYSENTER_EIP = 0x00006c12, HOST_RSP = 0x00006c14, HOST_RIP = 0x00006c16, + HOST_S_CET = 0x00006c18, + HOST_SSP = 0x00006c1a, + HOST_INTR_SSP_TABLE = 0x00006c1c }; /* From patchwork Thu Aug 3 04:27:24 2023 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs Date: Thu, 3 Aug 2023 00:27:24 -0400 Message-Id: <20230803042732.88515-12-weijiang.yang@intel.com> In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> Add an emulation interface for CET MSR reads and writes. The emulation code is split into a common part and a vendor-specific part; the former resides in x86.c to benefit different x86 CPU vendors, while the latter, for VMX, is implemented in this patch. Signed-off-by: Yang Weijiang --- arch/x86/kvm/vmx/vmx.c | 27 +++++++++++ arch/x86/kvm/x86.c | 104 +++++++++++++++++++++++++++++++++++++---- arch/x86/kvm/x86.h | 18 +++++++ 3 files changed, 141 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6aa76124e81e..ccf750e79608 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) else msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; break; + case MSR_IA32_S_CET: + case MSR_KVM_GUEST_SSP: + case MSR_IA32_INT_SSP_TAB: + if (kvm_get_msr_common(vcpu, msr_info)) + return 1; + if (msr_info->index == MSR_KVM_GUEST_SSP) + msr_info->data = vmcs_readl(GUEST_SSP); + else if (msr_info->index == MSR_IA32_S_CET) + msr_info->data = vmcs_readl(GUEST_S_CET); + else if (msr_info->index == MSR_IA32_INT_SSP_TAB) + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE); + break; case MSR_IA32_DEBUGCTLMSR: msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); break; @@ -2404,6 +2416,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) else vmx->pt_desc.guest.addr_a[index / 2] = data; break; + case MSR_IA32_S_CET: + case MSR_KVM_GUEST_SSP: + case MSR_IA32_INT_SSP_TAB: + if (kvm_set_msr_common(vcpu, msr_info)) + return 1; + if (msr_index == MSR_KVM_GUEST_SSP) + vmcs_writel(GUEST_SSP, data); + else if (msr_index == MSR_IA32_S_CET) + vmcs_writel(GUEST_S_CET, data); + else if (msr_index == MSR_IA32_INT_SSP_TAB) + vmcs_writel(GUEST_INTR_SSP_TABLE, data); + break; case MSR_IA32_PERF_CAPABILITIES: if (data && !vcpu_to_pmu(vcpu)->version) return 1; @@ -4864,6 +4888,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmcs_write64(GUEST_BNDCFGS, 0); vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */ + vmcs_writel(GUEST_SSP, 0); + vmcs_writel(GUEST_S_CET, 0); + vmcs_writel(GUEST_INTR_SSP_TABLE, 0); kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5b63441fd2d2..98f3ff6078e6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3627,6 +3627,39 @@ static bool kvm_is_msr_to_save(u32 msr_index) return false; } +static inline
bool is_shadow_stack_msr(u32 msr) +{ + return msr == MSR_IA32_PL0_SSP || + msr == MSR_IA32_PL1_SSP || + msr == MSR_IA32_PL2_SSP || + msr == MSR_IA32_PL3_SSP || + msr == MSR_IA32_INT_SSP_TAB || + msr == MSR_KVM_GUEST_SSP; +} + +static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, + struct msr_data *msr) +{ + if (is_shadow_stack_msr(msr->index)) { + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK)) + return false; + + if (msr->index == MSR_KVM_GUEST_SSP) + return msr->host_initiated; + + return msr->host_initiated || + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); + } + + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) && + !kvm_cpu_cap_has(X86_FEATURE_IBT)) + return false; + + return msr->host_initiated || + guest_cpuid_has(vcpu, X86_FEATURE_IBT) || + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); +} + int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { u32 msr = msr_info->index; @@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vcpu->arch.guest_fpu.xfd_err = data; break; #endif +#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR) +#define CET_CTRL_RESERVED_BITS GENMASK(9, 6) +#define CET_SHSTK_MASK_BITS GENMASK(1, 0) +#define CET_IBT_MASK_BITS (GENMASK_ULL(5, 2) | \ + GENMASK_ULL(63, 10)) +#define CET_LEG_BITMAP_BASE(data) ((data) >> 12) + case MSR_IA32_U_CET: + case MSR_IA32_S_CET: + if (!kvm_cet_is_msr_accessible(vcpu, msr_info)) + return 1; + if (!!(data & CET_CTRL_RESERVED_BITS)) + return 1; + if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) && + (data & CET_SHSTK_MASK_BITS)) + return 1; + if (!guest_can_use(vcpu, X86_FEATURE_IBT) && + (data & CET_IBT_MASK_BITS)) + return 1; + if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) || + (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS) + return 1; + if (msr == MSR_IA32_U_CET) + kvm_set_xsave_msr(msr_info); + break; + case MSR_KVM_GUEST_SSP: + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB: + if (!kvm_cet_is_msr_accessible(vcpu, msr_info)) + return 1; + if (is_noncanonical_address(data, vcpu)) + return 1; + if (!IS_ALIGNED(data, 4)) + return 1; + if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP || + msr == MSR_IA32_PL2_SSP) { + vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data; + } else if (msr == MSR_IA32_PL3_SSP) { + kvm_set_xsave_msr(msr_info); + } + break; default: if (kvm_pmu_is_valid_msr(vcpu, msr)) return kvm_pmu_set_msr(vcpu, msr_info); @@ -4051,7 +4123,9 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host) int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { - switch (msr_info->index) { + u32 msr = msr_info->index; + + switch (msr) { case MSR_IA32_PLATFORM_ID: case MSR_IA32_EBL_CR_POWERON: case MSR_IA32_LASTBRANCHFROMIP: @@ -4086,7 +4160,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3: case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1: case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1: - if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) + if (kvm_pmu_is_valid_msr(vcpu, msr)) return kvm_pmu_get_msr(vcpu, msr_info); msr_info->data = 0; break; @@ -4137,7 +4211,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_MTRRcap: case MTRRphysBase_MSR(0) ... 
MSR_MTRRfix4K_F8000: case MSR_MTRRdefType: - return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data); + return kvm_mtrr_get_msr(vcpu, msr, &msr_info->data); case 0xcd: /* fsb frequency */ msr_info->data = 3; break; @@ -4159,7 +4233,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = kvm_get_apic_base(vcpu); break; case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: - return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data); + return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data); case MSR_IA32_TSC_DEADLINE: msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu); break; @@ -4253,7 +4327,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_MCG_STATUS: case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1: - return get_msr_mce(vcpu, msr_info->index, &msr_info->data, + return get_msr_mce(vcpu, msr, &msr_info->data, msr_info->host_initiated); case MSR_IA32_XSS: if (!msr_info->host_initiated && @@ -4284,7 +4358,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case HV_X64_MSR_TSC_EMULATION_STATUS: case HV_X64_MSR_TSC_INVARIANT_CONTROL: return kvm_hv_get_msr_common(vcpu, - msr_info->index, &msr_info->data, + msr, &msr_info->data, msr_info->host_initiated); case MSR_IA32_BBL_CR_CTL3: /* This legacy MSR exists but isn't fully documented in current @@ -4337,8 +4411,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = vcpu->arch.guest_fpu.xfd_err; break; #endif + case MSR_IA32_U_CET: + case MSR_IA32_S_CET: + case MSR_KVM_GUEST_SSP: + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB: + if (!kvm_cet_is_msr_accessible(vcpu, msr_info)) + return 1; + if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP || + msr == MSR_IA32_PL2_SSP) { + msr_info->data = + vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP]; + } else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) { + kvm_get_xsave_msr(msr_info); + } + break; default: - if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) + if (kvm_pmu_is_valid_msr(vcpu, msr)) return kvm_pmu_get_msr(vcpu, msr_info); /* @@ -4346,7 +4434,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) * to-be-saved, even if an MSR isn't fully supported. */ if (msr_info->host_initiated && - kvm_is_msr_to_save(msr_info->index)) { + kvm_is_msr_to_save(msr)) { msr_info->data = 0; break; } diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index c69fc027f5ec..3b79d6db2f83 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size, unsigned int port, void *data, unsigned int count, int in); +/* + * Guest xstate MSRs have been loaded in __msr_io(); disable preemption before + * accessing the MSRs to avoid MSR content corruption.
+ */ +static inline void kvm_get_xsave_msr(struct msr_data *msr_info) +{ + kvm_fpu_get(); + rdmsrl(msr_info->index, msr_info->data); + kvm_fpu_put(); +} + +static inline void kvm_set_xsave_msr(struct msr_data *msr_info) +{ + kvm_fpu_get(); + wrmsrl(msr_info->index, msr_info->data); + kvm_fpu_put(); +} + #endif From patchwork Thu Aug 3 04:27:25 2023 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM Date: Thu, 3 Aug 2023 00:27:25 -0400 Message-Id: <20230803042732.88515-13-weijiang.yang@intel.com> In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> Save CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates architectural behavior when the guest enters/leaves SMM mode, i.e., it saves registers to SMRAM on entry to SMM and reloads them on exit. Per the SDM, SSP is defined as one of the fields in SMRAM for 64-bit mode, so handle the state accordingly.
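Condensed, the SMI/RSM round-trip that the diff below implements looks like this (a sketch with error handling and the CONFIG_X86_64 guard elided):

/* enter_smm(): stash the guest SSP in the 64-bit SMRAM image */
if (guest_can_use(vcpu, X86_FEATURE_SHSTK))
	kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp);

/* emulator_leave_smm(): hand the saved SSP back before resuming */
if (guest_can_use(vcpu, X86_FEATURE_SHSTK))
	kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram.smram64.ssp);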
Check is_smm() so that kvm_cet_is_msr_accessible() knows when it is being called in SMM mode, allowing kvm_{set,get}_msr() to work there. Signed-off-by: Yang Weijiang --- arch/x86/kvm/smm.c | 11 +++++++++++ arch/x86/kvm/smm.h | 2 +- arch/x86/kvm/x86.c | 11 ++++++++++- 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c index b42111a24cc2..e0b62d211306 100644 --- a/arch/x86/kvm/smm.c +++ b/arch/x86/kvm/smm.c @@ -309,6 +309,12 @@ void enter_smm(struct kvm_vcpu *vcpu) kvm_smm_changed(vcpu, true); +#ifdef CONFIG_X86_64 + if (guest_can_use(vcpu, X86_FEATURE_SHSTK) && + kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp)) + goto error; +#endif + if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram))) goto error; @@ -586,6 +592,11 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt) if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0) static_call(kvm_x86_set_nmi_mask)(vcpu, false); +#ifdef CONFIG_X86_64 + if (guest_can_use(vcpu, X86_FEATURE_SHSTK) && + kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram.smram64.ssp)) + return X86EMUL_UNHANDLEABLE; +#endif kvm_smm_changed(vcpu, false); /* diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h index a1cf2ac5bd78..1e2a3e18207f 100644 --- a/arch/x86/kvm/smm.h +++ b/arch/x86/kvm/smm.h @@ -116,8 +116,8 @@ struct kvm_smram_state_64 { u32 smbase; u32 reserved4[5]; - /* ssp and svm_* fields below are not implemented by KVM */ u64 ssp; + /* svm_* fields below are not implemented by KVM */ u64 svm_guest_pat; u64 svm_host_efer; u64 svm_host_cr4; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 98f3ff6078e6..56aa5a3d3913 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3644,8 +3644,17 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK)) return false; - if (msr->index == MSR_KVM_GUEST_SSP) + /* + * This MSR is synthesized mainly for userspace access during + * live migration; it can also be accessed by the VMM in SMM mode. + * The guest is not allowed to access this MSR.
+ */ + if (msr->index == MSR_KVM_GUEST_SSP) { + if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu)) + return true; + return msr->host_initiated; + } return msr->host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); From patchwork Thu Aug 3 04:27:26 2023 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs Date: Thu, 3 Aug 2023 00:27:26 -0400 Message-Id: <20230803042732.88515-14-weijiang.yang@intel.com> In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> Pass through CET MSRs when the associated feature is enabled. The Shadow Stack feature requires all the CET MSRs to be passed through to provide full architectural support in the guest. The IBT feature only depends on MSR_IA32_U_CET and MSR_IA32_S_CET to enable both user and supervisor IBT; a guest-side sanity check is sketched below.
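As a quick sanity check of the pass-through path, a guest kernel could poke IA32_U_CET directly once the feature is exposed (a hypothetical guest-side sketch, not part of this series; rdmsrl()/wrmsrl() are the kernel's MSR accessors and 0x6a0 is the architectural index of IA32_U_CET per the SDM):

#define MSR_IA32_U_CET	0x6a0
#define CET_SHSTK_EN	BIT_ULL(0)

static void cet_passthrough_smoke_test(void)
{
	u64 val;

	rdmsrl(MSR_IA32_U_CET, val);	/* no VM-exit once passed through */
	wrmsrl(MSR_IA32_U_CET, val | CET_SHSTK_EN);
	rdmsrl(MSR_IA32_U_CET, val);
	WARN_ON(!(val & CET_SHSTK_EN));	/* the write must stick */
}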
Note, this MSR design introduces an architectural limitation of SHSTK and IBT control for the guest, i.e., when SHSTK is exposed, IBT is also available to the guest at the architectural level, since IBT relies on a subset of the SHSTK-relevant MSRs. Signed-off-by: Yang Weijiang Reviewed-by: Chao Gao --- arch/x86/kvm/vmx/vmx.c | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index ccf750e79608..6779b8a63789 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -709,6 +709,10 @@ static bool is_valid_passthrough_msr(u32 msr) case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8: /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */ return true; + case MSR_IA32_U_CET: + case MSR_IA32_S_CET: + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB: + return true; } r = possible_passthrough_msr_slot(msr) != -ENOENT; @@ -7747,6 +7751,41 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) +{ + bool incpt; + + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) { + incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); + + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, + MSR_TYPE_RW, incpt); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, + MSR_TYPE_RW, incpt); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, + MSR_TYPE_RW, incpt); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, + MSR_TYPE_RW, incpt); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, + MSR_TYPE_RW, incpt); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, + MSR_TYPE_RW, incpt); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, + MSR_TYPE_RW, incpt); + if (!incpt) + return; + } + + if (kvm_cpu_cap_has(X86_FEATURE_IBT)) { + incpt = !guest_can_use(vcpu, X86_FEATURE_IBT); + + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, + MSR_TYPE_RW, incpt); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, + MSR_TYPE_RW, incpt); + } +} + static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7814,6 +7853,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) /* Refresh #PF interception to account for MAXPHYADDR changes.
*/ vmx_update_exception_bitmap(vcpu); + + vmx_update_intercept_for_cet_msr(vcpu); } static u64 vmx_get_perf_capabilities(void) From patchwork Thu Aug 3 04:27:27 2023 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields Date: Thu, 3 Aug 2023 00:27:27 -0400 Message-Id: <20230803042732.88515-15-weijiang.yang@intel.com> In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> Set constant values to the HOST_{S_CET,SSP,INTR_SSP_TABLE} VMCS fields explicitly. Kernel IBT is supported, and the setting in MSR_IA32_S_CET is static after boot (except in the BIOS-call case, which a vCPU thread never crosses), i.e., KVM doesn't need to refresh the HOST_S_CET field before every VM-Enter/VM-Exit sequence. Host supervisor shadow stack is not enabled for now and SSP is not accessible in kernel mode, thus it's safe to set the host IA32_INT_SSP_TAB/SSP VMCS fields to 0.
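The "SSP is not accessible in kernel mode" point is observable with RDSSP, which executes as a NOP (leaving its destination unchanged) whenever shadow stack is disabled at the current privilege level; a probe like the following (an inline-asm sketch, assuming a CET-aware assembler) therefore returns the sentinel unchanged on hosts without supervisor shadow stack:

static inline u64 probe_ssp(void)
{
	u64 ssp = 0;	/* sentinel survives RDSSP when SHSTK is off */

	asm volatile("rdsspq %0" : "+r" (ssp));
	return ssp;	/* 0 => no active supervisor shadow stack */
}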
When shadow stack is enabled for CPL3, SSP is reloaded from IA32_PL3_SSP before the CPU transitions to userspace. See SDM Vol. 2A/B Chapters 3/4 for SYSCALL/SYSRET/SYSENTER/SYSEXIT/RDSSP/CALL etc. Prevent KVM module loading if host supervisor shadow stack (SHSTK_EN in MSR_IA32_S_CET) is enabled, as KVM cannot co-exist with it correctly. Suggested-by: Sean Christopherson Suggested-by: Chao Gao Signed-off-by: Yang Weijiang Reviewed-by: Chao Gao --- arch/x86/kvm/vmx/capabilities.h | 4 ++++ arch/x86/kvm/vmx/vmx.c | 15 +++++++++++++++ arch/x86/kvm/x86.c | 14 ++++++++++++++ arch/x86/kvm/x86.h | 1 + 4 files changed, 34 insertions(+) diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index d0abee35d7ba..b1883f6c08eb 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void) return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; } +static inline bool cpu_has_load_cet_ctrl(void) +{ + return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE); +} static inline bool cpu_has_vmx_mpx(void) { return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6779b8a63789..99bf63b2a779 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4341,6 +4341,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx) if (cpu_has_load_ia32_efer()) vmcs_write64(HOST_IA32_EFER, host_efer); + + /* + * Supervisor shadow stack is not enabled on the host side, i.e., + * the host IA32_S_CET.SHSTK_EN bit is guaranteed to be 0 now. Per + * the SDM description of the RDSSP instruction, SSP is not readable + * in CPL0, so resetting the two registers to 0 at VM-Exit does no + * harm to kernel execution. When execution flow exits to userspace, + * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter + * 3 and 4 for details. + */ + if (cpu_has_load_cet_ctrl()) { + vmcs_writel(HOST_S_CET, host_s_cet); + vmcs_writel(HOST_SSP, 0); + vmcs_writel(HOST_INTR_SSP_TABLE, 0); + } } void set_cr4_guest_host_mask(struct vcpu_vmx *vmx) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 56aa5a3d3913..01b4f10fa8ab 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -113,6 +113,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE); #endif static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS; +u64 __read_mostly host_s_cet; +EXPORT_SYMBOL_GPL(host_s_cet); #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE) @@ -9615,6 +9617,18 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) return -EIO; } + if (boot_cpu_has(X86_FEATURE_SHSTK)) { + rdmsrl(MSR_IA32_S_CET, host_s_cet); + /* + * Linux doesn't yet support supervisor shadow stacks (SSS), so + * KVM doesn't save/restore the associated MSRs, i.e. KVM may + * clobber the host values. Yell and refuse to load if SSS is + * unexpectedly enabled, e.g. to avoid crashing the host.
+ */ + if (WARN_ON_ONCE(host_s_cet & CET_SHSTK_EN)) + return -EIO; + } + x86_emulator_cache = kvm_alloc_emulator_cache(); if (!x86_emulator_cache) { pr_err("failed to allocate cache for x86 emulator\n"); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 3b79d6db2f83..e42e5263fcf7 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -323,6 +323,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu); extern u64 host_xcr0; extern u64 host_xss; +extern u64 host_s_cet; extern struct kvm_caps kvm_caps; From patchwork Thu Aug 3 04:27:28 2023 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload Date: Thu, 3 Aug 2023 00:27:28 -0400 Message-Id: <20230803042732.88515-16-weijiang.yang@intel.com> In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> Make PL{0,1,2}_SSP write-intercepted to detect whether the guest is using these MSRs, as sketched below.
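In pseudo-C, the lazy enable amounts to roughly this (a sketch of the logic the diff below adds, using the cet_sss_active flag this patch introduces):

/* vmx_set_msr(), while PL{0,1,2}_SSP writes are still intercepted */
if (data) {	/* first non-zero write: the guest really uses supervisor SHSTK */
	vcpu->arch.cet_sss_active = true;		/* set in kvm_set_msr_common() */
	vmx_disable_write_intercept_sss_msr(vcpu);	/* pass writes through from now on */
	wrmsrl(msr_index, data);
}
/* save_cet_supervisor_ssp()/reload_cet_supervisor_ssp() then swap the
 * three MSRs on vCPU put/load only when cet_sss_active is set. */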
Disable the write intercept for these MSRs once they're written with non-zero values; KVM then saves/reloads the MSRs only if they're actually used by the guest. Signed-off-by: Yang Weijiang --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/vmx.c | 31 ++++++++++++++++++++++++++++--- arch/x86/kvm/x86.c | 8 ++++++-- 3 files changed, 35 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 69cbc9d9b277..c50b555234fb 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -748,6 +748,7 @@ struct kvm_vcpu_arch { bool tpr_access_reporting; bool xsaves_enabled; bool xfd_no_write_intercept; + bool cet_sss_active; u64 ia32_xss; u64 microcode_version; u64 arch_capabilities; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 99bf63b2a779..96e22515ed13 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2152,6 +2152,18 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated return debugctl; } +static void vmx_disable_write_intercept_sss_msr(struct kvm_vcpu *vcpu) +{ + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) { + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, + MSR_TYPE_RW, false); + } +} + /* * Writes msr value into the appropriate "register". * Returns 0 on success, non-0 otherwise. @@ -2420,6 +2432,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) else vmx->pt_desc.guest.addr_a[index / 2] = data; break; + case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP: + if (kvm_set_msr_common(vcpu, msr_info)) + return 1; + if (data) { + vmx_disable_write_intercept_sss_msr(vcpu); + wrmsrl(msr_index, data); + } + break; case MSR_IA32_S_CET: case MSR_KVM_GUEST_SSP: case MSR_IA32_INT_SSP_TAB: @@ -7777,12 +7797,17 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) MSR_TYPE_RW, incpt); vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt); + /* + * Supervisor shadow stack MSRs are intercepted until + * they're written by the guest; this is designed to + * optimize the save/restore overhead.
+ */ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, - MSR_TYPE_RW, incpt); + MSR_TYPE_R, incpt); vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, - MSR_TYPE_RW, incpt); + MSR_TYPE_R, incpt); vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, - MSR_TYPE_RW, incpt); + MSR_TYPE_R, incpt); vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt); vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW, incpt); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 01b4f10fa8ab..fa3e7f7c639f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4060,6 +4060,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP || msr == MSR_IA32_PL2_SSP) { vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data; + if (!vcpu->arch.cet_sss_active && data) + vcpu->arch.cet_sss_active = true; } else if (msr == MSR_IA32_PL3_SSP) { kvm_set_xsave_msr(msr_info); } @@ -11241,7 +11243,8 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu) { - if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) { + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK) && + vcpu->arch.cet_sss_active)) { rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]); rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]); rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]); @@ -11256,7 +11259,8 @@ EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp); void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu) { - if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) { + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK) && + vcpu->arch.cet_sss_active)) { wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]); wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]); wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]); From patchwork Thu Aug 3 04:27:29 2023 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v5 16/19] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Date: Thu, 3 Aug 2023 00:27:29 -0400 Message-Id: <20230803042732.88515-17-weijiang.yang@intel.com> In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> Enable CET-related feature bits in the KVM capabilities array and make X86_CR4_CET available to the guest. Remove the feature bits if host-side dependencies cannot be met. Set the feature bits so that CET features are available in guest CPUID. Add CR4.CET bit support in order to allow the guest to set the CET master control bit (CR4.CET). Disable the KVM CET feature if unrestricted_guest is unsupported/disabled, as KVM does not support emulating CET. Don't expose the CET feature if the dependent CET bit (U_CET) is cleared in host XSS or if XSAVES isn't supported. The CET bits in the VM_ENTRY/VM_EXIT control fields should be set to keep guest CET states isolated from the host side. CET is only available on platforms that enumerate VMX_BASIC[bit 56] as 1.
Signed-off-by: Yang Weijiang --- arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/include/asm/msr-index.h | 1 + arch/x86/kvm/cpuid.c | 12 ++++++++++-- arch/x86/kvm/vmx/capabilities.h | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 20 ++++++++++++++++++++ arch/x86/kvm/vmx/vmx.h | 6 ++++-- arch/x86/kvm/x86.c | 16 +++++++++++++++- arch/x86/kvm/x86.h | 3 +++ 8 files changed, 61 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c50b555234fb..f883696723f4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -125,7 +125,8 @@ | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \ | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \ | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \ - | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP)) + | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \ + | X86_CR4_CET)) #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 3aedae61af4f..7ce0850c6067 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1078,6 +1078,7 @@ #define VMX_BASIC_MEM_TYPE_MASK 0x003c000000000000LLU #define VMX_BASIC_MEM_TYPE_WB 6LLU #define VMX_BASIC_INOUT 0x0040000000000000LLU +#define VMX_BASIC_NO_HW_ERROR_CODE 0x0100000000000000LLU /* Resctrl MSRs: */ /* - Intel: */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0338316b827c..1a601be7b4fa 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -624,7 +624,7 @@ void kvm_set_cpu_caps(void) F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ | - F(SGX_LC) | F(BUS_LOCK_DETECT) + F(SGX_LC) | F(BUS_LOCK_DETECT) | F(SHSTK) ); /* Set LA57 based on hardware capability. */ if (cpuid_ecx(7) & F(LA57)) @@ -642,7 +642,8 @@ void kvm_set_cpu_caps(void) F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) | F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) | F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) | - F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D) + F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D) | + F(IBT) ); /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */ @@ -655,6 +656,13 @@ void kvm_set_cpu_caps(void) kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP); if (boot_cpu_has(X86_FEATURE_AMD_SSBD)) kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD); + /* + * The feature bit in boot_cpu_data.x86_capability could have been + * cleared due to ibt=off cmdline option, then add it back if CPU + * supports IBT. 
+ */ + if (cpuid_edx(7) & F(IBT)) + kvm_cpu_cap_set(X86_FEATURE_IBT); kvm_cpu_cap_mask(CPUID_7_1_EAX, F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) | diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index b1883f6c08eb..2948a288d0b4 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -79,6 +79,12 @@ static inline bool cpu_has_vmx_basic_inout(void) return (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT); } +static inline bool cpu_has_vmx_basic_no_hw_errcode(void) +{ + return ((u64)vmcs_config.basic_cap << 32) & + VMX_BASIC_NO_HW_ERROR_CODE; +} + static inline bool cpu_has_virtual_nmis(void) { return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS && diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 96e22515ed13..2f2b6f7c33d9 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2624,6 +2624,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf, { VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER }, { VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS }, { VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL }, + { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE }, }; memset(vmcs_conf, 0, sizeof(*vmcs_conf)); @@ -6357,6 +6358,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu) if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0) vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest); + if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE) { + pr_err("S_CET = 0x%016lx\n", vmcs_readl(GUEST_S_CET)); + pr_err("SSP = 0x%016lx\n", vmcs_readl(GUEST_SSP)); + pr_err("INTR SSP TABLE = 0x%016lx\n", + vmcs_readl(GUEST_INTR_SSP_TABLE)); + } pr_err("*** Host State ***\n"); pr_err("RIP = 0x%016lx RSP = 0x%016lx\n", vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP)); @@ -6434,6 +6441,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu) if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID) pr_err("Virtual processor ID = 0x%04x\n", vmcs_read16(VIRTUAL_PROCESSOR_ID)); + if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE) { + pr_err("S_CET = 0x%016lx\n", vmcs_readl(HOST_S_CET)); + pr_err("SSP = 0x%016lx\n", vmcs_readl(HOST_SSP)); + pr_err("INTR SSP TABLE = 0x%016lx\n", + vmcs_readl(HOST_INTR_SSP_TABLE)); + } } /* @@ -7970,6 +7983,13 @@ static __init void vmx_set_cpu_caps(void) if (cpu_has_vmx_waitpkg()) kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG); + + if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest || + !cpu_has_vmx_basic_no_hw_errcode()) { + kvm_cpu_cap_clear(X86_FEATURE_SHSTK); + kvm_cpu_cap_clear(X86_FEATURE_IBT); + kvm_caps.supported_xss &= ~CET_XSTATE_MASK; + } } static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 32384ba38499..4e88b5fb45e8 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -481,7 +481,8 @@ static inline u8 vmx_get_rvi(void) VM_ENTRY_LOAD_IA32_EFER | \ VM_ENTRY_LOAD_BNDCFGS | \ VM_ENTRY_PT_CONCEAL_PIP | \ - VM_ENTRY_LOAD_IA32_RTIT_CTL) + VM_ENTRY_LOAD_IA32_RTIT_CTL | \ + VM_ENTRY_LOAD_CET_STATE) #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \ (VM_EXIT_SAVE_DEBUG_CONTROLS | \ @@ -503,7 +504,8 @@ static inline u8 vmx_get_rvi(void) VM_EXIT_LOAD_IA32_EFER | \ VM_EXIT_CLEAR_BNDCFGS | \ VM_EXIT_PT_CONCEAL_PIP | \ - VM_EXIT_CLEAR_IA32_RTIT_CTL) + VM_EXIT_CLEAR_IA32_RTIT_CTL | \ + VM_EXIT_LOAD_CET_STATE) #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \ (PIN_BASED_EXT_INTR_MASK | \ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fa3e7f7c639f..aa92dec66f1e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c 
@@ -230,7 +230,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs; | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \ | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE) -#define KVM_SUPPORTED_XSS 0 +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER) u64 __read_mostly host_efer; EXPORT_SYMBOL_GPL(host_efer); @@ -9669,6 +9669,20 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) kvm_ops_update(ops); + if (!kvm_is_cet_supported()) { + kvm_cpu_cap_clear(X86_FEATURE_SHSTK); + kvm_cpu_cap_clear(X86_FEATURE_IBT); + } + + /* + * If SHSTK and IBT are not available in KVM, clear the CET user bit in + * kvm_caps.supported_xss so that kvm_is_cet_supported() returns + * false when called. + */ + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) && + !kvm_cpu_cap_has(X86_FEATURE_IBT)) + kvm_caps.supported_xss &= ~CET_XSTATE_MASK; + for_each_online_cpu(cpu) { smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1); if (r < 0) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index e42e5263fcf7..373386fb9ed2 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -542,6 +542,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type); __reserved_bits |= X86_CR4_VMXE; \ if (!__cpu_has(__c, X86_FEATURE_PCID)) \ __reserved_bits |= X86_CR4_PCIDE; \ + if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \ + !__cpu_has(__c, X86_FEATURE_IBT)) \ + __reserved_bits |= X86_CR4_CET; \ __reserved_bits; \ }) From patchwork Thu Aug 3 04:27:30 2023 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support Date: Thu, 3 Aug 2023 00:27:30 -0400 Message-Id: <20230803042732.88515-18-weijiang.yang@intel.com> In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com> References: <20230803042732.88515-1-weijiang.yang@intel.com> Add the S_CET (supervisor CET xstate) bit to kvm_caps.supported_xss so that the guest can enumerate the feature in CPUID(0xd,1).ECX. The guest S_CET xstate bit is specially handled, i.e., it can be exposed without the related enabling on the host side, because KVM manually saves/reloads the guest supervisor SHSTK SSPs and the current XSS swap logic for host/guest also supports doing so; thus it's safe to enable the bit without host support. Signed-off-by: Yang Weijiang --- arch/x86/kvm/x86.c | 8 +++++++- arch/x86/kvm/x86.h | 2 +- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index aa92dec66f1e..2e200a5d00e9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -230,7 +230,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs; | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \ | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE) -#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER) +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER | \ + XFEATURE_MASK_CET_KERNEL) u64 __read_mostly host_efer; EXPORT_SYMBOL_GPL(host_efer); @@ -9657,8 +9658,13 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) rdmsrl_safe(MSR_EFER, &host_efer); if (boot_cpu_has(X86_FEATURE_XSAVES)) { + u32 eax, ebx, ecx, edx; + + cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx); rdmsrl(MSR_IA32_XSS, host_xss); kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS; + if (ecx & XFEATURE_MASK_CET_KERNEL) + kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL; } kvm_init_pmu_capability(ops->pmu_ops); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 373386fb9ed2..ea0ecb8f0df6 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -363,7 +363,7 @@ static inline bool kvm_mpx_supported(void) == (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR); } -#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER) +#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL) /* * Shadow Stack and Indirect Branch Tracking feature enabling depends on * whether host side CET user xstate bit is supported or not.
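Net effect of the supported_xss derivation above, as a worked sketch (assuming a host where Linux enables only user CET, i.e. IA32_XSS has bit 11 set and bit 12 clear, while CPUID(0xd,1).ECX does enumerate the supervisor bit):

/* host_xss = 0x800 (CET_USER only) */
kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;	/* bit 11 survives */
if (ecx & XFEATURE_MASK_CET_KERNEL)			/* HW has CET_S state */
	kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;	/* add bit 12 back */
/* the guest now sees both bits 11 and 12 in CPUID(0xd,1).ECX */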
From patchwork Thu Aug 3 04:27:31 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13339459
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org,
    john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com,
    binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM
Date: Thu, 3 Aug 2023 00:27:31 -0400
Message-Id: <20230803042732.88515-19-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
References: <20230803042732.88515-1-weijiang.yang@intel.com>

Per the SDM description (Vol. 3D, Appendix A.1):

  "If bit 56 is read as 1, software can use VM entry to deliver a hardware
  exception with or without an error code, regardless of vector."

Modify the has_error_code check before injecting events into the nested
guest accordingly: enforce the consistency check only when the guest is in
real mode, the event is not a hardware exception, or the platform does not
enumerate bit 56 in VMX_BASIC; otherwise skip it.
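The hunks that follow rely on VMX_BASIC_NO_HW_ERROR_CODE and cpu_has_vmx_basic_no_hw_errcode(), introduced alongside this change. A plausible sketch consistent with the SDM quote above (IA32_VMX_BASIC is MSR 0x480, and the quoted behavior is enumerated by its bit 56); the real helper may well test the cached vmcs_config rather than re-read the MSR:

/* Sketch of the definitions this patch assumes; values are an assumption
 * based on the SDM quote above, not quoted from the series. */
#define MSR_IA32_VMX_BASIC		0x00000480
#define VMX_BASIC_NO_HW_ERROR_CODE	BIT_ULL(56)

/* True if VM entry may deliver a hardware exception with or without an
 * error code, regardless of vector (SDM Vol. 3D, Appendix A.1, bit 56). */
static bool cpu_has_vmx_basic_no_hw_errcode(void)
{
	u64 basic;

	rdmsrl(MSR_IA32_VMX_BASIC, basic);
	return basic & VMX_BASIC_NO_HW_ERROR_CODE;
}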
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++--------
 arch/x86/kvm/vmx/nested.h |  7 +++++++
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 516391cc0d64..9bcd989252f7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1205,9 +1205,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
 	const u64 feature_and_reserved =
 		/* feature (except bit 48; see below) */
-		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
 		/* reserved */
-		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
 	u64 vmx_basic = vmcs_config.nested.basic;
 
 	if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
@@ -2846,12 +2846,16 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 	    CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
 		return -EINVAL;
 
-	/* VM-entry interruption-info field: deliver error code */
-	should_have_error_code =
-		intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-		x86_exception_has_error_code(vector);
-	if (CC(has_error_code != should_have_error_code))
-		return -EINVAL;
+	if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
+	    !nested_cpu_has_no_hw_errcode(vcpu)) {
+		/* VM-entry interruption-info field: deliver error code */
+		should_have_error_code =
+			intr_type == INTR_TYPE_HARD_EXCEPTION &&
+			prot_mode &&
+			x86_exception_has_error_code(vector);
+		if (CC(has_error_code != should_have_error_code))
+			return -EINVAL;
+	}
 
 	/* VM-entry exception error code */
 	if (CC(has_error_code &&
@@ -6967,6 +6971,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
+	if (cpu_has_vmx_basic_no_hw_errcode())
+		msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE;
 }
 
 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 96952263b029..1884628294e4 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
 		__kvm_is_valid_cr4(vcpu, val);
 }
 
+static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;
+}
+
 /*
  * No difference in the restrictions on guest and host CR4 in VMX operation.
  */
 #define nested_guest_cr4_valid	nested_cr4_valid
 #define nested_host_cr4_valid	nested_cr4_valid

From patchwork Thu Aug 3 04:27:32 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13339460
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org,
    john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com,
    binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v5 19/19] KVM:nVMX: Enable CET support for nested VM
Date: Thu, 3 Aug 2023 00:27:32 -0400
Message-Id: <20230803042732.88515-20-weijiang.yang@intel.com>
In-Reply-To: <20230803042732.88515-1-weijiang.yang@intel.com>
References: <20230803042732.88515-1-weijiang.yang@intel.com>

Set up the CET MSRs, the related VM_ENTRY/VM_EXIT control bits, and the
CR4 fixed-1 settings to enable CET for nested VMs.
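The diff below uses VM_ENTRY_LOAD_CET_STATE and VM_EXIT_LOAD_CET_STATE, which come from an earlier patch in the series. A sketch of the likely values, going by the SDM control layouts ("load CET state" is VM-entry control bit 20 and VM-exit control bit 28), plus a hypothetical helper showing how a vmcs12 consumer would test the entry control; both are assumptions for illustration, not quotes from the series:

/* Sketch: assumed values per the SDM VM-entry/VM-exit control layouts. */
#define VM_ENTRY_LOAD_CET_STATE	0x00100000	/* entry control bit 20 */
#define VM_EXIT_LOAD_CET_STATE	0x10000000	/* exit control bit 28 */

/* Hypothetical helper: did L1 ask for guest CET state to be loaded on
 * VM entry to L2? */
static inline bool nested_cpu_load_cet_state(struct vmcs12 *vmcs12)
{
	return vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE;
}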
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/vmx/nested.c | 27 +++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmcs12.c |  6 ++++++
 arch/x86/kvm/vmx/vmcs12.h | 14 +++++++++++++-
 arch/x86/kvm/vmx/vmx.c    |  2 ++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9bcd989252f7..bd6883033f69 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -660,6 +660,28 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
 
+	/* Pass CET MSRs to the nested VM if L0 and L1 are set to pass-through. */
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
@@ -6793,7 +6815,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6815,7 +6837,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
 #endif
-		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+		VM_ENTRY_LOAD_CET_STATE;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
 	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
 	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(GUEST_S_CET, guest_s_cet),
+	FIELD(GUEST_SSP, guest_ssp),
+	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
 	FIELD(HOST_CR0, host_cr0),
 	FIELD(HOST_CR3, host_cr3),
 	FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(HOST_S_CET, host_s_cet),
+	FIELD(HOST_SSP, host_ssp),
+	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 01936013428b..3884489e7f7e 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width host_s_cet;
+	natural_width host_ssp;
+	natural_width host_ssp_tbl;
+	natural_width guest_s_cet;
+	natural_width guest_ssp;
+	natural_width guest_ssp_tbl;
+	natural_width paddingl[2]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
@@ -292,6 +298,12 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(host_s_cet, 680);
+	CHECK_OFFSET(host_ssp, 688);
+	CHECK_OFFSET(host_ssp_tbl, 696);
+	CHECK_OFFSET(guest_s_cet, 704);
+	CHECK_OFFSET(guest_ssp, 712);
+	CHECK_OFFSET(guest_ssp_tbl, 720);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f2b6f7c33d9..491039aeb61b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7726,6 +7726,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET,        ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET,        edx, feature_bit(IBT));
 
 #undef cr4_fixed1_update
 }
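For reference, cr4_fixed1_update() is a local macro defined at the top of nested_vmx_cr_fixed1_bits_update() and is not visible in this hunk; a sketch of its effect, assuming the entry variable holds the guest's CPUID.(EAX=7,ECX=0) leaf as elsewhere in that function:

/* Sketch only: approximates the macro this hunk relies on. If the guest's
 * CPUID.(EAX=7,ECX=0) leaf sets _cpuid_mask in register _reg, allow the
 * corresponding CR4 bit (make it fixed-1) for the nested guest. */
#define cr4_fixed1_update(_cr4_mask, _reg, _cpuid_mask)		\
do {								\
	if (entry && (entry->_reg & (_cpuid_mask)))		\
		vmx->nested.msrs.cr4_fixed1 |= (_cr4_mask);	\
} while (0)

Read this way, the two added lines make CR4.CET an allowed bit for L2 whenever guest CPUID exposes SHSTK (CPUID.7.0:ECX[7]) or IBT (CPUID.7.0:EDX[20]), matching the CR4 reserved-bit change made for L1 earlier in the series.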