From patchwork Fri Jul 21 03:03:33 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321394
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com, Yu-cheng Yu, Borislav Petkov, Kees Cook, Mike Rapoport, Pengfei Xu
Subject: [PATCH v4 01/20] x86/cpufeatures: Add CPU feature flags for shadow stacks
Date: Thu, 20 Jul 2023 23:03:33 -0400
Message-Id: <20230721030352.72414-2-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

From: Rick Edgecombe

The Control-Flow Enforcement Technology (CET) contains two related features, one of which is Shadow Stacks. Future patches will utilize this feature for shadow stack support in KVM, so add a CPU feature flag for Shadow Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers are only accessible in supervisor mode. This implementation context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend on XSAVES.
The shadow stack feature, enumerated by the CPUID bit described above, encompasses both supervisor and userspace support for shadow stack. In near future patches, only userspace shadow stack will be enabled. In expectation of future supervisor shadow stack support, create a software CPU capability to enumerate kernel utilization of userspace shadow stack support. This user shadow stack bit should depend on the HW "shstk" capability, and that logic will be implemented in future patches.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Signed-off-by: Dave Hansen
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Link: https://lore.kernel.org/all/20230613001108.3040476-9-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/cpufeatures.h       | 2 ++
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..d7215c8b7923 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
 #define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */
+#define X86_FEATURE_USER_SHSTK		(11*32+23) /* Shadow stack support for user mode applications */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
@@ -380,6 +381,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* "" Shadow stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..b9c7eae2e70f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+#define DISABLE_USER_SHSTK	0
+#else
+#define DISABLE_USER_SHSTK	(1 << (X86_FEATURE_USER_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -120,7 +126,7 @@
 #define DISABLED_MASK9	(DISABLE_SGX)
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
-			 DISABLE_CALL_DEPTH_TRACKING)
+			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
 #define DISABLED_MASK12	(DISABLE_LAM)
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..e462c1d3800a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,		X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,		X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,		X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,		X86_FEATURE_XSAVES    },
 	{}
 };
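For illustration, here is a minimal userspace mock (not kernel code, and not the kernel's actual cpu_feature_enabled() implementation) of the disabled-features mechanism this patch extends: a feature disabled at compile time is masked out as a compile-time constant, so runtime checks against it fold away entirely.

	/* Userspace mock of the disabled-features idea; the macro names
	 * mirror the patch but the surrounding code is hypothetical. */
	#include <stdio.h>

	#define FEATURE_USER_SHSTK (1u << 23)

	#ifdef CONFIG_X86_USER_SHADOW_STACK
	#define DISABLE_USER_SHSTK 0
	#else
	#define DISABLE_USER_SHSTK FEATURE_USER_SHSTK
	#endif

	#define DISABLED_MASK (DISABLE_USER_SHSTK /* | other DISABLE_* bits */)

	static int feature_enabled(unsigned int cpuid_bits, unsigned int feature)
	{
		/* compile-time-disabled bits never test true, even if the CPU sets them */
		return !(DISABLED_MASK & feature) && (cpuid_bits & feature);
	}

	int main(void)
	{
		unsigned int cpuid_word11 = FEATURE_USER_SHSTK; /* pretend the CPU reports it */

		printf("user shstk usable: %d\n",
		       feature_enabled(cpuid_word11, FEATURE_USER_SHSTK));
		return 0;
	}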
From patchwork Fri Jul 21 03:03:34 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321396
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com, Yu-cheng Yu, Borislav Petkov, Kees Cook, Mike Rapoport, Pengfei Xu
Subject: [PATCH v4 02/20] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
Date: Thu, 20 Jul 2023 23:03:34 -0400
Message-Id: <20230721030352.72414-3-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

From: Rick Edgecombe

Shadow stack register state can be managed with XSAVE. The registers can logically be separated into two groups:

 * Registers controlling user-mode operation
 * Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for each of those groups of registers. This lets an OS manage them separately if it chooses. Future patches for host userspace and KVM guests will only utilize the user-mode registers, so only configure XSAVE to save user-mode registers.
This state will add 16 bytes to the xsave buffer size. Future patches will use the user-mode XSAVE area to save guest user-mode CET state. However, the VMCS includes new fields for guest CET supervisor states. KVM can use these to save and restore guest supervisor state, so host supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in check_xstate_against_struct() that handles warning about unimplemented xfeatures. So refactor these checks by having XCHECK_SZ() set a bool when it actually checks the xfeature. This ends up exceeding 80 chars, but was better on balance than other options explored. Pass the bool as a pointer to make it clear that XCHECK_SZ() can change the variable.

While configuring user-mode XSAVE, clarify that kernel-mode registers are not managed by XSAVE by defining the xfeature in XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT. This serves more of a documentation-as-code purpose, and functionally only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state controlling user-mode operation. This is a departure from earlier features like protection keys where the PKRU state is a normal user (non-supervisor) state. Having the user state be supervisor-managed ensures there is no direct, unprivileged access to it, making it harder for an attacker to subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs, and the bits defined in those MSRs relevant to future shadow stack enablement patches.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Signed-off-by: Dave Hansen
Reviewed-by: Borislav Petkov (AMD)
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Link: https://lore.kernel.org/all/20230613001108.3040476-25-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/fpu/types.h  | 16 +++++-
 arch/x86/include/asm/fpu/xstate.h |  6 ++-
 arch/x86/kernel/fpu/xstate.c      | 90 +++++++++++++++----------------
 3 files changed, 61 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 7f6d858ff47a..eb810074f1e7 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL_UNUSED,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNUSED)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,16 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	/* user control-flow settings */
+	u64 user_cet;
+	/* user shadow stack pointer */
+	u64 user_ssp;
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497c9436..4fa4751912d9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -39,26 +39,26 @@
  */
 static const char *xfeature_names[] =
 {
-	"x87 floating point registers"	,
-	"SSE registers"			,
-	"AVX registers"			,
-	"MPX bounds registers"		,
-	"MPX CSR"			,
-	"AVX-512 opmask"		,
-	"AVX-512 Hi256"			,
-	"AVX-512 ZMM_Hi256"		,
-	"Processor Trace (unused)"	,
+	"x87 floating point registers",
+	"SSE registers",
+	"AVX registers",
+	"MPX bounds registers",
+	"MPX CSR",
+	"AVX-512 opmask",
+	"AVX-512 Hi256",
+	"AVX-512 ZMM_Hi256",
+	"Processor Trace (unused)",
 	"Protection Keys User registers",
 	"PASID state",
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"AMX Tile config"		,
-	"AMX Tile data"			,
-	"unknown xstate feature"	,
+	"Control-flow User registers",
+	"Control-flow Kernel registers (unused)",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"AMX Tile config",
+	"AMX Tile data",
+	"unknown xstate feature",
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -276,6 +277,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
 	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_CET_USER |		\
	 XFEATURE_MASK_XTILE)
 
 /*
@@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void)
 	}						\
 } while (0)
 
-#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {			\
-	if ((nr == nr_macro) &&						\
-	    WARN_ONCE(sz != sizeof(__struct),				\
-		"%s: struct is %zu bytes, cpu state %d bytes\n",	\
-		__stringify(nr_macro), sizeof(__struct), sz)) {		\
+#define XCHECK_SZ(sz, nr, __struct) ({					\
+	if (WARN_ONCE(sz != sizeof(__struct),				\
+		"[%s]: struct is %zu bytes, cpu state %d bytes\n",	\
+		xfeature_names[nr], sizeof(__struct), sz)) {		\
 		__xstate_dump_leaves();		\
 	}					\
-} while (0)
+	true;					\
+})
+
 
 /**
  * check_xtile_data_against_struct - Check tile data state size.
@@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr)
 	 * Ask the CPU for the size of the state.
 	 */
 	int sz = xfeature_size(nr);
+
 	/*
 	 * Match each CPU state with the corresponding software
 	 * structure.
 	 */
-	XCHECK_SZ(sz, nr, XFEATURE_YMM,       struct ymmh_struct);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDREGS,   struct mpx_bndreg_state);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDCSR,    struct mpx_bndcsr_state);
-	XCHECK_SZ(sz, nr, XFEATURE_OPMASK,    struct avx_512_opmask_state);
-	XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
-	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
-	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
-
-	/* The tile data size varies between implementations. */
-	if (nr == XFEATURE_XTILE_DATA)
-		check_xtile_data_against_struct(sz);
-
-	/*
-	 * Make *SURE* to add any feature numbers in below if
-	 * there are "holes" in the xsave state component
-	 * numbers.
-	 */
-	if ((nr < XFEATURE_YMM) ||
-	    (nr >= XFEATURE_MAX) ||
-	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	switch (nr) {
+	case XFEATURE_YMM:	  return XCHECK_SZ(sz, nr, struct ymmh_struct);
+	case XFEATURE_BNDREGS:	  return XCHECK_SZ(sz, nr, struct mpx_bndreg_state);
+	case XFEATURE_BNDCSR:	  return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state);
+	case XFEATURE_OPMASK:	  return XCHECK_SZ(sz, nr, struct avx_512_opmask_state);
+	case XFEATURE_ZMM_Hi256:  return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state);
+	case XFEATURE_Hi16_ZMM:	  return XCHECK_SZ(sz, nr, struct avx_512_hi16_state);
+	case XFEATURE_PKRU:	  return XCHECK_SZ(sz, nr, struct pkru_state);
+	case XFEATURE_PASID:	  return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
+	case XFEATURE_XTILE_CFG:  return XCHECK_SZ(sz, nr, struct xtile_cfg);
+	case XFEATURE_CET_USER:	  return XCHECK_SZ(sz, nr, struct cet_user_state);
+	case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+	default:
 		XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
 		return false;
 	}
+
 	return true;
 }
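To make the refactored XCHECK_SZ() shape concrete, here is a minimal, self-contained userspace mock (not kernel code) of the same statement-expression pattern: warn on a struct/CPU size mismatch, then evaluate to true so the caller can simply return the check.

	#include <stdbool.h>
	#include <stdio.h>

	/* Mock of the pattern: a GNU C statement expression that warns on a
	 * size mismatch and evaluates to true either way. */
	#define CHECK_SZ(sz, __struct) ({					\
		if ((sz) != sizeof(__struct))					\
			fprintf(stderr, "%s: struct is %zu bytes, cpu state %d bytes\n", \
				#__struct, sizeof(__struct), (sz));		\
		true;								\
	})

	struct cet_user_state_mock { unsigned long long user_cet, user_ssp; };

	int main(void)
	{
		int sz = 16;	/* the size CPUID would report for component 11 */

		return CHECK_SZ(sz, struct cet_user_state_mock) ? 0 : 1;
	}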
From patchwork Fri Jul 21 03:03:35 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321406
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com, Sean Christopherson
Subject: [PATCH v4 03/20] KVM:x86: Report XSS as to-be-saved if there are supported features
Date: Thu, 20 Jul 2023 23:03:35 -0400
Message-Id: <20230721030352.72414-4-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

From: Sean Christopherson

Add MSR_IA32_XSS to the list of MSRs reported to userspace if supported_xss is non-zero, i.e. KVM supports at least one XSS-based feature.
Signed-off-by: Sean Christopherson
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/x86.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 439312e04384..4597ab3c811c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1451,6 +1451,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XSS,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7173,6 +7174,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
+	case MSR_IA32_XSS:
+		if (!kvm_caps.supported_xss)
+			return;
+		break;
 	default:
 		break;
 	}
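For illustration, a minimal userspace sketch (not part of this series; error handling elided) of how a VMM would observe the effect of this patch: query KVM's to-be-saved MSR list via the standard KVM_GET_MSR_INDEX_LIST ioctl and look for MSR_IA32_XSS.

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	#define MSR_IA32_XSS 0x00000da0

	int main(void)
	{
		int kvm = open("/dev/kvm", O_RDWR);
		unsigned int n = 256;	/* assumed upper bound for this sketch */
		struct kvm_msr_list *list;

		list = calloc(1, sizeof(*list) + n * sizeof(__u32));
		list->nmsrs = n;
		ioctl(kvm, KVM_GET_MSR_INDEX_LIST, list);

		for (unsigned int i = 0; i < list->nmsrs; i++)
			if (list->indices[i] == MSR_IA32_XSS)
				printf("KVM reports IA32_XSS as to-be-saved\n");
		return 0;
	}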
From patchwork Fri Jul 21 03:03:36 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321395
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com, Zhang Yi Z
Subject: [PATCH v4 04/20] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
Date: Thu, 20 Jul 2023 23:03:36 -0400
Message-Id: <20230721030352.72414-5-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified. CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of all xstate features enabled in XCR0 | XSS, so the guest can allocate a sufficiently sized xsave buffer based on this info.

Note, KVM does not yet support any XSS-based features, i.e. supported_xss is guaranteed to be zero at this time.

Co-developed-by: Zhang Yi Z
Signed-off-by: Zhang Yi Z
Signed-off-by: Yang Weijiang
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            | 20 ++++++++++++++++++--
 arch/x86/kvm/x86.c              |  8 +++++---
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28bd38303d70..20bbcd95511f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -804,6 +804,7 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 guest_supported_xss;
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7f4d13383cf2..0338316b827c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -249,6 +249,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static u64 cpuid_get_supported_xss(struct kvm_cpuid_entry2 *entries, int nent)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = cpuid_entry2_find(entries, nent, 0xd, 1);
+	if (!best)
+		return 0;
+
+	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
 static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
 				       int nent)
 {
@@ -276,8 +287,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
-		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		     cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
+		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
+
+		best->ebx = xstate_required_size(xstate, true);
+	}
 
 	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
 	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
@@ -325,6 +339,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+	vcpu->arch.guest_supported_xss =
+		cpuid_get_supported_xss(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 
 	/*
 	 * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4597ab3c811c..26edb8fa857a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3781,10 +3781,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
 		 * XSAVES/XRSTORS to save/restore PT MSRs.
 		 */
-		if (data & ~kvm_caps.supported_xss)
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
-		vcpu->arch.ia32_xss = data;
-		kvm_update_cpuid_runtime(vcpu);
+		if (vcpu->arch.ia32_xss != data) {
+			vcpu->arch.ia32_xss = data;
+			kvm_update_cpuid_runtime(vcpu);
+		}
 		break;
 	case MSR_SMI_COUNT:
 		if (!msr_info->host_initiated)
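For illustration, a minimal userspace sketch (not part of this patch) of the CPUID leaf this patch keeps in sync: after IA32_XSS changes, CPUID(EAX=0xD, ECX=1).EBX reports the xsave buffer size required for the features enabled in XCR0 | IA32_XSS. Uses GCC's <cpuid.h> helper.

	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* leaf 0xD, subleaf 1: EBX = required size for XCR0 | IA32_XSS */
		__cpuid_count(0xd, 1, eax, ebx, ecx, edx);
		printf("xsave buffer size for XCR0|XSS: %u bytes\n", ebx);
		return 0;
	}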
From patchwork Fri Jul 21 03:03:37 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321391
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v4 05/20] KVM:x86: Initialize kvm_caps.supported_xss
Date: Thu, 20 Jul 2023 23:03:37 -0400
Message-Id: <20230721030352.72414-6-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

Set kvm_caps.supported_xss to host_xss & KVM_SUPPORTED_XSS. host_xss contains the host-supported xstate feature bits used for thread context switching, KVM_SUPPORTED_XSS includes all KVM-enabled XSS feature bits, and the AND of the two represents the feature bits KVM supports. Since the result is a subset of host_xss, the related XSAVE-managed MSRs are automatically swapped between guest and host when the vCPU exits to userspace.

Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/vmx/vmx.c | 1 -
 arch/x86/kvm/x86.c     | 6 +++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..c8d9870cfecb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7849,7 +7849,6 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26edb8fa857a..8bdcbcf13146 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -225,6 +225,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
+#define KVM_SUPPORTED_XSS	0
+
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
 
@@ -9499,8 +9501,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	rdmsrl_safe(MSR_EFER, &host_efer);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		rdmsrl(MSR_IA32_XSS, host_xss);
+		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+	}
 
 	kvm_init_pmu_capability(ops->pmu_ops);
From patchwork Fri Jul 21 03:03:38 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321404
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com, Sean Christopherson
Subject: [PATCH v4 06/20] KVM:x86: Load guest FPU state when access XSAVE-managed MSRs
Date: Thu, 20 Jul 2023 23:03:38 -0400
Message-Id: <20230721030352.72414-7-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

From: Sean Christopherson

Load the guest's FPU state if userspace is accessing MSRs whose values are managed by XSAVES. Two MSR access helpers, i.e. kvm_{get,set}_xsave_msr(), are introduced by a later patch to facilitate access to such MSRs. If MSRs supported in kvm_caps.supported_xss are passed through to the guest, the guest MSRs are swapped with host contents before the vCPU exits to userspace and after it re-enters the kernel.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(), explicitly check that @vcpu is non-null before attempting to load guest state. The XSS-supporting MSRs cannot be retrieved via the device ioctl() without loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed to access MSRs that have not been exposed to the guest, e.g. it might do KVM_SET_MSRS prior to KVM_SET_CPUID2.

Signed-off-by: Sean Christopherson
Co-developed-by: Yang Weijiang
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8bdcbcf13146..04f0245ad0a2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -132,6 +132,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func)					     \
@@ -4346,6 +4349,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+static const u32 xstate_msrs[] = {
+	MSR_IA32_U_CET, MSR_IA32_PL3_SSP,
+};
+
+static bool is_xstate_msr(u32 index)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(xstate_msrs); i++) {
+		if (index == xstate_msrs[i])
+			return true;
+	}
+	return false;
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
 *
 * @return number of msrs set successfully.
 */
@@ -4356,11 +4374,20 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xstate_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 	return i;
 }
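For illustration, a minimal userspace sketch (not part of this patch; error handling elided, vcpu_fd assumed to be an already-created vCPU file descriptor) of the kind of KVM_GET_MSRS batch that triggers the guest-FPU load above, since it touches an XSAVE-managed CET MSR. MSR numbers are the architectural ones from the Intel SDM.

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	#define MSR_IA32_U_CET   0x6a0
	#define MSR_IA32_PL3_SSP 0x6a7

	static __u64 read_u_cet(int vcpu_fd)
	{
		/* kvm_msrs has a flexible array of entries; embed two of them */
		struct { struct kvm_msrs hdr; struct kvm_msr_entry e[2]; } buf;

		memset(&buf, 0, sizeof(buf));
		buf.hdr.nmsrs = 2;
		buf.e[0].index = MSR_IA32_U_CET;
		buf.e[1].index = MSR_IA32_PL3_SSP;
		ioctl(vcpu_fd, KVM_GET_MSRS, &buf);
		return buf.e[0].data;
	}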
From patchwork Fri Jul 21 03:03:39 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321393
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v4 07/20] KVM:x86: Add fault checks for guest CR4.CET setting
Date: Thu, 20 Jul 2023 23:03:39 -0400
Message-Id: <20230721030352.72414-8-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

Check potential faults for CR4.CET setting per the Intel SDM. CET can be enabled if and only if CR0.WP==1, i.e. setting CR4.CET=1 faults if CR0.WP==0, and setting CR0.WP=0 fails if CR4.CET==1.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04f0245ad0a2..be76ac6bbb21 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -993,6 +993,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+		return 1;
+
 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
 
 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1204,6 +1207,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+		return 1;
+
 	static_call(kvm_x86_set_cr4)(vcpu, cr4);
 
 	kvm_post_set_cr4(vcpu, old_cr4, cr4);

From patchwork Fri Jul 21 03:03:40 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321397
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v4 08/20] KVM:x86: Report KVM supported CET MSRs as to-be-saved
Date: Thu, 20 Jul 2023 23:03:40 -0400
Message-Id: <20230721030352.72414-9-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

Add all CET MSRs, including the synthesized GUEST_SSP, to the report list. PL{0,1,2}_SSP are made independent of host XSAVE management by later patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on the host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP are not XSAVE-managed.

When CET IBT/SHSTK are enumerated to the guest, both user and supervisor modes should be supported for architectural integrity, i.e. the two modes are supported as both or neither.

Signed-off-by: Yang Weijiang
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/x86.c                   | 10 ++++++++++
 arch/x86/kvm/x86.h                   | 10 ++++++++++
 3 files changed, 21 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..7af465e4e0bd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -58,6 +58,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_GUEST_SSP	0x4b564d09
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be76ac6bbb21..59e961a88b75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1463,6 +1463,9 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 	MSR_IA32_XSS,
+	MSR_IA32_U_CET, MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7215,6 +7218,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!kvm_caps.supported_xss)
 			return;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!kvm_is_cet_supported())
+			return;
+		break;
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..6e6292915f8c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 }
 
+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+/*
+ * Shadow Stack and Indirect Branch Tracking feature enabling depends on
+ * whether the host-side CET user xstate bit is supported or not.
+ */
+static inline bool kvm_is_cet_supported(void)
+{
+	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
+}
+
 extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;

From patchwork Fri Jul 21 03:03:41 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321400
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access
Date: Thu, 20 Jul 2023 23:03:41 -0400
Message-Id: <20230721030352.72414-10-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>

Add unified handling for AMD/Intel's SHSTK MSRs, and leave IBT-related handling in VMX-specific code since AMD currently doesn't support IBT. Guest supervisor SSPs are handled specially because they are loaded into HW registers only when they are actually used by the guest.
Signed-off-by: Yang Weijiang
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/cpuid.h            |  10 ++++
 arch/x86/kvm/x86.c              | 103 +++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.h              |  18 ++++++
 4 files changed, 124 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20bbcd95511f..69cbc9d9b277 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,6 +805,7 @@ struct kvm_vcpu_arch {
 	u64 xcr0;
 	u64 guest_supported_xcr0;
 	u64 guest_supported_xss;
+	u64 cet_s_ssp[3];
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..7791a19b88ba 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -232,4 +232,14 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
 }
 
+/*
+ * FIXME: When the "KVM-governed" enabling patchset is merged, rebase this
+ * series on top of that and replace this one with the helper merged.
+ */
+static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
+					  unsigned int feature)
+{
+	return kvm_cpu_cap_has(feature) && guest_cpuid_has(vcpu, feature);
+}
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59e961a88b75..7a3753c05c09 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3628,6 +3628,47 @@ static bool kvm_is_msr_to_save(u32 msr_index)
 	return false;
 }
 
+static inline bool is_shadow_stack_msr(struct kvm_vcpu *vcpu,
+				       struct msr_data *msr)
+{
+	return msr->index == MSR_IA32_PL0_SSP ||
+		msr->index == MSR_IA32_PL1_SSP ||
+		msr->index == MSR_IA32_PL2_SSP ||
+		msr->index == MSR_IA32_PL3_SSP ||
+		msr->index == MSR_IA32_INT_SSP_TAB ||
+		msr->index == MSR_KVM_GUEST_SSP;
+}
+
+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr)
+{
+
+	/*
+	 * This function cannot work without later CET MSR read/write
+	 * emulation patch.
+	 */
+	WARN_ON_ONCE(1);
+
+	if (is_shadow_stack_msr(vcpu, msr)) {
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+			return false;
+
+		if (msr->index == MSR_KVM_GUEST_SSP)
+			return msr->host_initiated;
+
+		return msr->host_initiated ||
+			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+	}
+
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		return false;
+
+	return msr->host_initiated ||
+		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+}
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u32 msr = msr_info->index;
@@ -3982,6 +4023,35 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.guest_fpu.xfd_err = data;
 		break;
 #endif
+#define CET_IBT_MASK_BITS	GENMASK_ULL(63, 2)
+#define CET_SHSTK_MASK_BITS	GENMASK(1, 0)
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if ((!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+		     (data & CET_SHSTK_MASK_BITS)) ||
+		    (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+		     (data & CET_IBT_MASK_BITS)))
+			return 1;
+		if (msr == MSR_IA32_U_CET)
+			kvm_set_xsave_msr(msr_info);
+		break;
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if (is_noncanonical_address(data, vcpu))
+			return 1;
+		if (!IS_ALIGNED(data, 4))
+			return 1;
+		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
+		    msr == MSR_IA32_PL2_SSP) {
+			vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
+		} else if (msr == MSR_IA32_PL3_SSP) {
+			kvm_set_xsave_msr(msr_info);
+		}
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4052,7 +4122,9 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
-	switch (msr_info->index) {
+	u32 msr = msr_info->index;
+
+	switch (msr) {
 	case MSR_IA32_PLATFORM_ID:
 	case MSR_IA32_EBL_CR_POWERON:
 	case MSR_IA32_LASTBRANCHFROMIP:
@@ -4087,7 +4159,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
 	case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
 	case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
-		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_get_msr(vcpu, msr_info);
 		msr_info->data = 0;
 		break;
@@ -4138,7 +4210,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_MTRRcap:
 	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
 	case MSR_MTRRdefType:
-		return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
+		return kvm_mtrr_get_msr(vcpu, msr, &msr_info->data);
 	case 0xcd: /* fsb frequency */
 		msr_info->data = 3;
 		break;
@@ -4160,7 +4232,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = kvm_get_apic_base(vcpu);
 		break;
 	case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
-		return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
+		return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data);
 	case MSR_IA32_TSC_DEADLINE:
 		msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
 		break;
@@ -4254,7 +4326,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
 	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
-		return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
+		return get_msr_mce(vcpu, msr, &msr_info->data,
 				   msr_info->host_initiated);
 	case MSR_IA32_XSS:
 		if (!msr_info->host_initiated &&
@@ -4285,7 +4357,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case HV_X64_MSR_TSC_EMULATION_STATUS:
 	case HV_X64_MSR_TSC_INVARIANT_CONTROL:
 		return kvm_hv_get_msr_common(vcpu,
-					     msr_info->index, &msr_info->data,
+					     msr, &msr_info->data,
 					     msr_info->host_initiated);
 	case MSR_IA32_BBL_CR_CTL3:
 		/* This legacy MSR exists but isn't fully documented in current
@@ -4338,8 +4410,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+	case MSR_KVM_GUEST_SSP:
MSR_IA32_INT_SSP_TAB: + if (!kvm_cet_is_msr_accessible(vcpu, msr_info)) + return 1; + if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP || + msr == MSR_IA32_PL2_SSP) { + msr_info->data = + vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP]; + } else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) { + kvm_get_xsave_msr(msr_info); + } + break; default: - if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) + if (kvm_pmu_is_valid_msr(vcpu, msr)) return kvm_pmu_get_msr(vcpu, msr_info); /* @@ -4347,7 +4433,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) * to-be-saved, even if an MSR isn't fully supported. */ if (msr_info->host_initiated && - kvm_is_msr_to_save(msr_info->index)) { + kvm_is_msr_to_save(msr)) { msr_info->data = 0; break; } @@ -12131,6 +12217,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vcpu->arch.cr3 = 0; kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3); + memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp)); /* * CR0.CD/NW are set on RESET, preserved on INIT. Note, some versions diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 6e6292915f8c..09dd35a79ff3 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -549,4 +549,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size, unsigned int port, void *data, unsigned int count, int in); +/* + * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before + * access the MSRs to avoid MSR content corruption. + */ +static inline void kvm_get_xsave_msr(struct msr_data *msr_info) +{ + kvm_fpu_get(); + rdmsrl(msr_info->index, msr_info->data); + kvm_fpu_put(); +} + +static inline void kvm_set_xsave_msr(struct msr_data *msr_info) +{ + kvm_fpu_get(); + wrmsrl(msr_info->index, msr_info->data); + kvm_fpu_put(); +} + #endif From patchwork Fri Jul 21 03:03:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22DA1EB64DD for ; Fri, 21 Jul 2023 06:09:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230281AbjGUGJz (ORCPT ); Fri, 21 Jul 2023 02:09:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230100AbjGUGJN (ORCPT ); Fri, 21 Jul 2023 02:09:13 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 28BB42D54; Thu, 20 Jul 2023 23:09:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919743; x=1721455743; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=NdP2SeUCzPEdFuzwKBKi+sK0rXQVe+hgWiznlOj3LLM=; b=Sqyb+VllIsouvOilnlD1Uu154n+oIaBrxePBC+mmZyBVDxzwIRjrMXIW YnIAOtQ2cml5sa2otVP8R0tk8a5mFSFAFa/GgAeMJfjsQlw31gdUUrSc9 GAJlJnLzZh4/zsobAAM4rUqEXUmQyCbIvTAJhs+hzOhCf5azI5bQkv0xv 5MKEM0UOWGp3ZzxAVH1uMmFWnLQhaP0rZEPAEhmzdxYwqJZRmQSJg4F7t WN7XTtLSDlBd8GC03iPMlNMaaCQYO9429D76MUGmeRhTleQ9ewfasf2D+ dTOy0hMX9XxMl+zuDnuZY4cRsl6NCsog+gDx1SnLk3LLDIA0KTX+GycYC w==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547616" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; 
d="scan'208";a="370547616" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721971" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721971" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v4 10/20] KVM:x86: Make guest supervisor states as non-XSAVE managed Date: Thu, 20 Jul 2023 23:03:42 -0400 Message-Id: <20230721030352.72414-11-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Save and reload guest CET supervisor states, i.e., PL{0,1,2}_SSP, when the vCPU context is swapped around userspace <-> kernel transitions, and do the same when the vCPU is scheduled in or out. CET supervisor state is managed by KVM itself rather than by the XSAVE/FPU framework because the latter would: - Introduce an unnecessary XSAVE operation every time KVM switches to non-vCPU userspace within the current FPU framework. - Force allocation of additional space for CET supervisor states in every thread context, regardless of whether the thread is a vCPU thread. Add a new helper, kvm_arch_sched_out(), for that purpose. Adding the support in kvm_arch_vcpu_put/load() without the new helper looks possible, but those functions are also called from vcpu_put()/vcpu_load(), which are used heavily throughout KVM, so a dedicated helper keeps the implementation clearer.
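For orientation, the hook points this patch adds can be condensed as follows. This is a simplified sketch using the helper names from the diff below, not the literal patch code:

  /* ioctl(KVM_RUN): bracket the run loop the same way as guest FPU state. */
  kvm_load_guest_fpu(vcpu);
  kvm_reload_cet_supervisor_ssp(vcpu);   /* vcpu->arch.cet_s_ssp[] -> PL{0,1,2}_SSP */
  r = vcpu_run(vcpu);
  kvm_save_cet_supervisor_ssp(vcpu);     /* PL{0,1,2}_SSP -> vcpu->arch.cet_s_ssp[] */
  kvm_put_guest_fpu(vcpu);

  /* Preemption: the new arch hook saves on sched-out; sched-in reloads. */
  void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu)
  {
          kvm_save_cet_supervisor_ssp(vcpu);
  }

Both helpers run with preemption disabled so a context switch cannot observe half-switched MSR values.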
Signed-off-by: Yang Weijiang --- arch/arm64/include/asm/kvm_host.h | 1 + arch/mips/include/asm/kvm_host.h | 1 + arch/powerpc/include/asm/kvm_host.h | 1 + arch/riscv/include/asm/kvm_host.h | 1 + arch/s390/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 37 +++++++++++++++++++++++++++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 1 + 8 files changed, 44 insertions(+) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 7e7e19ef6993..98235cb3d258 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -1023,6 +1023,7 @@ void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu); static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} +static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {} void kvm_arm_init_debug(void); void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu); diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h index 957121a495f0..56c5e85ba5a3 100644 --- a/arch/mips/include/asm/kvm_host.h +++ b/arch/mips/include/asm/kvm_host.h @@ -893,6 +893,7 @@ static inline void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) {} static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} +static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {} diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 14ee0dece853..11587d953bf6 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -880,6 +880,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {} static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} +static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {} diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index ee0acccb1d3b..6ff4a04fe0f2 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -244,6 +244,7 @@ struct kvm_vcpu_arch { static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} +static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {} #define KVM_ARCH_WANT_MMU_NOTIFIER diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index 2bbc3d54959d..d1750a6a86cf 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -1033,6 +1033,7 @@ extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc); static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} +static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) {} static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {} diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
7a3753c05c09..f7558f0f6fc0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11212,6 +11212,33 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) trace_kvm_fpu(0); } +static void kvm_save_cet_supervisor_ssp(struct kvm_vcpu *vcpu) +{ + preempt_disable(); + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) { + rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]); + rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]); + rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]); + /* + * Omit reset to host PL{1,2}_SSP because Linux will never use + * these MSRs. + */ + wrmsrl(MSR_IA32_PL0_SSP, 0); + } + preempt_enable(); +} + +static void kvm_reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu) +{ + preempt_disable(); + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) { + wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]); + wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]); + wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]); + } + preempt_enable(); +} + int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { struct kvm_queued_exception *ex = &vcpu->arch.exception; @@ -11222,6 +11249,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) kvm_sigset_activate(vcpu); kvm_run->flags = 0; kvm_load_guest_fpu(vcpu); + kvm_reload_cet_supervisor_ssp(vcpu); kvm_vcpu_srcu_read_lock(vcpu); if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { @@ -11310,6 +11338,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) r = vcpu_run(vcpu); out: + kvm_save_cet_supervisor_ssp(vcpu); kvm_put_guest_fpu(vcpu); if (kvm_run->kvm_valid_regs) store_regs(vcpu); @@ -12398,9 +12427,17 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) pmu->need_cleanup = true; kvm_make_request(KVM_REQ_PMU, vcpu); } + + kvm_reload_cet_supervisor_ssp(vcpu); + static_call(kvm_x86_sched_in)(vcpu, cpu); } +void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) +{ + kvm_save_cet_supervisor_ssp(vcpu); +} + void kvm_arch_free_vm(struct kvm *kvm) { kfree(to_kvm_hv(kvm)->hv_pa_pg); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d90331f16db1..b3032a5f0641 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1423,6 +1423,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu); void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu); +void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu); void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 66c1447d3c7f..42f28e8905e1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5885,6 +5885,7 @@ static void kvm_sched_out(struct preempt_notifier *pn, { struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn); + kvm_arch_sched_out(vcpu, 0); if (current->on_rq) { WRITE_ONCE(vcpu->preempted, true); WRITE_ONCE(vcpu->ready, true); From patchwork Fri Jul 21 03:03:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B824BEB64DC for ; Fri, 21 Jul 2023 06:09:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230253AbjGUGJO (ORCPT ); Fri, 21 Jul 2023 02:09:14 -0400 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:46166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230061AbjGUGI6 (ORCPT ); Fri, 21 Jul 2023 02:08:58 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78EE610F3; Thu, 20 Jul 2023 23:08:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919737; x=1721455737; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RHa2LwonmPJL/W+28Rw5xERcqDGr/QwBrpN8d1H9ufA=; b=TK1fBbV7gUDtFxIcrP8GOI42bWzIdyfSEjtJ8Dn4/4SLj6SXRO+CoHpI ecn+GieOg4lu9YBYcpfl9G/Ujx+hcYOgDEcE800F5DBuKE7r34JH8Lkvq s9Tsn6oEqvn/CrzFLF5Eu3CG1bb3Vf8Yb3sSi2lHrQ6mpWzoVTwDdqSIi UdQ4cK90bPUK1ssBivnbEx9JYucQZAhpawm0qS85zTVhRV/0ChWeyP8zc 7rYOCmD9gaKbmWLfkxW7yeI2fMbpuo2NhzaneyTI78hoznoiwzLpYs+06 M7N3wYErdVpWSWOJctt1LEmPxAFhD1XxJDFtJNHQhLE4p0KGyTFKg+j7s A==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547577" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="370547577" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721974" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721974" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v4 11/20] KVM:x86: Save and reload GUEST_SSP to/from SMRAM Date: Thu, 20 Jul 2023 23:03:43 -0400 Message-Id: <20230721030352.72414-12-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Save GUEST_SSP to SMRAM on SMI and reload it on RSM. KVM emulates architectural behavior when guest enters/leaves SMM mode, i.e., save registers to SMRAM at the entry of SMM and reload them at the exit of SMM. Per SDM, GUEST_SSP is defined as one of the fields in SMRAM for 64-bit mode, so handle the state accordingly. Check HF_SMM_MASK to determine whether kvm_cet_is_msr_accessible() is called in SMM mode so that kvm_{set,get}_msr() works in SMM mode. 
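In condensed form, the SMM handling added below amounts to the following sketch (simplified from the diff; smram.smram64.ssp is the pre-existing 64-bit SMRAM slot):

  #ifdef CONFIG_X86_64
  u64 data;

  /* enter_smm(): snapshot GUEST_SSP into the 64-bit SMRAM image. */
  if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
      !kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &data))
          smram.smram64.ssp = data;

  /* emulator_leave_smm(): restore it, accepting only a canonical,
   * 4-byte aligned value. */
  data = smram.smram64.ssp;
  if (!is_noncanonical_address(data, vcpu) && IS_ALIGNED(data, 4))
          kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, data);
  #endif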
Signed-off-by: Yang Weijiang --- arch/x86/kvm/smm.c | 17 +++++++++++++++++ arch/x86/kvm/smm.h | 2 +- arch/x86/kvm/x86.c | 12 +++++++++++- 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c index b42111a24cc2..a4e19d72224f 100644 --- a/arch/x86/kvm/smm.c +++ b/arch/x86/kvm/smm.c @@ -309,6 +309,15 @@ void enter_smm(struct kvm_vcpu *vcpu) kvm_smm_changed(vcpu, true); +#ifdef CONFIG_X86_64 + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) { + u64 data; + + if (!kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &data)) + smram.smram64.ssp = data; + } +#endif + if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram))) goto error; @@ -586,6 +595,14 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt) if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0) static_call(kvm_x86_set_nmi_mask)(vcpu, false); +#ifdef CONFIG_X86_64 + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) { + u64 data = smram.smram64.ssp; + + if (!is_noncanonical_address(data, vcpu) && IS_ALIGNED(data, 4)) + kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, data); + } +#endif kvm_smm_changed(vcpu, false); /* diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h index a1cf2ac5bd78..b3efef7cb1dc 100644 --- a/arch/x86/kvm/smm.h +++ b/arch/x86/kvm/smm.h @@ -116,7 +116,7 @@ struct kvm_smram_state_64 { u32 smbase; u32 reserved4[5]; - /* ssp and svm_* fields below are not implemented by KVM */ + /* svm_* fields below are not implemented by KVM */ u64 ssp; u64 svm_guest_pat; u64 svm_host_efer; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f7558f0f6fc0..70d7c80889d6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3653,8 +3653,18 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK)) return false; - if (msr->index == MSR_KVM_GUEST_SSP) + /* + * This MSR is synthesized mainly for userspace access during + * live migration; it can also be accessed by the VMM in SMM mode. + * The guest itself is not allowed to access this MSR.
+ */ + if (msr->index == MSR_KVM_GUEST_SSP) { + if (IS_ENABLED(CONFIG_X86_64) && + !!(vcpu->arch.hflags & HF_SMM_MASK)) + return true; + return msr->host_initiated; + } return msr->host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); From patchwork Fri Jul 21 03:03:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321398 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A5E2C001DE for ; Fri, 21 Jul 2023 06:09:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230233AbjGUGJM (ORCPT ); Fri, 21 Jul 2023 02:09:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46168 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230028AbjGUGI6 (ORCPT ); Fri, 21 Jul 2023 02:08:58 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1FE741BDC; Thu, 20 Jul 2023 23:08:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919737; x=1721455737; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4bPEHO8OY9NnpX6vcR7dyAdp18jscs+rXkAmk91xB4U=; b=aDdkNLVNcVRltsuY/ZjqUjKtUZimumT83K4vgkzSgtLTMh/6CnIZYRZ+ VpTxVxYjK+uNtlcBS8ZA3c4RupYqRdsti2kyCwARXQ/aISo0hoYbbmNQ7 ugLcDbDtKCELytZcoqLCDYta/ryX2PV6t8gqlooLqAv6kIuvyVVo+mKFd UwSWEKh2kgqVEuitykdSPJzx20Uzu4kjrwxdsO6inzacU/eyqSkXThSr4 T8baJIhh/VeH68xTDYMKb61Zh44/gWk6y5/wIIO/ML7DrcalJEsG9icHD TldLJvyhLkEviBl218RutUc/wptu5NbR5EshKdHlQffK58uB2CnlMtO4Y A==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547578" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="370547578" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721976" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721976" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com, Zhang Yi Z Subject: [PATCH v4 12/20] KVM:VMX: Introduce CET VMCS fields and control bits Date: Thu, 20 Jul 2023 23:03:44 -0400 Message-Id: <20230721030352.72414-13-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Control-flow Enforcement Technology (CET) is a kind of CPU feature used to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two sub-features(SHSTK,IBT) to defend against ROP/COP/JOP style control-flow subversion attacks. Shadow Stack (SHSTK): A shadow stack is a second stack used exclusively for control transfer operations. 
The shadow stack is separate from the data/normal stack and can be enabled individually in user and kernel mode. When shadow stack is enabled, CALL pushes the return address on both the data and shadow stack. RET pops the return address from both stacks and compares them. If the return addresses from the two stacks do not match, the processor generates a #CP. Indirect Branch Tracking (IBT): IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses of indirect branches (CALL, JMP, etc.). If an indirect branch is executed and the next instruction is _not_ an ENDBRANCH, the processor generates a #CP. This instruction behaves as a NOP on platforms that don't support CET. Several new CET MSRs are defined to support CET: MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} mode respectively. MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}. MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK table; entries are indexed by the IST field of the interrupt gate descriptor. Two XSAVES state bits are introduced for CET: IA32_XSS[bit 11]: controls saving/restoring of user mode CET states. IA32_XSS[bit 12]: controls saving/restoring of supervisor mode CET states. Six VMCS fields are introduced for CET: {HOST,GUEST}_S_CET: Stores CET settings for kernel mode. {HOST,GUEST}_SSP: Stores the shadow stack pointer of the currently active task/thread. {HOST,GUEST}_INTR_SSP_TABLE: Stores the currently active MSR_IA32_INT_SSP_TAB value. On Intel platforms, two additional bits are defined in the VM_EXIT and VM_ENTRY control fields: If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET states are restored from the following VMCS fields at VM-Exit: HOST_S_CET HOST_SSP HOST_INTR_SSP_TABLE If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET states are loaded from the following VMCS fields at VM-Entry: GUEST_S_CET GUEST_SSP GUEST_INTR_SSP_TABLE Co-developed-by: Zhang Yi Z Signed-off-by: Zhang Yi Z Signed-off-by: Yang Weijiang Reviewed-by: Chao Gao --- arch/x86/include/asm/vmx.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0d02c4aafa6f..db7f93307349 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -104,6 +104,7 @@ #define VM_EXIT_CLEAR_BNDCFGS 0x00800000 #define VM_EXIT_PT_CONCEAL_PIP 0x01000000 #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000 +#define VM_EXIT_LOAD_CET_STATE 0x10000000 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff @@ -117,6 +118,7 @@ #define VM_ENTRY_LOAD_BNDCFGS 0x00010000 #define VM_ENTRY_PT_CONCEAL_PIP 0x00020000 #define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000 +#define VM_ENTRY_LOAD_CET_STATE 0x00100000 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff @@ -345,6 +347,9 @@ enum vmcs_field { GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822, GUEST_SYSENTER_ESP = 0x00006824, GUEST_SYSENTER_EIP = 0x00006826, + GUEST_S_CET = 0x00006828, + GUEST_SSP = 0x0000682a, + GUEST_INTR_SSP_TABLE = 0x0000682c, HOST_CR0 = 0x00006c00, HOST_CR3 = 0x00006c02, HOST_CR4 = 0x00006c04, @@ -357,6 +362,9 @@ enum vmcs_field { HOST_IA32_SYSENTER_EIP = 0x00006c12, HOST_RSP = 0x00006c14, HOST_RIP = 0x00006c16, + HOST_S_CET = 0x00006c18, + HOST_SSP = 0x00006c1a, + HOST_INTR_SSP_TABLE = 0x00006c1c }; /* From patchwork Fri Jul 21 03:03:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321410 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07FDCEB64DD for ; Fri, 21 Jul 2023 06:09:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230413AbjGUGJs (ORCPT ); Fri, 21 Jul 2023 02:09:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46552 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229936AbjGUGJM (ORCPT ); Fri, 21 Jul 2023 02:09:12 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F9CA2735; Thu, 20 Jul 2023 23:09:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919742; x=1721455742; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Fx9J88O7Q3yUbtft19sTQ/HsoEJxgMzNzpONUm7yFwM=; b=KN5va1MZJF3+624TteJkh7VLlZrrJ1MmAXHqwYZeUoS65WGJZN6PGOPQ qtXxTv+EzGHPKGgAS6omAK5qOI+BqJkCZ1EakmqMEB+k9wgrlP+v7OcFJ 3CnW8x3jcFqHzuDPrq5RIibp1gwaLoWLx3wwHbZFGKl0z1gAftRDzMqgI XLnPi3UYmmNFYJNTGS0RDXJmiPU+uW5Lqo0R6W8eFlO7kuAzxzXrL+sDT tDiOi/jWfB9DJ4bk3pDR8EynKdd3/WrQ3LZJJZerZm2sxA8MXdTtNBZj2 uV28oiKyLnKu3ylxin2W4Oa6K3cPbH65hNtGaaTOU756cPMfyJVX1nxet g==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547630" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="370547630" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721978" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721978" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs Date: Thu, 20 Jul 2023 23:03:45 -0400 Message-Id: <20230721030352.72414-14-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add VMX specific emulation for CET MSR read and write. IBT feature is only available on Intel platforms now and the virtualization interface to the control fields is vensor specific, so split this part from the common code. Signed-off-by: Yang Weijiang --- arch/x86/kvm/vmx/vmx.c | 40 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c | 7 ------- 2 files changed, 40 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c8d9870cfecb..b29817ec6f2e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2093,6 +2093,21 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) else msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; break; + case MSR_IA32_U_CET: + case MSR_IA32_PL0_SSP ... 
MSR_IA32_PL3_SSP: + return kvm_get_msr_common(vcpu, msr_info); + case MSR_IA32_S_CET: + case MSR_KVM_GUEST_SSP: + case MSR_IA32_INT_SSP_TAB: + if (kvm_get_msr_common(vcpu, msr_info)) + return 1; + if (msr_info->index == MSR_KVM_GUEST_SSP) + msr_info->data = vmcs_readl(GUEST_SSP); + else if (msr_info->index == MSR_IA32_S_CET) + msr_info->data = vmcs_readl(GUEST_S_CET); + else if (msr_info->index == MSR_IA32_INT_SSP_TAB) + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE); + break; case MSR_IA32_DEBUGCTLMSR: msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); break; @@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) else vmx->pt_desc.guest.addr_a[index / 2] = data; break; +#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6)) +#define CET_LEG_BITMAP_BASE(data) ((data) >> 12) +#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR) + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP: + return kvm_set_msr_common(vcpu, msr_info); + break; + case MSR_IA32_U_CET: + case MSR_IA32_S_CET: + case MSR_KVM_GUEST_SSP: + case MSR_IA32_INT_SSP_TAB: + if ((msr_index == MSR_IA32_U_CET || + msr_index == MSR_IA32_S_CET) && + ((data & ~VMX_CET_CONTROL_MASK) || + !IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) || + (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)) + return 1; + if (kvm_set_msr_common(vcpu, msr_info)) + return 1; + if (msr_index == MSR_KVM_GUEST_SSP) + vmcs_writel(GUEST_SSP, data); + else if (msr_index == MSR_IA32_S_CET) + vmcs_writel(GUEST_S_CET, data); + else if (msr_index == MSR_IA32_INT_SSP_TAB) + vmcs_writel(GUEST_INTR_SSP_TABLE, data); + break; case MSR_IA32_PERF_CAPABILITIES: if (data && !vcpu_to_pmu(vcpu)->version) return 1; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 70d7c80889d6..e200f22cdaad 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3642,13 +3642,6 @@ static inline bool is_shadow_stack_msr(struct kvm_vcpu *vcpu, static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr) { - - /* - * This function cannot work without later CET MSR read/write - * emulation patch. 
- */ - WARN_ON_ONCE(1); - if (is_shadow_stack_msr(vcpu, msr)) { if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK)) return false; From patchwork Fri Jul 21 03:03:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321401 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65C3EC001DC for ; Fri, 21 Jul 2023 06:09:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230291AbjGUGJT (ORCPT ); Fri, 21 Jul 2023 02:09:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230022AbjGUGI5 (ORCPT ); Fri, 21 Jul 2023 02:08:57 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3B5E92; Thu, 20 Jul 2023 23:08:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919736; x=1721455736; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=szDn5JNOEksWSPekEwByzDTDJlcIsFrl8fPjFl+HRyw=; b=XdeoyaG7Eqr2Do1djMaqHqadVtgSscQ+MuSjjNA9pKrJ15F/NgMeJjFX OrkVneXxNSjn6zg0jlpCQyekWIbDgmcGwJfeZGuMTQQIdbudVIikPZCMk CHM26XqpUWoSuHQ91iWiLzFYa6hmuhD1LZ2DZ0bzGUDRhZdVH+96B1Myu mL5tEyy0kF25hCBIZFOzJF0rgO4mAfk5wiPQbuvBRsYyDI2Z9Fr1pyFaC pq2WbRtojjZYwp7ZahuYBmLinaJF+Bpj94Lwz1L1+fsfZa28/N0IVjkXN Bsh0CzgsjjCPqBb3z75YkADCUFc6ba39C3ffairFg4DYJVltvVroqDba/ w==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547556" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="370547556" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721981" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721981" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v4 14/20] KVM:VMX: Set up interception for CET MSRs Date: Thu, 20 Jul 2023 23:03:46 -0400 Message-Id: <20230721030352.72414-15-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Pass through CET MSRs when the associated feature is enabled. Shadow Stack feature requires all the CET MSRs to make it architectural support in guest. IBT feature only depends on MSR_IA32_U_CET and MSR_IA32_S_CET to enable both user and supervisor IBT. 
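Since the SHSTK case covers a fixed set of MSRs, the per-MSR calls in the diff below could also be written table-driven; a sketch of that alternative (the array is illustrative, not part of the patch):

  static const u32 shstk_pt_msrs[] = {
          MSR_IA32_U_CET, MSR_IA32_S_CET,
          MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP,
          MSR_IA32_PL2_SSP, MSR_IA32_PL3_SSP,
          MSR_IA32_INT_SSP_TAB,
  };
  int i;

  if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
          /* Disable interception, i.e. pass the MSRs through to the guest. */
          for (i = 0; i < ARRAY_SIZE(shstk_pt_msrs); i++)
                  vmx_set_intercept_for_msr(vcpu, shstk_pt_msrs[i],
                                            MSR_TYPE_RW, false);
  }

The open-coded form in the patch keeps each MSR visible when grepping, which is the usual trade-off against a loop.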
Signed-off-by: Yang Weijiang --- arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b29817ec6f2e..85cb7e748a89 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -709,6 +709,10 @@ static bool is_valid_passthrough_msr(u32 msr) case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8: /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */ return true; + case MSR_IA32_U_CET: + case MSR_IA32_S_CET: + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB: + return true; } r = possible_passthrough_msr_slot(msr) != -ENOENT; @@ -7758,6 +7762,34 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) +{ + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) { + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, + MSR_TYPE_RW, false); + return; + } + + if (guest_can_use(vcpu, X86_FEATURE_IBT)) { + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, + MSR_TYPE_RW, false); + } +} + static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7825,6 +7857,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) /* Refresh #PF interception to account for MAXPHYADDR changes. 
*/ vmx_update_exception_bitmap(vcpu); + + if (kvm_is_cet_supported()) + vmx_update_intercept_for_cet_msr(vcpu); } static u64 vmx_get_perf_capabilities(void) From patchwork Fri Jul 21 03:03:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321402 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11E26EB64DC for ; Fri, 21 Jul 2023 06:09:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230191AbjGUGJV (ORCPT ); Fri, 21 Jul 2023 02:09:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46168 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229884AbjGUGI7 (ORCPT ); Fri, 21 Jul 2023 02:08:59 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4E671BC1; Thu, 20 Jul 2023 23:08:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919738; x=1721455738; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZuQcR6f3jO1a6bdidmK1FklkJlTVOpSvErOUbLlFv04=; b=k9m0WLhjqsAqF0PVN7K204DL4rJ3TC+wmh/AoXSGaNOERuyTMzxpaYxR d2T99XnMmLy9v2+oI7Z8dwZNdThEBFLrpTNs8T6b2PKEHEUB4+1vGfCJP LZA9p0j4rBb8eLoySs6viGJIYJl6xWMSNtPuLniL4kNrfhp2E3B0YIJbI +NErzUo1/vMVaaF4CXJJVosNnZcw/BUP2XVInYzCYzcNc6Chc/XUgdtT/ jUcDDH0hkTvMv39OVvneIf03yp+KJ+JTL3irxN1+tVTUA+FA+feuzFtWT /FeIKTvLt+UYCZyjgnjs1BEjV7bVKSYZHATL+O9Of8SWHWdkrsx4AbVmp g==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547603" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="370547603" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721983" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721983" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v4 15/20] KVM:VMX: Save host MSR_IA32_S_CET to VMCS field Date: Thu, 20 Jul 2023 23:03:47 -0400 Message-Id: <20230721030352.72414-16-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Save host MSR_IA32_S_CET to VMCS field as host constant state. Kernel IBT is supported now and the setting in MSR_IA32_S_CET is static after post-boot except in BIOS call case, but vCPU won't execute such BIOS call path currently, so it's safe to make the MSR as host constant. 
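In outline, the change boils down to reading the host value once and programming it as constant host state; a condensed sketch of the diff below:

  /* hardware_setup(): capture MSR_IA32_S_CET once; post-boot it is static.
   * On a host with kernel IBT enabled this is typically just CET_ENDBR_EN. */
  if (cpu_has_load_cet_ctrl())
          rdmsrl_safe(MSR_IA32_S_CET, &host_s_cet);

  /* vmx_set_constant_host_state(): the value is restored on VM-exit when
   * VM_EXIT_LOAD_CET_STATE is set. */
  if (cpu_has_load_cet_ctrl())
          vmcs_writel(HOST_S_CET, host_s_cet);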
Suggested-by: Sean Christopherson Signed-off-by: Yang Weijiang --- arch/x86/kvm/vmx/capabilities.h | 4 ++++ arch/x86/kvm/vmx/vmx.c | 8 ++++++++ 2 files changed, 12 insertions(+) diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index d0abee35d7ba..b1883f6c08eb 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void) return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; } +static inline bool cpu_has_load_cet_ctrl(void) +{ + return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE); +} static inline bool cpu_has_vmx_mpx(void) { return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 85cb7e748a89..cba24acf1a7a 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -109,6 +109,8 @@ module_param(enable_apicv, bool, S_IRUGO); bool __read_mostly enable_ipiv = true; module_param(enable_ipiv, bool, 0444); +static u64 __read_mostly host_s_cet; + /* * If nested=1, nested virtualization is supported, i.e., guests may use * VMX and be a hypervisor for its own guests. If nested=0, guests may not @@ -4355,6 +4357,9 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx) if (cpu_has_load_ia32_efer()) vmcs_write64(HOST_IA32_EFER, host_efer); + + if (cpu_has_load_cet_ctrl()) + vmcs_writel(HOST_S_CET, host_s_cet); } void set_cr4_guest_host_mask(struct vcpu_vmx *vmx) @@ -8633,6 +8638,9 @@ static __init int hardware_setup(void) return r; } + if (cpu_has_load_cet_ctrl()) + rdmsrl_safe(MSR_IA32_S_CET, &host_s_cet); + vmx_set_cpu_caps(); r = alloc_kvm_area(); From patchwork Fri Jul 21 03:03:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13855EB64DD for ; Fri, 21 Jul 2023 06:09:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230018AbjGUGJY (ORCPT ); Fri, 21 Jul 2023 02:09:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229937AbjGUGI7 (ORCPT ); Fri, 21 Jul 2023 02:08:59 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27DBE19B3; Thu, 20 Jul 2023 23:08:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919738; x=1721455738; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nqMR9J+OHoyRkGbaLaT7762FsalTwr5apnr/WAlbq54=; b=D/NJzhLhOiHaUwUmuSqFSaPo42d86HbCegCmPy2YA3JkhdmhZnPdXc4r WnyaebliVPLiw9fYpaV8kauUBtxkEDh36PKIInNj9VACYZIPgT8LZjywh 7eGri9exPcRvCUiAycAFhnfuBZ7X9uMDYX4iUl9oY9dDy5T+VunH4cuHu Ps8FkZL88SJpPz81Qpb7RZ+QFIPgseNOQG9k7dBMHqETlYznpIq2XqFVk rCawOSoBI77RyQXoOtHpmoc0RU3f/YHFZ+EiH8e6Lcbx9QO6tht1FQWST wZteWJE1ZVv9HcZhg0zvtgTOUNAFOlvsv/YcBgSaJ3IUPdvZuuBJ0FToD Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547568" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="370547568" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721985" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721985" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v4 16/20] KVM:x86: Optimize CET supervisor SSP save/reload Date: Thu, 20 Jul 2023 23:03:48 -0400 Message-Id: <20230721030352.72414-17-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Make PL{0,1,2}_SSP write-intercepted in order to detect whether the guest is actually using these MSRs. Disable write interception once an MSR is written with a non-zero value, so that KVM saves/reloads the MSRs only if the guest uses them. Signed-off-by: Yang Weijiang --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++---- arch/x86/kvm/x86.c | 15 ++++++++++----- 3 files changed, 41 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 69cbc9d9b277..c50b555234fb 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -748,6 +748,7 @@ struct kvm_vcpu_arch { bool tpr_access_reporting; bool xsaves_enabled; bool xfd_no_write_intercept; + bool cet_sss_active; u64 ia32_xss; u64 microcode_version; u64 arch_capabilities; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cba24acf1a7a..3eb4fe9c9ab6 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2155,6 +2155,18 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated return debugctl; } +static void vmx_disable_write_intercept_sss_msr(struct kvm_vcpu *vcpu) +{ + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) { + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, + MSR_TYPE_RW, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, + MSR_TYPE_RW, false); + } +} + /* * Writes msr value into the appropriate "register". * Returns 0 on success, non-0 otherwise. @@ -2427,7 +2439,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) #define CET_LEG_BITMAP_BASE(data) ((data) >> 12) #define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR) case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP: - return kvm_set_msr_common(vcpu, msr_info); + if (kvm_set_msr_common(vcpu, msr_info)) + return 1; + /* + * Writes to the base SSP MSRs should happen ahead of toggling + * of the IA32_S_CET.SH_STK_EN bit.
+ */ + if (msr_index != MSR_IA32_PL3_SSP && data) { + vmx_disable_write_intercept_sss_msr(vcpu); + wrmsrl(msr_index, data); + } break; case MSR_IA32_U_CET: case MSR_IA32_S_CET: @@ -7774,12 +7795,17 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu) MSR_TYPE_RW, false); vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, false); + /* + * Supervisor shadow stack MSRs are intercepted until + * they're written by guest, this is designed to + * optimize the save/restore overhead. + */ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, - MSR_TYPE_RW, false); + MSR_TYPE_R, false); vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, - MSR_TYPE_RW, false); + MSR_TYPE_R, false); vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, - MSR_TYPE_RW, false); + MSR_TYPE_R, false); vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, false); vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e200f22cdaad..49049454caf4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4051,6 +4051,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP || msr == MSR_IA32_PL2_SSP) { vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data; + if (!vcpu->arch.cet_sss_active && data) + vcpu->arch.cet_sss_active = true; } else if (msr == MSR_IA32_PL3_SSP) { kvm_set_xsave_msr(msr_info); } @@ -11252,7 +11254,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) kvm_sigset_activate(vcpu); kvm_run->flags = 0; kvm_load_guest_fpu(vcpu); - kvm_reload_cet_supervisor_ssp(vcpu); + if (vcpu->arch.cet_sss_active) + kvm_reload_cet_supervisor_ssp(vcpu); kvm_vcpu_srcu_read_lock(vcpu); if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { @@ -11341,7 +11344,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) r = vcpu_run(vcpu); out: - kvm_save_cet_supervisor_ssp(vcpu); + if (vcpu->arch.cet_sss_active) + kvm_save_cet_supervisor_ssp(vcpu); kvm_put_guest_fpu(vcpu); if (kvm_run->kvm_valid_regs) store_regs(vcpu); @@ -12430,15 +12434,16 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) pmu->need_cleanup = true; kvm_make_request(KVM_REQ_PMU, vcpu); } - - kvm_reload_cet_supervisor_ssp(vcpu); + if (vcpu->arch.cet_sss_active) + kvm_reload_cet_supervisor_ssp(vcpu); static_call(kvm_x86_sched_in)(vcpu, cpu); } void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) { - kvm_save_cet_supervisor_ssp(vcpu); + if (vcpu->arch.cet_sss_active) + kvm_save_cet_supervisor_ssp(vcpu); } void kvm_arch_free_vm(struct kvm *kvm) From patchwork Fri Jul 21 03:03:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Weijiang" X-Patchwork-Id: 13321405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5AC8EEB64DD for ; Fri, 21 Jul 2023 06:09:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230345AbjGUGJa (ORCPT ); Fri, 21 Jul 2023 02:09:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230122AbjGUGJB (ORCPT ); Fri, 21 Jul 2023 02:09:01 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 80686E65; Thu, 
20 Jul 2023 23:08:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1689919739; x=1721455739; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DAeXoyv+A5Z9XYgOT6Rh9HAu/BbNwZ7od9ZzCiqyPss=; b=guKVU1O/SAsjWpswWbyEZ9+qDnHJB5QaiLZAdlUIEF7MimYOhEhyXdJB IQgIhov1sqh8sgM06WiFaQRjXbELSNFvu/7t/+T2cFEL5nwid9E4OTOQQ tVQjzmmVLnN8kWMWuqzdludXa1fNrzk46CUO/dEohS8qb6fKzAHrQbLLI x/kLRzBJzXmt0F5iSiSyBrfn7w6+38GLXXTs4H0aruM37K5zyEmTP5n6N n3lzxYubV9B559GJdY71m+3BhFufv3q7tTPLRsG5F9B86NllWE/2KDHbi 8NAY/cRL8dH9O8nuA9ohPrn1pmnXQguJD9zmdhSLyeK3egVsp2UghdME9 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="370547601" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="370547601" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10777"; a="848721987" X-IronPort-AV: E=Sophos;i="6.01,220,1684825200"; d="scan'208";a="848721987" Received: from embargo.jf.intel.com ([10.165.9.183]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2023 23:08:41 -0700 From: Yang Weijiang To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org, john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com, binbin.wu@linux.intel.com, weijiang.yang@intel.com Subject: [PATCH v4 17/20] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Date: Thu, 20 Jul 2023 23:03:49 -0400 Message-Id: <20230721030352.72414-18-weijiang.yang@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com> References: <20230721030352.72414-1-weijiang.yang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Enable CET related feature bits in KVM capabilities array and make X86_CR4_CET available to guest. Remove the feature bits if host side dependencies cannot be met. Set the feature bits so that CET features are available in guest CPUID. Add CR4.CET bit support in order to allow guest set CET master control bit(CR4.CET). Disable KVM CET feature if unrestricted_guest is unsupported/disabled as KVM does not support emulating CET. Don't expose CET feature if dependent CET bit(U_CET) is cleared in host XSS or if XSAVES isn't supported. The CET bits in VM_ENTRY/VM_EXIT control fields should be set to make guest CET states isolated from host side. CET is only available on platforms that enumerate VMX_BASIC[bit 56] as 1. 
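The diff relies on kvm_is_cet_supported() and CET_XSTATE_MASK, which are introduced earlier in the series and are not visible in this hunk. Their presumed shape, reconstructed from how they are used here, is roughly:

  /* Assumed definitions -- not quoted from this patch. */
  #define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)

  static inline bool kvm_is_cet_supported(void)
  {
          /* CET is virtualizable only if the user CET xstate survived
           * the host-dependency filtering of kvm_caps.supported_xss. */
          return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
  }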
Signed-off-by: Yang Weijiang --- arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/include/asm/msr-index.h | 1 + arch/x86/kvm/cpuid.c | 12 ++++++++++-- arch/x86/kvm/vmx/capabilities.h | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 22 +++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 6 ++++-- arch/x86/kvm/x86.c | 16 +++++++++++++++- arch/x86/kvm/x86.h | 3 +++ 8 files changed, 62 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c50b555234fb..f883696723f4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -125,7 +125,8 @@ | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \ | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \ | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \ - | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP)) + | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \ + | X86_CR4_CET)) #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 3aedae61af4f..7ce0850c6067 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1078,6 +1078,7 @@ #define VMX_BASIC_MEM_TYPE_MASK 0x003c000000000000LLU #define VMX_BASIC_MEM_TYPE_WB 6LLU #define VMX_BASIC_INOUT 0x0040000000000000LLU +#define VMX_BASIC_NO_HW_ERROR_CODE 0x0100000000000000LLU /* Resctrl MSRs: */ /* - Intel: */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0338316b827c..1a601be7b4fa 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -624,7 +624,7 @@ void kvm_set_cpu_caps(void) F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ | - F(SGX_LC) | F(BUS_LOCK_DETECT) + F(SGX_LC) | F(BUS_LOCK_DETECT) | F(SHSTK) ); /* Set LA57 based on hardware capability. */ if (cpuid_ecx(7) & F(LA57)) @@ -642,7 +642,8 @@ void kvm_set_cpu_caps(void) F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) | F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) | F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) | - F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D) + F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D) | + F(IBT) ); /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */ @@ -655,6 +656,13 @@ void kvm_set_cpu_caps(void) kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP); if (boot_cpu_has(X86_FEATURE_AMD_SSBD)) kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD); + /* + * The feature bit in boot_cpu_data.x86_capability could have been + * cleared due to ibt=off cmdline option, then add it back if CPU + * supports IBT. 
+ */ + if (cpuid_edx(7) & F(IBT)) + kvm_cpu_cap_set(X86_FEATURE_IBT); kvm_cpu_cap_mask(CPUID_7_1_EAX, F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) | diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index b1883f6c08eb..2948a288d0b4 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -79,6 +79,12 @@ static inline bool cpu_has_vmx_basic_inout(void) return (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT); } +static inline bool cpu_has_vmx_basic_no_hw_errcode(void) +{ + return ((u64)vmcs_config.basic_cap << 32) & + VMX_BASIC_NO_HW_ERROR_CODE; +} + static inline bool cpu_has_virtual_nmis(void) { return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS && diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 3eb4fe9c9ab6..3f2f966e327d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2641,6 +2641,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf, { VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER }, { VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS }, { VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL }, + { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE }, }; memset(vmcs_conf, 0, sizeof(*vmcs_conf)); @@ -2761,7 +2762,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf, rdmsrl(MSR_IA32_VMX_MISC, misc_msr); vmcs_conf->size = vmx_msr_high & 0x1fff; - vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff; + vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff; vmcs_conf->revision_id = vmx_msr_low; @@ -6359,6 +6360,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu) if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0) vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest); + if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE) { + pr_err("S_CET = 0x%016lx\n", vmcs_readl(GUEST_S_CET)); + pr_err("SSP = 0x%016lx\n", vmcs_readl(GUEST_SSP)); + pr_err("INTR SSP TABLE = 0x%016lx\n", + vmcs_readl(GUEST_INTR_SSP_TABLE)); + } pr_err("*** Host State ***\n"); pr_err("RIP = 0x%016lx RSP = 0x%016lx\n", vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP)); @@ -6436,6 +6443,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu) if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID) pr_err("Virtual processor ID = 0x%04x\n", vmcs_read16(VIRTUAL_PROCESSOR_ID)); + if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE) { + pr_err("S_CET = 0x%016lx\n", vmcs_readl(HOST_S_CET)); + pr_err("SSP = 0x%016lx\n", vmcs_readl(HOST_SSP)); + pr_err("INTR SSP TABLE = 0x%016lx\n", + vmcs_readl(HOST_INTR_SSP_TABLE)); + } } /* @@ -7966,6 +7979,13 @@ static __init void vmx_set_cpu_caps(void) if (cpu_has_vmx_waitpkg()) kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG); + + if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest || + !cpu_has_vmx_basic_no_hw_errcode()) { + kvm_cpu_cap_clear(X86_FEATURE_SHSTK); + kvm_cpu_cap_clear(X86_FEATURE_IBT); + kvm_caps.supported_xss &= ~CET_XSTATE_MASK; + } } static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 32384ba38499..4e88b5fb45e8 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -481,7 +481,8 @@ static inline u8 vmx_get_rvi(void) VM_ENTRY_LOAD_IA32_EFER | \ VM_ENTRY_LOAD_BNDCFGS | \ VM_ENTRY_PT_CONCEAL_PIP | \ - VM_ENTRY_LOAD_IA32_RTIT_CTL) + VM_ENTRY_LOAD_IA32_RTIT_CTL | \ + VM_ENTRY_LOAD_CET_STATE) #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \ (VM_EXIT_SAVE_DEBUG_CONTROLS | \ @@ -503,7 +504,8 @@ static inline u8 vmx_get_rvi(void) VM_EXIT_LOAD_IA32_EFER | \ VM_EXIT_CLEAR_BNDCFGS | \ VM_EXIT_PT_CONCEAL_PIP | \ - 
+     VM_EXIT_CLEAR_IA32_RTIT_CTL |      \
+     VM_EXIT_LOAD_CET_STATE)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \
     (PIN_BASED_EXT_INTR_MASK |                     \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 49049454caf4..665593d75251 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -228,7 +228,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
                 | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
                 | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
-#define KVM_SUPPORTED_XSS   0
+#define KVM_SUPPORTED_XSS   (XFEATURE_MASK_CET_USER)
 
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
@@ -9648,6 +9648,20 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
     kvm_ops_update(ops);
 
+    if (!kvm_is_cet_supported()) {
+        kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+        kvm_cpu_cap_clear(X86_FEATURE_IBT);
+    }
+
+    /*
+     * If SHSTK and IBT are not available in KVM, clear the CET user bit
+     * in kvm_caps.supported_xss so that kvm_is_cet_supported() returns
+     * false when called.
+     */
+    if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+        !kvm_cpu_cap_has(X86_FEATURE_IBT))
+        kvm_caps.supported_xss &= ~CET_XSTATE_MASK;
+
     for_each_online_cpu(cpu) {
         smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
         if (r < 0)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 09dd35a79ff3..9c88ddfb3e97 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -538,6 +538,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
         __reserved_bits |= X86_CR4_VMXE;        \
     if (!__cpu_has(__c, X86_FEATURE_PCID))      \
         __reserved_bits |= X86_CR4_PCIDE;       \
+    if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&   \
+        !__cpu_has(__c, X86_FEATURE_IBT))       \
+        __reserved_bits |= X86_CR4_CET;         \
     __reserved_bits;                            \
 })

From patchwork Fri Jul 21 03:03:50 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321408
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org,
    john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com,
    binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v4 18/20] KVM:x86: Enable guest CET supervisor xstate bit support
Date: Thu, 20 Jul 2023 23:03:50 -0400
Message-Id: <20230721030352.72414-19-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>
References: <20230721030352.72414-1-weijiang.yang@intel.com>

Add the S_CET bit to kvm_caps.supported_xss so that the guest can
enumerate the feature in CPUID(0xd,1).ECX.

The guest S_CET xstate bit is handled specially: it can be exposed
without the corresponding enabling on the host side, because KVM
manually saves/reloads the guest's supervisor SHSTK SSPs, and the
current host/guest XSS swap logic also supports doing so. It is
therefore safe to enable the bit without host support.

Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/x86.c | 4 +++-
 arch/x86/kvm/x86.h | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 665593d75251..f68e36ef34b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -228,7 +228,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
                 | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
                 | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
-#define KVM_SUPPORTED_XSS   (XFEATURE_MASK_CET_USER)
+#define KVM_SUPPORTED_XSS   (XFEATURE_MASK_CET_USER | \
+                             XFEATURE_MASK_CET_KERNEL)
 
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
@@ -9638,6 +9639,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
     if (boot_cpu_has(X86_FEATURE_XSAVES)) {
         rdmsrl(MSR_IA32_XSS, host_xss);
         kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+        kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
     }
 
     kvm_init_pmu_capability(ops->pmu_ops);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9c88ddfb3e97..66733e01b0ce 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -362,7 +362,7 @@ static inline bool kvm_mpx_supported(void)
             == (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 }
 
-#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
 /*
  * Shadow Stack and Indirect Branch Tracking feature enabling depends on
  * whether host side CET user xstate bit is supported or not.
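For completeness, the guest-visible effect can be checked from CPUID: leaf 0xD,
sub-leaf 1 enumerates the IA32_XSS-managed state components in ECX, with CET_U
at bit 11 and CET_S at bit 12. The fragment below is a minimal stand-alone
sketch of such a probe (user-space C; the helper name and output format are
made up for illustration, it is not part of this series):

#include <stdint.h>
#include <stdio.h>
#include <cpuid.h>

#define XFEATURE_MASK_CET_USER   (1u << 11) /* CPUID.(EAX=0xD,ECX=1):ECX[11] */
#define XFEATURE_MASK_CET_KERNEL (1u << 12) /* CPUID.(EAX=0xD,ECX=1):ECX[12] */

/* Hypothetical helper: report which CET xstate components the (virtual)
 * CPU enumerates as manageable via IA32_XSS. */
static void report_cet_xss(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0xD, sub-leaf 1: ECX/EDX enumerate supervisor (XSS) states. */
    if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 0xD not supported");
        return;
    }
    printf("CET_U (user state) in XSS:       %s\n",
           (ecx & XFEATURE_MASK_CET_USER) ? "yes" : "no");
    printf("CET_S (supervisor state) in XSS: %s\n",
           (ecx & XFEATURE_MASK_CET_KERNEL) ? "yes" : "no");
}

int main(void)
{
    report_cet_xss();
    return 0;
}

With this patch applied, a guest whose CPUID exposes CET should report "yes"
for both components.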
From patchwork Fri Jul 21 03:03:51 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321409
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org,
    john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com,
    binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v4 19/20] KVM:nVMX: Refine error code injection to nested VM
Date: Thu, 20 Jul 2023 23:03:51 -0400
Message-Id: <20230721030352.72414-20-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>
References: <20230721030352.72414-1-weijiang.yang@intel.com>

Per the SDM (Vol. 3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver a hardware
exception with or without an error code, regardless of vector."

Modify the has_error_code check done before injecting an event into the
nested guest: enforce the check only when the guest is not in protected
mode, the event is not a hardware exception, or the platform doesn't
enumerate bit 56 in VMX_BASIC; otherwise skip it.
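In other words, the consistency check is kept unless every condition for the
bit-56 relaxation holds. A minimal sketch of the intended predicate follows
(illustrative names, not the patch's code; INTR_TYPE_HARD_EXCEPTION matches
its definition in arch/x86/include/asm/vmx.h):

#include <stdbool.h>

#define INTR_TYPE_HARD_EXCEPTION (3 << 8) /* as in arch/x86/include/asm/vmx.h */

/* Returns true when the legacy has_error_code consistency check must
 * still be enforced for a VM-entry event injection. */
static bool must_check_error_code(bool prot_mode, unsigned int intr_type,
                                  bool cpu_has_no_hw_errcode)
{
    /*
     * The check is skipped only when all three relaxation conditions
     * hold: protected mode, hardware exception, and VMX_BASIC bit 56
     * enumerated. Any one of them failing keeps the check in place.
     */
    return !prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
           !cpu_has_no_hw_errcode;
}

This mirrors the guard added around the should_have_error_code comparison in
nested_check_vm_entry_controls() below.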
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++--------
 arch/x86/kvm/vmx/nested.h |  7 +++++++
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 516391cc0d64..9bcd989252f7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1205,9 +1205,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
     const u64 feature_and_reserved =
         /* feature (except bit 48; see below) */
-        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
         /* reserved */
-        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
     u64 vmx_basic = vmcs_config.nested.basic;
 
     if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
@@ -2846,12 +2846,16 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
             CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
             return -EINVAL;
 
-        /* VM-entry interruption-info field: deliver error code */
-        should_have_error_code =
-            intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-            x86_exception_has_error_code(vector);
-        if (CC(has_error_code != should_have_error_code))
-            return -EINVAL;
+        if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
+            !nested_cpu_has_no_hw_errcode(vcpu)) {
+            /* VM-entry interruption-info field: deliver error code */
+            should_have_error_code =
+                intr_type == INTR_TYPE_HARD_EXCEPTION &&
+                prot_mode &&
+                x86_exception_has_error_code(vector);
+            if (CC(has_error_code != should_have_error_code))
+                return -EINVAL;
+        }
 
         /* VM-entry exception error code */
         if (CC(has_error_code &&
@@ -6967,6 +6971,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 
     if (cpu_has_vmx_basic_inout())
         msrs->basic |= VMX_BASIC_INOUT;
+    if (cpu_has_vmx_basic_no_hw_errcode())
+        msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE;
 }
 
 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 96952263b029..1884628294e4 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
         __kvm_is_valid_cr4(vcpu, val);
 }
 
+static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
+{
+    struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+    return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;
+}
+
 /*
  * No difference in the restrictions on guest and host CR4 in VMX operation.
  */
 #define nested_guest_cr4_valid nested_cr4_valid
 #define nested_host_cr4_valid  nested_cr4_valid

From patchwork Fri Jul 21 03:03:52 2023
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 13321407
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, peterz@infradead.org,
    john.allen@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, chao.gao@intel.com,
    binbin.wu@linux.intel.com, weijiang.yang@intel.com
Subject: [PATCH v4 20/20] KVM:nVMX: Enable CET support for nested VM
Date: Thu, 20 Jul 2023 23:03:52 -0400
Message-Id: <20230721030352.72414-21-weijiang.yang@intel.com>
In-Reply-To: <20230721030352.72414-1-weijiang.yang@intel.com>
References: <20230721030352.72414-1-weijiang.yang@intel.com>

Set up the CET MSRs, the related VM_ENTRY/EXIT control bits, and the
CR4 fixed-bit setting to enable CET for the nested VM.
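As background for the CR4 part (a summary, not patch text): the emulated
IA32_VMX_CR4_FIXED1 MSR tells L1 which CR4 bits it may set to 1 while running
L2, so CR4.CET (bit 23) has to be reported as allowed-to-be-1 once the guest's
CPUID exposes SHSTK or IBT. A hedged sketch of that rule, with an illustrative
function name (the actual update is the pair of cr4_fixed1_update() calls in
nested_vmx_cr_fixed1_bits_update() at the end of this patch):

#define X86_CR4_CET (1UL << 23) /* CR4.CET, as in asm/processor-flags.h */

/* Illustrative only: how the nested CR4_FIXED1 value could account for
 * CET based on the guest's CPUID feature bits. */
static unsigned long cr4_fixed1_with_cet(unsigned long fixed1,
                                         int guest_has_shstk,
                                         int guest_has_ibt)
{
    /* Either SHSTK or IBT alone is enough to make CR4.CET settable. */
    if (guest_has_shstk || guest_has_ibt)
        fixed1 |= X86_CR4_CET;
    return fixed1;
}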
Signed-off-by: Yang Weijiang
---
 arch/x86/kvm/vmx/nested.c | 27 +++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmcs12.c |  6 ++++++
 arch/x86/kvm/vmx/vmcs12.h | 14 +++++++++++++-
 arch/x86/kvm/vmx/vmx.c    |  2 ++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9bcd989252f7..bd6883033f69 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -660,6 +660,28 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
     nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
                                      MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
 
+    /* Pass CET MSRs to nested VM if L0 and L1 are set to pass-through. */
+    nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+                                     MSR_IA32_U_CET, MSR_TYPE_RW);
+
+    nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+                                     MSR_IA32_S_CET, MSR_TYPE_RW);
+
+    nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+                                     MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+    nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+                                     MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+    nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+                                     MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+    nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+                                     MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
+    nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+                                     MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
+
     kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
     vmx->nested.force_msr_bitmap_recalc = false;
@@ -6793,7 +6815,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
         VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
         VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-        VM_EXIT_CLEAR_BNDCFGS;
+        VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
     msrs->exit_ctls_high |=
         VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
         VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6815,7 +6837,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 #ifdef CONFIG_X86_64
         VM_ENTRY_IA32E_MODE |
 #endif
-        VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+        VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+        VM_ENTRY_LOAD_CET_STATE;
     msrs->entry_ctls_high |=
         (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
          VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
     FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
     FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
     FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+    FIELD(GUEST_S_CET, guest_s_cet),
+    FIELD(GUEST_SSP, guest_ssp),
+    FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
     FIELD(HOST_CR0, host_cr0),
     FIELD(HOST_CR3, host_cr3),
     FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
     FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
     FIELD(HOST_RSP, host_rsp),
     FIELD(HOST_RIP, host_rip),
+    FIELD(HOST_S_CET, host_s_cet),
+    FIELD(HOST_SSP, host_ssp),
+    FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 01936013428b..3884489e7f7e 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
     natural_width host_ia32_sysenter_eip;
     natural_width host_rsp;
     natural_width host_rip;
-    natural_width paddingl[8]; /* room for future expansion */
+    natural_width host_s_cet;
+    natural_width host_ssp;
+    natural_width host_ssp_tbl;
+    natural_width guest_s_cet;
+    natural_width guest_ssp;
+    natural_width guest_ssp_tbl;
+    natural_width paddingl[2]; /* room for future expansion */
     u32 pin_based_vm_exec_control;
     u32 cpu_based_vm_exec_control;
     u32 exception_bitmap;
@@ -292,6 +298,12 @@ static inline void vmx_check_vmcs12_offsets(void)
     CHECK_OFFSET(host_ia32_sysenter_eip, 656);
     CHECK_OFFSET(host_rsp, 664);
     CHECK_OFFSET(host_rip, 672);
+    CHECK_OFFSET(host_s_cet, 680);
+    CHECK_OFFSET(host_ssp, 688);
+    CHECK_OFFSET(host_ssp_tbl, 696);
+    CHECK_OFFSET(guest_s_cet, 704);
+    CHECK_OFFSET(guest_ssp, 712);
+    CHECK_OFFSET(guest_ssp_tbl, 720);
     CHECK_OFFSET(pin_based_vm_exec_control, 744);
     CHECK_OFFSET(cpu_based_vm_exec_control, 748);
     CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3f2f966e327d..dd68dc73f723 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7728,6 +7728,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
     cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
     cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
     cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+    cr4_fixed1_update(X86_CR4_CET,        ecx, feature_bit(SHSTK));
+    cr4_fixed1_update(X86_CR4_CET,        edx, feature_bit(IBT));
 
 #undef cr4_fixed1_update
 }
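A note on the vmcs12 layout change above: the six new natural_width fields are
carved out of the old paddingl[8] array, so every field that follows keeps its
ABI-frozen offset. host_s_cet lands at 680 and guest_ssp_tbl at 720, two 8-byte
padding slots remain at 728/736, and pin_based_vm_exec_control stays at 744,
exactly as the updated CHECK_OFFSET() calls assert. A stand-alone sketch of the
same invariant (abbreviated struct and hypothetical type names, for
illustration only):

#include <stddef.h>
#include <assert.h>

/* Abbreviated stand-in for the tail of struct vmcs12; on x86-64,
 * natural_width is an 8-byte unsigned long. */
struct vmcs12_tail {
    unsigned long host_rip;
    unsigned long host_s_cet;
    unsigned long host_ssp;
    unsigned long host_ssp_tbl;
    unsigned long guest_s_cet;
    unsigned long guest_ssp;
    unsigned long guest_ssp_tbl;
    unsigned long paddingl[2];  /* was paddingl[8] before the patch */
    unsigned int pin_based_vm_exec_control;
};

/* The six new fields plus the two remaining padding slots span the same
 * 64 bytes the eight old padding slots did, so later offsets are unchanged. */
static_assert(offsetof(struct vmcs12_tail, pin_based_vm_exec_control) -
              offsetof(struct vmcs12_tail, host_s_cet) ==
              8 * sizeof(unsigned long),
              "CET fields must consume exactly the old padding");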