Message ID | 20230104111146.2094-3-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86: Work around Shstk fracturing | expand |
On 04.01.2023 12:11, Andrew Cooper wrote: > Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack > Token". > > Architecturally, an event delivery which starts in CPL<3 and switches shadow > stack will first validate the Supervisor Shadow Stack Token (setting the busy > bit), then pushes CS/LIP/SSP. One example of this is an NMI interrupting Xen. > > Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc > between setting the busy bit and completing the event injection renders the > action non-restartable, because when it comes time to restart, the busy bit is > found to be already set. > > This is far more easily encountered under virt, yet it is not the fault of the > hypervisor, nor the fault of the guest kernel. The fault lies somewhere > between the architectural specification, and the uarch behaviour. > > Intel have allocated CPUID.7[1].ecx[18] CET_SSS to enumerate that supervisor > shadow stacks are safe to use. Because of how Xen lays out its shadow stacks, > fracturing is not expected to be a problem on native. IOW that's the "contained in an aligned 32-byte region" constraint which we meet, aiui. > Detect this case on boot and default to not using shstk if virtualised. > Specifying `cet=shstk` on the command line will override this heuristic and > enable shadow stacks irrespective. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> with one nit (below). > This ideally wants backporting to Xen 4.14. I have no idea how likely it is > to need to backport the prerequisite patch for new feature words, but we've > already had to do that once for security patches. OTOH, I have no idea how > easy it is to trigger in non-synthetic cases. Plus: How likely is it that Xen actually is used virtualized in production? For the moment I don't see any reason to backport to branches in security- only maintenance mode. I'm not even sure it needs backporting at all. > @@ -1099,11 +1095,45 @@ void __init noreturn __start_xen(unsigned long mbi_p) > early_cpu_init(); > > /* Choose shadow stack early, to set infrastructure up appropriately. */ > - if ( opt_xen_shstk && boot_cpu_has(X86_FEATURE_CET_SS) ) > + if ( !boot_cpu_has(X86_FEATURE_CET_SS) ) > + opt_xen_shstk = 0; > + > + if ( opt_xen_shstk ) > { > - printk("Enabling Supervisor Shadow Stacks\n"); > + /* > + * Some CPUs suffer from Shadow Stack Fracturing, an issue whereby a > + * fault/VMExit/etc between setting a Supervisor Busy bit and the > + * event delivery completing renders the operation non-restartable. > + * On restart, event delivery will find the Busy bit already set. > + * > + * This is a problem on bare metal, but outside of synthetic cases or > + * a very badly timed #MC, it's not believed to problem. It is a much Nit: "... to be a problem." Jan
On 04/01/2023 3:44 pm, Jan Beulich wrote: > On 04.01.2023 12:11, Andrew Cooper wrote: >> Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack >> Token". >> >> Architecturally, an event delivery which starts in CPL<3 and switches shadow >> stack will first validate the Supervisor Shadow Stack Token (setting the busy >> bit), then pushes CS/LIP/SSP. One example of this is an NMI interrupting Xen. >> >> Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc >> between setting the busy bit and completing the event injection renders the >> action non-restartable, because when it comes time to restart, the busy bit is >> found to be already set. >> >> This is far more easily encountered under virt, yet it is not the fault of the >> hypervisor, nor the fault of the guest kernel. The fault lies somewhere >> between the architectural specification, and the uarch behaviour. >> >> Intel have allocated CPUID.7[1].ecx[18] CET_SSS to enumerate that supervisor >> shadow stacks are safe to use. Because of how Xen lays out its shadow stacks, >> fracturing is not expected to be a problem on native. > IOW that's the "contained in an aligned 32-byte region" constraint which we > meet, aiui. For practical purposes, it is "contained within a single cache line". AMD's position is "if the OS doesn't have suitable alignment, or doesn't place the shstk on WB memory, it gets to keep any resulting pieces". They have confirmed that if there is suitable alignment and it is on a WB mapping, then no vmexit can occur which would fracture the the shadow stack. Intel retroactively tightened the constraints in microcode on TGL/ADL so that MSR_PL?_SSP (or ISST pointers from memory) would suffer #GP if they didn't refer to the top word in an aligned 32-byte region. But this provide to be insufficient to fix the problem, hence the new CPUID bit, and recommendation to disable supervisor shadow stacks by default under virt. >> Detect this case on boot and default to not using shstk if virtualised. >> Specifying `cet=shstk` on the command line will override this heuristic and >> enable shadow stacks irrespective. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > with one nit (below). > >> This ideally wants backporting to Xen 4.14. I have no idea how likely it is >> to need to backport the prerequisite patch for new feature words, but we've >> already had to do that once for security patches. OTOH, I have no idea how >> easy it is to trigger in non-synthetic cases. > Plus: How likely is it that Xen actually is used virtualized in production? All of our gitlab smoke tests, a larger part of QubesOS's CI, and non-default PV-shim configurations. As soon as we get guest CET working, then part of OSStest, and a portion of XenServer's general testing too. It does need backporting to all general support trees. (When I first started working on this problem with Intel, that would have included 4.14 too...) >> @@ -1099,11 +1095,45 @@ void __init noreturn __start_xen(unsigned long mbi_p) >> early_cpu_init(); >> >> /* Choose shadow stack early, to set infrastructure up appropriately. */ >> - if ( opt_xen_shstk && boot_cpu_has(X86_FEATURE_CET_SS) ) >> + if ( !boot_cpu_has(X86_FEATURE_CET_SS) ) >> + opt_xen_shstk = 0; >> + >> + if ( opt_xen_shstk ) >> { >> - printk("Enabling Supervisor Shadow Stacks\n"); >> + /* >> + * Some CPUs suffer from Shadow Stack Fracturing, an issue whereby a >> + * fault/VMExit/etc between setting a Supervisor Busy bit and the >> + * event delivery completing renders the operation non-restartable. >> + * On restart, event delivery will find the Busy bit already set. >> + * >> + * This is a problem on bare metal, but outside of synthetic cases or >> + * a very badly timed #MC, it's not believed to problem. It is a much > Nit: "... to be a problem." Fixed, thanks. ~Andrew
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index 923910f553c5..19d4d815bdee 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -287,10 +287,15 @@ can be maintained with the pv-shim mechanism. protection. The option is available when `CONFIG_XEN_SHSTK` is compiled in, and - defaults to `true` on hardware supporting CET-SS. Specifying + generally defaults to `true` on hardware supporting CET-SS. Specifying `cet=no-shstk` will cause Xen not to use Shadow Stacks even when support is available in hardware. + Some hardware suffers from an issue known as Supervisor Shadow Stack + Fracturing. On such hardware, Xen will default to not using Shadow Stacks + when virtualised. Specifying `cet=shstk` will override this heuristic and + enable Shadow Stacks unilaterally. + * The `ibt=` boolean controls whether Xen uses Indirect Branch Tracking for its own protection. diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c index 2aa23225f42c..d97a2f3338bc 100644 --- a/tools/libs/light/libxl_cpuid.c +++ b/tools/libs/light/libxl_cpuid.c @@ -235,6 +235,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str) {"fsrs", 0x00000007, 1, CPUID_REG_EAX, 11, 1}, {"fsrcs", 0x00000007, 1, CPUID_REG_EAX, 12, 1}, + {"cet-sss", 0x00000007, 1, CPUID_REG_EDX, 18, 1}, + {"intel-psfd", 0x00000007, 2, CPUID_REG_EDX, 0, 1}, {"mcdt-no", 0x00000007, 2, CPUID_REG_EDX, 5, 1}, diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c index addb3a39a11a..0248eaef44c1 100644 --- a/tools/misc/xen-cpuid.c +++ b/tools/misc/xen-cpuid.c @@ -208,6 +208,7 @@ static const char *const str_7c1[32] = static const char *const str_7d1[32] = { + [18] = "cet-sss", }; static const char *const str_7d2[32] = diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c index b3fcf4680f3a..27f73d3bbe31 100644 --- a/xen/arch/x86/cpu/common.c +++ b/xen/arch/x86/cpu/common.c @@ -346,11 +346,18 @@ void __init early_cpu_init(void) x86_cpuid_vendor_to_str(c->x86_vendor), c->x86, c->x86, c->x86_model, c->x86_model, c->x86_mask, eax); - if (c->cpuid_level >= 7) - cpuid_count(7, 0, &eax, &ebx, + if (c->cpuid_level >= 7) { + uint32_t max_subleaf; + + cpuid_count(7, 0, &max_subleaf, &ebx, &c->x86_capability[FEATURESET_7c0], &c->x86_capability[FEATURESET_7d0]); + if (max_subleaf >= 1) + cpuid_count(7, 1, &eax, &ebx, &ecx, + &c->x86_capability[FEATURESET_7d1]); + } + eax = cpuid_eax(0x80000000); if ((eax >> 16) == 0x8000 && eax >= 0x80000008) { ebx = eax >= 0x8000001f ? cpuid_ebx(0x8000001f) : 0; diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 566422600d94..1b8b74599f4a 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -95,11 +95,7 @@ unsigned long __initdata highmem_start; size_param("highmem-start", highmem_start); #endif -#ifdef CONFIG_XEN_SHSTK -static bool __initdata opt_xen_shstk = true; -#else -#define opt_xen_shstk false -#endif +static int8_t __initdata opt_xen_shstk = -IS_ENABLED(CONFIG_XEN_SHSTK); #ifdef CONFIG_XEN_IBT static bool __initdata opt_xen_ibt = true; @@ -1099,11 +1095,45 @@ void __init noreturn __start_xen(unsigned long mbi_p) early_cpu_init(); /* Choose shadow stack early, to set infrastructure up appropriately. */ - if ( opt_xen_shstk && boot_cpu_has(X86_FEATURE_CET_SS) ) + if ( !boot_cpu_has(X86_FEATURE_CET_SS) ) + opt_xen_shstk = 0; + + if ( opt_xen_shstk ) { - printk("Enabling Supervisor Shadow Stacks\n"); + /* + * Some CPUs suffer from Shadow Stack Fracturing, an issue whereby a + * fault/VMExit/etc between setting a Supervisor Busy bit and the + * event delivery completing renders the operation non-restartable. + * On restart, event delivery will find the Busy bit already set. + * + * This is a problem on bare metal, but outside of synthetic cases or + * a very badly timed #MC, it's not believed to problem. It is a much + * bigger problem under virt, because we can VMExit for a number of + * legitimate reasons and tickle this bug. + * + * CPUs with this addressed enumerate CET-SSS to indicate that + * supervisor shadow stacks are now safe to use. + */ + bool cpu_has_bug_shstk_fracture = + boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && + !boot_cpu_has(X86_FEATURE_CET_SSS); - setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK); + /* + * On bare metal, assume that Xen won't be impacted by shstk + * fracturing problems. Under virt, be more conservative and disable + * shstk by default. + */ + if ( opt_xen_shstk == -1 ) + opt_xen_shstk = + cpu_has_hypervisor ? !cpu_has_bug_shstk_fracture + : true; + + if ( opt_xen_shstk ) + { + printk("Enabling Supervisor Shadow Stacks\n"); + + setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK); + } } if ( opt_xen_ibt && boot_cpu_has(X86_FEATURE_CET_IBT) ) diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 7a896f0e2d92..f6a46f62a549 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -290,6 +290,7 @@ XEN_CPUFEATURE(INTEL_PPIN, 12*32+ 0) /* Protected Processor Inventory /* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */ /* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */ +XEN_CPUFEATURE(CET_SSS, 15*32+18) /* CET Supervisor Shadow Stacks safe to use */ /* Intel-defined CPU features, CPUID level 0x00000007:2.edx, word 13 */ XEN_CPUFEATURE(INTEL_PSFD, 13*32+ 0) /*A MSR_SPEC_CTRL.PSFD */
Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack Token". Architecturally, an event delivery which starts in CPL<3 and switches shadow stack will first validate the Supervisor Shadow Stack Token (setting the busy bit), then pushes CS/LIP/SSP. One example of this is an NMI interrupting Xen. Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc between setting the busy bit and completing the event injection renders the action non-restartable, because when it comes time to restart, the busy bit is found to be already set. This is far more easily encountered under virt, yet it is not the fault of the hypervisor, nor the fault of the guest kernel. The fault lies somewhere between the architectural specification, and the uarch behaviour. Intel have allocated CPUID.7[1].ecx[18] CET_SSS to enumerate that supervisor shadow stacks are safe to use. Because of how Xen lays out its shadow stacks, fracturing is not expected to be a problem on native. Detect this case on boot and default to not using shstk if virtualised. Specifying `cet=shstk` on the command line will override this heuristic and enable shadow stacks irrespective. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> v2: * Confirmation from AMD that their parts are not impacted. * Fix pv-shim build. The "define false" trick doesn't work with tristates. * Tweak wording in several places. * Fix tabs vs spaces. This ideally wants backporting to Xen 4.14. I have no idea how likely it is to need to backport the prerequisite patch for new feature words, but we've already had to do that once for security patches. OTOH, I have no idea how easy it is to trigger in non-synthetic cases. --- docs/misc/xen-command-line.pandoc | 7 ++++- tools/libs/light/libxl_cpuid.c | 2 ++ tools/misc/xen-cpuid.c | 1 + xen/arch/x86/cpu/common.c | 11 +++++-- xen/arch/x86/setup.c | 46 ++++++++++++++++++++++++----- xen/include/public/arch-x86/cpufeatureset.h | 1 + 6 files changed, 57 insertions(+), 11 deletions(-)