Message ID | 20200102145953.6503-1-andrew.cooper3@citrix.com (mailing list archive)
---|---
State | New, archived
Series | x86/boot: Clean up the trampoline transition into Long mode
On Thu, Jan 02, 2020 at 02:59:53PM +0000, Andrew Cooper wrote:
> The jmp after setting %cr0 is redundant with the following ljmp.
>
> The CPUID to protect the jump to higher mappings was inserted due to an
> abundance of caution/paranoia before Spectre was public. There is not a
> matching protection in the S3 resume path, and there is nothing
> interesting in memory at this point.

What do you mean by "there is nothing interesting in memory" here?

As far as I can tell the idle page tables have been loaded. During AP
bring-up they contain runtime data, no?

Wei.
On 02/01/2020 16:55, Wei Liu wrote:
> On Thu, Jan 02, 2020 at 02:59:53PM +0000, Andrew Cooper wrote:
>> The jmp after setting %cr0 is redundant with the following ljmp.
>>
>> The CPUID to protect the jump to higher mappings was inserted due to an
>> abundance of caution/paranoia before Spectre was public. There is not a
>> matching protection in the S3 resume path, and there is nothing
>> interesting in memory at this point.
> What do you mean by "there is nothing interesting in memory" here?
>
> As far as I can tell the idle page tables have been loaded. During AP
> bring-up they contain runtime data, no?

We haven't even decompressed the dom0 kernel at this point. What data
are you concerned by?

This protection is only meaningful for virtualised guests, and is
ultimately incomplete. If another VM can use Spectre v2 against this
VM, it can also use Spectre v1 and have a far more interesting time.

In the time since writing this code, it has become substantially more
apparent that VMs must trust their hypervisor to provide adequate
isolation, because there is literally nothing the VM can do itself.

~Andrew
On Thu, Jan 02, 2020 at 05:20:12PM +0000, Andrew Cooper wrote:
> On 02/01/2020 16:55, Wei Liu wrote:
> > On Thu, Jan 02, 2020 at 02:59:53PM +0000, Andrew Cooper wrote:
> >> The jmp after setting %cr0 is redundant with the following ljmp.
> >>
> >> The CPUID to protect the jump to higher mappings was inserted due to an
> >> abundance of caution/paranoia before Spectre was public. There is not a
> >> matching protection in the S3 resume path, and there is nothing
> >> interesting in memory at this point.
> > What do you mean by "there is nothing interesting in memory" here?
> >
> > As far as I can tell the idle page tables have been loaded. During AP
> > bring-up they contain runtime data, no?
>
> We haven't even decompressed the dom0 kernel at this point. What data
> are you concerned by?

As the original text implied, CPU hotplug should also be considered. If
that's not relevant now, can you please note that in the commit message?

Wei.

> This protection is only meaningful for virtualised guests, and is
> ultimately incomplete. If another VM can use Spectre v2 against this
> VM, it can also use Spectre v1 and have a far more interesting time.
>
> In the time since writing this code, it has become substantially more
> apparent that VMs must trust their hypervisor to provide adequate
> isolation, because there is literally nothing the VM can do itself.
>
> ~Andrew
On 02.01.2020 15:59, Andrew Cooper wrote:
> @@ -111,26 +109,6 @@ trampoline_protmode_entry:
>  start64:
>          /* Jump to high mappings. */
>          movabs  $__high_start, %rdi
> -
> -#ifdef CONFIG_INDIRECT_THUNK
> -        /*
> -         * If booting virtualised, or hot-onlining a CPU, sibling threads can
> -         * attempt Branch Target Injection against this jmp.
> -         *
> -         * We've got no usable stack so can't use a RETPOLINE thunk, and are
> -         * further than disp32 from the high mappings so couldn't use
> -         * JUMP_THUNK even if it was a non-RETPOLINE thunk. Furthermore, an
> -         * LFENCE isn't necessarily safe to use at this point.
> -         *
> -         * As this isn't a hotpath, use a fully serialising event to reduce
> -         * the speculation window as much as possible. %ebx needs preserving
> -         * for __high_start.
> -         */
> -        mov     %ebx, %esi
> -        cpuid
> -        mov     %esi, %ebx
> -#endif
> -
>          jmpq    *%rdi

I can see this being unneeded when running virtualized, as you said
in reply to Wei. However, for hot-onlining (when other CPUs may run
random vCPU-s) I don't see how this can safely be dropped. There's
no similar concern for S3 resume, as thaw_domains() happens only
after enable_nonboot_cpus().

Jan
On 03/01/2020 13:36, Jan Beulich wrote:
> On 02.01.2020 15:59, Andrew Cooper wrote:
>> @@ -111,26 +109,6 @@ trampoline_protmode_entry:
>>  start64:
>>          /* Jump to high mappings. */
>>          movabs  $__high_start, %rdi
>> -
>> -#ifdef CONFIG_INDIRECT_THUNK
>> -        /*
>> -         * If booting virtualised, or hot-onlining a CPU, sibling threads can
>> -         * attempt Branch Target Injection against this jmp.
>> -         *
>> -         * We've got no usable stack so can't use a RETPOLINE thunk, and are
>> -         * further than disp32 from the high mappings so couldn't use
>> -         * JUMP_THUNK even if it was a non-RETPOLINE thunk. Furthermore, an
>> -         * LFENCE isn't necessarily safe to use at this point.
>> -         *
>> -         * As this isn't a hotpath, use a fully serialising event to reduce
>> -         * the speculation window as much as possible. %ebx needs preserving
>> -         * for __high_start.
>> -         */
>> -        mov     %ebx, %esi
>> -        cpuid
>> -        mov     %esi, %ebx
>> -#endif
>> -
>>          jmpq    *%rdi
> I can see this being unneeded when running virtualized, as you said
> in reply to Wei. However, for hot-onlining (when other CPUs may run
> random vCPU-s) I don't see how this can safely be dropped. There's
> no similar concern for S3 resume, as thaw_domains() happens only
> after enable_nonboot_cpus().

I covered that in the same reply. Any guest which can use branch target
injection against this jmp can also poison the regular branch predictor
and get at data that way.

Once again, we get to CPU Hotplug being an unused feature in practice,
which is completely evident now with Intel MCE behaviour.

A guest can't control/guess when a hotplug event might occur, or where
exactly this branch is in memory (after all - it is variable based on
the position of the trampoline), and core scheduling mitigates the risk
entirely.

~Andrew
On 03.01.2020 14:44, Andrew Cooper wrote:
> On 03/01/2020 13:36, Jan Beulich wrote:
>> On 02.01.2020 15:59, Andrew Cooper wrote:
>>> @@ -111,26 +109,6 @@ trampoline_protmode_entry:
>>>  start64:
>>>          /* Jump to high mappings. */
>>>          movabs  $__high_start, %rdi
>>> -
>>> -#ifdef CONFIG_INDIRECT_THUNK
>>> -        /*
>>> -         * If booting virtualised, or hot-onlining a CPU, sibling threads can
>>> -         * attempt Branch Target Injection against this jmp.
>>> -         *
>>> -         * We've got no usable stack so can't use a RETPOLINE thunk, and are
>>> -         * further than disp32 from the high mappings so couldn't use
>>> -         * JUMP_THUNK even if it was a non-RETPOLINE thunk. Furthermore, an
>>> -         * LFENCE isn't necessarily safe to use at this point.
>>> -         *
>>> -         * As this isn't a hotpath, use a fully serialising event to reduce
>>> -         * the speculation window as much as possible. %ebx needs preserving
>>> -         * for __high_start.
>>> -         */
>>> -        mov     %ebx, %esi
>>> -        cpuid
>>> -        mov     %esi, %ebx
>>> -#endif
>>> -
>>>          jmpq    *%rdi
>> I can see this being unneeded when running virtualized, as you said
>> in reply to Wei. However, for hot-onlining (when other CPUs may run
>> random vCPU-s) I don't see how this can safely be dropped. There's
>> no similar concern for S3 resume, as thaw_domains() happens only
>> after enable_nonboot_cpus().
>
> I covered that in the same reply. Any guest which can use branch target
> injection against this jmp can also poison the regular branch predictor
> and get at data that way.

Aren't you implying then that retpolines could also be dropped?

> Once again, we get to CPU Hotplug being an unused feature in practice,
> which is completely evident now with Intel MCE behaviour.

What does Intel's MCE behavior have to do with whether CPU hotplug
(or hot-onlining) is (un)used in practice?

> A guest can't control/guess when a hotplug event might occur, or where
> exactly this branch is in memory (after all - it is variable based on
> the position of the trampoline), and core scheduling mitigates the risk
> entirely.

"... will mitigate ..." - it's experimental up to now, isn't it?

Jan
On 03/01/2020 13:52, Jan Beulich wrote:
> On 03.01.2020 14:44, Andrew Cooper wrote:
>> On 03/01/2020 13:36, Jan Beulich wrote:
>>> On 02.01.2020 15:59, Andrew Cooper wrote:
>>>> @@ -111,26 +109,6 @@ trampoline_protmode_entry:
>>>>  start64:
>>>>          /* Jump to high mappings. */
>>>>          movabs  $__high_start, %rdi
>>>> -
>>>> -#ifdef CONFIG_INDIRECT_THUNK
>>>> -        /*
>>>> -         * If booting virtualised, or hot-onlining a CPU, sibling threads can
>>>> -         * attempt Branch Target Injection against this jmp.
>>>> -         *
>>>> -         * We've got no usable stack so can't use a RETPOLINE thunk, and are
>>>> -         * further than disp32 from the high mappings so couldn't use
>>>> -         * JUMP_THUNK even if it was a non-RETPOLINE thunk. Furthermore, an
>>>> -         * LFENCE isn't necessarily safe to use at this point.
>>>> -         *
>>>> -         * As this isn't a hotpath, use a fully serialising event to reduce
>>>> -         * the speculation window as much as possible. %ebx needs preserving
>>>> -         * for __high_start.
>>>> -         */
>>>> -        mov     %ebx, %esi
>>>> -        cpuid
>>>> -        mov     %esi, %ebx
>>>> -#endif
>>>> -
>>>>          jmpq    *%rdi
>>> I can see this being unneeded when running virtualized, as you said
>>> in reply to Wei. However, for hot-onlining (when other CPUs may run
>>> random vCPU-s) I don't see how this can safely be dropped. There's
>>> no similar concern for S3 resume, as thaw_domains() happens only
>>> after enable_nonboot_cpus().
>> I covered that in the same reply. Any guest which can use branch target
>> injection against this jmp can also poison the regular branch predictor
>> and get at data that way.
> Aren't you implying then that retpolines could also be dropped?

No. It is a simple risk vs complexity tradeoff.

Guests running on a sibling *can already* attack this branch with BTI,
because CPUID isn't a fix to bad BTB speculation, and the leakage gadget
need only be a single instruction.

Such a guest can also attack Xen in general with Spectre v1.

As I said - this was introduced because of paranoia, back while the few
people who knew about the issues (only several hundred at the time) were
attempting to figure out what exactly a speculative attack looked like,
and were applying duct tape to everything suspicious because we had 0
time to rewrite several core pieces of system handling.

>> Once again, we get to CPU Hotplug being an unused feature in practice,
>> which is completely evident now with Intel MCE behaviour.
> What does Intel's MCE behavior have to do with whether CPU hotplug
> (or hot-onlining) is (un)used in practice?

It is the logical consequence of hotplug breaking MCEs.

If hotplug had been used in practice, the MCE behaviour would have come
to light much sooner, when MCEs didn't work in practice.

Given that MCEs really did work in practice even before the L1TF days,
hotplug wasn't in common-enough use for anyone to notice the MCE
behaviour.

>> A guest can't control/guess when a hotplug event might occur, or where
>> exactly this branch is in memory (after all - it is variable based on
>> the position of the trampoline), and core scheduling mitigates the risk
>> entirely.
> "... will mitigate ..." - it's experimental up to now, isn't it?

Core scheduling ought to prevent the problem entirely. The current code
is not safe in the absence of core scheduling.

~Andrew
On 03.01.2020 15:25, Andrew Cooper wrote:
> On 03/01/2020 13:52, Jan Beulich wrote:
>> On 03.01.2020 14:44, Andrew Cooper wrote:
>>> On 03/01/2020 13:36, Jan Beulich wrote:
>>>> On 02.01.2020 15:59, Andrew Cooper wrote:
>>>>> @@ -111,26 +109,6 @@ trampoline_protmode_entry:
>>>>>  start64:
>>>>>          /* Jump to high mappings. */
>>>>>          movabs  $__high_start, %rdi
>>>>> -
>>>>> -#ifdef CONFIG_INDIRECT_THUNK
>>>>> -        /*
>>>>> -         * If booting virtualised, or hot-onlining a CPU, sibling threads can
>>>>> -         * attempt Branch Target Injection against this jmp.
>>>>> -         *
>>>>> -         * We've got no usable stack so can't use a RETPOLINE thunk, and are
>>>>> -         * further than disp32 from the high mappings so couldn't use
>>>>> -         * JUMP_THUNK even if it was a non-RETPOLINE thunk. Furthermore, an
>>>>> -         * LFENCE isn't necessarily safe to use at this point.
>>>>> -         *
>>>>> -         * As this isn't a hotpath, use a fully serialising event to reduce
>>>>> -         * the speculation window as much as possible. %ebx needs preserving
>>>>> -         * for __high_start.
>>>>> -         */
>>>>> -        mov     %ebx, %esi
>>>>> -        cpuid
>>>>> -        mov     %esi, %ebx
>>>>> -#endif
>>>>> -
>>>>>          jmpq    *%rdi
>>>> I can see this being unneeded when running virtualized, as you said
>>>> in reply to Wei. However, for hot-onlining (when other CPUs may run
>>>> random vCPU-s) I don't see how this can safely be dropped. There's
>>>> no similar concern for S3 resume, as thaw_domains() happens only
>>>> after enable_nonboot_cpus().
>>> I covered that in the same reply. Any guest which can use branch target
>>> injection against this jmp can also poison the regular branch predictor
>>> and get at data that way.
>> Aren't you implying then that retpolines could also be dropped?
>
> No. It is a simple risk vs complexity tradeoff.
>
> Guests running on a sibling *can already* attack this branch with BTI,
> because CPUID isn't a fix to bad BTB speculation, and the leakage gadget
> need only be a single instruction.
>
> Such a guest can also attack Xen in general with Spectre v1.
>
> As I said - this was introduced because of paranoia, back while the few
> people who knew about the issues (only several hundred at the time) were
> attempting to figure out what exactly a speculative attack looked like,
> and were applying duct tape to everything suspicious because we had 0
> time to rewrite several core pieces of system handling.

Well, okay then:
Acked-by: Jan Beulich <jbeulich@suse.com>

>>> Once again, we get to CPU Hotplug being an unused feature in practice,
>>> which is completely evident now with Intel MCE behaviour.
>> What does Intel's MCE behavior have to do with whether CPU hotplug
>> (or hot-onlining) is (un)used in practice?
>
> It is the logical consequence of hotplug breaking MCEs.
>
> If hotplug had been used in practice, the MCE behaviour would have come
> to light much sooner, when MCEs didn't work in practice.
>
> Given that MCEs really did work in practice even before the L1TF days,
> hotplug wasn't in common-enough use for anyone to notice the MCE
> behaviour.

Or systems where CPU hotplug was actually used were of good enough
quality to never surface #MC (personally I don't think I've seen more
than a handful of non-reproducible #MC instances)? Or people having run
into the bad behavior simply didn't have the resources to investigate
why their system shut down silently (perhaps giving an entirely random
appearance to the behavior)?

Jan
On 03/01/2020 14:34, Jan Beulich wrote:
> On 03.01.2020 15:25, Andrew Cooper wrote:
>> On 03/01/2020 13:52, Jan Beulich wrote:
>>> On 03.01.2020 14:44, Andrew Cooper wrote:
>>>> On 03/01/2020 13:36, Jan Beulich wrote:
>>>>> On 02.01.2020 15:59, Andrew Cooper wrote:
>>>>>> @@ -111,26 +109,6 @@ trampoline_protmode_entry:
>>>>>>  start64:
>>>>>>          /* Jump to high mappings. */
>>>>>>          movabs  $__high_start, %rdi
>>>>>> -
>>>>>> -#ifdef CONFIG_INDIRECT_THUNK
>>>>>> -        /*
>>>>>> -         * If booting virtualised, or hot-onlining a CPU, sibling threads can
>>>>>> -         * attempt Branch Target Injection against this jmp.
>>>>>> -         *
>>>>>> -         * We've got no usable stack so can't use a RETPOLINE thunk, and are
>>>>>> -         * further than disp32 from the high mappings so couldn't use
>>>>>> -         * JUMP_THUNK even if it was a non-RETPOLINE thunk. Furthermore, an
>>>>>> -         * LFENCE isn't necessarily safe to use at this point.
>>>>>> -         *
>>>>>> -         * As this isn't a hotpath, use a fully serialising event to reduce
>>>>>> -         * the speculation window as much as possible. %ebx needs preserving
>>>>>> -         * for __high_start.
>>>>>> -         */
>>>>>> -        mov     %ebx, %esi
>>>>>> -        cpuid
>>>>>> -        mov     %esi, %ebx
>>>>>> -#endif
>>>>>> -
>>>>>>          jmpq    *%rdi
>>>>> I can see this being unneeded when running virtualized, as you said
>>>>> in reply to Wei. However, for hot-onlining (when other CPUs may run
>>>>> random vCPU-s) I don't see how this can safely be dropped. There's
>>>>> no similar concern for S3 resume, as thaw_domains() happens only
>>>>> after enable_nonboot_cpus().
>>>> I covered that in the same reply. Any guest which can use branch target
>>>> injection against this jmp can also poison the regular branch predictor
>>>> and get at data that way.
>>> Aren't you implying then that retpolines could also be dropped?
>> No. It is a simple risk vs complexity tradeoff.
>>
>> Guests running on a sibling *can already* attack this branch with BTI,
>> because CPUID isn't a fix to bad BTB speculation, and the leakage gadget
>> need only be a single instruction.
>>
>> Such a guest can also attack Xen in general with Spectre v1.
>>
>> As I said - this was introduced because of paranoia, back while the few
>> people who knew about the issues (only several hundred at the time) were
>> attempting to figure out what exactly a speculative attack looked like,
>> and were applying duct tape to everything suspicious because we had 0
>> time to rewrite several core pieces of system handling.
> Well, okay then:
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks. I've adjusted the commit message in light of this conversation.

>>>> Once again, we get to CPU Hotplug being an unused feature in practice,
>>>> which is completely evident now with Intel MCE behaviour.
>>> What does Intel's MCE behavior have to do with whether CPU hotplug
>>> (or hot-onlining) is (un)used in practice?
>> It is the logical consequence of hotplug breaking MCEs.
>>
>> If hotplug had been used in practice, the MCE behaviour would have come
>> to light much sooner, when MCEs didn't work in practice.
>>
>> Given that MCEs really did work in practice even before the L1TF days,
>> hotplug wasn't in common-enough use for anyone to notice the MCE
>> behaviour.
> Or systems where CPU hotplug was actually used were of good
> enough quality to never surface #MC

Suffice it to say that there is plenty of evidence to the contrary here.

Without going into details for obvious reasons, there have been a number
of #MC conditions (both preexisting, and regressions) in recent
microcode discovered in the field, because everyone needs to take
microcode updates proactively these days.

> (personally I don't think
> I've seen more than a handful of non-reproducible #MC instances)?

You don't run a "cloud scale" number of systems.

Even XenServer's test pool of a few hundred systems sees a concerning
(but ultimately, background noise) rate of #MCs, some of which are
definite hardware failures (and are kept around for error testing
purposes), and others are in need of investigation.

> Or people having run into the bad behavior simply didn't have the
> resources to investigate why their system shut down silently
> (perhaps giving an entirely random appearance to the behavior)?

Customers don't tolerate their hosts randomly crashing, especially if it
happens consistently. Yes - technically speaking all of these are
options, but the balance of probability is vastly on the side of CPU
hotplug not actually being used at any scale in practice. (Not least
because there are still interrupt handling bugs present in Xen's
implementation.)

~Andrew
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index c60ebb3f00..574d1bd8f4 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -101,8 +101,6 @@ trampoline_protmode_entry:
         mov     $(X86_CR0_PG | X86_CR0_AM | X86_CR0_WP | X86_CR0_NE |\
                   X86_CR0_ET | X86_CR0_MP | X86_CR0_PE), %eax
         mov     %eax,%cr0
-        jmp     1f
-1:
 
         /* Now in compatibility mode. Long-jump into 64-bit mode. */
         ljmp    $BOOT_CS64,$bootsym_rel(start64,6)
@@ -111,26 +109,6 @@ trampoline_protmode_entry:
 start64:
         /* Jump to high mappings. */
         movabs  $__high_start, %rdi
-
-#ifdef CONFIG_INDIRECT_THUNK
-        /*
-         * If booting virtualised, or hot-onlining a CPU, sibling threads can
-         * attempt Branch Target Injection against this jmp.
-         *
-         * We've got no usable stack so can't use a RETPOLINE thunk, and are
-         * further than disp32 from the high mappings so couldn't use
-         * JUMP_THUNK even if it was a non-RETPOLINE thunk. Furthermore, an
-         * LFENCE isn't necessarily safe to use at this point.
-         *
-         * As this isn't a hotpath, use a fully serialising event to reduce
-         * the speculation window as much as possible. %ebx needs preserving
-         * for __high_start.
-         */
-        mov     %ebx, %esi
-        cpuid
-        mov     %esi, %ebx
-#endif
-
         jmpq    *%rdi
 
 #include "video.h"
The jmp after setting %cr0 is redundant with the following ljmp.

The CPUID to protect the jump to higher mappings was inserted due to an
abundance of caution/paranoia before Spectre was public. There is not a
matching protection in the S3 resume path, and there is nothing
interesting in memory at this point.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/boot/trampoline.S | 22 ----------------------
 1 file changed, 22 deletions(-)