Message ID: 20191101202502.31750-2-andrew.cooper3@citrix.com (mailing list archive)
State:      New, archived
Series:     Fix PV shim ballooning problems
On 01.11.2019 21:25, Andrew Cooper wrote:
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>  	if (disable) {
>  		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
>  		bootsym(trampoline_misc_enable_off) |= disable;
> +		bootsym(trampoline_efer) |= EFER_NX;
>  	}

I'm fine with all other changes here, just this one concerns me:
Before your change we latch a value into trampoline_misc_enable_off
just to be used for subsequent IA32_MISC_ENABLE writes we do. This
means that, on a hypervisor (like Xen itself) simply discarding
guest writes to the MSR (which is "fine" with the described usage
of ours up to now), with your change we'd now end up trying to set
EFER.NX when the feature may not actually be enabled in
IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)?
I.e. don't we need to read back the MSR value here, and verify
we actually managed to clear the bit before actually OR-ing in
EFER_NX?

Jan
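[Editorial note: the read-back check Jan is asking for can be modelled as a small pure function. This is a hypothetical sketch for illustration only, not code from the posted patch; the helper name and the idea of returning the EFER bits to OR in are invented here.]

```c
#include <stdint.h>

#define MSR_IA32_MISC_ENABLE_XD_DISABLE (1ULL << 34)
#define EFER_NX                         (1U  << 11)

/*
 * Hypothetical helper: given the IA32_MISC_ENABLE value read back
 * after attempting to clear XD_DISABLE, decide whether it is safe to
 * OR EFER_NX into trampoline_efer.  If a hypervisor silently discarded
 * the MSR write, XD_DISABLE is still set and EFER.NX must not be used.
 */
uint32_t efer_nx_if_xd_cleared(uint64_t misc_enable_readback)
{
    if (misc_enable_readback & MSR_IA32_MISC_ENABLE_XD_DISABLE)
        return 0;        /* clear didn't take effect - leave NX alone */
    return EFER_NX;      /* clear worked - NX is usable */
}
```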
On 04/11/2019 13:25, Jan Beulich wrote:
> On 01.11.2019 21:25, Andrew Cooper wrote:
>> [...]
>> +		bootsym(trampoline_efer) |= EFER_NX;
>>  	}
> I'm fine with all other changes here, just this one concerns me:
> [...]
> Wouldn't such an EFER write be liable to #GP(0)?
> I.e. don't we need to read back the MSR value here, and verify
> we actually managed to clear the bit before actually OR-ing in
> EFER_NX?

If this is a problem in practice, execution won't continue beyond the
next if() condition just out of context, which sets EFER.NX on the BSP
with an unguarded WRMSR.

~Andrew
On 04.11.2019 15:59, Andrew Cooper wrote:
> On 04/11/2019 13:25, Jan Beulich wrote:
>> [...]
>> I.e. don't we need to read back the MSR value here, and verify
>> we actually managed to clear the bit before actually OR-ing in
>> EFER_NX?
>
> If this is a problem in practice, execution won't continue beyond the
> next if() condition just out of context, which set EFER.NX on the BSP
> with an unguarded WRMSR.

And how is this any good? This kind of crash is exactly what I'm
asking to avoid.

Jan
On 04/11/2019 15:03, Jan Beulich wrote:
> [...]
> And how is this any good? This kind of crash is exactly what I'm
> asking to avoid.

What is the point of working around a theoretical edge case of broken
nesting under Xen which demonstrably doesn't exist in practice?

~Andrew
On 04.11.2019 16:22, Andrew Cooper wrote:
> [...]
> What is the point of working around a theoretical edge case of broken
> nesting under Xen which demonstrably doesn't exist in practice?

Well, so far nothing was said about this not being an actual problem.
I simply don't know whether hardware would refuse such an EFER write.
If it does, it would be appropriate for hypervisors to also refuse
it. I.e. if we don't do so right now, correcting the behavior would
trip the code here.

Jan
On 04/11/2019 15:31, Jan Beulich wrote:
> [...]
> Well, so far nothing was said about this not being an actual problem.

It's not an actual problem.  If it were, we would have had crash reports.

> I simply don't know whether hardware would refuse such an EFER write.

I've just experimented - writing EFER.NX takes a #GP fault when
MISC_ENABLE.XD is set.

> If it does, it would be appropriate for hypervisors to also refuse
> it. I.e. if we don't do so right now, correcting the behavior would
> trip the code here.

MISC_ENABLES.XD is architectural on any Intel system which enumerates
NX, and if the bit is set, it can be cleared.  (Although the semantics
described in the SDM are broken.  It is only available if NX is
enumerated, which is obfuscated by setting XD.)

However, no hypervisor is going to bother virtualising this
functionality.  Either configure the VM with NX or without.  (KVM for
example doesn't virtualise MISC_ENABLES at all.)

There is one corner case on out-of-support versions of Xen (which don't
clear XD themselves) where XD would leak through and be ignored, after
which Xen will take a #GP fault trying to set EFER.NX, but I am still
firmly of the opinion that it is not worth putting in a workaround for
an obsolete issue which doesn't exist in practice.

~Andrew
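[Editorial note: the hardware behaviour Andrew reports can be captured as a tiny model of the fault condition a WRMSR(EFER) would hit. This is a simplified illustration, not real Xen or hardware code; the function name is invented.]

```c
#include <stdint.h>
#include <stdbool.h>

#define MSR_IA32_MISC_ENABLE_XD_DISABLE (1ULL << 34)
#define EFER_NX                         (1ULL << 11)

/*
 * Simplified model of the observed behaviour: while
 * MISC_ENABLE.XD_DISABLE is set, NX is hidden from CPUID, so EFER.NX
 * behaves as a reserved bit and setting it takes #GP(0).
 * Returns true if the EFER write would fault.
 */
bool efer_write_faults(uint64_t misc_enable, uint64_t new_efer)
{
    bool nx_hidden = misc_enable & MSR_IA32_MISC_ENABLE_XD_DISABLE;

    return nx_hidden && (new_efer & EFER_NX);
}
```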
On 12.11.2019 17:09, Andrew Cooper wrote:
> [...]
> It's not an actual problem.  If it were, we would have had crash reports.
>
>> I simply don't know whether hardware would refuse such an EFER write.
>
> I've just experimented - writing EFER.NX takes a #GP fault when
> MISC_ENABLE.XD is set.
>
>> If it does, it would be appropriate for hypervisors to also refuse
>> it. I.e. if we don't do so right now, correcting the behavior would
>> trip the code here.
>
> MISC_ENABLES.XD is architectural on any Intel system which enumerates
> NX, and if the bit is set, it can be cleared. [...]
>
> However, no hypervisor is going to bother virtualising this
> functionality.  Either configure the VM with NX or without.  (KVM for
> example doesn't virtualise MISC_ENABLES at all.)

I'm sorry, but I still don't follow: You say "if the bit is set, it
can be cleared", which is clearly not in line with our current guest
MSR write handling. It just so happens that we have no command line
option allowing to suppress the clearing of XD. If we had, according
to your findings above we'd run into a #GP upon trying to set NX.
How can you easily exclude another hypervisor actually doing so (and
nobody having run into the issue simply because the option is rarely
used)?

Btw - all would be fine if the code in question was guarded by an
NX feature check, but as you say that's not possible because XD set
forces NX clear. However, our setting of EFER.NX could be guarded
this way, as we _expect_ XD to be clear at that point.

Jan
On 12/11/2019 17:15, Jan Beulich wrote:
> [...]
> I'm sorry, but I still don't follow: You say "if the bit is set, it
> can be cleared", which is clearly not in line with our current guest
> MSR write handling.

Yes - Xen's MSR handling is broken, but you snipped that part of my reply.

> It just so happens that we have no command line
> option allowing to suppress the clearing of XD.

Nor does Linux.  As to the other hypervisors...

> If we had, according
> to your findings above we'd run into a #GP upon trying to set NX.

Yes.

> How can you easily exclude another hypervisor actually doing so (and
> nobody having run into the issue simply because the option is rarely
> used)?

... observe that they require NX support as a prerequisite to install.
You will not find a system with XD set these days.

> Btw - all would be fine if the code in question was guarded by an
> NX feature check, but as you say that's not possible because XD set
> forces NX clear. However, our setting of EFER.NX could be guarded
> this way, as we _expect_ XD to be clear at that point.

XD was clearly never designed for the OS to find and turn off, but NX
functionality is simply too important to let misconfigured firmware get
in the way of using.

The long and the short of it is that this patch does not change Xen's
behaviour WRT poorly virtualised XD.

I am not convinced the behaviour is worth changing, and I don't have
time for this scope creep.  If you want to submit a fix then go ahead,
but patch 3 of this series is important to get in for 4.13.

~Andrew
On 13.11.2019 14:22, Andrew Cooper wrote:
> I am not convinced the behaviour is worth changing, and I don't have
> time for this scope creep.

There's no scope creep here at all. All I'm asking for is that you
don't blindly OR EFER_NX into trampoline_efer, but rather check that
it will be possible to successfully set it after the MISC_ENABLE
write (by reading back the value, or by reading CPUID[0x80000001].NX
again).

Jan
On 13/11/2019 13:29, Jan Beulich wrote:
> On 13.11.2019 14:22, Andrew Cooper wrote:
>> I am not convinced the behaviour is worth changing, and I don't have
>> time for this scope creep.
> There's no scope creep here at all.

Yes - it really is scope creep.

This patch does not change the behaviour of Xen in the case of poor
virtualisation of the bit.  Xen will still crash either way.

I have explained, repeatedly now, why I am not inclined to fix this.  It
is a bug which doesn't exist in practice.

You are welcome to fix this yourself, in a separate change to match the
separate scope, when you are not blocking a 4.13 fix with your request.

~Andrew
On 19.11.2019 16:15, Andrew Cooper wrote:
> [...]
> This patch does not change the behaviour of Xen in the case of poor
> virtualisation of the bit. Xen will still crash either way.

So I have to apologize. What I didn't notice is

	if (disable & MSR_IA32_MISC_ENABLE_XD_DISABLE) {
		write_efer(read_efer() | EFER_NX);
		printk(KERN_INFO
		       "re-enabled NX (Execute Disable) protection\n");
	}

in early_init_intel(). I simply didn't expect we'd already have
such a blind EFER write. I therefore agree now that this is a
pre-existing bug that you don't make any worse.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> I have explained, repeatedly now, why I am not inclined to fix this. It
> is a bug which doesn't exist in practice.

I should have been looking more closely; the lack of sufficient context
did misguide me.

Jan
On 19/11/2019 16:44, Jan Beulich wrote:
> [...]
> I therefore agree now that this is a
> pre-existing bug that you don't make any worse.
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thank you.  I'll get this series in now, along with some other
straggling 4.13 content.

~Andrew
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index a1564b520b..77309e3c82 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -640,8 +640,13 @@ trampoline_setup:
         jbe     1f
         mov     $0x80000001,%eax
         cpuid
-1:      mov     %edx,sym_fs(cpuid_ext_features)
-        mov     %edx,sym_fs(boot_cpu_data)+CPUINFO_FEATURE_OFFSET(X86_FEATURE_LM)
+1:      mov     %edx, sym_fs(boot_cpu_data) + CPUINFO_FEATURE_OFFSET(X86_FEATURE_LM)
+
+        /* Check for NX. Adjust EFER setting if available. */
+        bt      $cpufeat_bit(X86_FEATURE_NX), %edx
+        jnc     1f
+        orb     $EFER_NX >> 8, 1 + sym_esi(trampoline_efer)
+1:
 
         /* Check for availability of long mode. */
         bt      $cpufeat_bit(X86_FEATURE_LM),%edx
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 870ec79a2d..26584493bb 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -88,8 +88,9 @@ trampoline_gdt:
 GLOBAL(trampoline_misc_enable_off)
         .quad   0
 
-GLOBAL(cpuid_ext_features)
-        .long   0
+/* EFER OR-mask for boot paths.  This gets adjusted with NX when available. */
+GLOBAL(trampoline_efer)
+        .long   EFER_LME | EFER_SCE
 
 GLOBAL(trampoline_xen_phys_start)
         .long   0
@@ -132,14 +133,10 @@ trampoline_protmode_entry:
 1:
 
         /* Set up EFER (Extended Feature Enable Register). */
-        mov     bootsym_rel(cpuid_ext_features,4,%edi)
         movl    $MSR_EFER,%ecx
         rdmsr
-        or      $EFER_LME|EFER_SCE,%eax   /* Long Mode + SYSCALL/SYSRET */
-        bt      $cpufeat_bit(X86_FEATURE_NX),%edi /* No Execute? */
-        jnc     1f
-        btsl    $_EFER_NX,%eax            /* No Execute */
-1:      wrmsr
+        or      bootsym_rel(trampoline_efer, 4, %eax)
+        wrmsr
 
         mov     $(X86_CR0_PG | X86_CR0_AM | X86_CR0_WP | X86_CR0_NE |\
                   X86_CR0_ET | X86_CR0_MP | X86_CR0_PE), %eax
diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
index 25ec2fa32b..fc47721f43 100644
--- a/xen/arch/x86/boot/wakeup.S
+++ b/xen/arch/x86/boot/wakeup.S
@@ -131,20 +131,11 @@ wakeup_32:
         wrmsr
 1:
 
-        /* Will cpuid feature change after resume? */
         /* Set up EFER (Extended Feature Enable Register). */
-        mov     bootsym_rel(cpuid_ext_features,4,%edi)
-        test    $0x20100800,%edi /* SYSCALL/SYSRET, No Execute, Long Mode? */
-        jz      .Lskip_eferw
         movl    $MSR_EFER,%ecx
         rdmsr
-        btsl    $_EFER_LME,%eax /* Long Mode */
-        btsl    $_EFER_SCE,%eax /* SYSCALL/SYSRET */
-        btl     $20,%edi        /* No Execute? */
-        jnc     1f
-        btsl    $_EFER_NX,%eax  /* No Execute */
-1:      wrmsr
-.Lskip_eferw:
+        or      bootsym_rel(trampoline_efer, 4, %eax)
+        wrmsr
 
         wbinvd
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 6c6bd63301..e5ad17d8d9 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -391,9 +391,6 @@ static void generic_identify(struct cpuinfo_x86 *c)
 		cpuid(0x80000001, &tmp, &tmp,
 		      &c->x86_capability[cpufeat_word(X86_FEATURE_LAHF_LM)],
 		      &c->x86_capability[cpufeat_word(X86_FEATURE_SYSCALL)]);
-	if (c == &boot_cpu_data)
-		bootsym(cpuid_ext_features) =
-			c->x86_capability[cpufeat_word(X86_FEATURE_NX)];
 
 	if (c->extended_cpuid_level >= 0x80000004)
 		get_model_name(c); /* Default name */
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 5356a6ae10..4d7324e4d0 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (disable) {
 		wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
 		bootsym(trampoline_misc_enable_off) |= disable;
+		bootsym(trampoline_efer) |= EFER_NX;
 	}
 
 	if (disable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID)
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 940ce12706..cde193a771 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -238,7 +238,7 @@ static void __init noreturn efi_arch_post_exit_boot(void)
     asm volatile("pushq $0\n\tpopfq");
     rdmsrl(MSR_EFER, efer);
     efer |= EFER_SCE;
-    if ( cpuid_ext_features & cpufeat_mask(X86_FEATURE_NX) )
+    if ( cpu_has_nx )
         efer |= EFER_NX;
     wrmsrl(MSR_EFER, efer);
     write_cr0(X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP |
@@ -640,9 +640,11 @@ static void __init efi_arch_cpu(void)
 
     if ( (eax >> 16) == 0x8000 && eax > 0x80000000 )
     {
-        cpuid_ext_features = cpuid_edx(0x80000001);
         boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_SYSCALL)]
-            = cpuid_ext_features;
+            = cpuid_edx(0x80000001);
+
+        if ( cpu_has_nx )
+            trampoline_efer |= EFER_NX;
     }
 }
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index b686156ea0..45d8f5117e 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -151,7 +151,7 @@ extern void ctxt_switch_levelling(const struct vcpu *next);
 extern void (*ctxt_switch_masking)(const struct vcpu *next);
 
 extern bool_t opt_cpu_info;
-extern u32 cpuid_ext_features;
+extern u32 trampoline_efer;
 
 extern u64 trampoline_misc_enable_off;
 
 /* Maximum width of physical addresses supported by the hardware. */
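[Editorial note: the pattern the patch introduces — build an EFER OR-mask once at boot, then apply it blindly on every AP boot and S3 resume path — can be sketched in C. This is a hypothetical illustration of the design; the real code lives in assembly and uses Xen's bootsym machinery, and the function names here are invented.]

```c
#include <stdint.h>

#define EFER_SCE (1U << 0)   /* SYSCALL/SYSRET */
#define EFER_LME (1U << 8)   /* Long Mode Enable */
#define EFER_NX  (1U << 11)  /* No Execute */

/* Mirrors trampoline_efer: starts with the bits every boot path wants. */
uint32_t trampoline_efer = EFER_LME | EFER_SCE;

/* Done once, at the point the main copy of CPUID data is stashed. */
void note_nx_available(int cpu_has_nx)
{
    if (cpu_has_nx)
        trampoline_efer |= EFER_NX;
}

/* Every AP boot / S3 resume path then just ORs the mask into EFER. */
uint64_t apply_trampoline_efer(uint64_t efer)
{
    return efer | trampoline_efer;
}
```

The point of the design is that feature detection happens exactly once, and the hot boot paths need no conditional logic at all.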
We have a cached cpuid_ext_features in the trampoline which is kept in
sync by various pieces of boot logic.  This is complicated, and all it
is actually used for is to derive whether NX is safe to use.

Replace it with a canned value to load into EFER.  trampoline_setup()
and efi_arch_cpu() now tweak trampoline_efer at the point that they are
stashing the main copy of CPUID data.  Similarly, early_init_intel()
needs to tweak it if it has re-enabled the use of NX.

This simplifies the AP boot and S3 resume paths by using
trampoline_efer directly, rather than locally turning FEATURE_NX into
EFER_NX.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Sergey Dyasli <sergey.dyasli@citrix.com>
CC: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/boot/head.S        |  9 +++++++--
 xen/arch/x86/boot/trampoline.S  | 13 +++++--------
 xen/arch/x86/boot/wakeup.S      | 13 ++-----------
 xen/arch/x86/cpu/common.c       |  3 ---
 xen/arch/x86/cpu/intel.c        |  1 +
 xen/arch/x86/efi/efi-boot.h     |  8 +++++---
 xen/include/asm-x86/processor.h |  2 +-
 7 files changed, 21 insertions(+), 28 deletions(-)