| Submitter | Andy Lutomirski |
|---|---|
| Date | May 1, 2013, 5:46 a.m. |
| Message ID | <3cdeaedaa41e258e8aa9ca83d79a936e0b9462bc.1367385613.git.luto@amacapital.net> |
| Permalink | /patch/2507111/ |
| State | New, archived |
Comments
* Andy Lutomirski <luto@amacapital.net> wrote:

> From: Andrew Lutomirski <luto@mit.edu>
>
> Intel SDM volume 3A, 8.4.2 says:
>
>   Software can disable fast-string operation by clearing the
>   fast-string-enable bit (bit 0) of IA32_MISC_ENABLE MSR.
>   However, Intel recommends that system software always enable
>   fast-string operation.
>
> The Intel DQ67SW board (with latest BIOS) disables fast string
> operations if TXT is enabled.  A Lenovo X220 disables it regardless
> of TXT setting.  I doubt I'm the only person with a dumb BIOS like
> this.

Hm, I think we could try this. Do we know whether Windows enables it? Most laptop vendors will test/certify on Windows, so that's the 'expected' environment.

> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>
> v4 was almost two years ago, but I just noticed that this is still a problem.
> This is tested on v3.9.
>
> https://patchwork.kernel.org/patch/1073972/
>
> This is identical to v4 of this patch except that it uses wrmsrl_safe instead
> of wrmsr_safe.
>
>  arch/x86/kernel/cpu/intel.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 1905ce9..a4a3ef2 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -29,6 +29,7 @@
>  static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
>  {
>  	u64 misc_enable;
> +	bool allow_fast_string = true;
>
>  	/* Unmask CPUID levels if masked: */
>  	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
> @@ -119,10 +120,11 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
>  	 * (model 2) with the same problem.
>  	 */
>  	if (c->x86 == 15) {
> -		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> +		allow_fast_string = false;
>
> +		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
>  		if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
> -			printk(KERN_INFO "kmemcheck: Disabling fast string operations\n");
> +			printk_once(KERN_INFO "kmemcheck: Disabling fast string operations\n");
>
>  			misc_enable &= ~MSR_IA32_MISC_ENABLE_FAST_STRING;
>  			wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> @@ -131,13 +133,28 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
>  #endif
>
>  	/*
> -	 * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
> -	 * clear the fast string and enhanced fast string CPU capabilities.
> +	 * If BIOS didn't enable fast string operation, try to enable
> +	 * it ourselves.  If that fails, then clear the fast string
> +	 * and enhanced fast string CPU capabilities.
>  	 */
>  	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
>  		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> +
> +		if (allow_fast_string &&
> +		    !(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
> +			misc_enable |= MSR_IA32_MISC_ENABLE_FAST_STRING;
> +			wrmsrl_safe(MSR_IA32_MISC_ENABLE, misc_enable);
> +
> +			/* Re-read to make sure it stuck. */
> +			rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> +
> +			if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)
> +				printk_once(KERN_INFO FW_WARN "IA32_MISC_ENABLE.FAST_STRING_ENABLE was not set\n");
> +		}
> +
>  		if (!(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
> -			printk(KERN_INFO "Disabled fast string operations\n");
> +			if (allow_fast_string)
> +				printk_once(KERN_INFO "Failed to enable fast string operations\n");
>  			setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
>  			setup_clear_cpu_cap(X86_FEATURE_ERMS);

I think we should also printk if we enabled it against the BIOS setting - so that if the user sees any problems it can possibly be tracked back to this change ...

I.e. stay silent if the BIOS has it enabled already - but otherwise document our action.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

On Tue, Apr 30, 2013 at 10:46:00PM -0700, Andy Lutomirski wrote:
> From: Andrew Lutomirski <luto@mit.edu>
>
> Intel SDM volume 3A, 8.4.2 says:
>
>   Software can disable fast-string operation by clearing the
>   fast-string-enable bit (bit 0) of IA32_MISC_ENABLE MSR.
>   However, Intel recommends that system software always enable
>   fast-string operation.
>
> The Intel DQ67SW board (with latest BIOS) disables fast string
> operations if TXT is enabled.  A Lenovo X220 disables it regardless
> of TXT setting.  I doubt I'm the only person with a dumb BIOS like
> this.

Hmm, interesting. So I have an x230 and it is enabled here:

# rdmsr -x 0x000001a0
850089

It could be some fast strings erratum like AAJ6 or BD3 (they have different names for what apparently is the same erratum in different docs). Simply search for "intel fast strings erratum" and sample the first couple of pdfs to get an idea.

If this erratum is actually the case here, it has no fix according to the docs (same core in different packages :)) and it looks like OEM vendors want to be on the safe side by disabling fast strings.

So, in this case, if you force-enable it, you could risk triggering the erratum if the conditions apply (crossing a page boundary with inconsistent memory types). You could check whether the CPU revisions you have are affected by the erratum.

> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>
> v4 was almost two years ago, but I just noticed that this is still a problem.
> This is tested on v3.9.
>
> https://patchwork.kernel.org/patch/1073972/
>
> This is identical to v4 of this patch except that it uses wrmsrl_safe instead
> of wrmsr_safe.
>
>  arch/x86/kernel/cpu/intel.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 1905ce9..a4a3ef2 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -29,6 +29,7 @@
>  static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
>  {
>  	u64 misc_enable;
> +	bool allow_fast_string = true;
>
>  	/* Unmask CPUID levels if masked: */
>  	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
> @@ -119,10 +120,11 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
>  	 * (model 2) with the same problem.
>  	 */
>  	if (c->x86 == 15) {
> -		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> +		allow_fast_string = false;
>
> +		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
>  		if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
> -			printk(KERN_INFO "kmemcheck: Disabling fast string operations\n");
> +			printk_once(KERN_INFO "kmemcheck: Disabling fast string operations\n");
>
>  			misc_enable &= ~MSR_IA32_MISC_ENABLE_FAST_STRING;
>  			wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> @@ -131,13 +133,28 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
>  #endif
>
>  	/*
> -	 * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
> -	 * clear the fast string and enhanced fast string CPU capabilities.
> +	 * If BIOS didn't enable fast string operation, try to enable
> +	 * it ourselves.  If that fails, then clear the fast string
> +	 * and enhanced fast string CPU capabilities.
>  	 */
>  	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
>  		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> +
> +		if (allow_fast_string &&
> +		    !(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
> +			misc_enable |= MSR_IA32_MISC_ENABLE_FAST_STRING;
> +			wrmsrl_safe(MSR_IA32_MISC_ENABLE, misc_enable);
> +
> +			/* Re-read to make sure it stuck. */
> +			rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> +
> +			if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)
> +				printk_once(KERN_INFO FW_WARN "IA32_MISC_ENABLE.FAST_STRING_ENABLE was not set\n");

Nit: Why this printk here? You say already below that we've failed enabling fast strings.

> +		}
> +
>  		if (!(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
> -			printk(KERN_INFO "Disabled fast string operations\n");
> +			if (allow_fast_string)
> +				printk_once(KERN_INFO "Failed to enable fast string operations\n");
>  			setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
>  			setup_clear_cpu_cap(X86_FEATURE_ERMS);
>  		}
> --

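For readers following the `rdmsr` output quoted in this reply, here is a minimal sketch of how the raw value can be checked for the fast-string-enable bit. It relies only on the SDM text quoted in the patch description (bit 0 of IA32_MISC_ENABLE); the macro and helper names are hypothetical and are not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit 0 of IA32_MISC_ENABLE (MSR 0x1a0) is the fast-string-enable bit,
 * according to the SDM passage quoted in the patch description.
 * The macro name is illustrative, not the kernel's.
 */
#define SKETCH_MISC_ENABLE_FAST_STRING (UINT64_C(1) << 0)

/* Hypothetical helper: true if the raw MSR value has fast strings enabled. */
static bool fast_string_enabled(uint64_t misc_enable)
{
	return (misc_enable & SKETCH_MISC_ENABLE_FAST_STRING) != 0;
}
```

Applied to the value reported above, `fast_string_enabled(0x850089)` is true because the lowest hex digit, 9, has bit 0 set, which matches the statement that fast strings are enabled on that machine.
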
On Wed, May 01, 2013 at 01:33:52PM +0200, Borislav Petkov wrote:
> It could be some fast strings erratum like AAJ6 or BD3 (they have
> different names for what apparently is the same erratum in different
> docs). Simply search for "intel fast strings erratum" and sample the
> first couple of pdfs to get an idea.

This erratum does seem pretty scary:

Problem: Under certain conditions as described in the Software Developers Manual section "Out-of-Order Stores For String Operations in Pentium 4, Intel Xeon, and P6 Family Processors" the processor performs REP MOVS or REP STOS as fast strings. Due to this erratum fast string REP MOVS/REP STOS instructions that cross page boundaries from WB/WC memory types to UC/WP/WT memory types, may start using an incorrect data size or may observe memory ordering violations.

Implication: Upon crossing the page boundary the following may occur, dependent on the new page memory type:

* UC the data size of each write will now always be 8 bytes, as opposed to the original data size.
* WP the data size of each write will now always be 8 bytes, as opposed to the original data size and there may be a memory ordering violation.
* WT there may be a memory ordering violation.

In fact, there is the question of whether we should be checking to see if the CPU stepping is one of the ones with the bug, and if so, to have Linux disable fast strings even if the BIOS didn't, instead of blindly enabling fast strings....

	- Ted

On 05/01/2013 09:34 AM, Theodore Ts'o wrote:
>
> In fact, there is the question of whether we should be checking to see
> if the CPU stepping is one of the ones with the bug, and if so, to
> have Linux disable fast strings even if the BIOS didn't, instead of
> blindly enabling fast strings....
>

The erratum reads seriously, but it only affects crossings between pages of different page types, which is rare in itself. WT and WP are not even used in Linux; in the UC case we end up doing 8-byte stores instead of the proper size, which is wrong, but for the case where the user is malicious the user could just do that directly, and it seems extremely hard to envision a scenario where someone would do that intentionally.

	-hpa

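The first precondition of the erratum, a string operation that crosses a page boundary, can be sketched as a simple address check. This is only an illustration: it assumes 4 KiB pages, the names are hypothetical, and it says nothing about the second precondition (the two pages having different memory types), which cannot be derived from addresses alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: true if the byte range [addr, addr + len) touches
 * more than one page. Address wrap-around is not handled.
 */
static bool crosses_page_boundary(uintptr_t addr, size_t len)
{
	if (len == 0)
		return false;

	/* Compare the page number of the first and the last byte. */
	return (addr / SKETCH_PAGE_SIZE) !=
	       ((addr + len - 1) / SKETCH_PAGE_SIZE);
}
```

A 16-byte copy starting at 0x1ff8 crosses from one page into the next, while a full 4096-byte copy starting at 0x1000 stays within a single page.
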
Theodore Ts'o <tytso@mit.edu> writes:
>
> In fact, there is the question of whether we should be checking to see
> if the CPU stepping is one of the ones with the bug, and if so, to
> have Linux disable fast strings even if the BIOS didn't, instead of
> blindly enabling fast strings....

Crossing pages with different memory attributes shouldn't happen in normal operation. I wouldn't be too concerned about that one.

I would suggest to leave the bit alone. There may be valid reasons on the system to either set or not set it, which the kernel doesn't necessarily know.

-Andi

On Wed, May 1, 2013 at 9:42 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/01/2013 09:34 AM, Theodore Ts'o wrote:
>>
>> In fact, there is the question of whether we should be checking to see
>> if the CPU stepping is one of the ones with the bug, and if so, to
>> have Linux disable fast strings even if the BIOS didn't, instead of
>> blindly enabling fast strings....
>>
>
> The erratum reads seriously, but it only affects crossings between pages
> of different page types, which is rare in itself.  WT and WP are not
> even used in Linux; the UC case we end up doing 8-byte stores instead of
> the proper size, which is wrong, but for the case where the user is
> malicious the user could just do that directly, and it seems extremely
> hard to envision a scenario where someone would do that intentionally.

(Just my luck. I'm currently trying to implement WT via PAT by stealing a slot from either UC or UC-.)

There's already a warning in the Intel system programming manual:

  11.5.2.3 Writing Values Across Pages with Different Memory Types

  If two adjoining pages in memory have different memory types, and a word or longer operand is written to a memory location that crosses the page boundary between those two pages, the operand might be written to memory twice. This action does not present a problem for writes to actual memory; however, if a device is mapped the memory space assigned to the pages, the device might malfunction.

Is there any code that memcpys across memory types and expects any particularly sensible behavior out of it?

I'll try to see what Windows is doing.

From my cursory reading of the errata documents, this affects basically all CPUs -- it doesn't seem to have been fixed in any revision of anything. So this erratum doesn't seem to explain why different BIOSes would do different things.

--Andy

P.S. The printk is in the right place in the patch, but the text is misleading. I'll fix it if there's a v6.

>
>         -hpa
>

On Wed, May 01, 2013 at 09:42:30AM -0700, H. Peter Anvin wrote:
> The erratum reads seriously, but it only affects crossings between pages
> of different page types, which is rare in itself.  WT and WP are not
> even used in Linux; the UC case we end up doing 8-byte stores instead of
> the proper size, which is wrong, but for the case where the user is
> malicious the user could just do that directly, and it seems extremely
> hard to envision a scenario where someone would do that intentionally.

Yeah, I wasn't so much worried about a malicious user as much as a situation where you're trying to debug a mysterious and hard-to-reproduce failure, start tearing your hair out, and wondering whether you're going insane or the compiler hates you and is out to get you and you start staring at assembly code to try to figure out how some piece of memory got mysteriously corrupted....

	- Ted

On 05/01/2013 09:54 AM, Andy Lutomirski wrote:
>
> (Just my luck.  I'm currently trying to implement WT via PAT by
> stealing a slot from either UC or UC-.)
>

NAK on that. Use a slot in the upper half, perhaps (we already blacklist the CPUs for which the upper half aren't usable.)

What do you want WT for, anyway?

> Is there any code that memcpys across memory types and expects any
> particularly sensible behavior out of it?

Unlikely (see my post.)

	-hpa

On 05/01/2013 10:20 AM, Theodore Ts'o wrote:
> On Wed, May 01, 2013 at 09:42:30AM -0700, H. Peter Anvin wrote:
>> The erratum reads seriously, but it only affects crossings between pages
>> of different page types, which is rare in itself.  WT and WP are not
>> even used in Linux; the UC case we end up doing 8-byte stores instead of
>> the proper size, which is wrong, but for the case where the user is
>> malicious the user could just do that directly, and it seems extremely
>> hard to envision a scenario where someone would do that intentionally.
>
> Yeah, I wasn't so much worried about a malicious user as much as a
> situation where the you're trying to debug a mysterious and
> hard-to-reproduce failure, start tearing your hair out, and wondering
> whether you're going insane or the compiler hates you and is out to
> get you and you start staring at assembly code to try to figure out
> how some piece of memory got mysteriously corrupted....
>

If you are crossing pages with different memory types, the fact that the sizes being written are wrong is probably the least of your problems.

	-hpa

On Wed, May 1, 2013 at 10:35 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/01/2013 09:54 AM, Andy Lutomirski wrote:
>>
>> (Just my luck.  I'm currently trying to implement WT via PAT by
>> stealing a slot from either UC or UC-.)
>>
>
> NAK on that.  Use a slot in the upper half, perhaps (we already
> blacklist the CPUs for which the upper half aren't usable.)

Isn't the upper half incompatible with large pages?

Why the NAK? Unless I've misread the spec, UC and UC- are only different if there's a WC MTRR set, and I haven't found anything in the kernel yet that adds a WC MTRR that actually needs that MTRR if PAT is enabled. I've made my way about half-way through the mtrr_add calls so far. (The drivers that use MTRRs are graphics devices, ivtv, fusion MPT, myri10ge, and infiniband.)

>
> What do you want WT for, anyway?

Generically, memory regions in which writes have side effects but reads are just reads and should be cached.

In particular, persistent (i.e. nonvolatile) memory. There's an NDA involved, but I can safely say (at least): there seem to be nifty devices that aren't quite RAM that are nonetheless presented to the system as RAM. Writes are durable, but only if they make it out of cache before power fails or the CPU resets in such a way that caches are invalidated but not written back. UC and WC are a bit heavy-handed because read caching is fine. (PowerPC has nice instructions for things like "write this back now", but x86 seems to be missing any way other than WT to force data out to RAM without invalidating the cache line.)

Making this work with a WT MTRR is probably doable, but it's IMO rather ugly. Even if I go that route, I'd still want to convince graphics drivers to stop wasting MTRRs, since they don't need them and they tend to be in short supply.

Here's an example:

http://www.tomshardware.com/news/Viking-ArxCis-NV-NVDIMM-RAM,21892.html

--Andy

On 05/01/2013 10:50 AM, Andy Lutomirski wrote:
>
> Isn't the upper half incompatible with large pages?
>

No, just with attributes *on the page tables themselves*.

	-hpa

On Wed, May 1, 2013 at 10:54 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/01/2013 10:50 AM, Andy Lutomirski wrote:
>>
>> Isn't the upper half incompatible with large pages?
>>
>
> No, just with attributes *on the page tables themselves*.

Thanks :) Now I found the somewhat alarming algorithm in section 4.9.2.

This will be a bit unpleasant, though, since the _PAGE_CACHE_xxx macros will become rather confused. I suppose there's no fundamental reason that pgprot_t has to correspond to pmd bit positions. Sigh.

--Andy

>
>         -hpa
>

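As background for the "upper half" discussion, the following sketch shows how a PAT slot index is formed from the PAT, PCD and PWT bits of a paging entry, and why large pages complicate things: in a 4 KiB page-table entry the PAT bit is bit 7, whereas in an entry that maps a large page bit 7 is the page-size bit and PAT moves to bit 12. The bit positions follow the Intel SDM; the macro and function names are hypothetical and are not the kernel's `_PAGE_*` definitions.

```c
#include <stdint.h>

/* Bit positions shared by 4 KiB PTEs and large-page entries. */
#define SKETCH_PWT        (UINT64_C(1) << 3)
#define SKETCH_PCD        (UINT64_C(1) << 4)
/* PAT bit in a 4 KiB page-table entry. */
#define SKETCH_PAT_4K     (UINT64_C(1) << 7)
/* PAT bit in an entry mapping a large page (bit 7 is PS there). */
#define SKETCH_PAT_LARGE  (UINT64_C(1) << 12)

/* Combine the three attribute bits into a PAT slot index 0..7. */
static unsigned int pat_index(int pat, int pcd, int pwt)
{
	return ((unsigned int)!!pat << 2) |
	       ((unsigned int)!!pcd << 1) |
	       (unsigned int)!!pwt;
}

/* PAT slot selected by a 4 KiB page-table entry. */
static unsigned int pat_index_4k(uint64_t pte)
{
	return pat_index((pte & SKETCH_PAT_4K) != 0,
			 (pte & SKETCH_PCD) != 0,
			 (pte & SKETCH_PWT) != 0);
}

/* PAT slot selected by an entry that maps a large page. */
static unsigned int pat_index_large(uint64_t pde)
{
	return pat_index((pde & SKETCH_PAT_LARGE) != 0,
			 (pde & SKETCH_PCD) != 0,
			 (pde & SKETCH_PWT) != 0);
}
```

Slots 4 to 7, the upper half, are the ones that need the PAT bit set. Because that bit sits at a different position for 4 KiB and large-page entries, a single set of cache-attribute bit masks cannot describe both cases, which appears to be the awkwardness Andy alludes to for the `_PAGE_CACHE_xxx` macros.
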
* Andy Lutomirski <luto@amacapital.net> wrote:

> > What do you want WT for, anyway?
>
> Generically, memory regions in which writes have side effects but reads
> are just reads and should be cached.
>
> In particular, persistent (i.e. nonvolatile) memory.  There's an NDA
> involved, but I can safely say (at least): there seem to be nifty
> devices that aren't quite RAM that are nonetheless presented to the
> system as RAM.  Writes are durable, but only if they make it out of cache
> before power fails or the CPU resets in such a way that caches are
> invalidated but not written back.  UC and WC are a bit heavy-handed
> because read caching is fine.  (PowerPC has nice instructions for things
> like "write this back now", but x86 seems to be missing any way other
> than WT to force data out to RAM without invalidating the cache line.)
>
> Making this work with a WT MTRR is probably doable, but it's IMO rather
> ugly.  Even if I go that route, I'd still want to convince graphics
> drivers to stop wasting MTRRs, since they don't need them and they tend
> to be in short supply.
>
> Here's an example:
>
> http://www.tomshardware.com/news/Viking-ArxCis-NV-NVDIMM-RAM,21892.html

This looks potentially useful.

I'd consider your cache-attributes review and cleanups to drivers and infrastructure to be the main upstream benefit we win from your effort.

So as long as your patches go in that general direction, and the PAT code and its usage gets cleaner and more organized, and there's no showstopper issue discovered, the fact that you gain ioremap_wt() for your driver is mostly just a happy coincidence that we don't mind that much.

Maybe in the end we'd have to hide it behind some sort of CONFIG_COMPAT_PAT trigger and turn it off on old/buggy systems - but in the first approximation it would be nice to try and make this just a single variant with no Kconfig complexity?

Thanks,

	Ingo

Patch
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 1905ce9..a4a3ef2 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -29,6 +29,7 @@
 static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
+	bool allow_fast_string = true;
 
 	/* Unmask CPUID levels if masked: */
 	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
@@ -119,10 +120,11 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 	 * (model 2) with the same problem.
 	 */
 	if (c->x86 == 15) {
-		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+		allow_fast_string = false;
 
+		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
 		if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
-			printk(KERN_INFO "kmemcheck: Disabling fast string operations\n");
+			printk_once(KERN_INFO "kmemcheck: Disabling fast string operations\n");
 
 			misc_enable &= ~MSR_IA32_MISC_ENABLE_FAST_STRING;
 			wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
@@ -131,13 +133,28 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 #endif
 
 	/*
-	 * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
-	 * clear the fast string and enhanced fast string CPU capabilities.
+	 * If BIOS didn't enable fast string operation, try to enable
+	 * it ourselves.  If that fails, then clear the fast string
+	 * and enhanced fast string CPU capabilities.
 	 */
 	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
 		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+
+		if (allow_fast_string &&
+		    !(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
+			misc_enable |= MSR_IA32_MISC_ENABLE_FAST_STRING;
+			wrmsrl_safe(MSR_IA32_MISC_ENABLE, misc_enable);
+
+			/* Re-read to make sure it stuck. */
+			rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+
+			if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)
+				printk_once(KERN_INFO FW_WARN "IA32_MISC_ENABLE.FAST_STRING_ENABLE was not set\n");
+		}
+
 		if (!(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
-			printk(KERN_INFO "Disabled fast string operations\n");
+			if (allow_fast_string)
+				printk_once(KERN_INFO "Failed to enable fast string operations\n");
 			setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
 			setup_clear_cpu_cap(X86_FEATURE_ERMS);
 		}
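
The control flow of the last hunk can be hard to follow in diff form, so here is a small userspace model of it. This is a sketch, not kernel code: the MSR is replaced by a plain integer, the `write_sticks` parameter is an assumption standing in for whether the hardware accepts the `wrmsrl_safe()` write, and the enum and function names are hypothetical. Each outcome is annotated with the message the patch as posted would print.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of IA32_MISC_ENABLE; name is illustrative. */
#define SKETCH_FAST_STRING (UINT64_C(1) << 0)

enum fs_outcome {
	FS_BIOS_ENABLED,   /* bit already set: no message, caps kept */
	FS_KERNEL_ENABLED, /* bit was clear, write stuck: FW_WARN message */
	FS_ENABLE_FAILED,  /* write did not stick: "Failed to enable", caps cleared */
	FS_KEPT_DISABLED   /* allow_fast_string false: caps cleared silently */
};

/* Model of the final block of early_init_intel() after the patch. */
static enum fs_outcome model_fast_string(uint64_t misc_enable,
					 bool allow_fast_string,
					 bool write_sticks)
{
	if (allow_fast_string && !(misc_enable & SKETCH_FAST_STRING)) {
		/* Stands in for wrmsrl_safe() followed by the re-read. */
		if (write_sticks)
			misc_enable |= SKETCH_FAST_STRING;

		if (misc_enable & SKETCH_FAST_STRING)
			return FS_KERNEL_ENABLED;
	}

	if (!(misc_enable & SKETCH_FAST_STRING))
		return allow_fast_string ? FS_ENABLE_FAILED : FS_KEPT_DISABLED;

	return FS_BIOS_ENABLED;
}
```

In this model the x230 value from the thread, 0x850089, yields `FS_BIOS_ENABLED`, while the same value with bit 0 cleared yields `FS_KERNEL_ENABLED` or `FS_ENABLE_FAILED` depending on whether the write takes effect.
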