Message ID: 1555536851-17462-5-git-send-email-fenghua.yu@intel.com (mailing list archive)
State: New, archived
Series: x86/split_lock: Enable split lock detection
From: Fenghua Yu
> Sent: 17 April 2019 22:34
>
> set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> operate on the bitmap defined in x86_capability.
>
> Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> the location is at:
> 	base address of x86_capability + (bit offset in x86_capability / 64) * 8
>
> Since the base address of x86_capability may not be aligned to unsigned long,
> the single unsigned long location may cross two cache lines and
> accessing the location by locked BTS/BTR instructions will cause
> a split lock.

Isn't the problem that the type (and definition) of x86_capability[] are wrong?
If the 'bitmap' functions are used for it, it should be defined as a bitmap.
This would make it 'unsigned long', not __u32.

This type munging of bitmaps only works on LE systems.

OTOH the locked BTS/BTR instructions could be changed to use 32-bit accesses.
ISTR some of the associated functions use byte accesses.

Perhaps there ought to be asm wrappers for BTS/BTR that do 8-bit and
32-bit accesses.

> To fix the split lock issue, align x86_capability to the size of unsigned long
> so that the location will always be within one cache line.
>
> Changing x86_capability's type to unsigned long may also fix the issue
> because x86_capability will be naturally aligned to the size of unsigned long.
> But this needs additional code changes. So choose the simpler solution
> of setting the array's alignment to the size of unsigned long.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/include/asm/processor.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 2bb3a648fc12..7c62b9ad6e5a 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
>  	__u32		extended_cpuid_level;
>  	/* Maximum supported CPUID level, -1=no CPUID: */
>  	int		cpuid_level;
> -	__u32		x86_capability[NCAPINTS + NBUGINTS];
> +	/* Aligned to size of unsigned long to avoid split lock in atomic ops */
> +	__u32		x86_capability[NCAPINTS + NBUGINTS]
> +			__aligned(sizeof(unsigned long));
>  	char		x86_vendor_id[16];
>  	char		x86_model_id[64];
>  	/* in KB - valid for CPUS which support this call: */
> --
> 2.19.1

--
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
From: David Laight
> Sent: 18 April 2019 10:21
> From: Fenghua Yu
> > Sent: 17 April 2019 22:34
> >
> > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > operate on the bitmap defined in x86_capability.
> >
> > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > the location is at:
> > 	base address of x86_capability + (bit offset in x86_capability / 64) * 8
> >
> > Since the base address of x86_capability may not be aligned to unsigned long,
> > the single unsigned long location may cross two cache lines and
> > accessing the location by locked BTS/BTR instructions will cause
> > a split lock.
>
> Isn't the problem that the type (and definition) of x86_capability[] are wrong?
> If the 'bitmap' functions are used for it, it should be defined as a bitmap.
> This would make it 'unsigned long', not __u32.
>
> This type munging of bitmaps only works on LE systems.
>
> OTOH the locked BTS/BTR instructions could be changed to use 32-bit accesses.
> ISTR some of the associated functions use byte accesses.
>
> Perhaps there ought to be asm wrappers for BTS/BTR that do 8-bit and
> 32-bit accesses.

A quick look shows that this isn't the only __u32[] that is being
cast to (unsigned long *) and then passed to set/test/clear_bit() in those
files.

I wonder how much other code is applying such casts?

	David
On Thu, 18 Apr 2019, David Laight wrote:
> From: David Laight
> > Sent: 18 April 2019 10:21
> > From: Fenghua Yu
> > > Sent: 17 April 2019 22:34
> > >
> > > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > > operate on the bitmap defined in x86_capability.
> > >
> > > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > > the location is at:
> > > 	base address of x86_capability + (bit offset in x86_capability / 64) * 8
> > >
> > > Since the base address of x86_capability may not be aligned to unsigned long,
> > > the single unsigned long location may cross two cache lines and
> > > accessing the location by locked BTS/BTR instructions will cause
> > > a split lock.
> >
> > Isn't the problem that the type (and definition) of x86_capability[] are wrong?
> > If the 'bitmap' functions are used for it, it should be defined as a bitmap.
> > This would make it 'unsigned long', not __u32.
> >
> > This type munging of bitmaps only works on LE systems.
> >
> > OTOH the locked BTS/BTR instructions could be changed to use 32-bit accesses.
> > ISTR some of the associated functions use byte accesses.
> >
> > Perhaps there ought to be asm wrappers for BTS/BTR that do 8-bit and
> > 32-bit accesses.
>
> A quick look shows that this isn't the only __u32[] that is being
> cast to (unsigned long *) and then passed to set/test/clear_bit() in those
> files.
>
> I wonder how much other code is applying such casts?

The reason for the cpuid stuff using u32 is that this is actually the width
of the information retrieved from CPUID.

Thanks,

	tglx
From: Thomas Gleixner
> Sent: 18 April 2019 12:49
> On Thu, 18 Apr 2019, David Laight wrote:
> > From: David Laight
> > > Sent: 18 April 2019 10:21
> > > From: Fenghua Yu
> > > > Sent: 17 April 2019 22:34
> > > >
> > > > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > > > operate on the bitmap defined in x86_capability.
> > > >
> > > > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > > > the location is at:
> > > > 	base address of x86_capability + (bit offset in x86_capability / 64) * 8
> > > >
> > > > Since the base address of x86_capability may not be aligned to unsigned long,
> > > > the single unsigned long location may cross two cache lines and
> > > > accessing the location by locked BTS/BTR instructions will cause
> > > > a split lock.
> > >
> > > Isn't the problem that the type (and definition) of x86_capability[] are wrong?
> > > If the 'bitmap' functions are used for it, it should be defined as a bitmap.
> > > This would make it 'unsigned long', not __u32.
> > >
> > > This type munging of bitmaps only works on LE systems.
> > >
> > > OTOH the locked BTS/BTR instructions could be changed to use 32-bit accesses.
> > > ISTR some of the associated functions use byte accesses.
> > >
> > > Perhaps there ought to be asm wrappers for BTS/BTR that do 8-bit and
> > > 32-bit accesses.
> >
> > A quick look shows that this isn't the only __u32[] that is being
> > cast to (unsigned long *) and then passed to set/test/clear_bit() in those
> > files.
> >
> > I wonder how much other code is applying such casts?
>
> The reason for the cpuid stuff using u32 is that this is actually the width
> of the information retrieved from CPUID.

Right, but you shouldn't (as has been found out) cast pointers between
integer types of different widths.

Running
	grep -r --include '*.[ch]' '_bit([^(]*, *([^)]* ' .
over the entire kernel source tree shows quite a few 'dubious' casts.

They'll be doubly dubious on BE systems.
	David
From: David Laight
> Sent: 18 April 2019 14:15
...
> Running
> 	grep -r --include '*.[ch]' '_bit([^(]*, *([^)]* ' .
> over the entire kernel source tree shows quite a few 'dubious' casts.
>
> They'll be doubly dubious on BE systems.

The alternate pattern:
	grep -r --include '*.[ch]' '_bit([^(]*, *([^)]*\*)' .
has fewer false positives and detects some extras with (void*)&foo->bar.

	David
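Both patterns can be tried against a toy file; the sample code below is hypothetical, written only to exercise the two regexes:

```shell
# Three call shapes: a (unsigned long *) cast, a plain pointer, and a
# (void*) cast with no space (hypothetical sample, not kernel source).
cat > sample.c <<'EOF'
set_bit(nr, (unsigned long *)c->x86_capability);
clear_bit(nr, addr);
test_bit(nr, (void*)&foo->bar);
EOF

# First pattern: a cast in the second argument, relying on a space
# inside the parenthesised type — misses the space-less (void*) cast.
grep -c '_bit([^(]*, *([^)]* ' sample.c

# Alternate pattern: requires a '*' just before the closing paren,
# so it also catches (void*)&foo->bar.
grep -c '_bit([^(]*, *([^)]*\*)' sample.c
```

The first grep matches only the (unsigned long *) line; the alternate pattern matches that line plus the (void*) one.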
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2bb3a648fc12..7c62b9ad6e5a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,9 @@ struct cpuinfo_x86 {
 	__u32		extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
 	int		cpuid_level;
-	__u32		x86_capability[NCAPINTS + NBUGINTS];
+	/* Aligned to size of unsigned long to avoid split lock in atomic ops */
+	__u32		x86_capability[NCAPINTS + NBUGINTS]
+			__aligned(sizeof(unsigned long));
 	char		x86_vendor_id[16];
 	char		x86_model_id[64];
 	/* in KB - valid for CPUS which support this call: */
set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
operate on the bitmap defined in x86_capability.

Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
the location is at:

	base address of x86_capability + (bit offset in x86_capability / 64) * 8

Since the base address of x86_capability may not be aligned to unsigned long,
the single unsigned long location may cross two cache lines and
accessing the location by locked BTS/BTR instructions will cause
a split lock.

To fix the split lock issue, align x86_capability to the size of unsigned long
so that the location will always be within one cache line.

Changing x86_capability's type to unsigned long may also fix the issue
because x86_capability will be naturally aligned to the size of unsigned long.
But this needs additional code changes. So choose the simpler solution
of setting the array's alignment to the size of unsigned long.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/processor.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)