[v7,04/21] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access

Message ID 1555536851-17462-5-git-send-email-fenghua.yu@intel.com (mailing list archive)
State New, archived
Series x86/split_lock: Enable split lock detection

Commit Message

Fenghua Yu April 17, 2019, 9:33 p.m. UTC
set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
operate on the bitmap defined in x86_capability.

Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
the location is at:
base address of x86_capability + (bit offset in x86_capability / 64) * 8

Since the base address of x86_capability may not be aligned to unsigned long,
the single unsigned long location may cross two cache lines, and
accessing the location with locked BTS/BTR instructions will cause a
split lock.

To fix the split lock issue, align x86_capability to the size of unsigned long
so that the location will always be within one cache line.

Changing x86_capability's type to unsigned long may also fix the issue
because x86_capability would then be naturally aligned to the size of
unsigned long. But this needs additional code changes, so choose the simpler
solution of setting the array's alignment to the size of unsigned long.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/processor.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

David Laight April 18, 2019, 9:20 a.m. UTC | #1
From: Fenghua Yu
> Sent: 17 April 2019 22:34
> 
> set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> operate on bitmap defined in x86_capability.
> 
> Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> the location is at:
> base address of x86_capability + (bit offset in x86_capability / 64) * 8
> 
> Since base address of x86_capability may not be aligned to unsigned long,
> the single unsigned long location may cross two cache lines and
> accessing the location by locked BTS/BTR instructions will cause
> split lock.

Isn't the problem that the type (and definition) of x86_capability[] are wrong?
If the 'bitmap' functions are used for it, it should be defined as a bitmap.
This would make it 'unsigned long' not __u32.

This type munging of bitmaps only works on LE systems.

OTOH the locked BTS/BTR instructions could be changed to use 32 bit accesses.
ISTR some of the associated functions use byte accesses.

Perhaps there ought to be asm wrappers for BTS/BTR that do 8bit and
32bit accesses.

> 
> To fix the split lock issue, align x86_capability to size of unsigned long
> so that the location will be always within one cache line.
> 
> Changing x86_capability's type to unsigned long may also fix the issue
> because x86_capability will be naturally aligned to size of unsigned long.
> But this needs additional code changes. So choose the simpler solution
> by setting the array's alignment to size of unsigned long.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/include/asm/processor.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 2bb3a648fc12..7c62b9ad6e5a 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
>  	__u32			extended_cpuid_level;
>  	/* Maximum supported CPUID level, -1=no CPUID: */
>  	int			cpuid_level;
> -	__u32			x86_capability[NCAPINTS + NBUGINTS];
> +	/* Aligned to size of unsigned long to avoid split lock in atomic ops */
> +	__u32			x86_capability[NCAPINTS + NBUGINTS]
> +				__aligned(sizeof(unsigned long));
>  	char			x86_vendor_id[16];
>  	char			x86_model_id[64];
>  	/* in KB - valid for CPUS which support this call: */
> --
> 2.19.1

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight April 18, 2019, 11:08 a.m. UTC | #2
From: David Laight
> Sent: 18 April 2019 10:21
> From: Fenghua Yu
> > Sent: 17 April 2019 22:34
> >
> > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > operate on bitmap defined in x86_capability.
> >
> > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > the location is at:
> > base address of x86_capability + (bit offset in x86_capability / 64) * 8
> >
> > Since base address of x86_capability may not be aligned to unsigned long,
> > the single unsigned long location may cross two cache lines and
> > accessing the location by locked BTS/BTR instructions will cause
> > split lock.
> 
> Isn't the problem that the type (and definition) of x86_capability[] are wrong.
> If the 'bitmap' functions are used for it, it should be defined as a bitmap.
> This would make it 'unsigned long' not __u32.
> 
> This type munging of bitmaps only works on LE systems.
> 
> OTOH the locked BTS/BTR instructions could be changed to use 32 bit accesses.
> ISTR some of the associated functions use byte accesses.
> 
> Perhaps there ought to be asm wrappers for BTS/BTR that do 8bit and
> 32bit accesses.

A quick look shows that this isn't the only __u32[] that is being
cast to (unsigned long *) and then passed to set/test/clear_bit() in those
files.

I wonder how much other code is applying such casts?

	David

Thomas Gleixner April 18, 2019, 11:49 a.m. UTC | #3
On Thu, 18 Apr 2019, David Laight wrote:
> From: David Laight
> > Sent: 18 April 2019 10:21
> > From: Fenghua Yu
> > > Sent: 17 April 2019 22:34
> > >
> > > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > > operate on bitmap defined in x86_capability.
> > >
> > > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > > the location is at:
> > > base address of x86_capability + (bit offset in x86_capability / 64) * 8
> > >
> > > Since base address of x86_capability may not be aligned to unsigned long,
> > > the single unsigned long location may cross two cache lines and
> > > accessing the location by locked BTS/BTR instructions will cause
> > > split lock.
> > 
> > Isn't the problem that the type (and definition) of x86_capability[] are wrong.
> > If the 'bitmap' functions are used for it, it should be defined as a bitmap.
> > This would make it 'unsigned long' not __u32.
> > 
> > This type munging of bitmaps only works on LE systems.
> > 
> > OTOH the locked BTS/BTR instructions could be changed to use 32 bit accesses.
> > ISTR some of the associated functions use byte accesses.
> > 
> > Perhaps there ought to be asm wrappers for BTS/BTR that do 8bit and
> > 32bit accesses.
> 
> A quick look shows that this isn't the only __u32[] that is being
> cast to (unsigned long) and then to set/test/clear_bit() in those
> files.
> 
> I wonder how much other code is applying such casts?

The reason for the cpuid stuff using u32 is that this is actually the width
of the information retrieved from CPUID.

Thanks,

	tglx
David Laight April 18, 2019, 1:14 p.m. UTC | #4
From: Thomas Gleixner
> Sent: 18 April 2019 12:49
> On Thu, 18 Apr 2019, David Laight wrote:
> > From: David Laight
> > > Sent: 18 April 2019 10:21
> > > From: Fenghua Yu
> > > > Sent: 17 April 2019 22:34
> > > >
> > > > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > > > operate on bitmap defined in x86_capability.
> > > >
> > > > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > > > the location is at:
> > > > base address of x86_capability + (bit offset in x86_capability / 64) * 8
> > > >
> > > > Since base address of x86_capability may not be aligned to unsigned long,
> > > > the single unsigned long location may cross two cache lines and
> > > > accessing the location by locked BTS/BTR instructions will cause
> > > > split lock.
> > >
> > > Isn't the problem that the type (and definition) of x86_capability[] are wrong.
> > > If the 'bitmap' functions are used for it, it should be defined as a bitmap.
> > > This would make it 'unsigned long' not __u32.
> > >
> > > This type munging of bitmaps only works on LE systems.
> > >
> > > OTOH the locked BTS/BTR instructions could be changed to use 32 bit accesses.
> > > ISTR some of the associated functions use byte accesses.
> > >
> > > Perhaps there ought to be asm wrappers for BTS/BTR that do 8bit and
> > > 32bit accesses.
> >
> > A quick look shows that this isn't the only __u32[] that is being
> > cast to (unsigned long) and then to set/test/clear_bit() in those
> > files.
> >
> > I wonder how much other code is applying such casts?
> 
> The reason for the cpuid stuff using u32 is that this is actually the width
> of the information retrieved from CPUID.

Right, but you shouldn't (as has been found out) cast a pointer to one
integer type into a pointer to another.

Running
grep -r --include '*.[ch]' '_bit([^(]*, *([^)]* ' .
over the entire kernel source tree shows quite a few 'dubious' casts.

They'll be doubly dubious on BE systems.

	David

David Laight April 18, 2019, 1:26 p.m. UTC | #5
From: David Laight
> Sent: 18 April 2019 14:15
...
> Running
> grep -r --include '*.[ch]' '_bit([^(]*, *([^)]* ' .
> over the entire kernel source tree shows quite a few 'dubious' casts.
> 
> They'll be doubly dubious on BE systems.

The alternate pattern:
grep -r --include '*.[ch]' '_bit([^(]*, *([^)]*\*)' .
has fewer false positives and detects some extra cases such as (void*)&foo->bar.

	David


Patch

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2bb3a648fc12..7c62b9ad6e5a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,9 @@  struct cpuinfo_x86 {
 	__u32			extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
 	int			cpuid_level;
-	__u32			x86_capability[NCAPINTS + NBUGINTS];
+	/* Aligned to size of unsigned long to avoid split lock in atomic ops */
+	__u32			x86_capability[NCAPINTS + NBUGINTS]
+				__aligned(sizeof(unsigned long));
 	char			x86_vendor_id[16];
 	char			x86_model_id[64];
 	/* in KB - valid for CPUS which support this call: */