[v2,01/10] x86/mtrr: fix MTRR fixup on APs

Message ID 20220820092533.29420-2-jgross@suse.com (mailing list archive)
State Superseded
Series x86: make pat and mtrr independent from each other

Commit Message

Jürgen Groß Aug. 20, 2022, 9:25 a.m. UTC
When booting or resuming the system, MTRR state is saved on the boot
processor and then loaded into the MTRRs of all other CPUs.
During the update of the MTRRs, the MTRR mechanism needs to be disabled
by writing the related MSR. The old contents of this MSR are saved in a
set of static variables, and later those static variables are used to
restore the MSR.

In case the MSR contents need to be modified on a CPU due to the MSR
not having been initialized properly by the BIOS, the related update
function modifies the static variables accordingly.

Unfortunately the MTRR state update usually runs on all CPUs at the
same time, so using just one set of static variables for all CPUs is
racy in case the MSR contents differ across CPUs.

Fix that by using percpu variables for saving the MSR contents.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
I thought adding a "Fixes:" tag for the kernel's initial git commit
would maybe be entertaining, but not really helpful.
The percpu variables were preferred over on-stack ones in order to
avoid more code churn in follow-up patches decoupling PAT from MTRR
support.
V2:
- new patch
---
 arch/x86/kernel/cpu/mtrr/generic.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)
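
The pattern the fix adopts is the kernel's per-CPU variable machinery;
a minimal sketch of the idea (the helper name is illustrative only --
the actual change is the patch at the bottom of this page):

	/*
	 * One private copy of the saved MSR contents per CPU, replacing
	 * the single set of static variables shared by all CPUs.
	 */
	static DEFINE_PER_CPU(u32, deftype_lo);
	static DEFINE_PER_CPU(u32, deftype_hi);

	/* Hypothetical helper: each CPU saves its own MSR contents. */
	static void save_deftype_this_cpu(void)
	{
		u32 lo, hi;

		/*
		 * Each CPU stashes the MSR in its own slot, so concurrent
		 * rendezvous handlers cannot clobber each other's copy.
		 */
		rdmsr(MSR_MTRRdefType, lo, hi);
		this_cpu_write(deftype_lo, lo);
		this_cpu_write(deftype_hi, hi);
	}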

Comments

Greg KH Aug. 20, 2022, 10:28 a.m. UTC | #1
On Sat, Aug 20, 2022 at 11:25:24AM +0200, Juergen Gross wrote:
> When booting or resuming the system MTRR state is saved on the boot
> processor and then this state is loaded into MTRRs of all other cpus.
> During update of the MTRRs the MTRR mechanism needs to be disabled by
> writing the related MSR. The old contents of this MSR are saved in a
> set of static variables and later those static variables are used to
> restore the MSR.
> 
> In case the MSR contents need to be modified on a cpu due to the MSR
> not having been initialized properly by the BIOS, the related update
> function is modifying the static variables accordingly.
> 
> Unfortunately the MTRR state update is usually running on all cpus
> at the same time, so using just one set of static variables for all
> cpus is racy in case the MSR contents differ across cpus.
> 
> Fix that by using percpu variables for saving the MSR contents.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> I thought adding a "Fixes:" tag for the kernel's initial git commit
> would maybe be entertaining, but without being really helpful.

So that means I will just do a "best guess" as to how far to backport
things.  Hopefully I guess well...

thanks,

greg k-h
Borislav Petkov Aug. 21, 2022, 12:25 p.m. UTC | #2
On Sat, Aug 20, 2022 at 11:25:24AM +0200, Juergen Gross wrote:
> When booting or resuming the system MTRR state is saved on the boot
> processor and then this state is loaded into MTRRs of all other cpus.

s/cpu/CPU/g

Pls check all commit messages.

> During update of the MTRRs the MTRR mechanism needs to be disabled by
> writing the related MSR. The old contents of this MSR are saved in a
> set of static variables and later those static variables are used to
> restore the MSR.
> 
> In case the MSR contents need to be modified on a cpu due to the MSR
> not having been initialized properly by the BIOS, the related update
> function is modifying the static variables accordingly.
> 
> Unfortunately the MTRR state update is usually running on all cpus
> at the same time, so using just one set of static variables for all
> cpus is racy in case the MSR contents differ across cpus.
> 
> Fix that by using percpu variables for saving the MSR contents.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> I thought adding a "Fixes:" tag for the kernel's initial git commit
> would maybe be entertaining, but without being really helpful.
> The percpu variables were preferred over on-stack ones in order to
> avoid more code churn in followup patches decoupling PAT from MTRR
> support.

So if that thing has been broken for so long and no one noticed, we
could just as well not backport to stable at all...

> V2:
> - new patch
> ---
>  arch/x86/kernel/cpu/mtrr/generic.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index 558108296f3c..3d185fcf08ca 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -679,7 +679,8 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
>  	return changed;
>  }
>  
> -static u32 deftype_lo, deftype_hi;
> +static DEFINE_PER_CPU(u32, deftype_lo);
> +static DEFINE_PER_CPU(u32, deftype_hi);

My APM says that the high 32 bits of the MTRRdefType MSR are reserved
and MBZ. So you can drop the _hi thing and use 0 everywhere. Or rather a
dummy = 0 var because of that damn rdmsr() macro.

Or simply use a

u64 deftype;

use the rdmsrl/wrmsrl() variants and split it when passing to
mtrr_wrmsr() because that thing wants two u32s.
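
For illustration, such a variant might look like this (a hypothetical
sketch combining the suggestion with the per-CPU storage from this
patch, not code from the series):

	static DEFINE_PER_CPU(u64, deftype);

	static void prepare_set(void)
	{
		u64 dt;

		/* Save MTRR state as a single 64-bit read */
		rdmsrl(MSR_MTRRdefType, dt);
		this_cpu_write(deftype, dt);

		/*
		 * Disable MTRRs and set the default type to uncached;
		 * mtrr_wrmsr() still wants two u32 halves. Per the APM the
		 * high half is reserved/MBZ, so (u32)(dt >> 32) is 0.
		 */
		mtrr_wrmsr(MSR_MTRRdefType, (u32)dt & ~0xcff, (u32)(dt >> 32));
	}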

>  /**
>   * set_mtrr_state - Set the MTRR state for this CPU.
> @@ -691,6 +692,7 @@ static unsigned long set_mtrr_state(void)
>  {
>  	unsigned long change_mask = 0;
>  	unsigned int i;
> +	u32 *lo = this_cpu_ptr(&deftype_lo);

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order:

	struct long_struct_name *descriptive_name;
	unsigned long foo, bar;
	unsigned int tmp;
	int ret;

The above is faster to parse than the reverse ordering:

	int ret;
	unsigned int tmp;
	unsigned long foo, bar;
	struct long_struct_name *descriptive_name;

And even more so than random ordering:

	unsigned long foo, bar;
	int ret;
	struct long_struct_name *descriptive_name;
	unsigned int tmp;

Please check all your patches.

>  	for (i = 0; i < num_var_ranges; i++) {
>  		if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i]))
> @@ -704,10 +706,10 @@ static unsigned long set_mtrr_state(void)
>  	 * Set_mtrr_restore restores the old value of MTRRdefType,
>  	 * so to set it we fiddle with the saved value:
>  	 */
> -	if ((deftype_lo & 0xff) != mtrr_state.def_type
> -	    || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
> +	if ((*lo & 0xff) != mtrr_state.def_type
> +	    || ((*lo & 0xc00) >> 10) != mtrr_state.enabled) {
>  
> -		deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
> +		*lo = (*lo & ~0xcff) | mtrr_state.def_type |
>  			     (mtrr_state.enabled << 10);
>  		change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
>  	}
> @@ -729,6 +731,8 @@ static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
>  static void prepare_set(void) __acquires(set_atomicity_lock)
>  {
>  	unsigned long cr0;
> +	u32 *lo = this_cpu_ptr(&deftype_lo);
> +	u32 *hi = this_cpu_ptr(&deftype_hi);

You don't need the pointers here - this_cpu_read() should be enough.
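
i.e., something like this hypothetical sketch (not what was posted)
for the save in prepare_set():

	u32 lo, hi;

	/* Save MTRR state into this CPU's private copies */
	rdmsr(MSR_MTRRdefType, lo, hi);
	this_cpu_write(deftype_lo, lo);
	this_cpu_write(deftype_hi, hi);

	/* Disable MTRRs, and set the default type to uncached */
	mtrr_wrmsr(MSR_MTRRdefType, lo & ~0xcff, hi);

post_set() would then only need this_cpu_read() to restore the saved
values.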

>  	/*
>  	 * Note that this is not ideal
> @@ -763,10 +767,10 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
>  	flush_tlb_local();
>  
>  	/* Save MTRR state */
> -	rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
> +	rdmsr(MSR_MTRRdefType, *lo, *hi);
>  
>  	/* Disable MTRRs, and set the default type to uncached */
> -	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
> +	mtrr_wrmsr(MSR_MTRRdefType, *lo & ~0xcff, *hi);
>  
>  	/* Again, only flush caches if we have to. */
>  	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
> @@ -775,12 +779,15 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
>  
>  static void post_set(void) __releases(set_atomicity_lock)
>  {
> +	u32 *lo = this_cpu_ptr(&deftype_lo);
> +	u32 *hi = this_cpu_ptr(&deftype_hi);

Ditto.

Thx.
Borislav Petkov Aug. 21, 2022, 9:41 p.m. UTC | #3
On Sun, Aug 21, 2022 at 02:25:59PM +0200, Borislav Petkov wrote:
> > Fix that by using percpu variables for saving the MSR contents.
> > 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> > ---
> > I thought adding a "Fixes:" tag for the kernel's initial git commit
> > would maybe be entertaining, but without being really helpful.
> > The percpu variables were preferred over on-stack ones in order to
> > avoid more code churn in followup patches decoupling PAT from MTRR
> > support.
> 
> So if that thing has been broken for so long and no one noticed, we
> could just as well not backport to stable at all...

Yeah, you can't do that.

The whole day today I kept thinking that something's wrong with this
here. As in, why hasn't it been reported until now.

You say above:

"... for all cpus is racy in case the MSR contents differ across cpus."

But they don't differ:

"7.7.5 MTRRs in Multi-Processing Environments

In multi-processing environments, the MTRRs located in all processors
must characterize memory in the same way. Generally, this means that
identical values are written to the MTRRs used by the processors. This
also means that values CR0.CD and the PAT must be consistent across
processors. Failure to do so may result in coherency violations or loss
of atomicity. Processor implementations do not check the MTRR settings
in other processors to ensure consistency. It is the responsibility of
system software to initialize and maintain MTRR consistency across all
processors."

And you can't have different fixed MTRR type on each CPU - that would
lead to all kinds of nasty bugs.

And here's from a big fat box:

$ rdmsr -a 0x2ff | uniq -c
    256 c00

All 256 CPUs have the def type set to the same thing.

Now, if all CPUs go write that same deftype_lo variable in the
rendezvous handler, the only issue that could happen is if a read
sees a partial write. BUT, AFAIK, x86 doesn't tear 32-bit writes, so I
*think* all CPUs see the same value, corrected by using the mtrr_state
previously saved on the BSP.

As I said, we should've seen this exploding left and right otherwise...

Thx.
Jürgen Groß Aug. 22, 2022, 5:17 a.m. UTC | #4
On 21.08.22 23:41, Borislav Petkov wrote:
> On Sun, Aug 21, 2022 at 02:25:59PM +0200, Borislav Petkov wrote:
>>> Fix that by using percpu variables for saving the MSR contents.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>> I thought adding a "Fixes:" tag for the kernel's initial git commit
>>> would maybe be entertaining, but without being really helpful.
>>> The percpu variables were preferred over on-stack ones in order to
>>> avoid more code churn in followup patches decoupling PAT from MTRR
>>> support.
>>
>> So if that thing has been broken for so long and no one noticed, we
>> could just as well not backport to stable at all...
> 
> Yeah, you can't do that.
> 
> The whole day today I kept thinking that something's wrong with this
> here. As in, why hasn't it been reported until now.
> 
> You say above:
> 
> "... for all cpus is racy in case the MSR contents differ across cpus."
> 
> But they don't differ:
> 
> "7.7.5 MTRRs in Multi-Processing Environments
> 
> In multi-processing environments, the MTRRs located in all processors
> must characterize memory in the same way. Generally, this means that
> identical values are written to the MTRRs used by the processors. This
> also means that values CR0.CD and the PAT must be consistent across
> processors. Failure to do so may result in coherency violations or loss
> of atomicity. Processor implementations do not check the MTRR settings
> in other processors to ensure consistency. It is the responsibility of
> system software to initialize and maintain MTRR consistency across all
> processors."
> 
> And you can't have different fixed MTRR type on each CPU - that would
> lead to all kinds of nasty bugs.
> 
> And here's from a big fat box:
> 
> $ rdmsr -a 0x2ff | uniq -c
>      256 c00
> 
> All 256 CPUs have the def type set to the same thing.
> 
> Now, if all CPUs go write that same deftype_lo variable in the
> rendezvous handler, the only issue that could happen is if a read
> sees a partial write. BUT, AFAIK, x86 doesn't tear 32-bit writes so I
> *think* all CPUs see the same value being corrected by using mtrr_state
> previously saved on the BSP.
> 
> As I said, we should've seen this exploding left and right otherwise...

And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c
which has a comment saying:

/* Some BIOS's are messed up and don't set all MTRRs the same! */

Yes, the chances are slim to hit such a box, but your reasoning suggests
I should remove the related code?


Juergen
Borislav Petkov Aug. 22, 2022, 8:28 a.m. UTC | #5
On Mon, Aug 22, 2022 at 07:17:40AM +0200, Juergen Gross wrote:
> And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c
> which has a comment saying:
> 
> /* Some BIOS's are messed up and don't set all MTRRs the same! */

That thing also says:

        pr_info("mtrr: probably your BIOS does not setup all CPUs.\n");
        pr_info("mtrr: corrected configuration.\n");

because it'll go and force on all CPUs the MTRR state it read from the
BSP in mtrr_bp_init->get_mtrr_state.

> Yes, the chances are slim to hit such a box,

Well, my workstation says:

$ dmesg | grep -i mtrr
[    0.391514] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.395199] mtrr: probably your BIOS does not setup all CPUs.
[    0.399199] mtrr: corrected configuration.

but that's the variable MTRRs.

> but your reasoning suggests I should remove the related code?

My reasoning says you should not do anything at all here - works as
advertized. :-)
Jürgen Groß Aug. 22, 2022, 8:32 a.m. UTC | #6
On 22.08.22 10:28, Borislav Petkov wrote:
> On Mon, Aug 22, 2022 at 07:17:40AM +0200, Juergen Gross wrote:
>> And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c
>> which has a comment saying:
>>
>> /* Some BIOS's are messed up and don't set all MTRRs the same! */
> 
> That thing also says:
> 
>          pr_info("mtrr: probably your BIOS does not setup all CPUs.\n");
>          pr_info("mtrr: corrected configuration.\n");
> 
> because it'll go and force on all CPUs the MTRR state it read from the
> BSP in mtrr_bp_init->get_mtrr_state.
> 
>> Yes, the chances are slim to hit such a box,
> 
> Well, my workstation says:
> 
> $ dmesg | grep -i mtrr
> [    0.391514] mtrr: your CPUs had inconsistent variable MTRR settings
> [    0.395199] mtrr: probably your BIOS does not setup all CPUs.
> [    0.399199] mtrr: corrected configuration.
> 
> but that's the variable MTRRs.
> 
>> but your reasoning suggests I should remove the related code?
> 
> My reasoning says you should not do anything at all here - works as
> advertized. :-)
> 

And what about the:

   pr_warn("mtrr: your CPUs had inconsistent MTRRdefType settings\n");

This is the case the patch would fix.


Juergen

Patch

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 558108296f3c..3d185fcf08ca 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -679,7 +679,8 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
 	return changed;
 }
 
-static u32 deftype_lo, deftype_hi;
+static DEFINE_PER_CPU(u32, deftype_lo);
+static DEFINE_PER_CPU(u32, deftype_hi);
 
 /**
  * set_mtrr_state - Set the MTRR state for this CPU.
@@ -691,6 +692,7 @@ static unsigned long set_mtrr_state(void)
 {
 	unsigned long change_mask = 0;
 	unsigned int i;
+	u32 *lo = this_cpu_ptr(&deftype_lo);
 
 	for (i = 0; i < num_var_ranges; i++) {
 		if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i]))
@@ -704,10 +706,10 @@ static unsigned long set_mtrr_state(void)
 	 * Set_mtrr_restore restores the old value of MTRRdefType,
 	 * so to set it we fiddle with the saved value:
 	 */
-	if ((deftype_lo & 0xff) != mtrr_state.def_type
-	    || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
+	if ((*lo & 0xff) != mtrr_state.def_type
+	    || ((*lo & 0xc00) >> 10) != mtrr_state.enabled) {
 
-		deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
+		*lo = (*lo & ~0xcff) | mtrr_state.def_type |
 			     (mtrr_state.enabled << 10);
 		change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
 	}
@@ -729,6 +731,8 @@ static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
 static void prepare_set(void) __acquires(set_atomicity_lock)
 {
 	unsigned long cr0;
+	u32 *lo = this_cpu_ptr(&deftype_lo);
+	u32 *hi = this_cpu_ptr(&deftype_hi);
 
 	/*
 	 * Note that this is not ideal
@@ -763,10 +767,10 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 	flush_tlb_local();
 
 	/* Save MTRR state */
-	rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+	rdmsr(MSR_MTRRdefType, *lo, *hi);
 
 	/* Disable MTRRs, and set the default type to uncached */
-	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+	mtrr_wrmsr(MSR_MTRRdefType, *lo & ~0xcff, *hi);
 
 	/* Again, only flush caches if we have to. */
 	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
@@ -775,12 +779,15 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 
 static void post_set(void) __releases(set_atomicity_lock)
 {
+	u32 *lo = this_cpu_ptr(&deftype_lo);
+	u32 *hi = this_cpu_ptr(&deftype_hi);
+
 	/* Flush TLBs (no need to flush caches - they are disabled) */
 	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
 	flush_tlb_local();
 
 	/* Intel (P6) standard MTRRs */
-	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+	mtrr_wrmsr(MSR_MTRRdefType, *lo, *hi);
 
 	/* Enable caches */
 	write_cr0(read_cr0() & ~X86_CR0_CD);