diff mbox series

[v2,05/17] x86/cpu/intel: Fix page copy performance for extended Families

Message ID 20250211194407.2577252-6-sohil.mehta@intel.com (mailing list archive)
State New
Headers show
Series Prepare for new Intel Family numbers | expand

Commit Message

Sohil Mehta Feb. 11, 2025, 7:43 p.m. UTC
X86_FEATURE_REP_GOOD is a linux defined feature flag to track whether
fast string operations should be used for copy_page(). It is also used
as a backup alternative for clear_page() if enhanced fast string
operations (ERMS) are not available.

Currently, the flag is only set for Family 6 processors. Extend the
check to include upcoming processors in Family 18 and 19.

It is uncertain whether X86_FEATURE_REP_GOOD should be set for Family 15
(Pentium 4) as well. Commit 185f3b9da24c ("x86: make intel.c have 64-bit
support code") that originally set the flag also set the
x86_cache_alignment preference for Family 15 processors in the same
commit. The omission of the Family 15 may have been intentional.

Also, move the check before a related check in early_init_intel() to
avoid resetting the flag.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>

---

v2: Separate out the REP_GOOD (copy page) specific change into a
separate commit.

From the archives, it wasn't exactly clear why the set_cpu_cap() and
clear_cpu_cap() calls for X86_FEATURE_REP_GOOD are in distinct
locations. Also, why there is a difference between 32-bit and 64-bit.
Any insight there would be useful. For now, I have kept the change
minimal based on my limited understanding.

---
 arch/x86/kernel/cpu/intel.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Dave Hansen Feb. 11, 2025, 8:53 p.m. UTC | #1
On 2/11/25 11:43, Sohil Mehta wrote:
> +	/*
> +	 * Modern CPUs are generally expected to have a sane fast string
> +	 * implementation. However, the BIOS may disable it on certain CPUs
> +	 * via the architectural FAST_STRING bit.
> +	 */
> +	if (IS_ENABLED(CONFIG_X86_64) && (c->x86 == 6 || c->x86 > 15))
> +		set_cpu_cap(c, X86_FEATURE_REP_GOOD);

I'm not sure the BIOS comment is helpful here.

Also, at this point, let's just make the check >=6 (or the >=PPRO
equivalent).

It will only matter if *all* of these are true:
1. Someone has a 64-bit capable P4 that powers on
2. They're running a 64-bit mainline kernel
3. String copy is *actually* slower than the alternative
4. They are performance sensitive enough to notice

We don't even know the answer to #3 for sure. Let's just say what we're
doing in a comment:

	/* Assume that any 64-bit CPU has a good implementation */
Andrew Cooper Feb. 12, 2025, 12:54 a.m. UTC | #2
On 11/02/2025 8:53 pm, Dave Hansen wrote:
> On 2/11/25 11:43, Sohil Mehta wrote:
>> +	/*
>> +	 * Modern CPUs are generally expected to have a sane fast string
>> +	 * implementation. However, the BIOS may disable it on certain CPUs
>> +	 * via the architectural FAST_STRING bit.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_X86_64) && (c->x86 == 6 || c->x86 > 15))
>> +		set_cpu_cap(c, X86_FEATURE_REP_GOOD);
> I'm not sure the BIOS comment is helpful here.
>
> Also, at this point, let's just make the check >=6 (or the >=PPRO
> equivalent).
>
> It will only matter if *all* of these are true:
> 1. Someone has a 64-bit capable P4 that powers on
> 2. They're running a 64-bit mainline kernel
> 3. String copy is *actually* slower than the alternative
> 4. They are performance sensitive enough to notice
>
> We don't even know the answer to #3 for sure. Let's just say what we're
> doing in a comment:
>
> 	/* Assume that any 64-bit CPU has a good implementation */

If you're going to override the BIOS setting, then you need to
explicitly set MSR_MISC_ENABLE.FAST_STRINGS.

Otherwise you're claiming to Linux that REP is good even when hardware
is prohibited from using optimisations.

~Andrew
Sohil Mehta Feb. 12, 2025, 9:19 p.m. UTC | #3
On 2/11/2025 4:54 PM, Andrew Cooper wrote:

> If you're going to override the BIOS setting, then you need to
> explicitly set MSR_MISC_ENABLE.FAST_STRINGS.
> 
> Otherwise you're claiming to Linux that REP is good even when hardware
> is prohibited from using optimisations.
> 

I think the current checks have unnecessary overlap which makes them
confusing. We should be fine if we only rely on the architectural
MSR_MISC_ENABLE.FAST_STRINGS bit and rely just on the BIOS setting. My
justification is below.

The simplified version of the current checks is as follows:

Check 1 (Based on Family Model numbers):
> /*
>  * Unconditionally set REP_GOOD on early Family 6 processors
>  */
> if (IS_ENABLED(CONFIG_X86_64) &&
>     (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN))
> 	set_cpu_cap(c, X86_FEATURE_REP_GOOD);

This check is mostly redundant since it is targeted for 64 bit and very
few if any of those CPUs support 64 bit processing. I suggest that we
get rid of this check completely. The risk here is fairly limited as well.

Check 2 (Based on MISC_ENABLE.FAST_STRING):
> /*
>  * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
>  * clear the fast string and enhanced fast string CPU capabilities.
>  */
> if (c->x86_vfm >= INTEL_PENTIUM_M_DOTHAN) {
> 	rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
> 	if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
> 		/* X86_FEATURE_ERMS will be automatically set based on CPUID */
> 		set_cpu_cap(c, X86_FEATURE_REP_GOOD);
> 	} else {
> 		pr_info("Disabled fast string operations\n");
> 		setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
> 		setup_clear_cpu_cap(X86_FEATURE_ERMS);
> 	}
> }

This is the only real check that is needed and should likely suffice in
all meaningful scenarios.

Comments?
Andrew Cooper Feb. 13, 2025, 11:02 p.m. UTC | #4
On 12/02/2025 9:19 pm, Sohil Mehta wrote:
> Check 1 (Based on Family Model numbers):
>> /*
>>  * Unconditionally set REP_GOOD on early Family 6 processors
>>  */
>> if (IS_ENABLED(CONFIG_X86_64) &&
>>     (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN))
>> 	set_cpu_cap(c, X86_FEATURE_REP_GOOD);
> This check is mostly redundant since it is targeted for 64 bit and very
> few if any of those CPUs support 64 bit processing. I suggest that we
> get rid of this check completely. The risk here is fairly limited as well.

PENTIUM_PRO is model 0x1.  M_DOTHAN isn't introduced until patch 10, but
is model 0xd.

And model 0xf (Memron) is the first 64bit capable fam6 CPU, so this is
dead code given the CONFIG_X86_64 which the compiler can't actually
optimise out.

>
> Check 2 (Based on MISC_ENABLE.FAST_STRING):
>> /*
>>  * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
>>  * clear the fast string and enhanced fast string CPU capabilities.

I'd suggest that a better way of phrasing this is:

/* BIOSes typically have a knob for Fast Strings.  Honour the user's
wishes. */

>>  */
>> if (c->x86_vfm >= INTEL_PENTIUM_M_DOTHAN) {
>> 	rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
>> 	if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
>> 		/* X86_FEATURE_ERMS will be automatically set based on CPUID */
>> 		set_cpu_cap(c, X86_FEATURE_REP_GOOD);
>> 	} else {
>> 		pr_info("Disabled fast string operations\n");
>> 		setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
>> 		setup_clear_cpu_cap(X86_FEATURE_ERMS);
>> 	}
>> }

MSR_MISC_ENABLE exists on all 64bit CPUs, and some 32bit ones too. 
Therefore, this section alone seems to suffice in order to set up
REP_GOOD properly.

~Andrew
Sohil Mehta Feb. 14, 2025, 12:29 a.m. UTC | #5
On 2/13/2025 3:02 PM, Andrew Cooper wrote:
> On 12/02/2025 9:19 pm, Sohil Mehta wrote:
>> Check 1 (Based on Family Model numbers):
>>> /*
>>>  * Unconditionally set REP_GOOD on early Family 6 processors
>>>  */
>>> if (IS_ENABLED(CONFIG_X86_64) &&
>>>     (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN))
>>> 	set_cpu_cap(c, X86_FEATURE_REP_GOOD);
>> This check is mostly redundant since it is targeted for 64 bit and very
>> few if any of those CPUs support 64 bit processing. I suggest that we
>> get rid of this check completely. The risk here is fairly limited as well.
> 
> PENTIUM_PRO is model 0x1.  M_DOTHAN isn't introduced until patch 10, but
> is model 0xd.
> 
> And model 0xf (Memron) is the first 64bit capable fam6 CPU, so this is
> dead code given the CONFIG_X86_64 which the compiler can't actually
> optimise out.
> 

Thanks for confirming. I figured this is likely dead code but wasn't
completely sure.
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index e5f34a90963e..4f8b02cbe8c5 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -297,6 +297,14 @@  static void early_init_intel(struct cpuinfo_x86 *c)
 	    c->x86_vfm <= INTEL_CORE_YONAH)
 		clear_cpu_cap(c, X86_FEATURE_PAT);
 
+	/*
+	 * Modern CPUs are generally expected to have a sane fast string
+	 * implementation. However, the BIOS may disable it on certain CPUs
+	 * via the architectural FAST_STRING bit.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && (c->x86 == 6 || c->x86 > 15))
+		set_cpu_cap(c, X86_FEATURE_REP_GOOD);
+
 	/*
 	 * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
 	 * clear the fast string and enhanced fast string CPU capabilities.
@@ -556,8 +564,6 @@  static void init_intel(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_64
 	if (c->x86 == 15)
 		c->x86_cache_alignment = c->x86_clflush_size * 2;
-	if (c->x86 == 6)
-		set_cpu_cap(c, X86_FEATURE_REP_GOOD);
 #else
 	/*
 	 * Names for the Pentium II/Celeron processors