
[1/6] x86: move zmm exclusion list into CPU feature flag

Message ID: 20241125041129.192999-2-ebiggers@kernel.org (mailing list archive)
State: Not Applicable
Delegated to: Herbert Xu
Series: x86: new optimized CRC functions, with VPCLMULQDQ support

Commit Message

Eric Biggers Nov. 25, 2024, 4:11 a.m. UTC
From: Eric Biggers <ebiggers@google.com>

Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup
code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is
set when the CPU is on this list.

This allows other code in arch/x86/, such as the CRC library code, to
apply the same exclusion list when deciding whether to execute 256-bit
or 512-bit optimized functions.

Note that full AVX512 support including zmm registers is still exposed
to userspace and is still supported for in-kernel use.  This flag just
indicates whether in-kernel code should prefer to use ymm registers.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/aesni-intel_glue.c | 22 +---------------------
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/intel.c        | 22 ++++++++++++++++++++++
 3 files changed, 24 insertions(+), 21 deletions(-)
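
For illustration, here is a minimal sketch (not part of this series; the use_zmm_impl() helper below is hypothetical) of how SIMD code elsewhere in arch/x86/ could consult the new flag when choosing between a 256-bit and a 512-bit code path:

#include <asm/cpufeature.h>	/* boot_cpu_has() */
#include <asm/cpufeatures.h>	/* X86_FEATURE_PREFER_YMM */

/* Hypothetical helper: true if a 512-bit (zmm) code path should be used. */
static inline bool use_zmm_impl(void)
{
	/* Require AVX-512, and skip CPUs known to downclock on zmm usage. */
	return boot_cpu_has(X86_FEATURE_AVX512F) &&
	       !boot_cpu_has(X86_FEATURE_PREFER_YMM);
}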

Comments

Ingo Molnar Nov. 25, 2024, 8:33 a.m. UTC | #1
* Eric Biggers <ebiggers@kernel.org> wrote:

> From: Eric Biggers <ebiggers@google.com>
> 
> Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup
> code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is
> set when the CPU is on this list.
> 
> This allows other code in arch/x86/, such as the CRC library code, to
> apply the same exclusion list when deciding whether to execute 256-bit
> or 512-bit optimized functions.
> 
> Note that full AVX512 support including zmm registers is still exposed
> to userspace and is still supported for in-kernel use.  This flag just
> indicates whether in-kernel code should prefer to use ymm registers.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/x86/crypto/aesni-intel_glue.c | 22 +---------------------
>  arch/x86/include/asm/cpufeatures.h |  1 +
>  arch/x86/kernel/cpu/intel.c        | 22 ++++++++++++++++++++++
>  3 files changed, 24 insertions(+), 21 deletions(-)

Acked-by: Ingo Molnar <mingo@kernel.org>

I suppose you'd like to carry this in the crypto tree?

> +/*
> + * This is a list of Intel CPUs that are known to suffer from downclocking when
> + * zmm registers (512-bit vectors) are used.  On these CPUs, when the kernel
> + * executes SIMD-optimized code such as cryptography functions or CRCs, it
> + * should prefer 256-bit (ymm) code to 512-bit (zmm) code.
> + */

One speling nit, could you please do:

  s/ymm/YMM
  s/zmm/ZMM

... to make it consistent with how the rest of the x86 code is 
capitalizing the names of FPU vector register classes. Just like
we are capitalizing CPU and CRC properly ;-)

Thanks,

	Ingo
Eric Biggers Nov. 25, 2024, 6:08 p.m. UTC | #2
On Mon, Nov 25, 2024 at 09:33:46AM +0100, Ingo Molnar wrote:
> 
> * Eric Biggers <ebiggers@kernel.org> wrote:
> 
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup
> > code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is
> > set when the CPU is on this list.
> > 
> > This allows other code in arch/x86/, such as the CRC library code, to
> > apply the same exclusion list when deciding whether to execute 256-bit
> > or 512-bit optimized functions.
> > 
> > Note that full AVX512 support including zmm registers is still exposed
> > to userspace and is still supported for in-kernel use.  This flag just
> > indicates whether in-kernel code should prefer to use ymm registers.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> >  arch/x86/crypto/aesni-intel_glue.c | 22 +---------------------
> >  arch/x86/include/asm/cpufeatures.h |  1 +
> >  arch/x86/kernel/cpu/intel.c        | 22 ++++++++++++++++++++++
> >  3 files changed, 24 insertions(+), 21 deletions(-)
> 
> Acked-by: Ingo Molnar <mingo@kernel.org>
> 
> I suppose you'd like to carry this in the crypto tree?

I am planning to carry CRC-related patches myself
(https://lore.kernel.org/lkml/20241117002244.105200-12-ebiggers@kernel.org/).

> 
> > +/*
> > + * This is a list of Intel CPUs that are known to suffer from downclocking when
> > + * zmm registers (512-bit vectors) are used.  On these CPUs, when the kernel
> > + * executes SIMD-optimized code such as cryptography functions or CRCs, it
> > + * should prefer 256-bit (ymm) code to 512-bit (zmm) code.
> > + */
> 
> One speling nit, could you please do:
> 
>   s/ymm/YMM
>   s/zmm/ZMM
> 
> ... to make it consistent with how the rest of the x86 code is 
> capitalizing the names of FPU vector register classes. Just like
> we are capitalizing CPU and CRC properly ;-)
> 

Will do, thanks.

- Eric
Ingo Molnar Nov. 25, 2024, 8:25 p.m. UTC | #3
* Eric Biggers <ebiggers@kernel.org> wrote:

> On Mon, Nov 25, 2024 at 09:33:46AM +0100, Ingo Molnar wrote:
> > 
> > * Eric Biggers <ebiggers@kernel.org> wrote:
> > 
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup
> > > code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is
> > > set when the CPU is on this list.
> > > 
> > > This allows other code in arch/x86/, such as the CRC library code, to
> > > apply the same exclusion list when deciding whether to execute 256-bit
> > > or 512-bit optimized functions.
> > > 
> > > Note that full AVX512 support including zmm registers is still exposed
> > > to userspace and is still supported for in-kernel use.  This flag just
> > > indicates whether in-kernel code should prefer to use ymm registers.
> > > 
> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > > ---
> > >  arch/x86/crypto/aesni-intel_glue.c | 22 +---------------------
> > >  arch/x86/include/asm/cpufeatures.h |  1 +
> > >  arch/x86/kernel/cpu/intel.c        | 22 ++++++++++++++++++++++
> > >  3 files changed, 24 insertions(+), 21 deletions(-)
> > 
> > Acked-by: Ingo Molnar <mingo@kernel.org>
> > 
> > I suppose you'd like to carry this in the crypto tree?
> 
> I am planning to carry CRC-related patches myself
> (https://lore.kernel.org/lkml/20241117002244.105200-12-ebiggers@kernel.org/).

Sounds good!

> 
> > 
> > > +/*
> > > + * This is a list of Intel CPUs that are known to suffer from downclocking when
> > > + * zmm registers (512-bit vectors) are used.  On these CPUs, when the kernel
> > > + * executes SIMD-optimized code such as cryptography functions or CRCs, it
> > > + * should prefer 256-bit (ymm) code to 512-bit (zmm) code.
> > > + */
> > 
> > One speling nit, could you please do:
> > 
> >   s/ymm/YMM
> >   s/zmm/ZMM
> > 
> > ... to make it consistent with how the rest of the x86 code is 
> > capitalizing the names of FPU vector register classes. Just like
> > we are capitalizing CPU and CRC properly ;-)
> > 
> 
> Will do, thanks.

Thank you!

	Ingo

Patch

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index fbf43482e1f5e..8e648abfb5ab8 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1534,30 +1534,10 @@  DEFINE_GCM_ALGS(vaes_avx10_256, FLAG_AVX10_256,
 DEFINE_GCM_ALGS(vaes_avx10_512, FLAG_AVX10_512,
 		"generic-gcm-vaes-avx10_512", "rfc4106-gcm-vaes-avx10_512",
 		AES_GCM_KEY_AVX10_SIZE, 800);
 #endif /* CONFIG_AS_VAES && CONFIG_AS_VPCLMULQDQ */
 
-/*
- * This is a list of CPU models that are known to suffer from downclocking when
- * zmm registers (512-bit vectors) are used.  On these CPUs, the AES mode
- * implementations with zmm registers won't be used by default.  Implementations
- * with ymm registers (256-bit vectors) will be used by default instead.
- */
-static const struct x86_cpu_id zmm_exclusion_list[] = {
-	X86_MATCH_VFM(INTEL_SKYLAKE_X,		0),
-	X86_MATCH_VFM(INTEL_ICELAKE_X,		0),
-	X86_MATCH_VFM(INTEL_ICELAKE_D,		0),
-	X86_MATCH_VFM(INTEL_ICELAKE,		0),
-	X86_MATCH_VFM(INTEL_ICELAKE_L,		0),
-	X86_MATCH_VFM(INTEL_ICELAKE_NNPI,	0),
-	X86_MATCH_VFM(INTEL_TIGERLAKE_L,	0),
-	X86_MATCH_VFM(INTEL_TIGERLAKE,		0),
-	/* Allow Rocket Lake and later, and Sapphire Rapids and later. */
-	/* Also allow AMD CPUs (starting with Zen 4, the first with AVX-512). */
-	{},
-};
-
 static int __init register_avx_algs(void)
 {
 	int err;
 
 	if (!boot_cpu_has(X86_FEATURE_AVX))
@@ -1598,11 +1578,11 @@  static int __init register_avx_algs(void)
 					 ARRAY_SIZE(aes_gcm_algs_vaes_avx10_256),
 					 aes_gcm_simdalgs_vaes_avx10_256);
 	if (err)
 		return err;
 
-	if (x86_match_cpu(zmm_exclusion_list)) {
+	if (boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
 		int i;
 
 		aes_xts_alg_vaes_avx10_512.base.cra_priority = 1;
 		for (i = 0; i < ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512); i++)
 			aes_gcm_algs_vaes_avx10_512[i].base.cra_priority = 1;
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 17b6590748c00..948bfa25ccc7b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -477,10 +477,11 @@ 
 #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */
 #define X86_FEATURE_AMD_FAST_CPPC	(21*32 + 5) /* Fast CPPC */
 #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
 #define X86_FEATURE_AMD_WORKLOAD_CLASS	(21*32 + 7) /* Workload Classification */
+#define X86_FEATURE_PREFER_YMM		(21*32 + 8) /* Avoid zmm registers due to downclocking */
 
 /*
  * BUG word(s)
  */
 #define X86_BUG(x)			(NCAPINTS*32 + (x))
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index d1de300af1737..0beb44c4ac026 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -519,10 +519,29 @@  static void init_intel_misc_features(struct cpuinfo_x86 *c)
 
 	msr = this_cpu_read(msr_misc_features_shadow);
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+/*
+ * This is a list of Intel CPUs that are known to suffer from downclocking when
+ * zmm registers (512-bit vectors) are used.  On these CPUs, when the kernel
+ * executes SIMD-optimized code such as cryptography functions or CRCs, it
+ * should prefer 256-bit (ymm) code to 512-bit (zmm) code.
+ */
+static const struct x86_cpu_id zmm_exclusion_list[] = {
+	X86_MATCH_VFM(INTEL_SKYLAKE_X,		0),
+	X86_MATCH_VFM(INTEL_ICELAKE_X,		0),
+	X86_MATCH_VFM(INTEL_ICELAKE_D,		0),
+	X86_MATCH_VFM(INTEL_ICELAKE,		0),
+	X86_MATCH_VFM(INTEL_ICELAKE_L,		0),
+	X86_MATCH_VFM(INTEL_ICELAKE_NNPI,	0),
+	X86_MATCH_VFM(INTEL_TIGERLAKE_L,	0),
+	X86_MATCH_VFM(INTEL_TIGERLAKE,		0),
+	/* Allow Rocket Lake and later, and Sapphire Rapids and later. */
+	{},
+};
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
 
 	intel_workarounds(c);
@@ -602,10 +621,13 @@  static void init_intel(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_P4);
 	if (c->x86 == 6)
 		set_cpu_cap(c, X86_FEATURE_P3);
 #endif
 
+	if (x86_match_cpu(zmm_exclusion_list))
+		set_cpu_cap(c, X86_FEATURE_PREFER_YMM);
+
 	/* Work around errata */
 	srat_detect_node(c);
 
 	init_ia32_feat_ctl(c);