
arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()

Message ID 20241214-arm64-fix-boot-cpu-smidr-v1-1-0745c40772dd@kernel.org (mailing list archive)
State New
Series arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()

Commit Message

Mark Brown Dec. 14, 2024, 12:52 a.m. UTC
In commit 892f7237b3ff ("arm64: Delay initialisation of
cpuinfo_arm64::reg_{zcr,smcr}") we moved access to ZCR, SMCR and SMIDR
later in the boot process in order to ensure that we don't attempt to
interact with them if SVE or SME is disabled on the command line.
Unfortunately, when initialising the boot CPU in init_cpu_features() we work
on a copy of the struct cpuinfo_arm64 for the boot CPU that is used only
during boot, not on the percpu copy used by the sysfs code.

Fix this by moving the handling for SMIDR_EL1 for the boot CPU to
cpuinfo_store_boot_cpu() so it can operate on the percpu copy of the data.
This reduces the potential for error that could come from having both the
percpu and boot CPU copies in init_cpu_features().

This issue wasn't apparent when testing on emulated platforms that do not
report values in this ID register.

Fixes: 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/kernel/cpufeature.c |  6 ------
 arch/arm64/kernel/cpuinfo.c    | 11 +++++++++++
 2 files changed, 11 insertions(+), 6 deletions(-)


---
base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
change-id: 20241213-arm64-fix-boot-cpu-smidr-386b8db292b2

Best regards,

Comments

Marc Zyngier Dec. 14, 2024, 10:56 a.m. UTC | #1
[+ Mark]
On Sat, 14 Dec 2024 00:52:08 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> In commit 892f7237b3ff ("arm64: Delay initialisation of
> cpuinfo_arm64::reg_{zcr,smcr}") we moved access to ZCR, SMCR and SMIDR
> later in the boot process in order to ensure that we don't attempt to
> interact with them if SVE or SME is disabled on the command line.
> Unfortunately, when initialising the boot CPU in init_cpu_features() we work
> on a copy of the struct cpuinfo_arm64 for the boot CPU that is used only
> during boot, not on the percpu copy used by the sysfs code.
> 
> Fix this by moving the handling for SMIDR_EL1 for the boot CPU to
> cpuinfo_store_boot_cpu() so it can operate on the percpu copy of the data.
> This reduces the potential for error that could come from having both the
> percpu and boot CPU copies in init_cpu_features().
> 
> This issue wasn't apparent when testing on emulated platforms that do not
> report values in this ID register.
> 
> Fixes: 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
> Signed-off-by: Mark Brown <broonie@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/kernel/cpufeature.c |  6 ------
>  arch/arm64/kernel/cpuinfo.c    | 11 +++++++++++
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6ce71f444ed84f9056196bb21bbfac61c9687e30..b88102fd2c20f77e25af6df513fda09a484e882e 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
>  	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
>  		unsigned long cpacr = cpacr_save_enable_kernel_sme();
>  
> -		/*
> -		 * We mask out SMPS since even if the hardware
> -		 * supports priorities the kernel does not at present
> -		 * and we block access to them.
> -		 */
> -		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
>  		vec_init_vq_map(ARM64_VEC_SME);
>  
>  		cpacr_restore(cpacr);
> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> index d79e88fccdfce427507e7a34c5959ce6309cbd12..b7d403da71e5a01ed3943eb37e7a00af238771a2 100644
> --- a/arch/arm64/kernel/cpuinfo.c
> +++ b/arch/arm64/kernel/cpuinfo.c
> @@ -499,4 +499,15 @@ void __init cpuinfo_store_boot_cpu(void)
>  
>  	boot_cpu_data = *info;
>  	init_cpu_features(&boot_cpu_data);
> +
> +	/* SMIDR_EL1 needs to be stored in the percpu data for sysfs */
> +	if (IS_ENABLED(CONFIG_ARM64_SME) &&
> +	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
> +		/*
> +		 * We mask out SMPS since even if the hardware
> +		 * supports priorities the kernel does not at present
> +		 * and we block access to them.
> +		 */
> +		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
> +	}
>  }

I don't understand the need to single out SMIDR_EL1. It seems to only
make things even more fragile than they already are by adding more
synchronisation phases.

Why isn't the following a good enough fix? It makes it plain that
boot_cpu_data is only a copy of CPU0's initial boot state.

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..0cbb42fd48850 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -497,6 +497,6 @@ void __init cpuinfo_store_boot_cpu(void)
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
 	__cpuinfo_store_cpu(info);
 
+	init_cpu_features(info);
 	boot_cpu_data = *info;
-	init_cpu_features(&boot_cpu_data);
 }


Thanks,

	M.
Mark Brown Dec. 16, 2024, 12:17 p.m. UTC | #2
On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:

> I don't understand the need to single out SMIDR_EL1. It seems to only
> make things even more fragile than they already are by adding more
> synchronisation phases.

> Why isn't the following a good enough fix? It makes it plain that
> boot_cpu_data is only a copy of CPU0's initial boot state.

That would work, but it's not clear to me that that was the intent here.
The current ordering seemed like a strange enough decision to be
deliberate, though I couldn't identify the reasoning.
Mark Rutland Dec. 16, 2024, 12:38 p.m. UTC | #3
On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:
> [+ Mark]
> On Sat, 14 Dec 2024 00:52:08 +0000,
> Mark Brown <broonie@kernel.org> wrote:
> > 
> > In commit 892f7237b3ff ("arm64: Delay initialisation of
> > cpuinfo_arm64::reg_{zcr,smcr}") we moved access to ZCR, SMCR and SMIDR
> > later in the boot process in order to ensure that we don't attempt to
> > interact with them if SVE or SME is disabled on the command line.
> > Unfortunately, when initialising the boot CPU in init_cpu_features() we work
> > on a copy of the struct cpuinfo_arm64 for the boot CPU that is used only
> > during boot, not on the percpu copy used by the sysfs code.
> > 
> > Fix this by moving the handling for SMIDR_EL1 for the boot CPU to
> > cpuinfo_store_boot_cpu() so it can operate on the percpu copy of the data.
> > This reduces the potential for error that could come from having both the
> > percpu and boot CPU copies in init_cpu_features().
> > 
> > This issue wasn't apparent when testing on emulated platforms that do not
> > report values in this ID register.
> > 
> > Fixes: 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
> > Signed-off-by: Mark Brown <broonie@kernel.org>
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/arm64/kernel/cpufeature.c |  6 ------
> >  arch/arm64/kernel/cpuinfo.c    | 11 +++++++++++
> >  2 files changed, 11 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 6ce71f444ed84f9056196bb21bbfac61c9687e30..b88102fd2c20f77e25af6df513fda09a484e882e 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
> >  	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
> >  		unsigned long cpacr = cpacr_save_enable_kernel_sme();
> >  
> > -		/*
> > -		 * We mask out SMPS since even if the hardware
> > -		 * supports priorities the kernel does not at present
> > -		 * and we block access to them.
> > -		 */
> > -		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
> >  		vec_init_vq_map(ARM64_VEC_SME);
> >  
> >  		cpacr_restore(cpacr);
> > diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> > index d79e88fccdfce427507e7a34c5959ce6309cbd12..b7d403da71e5a01ed3943eb37e7a00af238771a2 100644
> > --- a/arch/arm64/kernel/cpuinfo.c
> > +++ b/arch/arm64/kernel/cpuinfo.c
> > @@ -499,4 +499,15 @@ void __init cpuinfo_store_boot_cpu(void)
> >  
> >  	boot_cpu_data = *info;
> >  	init_cpu_features(&boot_cpu_data);
> > +
> > +	/* SMIDR_EL1 needs to be stored in the percpu data for sysfs */
> > +	if (IS_ENABLED(CONFIG_ARM64_SME) &&
> > +	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
> > +		/*
> > +		 * We mask out SMPS since even if the hardware
> > +		 * supports priorities the kernel does not at present
> > +		 * and we block access to them.
> > +		 */
> > +		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
> > +	}
> >  }
> 
> I don't understand the need to single out SMIDR_EL1. It seems to only
> make things even more fragile than they already are by adding more
> synchronisation phases.
> 
> Why isn't the following a good enough fix? It makes it plain that
> boot_cpu_data is only a copy of CPU0's initial boot state.
> 
> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> index d79e88fccdfce..0cbb42fd48850 100644
> --- a/arch/arm64/kernel/cpuinfo.c
> +++ b/arch/arm64/kernel/cpuinfo.c
> @@ -497,6 +497,6 @@ void __init cpuinfo_store_boot_cpu(void)
>  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
>  	__cpuinfo_store_cpu(info);
>  
> +	init_cpu_features(info);
>  	boot_cpu_data = *info;
> -	init_cpu_features(&boot_cpu_data);
>  }

I think that change in isolation is fine, but I don't think that's the
right fix.

I think that what we did in commit:

   892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")

... introduces an anti-pattern that'd be nice to avoid. That broke the
existing split of __cpuinfo_store_cpu() and init_cpu_features(), where
the former read the ID regs, and the latter set up the features
*without* altering the copy of the ID regs that was read. i.e.
init_cpu_features() shouldn't write to its info argument at all.

I understand that we have to do something as a bodge for broken FW which
traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().

Can we add something to check whether SME was disabled on the command
line, and use that in __cpuinfo_store_cpu(), effectively reverting
892f7237b3ff?

Mark.
Mark Rutland Dec. 16, 2024, 12:44 p.m. UTC | #4
On Mon, Dec 16, 2024 at 12:17:54PM +0000, Mark Brown wrote:
> On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:
> 
> > I don't understand the need to single out SMIDR_EL1. It seems to only
> > make things even more fragile than they already are by adding more
> > synchronisation phases.
> 
> > Why isn't the following a good enough fix? It makes it plain that
> > boot_cpu_data is only a copy of CPU0's initial boot state.
> 
> That would work, but it's not clear to me that that was the intent here.
> The current ordering seemed like a strange enough decision to be
> deliberate, though I couldn't identify the reasoning.

The original intent was that __cpuinfo_store_cpu() read *all* of a CPU's
implemented ID regs, and init_cpu_features() initialised the expected
system features based on the boot CPU's ID regs.

The expectation was that init_cpu_features() would only consume the
register values, and would not alter the cpuinfo_arm64 values, so the
order of:

	boot_cpu_data = *info;
	init_cpu_features(&boot_cpu_data);

... didn't matter either way, and using '&boot_cpu_data' was intended to
make it clear that the features were based on the boot CPU's info, even
if you just grepped for that and didn't see the surrounding context.
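
To illustrate (a simplified sketch with stand-in names, not the actual
kernel structures): with a read-only init step the two orderings are
interchangeable, and the problem only shows up once the init step starts
writing through its argument:

	/* Stand-ins; not the real cpuinfo_arm64 or init_cpu_features(). */
	struct cpuinfo { unsigned long reg_a; };

	static struct cpuinfo boot_copy;

	/* Intended contract: consume the ID values, write nothing back. */
	static void init_features(const struct cpuinfo *info) { }

	static void store_boot_cpu(struct cpuinfo *percpu)
	{
		/*
		 * Under that contract these two lines could go in either
		 * order: both structures end up with identical values.
		 */
		boot_copy = *percpu;
		init_features(&boot_copy);
		/*
		 * Once the init step also writes a register (as
		 * 892f7237b3ff made it do for SMIDR), only boot_copy
		 * sees the write and the percpu data used by sysfs
		 * stays stale.
		 */
	}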

I think the real fix here is to move the reading back into
__cpuinfo_store_cpu(), but to have an explicit check that SME has been
disabled on the commandline, with a comment explaining that this is a
bodge for broken FW which traps the SME ID regs.

Mark.
Mark Brown Dec. 16, 2024, 1:23 p.m. UTC | #5
On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:

> ... didn't matter either way, and using '&boot_cpu_data' was intended to
> make it clear that the features were based on the boot CPU's info, even
> if you just grepped for that and didn't see the surrounding context.

Right, that was my best guess as to what was supposed to be going on
but it wasn't super clear.  The code could use some more comments.

> I think the real fix here is to move the reading back into
> __cpuinfo_store_cpu(), but to have an explicit check that SME has been
> disabled on the commandline, with a comment explaining that this is a
> bodge for broken FW which traps the SME ID regs.

That should be doable.

There are a few other similar ID registers (eg, we already read GMID_EL1
and MPAMIDR_EL1) that make me a bit nervous that we might need to
generalise it a bit, but we can deal with that if it comes up.  Even for
SME the disable was added speculatively; the factors that made this come
up for SVE are less likely to be an issue with SME.
Marc Zyngier Dec. 16, 2024, 2:31 p.m. UTC | #6
On Mon, 16 Dec 2024 12:38:17 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Sat, Dec 14, 2024 at 10:56:13AM +0000, Marc Zyngier wrote:
>
> > Why isn't the following a good enough fix? It makes it plain that
> > boot_cpu_data is only a copy of CPU0's initial boot state.
> > 
> > diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> > index d79e88fccdfce..0cbb42fd48850 100644
> > --- a/arch/arm64/kernel/cpuinfo.c
> > +++ b/arch/arm64/kernel/cpuinfo.c
> > @@ -497,6 +497,6 @@ void __init cpuinfo_store_boot_cpu(void)
> >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
> >  	__cpuinfo_store_cpu(info);
> >  
> > +	init_cpu_features(info);
> >  	boot_cpu_data = *info;
> > -	init_cpu_features(&boot_cpu_data);
> >  }
> 
> I think that change in isolation is fine, but I don't think that's the
> right fix.
> 
> I think that what we did in commit:
> 
>    892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
> 
> ... introduces an anti-pattern that'd be nice to avoid. That broke the
> existing split of __cpuinfo_store_cpu() and init_cpu_features(), where
> the former read the ID regs, and the latter set up the features
> *without* altering the copy of the ID regs that was read. i.e.
> init_cpu_features() shouldn't write to its info argument at all.
> 
> I understand that we have to do something as a bodge for broken FW which
> traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().

Honestly, I'd rather revert that patch, together with b3000e2133d8
("arm64: Add the arm64.nosme command line option"). I'm getting tired
of the FW nonsense, and we are only allowing vendors to ship untested
crap.

Furthermore, given the state of SME in the kernel, I don't think this
makes any difference. So maybe this is the right time to reset
everything to a sane state.

> Can we add something to check whether SME was disabled on the command
> line, and use that in __cpuinfo_store_cpu(), effectively reverting
> 892f7237b3ff?

Maybe, but that'd be before any sanitisation of the overrides, so it
would have to severely limit its scope. Something like this, which I
haven't tested:

diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..9e9295e045009 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -492,10 +492,22 @@ void cpuinfo_store_cpu(void)
 	update_cpu_features(smp_processor_id(), info, &boot_cpu_data);
 }
 
+static void cpuinfo_apply_overrides(struct cpuinfo_arm64 *info)
+{
+	if (FIELD_GET(ID_AA64PFR0_EL1_SVE_MASK, id_aa64pfr0_override.mask) &&
+	    !FIELD_GET(ID_AA64PFR0_EL1_SVE_MASK, id_aa64pfr0_override.val))
+		info->reg_id_aa64pfr0 &= ~ID_AA64PFR0_EL1_SVE_MASK;
+
+	if (FIELD_GET(ID_AA64PFR1_EL1_SME_MASK, id_aa64pfr1_override.mask) &&
+	    !FIELD_GET(ID_AA64PFR1_EL1_SME_MASK, id_aa64pfr1_override.val))
+		info->reg_id_aa64pfr1 &= ~ID_AA64PFR1_EL1_SME_MASK;
+}
+
 void __init cpuinfo_store_boot_cpu(void)
 {
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
 	__cpuinfo_store_cpu(info);
+	cpuinfo_apply_overrides(info);
 
 	boot_cpu_data = *info;
 	init_cpu_features(&boot_cpu_data);

But this will have ripple effects on the rest of the override code
(the kernel messages are likely to be wrong).

	M.
Mark Rutland Dec. 16, 2024, 2:31 p.m. UTC | #7
On Mon, Dec 16, 2024 at 01:23:55PM +0000, Mark Brown wrote:
> On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:
> 
> > ... didn't matter either way, and using '&boot_cpu_data' was intended to
> > make it clear that the features were based on the boot CPU's info, even
> > if you just grepped for that and didn't see the surrounding context.
> 
> Right, that was my best guess as to what was supposed to be going on
> but it wasn't super clear.  The code could use some more comments.
> 
> > I think the real fix here is to move the reading back into
> > __cpuinfo_store_cpu(), but to have an explicit check that SME has been
> > disabled on the commandline, with a comment explaining that this is a
> > bodge for broken FW which traps the SME ID regs.
> 
> That should be doable.
> 
> There are a few other similar ID registers (eg, we already read GMID_EL1
> and MPAMIDR_EL1) that make me a bit nervous that we might need to
> generalise it a bit, but we can deal with that if it comes up.  Even for
> SME the disable was added speculatively; the factors that made this come
> up for SVE are less likely to be an issue with SME.

FWIW, I had a quick go (with only the SME case), and I think the shape
that we want is roughly as below, which I think is easy to generalise to
those other cases.

MarcZ, thoughts?

Mark.

---->8----
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8b4e5a3cd24c8..f16eb99c10723 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -91,6 +91,16 @@ struct arm64_ftr_override {
 	u64		mask;
 };
 
+static inline u64
+arm64_ftr_override_apply(const struct arm64_ftr_override *override,
+			 u64 val)
+{
+	val &= ~override->mask;
+	val |= override->val & override->mask;
+
+	return val;
+}
+
 /*
  * @arm64_ftr_reg - Feature register
  * @strict_mask		Bits which should match across all CPUs for sanity.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed84..faad7d3e4cf5f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
 		unsigned long cpacr = cpacr_save_enable_kernel_sme();
 
-		/*
-		 * We mask out SMPS since even if the hardware
-		 * supports priorities the kernel does not at present
-		 * and we block access to them.
-		 */
-		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
 		vec_init_vq_map(ARM64_VEC_SME);
 
 		cpacr_restore(cpacr);
@@ -1550,10 +1544,8 @@ u64 __read_sysreg_by_encoding(u32 sys_id)
 	}
 
 	regp  = get_arm64_ftr_reg(sys_id);
-	if (regp) {
-		val &= ~regp->override->mask;
-		val |= (regp->override->val & regp->override->mask);
-	}
+	if (regp)
+		val = arm64_ftr_override_apply(regp->override, val);
 
 	return val;
 }
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce..1460e3541132f 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -439,6 +439,24 @@ static void __cpuinfo_store_cpu_32bit(struct cpuinfo_32bit *info)
 	info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
 }
 
+static void __cpuinfo_store_cpu_sme(struct cpuinfo_arm64 *info)
+{
+	/*
+	 * TODO: explain that this bodge is to avoid trapping.
+	 */
+	u64 pfr1 = info->reg_id_aa64pfr1;
+	pfr1 = arm64_ftr_override_apply(&id_aa64pfr1_override, pfr1);
+	if (!id_aa64pfr1_sme(pfr1))
+		return;
+
+	/*
+	 * We mask out SMPS since even if the hardware
+	 * supports priorities the kernel does not at present
+	 * and we block access to them.
+	 */
+	info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
+}
+
 static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 {
 	info->reg_cntfrq = arch_timer_get_cntfrq();
@@ -482,6 +500,8 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
 		info->reg_mpamidr = read_cpuid(MPAMIDR_EL1);
 
+	__cpuinfo_store_cpu_sme(info);
+
 	cpuinfo_detect_icache_policy(info);
 }
Marc Zyngier Dec. 16, 2024, 2:44 p.m. UTC | #8
On Mon, 16 Dec 2024 14:31:47 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Mon, Dec 16, 2024 at 01:23:55PM +0000, Mark Brown wrote:
> > On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:
> > 
> > > ... didn't matter either way, and using '&boot_cpu_data' was intended to
> > > make it clear that the features were based on the boot CPU's info, even
> > > if you just grepped for that and didn't see the surrounding context.
> > 
> > Right, that was my best guess as to what was supposed to be going on
> > but it wasn't super clear.  The code could use some more comments.
> > 
> > > I think the real fix here is to move the reading back into
> > > __cpuinfo_store_cpu(), but to have an explicit check that SME has been
> > > disabled on the commandline, with a comment explaining that this is a
> > > bodge for broken FW which traps the SME ID regs.
> > 
> > That should be doable.
> > 
> > There are a few other similar ID registers (eg, we already read GMID_EL1
> > and MPAMIDR_EL1) that make me a bit nervous that we might need to
> > generalise it a bit, but we can deal with that if it comes up.  Even for
> > SME the disable was added speculatively; the factors that made this come
> > up for SVE are less likely to be an issue with SME.
> 
> FWIW, I had a quick go (with only the SME case), and I think the shape
> that we want is roughly as below, which I think is easy to generalise to
> those other cases.
> 
> MarcZ, thoughts?
> 
> Mark.
> 
> ---->8----
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 8b4e5a3cd24c8..f16eb99c10723 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -91,6 +91,16 @@ struct arm64_ftr_override {
>  	u64		mask;
>  };
>  
> +static inline u64
> +arm64_ftr_override_apply(const struct arm64_ftr_override *override,
> +			 u64 val)
> +{
> +	val &= ~override->mask;
> +	val |= override->val & override->mask;
> +
> +	return val;
> +}
> +
>  /*
>   * @arm64_ftr_reg - Feature register
>   * @strict_mask		Bits which should match across all CPUs for sanity.
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6ce71f444ed84..faad7d3e4cf5f 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1167,12 +1167,6 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
>  	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
>  		unsigned long cpacr = cpacr_save_enable_kernel_sme();
>  
> -		/*
> -		 * We mask out SMPS since even if the hardware
> -		 * supports priorities the kernel does not at present
> -		 * and we block access to them.
> -		 */
> -		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
>  		vec_init_vq_map(ARM64_VEC_SME);
>  
>  		cpacr_restore(cpacr);
> @@ -1550,10 +1544,8 @@ u64 __read_sysreg_by_encoding(u32 sys_id)
>  	}
>  
>  	regp  = get_arm64_ftr_reg(sys_id);
> -	if (regp) {
> -		val &= ~regp->override->mask;
> -		val |= (regp->override->val & regp->override->mask);
> -	}
> +	if (regp)
> +		val = arm64_ftr_override_apply(regp->override, val);
>  
>  	return val;
>  }
> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> index d79e88fccdfce..1460e3541132f 100644
> --- a/arch/arm64/kernel/cpuinfo.c
> +++ b/arch/arm64/kernel/cpuinfo.c
> @@ -439,6 +439,24 @@ static void __cpuinfo_store_cpu_32bit(struct cpuinfo_32bit *info)
>  	info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
>  }
>  
> +static void __cpuinfo_store_cpu_sme(struct cpuinfo_arm64 *info)
> +{
> +	/*
> +	 * TODO: explain that this bodge is to avoid trapping.
> +	 */
> +	u64 pfr1 = info->reg_id_aa64pfr1;
> +	pfr1 = arm64_ftr_override_apply(&id_aa64pfr1_override, pfr1);
> +	if (!id_aa64pfr1_sme(pfr1))
> +		return;

I don't think blindly applying the override at this stage is a good
thing. Especially not the whole register, and definitely not
non-disabling values.

It needs to be done on a per feature basis, and only to disable
things.

See the hack I posted for the things I think need checking.

	M.
Mark Brown Dec. 16, 2024, 3:05 p.m. UTC | #9
On Mon, Dec 16, 2024 at 02:31:43PM +0000, Marc Zyngier wrote:
> Mark Rutland <mark.rutland@arm.com> wrote:

> > I understand that we have to do something as a bodge for broken FW which
> > traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().

> Honestly, I'd rather revert that patch, together with b3000e2133d8
> ("arm64: Add the arm64.nosme command line option"). I'm getting tired
> of the FW nonsense, and we are only allowing vendors to ship untested
> crap.

I'd certainly be happy to remove the override for SME, the circumstances
that lead to the need to override SVE are much less likely to occur with
SME.  We can add it again later if there's a need for it.

> Furthermore, given the state of SME in the kernel, I don't think this
> makes any difference. So maybe this is the right time to reset
> everything to a sane state.

I'm not aware of any issues that don't have fixes on the list (the fixes
have all been on the list for about a month, apart from this one).
Mark Rutland Dec. 16, 2024, 3:07 p.m. UTC | #10
On Mon, Dec 16, 2024 at 02:31:43PM +0000, Marc Zyngier wrote:
> On Mon, 16 Dec 2024 12:38:17 +0000,
> Mark Rutland <mark.rutland@arm.com> wrote:
> > I think that what we did in commit:
> > 
> >    892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
> > 
> > ... introduces an anti-pattern that'd be nice to avoid. That broke the
> > existing split of __cpuinfo_store_cpu() and init_cpu_features(), where
> > the former read the ID regs, and the latter set up the features
> > *without* altering the copy of the ID regs that was read. i.e.
> > init_cpu_features() shouldn't write to its info argument at all.
> > 
> > I understand that we have to do something as a bodge for broken FW which
> > traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().
> 
> Honestly, I'd rather revert that patch, together with b3000e2133d8
> ("arm64: Add the arm64.nosme command line option"). I'm getting tired
> of the FW nonsense, and we are only allowing vendors to ship untested
> crap.
> 
> Furthermore, given the state of SME in the kernel, I don't think this
> makes any difference. So maybe this is the right time to reset
> everything to a sane state.

Looking again, a revert does look to be the best option.

We removed reg_zcr and reg_smcr in v6.7 in commits:

  abef0695f9665c3d ("arm64/sve: Remove ZCR pseudo register from cpufeature code")
  391208485c3ad50f ("arm64/sve: Remove SMCR pseudo register from cpufeature code")

As of those commits, ZCR and SMCR no longer matter to
__cpuinfo_store_cpu(), and only SMIDR_EL1 remains...

Per ARM DDI 0487 L.a, accesses to SMIDR_EL1 never trap to EL3, so we can
read that safely as long as ID_AA64PFR1_EL1.SME indicates that SME is
implemented.

Which is to say that if we revert the remaining portion of 892f7237b3ff
and restore the read of SMIDR, that should be good as far back as v6.7,
which sounds good to me.
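
For concreteness, a sketch of the restored read (reconstructed from the
hunks quoted in this thread; the exact placement within
__cpuinfo_store_cpu() is an assumption):

	static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
	{
		/* ... existing ID register reads ... */

		if (IS_ENABLED(CONFIG_ARM64_SME) &&
		    id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
			/*
			 * SMIDR_EL1 is safe to read whenever
			 * ID_AA64PFR1_EL1.SME is set, and we mask out
			 * SMPS since the kernel does not currently
			 * support priorities and blocks access to them.
			 */
			info->reg_smidr = read_cpuid(SMIDR_EL1) &
					  ~SMIDR_EL1_SMPS;
		}

		/* ... */
	}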

Mark.

> > Can we add something to check whether SME was disabled on the command
> > line, and use that in __cpuinfo_store_cpu(), effectively reverting
> > 892f7237b3ff?
> 
> Maybe, but that'd be before any sanitisation of the overrides, so it
> would have to severely limit its scope. Something like this, which I
> haven't tested:
> 
> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> index d79e88fccdfce..9e9295e045009 100644
> --- a/arch/arm64/kernel/cpuinfo.c
> +++ b/arch/arm64/kernel/cpuinfo.c
> @@ -492,10 +492,22 @@ void cpuinfo_store_cpu(void)
>  	update_cpu_features(smp_processor_id(), info, &boot_cpu_data);
>  }
>  
> +static void cpuinfo_apply_overrides(struct cpuinfo_arm64 *info)
> +{
> +	if (FIELD_GET(ID_AA64PFR0_EL1_SVE_MASK, id_aa64pfr0_override.mask) &&
> +	    !FIELD_GET(ID_AA64PFR0_EL1_SVE_MASK, id_aa64pfr0_override.val))
> +		info->reg_id_aa64pfr0 &= ~ID_AA64PFR0_EL1_SVE_MASK;
> +
> +	if (FIELD_GET(ID_AA64PFR1_EL1_SME_MASK, id_aa64pfr1_override.mask) &&
> +	    !FIELD_GET(ID_AA64PFR1_EL1_SME_MASK, id_aa64pfr1_override.val))
> +		info->reg_id_aa64pfr1 &= ~ID_AA64PFR1_EL1_SME_MASK;
> +}
> +
>  void __init cpuinfo_store_boot_cpu(void)
>  {
>  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, 0);
>  	__cpuinfo_store_cpu(info);
> +	cpuinfo_apply_overrides(info);
>  
>  	boot_cpu_data = *info;
>  	init_cpu_features(&boot_cpu_data);
> 
> But this will have ripple effects on the rest of the override code
> (the kernel messages are likely to be wrong).
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
Mark Rutland Dec. 16, 2024, 3:11 p.m. UTC | #11
On Mon, Dec 16, 2024 at 02:44:07PM +0000, Marc Zyngier wrote:
> On Mon, 16 Dec 2024 14:31:47 +0000,
> Mark Rutland <mark.rutland@arm.com> wrote:
> > 
> > On Mon, Dec 16, 2024 at 01:23:55PM +0000, Mark Brown wrote:
> > > On Mon, Dec 16, 2024 at 12:44:14PM +0000, Mark Rutland wrote:
> > > 
> > > > ... didn't matter either way, and using '&boot_cpu_data' was intended to
> > > > make it clear that the features were based on the boot CPU's info, even
> > > > if you just grepped for that and didn't see the surrounding context.
> > > 
> > > Right, that was my best guess as to what was supposed to be going on
> > > but it wasn't super clear.  The code could use some more comments.
> > > 
> > > > I think the real fix here is to move the reading back into
> > > > __cpuinfo_store_cpu(), but to have an explicit check that SME has been
> > > > disabled on the commandline, with a comment explaining that this is a
> > > > bodge for broken FW which traps the SME ID regs.
> > > 
> > > That should be doable.
> > > 
> > There are a few other similar ID registers (eg, we already read GMID_EL1
> > and MPAMIDR_EL1) that make me a bit nervous that we might need to
> > generalise it a bit, but we can deal with that if it comes up.  Even for
> > SME the disable was added speculatively; the factors that made this come
> > up for SVE are less likely to be an issue with SME.
> > 
> > FWIW, I had a quick go (with only the SME case), and I think the shape
> > that we want is roughly as below, which I think is easy to generalise to
> > those other cases.
> > 
> > MarcZ, thoughts?
> > 
> > Mark.

[... dodgy patch was here ...]

> I don't think blindly applying the override at this stage is a good
> thing. Especially not the whole register, and definitely not
> non-disabling values.
> 
> It needs to be done on a per feature basis, and only to disable
> things.
> 
> See the hack I posted for the things I think need checking.

Understood; sorry for the noise -- we raced when replying and I only
spotted your reply after sending this. I think I'm more in favour of the
revert option now; I've replied with more details at:

  https://lore.kernel.org/linux-arm-kernel/Z2BCI61c9QWG7mMB@J2N7QTR9R3.cambridge.arm.com/T/#m8d6ace8d6201591ca72d51cf14c4a605e7d98a88

Mark.
Mark Brown Dec. 16, 2024, 3:21 p.m. UTC | #12
On Mon, Dec 16, 2024 at 03:07:15PM +0000, Mark Rutland wrote:

> Looking again, a revert does look to be the best option.

Since we all seem to agree I'll send a patch doing this.
Marc Zyngier Dec. 16, 2024, 3:28 p.m. UTC | #13
On Mon, 16 Dec 2024 15:07:15 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Mon, Dec 16, 2024 at 02:31:43PM +0000, Marc Zyngier wrote:
> > On Mon, 16 Dec 2024 12:38:17 +0000,
> > Mark Rutland <mark.rutland@arm.com> wrote:
> > > I think that what we did in commit:
> > > 
> > >    892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
> > > 
> > > ... introduces an anti-pattern that'd be nice to avoid. That broke the
> > > existing split of __cpuinfo_store_cpu() and init_cpu_features(), where
> > > the former read the ID regs, and the latter set up the features
> > > *without* altering the copy of the ID regs that was read. i.e.
> > > init_cpu_features() shouldn't write to its info argument at all.
> > > 
> > > I understand that we have to do something as a bodge for broken FW which
> > > traps SME, but I'd much rather we did that within __cpuinfo_store_cpu().
> > 
> > Honestly, I'd rather revert that patch, together with b3000e2133d8
> > ("arm64: Add the arm64.nosme command line option"). I'm getting tired
> > of the FW nonsense, and we are only allowing vendors to ship untested
> > crap.
> > 
> > Furthermore, given the state of SME in the kernel, I don't think this
> > makes any difference. So maybe this is the right time to reset
> > everything to a sane state.
> 
> Looking again, a revert does look to be the best option.
> 
> We removed reg_zcr and reg_smcr in v6.7 in commits:
> 
>   abef0695f9665c3d ("arm64/sve: Remove ZCR pseudo register from cpufeature code")
>   391208485c3ad50f ("arm64/sve: Remove SMCR pseudo register from cpufeature code")
> 
> As of those commits, ZCR and SMCR no longer matter to
> __cpuinfo_store_cpu(), and only SMIDR_EL1 remains...
> 
> Per ARM DDI 0487 L.a, accesses to SMIDR_EL1 never trap to EL3, so we can
> read that safely as long as ID_AA64PFR1_EL1.SME indicates that SME is
> implemented.
> 
> Which is to say that if we revert the remaining portion of 892f7237b3ff
> and restore the read of SMIDR, that should be good as far back as v6.7,
> which sounds good to me.

Sounds reasonable indeed. I guess *someone* will want it for the
previous kernel versions, but they can have fun with the backport on
their own.

	M.

Patch

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ce71f444ed84f9056196bb21bbfac61c9687e30..b88102fd2c20f77e25af6df513fda09a484e882e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1167,12 +1167,6 @@  void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
 		unsigned long cpacr = cpacr_save_enable_kernel_sme();
 
-		/*
-		 * We mask out SMPS since even if the hardware
-		 * supports priorities the kernel does not at present
-		 * and we block access to them.
-		 */
-		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
 		vec_init_vq_map(ARM64_VEC_SME);
 
 		cpacr_restore(cpacr);
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index d79e88fccdfce427507e7a34c5959ce6309cbd12..b7d403da71e5a01ed3943eb37e7a00af238771a2 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -499,4 +499,15 @@  void __init cpuinfo_store_boot_cpu(void)
 
 	boot_cpu_data = *info;
 	init_cpu_features(&boot_cpu_data);
+
+	/* SMIDR_EL1 needs to be stored in the percpu data for sysfs */
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1))) {
+		/*
+		 * We mask out SMPS since even if the hardware
+		 * supports priorities the kernel does not at present
+		 * and we block access to them.
+		 */
+		info->reg_smidr = read_cpuid(SMIDR_EL1) & ~SMIDR_EL1_SMPS;
+	}
 }