Message ID | 20220909165938.3931307-1-james.morse@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | arm64: errata: remove BF16 HWCAP due to incorrect result on Cortex-A510 | expand |
On 2022-09-09 17:59, James Morse wrote: > Hello! > > Cortex-A510 has an erratum where the BFMMLA or VMMLA instructions might > produce the incorrect result. This only happens if two Cortex-A510 CPUs > are configured by the SoC manafacturer in a particular way to share some > hardware, and are using it at the same time. It isn't possible for linux > to know how the CPUs were configured, so the only option is to disable > the BF16 feature for all Cortex-A510 CPUs. Hmm, the TRM doesn't seem to describe any obvious restrictions on accessing IMP_CPUCFR_EL1 - is there some other nefarious secret at play there? Robin. > This won't stop user-space from trying to use the instructions to see > if they work - but such software is already broken on big/little systems > with mismathed features. > > Removing the BF16 feature involves removing both the HWCAP, as was done > by commit 44b3834b2eed "arm64: errata: Remove AES hwcap for COMPAT tasks", > and the emulated view of that register that user-space has. > (This wasn't previously needed as aarch32 ID registers aren't accessible > like this). > > As there are now two of these things, this series tries to add a more > maintainable way to remove features, to avoid spilling workaround like > this into cpufeature.c. > > Instead, cpu_errata.c modifies the user_mask that is used for emulation > of the id registers, and cpufeature.c builds the HWCAPs based on this. > > I have patches to convert the AES workaround to do the same, but that > should wait until the aarch32 ID registers are generated by the sysreg > awk scripts (it needs some new masks defined). > > [0] https://developer.arm.com/documentation/SDEN1873351/1400/?lang=en > > Thanks, > > James Morse (3): > arm64: cpufeature: Force HWCAP to be based on the sysreg visible to > user-space > arm64: cpufeature: Expose get_arm64_ftr_reg() outside cpufeature.c > arm64: errata: remove BF16 HWCAP due to incorrect result on > Cortex-A510 > > Documentation/arm64/silicon-errata.rst | 2 ++ > arch/arm64/Kconfig | 14 +++++++++ > arch/arm64/include/asm/cpufeature.h | 2 ++ > arch/arm64/kernel/cpu_errata.c | 27 ++++++++++++++++ > arch/arm64/kernel/cpufeature.c | 43 ++++++++++++++++++++------ > arch/arm64/tools/cpucaps | 1 + > 6 files changed, 79 insertions(+), 10 deletions(-) >
Hi Robin, On 09/09/2022 19:29, Robin Murphy wrote: > On 2022-09-09 17:59, James Morse wrote: >> Cortex-A510 has an erratum where the BFMMLA or VMMLA instructions might >> produce the incorrect result. This only happens if two Cortex-A510 CPUs >> are configured by the SoC manafacturer in a particular way to share some >> hardware, and are using it at the same time. It isn't possible for linux >> to know how the CPUs were configured, so the only option is to disable >> the BF16 feature for all Cortex-A510 CPUs. > Hmm, the TRM doesn't seem to describe any obvious restrictions on accessing IMP_CPUCFR_EL1 > - is there some other nefarious secret at play there? HCR_EL2.TIDCP. KVM traps all IMP_* registers from EL1, and has no ability to emulate them. For an erratum workaround at EL1, touching an imp-def register is the wrong thing to do. We could add some discovery API with firmware to determine if the part is affected, but I think that is overkill for this. I'm pretty sure affected Cortex-A510 silicon is only in mobile phones, which can have the Kconfig option disabled if the target platform isn't affected. (Even with GKI, I assume things don't run one kernel image in the same way distros do) Thanks, James
On 2022-09-15 14:39, James Morse wrote: > Hi Robin, > > On 09/09/2022 19:29, Robin Murphy wrote: >> On 2022-09-09 17:59, James Morse wrote: >>> Cortex-A510 has an erratum where the BFMMLA or VMMLA instructions might >>> produce the incorrect result. This only happens if two Cortex-A510 CPUs >>> are configured by the SoC manafacturer in a particular way to share some >>> hardware, and are using it at the same time. It isn't possible for linux >>> to know how the CPUs were configured, so the only option is to disable >>> the BF16 feature for all Cortex-A510 CPUs. > >> Hmm, the TRM doesn't seem to describe any obvious restrictions on accessing IMP_CPUCFR_EL1 >> - is there some other nefarious secret at play there? > > HCR_EL2.TIDCP. Aha, indeed I should have looked more closely at the pseudocode. > KVM traps all IMP_* registers from EL1, and has no ability to emulate them. > For an erratum workaround at EL1, touching an imp-def register is the wrong thing to do. Well, there's no real reason we couldn't make KVM handle this case, but that doesn't help if we're booted at EL1 under some other hypervisor, so in general I concur. > We could add some discovery API with firmware to determine if the part is affected, but I > think that is overkill for this. I'm pretty sure affected Cortex-A510 silicon is only in > mobile phones, which can have the Kconfig option disabled if the target platform isn't > affected. (Even with GKI, I assume things don't run one kernel image in the same way > distros do) I suppose it wouldn't take *too* much to check this in init_el2 and stash a "no actually it's fine" flag for later, but whether even that's worth it depends on how big the impact of losing BF16 on unaffected configurations actually is (which I have no idea about). If someone does care then that can always be added later, so that's good enough for me! Cheers, Robin.
On Fri, 9 Sep 2022 17:59:35 +0100, James Morse wrote: > Cortex-A510 has an erratum where the BFMMLA or VMMLA instructions might > produce the incorrect result. This only happens if two Cortex-A510 CPUs > are configured by the SoC manafacturer in a particular way to share some > hardware, and are using it at the same time. It isn't possible for linux > to know how the CPUs were configured, so the only option is to disable > the BF16 feature for all Cortex-A510 CPUs. > > [...] Applied to arm64 (for-next/a510-erratum-2658417), thanks! (I updated the last patch following Suzuki's comments) [1/3] arm64: cpufeature: Force HWCAP to be based on the sysreg visible to user-space https://git.kernel.org/arm64/c/237405ebef58 [2/3] arm64: cpufeature: Expose get_arm64_ftr_reg() outside cpufeature.c https://git.kernel.org/arm64/c/445c953e4a84 [3/3] arm64: errata: remove BF16 HWCAP due to incorrect result on Cortex-A510 https://git.kernel.org/arm64/c/1bdb0fbb2e27