diff mbox series

[RFC,3/3] arm64: errata: Disable FWB on parts with non-ARM interconnects

Message ID 20230216182201.1705406-4-james.morse@arm.com (mailing list archive)
State New, archived
Headers show
Series arm64: errata: Disable FWB on parts with non-ARM interconnects | expand

Commit Message

James Morse Feb. 16, 2023, 6:22 p.m. UTC
Force Write Back (FWB) allows the hypervisor to force non-cacheable
accesses made by a guest to be cacheable. This saves the hypervisor
from doing cache maintenance on all pages the guest can access, to
ensure the guest doesn't see stale (and possibly sensitive) data when
making a non-cacheable access.

When stage1 translation is disabled, the SCTRL_E1.I bit controls the
attributes used for instruction fetch, one of the options results in a
non-cacheable access. A whole host of CPUs missed the FWB override
in this case, meaning a KVM guest could fetch stale/junk data instead of
instructions.

The workaround is to always do the cache maintenance. These parts don't
have fine-grained-traps, so it isn't feasible to detect the guest
disabling the MMU. Instead, disable FWB on the host.

While the CPUs are affected, this erratum doesn't occur on parts using
Arm's CMN interconnects. Use the Errata Management API to discover whether
this CPU is affected.

Because guest execution is compromised, the workaround is enabled by
default. If the Errata Management API isn't implemented by firmware, the
workaround will be enabled. If a target platform is not affected, and it
isn't possible to add support for the Errata Management API, the erratum
can be disabled in Kconfig.

Signed-off-by: James Morse <james.morse@arm.com>
---
 Documentation/arm64/silicon-errata.rst | 18 +++++++
 arch/arm64/Kconfig                     | 27 ++++++++++
 arch/arm64/kernel/cpufeature.c         | 71 +++++++++++++++++++++++++-
 3 files changed, 115 insertions(+), 1 deletion(-)

Comments

Marc Zyngier Feb. 16, 2023, 6:46 p.m. UTC | #1
On Thu, 16 Feb 2023 18:22:01 +0000,
James Morse <james.morse@arm.com> wrote:
> 
> Force Write Back (FWB) allows the hypervisor to force non-cacheable
> accesses made by a guest to be cacheable. This saves the hypervisor
> from doing cache maintenance on all pages the guest can access, to
> ensure the guest doesn't see stale (and possibly sensitive) data when
> making a non-cacheable access.
> 
> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> attributes used for instruction fetch, one of the options results in a
> non-cacheable access. A whole host of CPUs missed the FWB override
> in this case, meaning a KVM guest could fetch stale/junk data instead of
> instructions.
> 
> The workaround is to always do the cache maintenance. These parts don't
> have fine-grained-traps, so it isn't feasible to detect the guest
> disabling the MMU. Instead, disable FWB on the host.
> 
> While the CPUs are affected, this erratum doesn't occur on parts using
> Arm's CMN interconnects. Use the Errata Management API to discover whether
> this CPU is affected.
> 
> Because guest execution is compromised, the workaround is enabled by
> default. If the Errata Management API isn't implemented by firmware, the
> workaround will be enabled. If a target platform is not affected, and it
> isn't possible to add support for the Errata Management API, the erratum
> can be disabled in Kconfig.

I'm feeling a bit sick...

My main concern is hardly anyone implements this errata management
API, if at all. We should:

- give people an option to disable this from the command-line if they
  know they are on an unaffected system

- have some form of DT property that indicates the HW isn't affected

Thoughts?

	M.
James Morse Feb. 21, 2023, 5:48 p.m. UTC | #2
Hi Marc,

On 16/02/2023 18:46, Marc Zyngier wrote:
> On Thu, 16 Feb 2023 18:22:01 +0000,
> James Morse <james.morse@arm.com> wrote:
>>
>> Force Write Back (FWB) allows the hypervisor to force non-cacheable
>> accesses made by a guest to be cacheable. This saves the hypervisor
>> from doing cache maintenance on all pages the guest can access, to
>> ensure the guest doesn't see stale (and possibly sensitive) data when
>> making a non-cacheable access.
>>
>> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
>> attributes used for instruction fetch, one of the options results in a
>> non-cacheable access. A whole host of CPUs missed the FWB override
>> in this case, meaning a KVM guest could fetch stale/junk data instead of
>> instructions.
>>
>> The workaround is to always do the cache maintenance. These parts don't
>> have fine-grained-traps, so it isn't feasible to detect the guest
>> disabling the MMU. Instead, disable FWB on the host.
>>
>> While the CPUs are affected, this erratum doesn't occur on parts using
>> Arm's CMN interconnects. Use the Errata Management API to discover whether
>> this CPU is affected.
>>
>> Because guest execution is compromised, the workaround is enabled by
>> default. If the Errata Management API isn't implemented by firmware, the
>> workaround will be enabled. If a target platform is not affected, and it
>> isn't possible to add support for the Errata Management API, the erratum
>> can be disabled in Kconfig.

> I'm feeling a bit sick...

> My main concern is hardly anyone implements this errata management
> API, if at all. We should:

If anyone? Today no-one implements it!

We've always had to update one of the firmware or kernel for any errata workaround. I
agree this 'both' option is annoying, but if half the story was missing, you already had a
problem.


> - give people an option to disable this from the command-line if they
>   know they are on an unaffected system

(my least favourite)


> - have some form of DT property that indicates the HW isn't affected

All perfectly valid options. The one part Arm is aware that is affected uses Neoverse-V2,
which is much more likely to appear in ACPI machines. The firmware discovery is preferable
to trying to match the 'OEM id' of some random ACPI to determine if the part is affected -
that whole model falls down if the SoC is OEM'd. (Dell, HP, Lenovo, etc)

I think its fair to say you have to support the firmware discovery API if you use ACPI,
and its optional for DT.


Thanks,

James
diff mbox series

Patch

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index ec5f889d7681..d6ca86ebc7af 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -106,6 +106,10 @@  stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A78      | #2712571        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A78C     | #2712575,2712572| ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A510     | #2051678        | ARM64_ERRATUM_2051678       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A510     | #2077057        | ARM64_ERRATUM_2077057       |
@@ -120,12 +124,20 @@  stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2224489        | ARM64_ERRATUM_2224489       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A710     | #2701952        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A715     | #2645198        | ARM64_ERRATUM_2645198       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X1       | #2712571        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X2       | #2119858        | ARM64_ERRATUM_2119858       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-X2       | #2224489        | ARM64_ERRATUM_2224489       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X2       | #2701952        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-X3       | #2701951        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1349291        | N/A                         |
@@ -138,6 +150,12 @@  stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N2     | #2253138        | ARM64_ERRATUM_2253138       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N2     | #2728475        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-V1     | #2701953        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-V2     | #2719103        | ARM64_ERRATUM_2701951       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c5ccca26a408..adc46e82cee6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -986,6 +986,33 @@  config ARM64_ERRATUM_2645198
 
 	  If unsure, say Y.
 
+config ARM64_ERRATUM_2701951
+	bool "ARM CPUs: 2701951: disable FWB on affected parts"
+	select ARM_SMCCC_EM
+	default y
+	help
+	  This option adds the workaround for multiple ARM errata titled
+	  "The core might fetch stale instruction from memory when both Stage 1
+	   Translation and Instruction Cache are Disabled with Stage 2 forced
+	   Write-Back".
+	  This affects Cortex cores: A78, A78C, A710, X1, X2, X3, and Neoverse
+	  cores: V1, V2 and N2.
+
+	  Affected cores fail to apply the FWB override to instruction fetch
+	  when stage1 translation is disabled, and SCTLR_EL1.I is clear. This
+	  results in stale data being fetched and executed. Only CPUs that are
+	  connected to a non-Arm interconnect will exhibit symptoms due to this
+	  errata.
+
+	  Work around this problem in the driver by disabling FWB on affected
+	  parts. The SMCCC Errata Management API is used to query firmware to
+	  learn if the part is affected.
+
+	  If the SMCCC Errata Management API is not implemented on a platform
+	  with an affected core, the workaround will be applied.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2eb4d38e491a..1d7156e75468 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1574,6 +1574,75 @@  static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
 	return ctr & BIT(CTR_EL0_DIC_SHIFT);
 }
 
+static bool has_stage2_fwb(const struct arm64_cpu_capabilities *entry,
+			   int scope)
+{
+	bool has_feature = has_cpuid_feature(entry, scope);
+
+	/* List of CPUs which may have broken FWB support. */
+	static const struct midr_range cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2701951
+		MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
+		MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C),
+		MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+		MIDR_ALL_VERSIONS(MIDR_CORTEX_X1),
+		MIDR_ALL_VERSIONS(MIDR_CORTEX_X2),
+		MIDR_RANGE(MIDR_CORTEX_X3, 0, 0, 1, 1),
+		MIDR_RANGE(MIDR_NEOVERSE_V1, 0, 0, 1, 1),
+		MIDR_RANGE(MIDR_NEOVERSE_V2, 0, 0, 0, 1),
+		MIDR_RANGE(MIDR_NEOVERSE_N2, 0, 0, 0, 2),
+#endif
+		{ /* sentinel */ },
+	};
+
+	if (!has_feature)
+		return false;
+
+	if (is_midr_in_range_list(read_cpuid_id(), cpus)) {
+		int i;
+		bool fwb_broken = true;
+
+		/*
+		 * List of erratum numbers for these CPUs.
+		 * It isn't possible to match these to their CPUs, as A78C has
+		 * two erratum numbers. The errata management API will return
+		 * 'UNKNOWN' for an erratum it doesn't recognise.
+		 */
+		static const u32 erratum_nums[] = {
+			2701951,
+			2701952,
+			2701953,
+			2712571,
+			2712572,
+			2712575,
+			2719103,
+			2728475,
+		};
+
+		/*
+		 * The CPU is affected, but what about this configuration?
+		 * Only firmware has the answer. Assume the part is affected,
+		 * and query firmware for the set of erratum numbers. If one
+		 * returns not-affected, the workaround isn't needed.
+		 */
+		for (i = 0; i < ARRAY_SIZE(erratum_nums); i++) {
+			int state = arm_smccc_em_cpu_features(erratum_nums[i]);
+
+			if (state == SMCCC_EM_RET_NOT_AFFECTED) {
+				fwb_broken = false;
+				break;
+			}
+		}
+
+		if (fwb_broken) {
+			pr_info_once("%s disabled due to erratum #2701951\n", entry->desc);
+			return false;
+		}
+	}
+
+	return has_feature;
+}
+
 static bool __maybe_unused
 has_useable_cnp(const struct arm64_cpu_capabilities *entry, int scope)
 {
@@ -2365,7 +2434,7 @@  static const struct arm64_cpu_capabilities arm64_features[] = {
 		.field_pos = ID_AA64MMFR2_EL1_FWB_SHIFT,
 		.field_width = 4,
 		.min_field_value = 1,
-		.matches = has_cpuid_feature,
+		.matches = has_stage2_fwb,
 	},
 	{
 		.desc = "ARMv8.4 Translation Table Level",