diff mbox series

[v3,02/14] arm64: Detect if in a realm and set RIPAS RAM

Message ID 20240605093006.145492-3-steven.price@arm.com (mailing list archive)
State New
Headers show
Series arm64: Support for running as a guest in Arm CCA | expand

Commit Message

Steven Price June 5, 2024, 9:29 a.m. UTC
From: Suzuki K Poulose <suzuki.poulose@arm.com>

Detect that the VM is a realm guest by the presence of the RSI
interface.

If in a realm then all memory needs to be marked as RIPAS RAM initially,
the loader may or may not have done this for us. To be sure iterate over
all RAM and mark it as such. Any failure is fatal as that implies the
RAM regions passed to Linux are incorrect - which would mean failing
later when attempting to access non-existent RAM.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v2:
 * Use DECLARE_STATIC_KEY_FALSE rather than "extern struct
   static_key_false".
 * Rename set_memory_range() to rsi_set_memory_range().
 * Downgrade some BUG()s to WARN()s and handle the condition by
   propagating up the stack. Comment the remaining case that ends in a
   BUG() to explain why.
 * Rely on the return from rsi_request_version() rather than checking
   the version the RMM claims to support.
 * Rename the generic sounding arm64_setup_memory() to
   arm64_rsi_setup_memory() and move the call site to setup_arch().
---
 arch/arm64/include/asm/rsi.h      | 48 +++++++++++++++++++++
 arch/arm64/include/asm/rsi_cmds.h | 22 ++++++++++
 arch/arm64/kernel/Makefile        |  3 +-
 arch/arm64/kernel/rsi.c           | 69 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/setup.c         |  8 ++++
 arch/arm64/mm/init.c              |  1 +
 6 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/rsi.h
 create mode 100644 arch/arm64/kernel/rsi.c

Comments

Catalin Marinas June 10, 2024, 2:11 p.m. UTC | #1
On Wed, Jun 05, 2024 at 10:29:54AM +0100, Steven Price wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..9d8d38e3bee2 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -41,6 +41,7 @@
>  #include <asm/kvm_host.h>
>  #include <asm/memory.h>
>  #include <asm/numa.h>
> +#include <asm/rsi.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <linux/sizes.h>

What's this random include here? It looks like a leftover from the
previous version.
Steven Price June 10, 2024, 2:16 p.m. UTC | #2
On 10/06/2024 15:11, Catalin Marinas wrote:
> On Wed, Jun 05, 2024 at 10:29:54AM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 9b5ab6818f7f..9d8d38e3bee2 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -41,6 +41,7 @@
>>  #include <asm/kvm_host.h>
>>  #include <asm/memory.h>
>>  #include <asm/numa.h>
>> +#include <asm/rsi.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <linux/sizes.h>
> 
> What's this random include here? It looks like a leftover from the
> previous version.
> 

Whoops - indeed that shouldn't be there.

Thanks,
Steve
Jean-Philippe Brucker June 12, 2024, 10:40 a.m. UTC | #3
On Wed, Jun 05, 2024 at 10:29:54AM +0100, Steven Price wrote:
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> Detect that the VM is a realm guest by the presence of the RSI
> interface.
> 
> If in a realm then all memory needs to be marked as RIPAS RAM initially,
> the loader may or may not have done this for us. To be sure iterate over
> all RAM and mark it as such. Any failure is fatal as that implies the
> RAM regions passed to Linux are incorrect - which would mean failing
> later when attempting to access non-existent RAM.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Co-developed-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>

> +static bool rsi_version_matches(void)
> +{
> +	unsigned long ver_lower, ver_higher;
> +	unsigned long ret = rsi_request_version(RSI_ABI_VERSION,
> +						&ver_lower,
> +						&ver_higher);

There is a regression on QEMU TCG (in emulation mode, not running under KVM):

  qemu-system-aarch64 -M virt -cpu max -kernel Image -nographic

This doesn't implement EL3 or EL2, so SMC is UNDEFINED (DDI0487J.a R_HMXQS),
and we end up with an undef instruction exception. So this patch would
also break hardware that only implements EL1 (I don't know if it exists).

The easiest fix is to detect the SMC conduit through the PSCI node in DT.
SMCCC helpers already do this, but we can't use them this early in the
boot. I tested adding an early probe to the PSCI driver to check this, see
attached patches.

Note that we do need to test the conduit after finding a PSCI node,
because even though it doesn't implement EL2 in this configuration, QEMU
still accepts PSCI HVCs in order to support SMP.

Thanks,
Jean
From 788bfd45e7ce521666a19dba99277106e4d33c80 Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker <jean-philippe@linaro.org>
Date: Tue, 11 Jun 2024 19:15:30 +0100
Subject: [PATCH 1/2] firmware/psci: Add psci_early_test_conduit()

Add a function to test early if PSCI is present and what conduit it
uses. Because the PSCI conduit corresponds to the SMCCC one, this will
let the kernel know whether it can use SMC instructions to discuss with
the Realm Management Monitor (RMM), early enough to enable RAM and
serial access when running in a Realm.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/psci.h         |  5 +++++
 drivers/firmware/psci/psci.c | 25 +++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/psci.h b/include/linux/psci.h
index 4ca0060a3fc4..a1fc1703ba20 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -45,8 +45,13 @@ struct psci_0_1_function_ids get_psci_0_1_function_ids(void);
 
 #if defined(CONFIG_ARM_PSCI_FW)
 int __init psci_dt_init(void);
+bool __init psci_early_test_conduit(enum arm_smccc_conduit conduit);
 #else
 static inline int psci_dt_init(void) { return 0; }
+static inline bool psci_early_test_conduit(enum arm_smccc_conduit conduit)
+{
+	return false;
+}
 #endif
 
 #if defined(CONFIG_ARM_PSCI_FW) && defined(CONFIG_ACPI)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index d9629ff87861..a40dcaf17822 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -13,6 +13,7 @@
 #include <linux/errno.h>
 #include <linux/linkage.h>
 #include <linux/of.h>
+#include <linux/of_fdt.h>
 #include <linux/pm.h>
 #include <linux/printk.h>
 #include <linux/psci.h>
@@ -767,6 +768,30 @@ int __init psci_dt_init(void)
 	return ret;
 }
 
+/*
+ * Test early if PSCI is supported, and if its conduit matches @conduit
+ */
+bool __init psci_early_test_conduit(enum arm_smccc_conduit conduit)
+{
+	int len;
+	int psci_node;
+	const char *method;
+	unsigned long dt_root;
+
+	/* DT hasn't been unflattened yet, we have to work with the flat blob */
+	dt_root = of_get_flat_dt_root();
+	psci_node = of_get_flat_dt_subnode_by_name(dt_root, "psci");
+	if (psci_node <= 0)
+		return false;
+
+	method = of_get_flat_dt_prop(psci_node, "method", &len);
+	if (!method)
+		return false;
+
+	return  (conduit == SMCCC_CONDUIT_SMC && strncmp(method, "smc", len) == 0) ||
+		(conduit == SMCCC_CONDUIT_HVC && strncmp(method, "hvc", len) == 0);
+}
+
 #ifdef CONFIG_ACPI
 /*
  * We use PSCI 0.2+ when ACPI is deployed on ARM64 and it's
Suzuki K Poulose June 12, 2024, 10:59 a.m. UTC | #4
On 12/06/2024 11:40, Jean-Philippe Brucker wrote:
> On Wed, Jun 05, 2024 at 10:29:54AM +0100, Steven Price wrote:
>> From: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>> Detect that the VM is a realm guest by the presence of the RSI
>> interface.
>>
>> If in a realm then all memory needs to be marked as RIPAS RAM initially,
>> the loader may or may not have done this for us. To be sure iterate over
>> all RAM and mark it as such. Any failure is fatal as that implies the
>> RAM regions passed to Linux are incorrect - which would mean failing
>> later when attempting to access non-existent RAM.
>>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Co-developed-by: Steven Price <steven.price@arm.com>
>> Signed-off-by: Steven Price <steven.price@arm.com>
> 
>> +static bool rsi_version_matches(void)
>> +{
>> +	unsigned long ver_lower, ver_higher;
>> +	unsigned long ret = rsi_request_version(RSI_ABI_VERSION,
>> +						&ver_lower,
>> +						&ver_higher);
> 
> There is a regression on QEMU TCG (in emulation mode, not running under KVM):
> 
>    qemu-system-aarch64 -M virt -cpu max -kernel Image -nographic
> 
> This doesn't implement EL3 or EL2, so SMC is UNDEFINED (DDI0487J.a R_HMXQS),
> and we end up with an undef instruction exception. So this patch would
> also break hardware that only implements EL1 (I don't know if it exists).

Thanks for the report,  Could we not check ID_AA64PFR0_EL1.EL3 >= 0 ? I
think we do this for kvm-unit-tests, we need the same here.


Suzuki

> 
> The easiest fix is to detect the SMC conduit through the PSCI node in DT.
> SMCCC helpers already do this, but we can't use them this early in the
> boot. I tested adding an early probe to the PSCI driver to check this, see
> attached patches.
> 
> Note that we do need to test the conduit after finding a PSCI node,
> because even though it doesn't implement EL2 in this configuration, QEMU
> still accepts PSCI HVCs in order to support SMP.
> 
> Thanks,
> Jean
>
Jean-Philippe Brucker June 13, 2024, 10:51 a.m. UTC | #5
On Wed, Jun 12, 2024 at 11:59:22AM +0100, Suzuki K Poulose wrote:
> On 12/06/2024 11:40, Jean-Philippe Brucker wrote:
> > On Wed, Jun 05, 2024 at 10:29:54AM +0100, Steven Price wrote:
> > > From: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > 
> > > Detect that the VM is a realm guest by the presence of the RSI
> > > interface.
> > > 
> > > If in a realm then all memory needs to be marked as RIPAS RAM initially,
> > > the loader may or may not have done this for us. To be sure iterate over
> > > all RAM and mark it as such. Any failure is fatal as that implies the
> > > RAM regions passed to Linux are incorrect - which would mean failing
> > > later when attempting to access non-existent RAM.
> > > 
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > Co-developed-by: Steven Price <steven.price@arm.com>
> > > Signed-off-by: Steven Price <steven.price@arm.com>
> > 
> > > +static bool rsi_version_matches(void)
> > > +{
> > > +	unsigned long ver_lower, ver_higher;
> > > +	unsigned long ret = rsi_request_version(RSI_ABI_VERSION,
> > > +						&ver_lower,
> > > +						&ver_higher);
> > 
> > There is a regression on QEMU TCG (in emulation mode, not running under KVM):
> > 
> >    qemu-system-aarch64 -M virt -cpu max -kernel Image -nographic
> > 
> > This doesn't implement EL3 or EL2, so SMC is UNDEFINED (DDI0487J.a R_HMXQS),
> > and we end up with an undef instruction exception. So this patch would
> > also break hardware that only implements EL1 (I don't know if it exists).
> 
> Thanks for the report,  Could we not check ID_AA64PFR0_EL1.EL3 >= 0 ? I
> think we do this for kvm-unit-tests, we need the same here.

Good point, it also fixes this case and is simpler. It assumes RMM doesn't
hide this field, but I can't think of a reason it would.

This command won't work anymore:

  qemu-system-aarch64 -M virt,secure=on -cpu max -kernel Image -nographic

implements EL3 and SMC still treated as undef. QEMU has a special case for
starting at EL2 in this case, but I couldn't find what this is for.

Treating SMC as undef is correct because SCR_EL3.SMD resets to an
architectutally UNKNOWN value. But the architecture requires that the CPU
resets to the highest implemented exception level (DDI0487J.a R_JYLQV). So
in my opinion we can use the ID_AA64PFR0_EL1.EL3 field here, and breaking
this particular configuration is not a problem: users shouldn't expect
Linux to boot when EL3 is implemented and doesn't run a firmware.

Thanks,
Jean

> 
> 
> Suzuki
> 
> > 
> > The easiest fix is to detect the SMC conduit through the PSCI node in DT.
> > SMCCC helpers already do this, but we can't use them this early in the
> > boot. I tested adding an early probe to the PSCI driver to check this, see
> > attached patches.
> > 
> > Note that we do need to test the conduit after finding a PSCI node,
> > because even though it doesn't implement EL2 in this configuration, QEMU
> > still accepts PSCI HVCs in order to support SMP.
> > 
> > Thanks,
> > Jean
> > 
>
Suzuki K Poulose June 14, 2024, 6:57 p.m. UTC | #6
Hi Steven

On 05/06/2024 10:29, Steven Price wrote:
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> Detect that the VM is a realm guest by the presence of the RSI
> interface.
> 
> If in a realm then all memory needs to be marked as RIPAS RAM initially,
> the loader may or may not have done this for us. To be sure iterate over
> all RAM and mark it as such. Any failure is fatal as that implies the
> RAM regions passed to Linux are incorrect - which would mean failing
> later when attempting to access non-existent RAM.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Co-developed-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v2:
>   * Use DECLARE_STATIC_KEY_FALSE rather than "extern struct
>     static_key_false".
>   * Rename set_memory_range() to rsi_set_memory_range().
>   * Downgrade some BUG()s to WARN()s and handle the condition by
>     propagating up the stack. Comment the remaining case that ends in a
>     BUG() to explain why.
>   * Rely on the return from rsi_request_version() rather than checking
>     the version the RMM claims to support.
>   * Rename the generic sounding arm64_setup_memory() to
>     arm64_rsi_setup_memory() and move the call site to setup_arch().
> ---
>   arch/arm64/include/asm/rsi.h      | 48 +++++++++++++++++++++
>   arch/arm64/include/asm/rsi_cmds.h | 22 ++++++++++
>   arch/arm64/kernel/Makefile        |  3 +-
>   arch/arm64/kernel/rsi.c           | 69 +++++++++++++++++++++++++++++++
>   arch/arm64/kernel/setup.c         |  8 ++++
>   arch/arm64/mm/init.c              |  1 +
>   6 files changed, 150 insertions(+), 1 deletion(-)
>   create mode 100644 arch/arm64/include/asm/rsi.h
>   create mode 100644 arch/arm64/kernel/rsi.c
> 
> diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
> new file mode 100644
> index 000000000000..ce2cdb501d84
> --- /dev/null
> +++ b/arch/arm64/include/asm/rsi.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024 ARM Ltd.
> + */
> +
> +#ifndef __ASM_RSI_H_
> +#define __ASM_RSI_H_
> +
> +#include <linux/jump_label.h>
> +#include <asm/rsi_cmds.h>
> +
> +DECLARE_STATIC_KEY_FALSE(rsi_present);
> +
> +void __init arm64_rsi_init(void);
> +void __init arm64_rsi_setup_memory(void);
> +static inline bool is_realm_world(void)
> +{
> +	return static_branch_unlikely(&rsi_present);
> +}
> +
> +static inline int rsi_set_memory_range(phys_addr_t start, phys_addr_t end,
> +				       enum ripas state)
> +{
> +	unsigned long ret;
> +	phys_addr_t top;
> +
> +	while (start != end) {
> +		ret = rsi_set_addr_range_state(start, end, state, &top);
> +		if (WARN_ON(ret || top < start || top > end))
> +			return -EINVAL;
> +		start = top;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int rsi_set_memory_range_protected(phys_addr_t start,
> +						 phys_addr_t end)
> +{
> +	return rsi_set_memory_range(start, end, RSI_RIPAS_RAM);
> +}
> +
> +static inline int rsi_set_memory_range_shared(phys_addr_t start,
> +					      phys_addr_t end)
> +{
> +	return rsi_set_memory_range(start, end, RSI_RIPAS_EMPTY);
> +}
> +#endif
> diff --git a/arch/arm64/include/asm/rsi_cmds.h b/arch/arm64/include/asm/rsi_cmds.h
> index ad425c5d6f1b..ab8ad435f10e 100644
> --- a/arch/arm64/include/asm/rsi_cmds.h
> +++ b/arch/arm64/include/asm/rsi_cmds.h
> @@ -10,6 +10,11 @@
>   
>   #include <asm/rsi_smc.h>
>   
> +enum ripas {
> +	RSI_RIPAS_EMPTY,
> +	RSI_RIPAS_RAM,
> +};
> +
>   static inline void invoke_rsi_fn_smc_with_res(unsigned long function_id,
>   					      unsigned long arg0,
>   					      unsigned long arg1,
> @@ -44,4 +49,21 @@ static inline unsigned long rsi_get_realm_config(struct realm_config *cfg)
>   	return res.a0;
>   }
>   
> +static inline unsigned long rsi_set_addr_range_state(phys_addr_t start,
> +						     phys_addr_t end,
> +						     enum ripas state,
> +						     phys_addr_t *top)
> +{
> +	struct arm_smccc_res res;
> +
> +	invoke_rsi_fn_smc_with_res(SMC_RSI_IPA_STATE_SET,
> +				   start, end, state, RSI_NO_CHANGE_DESTROYED,

Though this is fine from the KVM as NS Host perspective, it may be 
unnecessarily restrictive in general for a Host implementation. We only
need that RSI_NO_CHANGE_DESTROYED flag for "init all DRAM range as RAM"
(where we want to prevent a host from "destroying pages" that were
populated before activation, without consent). But in all other cases
where we do not rely on the content of the "newly" encrypted page,
we could drop the flag.

I think we need could have variants of this helper one which allows 
"DESTROYED" granules to be converted, which must be only used while 
"transitioning" a page to encrypted, where we don't rely on the contents 
of the page.

Something like :

rsi_set_memory_range_protected_safe() : Do not allow DESTROYED contents 
to be converted.

rsi_set_memory_range_protected().

Something like:

---8>---

diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
index ce2cdb501d84..dea2ed99f6d1 100644
--- a/arch/arm64/include/asm/rsi.h
+++ b/arch/arm64/include/asm/rsi.h
@@ -19,13 +19,13 @@ static inline bool is_realm_world(void)
  }

  static inline int rsi_set_memory_range(phys_addr_t start, phys_addr_t end,
-				       enum ripas state)
+				       enum ripas state, unsigned long flags)
  {
  	unsigned long ret;
  	phys_addr_t top;

  	while (start != end) {
-		ret = rsi_set_addr_range_state(start, end, state, &top);
+		ret = rsi_set_addr_range_state(start, end, state, flags, &top);
  		if (WARN_ON(ret || top < start || top > end))
  			return -EINVAL;
  		start = top;
@@ -34,15 +34,29 @@ static inline int rsi_set_memory_range(phys_addr_t 
start, phys_addr_t end,
  	return 0;
  }

+/*
+ * Convert the specified range to RAM. Do not use this if you rely on 
the contents
+ * of a page that may already be in RAM state.
+ */
  static inline int rsi_set_memory_range_protected(phys_addr_t start,
  						 phys_addr_t end)
  {
-	return rsi_set_memory_range(start, end, RSI_RIPAS_RAM);
+	return rsi_set_memory_range(start, end, RSI_RIPAS_RAM, 0);
+}
+
+/*
+ * Convert the specified range to RAM. Do not convert any pages that 
may have
+ * been DESTROYED, without our permission.
+ */
+static inline int rsi_set_memory_range_protected_safe(phys_addr_t start,
+						      phys_addr_t end)
+{
+	return rsi_set_memory_range(start, end, RSI_RIPAS_RAM, 
RSI_NO_CHANGE_DESTROYED);
  }

  static inline int rsi_set_memory_range_shared(phys_addr_t start,
  					      phys_addr_t end)
  {
-	return rsi_set_memory_range(start, end, RSI_RIPAS_EMPTY);
+	return rsi_set_memory_range(start, end, RSI_RIPAS_EMPTY, 0);
  }
  #endif
diff --git a/arch/arm64/include/asm/rsi_cmds.h 
b/arch/arm64/include/asm/rsi_cmds.h
index ab8ad435f10e..466615ff90de 100644
--- a/arch/arm64/include/asm/rsi_cmds.h
+++ b/arch/arm64/include/asm/rsi_cmds.h
@@ -52,12 +52,13 @@ static inline unsigned long 
rsi_get_realm_config(struct realm_config *cfg)
  static inline unsigned long rsi_set_addr_range_state(phys_addr_t start,
  						     phys_addr_t end,
  						     enum ripas state,
+						     unsigned long flags,
  						     phys_addr_t *top)
  {
  	struct arm_smccc_res res;

  	invoke_rsi_fn_smc_with_res(SMC_RSI_IPA_STATE_SET,
-				   start, end, state, RSI_NO_CHANGE_DESTROYED,
+				   start, end, state, flags,
  				   &res);

  	if (top)
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index 3a992bdfd6bb..e6a6681524a0 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -46,8 +46,9 @@ void __init arm64_rsi_setup_memory(void)
  		return;

  	/*
-	 * Iterate over the available memory ranges
-	 * and convert the state to protected memory.
+	 * Iterate over the available memory ranges and convert the state to
+	 * protected memory. We should take extra care to ensure that we DO NOT
+	 * permit any "DESTROYED" pages to be converted to "RAM".
  	 *
  	 * BUG_ON is used because if the attempt to switch the memory to
  	 * protected has failed here, then future accesses to the memory are
@@ -55,7 +56,7 @@ void __init arm64_rsi_setup_memory(void)
  	 * Bailing out early prevents the guest limping on and dieing later.
  	 */
  	for_each_mem_range(i, &start, &end) {
-		BUG_ON(rsi_set_memory_range_protected(start, end));
+		BUG_ON(rsi_set_memory_range_protected_safe(start, end));
  	}
  }



Kind regards


Suzuki


> +				   &res);
> +
> +	if (top)
> +		*top = res.a1;
> +
> +	return res.a0;
> +}
> +
>   #endif
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 763824963ed1..a483b916ed11 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -33,7 +33,8 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
>   			   return_address.o cpuinfo.o cpu_errata.o		\
>   			   cpufeature.o alternative.o cacheinfo.o		\
>   			   smp.o smp_spin_table.o topology.o smccc-call.o	\
> -			   syscall.o proton-pack.o idle.o patching.o pi/
> +			   syscall.o proton-pack.o idle.o patching.o pi/	\
> +			   rsi.o
>   
>   obj-$(CONFIG_COMPAT)			+= sys32.o signal32.o			\
>   					   sys_compat.o
> diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
> new file mode 100644
> index 000000000000..3a992bdfd6bb
> --- /dev/null
> +++ b/arch/arm64/kernel/rsi.c
> @@ -0,0 +1,69 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#include <linux/jump_label.h>
> +#include <linux/memblock.h>
> +#include <asm/rsi.h>
> +
> +DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
> +EXPORT_SYMBOL(rsi_present);
> +
> +static bool rsi_version_matches(void)
> +{
> +	unsigned long ver_lower, ver_higher;
> +	unsigned long ret = rsi_request_version(RSI_ABI_VERSION,
> +						&ver_lower,
> +						&ver_higher);
> +
> +	if (ret == SMCCC_RET_NOT_SUPPORTED)
> +		return false;
> +
> +	if (ret != RSI_SUCCESS) {
> +		pr_err("RME: RMM doesn't support RSI version %u.%u. Supported range: %lu.%lu-%lu.%lu\n",
> +		       RSI_ABI_VERSION_MAJOR, RSI_ABI_VERSION_MINOR,
> +		       RSI_ABI_VERSION_GET_MAJOR(ver_lower),
> +		       RSI_ABI_VERSION_GET_MINOR(ver_lower),
> +		       RSI_ABI_VERSION_GET_MAJOR(ver_higher),
> +		       RSI_ABI_VERSION_GET_MINOR(ver_higher));
> +		return false;
> +	}
> +
> +	pr_info("RME: Using RSI version %lu.%lu\n",
> +		RSI_ABI_VERSION_GET_MAJOR(ver_lower),
> +		RSI_ABI_VERSION_GET_MINOR(ver_lower));
> +
> +	return true;
> +}
> +
> +void __init arm64_rsi_setup_memory(void)
> +{
> +	u64 i;
> +	phys_addr_t start, end;
> +
> +	if (!is_realm_world())
> +		return;
> +
> +	/*
> +	 * Iterate over the available memory ranges
> +	 * and convert the state to protected memory.
> +	 *
> +	 * BUG_ON is used because if the attempt to switch the memory to
> +	 * protected has failed here, then future accesses to the memory are
> +	 * simply going to be reflected as a fault which we can't handle.
> +	 * Bailing out early prevents the guest limping on and dieing later.
> +	 */
> +	for_each_mem_range(i, &start, &end) {
> +		BUG_ON(rsi_set_memory_range_protected(start, end));
> +	}
> +}
> +
> +void __init arm64_rsi_init(void)
> +{
> +	if (!rsi_version_matches())
> +		return;
> +
> +	static_branch_enable(&rsi_present);
> +}
> +
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index a096e2451044..143f87615af0 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -43,6 +43,7 @@
>   #include <asm/cpu_ops.h>
>   #include <asm/kasan.h>
>   #include <asm/numa.h>
> +#include <asm/rsi.h>
>   #include <asm/scs.h>
>   #include <asm/sections.h>
>   #include <asm/setup.h>
> @@ -293,6 +294,11 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
>   	 * cpufeature code and early parameters.
>   	 */
>   	jump_label_init();
> +	/*
> +	 * Init RSI before early param so that "earlycon" console uses the
> +	 * shared alias when in a realm
> +	 */
> +	arm64_rsi_init();
>   	parse_early_param();
>   
>   	dynamic_scs_init();
> @@ -328,6 +334,8 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
>   
>   	arm64_memblock_init();
>   
> +	arm64_rsi_setup_memory();
> +
>   	paging_init();
>   
>   	acpi_table_upgrade();
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..9d8d38e3bee2 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -41,6 +41,7 @@
>   #include <asm/kvm_host.h>
>   #include <asm/memory.h>
>   #include <asm/numa.h>
> +#include <asm/rsi.h>
>   #include <asm/sections.h>
>   #include <asm/setup.h>
>   #include <linux/sizes.h>
Peter Maydell June 17, 2024, 10:27 a.m. UTC | #7
On Thu, 13 Jun 2024 at 11:50, Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> On Wed, Jun 12, 2024 at 11:59:22AM +0100, Suzuki K Poulose wrote:
> > On 12/06/2024 11:40, Jean-Philippe Brucker wrote:
> > > On Wed, Jun 05, 2024 at 10:29:54AM +0100, Steven Price wrote:
> > > > From: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > >
> > > > Detect that the VM is a realm guest by the presence of the RSI
> > > > interface.
> > > >
> > > > If in a realm then all memory needs to be marked as RIPAS RAM initially,
> > > > the loader may or may not have done this for us. To be sure iterate over
> > > > all RAM and mark it as such. Any failure is fatal as that implies the
> > > > RAM regions passed to Linux are incorrect - which would mean failing
> > > > later when attempting to access non-existent RAM.
> > > >
> > > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > Co-developed-by: Steven Price <steven.price@arm.com>
> > > > Signed-off-by: Steven Price <steven.price@arm.com>
> > >
> > > > +static bool rsi_version_matches(void)
> > > > +{
> > > > + unsigned long ver_lower, ver_higher;
> > > > + unsigned long ret = rsi_request_version(RSI_ABI_VERSION,
> > > > +                                         &ver_lower,
> > > > +                                         &ver_higher);
> > >
> > > There is a regression on QEMU TCG (in emulation mode, not running under KVM):
> > >
> > >    qemu-system-aarch64 -M virt -cpu max -kernel Image -nographic
> > >
> > > This doesn't implement EL3 or EL2, so SMC is UNDEFINED (DDI0487J.a R_HMXQS),
> > > and we end up with an undef instruction exception. So this patch would
> > > also break hardware that only implements EL1 (I don't know if it exists).
> >
> > Thanks for the report,  Could we not check ID_AA64PFR0_EL1.EL3 >= 0 ? I
> > think we do this for kvm-unit-tests, we need the same here.
>
> Good point, it also fixes this case and is simpler. It assumes RMM doesn't
> hide this field, but I can't think of a reason it would.
>
> This command won't work anymore:
>
>   qemu-system-aarch64 -M virt,secure=on -cpu max -kernel Image -nographic
>
> implements EL3 and SMC still treated as undef. QEMU has a special case for
> starting at EL2 in this case, but I couldn't find what this is for.

That's a bit of an odd config, because it says "emulate EL3 but
never use it". QEMU's boot loader starts the kernel at EL2 because
the kernel boot protocol requires that (this is more relevant on
boards other than virt where EL3 is not command-line disableable).
I have a feeling we've occasionally found that somebody's had some
corner case reason to use it, though. (eg
https://gitlab.com/qemu-project/qemu/-/issues/1899
is from somebody who says they use this when booting Windows 11 because
it asserts at boot time that EL3 is present and won't boot otherwise.)

Your underlying problem here seems to be that you don't have
a way for the firmware to say "hey, SMC works, you can use it" ?

-- PMM
Jean-Philippe Brucker June 17, 2024, 11:23 a.m. UTC | #8
On Mon, Jun 17, 2024 at 11:27:31AM +0100, Peter Maydell wrote:
> On Thu, 13 Jun 2024 at 11:50, Jean-Philippe Brucker
> <jean-philippe@linaro.org> wrote:
> >
> > On Wed, Jun 12, 2024 at 11:59:22AM +0100, Suzuki K Poulose wrote:
> > > On 12/06/2024 11:40, Jean-Philippe Brucker wrote:
> > > > On Wed, Jun 05, 2024 at 10:29:54AM +0100, Steven Price wrote:
> > > > > From: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > >
> > > > > Detect that the VM is a realm guest by the presence of the RSI
> > > > > interface.
> > > > >
> > > > > If in a realm then all memory needs to be marked as RIPAS RAM initially,
> > > > > the loader may or may not have done this for us. To be sure iterate over
> > > > > all RAM and mark it as such. Any failure is fatal as that implies the
> > > > > RAM regions passed to Linux are incorrect - which would mean failing
> > > > > later when attempting to access non-existent RAM.
> > > > >
> > > > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > > Co-developed-by: Steven Price <steven.price@arm.com>
> > > > > Signed-off-by: Steven Price <steven.price@arm.com>
> > > >
> > > > > +static bool rsi_version_matches(void)
> > > > > +{
> > > > > + unsigned long ver_lower, ver_higher;
> > > > > + unsigned long ret = rsi_request_version(RSI_ABI_VERSION,
> > > > > +                                         &ver_lower,
> > > > > +                                         &ver_higher);
> > > >
> > > > There is a regression on QEMU TCG (in emulation mode, not running under KVM):
> > > >
> > > >    qemu-system-aarch64 -M virt -cpu max -kernel Image -nographic
> > > >
> > > > This doesn't implement EL3 or EL2, so SMC is UNDEFINED (DDI0487J.a R_HMXQS),
> > > > and we end up with an undef instruction exception. So this patch would
> > > > also break hardware that only implements EL1 (I don't know if it exists).
> > >
> > > Thanks for the report,  Could we not check ID_AA64PFR0_EL1.EL3 >= 0 ? I
> > > think we do this for kvm-unit-tests, we need the same here.
> >
> > Good point, it also fixes this case and is simpler. It assumes RMM doesn't
> > hide this field, but I can't think of a reason it would.
> >
> > This command won't work anymore:
> >
> >   qemu-system-aarch64 -M virt,secure=on -cpu max -kernel Image -nographic
> >
> > implements EL3 and SMC still treated as undef. QEMU has a special case for
> > starting at EL2 in this case, but I couldn't find what this is for.
> 
> That's a bit of an odd config, because it says "emulate EL3 but
> never use it". QEMU's boot loader starts the kernel at EL2 because
> the kernel boot protocol requires that (this is more relevant on
> boards other than virt where EL3 is not command-line disableable).
> I have a feeling we've occasionally found that somebody's had some
> corner case reason to use it, though. (eg
> https://gitlab.com/qemu-project/qemu/-/issues/1899
> is from somebody who says they use this when booting Windows 11 because
> it asserts at boot time that EL3 is present and won't boot otherwise.)

Thanks for the pointer. In this case it looks like Linux was used as
reproducer and not the main use-case, though I wonder if some CIs
regularly boot this particular configuration.

> 
> Your underlying problem here seems to be that you don't have
> a way for the firmware to say "hey, SMC works, you can use it" ?

We do: SMCCC recommends to look at the PSCI conduit declared in DT/ACPI.
Given that RMM mandates using the SMC conduit for both PSCI and RSI calls,
we could use this discovery mechanism here. The problem is that we have to
discover it very early at boot, before the DT infrastructure is ready,
so the implementation is a little awkward. I did post one earlier in this
thread but it doesn't yet account for ACPI-only boot, which will need
something similar. Testing ID_AA64PFR0_EL1.EL3 would be much simpler, but
it would break this config.

Thanks,
Jean
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
new file mode 100644
index 000000000000..ce2cdb501d84
--- /dev/null
+++ b/arch/arm64/include/asm/rsi.h
@@ -0,0 +1,48 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024 ARM Ltd.
+ */
+
+#ifndef __ASM_RSI_H_
+#define __ASM_RSI_H_
+
+#include <linux/jump_label.h>
+#include <asm/rsi_cmds.h>
+
+DECLARE_STATIC_KEY_FALSE(rsi_present);
+
+void __init arm64_rsi_init(void);
+void __init arm64_rsi_setup_memory(void);
+static inline bool is_realm_world(void)
+{
+	return static_branch_unlikely(&rsi_present);
+}
+
+static inline int rsi_set_memory_range(phys_addr_t start, phys_addr_t end,
+				       enum ripas state)
+{
+	unsigned long ret;
+	phys_addr_t top;
+
+	while (start != end) {
+		ret = rsi_set_addr_range_state(start, end, state, &top);
+		if (WARN_ON(ret || top < start || top > end))
+			return -EINVAL;
+		start = top;
+	}
+
+	return 0;
+}
+
+static inline int rsi_set_memory_range_protected(phys_addr_t start,
+						 phys_addr_t end)
+{
+	return rsi_set_memory_range(start, end, RSI_RIPAS_RAM);
+}
+
+static inline int rsi_set_memory_range_shared(phys_addr_t start,
+					      phys_addr_t end)
+{
+	return rsi_set_memory_range(start, end, RSI_RIPAS_EMPTY);
+}
+#endif
diff --git a/arch/arm64/include/asm/rsi_cmds.h b/arch/arm64/include/asm/rsi_cmds.h
index ad425c5d6f1b..ab8ad435f10e 100644
--- a/arch/arm64/include/asm/rsi_cmds.h
+++ b/arch/arm64/include/asm/rsi_cmds.h
@@ -10,6 +10,11 @@ 
 
 #include <asm/rsi_smc.h>
 
+enum ripas {
+	RSI_RIPAS_EMPTY,
+	RSI_RIPAS_RAM,
+};
+
 static inline void invoke_rsi_fn_smc_with_res(unsigned long function_id,
 					      unsigned long arg0,
 					      unsigned long arg1,
@@ -44,4 +49,21 @@  static inline unsigned long rsi_get_realm_config(struct realm_config *cfg)
 	return res.a0;
 }
 
+static inline unsigned long rsi_set_addr_range_state(phys_addr_t start,
+						     phys_addr_t end,
+						     enum ripas state,
+						     phys_addr_t *top)
+{
+	struct arm_smccc_res res;
+
+	invoke_rsi_fn_smc_with_res(SMC_RSI_IPA_STATE_SET,
+				   start, end, state, RSI_NO_CHANGE_DESTROYED,
+				   &res);
+
+	if (top)
+		*top = res.a1;
+
+	return res.a0;
+}
+
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 763824963ed1..a483b916ed11 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -33,7 +33,8 @@  obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   return_address.o cpuinfo.o cpu_errata.o		\
 			   cpufeature.o alternative.o cacheinfo.o		\
 			   smp.o smp_spin_table.o topology.o smccc-call.o	\
-			   syscall.o proton-pack.o idle.o patching.o pi/
+			   syscall.o proton-pack.o idle.o patching.o pi/	\
+			   rsi.o
 
 obj-$(CONFIG_COMPAT)			+= sys32.o signal32.o			\
 					   sys_compat.o
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
new file mode 100644
index 000000000000..3a992bdfd6bb
--- /dev/null
+++ b/arch/arm64/kernel/rsi.c
@@ -0,0 +1,69 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#include <linux/jump_label.h>
+#include <linux/memblock.h>
+#include <asm/rsi.h>
+
+DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
+EXPORT_SYMBOL(rsi_present);
+
+static bool rsi_version_matches(void)
+{
+	unsigned long ver_lower, ver_higher;
+	unsigned long ret = rsi_request_version(RSI_ABI_VERSION,
+						&ver_lower,
+						&ver_higher);
+
+	if (ret == SMCCC_RET_NOT_SUPPORTED)
+		return false;
+
+	if (ret != RSI_SUCCESS) {
+		pr_err("RME: RMM doesn't support RSI version %u.%u. Supported range: %lu.%lu-%lu.%lu\n",
+		       RSI_ABI_VERSION_MAJOR, RSI_ABI_VERSION_MINOR,
+		       RSI_ABI_VERSION_GET_MAJOR(ver_lower),
+		       RSI_ABI_VERSION_GET_MINOR(ver_lower),
+		       RSI_ABI_VERSION_GET_MAJOR(ver_higher),
+		       RSI_ABI_VERSION_GET_MINOR(ver_higher));
+		return false;
+	}
+
+	pr_info("RME: Using RSI version %lu.%lu\n",
+		RSI_ABI_VERSION_GET_MAJOR(ver_lower),
+		RSI_ABI_VERSION_GET_MINOR(ver_lower));
+
+	return true;
+}
+
+void __init arm64_rsi_setup_memory(void)
+{
+	u64 i;
+	phys_addr_t start, end;
+
+	if (!is_realm_world())
+		return;
+
+	/*
+	 * Iterate over the available memory ranges
+	 * and convert the state to protected memory.
+	 *
+	 * BUG_ON is used because if the attempt to switch the memory to
+	 * protected has failed here, then future accesses to the memory are
+	 * simply going to be reflected as a fault which we can't handle.
+	 * Bailing out early prevents the guest limping on and dieing later.
+	 */
+	for_each_mem_range(i, &start, &end) {
+		BUG_ON(rsi_set_memory_range_protected(start, end));
+	}
+}
+
+void __init arm64_rsi_init(void)
+{
+	if (!rsi_version_matches())
+		return;
+
+	static_branch_enable(&rsi_present);
+}
+
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index a096e2451044..143f87615af0 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -43,6 +43,7 @@ 
 #include <asm/cpu_ops.h>
 #include <asm/kasan.h>
 #include <asm/numa.h>
+#include <asm/rsi.h>
 #include <asm/scs.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -293,6 +294,11 @@  void __init __no_sanitize_address setup_arch(char **cmdline_p)
 	 * cpufeature code and early parameters.
 	 */
 	jump_label_init();
+	/*
+	 * Init RSI before early param so that "earlycon" console uses the
+	 * shared alias when in a realm
+	 */
+	arm64_rsi_init();
 	parse_early_param();
 
 	dynamic_scs_init();
@@ -328,6 +334,8 @@  void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
 	arm64_memblock_init();
 
+	arm64_rsi_setup_memory();
+
 	paging_init();
 
 	acpi_table_upgrade();
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9b5ab6818f7f..9d8d38e3bee2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -41,6 +41,7 @@ 
 #include <asm/kvm_host.h>
 #include <asm/memory.h>
 #include <asm/numa.h>
+#include <asm/rsi.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <linux/sizes.h>