diff mbox series

[v9,1/7] smccc: Add HVC call variant with result registers other than 0 thru 3

Message ID 1615233439-23346-2-git-send-email-mikelley@microsoft.com (mailing list archive)
State New, archived
Headers show
Series Enable Linux guests on Hyper-V on ARM64 | expand

Commit Message

Michael Kelley (LINUX) March 8, 2021, 7:57 p.m. UTC
Hypercalls to Hyper-V on ARM64 may return results in registers other
than X0 thru X3, as permitted by the SMCCC spec version 1.2 and later.
Accommodate this by adding a variant of arm_smccc_1_1_hvc that allows
the caller to specify which 3 registers are returned in addition to X0.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
There are several ways to support returning results from registers
other than X0 thru X3, and Hyper-V usage should be compatible with
whatever the maintainers prefer.  What's implemented in this patch
may be the most flexible, but it has the downside of not being a
true function interface in that args 0 thru 2 must be fixed strings,
and not general "C" expressions.

Other alternatives include:
* Create a variant that hard codes to return X5 thru X7, though
  in the future there may be Hyper-V hypercalls that need a
  different hard-coded variant.
* Return all of X0 thru X7 in a larger result structure. That
  approach may execute more memory stores, but performance is unlikely
  to be an issue for the Hyper-V hypercalls that would use it.
  However, it's possible in the future that Hyper-V results might
  be beyond X7, as allowed by the SMCCC v1.3 spec.
* The macro __arm_smccc_1_1() could be cloned in Hyper-V specific
  code and modified to meet Hyper-V specific needs, but this seems
  undesirable since Hyper-V is operating within the v1.2 spec.

In any of these cases, the call might be renamed from "_1_1_" to
"_1_2_" to reflect conformance to the later spec version.


 include/linux/arm-smccc.h | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

Comments

Mark Rutland March 24, 2021, 4:55 p.m. UTC | #1
Hi Michael,

On Mon, Mar 08, 2021 at 11:57:13AM -0800, Michael Kelley wrote:
> Hypercalls to Hyper-V on ARM64 may return results in registers other
> than X0 thru X3, as permitted by the SMCCC spec version 1.2 and later.
> Accommodate this by adding a variant of arm_smccc_1_1_hvc that allows
> the caller to specify which 3 registers are returned in addition to X0.
> 
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> ---
> There are several ways to support returning results from registers
> other than X0 thru X3, and Hyper-V usage should be compatible with
> whatever the maintainers prefer.  What's implemented in this patch
> may be the most flexible, but it has the downside of not being a
> true function interface in that args 0 thru 2 must be fixed strings,
> and not general "C" expressions.

For the benefit of others here, SMCCCv1.2 allows:

* SMC64/HVC64 to use all of x1-x17 for both parameters and return values
* SMC32/HVC32 to use all of r1-r7 for both parameters and return values

The rationale for this was to make it possible to pass a large number of
arguments in one call without the hypervisor/firmware needing to access
the memory of the caller.

My preference would be to add arm_smccc_1_2_{hvc,smc}() assembly
functions which read all the permitted argument registers from a struct,
and write all the permitted result registers to a struct, leaving it to
callers to set those up and decompose them.

That way we only have to write one implementation that all callers can
use, which'll be far easier to maintain. I suspect that in general the
cost of temporarily bouncing the values through memory will be dominated
by whatever the hypervisor/firmware is going to do, and if it's not we
can optimize that away in future.

> Other alternatives include:
> * Create a variant that hard codes to return X5 thru X7, though
>   in the future there may be Hyper-V hypercalls that need a
>   different hard-coded variant.
> * Return all of X0 thru X7 in a larger result structure. That
>   approach may execute more memory stores, but performance is unlikely
>   to be an issue for the Hyper-V hypercalls that would use it.
>   However, it's possible in the future that Hyper-V results might
>   be beyond X7, as allowed by the SMCCC v1.3 spec.

As above, something of this sort would be my preferred approach.

Thanks,
Mark.

> * The macro __arm_smccc_1_1() could be cloned in Hyper-V specific
>   code and modified to meet Hyper-V specific needs, but this seems
>   undesirable since Hyper-V is operating within the v1.2 spec.
> 
> In any of these cases, the call might be renamed from "_1_1_" to
> "_1_2_" to reflect conformance to the later spec version.
> 
> 
>  include/linux/arm-smccc.h | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index f860645..acda958 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -300,12 +300,12 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
>   * entitled to optimise the whole sequence away. "volatile" is what
>   * makes it stick.
>   */
> -#define __arm_smccc_1_1(inst, ...)					\
> +#define __arm_smccc_1_1(inst, reg1, reg2, reg3, ...)			\
>  	do {								\
>  		register unsigned long r0 asm("r0");			\
> -		register unsigned long r1 asm("r1");			\
> -		register unsigned long r2 asm("r2");			\
> -		register unsigned long r3 asm("r3"); 			\
> +		register unsigned long r1 asm(reg1);			\
> +		register unsigned long r2 asm(reg2);			\
> +		register unsigned long r3 asm(reg3);			\
>  		__declare_args(__count_args(__VA_ARGS__), __VA_ARGS__);	\
>  		asm volatile(inst "\n" :				\
>  			     "=r" (r0), "=r" (r1), "=r" (r2), "=r" (r3)	\
> @@ -328,7 +328,8 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
>   * to the SMC instruction. The return values are updated with the content
>   * from register 0 to 3 on return from the SMC instruction if not NULL.
>   */
> -#define arm_smccc_1_1_smc(...)	__arm_smccc_1_1(SMCCC_SMC_INST, __VA_ARGS__)
> +#define arm_smccc_1_1_smc(...)\
> +	__arm_smccc_1_1(SMCCC_SMC_INST, "r1", "r2", "r3", __VA_ARGS__)
>  
>  /*
>   * arm_smccc_1_1_hvc() - make an SMCCC v1.1 compliant HVC call
> @@ -344,7 +345,23 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
>   * to the HVC instruction. The return values are updated with the content
>   * from register 0 to 3 on return from the HVC instruction if not NULL.
>   */
> -#define arm_smccc_1_1_hvc(...)	__arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
> +#define arm_smccc_1_1_hvc(...) \
> +	__arm_smccc_1_1(SMCCC_HVC_INST, "r1", "r2", "r3", __VA_ARGS__)
> +
> +/*
> + * arm_smccc_1_1_hvc_reg() - make an SMCCC v1.1 compliant HVC call
> + * specifying output registers
> + *
> + * This is a variant of arm_smccc_1_1_hvc() that allows specifying
> + * three registers from which result values will be returned in
> + * addition to r0.
> + *
> + * @a0-a2: register specifications for 3 return registers (e.g., "r5")
> + * @a3-a10: arguments passed in registers 0 to 7
> + * @res: result values from register 0 and the three registers specified
> + * in a0-a2.
> + */
> +#define arm_smccc_1_1_hvc_reg(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
>  
>  /*
>   * Like arm_smccc_1_1* but always returns SMCCC_RET_NOT_SUPPORTED.
> -- 
> 1.8.3.1
>
Michael Kelley (LINUX) March 25, 2021, 4:55 a.m. UTC | #2
From: Mark Rutland <mark.rutland@arm.com> Sent: Wednesday, March 24, 2021 9:55 AM
> 
> Hi Michael,
> 
> On Mon, Mar 08, 2021 at 11:57:13AM -0800, Michael Kelley wrote:
> > Hypercalls to Hyper-V on ARM64 may return results in registers other
> > than X0 thru X3, as permitted by the SMCCC spec version 1.2 and later.
> > Accommodate this by adding a variant of arm_smccc_1_1_hvc that allows
> > the caller to specify which 3 registers are returned in addition to X0.
> >
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > ---
> > There are several ways to support returning results from registers
> > other than X0 thru X3, and Hyper-V usage should be compatible with
> > whatever the maintainers prefer.  What's implemented in this patch
> > may be the most flexible, but it has the downside of not being a
> > true function interface in that args 0 thru 2 must be fixed strings,
> > and not general "C" expressions.
> 
> For the benefit of others here, SMCCCv1.2 allows:
> 
> * SMC64/HVC64 to use all of x1-x17 for both parameters and return values
> * SMC32/HVC32 to use all of r1-r7 for both parameters and return values
> 
> The rationale for this was to make it possible to pass a large number of
> arguments in one call without the hypervisor/firmware needing to access
> the memory of the caller.
> 
> My preference would be to add arm_smccc_1_2_{hvc,smc}() assembly
> functions which read all the permitted argument registers from a struct,
> and write all the permitted result registers to a struct, leaving it to
> callers to set those up and decompose them.
> 
> That way we only have to write one implementation that all callers can
> use, which'll be far easier to maintain. I suspect that in general the
> cost of temporarily bouncing the values through memory will be dominated
> by whatever the hypervisor/firmware is going to do, and if it's not we
> can optimize that away in future.
> 

Thanks for the feedback, and I'm working on implementing this approach.
But I've hit a snag in that gcc limits the "asm" statement to 30 arguments,
which gives us 15 registers as parameters and 15 registers as return
values, instead of the 18 each allowed by SMCCC v1.2.  I will continue
with the 15 register limit for now, unless someone knows a way to exceed
that.  The alternative would be to go to pure assembly language.

I'll post a standalone RFC patch when I have something that works.  My
C pre-processor wizardry is limited, so others will probably know some
tricks that can improve on my first cut.

Michael
Mark Rutland March 25, 2021, 9:56 a.m. UTC | #3
On Thu, Mar 25, 2021 at 04:55:51AM +0000, Michael Kelley wrote:
> From: Mark Rutland <mark.rutland@arm.com> Sent: Wednesday, March 24, 2021 9:55 AM
> > For the benefit of others here, SMCCCv1.2 allows:
> > 
> > * SMC64/HVC64 to use all of x1-x17 for both parameters and return values
> > * SMC32/HVC32 to use all of r1-r7 for both parameters and return values
> > 
> > The rationale for this was to make it possible to pass a large number of
> > arguments in one call without the hypervisor/firmware needing to access
> > the memory of the caller.
> > 
> > My preference would be to add arm_smccc_1_2_{hvc,smc}() assembly
> > functions which read all the permitted argument registers from a struct,
> > and write all the permitted result registers to a struct, leaving it to
> > callers to set those up and decompose them.
> > 
> > That way we only have to write one implementation that all callers can
> > use, which'll be far easier to maintain. I suspect that in general the
> > cost of temporarily bouncing the values through memory will be dominated
> > by whatever the hypervisor/firmware is going to do, and if it's not we
> > can optimize that away in future.
> 
> Thanks for the feedback, and I'm working on implementing this approach.
> But I've hit a snag in that gcc limits the "asm" statement to 30 arguments,
> which gives us 15 registers as parameters and 15 registers as return
> values, instead of the 18 each allowed by SMCCC v1.2.  I will continue
> with the 15 register limit for now, unless someone knows a way to exceed
> that.  The alternative would be to go to pure assembly language.

I realise in retrospect this is not clear, but when I said "assembly
functions" I had meant raw assembly functions rather than inline
assembly.

We already have __arm_smccc_smc and __arm_smccc_hvc assembly functions
in arch/{arm,arm64}/kernel/smccc-call.S, and I'd expected we'd add the
full fat SMCCCv1.2 variants there.

Thanks,
Mark.
Michael Kelley (LINUX) March 25, 2021, 5:19 p.m. UTC | #4
From: Mark Rutland <mark.rutland@arm.com> Sent: Thursday, March 25, 2021 2:56 AM
> 
> On Thu, Mar 25, 2021 at 04:55:51AM +0000, Michael Kelley wrote:
> > From: Mark Rutland <mark.rutland@arm.com> Sent: Wednesday, March 24, 2021 9:55 AM
> > > For the benefit of others here, SMCCCv1.2 allows:
> > >
> > > * SMC64/HVC64 to use all of x1-x17 for both parameters and return values
> > > * SMC32/HVC32 to use all of r1-r7 for both parameters and return values
> > >
> > > The rationale for this was to make it possible to pass a large number of
> > > arguments in one call without the hypervisor/firmware needing to access
> > > the memory of the caller.
> > >
> > > My preference would be to add arm_smccc_1_2_{hvc,smc}() assembly
> > > functions which read all the permitted argument registers from a struct,
> > > and write all the permitted result registers to a struct, leaving it to
> > > callers to set those up and decompose them.
> > >
> > > That way we only have to write one implementation that all callers can
> > > use, which'll be far easier to maintain. I suspect that in general the
> > > cost of temporarily bouncing the values through memory will be dominated
> > > by whatever the hypervisor/firmware is going to do, and if it's not we
> > > can optimize that away in future.
> >
> > Thanks for the feedback, and I'm working on implementing this approach.
> > But I've hit a snag in that gcc limits the "asm" statement to 30 arguments,
> > which gives us 15 registers as parameters and 15 registers as return
> > values, instead of the 18 each allowed by SMCCC v1.2.  I will continue
> > with the 15 register limit for now, unless someone knows a way to exceed
> > that.  The alternative would be to go to pure assembly language.
> 
> I realise in retrospect this is not clear, but when I said "assembly
> functions" I had meant raw assembly functions rather than inline
> assembly.
> 
> We already have __arm_smccc_smc and __arm_smccc_hvc assembly functions
> in arch/{arm,arm64}/kernel/smccc-call.S, and I'd expected we'd add the
> full fat SMCCCv1.2 variants there.
> 

FWIW, here's an inline assembly version that I have working.  On the plus
side, gcc does a decent job of optimizing.  It doesn't store to memory any
result registers that aren't consumed by the caller.  On the downside, it's
limited to 15 args and 15 results as noted previously.  So it doesn't meet
your goal of fully implementing the v1.2 spec.  Also, all 15 input arguments
must be initialized or gcc complains about using uninitialized values.

This version should handle both the 32-bit and 64-bit worlds, though I've
only tested in the 64-bit world.

I've made the input and output structures be arrays rather than listing
each register as a separate field.  Either approach should work, and I'm not
sure what the tradeoffs are.

But if building on what Sudeep Holla has already done with raw assembly
is preferred, I'm OK with that as well.

Michael

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index acda958..e98cf07 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -408,5 +408,112 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
 		method;							\
 	})
 
+
+#ifdef CONFIG_ARM64
+#define SMCCC_1_2_REG_COUNT	18
+
+#define ARM64_ADDITIONAL_RESULT_DECLARATIONS				\
+		register unsigned long r8 asm("r8");			\
+		register unsigned long r9 asm("r9");			\
+		register unsigned long r10 asm("r10");			\
+		register unsigned long r11 asm("r11");			\
+		register unsigned long r12 asm("r12");			\
+		register unsigned long r13 asm("r13");			\
+		register unsigned long r14 asm("r14");
+
+#define ARM64_ADDITIONAL_ARG_DECLARATIONS				\
+		register unsigned long arg8 asm("r8") = __a->reg[8];	\
+		register unsigned long arg9 asm("r9") = __a->reg[9];	\
+		register unsigned long arg10 asm("r10") = __a->reg[10];	\
+		register unsigned long arg11 asm("r11") = __a->reg[11];	\
+		register unsigned long arg12 asm("r12") = __a->reg[12];	\
+		register unsigned long arg13 asm("r13") = __a->reg[13];	\
+		register unsigned long arg14 asm("r14") = __a->reg[14];
+
+#define ARM64_ADDITIONAL_OUTPUT_REGS					\
+			, "=r" (r8), "=r" (r9), "=r" (r10), "=r" (r11),	\
+			"=r" (r12), "=r" (r13), "=r" (r14)
+
+#define ARM64_ADDITIONAL_INPUT_REGS					\
+			, "r" (arg8), "r" (arg9), "r" (arg10),		\
+			"r" (arg11), "r" (arg12), "r" (arg13),		\
+			"r" (arg14)
+
+#define ARM64_ADDITIONAL_RESULTS					\
+			__r->reg[8] = r8;				\
+			__r->reg[9] = r9;				\
+			__r->reg[10] = r10;				\
+			__r->reg[11] = r11;				\
+			__r->reg[12] = r12;				\
+			__r->reg[13] = r13;				\
+			__r->reg[14] = r14;
+
+#else /* CONFIG_ARM64 */
+
+#define SMCCC_1_2_REG_COUNT	8
+#define ARM64_ADDITIONAL_RESULT_DECLARATIONS
+#define ARM64_ADDITIONAL_ARG_DECLARATIONS
+#define ARM64_ADDITIONAL_OUTPUT_REGS
+#define ARM64_ADDITIONAL_INPUT_REGS
+#define ARM64_ADDITIONAL_RESULTS
+
+#endif /* CONFIG_ARM64 */
+
+struct arm_smccc_1_2_regs {
+	unsigned long reg[SMCCC_1_2_REG_COUNT];
+};
+
+
+#define __arm_smccc_1_2(inst, args, res)				\
+	do {								\
+		struct arm_smccc_1_2_regs * __a = args;			\
+		struct arm_smccc_1_2_regs * __r = res;			\
+		register unsigned long r0 asm("r0");			\
+		register unsigned long r1 asm("r1");			\
+		register unsigned long r2 asm("r2");			\
+		register unsigned long r3 asm("r3");			\
+		register unsigned long r4 asm("r4");			\
+		register unsigned long r5 asm("r5");			\
+		register unsigned long r6 asm("r6");			\
+		register unsigned long r7 asm("r7");			\
+		ARM64_ADDITIONAL_RESULT_DECLARATIONS			\
+		register unsigned long arg0 asm("r0") = (u32)(__a->reg[0]); \
+		register unsigned long arg1 asm("r1") = __a->reg[1];	\
+		register unsigned long arg2 asm("r2") = __a->reg[2];	\
+		register unsigned long arg3 asm("r3") = __a->reg[3];	\
+		register unsigned long arg4 asm("r4") = __a->reg[4];	\
+		register unsigned long arg5 asm("r5") = __a->reg[5];	\
+		register unsigned long arg6 asm("r6") = __a->reg[6];	\
+		register unsigned long arg7 asm("r7") = __a->reg[7];	\
+		ARM64_ADDITIONAL_ARG_DECLARATIONS			\
+		asm volatile(inst "\n" :				\
+			"=r" (r0), "=r" (r1), "=r" (r2), "=r" (r3),	\
+			"=r" (r4), "=r" (r5), "=r" (r6), "=r" (r7)	\
+			ARM64_ADDITIONAL_OUTPUT_REGS :			\
+			"r" (arg0), "r" (arg1), "r" (arg2),		\
+			"r" (arg3), "r" (arg4), "r" (arg5),		\
+			"r" (arg6), "r" (arg7)				\
+			ARM64_ADDITIONAL_INPUT_REGS :			\
+			"memory");					\
+		if(__r) {						\
+			__r->reg[0] = r0;				\
+			__r->reg[1] = r1;				\
+			__r->reg[2] = r2;				\
+			__r->reg[3] = r3;				\
+			__r->reg[4] = r4;				\
+			__r->reg[5] = r5;				\
+			__r->reg[6] = r6;				\
+			__r->reg[7] = r7;				\
+			ARM64_ADDITIONAL_RESULTS			\
+		}							\
+	} while (0)
+
+#define arm_smccc_1_2_smc(args, res)					\
+	__arm_smccc_1_2(SMCCC_SMC_INST, args, res)
+
+#define arm_smccc_1_2_hvc(args, res)					\
+	__arm_smccc_1_2(SMCCC_HVC_INST, args, res)
+
+
 #endif /*__ASSEMBLY__*/
 #endif /*__LINUX_ARM_SMCCC_H*/
diff mbox series

Patch

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index f860645..acda958 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -300,12 +300,12 @@  asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
  * entitled to optimise the whole sequence away. "volatile" is what
  * makes it stick.
  */
-#define __arm_smccc_1_1(inst, ...)					\
+#define __arm_smccc_1_1(inst, reg1, reg2, reg3, ...)			\
 	do {								\
 		register unsigned long r0 asm("r0");			\
-		register unsigned long r1 asm("r1");			\
-		register unsigned long r2 asm("r2");			\
-		register unsigned long r3 asm("r3"); 			\
+		register unsigned long r1 asm(reg1);			\
+		register unsigned long r2 asm(reg2);			\
+		register unsigned long r3 asm(reg3);			\
 		__declare_args(__count_args(__VA_ARGS__), __VA_ARGS__);	\
 		asm volatile(inst "\n" :				\
 			     "=r" (r0), "=r" (r1), "=r" (r2), "=r" (r3)	\
@@ -328,7 +328,8 @@  asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
  * to the SMC instruction. The return values are updated with the content
  * from register 0 to 3 on return from the SMC instruction if not NULL.
  */
-#define arm_smccc_1_1_smc(...)	__arm_smccc_1_1(SMCCC_SMC_INST, __VA_ARGS__)
+#define arm_smccc_1_1_smc(...)\
+	__arm_smccc_1_1(SMCCC_SMC_INST, "r1", "r2", "r3", __VA_ARGS__)
 
 /*
  * arm_smccc_1_1_hvc() - make an SMCCC v1.1 compliant HVC call
@@ -344,7 +345,23 @@  asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
  * to the HVC instruction. The return values are updated with the content
  * from register 0 to 3 on return from the HVC instruction if not NULL.
  */
-#define arm_smccc_1_1_hvc(...)	__arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
+#define arm_smccc_1_1_hvc(...) \
+	__arm_smccc_1_1(SMCCC_HVC_INST, "r1", "r2", "r3", __VA_ARGS__)
+
+/*
+ * arm_smccc_1_1_hvc_reg() - make an SMCCC v1.1 compliant HVC call
+ * specifying output registers
+ *
+ * This is a variant of arm_smccc_1_1_hvc() that allows specifying
+ * three registers from which result values will be returned in
+ * addition to r0.
+ *
+ * @a0-a2: register specifications for 3 return registers (e.g., "r5")
+ * @a3-a10: arguments passed in registers 0 to 7
+ * @res: result values from register 0 and the three registers specified
+ * in a0-a2.
+ */
+#define arm_smccc_1_1_hvc_reg(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
 
 /*
  * Like arm_smccc_1_1* but always returns SMCCC_RET_NOT_SUPPORTED.