diff mbox series

[v3] kselftest/arm64: Exit streaming mode after collecting signal context

Message ID 20230720-arm64-signal-memcpy-fix-v3-1-08aed2385d68@kernel.org (mailing list archive)
State New, archived
Headers show
Series [v3] kselftest/arm64: Exit streaming mode after collecting signal context | expand

Commit Message

Mark Brown July 20, 2023, 6:42 p.m. UTC
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.

For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.

After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.

Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v3:
- Open code OPTIMISER_HIDE_VAR() instead of the memory clobber.
- Link to v2: https://lore.kernel.org/r/20230712-arm64-signal-memcpy-fix-v2-1-494f7025caf6@kernel.org

Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829e@kernel.org
---
 .../selftests/arm64/signal/test_signals_utils.h    | 25 +++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)


---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10

Best regards,

Comments

Will Deacon July 27, 2023, 3:50 p.m. UTC | #1
On Thu, Jul 20, 2023 at 07:42:09PM +0100, Mark Brown wrote:
> When we collect a signal context with one of the SME modes enabled we will
> have enabled that mode behind the compiler and libc's back so they may
> issue some instructions not valid in streaming mode, causing spurious
> failures.
> 
> For the code prior to issuing the BRK to trigger signal handling we need to
> stay in streaming mode if we were already there since that's a part of the
> signal context the caller is trying to collect. Unfortunately this code
> includes a memset() which is likely to be heavily optimised and is likely
> to use FP instructions incompatible with streaming mode. We can avoid this
> happening by open coding the memset(), inserting a volatile assembly
> statement to avoid the compiler recognising what's being done and doing
> something in optimisation. This code is not performance critical so the
> inefficiency should not be an issue.
> 
> After collecting the context we can simply exit streaming mode, avoiding
> these issues. Use a full SMSTOP for safety to prevent any issues appearing
> with ZA.
> 
> Reported-by: Will Deacon <will@kernel.org>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
> Changes in v3:
> - Open code OPTIMISER_HIDE_VAR() instead of the memory clobber.
> - Link to v2: https://lore.kernel.org/r/20230712-arm64-signal-memcpy-fix-v2-1-494f7025caf6@kernel.org
> 
> Changes in v2:
> - Rebase onto v6.5-rc1.
> - Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829e@kernel.org
> ---
>  .../selftests/arm64/signal/test_signals_utils.h    | 25 +++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
> index 222093f51b67..c7f5627171dd 100644
> --- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
> +++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
> @@ -60,13 +60,25 @@ static __always_inline bool get_current_context(struct tdescr *td,
>  						size_t dest_sz)
>  {
>  	static volatile bool seen_already;
> +	int i;
> +	char *uc = (char *)dest_uc;
>  
>  	assert(td && dest_uc);
>  	/* it's a genuine invocation..reinit */
>  	seen_already = 0;
>  	td->live_uc_valid = 0;
>  	td->live_sz = dest_sz;
> -	memset(dest_uc, 0x00, td->live_sz);
> +
> +	/*
> +	 * This is a memset() but we don't want the compiler to
> +	 * optimise it into either instructions or a library call
> +	 * which might be incompatible with streaming mode.
> +	 */
> +	for (i = 0; i < td->live_sz; i++) {
> +		uc[i] = 0;
> +		__asm__ ("" : "=r" (uc[i]) : "0" (uc[i]));
> +	}

Looking more closely at this, I see that the bpf and kvm selftests are
able to include <linux/compiler.h> directly, so I don't see why we need
to open-code this here. I also spotted that we've *already* written our
own version of this as the 'curse()' macro in selftests/arm64/bti/compiler.h!

So I think we should either use linux/compiler.h or make our curse macro
usable to all the arm64 selftests.

What do you think?

Will
Mark Brown July 27, 2023, 4:01 p.m. UTC | #2
On Thu, Jul 27, 2023 at 04:50:10PM +0100, Will Deacon wrote:

> Looking more closely at this, I see that the bpf and kvm selftests are
> able to include <linux/compiler.h> directly, so I don't see why we need

Hrm, it does seem like that's on the specific list of headers we can get
at.  I'm kind of surprised that it's not got dependencies that upset
things.

> to open-code this here. I also spotted that we've *already* written our
> own version of this as the 'curse()' macro in selftests/arm64/bti/compiler.h!

That one wasn't originally a kselftst so it'll just have pulled in
something from completely outside the kernel build systems.
Mark Brown July 27, 2023, 4:30 p.m. UTC | #3
On Thu, Jul 27, 2023 at 04:50:10PM +0100, Will Deacon wrote:

> Looking more closely at this, I see that the bpf and kvm selftests are
> able to include <linux/compiler.h> directly, so I don't see why we need
> to open-code this here. I also spotted that we've *already* written our
> own version of this as the 'curse()' macro in selftests/arm64/bti/compiler.h!

> So I think we should either use linux/compiler.h or make our curse macro
> usable to all the arm64 selftests.

Ah, actually it's not including the linux/compiler.h you might be
thinking of - it's including an entirely unrelated linux/compiler.h
which can be found in tools/include and contains a much smaller set of
things.  We can port the macro over and try to do the dance to pull in
the tools/include headers, hopefully that's not too fragile with the
various combinations of build options and the additional complexity of 
the arm64 tests having an additional layer of directories.

I'd be happier doing this as a fix and then touching the build system in
-next (I've got the build system changes now).  It should be fine but
the build system has changed over time and is prone to fragility - I
can see people wanting to pull the fix back to whatever kernel distros
are using and running into trouble with a more invasive change.

I was going to mention that kselftest had to define it's own
ARRAY_SIZE()...
diff mbox series

Patch

diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..c7f5627171dd 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,25 @@  static __always_inline bool get_current_context(struct tdescr *td,
 						size_t dest_sz)
 {
 	static volatile bool seen_already;
+	int i;
+	char *uc = (char *)dest_uc;
 
 	assert(td && dest_uc);
 	/* it's a genuine invocation..reinit */
 	seen_already = 0;
 	td->live_uc_valid = 0;
 	td->live_sz = dest_sz;
-	memset(dest_uc, 0x00, td->live_sz);
+
+	/*
+	 * This is a memset() but we don't want the compiler to
+	 * optimise it into either instructions or a library call
+	 * which might be incompatible with streaming mode.
+	 */
+	for (i = 0; i < td->live_sz; i++) {
+		uc[i] = 0;
+		__asm__ ("" : "=r" (uc[i]) : "0" (uc[i]));
+	}
+
 	td->live_uc = dest_uc;
 	/*
 	 * Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +115,17 @@  static __always_inline bool get_current_context(struct tdescr *td,
 		      :
 		      : "memory");
 
+	/*
+	 * If we were grabbing a streaming mode context then we may
+	 * have entered streaming mode behind the system's back and
+	 * libc or compiler generated code might decide to do
+	 * something invalid in streaming mode, or potentially even
+	 * the state of ZA.  Issue a SMSTOP to exit both now we have
+	 * grabbed the state.
+	 */
+	if (td->feats_supported & FEAT_SME)
+		asm volatile("msr S0_3_C4_C6_3, xzr");
+
 	/*
 	 * If we get here with seen_already==1 it implies the td->live_uc
 	 * context has been used to get back here....this probably means