diff mbox series

[RFC,1/1] Revert "rseq/selftests: arm: use udf instruction for RSEQ_SIG"

Message ID 20190617152304.23371-1-mathieu.desnoyers@efficios.com (mailing list archive)
State New
Headers show
Series [RFC,1/1] Revert "rseq/selftests: arm: use udf instruction for RSEQ_SIG" | expand

Commit Message

Mathieu Desnoyers June 17, 2019, 3:23 p.m. UTC
This reverts commit 2b845d4b4acd9422bbb668989db8dc36dfc8f438.

That commit introduces build issues for programs compiled in Thumb mode.
Rather than try to be clever and emit a valid trap instruction on arm32,
which requires special care about big/little endian handling on that
architecture, just emit plain data. Data in the instruction stream is
technically expected on arm32: this is how literal pools are
implemented. Reverting to the prior behavior does exactly that.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Joel Fernandes <joelaf@google.com>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Dave Watson <davejwatson@fb.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Andi Kleen <andi@firstfloor.org>
CC: linux-kselftest@vger.kernel.org
CC: "H . Peter Anvin" <hpa@zytor.com>
CC: Chris Lameter <cl@linux.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
CC: Paul Turner <pjt@google.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ben Maurer <bmaurer@fb.com>
CC: linux-api@vger.kernel.org
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
---
 tools/testing/selftests/rseq/rseq-arm.h | 52 ++-------------------------------
 1 file changed, 2 insertions(+), 50 deletions(-)

Comments

Will Deacon June 24, 2019, 5:24 p.m. UTC | #1
On Mon, Jun 17, 2019 at 05:23:04PM +0200, Mathieu Desnoyers wrote:
> This reverts commit 2b845d4b4acd9422bbb668989db8dc36dfc8f438.
> 
> That commit introduces build issues for programs compiled in Thumb mode.
> Rather than try to be clever and emit a valid trap instruction on arm32,
> which requires special care about big/little endian handling on that
> architecture, just emit plain data. Data in the instruction stream is
> technically expected on arm32: this is how literal pools are
> implemented. Reverting to the prior behavior does exactly that.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Joel Fernandes <joelaf@google.com>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: linux-kselftest@vger.kernel.org
> CC: "H . Peter Anvin" <hpa@zytor.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Michael Kerrisk <mtk.manpages@gmail.com>
> CC: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Paul Turner <pjt@google.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: linux-api@vger.kernel.org
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Carlos O'Donell <carlos@redhat.com>
> CC: Florian Weimer <fweimer@redhat.com>
> ---
>  tools/testing/selftests/rseq/rseq-arm.h | 52 ++-------------------------------
>  1 file changed, 2 insertions(+), 50 deletions(-)
> 
> diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
> index 84f28f147fb6..5f262c54364f 100644
> --- a/tools/testing/selftests/rseq/rseq-arm.h
> +++ b/tools/testing/selftests/rseq/rseq-arm.h
> @@ -5,54 +5,7 @@
>   * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>   */
>  
> -/*
> - * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
> - * value 0x5de3. This traps if user-space reaches this instruction by mistake,
> - * and the uncommon operand ensures the kernel does not move the instruction
> - * pointer to attacker-controlled code on rseq abort.
> - *
> - * The instruction pattern in the A32 instruction set is:
> - *
> - * e7f5def3    udf    #24035    ; 0x5de3
> - *
> - * This translates to the following instruction pattern in the T16 instruction
> - * set:
> - *
> - * little endian:
> - * def3        udf    #243      ; 0xf3
> - * e7f5        b.n    <7f5>
> - *
> - * pre-ARMv6 big endian code:
> - * e7f5        b.n    <7f5>
> - * def3        udf    #243      ; 0xf3
> - *
> - * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
> - * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
> - * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
> - * (which match), so there is no need to reverse the endianness of the data
> - * representation of the signature. However, the choice between BE32 and BE8
> - * is done by the linker, so we cannot know whether code and data endianness
> - * will be mixed before the linker is invoked.
> - */
> -
> -#define RSEQ_SIG_CODE	0xe7f5def3
> -
> -#ifndef __ASSEMBLER__
> -
> -#define RSEQ_SIG_DATA							\
> -	({								\
> -		int sig;						\
> -		asm volatile ("b 2f\n\t"				\
> -			      "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
> -			      "2:\n\t"					\
> -			      "ldr %[sig], 1b\n\t"			\
> -			      : [sig] "=r" (sig));			\
> -		sig;							\
> -	})
> -
> -#define RSEQ_SIG	RSEQ_SIG_DATA
> -
> -#endif
> +#define RSEQ_SIG	0x53053053

I don't get why you're reverting back to this old signature value, when the
one we came up with will work well when interpreted as an instruction in the
*vast* majority of scenarios that people care about (A32/T32 little-endian).
I think you might be under-estimating just how dead things like BE32 really
are.

That said, when you ran into .inst.n/.inst.w issues, did you try something
along the lines of the WASM() macro we use in arch/arm/, which adds the ".w"
suffix when targetting Thumb?

Will
Mathieu Desnoyers June 24, 2019, 6:20 p.m. UTC | #2
----- On Jun 24, 2019, at 1:24 PM, Will Deacon will.deacon@arm.com wrote:

> On Mon, Jun 17, 2019 at 05:23:04PM +0200, Mathieu Desnoyers wrote:
>> This reverts commit 2b845d4b4acd9422bbb668989db8dc36dfc8f438.
>> 
>> That commit introduces build issues for programs compiled in Thumb mode.
>> Rather than try to be clever and emit a valid trap instruction on arm32,
>> which requires special care about big/little endian handling on that
>> architecture, just emit plain data. Data in the instruction stream is
>> technically expected on arm32: this is how literal pools are
>> implemented. Reverting to the prior behavior does exactly that.
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> CC: Peter Zijlstra <peterz@infradead.org>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Joel Fernandes <joelaf@google.com>
>> CC: Catalin Marinas <catalin.marinas@arm.com>
>> CC: Dave Watson <davejwatson@fb.com>
>> CC: Will Deacon <will.deacon@arm.com>
>> CC: Shuah Khan <shuah@kernel.org>
>> CC: Andi Kleen <andi@firstfloor.org>
>> CC: linux-kselftest@vger.kernel.org
>> CC: "H . Peter Anvin" <hpa@zytor.com>
>> CC: Chris Lameter <cl@linux.com>
>> CC: Russell King <linux@arm.linux.org.uk>
>> CC: Michael Kerrisk <mtk.manpages@gmail.com>
>> CC: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>> CC: Paul Turner <pjt@google.com>
>> CC: Boqun Feng <boqun.feng@gmail.com>
>> CC: Josh Triplett <josh@joshtriplett.org>
>> CC: Steven Rostedt <rostedt@goodmis.org>
>> CC: Ben Maurer <bmaurer@fb.com>
>> CC: linux-api@vger.kernel.org
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Linus Torvalds <torvalds@linux-foundation.org>
>> CC: Carlos O'Donell <carlos@redhat.com>
>> CC: Florian Weimer <fweimer@redhat.com>
>> ---
>>  tools/testing/selftests/rseq/rseq-arm.h | 52 ++-------------------------------
>>  1 file changed, 2 insertions(+), 50 deletions(-)
>> 
>> diff --git a/tools/testing/selftests/rseq/rseq-arm.h
>> b/tools/testing/selftests/rseq/rseq-arm.h
>> index 84f28f147fb6..5f262c54364f 100644
>> --- a/tools/testing/selftests/rseq/rseq-arm.h
>> +++ b/tools/testing/selftests/rseq/rseq-arm.h
>> @@ -5,54 +5,7 @@
>>   * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>   */
>>  
>> -/*
>> - * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
>> - * value 0x5de3. This traps if user-space reaches this instruction by mistake,
>> - * and the uncommon operand ensures the kernel does not move the instruction
>> - * pointer to attacker-controlled code on rseq abort.
>> - *
>> - * The instruction pattern in the A32 instruction set is:
>> - *
>> - * e7f5def3    udf    #24035    ; 0x5de3
>> - *
>> - * This translates to the following instruction pattern in the T16 instruction
>> - * set:
>> - *
>> - * little endian:
>> - * def3        udf    #243      ; 0xf3
>> - * e7f5        b.n    <7f5>
>> - *
>> - * pre-ARMv6 big endian code:
>> - * e7f5        b.n    <7f5>
>> - * def3        udf    #243      ; 0xf3
>> - *
>> - * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
>> - * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
>> - * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
>> - * (which match), so there is no need to reverse the endianness of the data
>> - * representation of the signature. However, the choice between BE32 and BE8
>> - * is done by the linker, so we cannot know whether code and data endianness
>> - * will be mixed before the linker is invoked.
>> - */
>> -
>> -#define RSEQ_SIG_CODE	0xe7f5def3
>> -
>> -#ifndef __ASSEMBLER__
>> -
>> -#define RSEQ_SIG_DATA							\
>> -	({								\
>> -		int sig;						\
>> -		asm volatile ("b 2f\n\t"				\
>> -			      "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
>> -			      "2:\n\t"					\
>> -			      "ldr %[sig], 1b\n\t"			\
>> -			      : [sig] "=r" (sig));			\
>> -		sig;							\
>> -	})
>> -
>> -#define RSEQ_SIG	RSEQ_SIG_DATA
>> -
>> -#endif
>> +#define RSEQ_SIG	0x53053053
> 
> I don't get why you're reverting back to this old signature value, when the
> one we came up with will work well when interpreted as an instruction in the
> *vast* majority of scenarios that people care about (A32/T32 little-endian).
> I think you might be under-estimating just how dead things like BE32 really
> are.

My issue is that the current .instr approach is broken for programs or functions
built in Thumb mode, and I received no feedback on the solutions I proposed for
those issues, which led me to propose a patch reverting to a simple .word.

> 
> That said, when you ran into .inst.n/.inst.w issues, did you try something
> along the lines of the WASM() macro we use in arch/arm/, which adds the ".w"
> suffix when targetting Thumb?

AFAIU, the WASM macros depend on CONFIG_THUMB2_KERNEL, which may be fine within
the kernel, but for user-space things are a bit more complex.

A compile-unit can be compiled as thumb, which will set a compiler define
which we could use to detect thumb mode. However, unfortunately, a single
function can also be compiled with an attribute selecting thumb mode, which
AFAIU does not influence the preprocessor defines.

If we had the equivalent of pushsection and popsection for "arm" mode vs "thumb"
mode in asm, it would solve all our issues, but the only way I found to do
something equivalent was to use the size of a no-op in a asm conditional
to restore the previous state.

Any other ideas on how to handle this ?

Thanks,

Mathieu


> 
> Will
Will Deacon June 25, 2019, 9:15 a.m. UTC | #3
On Mon, Jun 24, 2019 at 02:20:26PM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 24, 2019, at 1:24 PM, Will Deacon will.deacon@arm.com wrote:
> 
> > On Mon, Jun 17, 2019 at 05:23:04PM +0200, Mathieu Desnoyers wrote:
> >> This reverts commit 2b845d4b4acd9422bbb668989db8dc36dfc8f438.
> >> 
> >> That commit introduces build issues for programs compiled in Thumb mode.
> >> Rather than try to be clever and emit a valid trap instruction on arm32,
> >> which requires special care about big/little endian handling on that
> >> architecture, just emit plain data. Data in the instruction stream is
> >> technically expected on arm32: this is how literal pools are
> >> implemented. Reverting to the prior behavior does exactly that.
> >> 
> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >> CC: Peter Zijlstra <peterz@infradead.org>
> >> CC: Thomas Gleixner <tglx@linutronix.de>
> >> CC: Joel Fernandes <joelaf@google.com>
> >> CC: Catalin Marinas <catalin.marinas@arm.com>
> >> CC: Dave Watson <davejwatson@fb.com>
> >> CC: Will Deacon <will.deacon@arm.com>
> >> CC: Shuah Khan <shuah@kernel.org>
> >> CC: Andi Kleen <andi@firstfloor.org>
> >> CC: linux-kselftest@vger.kernel.org
> >> CC: "H . Peter Anvin" <hpa@zytor.com>
> >> CC: Chris Lameter <cl@linux.com>
> >> CC: Russell King <linux@arm.linux.org.uk>
> >> CC: Michael Kerrisk <mtk.manpages@gmail.com>
> >> CC: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
> >> CC: Paul Turner <pjt@google.com>
> >> CC: Boqun Feng <boqun.feng@gmail.com>
> >> CC: Josh Triplett <josh@joshtriplett.org>
> >> CC: Steven Rostedt <rostedt@goodmis.org>
> >> CC: Ben Maurer <bmaurer@fb.com>
> >> CC: linux-api@vger.kernel.org
> >> CC: Andy Lutomirski <luto@amacapital.net>
> >> CC: Andrew Morton <akpm@linux-foundation.org>
> >> CC: Linus Torvalds <torvalds@linux-foundation.org>
> >> CC: Carlos O'Donell <carlos@redhat.com>
> >> CC: Florian Weimer <fweimer@redhat.com>
> >> ---
> >>  tools/testing/selftests/rseq/rseq-arm.h | 52 ++-------------------------------
> >>  1 file changed, 2 insertions(+), 50 deletions(-)
> >> 
> >> diff --git a/tools/testing/selftests/rseq/rseq-arm.h
> >> b/tools/testing/selftests/rseq/rseq-arm.h
> >> index 84f28f147fb6..5f262c54364f 100644
> >> --- a/tools/testing/selftests/rseq/rseq-arm.h
> >> +++ b/tools/testing/selftests/rseq/rseq-arm.h
> >> @@ -5,54 +5,7 @@
> >>   * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >>   */
> >>  
> >> -/*
> >> - * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
> >> - * value 0x5de3. This traps if user-space reaches this instruction by mistake,
> >> - * and the uncommon operand ensures the kernel does not move the instruction
> >> - * pointer to attacker-controlled code on rseq abort.
> >> - *
> >> - * The instruction pattern in the A32 instruction set is:
> >> - *
> >> - * e7f5def3    udf    #24035    ; 0x5de3
> >> - *
> >> - * This translates to the following instruction pattern in the T16 instruction
> >> - * set:
> >> - *
> >> - * little endian:
> >> - * def3        udf    #243      ; 0xf3
> >> - * e7f5        b.n    <7f5>
> >> - *
> >> - * pre-ARMv6 big endian code:
> >> - * e7f5        b.n    <7f5>
> >> - * def3        udf    #243      ; 0xf3
> >> - *
> >> - * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
> >> - * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
> >> - * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
> >> - * (which match), so there is no need to reverse the endianness of the data
> >> - * representation of the signature. However, the choice between BE32 and BE8
> >> - * is done by the linker, so we cannot know whether code and data endianness
> >> - * will be mixed before the linker is invoked.
> >> - */
> >> -
> >> -#define RSEQ_SIG_CODE	0xe7f5def3
> >> -
> >> -#ifndef __ASSEMBLER__
> >> -
> >> -#define RSEQ_SIG_DATA							\
> >> -	({								\
> >> -		int sig;						\
> >> -		asm volatile ("b 2f\n\t"				\
> >> -			      "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
> >> -			      "2:\n\t"					\
> >> -			      "ldr %[sig], 1b\n\t"			\
> >> -			      : [sig] "=r" (sig));			\
> >> -		sig;							\
> >> -	})
> >> -
> >> -#define RSEQ_SIG	RSEQ_SIG_DATA
> >> -
> >> -#endif
> >> +#define RSEQ_SIG	0x53053053
> > 
> > I don't get why you're reverting back to this old signature value, when the
> > one we came up with will work well when interpreted as an instruction in the
> > *vast* majority of scenarios that people care about (A32/T32 little-endian).
> > I think you might be under-estimating just how dead things like BE32 really
> > are.
> 
> My issue is that the current .instr approach is broken for programs or functions
> built in Thumb mode, and I received no feedback on the solutions I proposed for
> those issues, which led me to propose a patch reverting to a simple .word.

I understand why you're moving from .inst to .word, but I don't understand
why that necessitates a change in the value. Why not .word 0xe7f5def3 ? You
could also flip the bytes around in case of big-endian, which would keep the
instruction coding clean for BE8.

> > That said, when you ran into .inst.n/.inst.w issues, did you try something
> > along the lines of the WASM() macro we use in arch/arm/, which adds the ".w"
> > suffix when targetting Thumb?
> 
> AFAIU, the WASM macros depend on CONFIG_THUMB2_KERNEL, which may be fine within
> the kernel, but for user-space things are a bit more complex.
> 
> A compile-unit can be compiled as thumb, which will set a compiler define
> which we could use to detect thumb mode. However, unfortunately, a single
> function can also be compiled with an attribute selecting thumb mode, which
> AFAIU does not influence the preprocessor defines.

Thanks, I hadn't considered that case. I don't know the right way to handle
that in the toolchain, so using .word is probably the best bet in the
absence of any better suggestions from the tools folks.

Will
Mathieu Desnoyers June 25, 2019, 2:08 p.m. UTC | #4
----- On Jun 25, 2019, at 5:15 AM, Will Deacon will.deacon@arm.com wrote:

> On Mon, Jun 24, 2019 at 02:20:26PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jun 24, 2019, at 1:24 PM, Will Deacon will.deacon@arm.com wrote:
>> 
>> > On Mon, Jun 17, 2019 at 05:23:04PM +0200, Mathieu Desnoyers wrote:
>> >> This reverts commit 2b845d4b4acd9422bbb668989db8dc36dfc8f438.
>> >> 
>> >> That commit introduces build issues for programs compiled in Thumb mode.
>> >> Rather than try to be clever and emit a valid trap instruction on arm32,
>> >> which requires special care about big/little endian handling on that
>> >> architecture, just emit plain data. Data in the instruction stream is
>> >> technically expected on arm32: this is how literal pools are
>> >> implemented. Reverting to the prior behavior does exactly that.
>> >> 
>> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> >> CC: Peter Zijlstra <peterz@infradead.org>
>> >> CC: Thomas Gleixner <tglx@linutronix.de>
>> >> CC: Joel Fernandes <joelaf@google.com>
>> >> CC: Catalin Marinas <catalin.marinas@arm.com>
>> >> CC: Dave Watson <davejwatson@fb.com>
>> >> CC: Will Deacon <will.deacon@arm.com>
>> >> CC: Shuah Khan <shuah@kernel.org>
>> >> CC: Andi Kleen <andi@firstfloor.org>
>> >> CC: linux-kselftest@vger.kernel.org
>> >> CC: "H . Peter Anvin" <hpa@zytor.com>
>> >> CC: Chris Lameter <cl@linux.com>
>> >> CC: Russell King <linux@arm.linux.org.uk>
>> >> CC: Michael Kerrisk <mtk.manpages@gmail.com>
>> >> CC: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>> >> CC: Paul Turner <pjt@google.com>
>> >> CC: Boqun Feng <boqun.feng@gmail.com>
>> >> CC: Josh Triplett <josh@joshtriplett.org>
>> >> CC: Steven Rostedt <rostedt@goodmis.org>
>> >> CC: Ben Maurer <bmaurer@fb.com>
>> >> CC: linux-api@vger.kernel.org
>> >> CC: Andy Lutomirski <luto@amacapital.net>
>> >> CC: Andrew Morton <akpm@linux-foundation.org>
>> >> CC: Linus Torvalds <torvalds@linux-foundation.org>
>> >> CC: Carlos O'Donell <carlos@redhat.com>
>> >> CC: Florian Weimer <fweimer@redhat.com>
>> >> ---
>> >>  tools/testing/selftests/rseq/rseq-arm.h | 52 ++-------------------------------
>> >>  1 file changed, 2 insertions(+), 50 deletions(-)
>> >> 
>> >> diff --git a/tools/testing/selftests/rseq/rseq-arm.h
>> >> b/tools/testing/selftests/rseq/rseq-arm.h
>> >> index 84f28f147fb6..5f262c54364f 100644
>> >> --- a/tools/testing/selftests/rseq/rseq-arm.h
>> >> +++ b/tools/testing/selftests/rseq/rseq-arm.h
>> >> @@ -5,54 +5,7 @@
>> >>   * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> >>   */
>> >>  
>> >> -/*
>> >> - * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
>> >> - * value 0x5de3. This traps if user-space reaches this instruction by mistake,
>> >> - * and the uncommon operand ensures the kernel does not move the instruction
>> >> - * pointer to attacker-controlled code on rseq abort.
>> >> - *
>> >> - * The instruction pattern in the A32 instruction set is:
>> >> - *
>> >> - * e7f5def3    udf    #24035    ; 0x5de3
>> >> - *
>> >> - * This translates to the following instruction pattern in the T16 instruction
>> >> - * set:
>> >> - *
>> >> - * little endian:
>> >> - * def3        udf    #243      ; 0xf3
>> >> - * e7f5        b.n    <7f5>
>> >> - *
>> >> - * pre-ARMv6 big endian code:
>> >> - * e7f5        b.n    <7f5>
>> >> - * def3        udf    #243      ; 0xf3
>> >> - *
>> >> - * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
>> >> - * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
>> >> - * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
>> >> - * (which match), so there is no need to reverse the endianness of the data
>> >> - * representation of the signature. However, the choice between BE32 and BE8
>> >> - * is done by the linker, so we cannot know whether code and data endianness
>> >> - * will be mixed before the linker is invoked.
>> >> - */
>> >> -
>> >> -#define RSEQ_SIG_CODE	0xe7f5def3
>> >> -
>> >> -#ifndef __ASSEMBLER__
>> >> -
>> >> -#define RSEQ_SIG_DATA							\
>> >> -	({								\
>> >> -		int sig;						\
>> >> -		asm volatile ("b 2f\n\t"				\
>> >> -			      "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
>> >> -			      "2:\n\t"					\
>> >> -			      "ldr %[sig], 1b\n\t"			\
>> >> -			      : [sig] "=r" (sig));			\
>> >> -		sig;							\
>> >> -	})
>> >> -
>> >> -#define RSEQ_SIG	RSEQ_SIG_DATA
>> >> -
>> >> -#endif
>> >> +#define RSEQ_SIG	0x53053053
>> > 
>> > I don't get why you're reverting back to this old signature value, when the
>> > one we came up with will work well when interpreted as an instruction in the
>> > *vast* majority of scenarios that people care about (A32/T32 little-endian).
>> > I think you might be under-estimating just how dead things like BE32 really
>> > are.
>> 
>> My issue is that the current .instr approach is broken for programs or functions
>> built in Thumb mode, and I received no feedback on the solutions I proposed for
>> those issues, which led me to propose a patch reverting to a simple .word.
> 
> I understand why you're moving from .inst to .word, but I don't understand
> why that necessitates a change in the value. Why not .word 0xe7f5def3 ? You
> could also flip the bytes around in case of big-endian, which would keep the
> instruction coding clean for BE8.

As long as we state and document that this should not be expected to generate
valid instructions on big endian prior to ARMv6, I'm OK with that approach, e.g.:

/*
 * - ARM little endian
 *
 * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
 * value 0x5de3. This traps if user-space reaches this instruction by mistake,
 * and the uncommon operand ensures the kernel does not move the instruction
 * pointer to attacker-controlled code on rseq abort.
 *
 * The instruction pattern in the A32 instruction set is:
 *
 * e7f5def3    udf    #24035    ; 0x5de3
 *
 * This translates to the following instruction pattern in the T16 instruction
 * set:
 *
 * little endian:
 * def3        udf    #243      ; 0xf3
 * e7f5        b.n    <7f5>
 *
 * - ARMv6+ big endian:
 *
 * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
 * code and big-endian data. The data value of the signature needs to have its
 * byte order reversed to generate the trap instruction:
 *
 * Data: 0xf3def5e7
 *
 * Translates to this A32 instruction pattern:
 *
 * e7f5def3    udf    #24035    ; 0x5de3
 *
 * Translates to this T16 instruction pattern:
 *
 * def3        udf    #243      ; 0xf3
 * e7f5        b.n    <7f5>
 *
 * - Prior to ARMv6 big endian:
 *
 * Prior to ARMv6, -mbig-endian generates big-endian code and data (which match),
 * so the endianness of the data representation of the signature should not be
 * reversed. However, the choice between BE32 and BE8 is done by the linker,
 * so we cannot know whether code and data endianness will be mixed before the
 * linker is invoked. So rather than try to play tricks with the linker, the rseq
 * signature is simply data (not a trap instruction) prior to ARMv6 on big endian.
 * This is why the signature is expressed as data (.word) rather than as instruction
 * (.inst) in assembler.
 */

#ifdef __ARMEB__
#define RSEQ_SIG    0xf3def5e7      /* udf    #24035    ; 0x5de3 (ARMv6+) */
#else
#define RSEQ_SIG    0xe7f5def3      /* udf    #24035    ; 0x5de3 */
#endif

> 
>> > That said, when you ran into .inst.n/.inst.w issues, did you try something
>> > along the lines of the WASM() macro we use in arch/arm/, which adds the ".w"
>> > suffix when targetting Thumb?
>> 
>> AFAIU, the WASM macros depend on CONFIG_THUMB2_KERNEL, which may be fine within
>> the kernel, but for user-space things are a bit more complex.
>> 
>> A compile-unit can be compiled as thumb, which will set a compiler define
>> which we could use to detect thumb mode. However, unfortunately, a single
>> function can also be compiled with an attribute selecting thumb mode, which
>> AFAIU does not influence the preprocessor defines.
> 
> Thanks, I hadn't considered that case. I don't know the right way to handle
> that in the toolchain, so using .word is probably the best bet in the
> absence of any better suggestions from the tools folks.

Emitting a no-op within an excluded section, and using the size of that no-op
to restore the original mode is the best way I found, but I find it rather
tricky and bug-prone, so I would rather prefer the .word approach.

Thoughts ?

Thanks,

Mathieu
Will Deacon June 26, 2019, 6 p.m. UTC | #5
On Tue, Jun 25, 2019 at 10:08:52AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 25, 2019, at 5:15 AM, Will Deacon will.deacon@arm.com wrote:
> > On Mon, Jun 24, 2019 at 02:20:26PM -0400, Mathieu Desnoyers wrote:
> >> ----- On Jun 24, 2019, at 1:24 PM, Will Deacon will.deacon@arm.com wrote:
> >> > On Mon, Jun 17, 2019 at 05:23:04PM +0200, Mathieu Desnoyers wrote:
> >> >> -#define RSEQ_SIG_CODE	0xe7f5def3
> >> >> -
> >> >> -#ifndef __ASSEMBLER__
> >> >> -
> >> >> -#define RSEQ_SIG_DATA							\
> >> >> -	({								\
> >> >> -		int sig;						\
> >> >> -		asm volatile ("b 2f\n\t"				\
> >> >> -			      "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
> >> >> -			      "2:\n\t"					\
> >> >> -			      "ldr %[sig], 1b\n\t"			\
> >> >> -			      : [sig] "=r" (sig));			\
> >> >> -		sig;							\
> >> >> -	})
> >> >> -
> >> >> -#define RSEQ_SIG	RSEQ_SIG_DATA
> >> >> -
> >> >> -#endif
> >> >> +#define RSEQ_SIG	0x53053053
> >> > 
> >> > I don't get why you're reverting back to this old signature value, when the
> >> > one we came up with will work well when interpreted as an instruction in the
> >> > *vast* majority of scenarios that people care about (A32/T32 little-endian).
> >> > I think you might be under-estimating just how dead things like BE32 really
> >> > are.
> >> 
> >> My issue is that the current .instr approach is broken for programs or functions
> >> built in Thumb mode, and I received no feedback on the solutions I proposed for
> >> those issues, which led me to propose a patch reverting to a simple .word.
> > 
> > I understand why you're moving from .inst to .word, but I don't understand
> > why that necessitates a change in the value. Why not .word 0xe7f5def3 ? You
> > could also flip the bytes around in case of big-endian, which would keep the
> > instruction coding clean for BE8.
> 
> As long as we state and document that this should not be expected to generate
> valid instructions on big endian prior to ARMv6, I'm OK with that approach, e.g.:
> 
> /*
>  * - ARM little endian
>  *
>  * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
>  * value 0x5de3. This traps if user-space reaches this instruction by mistake,
>  * and the uncommon operand ensures the kernel does not move the instruction
>  * pointer to attacker-controlled code on rseq abort.
>  *
>  * The instruction pattern in the A32 instruction set is:
>  *
>  * e7f5def3    udf    #24035    ; 0x5de3
>  *
>  * This translates to the following instruction pattern in the T16 instruction
>  * set:
>  *
>  * little endian:
>  * def3        udf    #243      ; 0xf3
>  * e7f5        b.n    <7f5>
>  *
>  * - ARMv6+ big endian:

Maybe mention "(BE8)" here...

>  *
>  * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
>  * code and big-endian data. The data value of the signature needs to have its
>  * byte order reversed to generate the trap instruction:
>  *
>  * Data: 0xf3def5e7
>  *
>  * Translates to this A32 instruction pattern:
>  *
>  * e7f5def3    udf    #24035    ; 0x5de3
>  *
>  * Translates to this T16 instruction pattern:
>  *
>  * def3        udf    #243      ; 0xf3
>  * e7f5        b.n    <7f5>
>  *
>  * - Prior to ARMv6 big endian:

... and "(BE32)" here.

With that, this looks fine to me.

Will
diff mbox series

Patch

diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
index 84f28f147fb6..5f262c54364f 100644
--- a/tools/testing/selftests/rseq/rseq-arm.h
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -5,54 +5,7 @@ 
  * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
  */
 
-/*
- * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
- * value 0x5de3. This traps if user-space reaches this instruction by mistake,
- * and the uncommon operand ensures the kernel does not move the instruction
- * pointer to attacker-controlled code on rseq abort.
- *
- * The instruction pattern in the A32 instruction set is:
- *
- * e7f5def3    udf    #24035    ; 0x5de3
- *
- * This translates to the following instruction pattern in the T16 instruction
- * set:
- *
- * little endian:
- * def3        udf    #243      ; 0xf3
- * e7f5        b.n    <7f5>
- *
- * pre-ARMv6 big endian code:
- * e7f5        b.n    <7f5>
- * def3        udf    #243      ; 0xf3
- *
- * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
- * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
- * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
- * (which match), so there is no need to reverse the endianness of the data
- * representation of the signature. However, the choice between BE32 and BE8
- * is done by the linker, so we cannot know whether code and data endianness
- * will be mixed before the linker is invoked.
- */
-
-#define RSEQ_SIG_CODE	0xe7f5def3
-
-#ifndef __ASSEMBLER__
-
-#define RSEQ_SIG_DATA							\
-	({								\
-		int sig;						\
-		asm volatile ("b 2f\n\t"				\
-			      "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
-			      "2:\n\t"					\
-			      "ldr %[sig], 1b\n\t"			\
-			      : [sig] "=r" (sig));			\
-		sig;							\
-	})
-
-#define RSEQ_SIG	RSEQ_SIG_DATA
-
-#endif
+#define RSEQ_SIG	0x53053053
 
 #define rseq_smp_mb()	__asm__ __volatile__ ("dmb" ::: "memory", "cc")
 #define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" ::: "memory", "cc")
@@ -125,8 +78,7 @@  do {									\
 		__rseq_str(table_label) ":\n\t"				\
 		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
 		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
-		".arm\n\t"						\
-		".inst " __rseq_str(RSEQ_SIG_CODE) "\n\t"		\
+		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
 		__rseq_str(label) ":\n\t"				\
 		teardown						\
 		"b %l[" __rseq_str(abort_label) "]\n\t"