[4/6] arm64: uaccess: Use asm/ccset.h macros in __range_ok
diff mbox series

Message ID 20200311180416.6509-5-richard.henderson@linaro.org
State New
Headers show
Series
  • arm64: gcc asm flag outputs
Related show

Commit Message

Richard Henderson March 11, 2020, 6:04 p.m. UTC
Uses of __range_ok almost always feed a branch.
This allows the compiler to use flags directly.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 arch/arm64/include/asm/uaccess.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Robin Murphy March 12, 2020, 11:48 a.m. UTC | #1
On 2020-03-11 6:04 pm, Richard Henderson wrote:
> Uses of __range_ok almost always feed a branch.
> This allows the compiler to use flags directly.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   arch/arm64/include/asm/uaccess.h | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index fe3dd70e901e..ca1acd7b95c3 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -22,6 +22,7 @@
>   #include <asm/ptrace.h>
>   #include <asm/memory.h>
>   #include <asm/extable.h>
> +#include <asm/ccset.h>
>   
>   #define get_fs()	(current_thread_info()->addr_limit)
>   
> @@ -86,10 +87,10 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
>   	//    comes from the carry in being clear. Otherwise, we are
>   	//    testing X' - C == 0, subject to the previous adjustments.
>   	"	sbcs	xzr, %[addr], %[limit]\n"
> -	"       cset    %[addr], ls\n"
> -	: [addr] "=&r" (ret), [limit] "+r" (limit)
> +		CC_SET(ls)
> +	: [addr] "=&r" (addr), [limit] "+r" (limit), CC_OUT(ls) (ret)

...and we've immediately undone any benefit of the previous patch by 
effectively recoupling %0 with addr again :/

I don't entirely follow why, not least because this CC_SET/CC_OUT 
business is virtually unreadable. At the very least the macro should at 
least take an operand identifier as an explicit argument rather than 
secretly generating one, so that we're not all scratching our heads 
wondering how it can possibly work at all. Furthermore, if the 
associated C variable is a 32-bit or smaller type, then won't it provoke 
warnings from Clang due to the operand lacking the "w" modifier?

Robin.

>   	: [size] "Ir" (size), [addr_in] "r" (addr)
> -	: "cc");
> +	: CC_CLOBBER);
>   
>   	return ret;
>   }
>
Mark Rutland March 13, 2020, 11:04 a.m. UTC | #2
On Wed, Mar 11, 2020 at 11:04:14AM -0700, Richard Henderson wrote:
> Uses of __range_ok almost always feed a branch.
> This allows the compiler to use flags directly.

If we want to give hte compiler the most freedom, the best thing would
be to write this in C. IIUC this code is written in assembly largely for
historical reasons, and the comment above says:

| This is equivalent to the following test:
| (u65)addr + (u65)size <= (u65)current->addr_limit + 1

... which e.g. we could write as:

	(__uint128_t)addr + (__uint128_t)size <= (__uint128_t)limit + 1;

... which would be much clearer than the assembly.

Is there any pattern like that for which the compiler generates
reasonable looking code, and if not, is that something that could be
improved compiler side?

Thanks,
Mark.

> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  arch/arm64/include/asm/uaccess.h | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index fe3dd70e901e..ca1acd7b95c3 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -22,6 +22,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/memory.h>
>  #include <asm/extable.h>
> +#include <asm/ccset.h>
>  
>  #define get_fs()	(current_thread_info()->addr_limit)
>  
> @@ -86,10 +87,10 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
>  	//    comes from the carry in being clear. Otherwise, we are
>  	//    testing X' - C == 0, subject to the previous adjustments.
>  	"	sbcs	xzr, %[addr], %[limit]\n"
> -	"       cset    %[addr], ls\n"
> -	: [addr] "=&r" (ret), [limit] "+r" (limit)
> +		CC_SET(ls)
> +	: [addr] "=&r" (addr), [limit] "+r" (limit), CC_OUT(ls) (ret)
>  	: [size] "Ir" (size), [addr_in] "r" (addr)
> -	: "cc");
> +	: CC_CLOBBER);
>  
>  	return ret;
>  }
> -- 
> 2.20.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Robin Murphy March 13, 2020, 4:51 p.m. UTC | #3
On 2020-03-13 11:04 am, Mark Rutland wrote:
> On Wed, Mar 11, 2020 at 11:04:14AM -0700, Richard Henderson wrote:
>> Uses of __range_ok almost always feed a branch.
>> This allows the compiler to use flags directly.
> 
> If we want to give hte compiler the most freedom, the best thing would
> be to write this in C. IIUC this code is written in assembly largely for
> historical reasons, and the comment above says:
> 
> | This is equivalent to the following test:
> | (u65)addr + (u65)size <= (u65)current->addr_limit + 1
> 
> ... which e.g. we could write as:
> 
> 	(__uint128_t)addr + (__uint128_t)size <= (__uint128_t)limit + 1;
> 
> ... which would be much clearer than the assembly.
> 
> Is there any pattern like that for which the compiler generates
> reasonable looking code, and if not, is that something that could be
> improved compiler side?

Hmm, in fact this:

	__uint128_t tmp = (__uint128_t)(unsigned long)addr + size;
	return !tmp || tmp - 1 <= limit;

generates code that looks like utter crap in isolation[1], but once 
inlined actually leads to a modest overall reduction (-0.09%) across the 
whole of vmlinux according to bloat-o-meter (presumably most of those 
branches roll up into the overall "if(access_ok())..." control flow for 
the typical case, and I'm sure size and limit get constant-folded a lot).

IIRC at the time there were so many uncertainties flying around that 
sticking with asm to take compiler unknowns out of the picture felt 
desirable, but perhaps the time might be nigh to retire my baby after 
all... I'll investigate a bit further.

Robin.


[1]:
0000000000000000 <range_ok>:
    0:   ab010000        adds    x0, x0, x1
    4:   9a9f37e3        cset    x3, cs  // cs = hs, nlast
    8:   aa030001        orr     x1, x0, x3
    c:   b4000161        cbz     x1, 38 <range_ok+0x38>
   10:   f1000401        subs    x1, x0, #0x1
   14:   d2800020        mov     x0, #0x1                        // #1
   18:   da1f0063        sbc     x3, x3, xzr
   1c:   b4000063        cbz     x3, 28 <range_ok+0x28>
   20:   d2800000        mov     x0, #0x0                        // #0
   24:   d65f03c0        ret
   28:   eb02003f        cmp     x1, x2
   2c:   54ffffc9        b.ls    24 <range_ok+0x24>  // b.plast
   30:   d2800000        mov     x0, #0x0                        // #0
   34:   17fffffc        b       24 <range_ok+0x24>
   38:   d2800020        mov     x0, #0x1                        // #1
   3c:   d65f03c0        ret
Mark Rutland March 13, 2020, 5:14 p.m. UTC | #4
On Fri, Mar 13, 2020 at 04:51:28PM +0000, Robin Murphy wrote:
> On 2020-03-13 11:04 am, Mark Rutland wrote:
> > On Wed, Mar 11, 2020 at 11:04:14AM -0700, Richard Henderson wrote:
> > > Uses of __range_ok almost always feed a branch.
> > > This allows the compiler to use flags directly.
> > 
> > If we want to give hte compiler the most freedom, the best thing would
> > be to write this in C. IIUC this code is written in assembly largely for
> > historical reasons, and the comment above says:
> > 
> > | This is equivalent to the following test:
> > | (u65)addr + (u65)size <= (u65)current->addr_limit + 1
> > 
> > ... which e.g. we could write as:
> > 
> > 	(__uint128_t)addr + (__uint128_t)size <= (__uint128_t)limit + 1;
> > 
> > ... which would be much clearer than the assembly.
> > 
> > Is there any pattern like that for which the compiler generates
> > reasonable looking code, and if not, is that something that could be
> > improved compiler side?
> 
> Hmm, in fact this:
> 
> 	__uint128_t tmp = (__uint128_t)(unsigned long)addr + size;
> 	return !tmp || tmp - 1 <= limit;
> 
> generates code that looks like utter crap in isolation[1], but once inlined
> actually leads to a modest overall reduction (-0.09%) across the whole of
> vmlinux according to bloat-o-meter (presumably most of those branches roll
> up into the overall "if(access_ok())..." control flow for the typical case,
> and I'm sure size and limit get constant-folded a lot).

Neat.

IIUC for a non-zero size the !tmp check can be elided, and for a
constant size the subtraction can be folded in at compile time, so for a
{get,put}_user(), the compiler only needs to do the addition and a check
against limit.

> IIRC at the time there were so many uncertainties flying around that
> sticking with asm to take compiler unknowns out of the picture felt
> desirable, but perhaps the time might be nigh to retire my baby after all...

I guess we might have thought we'd need to pass the result into some
masking logic, but I think uaccess_mask_ptr() turned out to be good
enough in practice.

> I'll investigate a bit further.

Great!

Thanks,
Mark.

Patch
diff mbox series

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index fe3dd70e901e..ca1acd7b95c3 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -22,6 +22,7 @@ 
 #include <asm/ptrace.h>
 #include <asm/memory.h>
 #include <asm/extable.h>
+#include <asm/ccset.h>
 
 #define get_fs()	(current_thread_info()->addr_limit)
 
@@ -86,10 +87,10 @@  static inline unsigned long __range_ok(const void __user *addr, unsigned long si
 	//    comes from the carry in being clear. Otherwise, we are
 	//    testing X' - C == 0, subject to the previous adjustments.
 	"	sbcs	xzr, %[addr], %[limit]\n"
-	"       cset    %[addr], ls\n"
-	: [addr] "=&r" (ret), [limit] "+r" (limit)
+		CC_SET(ls)
+	: [addr] "=&r" (addr), [limit] "+r" (limit), CC_OUT(ls) (ret)
 	: [size] "Ir" (size), [addr_in] "r" (addr)
-	: "cc");
+	: CC_CLOBBER);
 
 	return ret;
 }