[v2] arm64: Simplify __range_ok

Message ID 20200327030918.7292-1-richard.henderson@linaro.org
State New

Commit Message

Richard Henderson March 27, 2020, 3:09 a.m. UTC
The general case is not quite as compact as the inline assembly,
but with a sufficiently advanced compiler it is only 6 insns vs 5.

The real improvement comes from assuming that limit is never tiny,
and using __builtin_constant_p to make sure the constant folding
does not go awry.  This produces a 2 insn sequence even for older
compilers.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---

Thanks for the v1 review, Mark.

Text section size change:

  $ scripts/bloat-o-meter -t vmlinux{-orig,} | tail -1

  * When built with gcc-10 (master) + cmpti patch set:
    Total: Before=12820824, After=12774708, chg -0.36%

  * When built with gcc-7.5.0-3ubuntu1~18.04:
    Total: Before=13564038, After=13513258, chg -0.37%

Changes in v2:
  * Adjust initialization of iaddr.
  * Use USER_DS as the constant limit and update commentary.


r~

---
 arch/arm64/include/asm/uaccess.h | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

Comments

Mark Rutland April 3, 2020, 10:50 a.m. UTC | #1
Hi Richard,

On Thu, Mar 26, 2020 at 08:09:18PM -0700, Richard Henderson wrote:
> The general case is not quite as compact as the inline assembly,
> but with a sufficiently advanced compiler it is only 6 insns vs 5.
> 
> The real improvement comes from assuming that limit is never tiny,
> and using __builtin_constant_p to make sure the constant folding
> does not go awry.  This produces a 2 insn sequence even for older
> compilers.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Thanks for this; it looks good to me, minor comments below.

I believe the constant cases capture all realistic constant sizes, and
handle those correctly. We miss a couple of constant cases that we
could catch (and might want to for completeness), but either way I
think this looks good.

> ---
> 
> Thanks for the v1 review, Mark.
> 
> Text section size change:
> 
>   $ scripts/bloat-o-meter -t vmlinux{-orig,} | tail -1
> 
>   * When built with gcc-10 (master) + cmpti patch set:
>     Total: Before=12820824, After=12774708, chg -0.36%
> 
>   * When built with gcc-7.5.0-3ubuntu1~18.04:
>     Total: Before=13564038, After=13513258, chg -0.37%
> 
> Changes in v2:
>   * Adjust initialization of iaddr.
>   * Use USER_DS as the constant limit and update commentary.
> 
> 
> r~
> 
> ---
>  arch/arm64/include/asm/uaccess.h | 32 +++++++++++++-------------------
>  1 file changed, 13 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..04ef201e6179 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -60,7 +60,8 @@ static inline void set_fs(mm_segment_t fs)
>   */
>  static inline unsigned long __range_ok(const void __user *addr, unsigned long size)
>  {
> -	unsigned long ret, limit = current_thread_info()->addr_limit;
> +	unsigned long limit = current_thread_info()->addr_limit;
> +	unsigned long iaddr = (unsigned long)addr;
>  
>  	/*
>  	 * Asynchronous I/O running in a kernel thread does not have the
> @@ -72,24 +73,17 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
>  		addr = untagged_addr(addr);
>  
>  	__chk_user_ptr(addr);
> -	asm volatile(
> -	// A + B <= C + 1 for all A,B,C, in four easy steps:
> -	// 1: X = A + B; X' = X % 2^64
> -	"	adds	%0, %3, %2\n"
> -	// 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
> -	"	csel	%1, xzr, %1, hi\n"
> -	// 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
> -	//    to compensate for the carry flag being set in step 4. For
> -	//    X > 2^64, X' merely has to remain nonzero, which it does.
> -	"	csinv	%0, %0, xzr, cc\n"
> -	// 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
> -	//    comes from the carry in being clear. Otherwise, we are
> -	//    testing X' - C == 0, subject to the previous adjustments.
> -	"	sbcs	xzr, %0, %1\n"
> -	"	cset	%0, ls\n"
> -	: "=&r" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
> -
> -	return ret;
> +
> +	/*
> +	 * The minimum value for limit is USER_DS, and quite a lot of
> +	 * range checks use sizeof(some_type).  With both constants,
> +	 * we can rearrange the computation to avoid the need for
> +	 * 65-bit arithmetic.
> +	 */
> +	if (__builtin_constant_p(size) && size > 0 && size < USER_DS)
> +		return iaddr <= limit + 1 - size;

As above, I believe this can be:

	if (__builtin_constant_p(size) && size > 0 && size <= USER_DS + 1)
		return iaddr <= limit + 1 - size;

... but either way I think this is sound, and that's just for symmetry
of the test cases.

> +
> +	return (__uint128_t)iaddr + size <= (__uint128_t)limit + 1;

I believe this is obviously correct.

So with or without the change suggested above:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

>  }
>  
>  #define access_ok(addr, size)	__range_ok(addr, size)
> -- 
> 2.17.1
>

Patch

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 32fc8061aa76..04ef201e6179 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -60,7 +60,8 @@  static inline void set_fs(mm_segment_t fs)
  */
 static inline unsigned long __range_ok(const void __user *addr, unsigned long size)
 {
-	unsigned long ret, limit = current_thread_info()->addr_limit;
+	unsigned long limit = current_thread_info()->addr_limit;
+	unsigned long iaddr = (unsigned long)addr;
 
 	/*
 	 * Asynchronous I/O running in a kernel thread does not have the
@@ -72,24 +73,17 @@  static inline unsigned long __range_ok(const void __user *addr, unsigned long si
 		addr = untagged_addr(addr);
 
 	__chk_user_ptr(addr);
-	asm volatile(
-	// A + B <= C + 1 for all A,B,C, in four easy steps:
-	// 1: X = A + B; X' = X % 2^64
-	"	adds	%0, %3, %2\n"
-	// 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
-	"	csel	%1, xzr, %1, hi\n"
-	// 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
-	//    to compensate for the carry flag being set in step 4. For
-	//    X > 2^64, X' merely has to remain nonzero, which it does.
-	"	csinv	%0, %0, xzr, cc\n"
-	// 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
-	//    comes from the carry in being clear. Otherwise, we are
-	//    testing X' - C == 0, subject to the previous adjustments.
-	"	sbcs	xzr, %0, %1\n"
-	"	cset	%0, ls\n"
-	: "=&r" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
-
-	return ret;
+
+	/*
+	 * The minimum value for limit is USER_DS, and quite a lot of
+	 * range checks use sizeof(some_type).  With both constants,
+	 * we can rearrange the computation to avoid the need for
+	 * 65-bit arithmetic.
+	 */
+	if (__builtin_constant_p(size) && size > 0 && size < USER_DS)
+		return iaddr <= limit + 1 - size;
+
+	return (__uint128_t)iaddr + size <= (__uint128_t)limit + 1;
 }
 
 #define access_ok(addr, size)	__range_ok(addr, size)