[1/4] ARM/div: Drop __div64_fls()

Message ID	20240902100355.3032079-2-andrew.cooper3@citrix.com (mailing list archive)
State	New
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> From: Andrew Cooper <andrew.cooper3@citrix.com> To: Xen-devel <xen-devel@lists.xenproject.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Stefano Stabellini <sstabellini@kernel.org>, Julien Grall <julien@xen.org>, Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>, Bertrand Marquis <bertrand.marquis@arm.com>, Michal Orzel <michal.orzel@amd.com> Subject: [PATCH 1/4] ARM/div: Drop __div64_fls() Date: Mon, 2 Sep 2024 11:03:52 +0100 Message-Id: <20240902100355.3032079-2-andrew.cooper3@citrix.com> In-Reply-To: <20240902100355.3032079-1-andrew.cooper3@citrix.com> References: <20240902100355.3032079-1-andrew.cooper3@citrix.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	ARM: Cleanup following bitops improvements \| expand [0/4] ARM: Cleanup following bitops improvements [1/4] ARM/div: Drop __div64_fls() [2/4] ARM/vgic: Correct the expression for lr_all_full() [3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr() [4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio*

Message ID

20240902100355.3032079-2-andrew.cooper3@citrix.com (mailing list archive)

State

New

Headers

Errors-To: xen-devel-bounces@lists.xenproject.org
Precedence: list
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel <xen-devel@lists.xenproject.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Julien Grall <julien@xen.org>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Bertrand Marquis <bertrand.marquis@arm.com>,
	Michal Orzel <michal.orzel@amd.com>
Subject: [PATCH 1/4] ARM/div: Drop __div64_fls()
Date: Mon,  2 Sep 2024 11:03:52 +0100
Message-Id: <20240902100355.3032079-2-andrew.cooper3@citrix.com>
In-Reply-To: <20240902100355.3032079-1-andrew.cooper3@citrix.com>
References: <20240902100355.3032079-1-andrew.cooper3@citrix.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

ARM: Cleanup following bitops improvements | expand

Commit Message

Andrew Cooper Sept. 2, 2024, 10:03 a.m. UTC

Following the improvements to Xen's bitops, fls() does constant propagation in
all cases.  Use it, and drop the local opencoded helper.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>

ARM32 gets a very minor code generation improvement:

  xen.git/xen$ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
  add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-48 (-48)
  Function                                     old     new   delta
  wallclock_time                               288     280      -8
  printk_start_of_line                         560     552      -8
  domain_vtimer_init                           472     464      -8
  do_settime                                   376     368      -8
  burn_credits                                 760     752      -8
  __printk_ratelimit                           424     416      -8

But it's just a couple of operations improvement and no real change in code
structure.  I expect that the constant propagation being done through
__builtin_clz(), rather than pure C, is giving the optimiser a bit more
information to work with.

This file also has an __GNUC__ < 4 case which seems ripe for removing...
---
 xen/arch/arm/include/asm/div64.h | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

Comments

Michal Orzel Sept. 3, 2024, 8:37 a.m. UTC | #1

On 02/09/2024 12:03, Andrew Cooper wrote:
> 
> 
> Following the improvements to Xen's bitops, fls() does constant propagation in
> all cases.  Use it, and drop the local opencoded helper.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> CC: Bertrand Marquis <bertrand.marquis@arm.com>
> CC: Michal Orzel <michal.orzel@amd.com>
> 
> ARM32 gets a very minor code generation improvement:
> 
>   xen.git/xen$ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
>   add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-48 (-48)
>   Function                                     old     new   delta
>   wallclock_time                               288     280      -8
>   printk_start_of_line                         560     552      -8
>   domain_vtimer_init                           472     464      -8
>   do_settime                                   376     368      -8
>   burn_credits                                 760     752      -8
>   __printk_ratelimit                           424     416      -8
> 
> But it's just a couple of operations improvement and no real change in code
> structure.  I expect that the constant propagation being done through
> __builtin_clz(), rather than pure C, is giving the optimiser a bit more
> information to work with.
> 
> This file also has an __GNUC__ < 4 case which seems ripe for removing...
Agree and noted.

~Michal

diff --git a/xen/arch/arm/include/asm/div64.h b/xen/arch/arm/include/asm/div64.h
index 0459d5cc0122..da1f1fcbd503 100644
--- a/xen/arch/arm/include/asm/div64.h
+++ b/xen/arch/arm/include/asm/div64.h
@@ -102,7 +102,7 @@ 
 		/* preserve low part of n for reminder computation */	\
 		__r = __n;						\
 		/* determine number of bits to represent __b */		\
-		__p = 1 << __div64_fls(__b);				\
+		__p = 1 << fls(__b);					\
 		/* compute __m = ((__p << 64) + __b - 1) / __b */	\
 		__m = (~0ULL / __b) * __p;				\
 		__m += (((~0ULL % __b + 1) * __p) + __b - 1) / __b;	\
@@ -150,8 +150,8 @@ 
 				__p /= (__m & -__m);			\
 				__m /= (__m & -__m);			\
 			} else {					\
-				__p >>= __div64_fls(__bits);		\
-				__m >>= __div64_fls(__bits);		\
+				__p >>= fls(__bits);			\
+				__m >>= fls(__bits);			\
 			}						\
 			/* No correction needed. */			\
 			__c = 0;					\
@@ -217,18 +217,6 @@ 
 	__r;								\
 })
 
-/* our own fls implementation to make sure constant propagation is fine */
-#define __div64_fls(bits)						\
-({									\
-	unsigned int __left = (bits), __nr = 0;				\
-	if (__left & 0xffff0000) __nr += 16, __left >>= 16;		\
-	if (__left & 0x0000ff00) __nr +=  8, __left >>=  8;		\
-	if (__left & 0x000000f0) __nr +=  4, __left >>=  4;		\
-	if (__left & 0x0000000c) __nr +=  2, __left >>=  2;		\
-	if (__left & 0x00000002) __nr +=  1;				\
-	__nr;								\
-})
-
 #endif /* GCC version */
 
 #endif /* BITS_PER_LONG */

[1/4] ARM/div: Drop __div64_fls()

Commit Message

Comments

Patch