Message ID | 20240313172716.2325427-8-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | xen/bitops: Reduce the mess, starting with ffs() | expand |
On 13.03.2024 18:27, Andrew Cooper wrote: > --- a/xen/arch/x86/include/asm/bitops.h > +++ b/xen/arch/x86/include/asm/bitops.h > @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max) > r__; \ > }) > > -/** > - * find_first_set_bit - find the first set bit in @word > - * @word: the word to search > - * > - * Returns the bit-number of the first set bit. The input must *not* be zero. > - */ > -static inline unsigned int find_first_set_bit(unsigned long word) > -{ > - asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) ); > - return (unsigned int)word; > -} And you think it's okay to no longer use TZCNT like this when available, where the output doesn't have to have its value set up front? Jan
On 14/03/2024 3:59 pm, Jan Beulich wrote: > On 13.03.2024 18:27, Andrew Cooper wrote: >> --- a/xen/arch/x86/include/asm/bitops.h >> +++ b/xen/arch/x86/include/asm/bitops.h >> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max) >> r__; \ >> }) >> >> -/** >> - * find_first_set_bit - find the first set bit in @word >> - * @word: the word to search >> - * >> - * Returns the bit-number of the first set bit. The input must *not* be zero. >> - */ >> -static inline unsigned int find_first_set_bit(unsigned long word) >> -{ >> - asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) ); >> - return (unsigned int)word; >> -} > And you think it's okay to no longer use TZCNT like this when available, > where the output doesn't have to have its value set up front? This is a particularly evil piece of inline asm. It is interpreted as BSF or TZCNT depending on the BMI instruction set (Haswell/Piledriver era). Furthermore there are errata on some Intel systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated. Which means this piece of asm suffers from all of an undefined output register, undefined CF behaviour, and differing ZF behaviour (I believe) depending on which hardware you're running on. The only thing the REP prefix is getting you is a deterministic 0 in the destination register, on some hardware only, for code which has already violated the input safety condition. As a piece of defence in depth, then perhaps it's useful. But following up from the other thread, https://gcc.gnu.org/pipermail/gcc/2024-March/243475.html is form where the compiler can (and does!) simplify back to the plain BSF form when it can prove that this is safe. The only case where using TZCNT is helpful is when we're compiling for x86_64-v3 and there is no need to work around BSF's undefined behaviour. Even with x86's arch_ffs() now split nicely based on whether the compiler knows BSF is safe or not, an alternative to swap between BSF and TZCNT probably isn't a win; you still have to cover up 6 or 7 bytes of the -1 setup, which you can't do with leading prefixes on the TZCNT itself. All CPUs with BMI can swallow the double-instruction data dependency without breaking a sweat, at which point you're trading off (at best) a 1-cycle improvement vs the setup costs of the alternative. If there is any real improvement to be had here, it's marginal enough that I'm not sure it's worth doing. ~Andrew
On 14/03/2024 5:14 pm, Andrew Cooper wrote: > On 14/03/2024 3:59 pm, Jan Beulich wrote: >> On 13.03.2024 18:27, Andrew Cooper wrote: >>> --- a/xen/arch/x86/include/asm/bitops.h >>> +++ b/xen/arch/x86/include/asm/bitops.h >>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max) >>> r__; \ >>> }) >>> >>> -/** >>> - * find_first_set_bit - find the first set bit in @word >>> - * @word: the word to search >>> - * >>> - * Returns the bit-number of the first set bit. The input must *not* be zero. >>> - */ >>> -static inline unsigned int find_first_set_bit(unsigned long word) >>> -{ >>> - asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) ); >>> - return (unsigned int)word; >>> -} >> And you think it's okay to no longer use TZCNT like this when available, >> where the output doesn't have to have its value set up front? > This is a particularly evil piece of inline asm. > > It is interpreted as BSF or TZCNT depending on the BMI instruction set > (Haswell/Piledriver era). Furthermore there are errata on some Intel > systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated. > > Which means this piece of asm suffers from all of an undefined output > register, undefined CF behaviour, and differing ZF behaviour (I believe) > depending on which hardware you're running on. > > The only thing the REP prefix is getting you is a deterministic 0 in the > destination register, No, it doesn't. For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64 in this case. It also means there's no chance of coming up with a useful alternative for ffs() to use TZCNT when available. ~Andrew
On 15.03.2024 14:48, Andrew Cooper wrote: > On 14/03/2024 5:14 pm, Andrew Cooper wrote: >> On 14/03/2024 3:59 pm, Jan Beulich wrote: >>> On 13.03.2024 18:27, Andrew Cooper wrote: >>>> --- a/xen/arch/x86/include/asm/bitops.h >>>> +++ b/xen/arch/x86/include/asm/bitops.h >>>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max) >>>> r__; \ >>>> }) >>>> >>>> -/** >>>> - * find_first_set_bit - find the first set bit in @word >>>> - * @word: the word to search >>>> - * >>>> - * Returns the bit-number of the first set bit. The input must *not* be zero. >>>> - */ >>>> -static inline unsigned int find_first_set_bit(unsigned long word) >>>> -{ >>>> - asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) ); >>>> - return (unsigned int)word; >>>> -} >>> And you think it's okay to no longer use TZCNT like this when available, >>> where the output doesn't have to have its value set up front? >> This is a particularly evil piece of inline asm. >> >> It is interpreted as BSF or TZCNT depending on the BMI instruction set >> (Haswell/Piledriver era). Furthermore there are errata on some Intel >> systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated. >> >> Which means this piece of asm suffers from all of an undefined output >> register, undefined CF behaviour, and differing ZF behaviour (I believe) >> depending on which hardware you're running on. >> >> The only thing the REP prefix is getting you is a deterministic 0 in the >> destination register, > > No, it doesn't. > > For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64 > in this case. > > It also means there's no chance of coming up with a useful alternative > for ffs() to use TZCNT when available. Right, for ffs() TZCNT isn't suitable. But for find_first_set_bit() it was, yielding a reliably out-of-range output for zero input (which BSF wouldn't guarantee). Jan
diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h index 59ae8ed150b6..5104334e4874 100644 --- a/xen/arch/arm/include/asm/bitops.h +++ b/xen/arch/arm/include/asm/bitops.h @@ -160,18 +160,6 @@ static inline int fls(unsigned int x) #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); }) #define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); }) -/** - * find_first_set_bit - find the first set bit in @word - * @word: the word to search - * - * Returns the bit-number of the first set bit (first bit being 0). - * The input must *not* be zero. - */ -static inline unsigned int find_first_set_bit(unsigned long word) -{ - return ffsl(word) - 1; -} - /** * hweightN - returns the hamming weight of a N-bit word * @x: the word to weigh diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h index ecec2a826660..989d341a44c7 100644 --- a/xen/arch/ppc/include/asm/bitops.h +++ b/xen/arch/ppc/include/asm/bitops.h @@ -206,13 +206,4 @@ static always_inline unsigned long __ffs(unsigned long word) return __builtin_ctzl(word); } -/** - * find_first_set_bit - find the first set bit in @word - * @word: the word to search - * - * Returns the bit-number of the first set bit (first bit being 0). - * The input must *not* be zero. - */ -#define find_first_set_bit(x) (ffsl(x) - 1) - #endif /* _ASM_PPC_BITOPS_H */ diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h index 99342877e32f..2835bb6814d5 100644 --- a/xen/arch/x86/include/asm/bitops.h +++ b/xen/arch/x86/include/asm/bitops.h @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max) r__; \ }) -/** - * find_first_set_bit - find the first set bit in @word - * @word: the word to search - * - * Returns the bit-number of the first set bit. The input must *not* be zero. - */ -static inline unsigned int find_first_set_bit(unsigned long word) -{ - asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) ); - return (unsigned int)word; -} - static inline unsigned int arch_ffs(unsigned int x) { int r = -1;
No more users. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: Stefano Stabellini <sstabellini@kernel.org> CC: Julien Grall <julien@xen.org> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> CC: Bertrand Marquis <bertrand.marquis@arm.com> CC: Michal Orzel <michal.orzel@amd.com> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com> CC: Shawn Anastasio <sanastasio@raptorengineering.com> CC: consulting@bugseng.com <consulting@bugseng.com> CC: Simone Ballarin <simone.ballarin@bugseng.com> CC: Federico Serafini <federico.serafini@bugseng.com> CC: Nicola Vetrini <nicola.vetrini@bugseng.com> --- xen/arch/arm/include/asm/bitops.h | 12 ------------ xen/arch/ppc/include/asm/bitops.h | 9 --------- xen/arch/x86/include/asm/bitops.h | 12 ------------ 3 files changed, 33 deletions(-)