diff mbox series

[7/7] xen/bitops: Delete find_first_set_bit()

Message ID 20240313172716.2325427-8-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series xen/bitops: Reduce the mess, starting with ffs() | expand

Commit Message

Andrew Cooper March 13, 2024, 5:27 p.m. UTC
No more users.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/arch/arm/include/asm/bitops.h | 12 ------------
 xen/arch/ppc/include/asm/bitops.h |  9 ---------
 xen/arch/x86/include/asm/bitops.h | 12 ------------
 3 files changed, 33 deletions(-)

Comments

Jan Beulich March 14, 2024, 3:59 p.m. UTC | #1
On 13.03.2024 18:27, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>      r__;                                                                    \
>  })
>  
> -/**
> - * find_first_set_bit - find the first set bit in @word
> - * @word: the word to search
> - * 
> - * Returns the bit-number of the first set bit. The input must *not* be zero.
> - */
> -static inline unsigned int find_first_set_bit(unsigned long word)
> -{
> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
> -    return (unsigned int)word;
> -}

And you think it's okay to no longer use TZCNT like this when available,
where the output doesn't have to have its value set up front?

Jan
Andrew Cooper March 14, 2024, 5:14 p.m. UTC | #2
On 14/03/2024 3:59 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>      r__;                                                                    \
>>  })
>>  
>> -/**
>> - * find_first_set_bit - find the first set bit in @word
>> - * @word: the word to search
>> - * 
>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>> - */
>> -static inline unsigned int find_first_set_bit(unsigned long word)
>> -{
>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>> -    return (unsigned int)word;
>> -}
> And you think it's okay to no longer use TZCNT like this when available,
> where the output doesn't have to have its value set up front?

This is a particularly evil piece of inline asm.

It is interpreted as BSF or TZCNT depending on the BMI instruction set
(Haswell/Piledriver era).  Furthermore there are errata on some Intel
systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.

Which means this piece of asm suffers from all of an undefined output
register, undefined CF behaviour, and differing ZF behaviour (I believe)
depending on which hardware you're running on.

The only thing the REP prefix is getting you is a deterministic 0 in the
destination register, on some hardware only, for code which has already
violated the input safety condition.  As a piece of defence in depth,
then perhaps it's useful.

But following up from the other thread,
https://gcc.gnu.org/pipermail/gcc/2024-March/243475.html is form where
the compiler can (and does!) simplify back to the plain BSF form when it
can prove that this is safe.


The only case where using TZCNT is helpful is when we're compiling for
x86_64-v3 and there is no need to work around BSF's undefined behaviour.

Even with x86's arch_ffs() now split nicely based on whether the
compiler knows BSF is safe or not, an alternative to swap between BSF
and TZCNT probably isn't a win; you still have to cover up 6 or 7 bytes
of the -1 setup, which you can't do with leading prefixes on the TZCNT
itself.

All CPUs with BMI can swallow the double-instruction data dependency
without breaking a sweat, at which point you're trading off (at best) a
1-cycle improvement vs the setup costs of the alternative.  If there is
any real improvement to be had here, it's marginal enough that I'm not
sure it's worth doing.

~Andrew
Andrew Cooper March 15, 2024, 1:48 p.m. UTC | #3
On 14/03/2024 5:14 pm, Andrew Cooper wrote:
> On 14/03/2024 3:59 pm, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>>      r__;                                                                    \
>>>  })
>>>  
>>> -/**
>>> - * find_first_set_bit - find the first set bit in @word
>>> - * @word: the word to search
>>> - * 
>>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>>> - */
>>> -static inline unsigned int find_first_set_bit(unsigned long word)
>>> -{
>>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>>> -    return (unsigned int)word;
>>> -}
>> And you think it's okay to no longer use TZCNT like this when available,
>> where the output doesn't have to have its value set up front?
> This is a particularly evil piece of inline asm.
>
> It is interpreted as BSF or TZCNT depending on the BMI instruction set
> (Haswell/Piledriver era).  Furthermore there are errata on some Intel
> systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.
>
> Which means this piece of asm suffers from all of an undefined output
> register, undefined CF behaviour, and differing ZF behaviour (I believe)
> depending on which hardware you're running on.
>
> The only thing the REP prefix is getting you is a deterministic 0 in the
> destination register,

No, it doesn't.

For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64
in this case.

It also means there's no chance of coming up with a useful alternative
for ffs() to use TZCNT when available.

~Andrew
Jan Beulich March 15, 2024, 2:16 p.m. UTC | #4
On 15.03.2024 14:48, Andrew Cooper wrote:
> On 14/03/2024 5:14 pm, Andrew Cooper wrote:
>> On 14/03/2024 3:59 pm, Jan Beulich wrote:
>>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/include/asm/bitops.h
>>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>>>      r__;                                                                    \
>>>>  })
>>>>  
>>>> -/**
>>>> - * find_first_set_bit - find the first set bit in @word
>>>> - * @word: the word to search
>>>> - * 
>>>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>>>> - */
>>>> -static inline unsigned int find_first_set_bit(unsigned long word)
>>>> -{
>>>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>>>> -    return (unsigned int)word;
>>>> -}
>>> And you think it's okay to no longer use TZCNT like this when available,
>>> where the output doesn't have to have its value set up front?
>> This is a particularly evil piece of inline asm.
>>
>> It is interpreted as BSF or TZCNT depending on the BMI instruction set
>> (Haswell/Piledriver era).  Furthermore there are errata on some Intel
>> systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.
>>
>> Which means this piece of asm suffers from all of an undefined output
>> register, undefined CF behaviour, and differing ZF behaviour (I believe)
>> depending on which hardware you're running on.
>>
>> The only thing the REP prefix is getting you is a deterministic 0 in the
>> destination register,
> 
> No, it doesn't.
> 
> For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64
> in this case.
> 
> It also means there's no chance of coming up with a useful alternative
> for ffs() to use TZCNT when available.

Right, for ffs() TZCNT isn't suitable. But for find_first_set_bit() it was,
yielding a reliably out-of-range output for zero input (which BSF wouldn't
guarantee).

Jan
diff mbox series

Patch

diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index 59ae8ed150b6..5104334e4874 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -160,18 +160,6 @@  static inline int fls(unsigned int x)
 #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
 #define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-        return ffsl(word) - 1;
-}
-
 /**
  * hweightN - returns the hamming weight of a N-bit word
  * @x: the word to weigh
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index ecec2a826660..989d341a44c7 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -206,13 +206,4 @@  static always_inline unsigned long __ffs(unsigned long word)
     return __builtin_ctzl(word);
 }
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-#define find_first_set_bit(x) (ffsl(x) - 1)
-
 #endif /* _ASM_PPC_BITOPS_H */
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 99342877e32f..2835bb6814d5 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -401,18 +401,6 @@  static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
     r__;                                                                    \
 })
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- * 
- * Returns the bit-number of the first set bit. The input must *not* be zero.
- */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
-    return (unsigned int)word;
-}
-
 static inline unsigned int arch_ffs(unsigned int x)
 {
     int r = -1;