diff mbox series

[6/7] xen: Swap find_first_set_bit() for ffsl() - 1

Message ID 20240313172716.2325427-7-andrew.cooper3@citrix.com (mailing list archive)
State New
Series xen/bitops: Reduce the mess, starting with ffs()

Commit Message

Andrew Cooper March 13, 2024, 5:27 p.m. UTC
find_first_set_bit() is a Xen-ism which has undefined behaviour with a 0
input.  ffsl(), by contrast, is well defined with an input of 0, and is
found outside of Xen too.

_init_heap_pages() is the one special case here, comparing the LSBs of two
different addresses.  The -1 cancels out on both sides of the comparison.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/arch/x86/guest/xen/xen.c                 | 4 ++--
 xen/arch/x86/hvm/dom0_build.c                | 2 +-
 xen/arch/x86/hvm/hpet.c                      | 8 ++++----
 xen/arch/x86/include/asm/pt-contig-markers.h | 2 +-
 xen/arch/x86/mm.c                            | 2 +-
 xen/arch/x86/mm/p2m-pod.c                    | 4 ++--
 xen/common/page_alloc.c                      | 2 +-
 xen/common/softirq.c                         | 2 +-
 xen/drivers/passthrough/amd/iommu_map.c      | 2 +-
 xen/drivers/passthrough/iommu.c              | 4 ++--
 xen/drivers/passthrough/x86/iommu.c          | 4 ++--
 11 files changed, 18 insertions(+), 18 deletions(-)

Comments

Jan Beulich March 14, 2024, 2:30 p.m. UTC | #1
On 13.03.2024 18:27, Andrew Cooper wrote:
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>      if ( contig_mask )
>      {
>          /* See pt-contig-markers.h for a description of the marker scheme. */
> -        unsigned int i, shift = find_first_set_bit(contig_mask);
> +        unsigned int i, shift = ffsl(contig_mask) - 1;

The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
introduced).

But what I first of all would like to have clarification on is what your
(perhaps just abstract at this point) plans are wrt ffz() / ffzl().
Potential side-by-side uses would be odd now, and would continue to be odd
if the difference in bit labeling was retained. Since we're switching to
a consolidated set of basic helpers, such an anomaly would better not
survive imo.

Jan
Oleksii Kurochko March 14, 2024, 4:48 p.m. UTC | #2
On Thu, 2024-03-14 at 15:30 +0100, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct
> > domain_iommu *hd,
> >      if ( contig_mask )
> >      {
> >          /* See pt-contig-markers.h for a description of the marker
> > scheme. */
> > -        unsigned int i, shift = find_first_set_bit(contig_mask);
> > +        unsigned int i, shift = ffsl(contig_mask) - 1;
> 
> The need for subtracting 1 is why personally I dislike ffs() / ffsl()
> (and
> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
> introduced).
> 
> But what I first of all would like to have clarification on is what
> your
> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
> Potential side-by-side uses would be odd now, and would continue to
> be odd
> if the difference in bit labeling was retained. Since we're switching
> to
> a consolidated set of basic helpers, such an anomaly would better not
> survive imo.
Right now, ffz() is defined as __ffs(~(x)), and it seems to me that
__ffs()/ffz() exist only for Linux compatibility, so as part of the RISC-V
patch series I wanted to put them into xen/linux-compat.h and just include
that header where necessary:

#define __ffs(x) (ffs(x) - 1)
#define ffz(x) __ffs(~(x))

Why should we care about ffzl()? It is not used in Xen, is it?

~ Oleksii
Jan Beulich March 14, 2024, 4:55 p.m. UTC | #3
On 14.03.2024 17:48, Oleksii wrote:
> On Thu, 2024-03-14 at 15:30 +0100, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct
>>> domain_iommu *hd,
>>>      if ( contig_mask )
>>>      {
>>>          /* See pt-contig-markers.h for a description of the marker
>>> scheme. */
>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>>
>> The need for subtracting 1 is why personally I dislike ffs() / ffsl()
>> (and
>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>> introduced).
>>
>> But what I first of all would like to have clarification on is what
>> your
>> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
>> Potential side-by-side uses would be odd now, and would continue to
>> be odd
>> if the difference in bit labeling was retained. Since we're switching
>> to
>> a consolidated set of basic helpers, such an anomaly would better not
>> survive imo.
> Right now, ffz() is defined as __ffs(~(x)), and it seems to me that
> __ffs()/ffz() exist only for Linux compatibility, so as part of the RISC-V
> patch series I wanted to put them into xen/linux-compat.h and just include
> that header where necessary:
> 
> #define __ffs(x) (ffs(x) - 1)
> #define ffz(x) __ffs(~(x))

Well, right now ffz() is used in just a single file. One option therefore
would be to not have it available generally, and - as you say - if need
be supply it in linux-compat.h. Another option would be to have something
along its lines generally available, if we deem it useful.

> Why should we care about ffzl()? It is not used in Xen, is it?

I find it odd to have ffs() and ffsl(), but then ffz() without ffzl().
That's not my understanding of a generally useful (and largely free of
surprises) set of library routines.

Jan
Andrew Cooper March 14, 2024, 6:47 p.m. UTC | #4
On 14/03/2024 2:30 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>      if ( contig_mask )
>>      {
>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
> introduced).

It's sad that there are competing APIs with different bit-labelling, but
the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
ARM that I studied in detail).

I firmly believe that fewer APIs which are fully well defined (and can
optimise based on the compiler's idea of safety) is still better than a
maze of APIs with different behaviours.

> But what I first of all would like to have clarification on is what your
> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
> Potential side-by-side uses would be odd now, and would continue to be odd
> if the difference in bit labeling was retained. Since we're switching to
> a consolidated set of basic helpers, such an anomaly would better not
> survive imo.

I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
the existing mess so RISC-V wasn't making it yet-worse.

But yes - it warrants thinking about.


I was intending to do fls() next, then popcnt().   The latter has
quite a lot of cleanup wanting to come with it, and is more
architecturally invasive, and I know I've got a years-old outstanding
piece of work to try and do popcnt more nicely on x86.

I have wanted ffz() in the past.  I think I just went with explicit ~
because I didn't want to continue this debate at the time.

However, I (very much more) do not want a situation where ffs() and
ffz() have different bit-labellings.


There are no builtins, and having now studied the architectures we care
about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
leading/trailing zeros" instruction.

So using ffs(~val) really will get you the best code generation
available, and seeing as it halves the number of bitops to maintain, I
think this is the best tradeoff overall.

I intend to put ffz() and __ffs() into linux-compat.h and leave them
there to discourage their use generally.

~Andrew
Andrew Cooper March 14, 2024, 6:51 p.m. UTC | #5
On 14/03/2024 6:47 pm, Andrew Cooper wrote:
> On 14/03/2024 2:30 pm, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>>      if ( contig_mask )
>>>      {
>>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>> introduced).
> It's sad that there are competing APIs with different bit-labelling, but
> the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
> ARM that I studied in detail).
>
> I firmly believe that fewer APIs which are fully well defined (and can
> optimise based on the compiler's idea of safety) is still better than a
> maze of APIs with different behaviours.
>
>> But what I first of all would like to have clarification on is what your
>> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
>> Potential side-by-side uses would be odd now, and would continue to be odd
>> if the difference in bit labeling was retained. Since we're switching to
>> a consolidated set of basic helpers, such an anomaly would better not
>> survive imo.
> I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
> the existing mess so RISC-V wasn't making it yet-worse.
>
> But yes - it warrants thinking about.
>
>
> I was intending to do the fls() next then popcnt().   The latter has
> quite a lot of cleanup wanting to come with it, and is more
> architecturally invasive, and I know I've got a years-old outstanding
> piece of work to try and do popcnt more nicely on x86.
>
> I have wanted ffz() in the past.  I think I just went with explicit ~
> because I didn't want to continue this debate at the time.
>
> However, I (very much more) do not want a situation where ffs() and
> ffz() have different bit-labellings.
>
>
> There are no builtins, and having now studied the architectures we care
> about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
> leading/trailing zeros" instruction.

Hopefully obviously, I meant ones here.   My point is that the
compiler-emitted code always has a NOT in it somewhere.

>
> So using ffs(~val) really will get you the best code generation
> available, and seeing as it halves the number of bitops to maintain, I
> think this is the best tradeoff overall.
>
> I intend to put ffz() and __ffs() into linux-compat.h and leave them
> there to discourage their use generally.
>
> ~Andrew
Jan Beulich March 18, 2024, 9:13 a.m. UTC | #6
On 14.03.2024 19:51, Andrew Cooper wrote:
> On 14/03/2024 6:47 pm, Andrew Cooper wrote:
>> On 14/03/2024 2:30 pm, Jan Beulich wrote:
>>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>>>      if ( contig_mask )
>>>>      {
>>>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>>> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
>>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>>> introduced).
>> It's sad that there are competing APIs with different bit-labelling, but
>> the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
>> ARM that I studied in detail).
>>
>> I firmly believe that fewer APIs which are fully well defined (and can
>> optimise based on the compiler's idea of safety) is still better than a
>> maze of APIs with different behaviours.

I agree here. The anomaly (as I would call it) with ffs(), though, is what
makes me wonder whether we might not be better off introducing ctz() and
clz() instead. Unlike ffs() their name says exactly what is meant. This is
then also a clear hint, for Arm and RISC-V at least, what underlying
instruction is used. Plus there are matching builtins (unlike for e.g.
fls()).

>>> But what I first of all would like to have clarification on is what your
>>> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
>>> Potential side-by-side uses would be odd now, and would continue to be odd
>>> if the difference in bit labeling was retained. Since we're switching to
>>> a consolidated set of basic helpers, such an anomaly would better not
>>> survive imo.
>> I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
>> the existing mess so RISC-V wasn't making it yet-worse.
>>
>> But yes - it warrants thinking about.
>>
>>
>> I was intending to do the fls() next then popcnt().   The latter has
>> quite a lot of cleanup wanting to come with it, and is more
>> architecturally invasive, and I know I've got a years-old outstanding
>> piece of work to try and do popcnt more nicely on x86.
>>
>> I have wanted ffz() in the past.  I think I just went with explicit ~
>> because I didn't want to continue this debate at the time.
>>
>> However, I (very much more) do not want a situation where ffs() and
>> ffz() have different bit-labellings.
>>
>>
>> There are no builtins, and having now studied the architectures we care
>> about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
>> leading/trailing zeros" instruction.
> 
> Hopefully obviously, I meant ones here.   My point is that the compiler
> emitted code always has a NOT in it somewhere.

Right; I was about to ask but then remembered there was another mail from
you on this thread.

>> So using ffs(~val) really will get you the best code generation
>> available, and seeing as it halves the number of bitops to maintain, I
>> think this is the best tradeoff overall.
>>
>> I intend to put ffz() and __ffs() into linux-compat.h and leave them
>> there to discourage their use generally.

I'm okay with this plan. As per above I'd prefer if ffs() moved there, too.

Jan
Andrew Cooper March 18, 2024, 12:27 p.m. UTC | #7
On 18/03/2024 9:13 am, Jan Beulich wrote:
> On 14.03.2024 19:51, Andrew Cooper wrote:
>> On 14/03/2024 6:47 pm, Andrew Cooper wrote:
>>> On 14/03/2024 2:30 pm, Jan Beulich wrote:
>>>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>>>>      if ( contig_mask )
>>>>>      {
>>>>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>>>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>>>> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
>>>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>>>> introduced).
>>> It's sad that there are competing APIs with different bit-labelling, but
>>> the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
>>> ARM that I studied in detail).
>>>
>>> I firmly believe that fewer APIs which are fully well defined (and can
>>> optimise based on the compiler's idea of safety) is still better than a
>>> maze of APIs with different behaviours.
> I agree here. The anomaly (as I would call it) with ffs(), though, is what
> makes me wonder whether we might not be better off introducing ctz() and
> clz() instead. Unlike ffs() their name says exactly what is meant. This is
> then also a clear hint, for Arm and RISC-V at least, what underlying
> instruction is used. Plus there are matching builtins (unlike for e.g.
> fls()).

I considered this, but I think it will be a bad idea.

Right now, almost all of our logic is expressed in terms of
ffs()/fls().  Rearranging this to clz/ctz is risky enough on its own,
let alone the potential for mistakes during backport.

Both ffs() and fls() are well defined for all inputs, and I've found a
way to let the optimiser deal with simplifying things when safe to do so.

Therefore, keeping ffs()/fls() is the right thing to do.  It's harder to
shoot yourself in the foot with, and the optimiser can still do a good job
in the general case.

~Andrew

Patch

diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index d9768cc9527d..7484b3f73ad3 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -168,14 +168,14 @@  static void cf_check xen_evtchn_upcall(void)
 
     while ( pending )
     {
-        unsigned int l1 = find_first_set_bit(pending);
+        unsigned int l1 = ffsl(pending) - 1;
         unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
 
         __clear_bit(l1, &pending);
         evtchn &= ~XEN_shared_info->evtchn_mask[l1];
         while ( evtchn )
         {
-            unsigned int port = find_first_set_bit(evtchn);
+            unsigned int port = ffsl(evtchn) - 1;
 
             __clear_bit(port, &evtchn);
             port += l1 * BITS_PER_LONG;
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index bbae8a564522..7bc092675628 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -139,7 +139,7 @@  static int __init pvh_populate_memory_range(struct domain *d,
         order = get_order_from_pages(end - start + 1);
         order = min(order ? order - 1 : 0, max_order);
         /* The order allocated and populated must be aligned to the address. */
-        order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);
+        order = min(order, start ? ffsl(start) - 1 : MAX_ORDER);
         page = alloc_domheap_pages(d, order, dom0_memflags | MEMF_no_scrub);
         if ( page == NULL )
         {
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 1db9c0b60ee0..30ec14b24110 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -336,7 +336,7 @@  static void timer_sanitize_int_route(HPETState *h, unsigned int tn)
      * enabled pick the first irq.
      */
     timer_config(h, tn) |=
-        MASK_INSR(find_first_set_bit(timer_int_route_cap(h, tn)),
+        MASK_INSR(ffsl(timer_int_route_cap(h, tn)) - 1,
                   HPET_TN_ROUTE);
 }
 
@@ -410,7 +410,7 @@  static int cf_check hpet_write(
         {
             bool active;
 
-            i = find_first_set_bit(new_val);
+            i = ffsl(new_val) - 1;
             if ( i >= HPET_TIMER_NUM )
                 break;
             __clear_bit(i, &new_val);
@@ -536,14 +536,14 @@  static int cf_check hpet_write(
     /* stop/start timers whos state was changed by this write. */
     while (stop_timers)
     {
-        i = find_first_set_bit(stop_timers);
+        i = ffsl(stop_timers) - 1;
         __clear_bit(i, &stop_timers);
         hpet_stop_timer(h, i, guest_time);
     }
 
     while (start_timers)
     {
-        i = find_first_set_bit(start_timers);
+        i = ffsl(start_timers) - 1;
         __clear_bit(i, &start_timers);
         hpet_set_timer(h, i, guest_time);
     }
diff --git a/xen/arch/x86/include/asm/pt-contig-markers.h b/xen/arch/x86/include/asm/pt-contig-markers.h
index b3c1fe803534..e8c8157d605f 100644
--- a/xen/arch/x86/include/asm/pt-contig-markers.h
+++ b/xen/arch/x86/include/asm/pt-contig-markers.h
@@ -60,7 +60,7 @@  static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx,
     /* Step 1: Reduce markers in lower numbered entries. */
     while ( i )
     {
-        b = find_first_set_bit(i);
+        b = ffsl(i) - 1;
         i &= ~(1U << b);
         if ( GET_MARKER(pt[i]) <= b )
             break;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 62f5b811bbe8..28e9a159b577 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3418,7 +3418,7 @@  static int vcpumask_to_pcpumask(
         {
             unsigned int cpu;
 
-            vcpu_id = find_first_set_bit(vmask);
+            vcpu_id = ffsl(vmask) - 1;
             vmask &= ~(1UL << vcpu_id);
             vcpu_id += vcpu_bias;
             if ( (vcpu_id >= d->max_vcpus) )
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 65d31e552305..e0ad934d2e30 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -684,7 +684,7 @@  unsigned long
 p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn, unsigned int order)
 {
     unsigned long left = 1UL << order, ret = 0;
-    unsigned int chunk_order = find_first_set_bit(gfn_x(gfn) | left);
+    unsigned int chunk_order = ffsl(gfn_x(gfn) | left) - 1;
 
     do {
         ret += decrease_reservation(d, gfn, chunk_order);
@@ -1384,7 +1384,7 @@  guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                       unsigned int order)
 {
     unsigned long left = 1UL << order;
-    unsigned int chunk_order = find_first_set_bit(gfn | left);
+    unsigned int chunk_order = ffsl(gfn | left) - 1;
     int rc;
 
     if ( !paging_mode_translate(d) )
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 2ec17df9b420..812eac51ea0d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1817,7 +1817,7 @@  static void _init_heap_pages(const struct page_info *pg,
     if ( unlikely(!avail[nid]) )
     {
         bool use_tail = IS_ALIGNED(s, 1UL << MAX_ORDER) &&
-                        (find_first_set_bit(e) <= find_first_set_bit(s));
+                        (ffsl(e) <= ffsl(s));
         unsigned long n;
 
         n = init_node_heap(nid, s, nr_pages, &use_tail);
diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index 321d26902d37..bee4a82009c3 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -48,7 +48,7 @@  static void __do_softirq(unsigned long ignore_mask)
              || cpu_is_offline(cpu) )
             break;
 
-        i = find_first_set_bit(pending);
+        i = ffsl(pending) - 1;
         clear_bit(i, &softirq_pending(cpu));
         (*softirq_handlers[i])();
     }
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index e0f4fe736a8d..f1061bfc798c 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -137,7 +137,7 @@  static void set_iommu_ptes_present(unsigned long pt_mfn,
         ASSERT(!pde->u);
 
         if ( pde > table )
-            ASSERT(pde->ign0 == find_first_set_bit(pde - table));
+            ASSERT(pde->ign0 == ffsl(pde - table) - 1);
         else
             ASSERT(pde->ign0 == CONTIG_LEVEL_SHIFT);
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 996c31be1284..67dd8e5cd9e1 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -301,7 +301,7 @@  static unsigned int mapping_order(const struct domain_iommu *hd,
 {
     unsigned long res = dfn_x(dfn) | mfn_x(mfn);
     unsigned long sizes = hd->platform_ops->page_sizes;
-    unsigned int bit = find_first_set_bit(sizes), order = 0;
+    unsigned int bit = ffsl(sizes) - 1, order = 0;
 
     ASSERT(bit == PAGE_SHIFT);
 
@@ -309,7 +309,7 @@  static unsigned int mapping_order(const struct domain_iommu *hd,
     {
         unsigned long mask;
 
-        bit = find_first_set_bit(sizes);
+        bit = ffsl(sizes) - 1;
         mask = (1UL << bit) - 1;
         if ( nr <= mask || (res & mask) )
             break;
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index a3fa0aef7c37..d721ea27a033 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -641,7 +641,7 @@  struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
     if ( contig_mask )
     {
         /* See pt-contig-markers.h for a description of the marker scheme. */
-        unsigned int i, shift = find_first_set_bit(contig_mask);
+        unsigned int i, shift = ffsl(contig_mask) - 1;
 
         ASSERT((CONTIG_LEVEL_SHIFT & (contig_mask >> shift)) == CONTIG_LEVEL_SHIFT);
 
@@ -652,7 +652,7 @@  struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
 
         for ( i = 4; i < PAGE_SIZE / sizeof(*p); i += 4 )
         {
-            p[i + 0] = (find_first_set_bit(i) + 0ULL) << shift;
+            p[i + 0] = (ffsl(i) - 1ULL) << shift;
             p[i + 1] = 0;
             p[i + 2] = 1ULL << shift;
             p[i + 3] = 0;