Message ID | 20211128035704.270739-3-yury.norov@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | lib/bitmap: optimize bitmap_weight() usage | expand |
On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote: > Now as we have bitmap_weight_eq(), switch bitmap_full() and > bitmap_empty() to using it. [...] > -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) > -{ > - if (small_const_nbits(nbits)) > - return ! (*src & BITMAP_LAST_WORD_MASK(nbits)); > - > - return find_first_bit(src, nbits) == nbits; > -} [...] > +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits) > +{ > + return bitmap_weight_eq(src, nbits, 0); > +} [..] What's the speed difference? Have you benchmarked this? Best Regards Michał Mirosław
On Sun, Nov 28, 2021 at 05:37:19AM +0100, Michał Mirosław wrote: > On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote: > > Now as we have bitmap_weight_eq(), switch bitmap_full() and > > bitmap_empty() to using it. > [...] > > -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) > > -{ > > - if (small_const_nbits(nbits)) > > - return ! (*src & BITMAP_LAST_WORD_MASK(nbits)); > > - > > - return find_first_bit(src, nbits) == nbits; > > -} > [...] > > +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits) > > +{ > > + return bitmap_weight_eq(src, nbits, 0); > > +} > [..] > > What's the speed difference? Have you benchmarked this? bitmap_weight_eq() should be faster than find_first_bit(), but the difference is few cycles, so I didn't bother measuring it. New version looks just better.
On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote: > Now as we have bitmap_weight_eq(), switch bitmap_full() and > bitmap_empty() to using it. > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > --- > include/linux/bitmap.h | 26 ++++++++++---------------- > 1 file changed, 10 insertions(+), 16 deletions(-) > > diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h > index 996041f771c8..2d951e4dc814 100644 > --- a/include/linux/bitmap.h > +++ b/include/linux/bitmap.h > @@ -386,22 +386,6 @@ static inline int bitmap_subset(const unsigned long *src1, > return __bitmap_subset(src1, src2, nbits); > } > > -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) > -{ > - if (small_const_nbits(nbits)) > - return ! (*src & BITMAP_LAST_WORD_MASK(nbits)); > - > - return find_first_bit(src, nbits) == nbits; > -} Since this is supposed to be an optimization, I would go all the way and replace this with the trivial implementation instead: bool bitmap_empty(long *bits, size_t nbits) { for (; nbits >= BITS_PER_LONG; ++bits, nbits -= BITS_PER_LONG) if (*bits) return false; if (nbits && *bits & BITMAP_LAST_WORD_MASK(nbits)) return false; return true; } Best Regards Michał Mirosław
On Sun, Nov 28, 2021 at 10:10 AM Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote: > > On Sat, Nov 27, 2021 at 07:56:57PM -0800, Yury Norov wrote: > > Now as we have bitmap_weight_eq(), switch bitmap_full() and > > bitmap_empty() to using it. > > > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > > --- > > include/linux/bitmap.h | 26 ++++++++++---------------- > > 1 file changed, 10 insertions(+), 16 deletions(-) > > > > diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h > > index 996041f771c8..2d951e4dc814 100644 > > --- a/include/linux/bitmap.h > > +++ b/include/linux/bitmap.h > > @@ -386,22 +386,6 @@ static inline int bitmap_subset(const unsigned long *src1, > > return __bitmap_subset(src1, src2, nbits); > > } > > > > -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) > > -{ > > - if (small_const_nbits(nbits)) > > - return ! (*src & BITMAP_LAST_WORD_MASK(nbits)); > > - > > - return find_first_bit(src, nbits) == nbits; > > -} > > Since this is supposed to be an optimization, I would go all the way and > replace this with the trivial implementation instead: > > bool bitmap_empty(long *bits, size_t nbits) > { > for (; nbits >= BITS_PER_LONG; ++bits, nbits -= BITS_PER_LONG) > if (*bits) > return false; > > if (nbits && *bits & BITMAP_LAST_WORD_MASK(nbits)) > return false; > > return true; > } This is what current implementations basically do, based on find_first_bit(). I think that for long bitmaps the most time consuming operation is moving data to L1, and for short bitmaps the difference between approaches is barely measurable. But hweght_long on each iteration can't be more effective than the current version. So, I'll drop this patch for v2 and keep things unchanged.
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 996041f771c8..2d951e4dc814 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -386,22 +386,6 @@ static inline int bitmap_subset(const unsigned long *src1, return __bitmap_subset(src1, src2, nbits); } -static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) -{ - if (small_const_nbits(nbits)) - return ! (*src & BITMAP_LAST_WORD_MASK(nbits)); - - return find_first_bit(src, nbits) == nbits; -} - -static inline bool bitmap_full(const unsigned long *src, unsigned int nbits) -{ - if (small_const_nbits(nbits)) - return ! (~(*src) & BITMAP_LAST_WORD_MASK(nbits)); - - return find_first_zero_bit(src, nbits) == nbits; -} - static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits) { if (small_const_nbits(nbits)) @@ -436,6 +420,16 @@ static __always_inline bool bitmap_weight_le(const unsigned long *src, return __bitmap_weight_le(src, nbits, num); } +static __always_inline bool bitmap_empty(const unsigned long *src, unsigned int nbits) +{ + return bitmap_weight_eq(src, nbits, 0); +} + +static __always_inline bool bitmap_full(const unsigned long *src, unsigned int nbits) +{ + return bitmap_weight_eq(src, nbits, nbits); +} + static __always_inline void bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits) {
Now as we have bitmap_weight_eq(), switch bitmap_full() and bitmap_empty() to using it. Signed-off-by: Yury Norov <yury.norov@gmail.com> --- include/linux/bitmap.h | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-)