[RESEND,v2,0/6] lib/find_bit: fast path for small bitmaps

Message ID 20210130191719.7085-1-yury.norov@gmail.com (mailing list archive)

Message

Yury Norov Jan. 30, 2021, 7:17 p.m. UTC
Bitmap operations are much simpler and faster for small bitmaps that fit
into a single word. linux/bitmap.h has machinery that allows the compiler
to replace an actual function call with a few instructions if the bitmaps
passed into the function are small and their size is known at compile
time.
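
For readers unfamiliar with that machinery, here is a minimal user-space
sketch of the pattern. It is an illustration only, not the kernel code:
small_const_nbits() mirrors the kernel macro of the same name, while
slow_bitmap_and() and sketch_bitmap_and() are hypothetical names, and the
sketch omits the return value and last-word masking of the real
bitmap_and(). It relies on the GCC/Clang builtin __builtin_constant_p().

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* True only when nbits is a compile-time constant that fits in one word. */
#define small_const_nbits(nbits) \
	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)

/* Hypothetical generic path: walks every word of the bitmap. */
static void slow_bitmap_and(unsigned long *dst, const unsigned long *a,
			    const unsigned long *b, unsigned int nbits)
{
	unsigned int i;

	for (i = 0; i < BITS_TO_LONGS(nbits); i++)
		dst[i] = a[i] & b[i];
}

/*
 * Hypothetical wrapper: when the size is a small constant, the whole
 * operation collapses to a single AND on one word; otherwise it falls
 * back to the generic loop. Both paths give the same result.
 */
static inline void sketch_bitmap_and(unsigned long *dst,
				     const unsigned long *a,
				     const unsigned long *b,
				     unsigned int nbits)
{
	if (small_const_nbits(nbits))
		*dst = *a & *b;
	else
		slow_bitmap_and(dst, a, b, nbits);
}
```

Whether the fast branch is actually taken depends on inlining and the
optimization level; without optimization __builtin_constant_p() typically
evaluates to 0 and the generic loop runs instead.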

The find_*_bit() API lacks this functionality, although its users would
benefit from it a lot. One important example is the cpumask subsystem when
NR_CPUS <= BITS_PER_LONG. In the very best case, the compiler may replace
a find_*_bit() call for such a bitmap with a single ffs or ffz instruction.
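
As a rough illustration of what such a fast path could look like, here is
a user-space sketch under stated assumptions. It is not the code from the
patches: sketch_find_first_bit(), slow_find_first_bit() and
LOW_BITS_MASK() are hypothetical names, and the GCC/Clang builtin
__builtin_ctzl() stands in for the kernel's __ffs().

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* True only when nbits is a compile-time constant that fits in one word. */
#define small_const_nbits(nbits) \
	(__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)

/* Mask with the low n bits set, valid for 1 <= n <= BITS_PER_LONG. */
#define LOW_BITS_MASK(n) (~0UL >> (BITS_PER_LONG - (n)))

/* Hypothetical generic path: scans the bitmap word by word. */
static unsigned long slow_find_first_bit(const unsigned long *addr,
					 unsigned long size)
{
	unsigned long idx;

	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
		if (addr[idx]) {
			unsigned long bit = idx * BITS_PER_LONG +
				(unsigned long)__builtin_ctzl(addr[idx]);

			return bit < size ? bit : size;
		}
	}
	return size;
}

/*
 * Hypothetical fast path: for a small constant size, mask off the bits
 * beyond the bitmap and count trailing zeros. Returns size when no bit
 * is set, matching the convention of the generic path.
 */
static inline unsigned long sketch_find_first_bit(const unsigned long *addr,
						  unsigned long size)
{
	if (small_const_nbits(size)) {
		unsigned long val = *addr & LOW_BITS_MASK(size);

		return val ? (unsigned long)__builtin_ctzl(val) : size;
	}
	return slow_find_first_bit(addr, size);
}
```

With a constant size and optimization enabled, the compiler is free to
reduce the call to a mask, a count-trailing-zeros instruction and a
select, which is the effect described above.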

Tools are synchronized with the new implementation where needed.

v1: https://www.spinics.net/lists/kernel/msg3804727.html
v2: - employ GENMASK() for bitmaps;
    - unify find_bit inliners in;
    - address comments to v1;
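
To illustrate the "employ GENMASK() for bitmaps" item, here is a sketch of
a single-word find_next_bit built on a GENMASK-style mask. It is an
assumption about the general approach, not the code from the patches:
word_find_next_bit() is a hypothetical name, GENMASK() is redefined
locally for user space, and __builtin_ctzl() (GCC/Clang) stands in for
the kernel's __ffs().

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Bits l..h set, inclusive, for 0 <= l <= h < BITS_PER_LONG. */
#define GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/*
 * Hypothetical single-word helper: returns the first set bit at or
 * after offset, or size if there is none.
 * Assumes 0 < size <= BITS_PER_LONG.
 */
static inline unsigned long word_find_next_bit(unsigned long word,
					       unsigned long size,
					       unsigned long offset)
{
	unsigned long val;

	if (offset >= size)
		return size;

	/* Keep only the bits in [offset, size - 1]. */
	val = word & GENMASK(size - 1, offset);
	return val ? (unsigned long)__builtin_ctzl(val) : size;
}
```

One mask handles both ends of the range at once, so no separate handling
of the bits below offset and the bits at or above size is needed.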



Yury Norov (8):
  tools: disable -Wno-type-limits
  tools: bitmap: sync function declarations with linux kernel
  arch: rearrange headers inclusion order in asm/bitops for m68k and sh
  lib: introduce BITS_{FIRST,LAST} macro
  bitsperlong.h: introduce SMALL_CONST() macro
  lib: inline _find_next_bit() wrappers
  lib: add fast path for find_next_*_bit()
  lib: add fast path for find_first_*_bit() and find_last_bit()

 arch/m68k/include/asm/bitops.h          |   4 +-
 arch/sh/include/asm/bitops.h            |   3 +-
 include/asm-generic/bitops/find.h       | 108 +++++++++++++++++++++---
 include/asm-generic/bitops/le.h         |  38 ++++++++-
 include/asm-generic/bitsperlong.h       |   2 +
 include/linux/bitmap.h                  |  60 ++++++-------
 include/linux/bitops.h                  |  12 ---
 include/linux/bits.h                    |   6 ++
 include/linux/cpumask.h                 |   8 +-
 include/linux/netdev_features.h         |   2 +-
 include/linux/nodemask.h                |   2 +-
 lib/bitmap.c                            |  26 +++---
 lib/find_bit.c                          |  72 +++-------------
 lib/genalloc.c                          |   8 +-
 tools/include/asm-generic/bitops/find.h |  85 +++++++++++++++++--
 tools/include/asm-generic/bitsperlong.h |   2 +
 tools/include/linux/bitmap.h            |  47 ++++-------
 tools/include/linux/bits.h              |   6 ++
 tools/lib/bitmap.c                      |  10 +--
 tools/lib/find_bit.c                    |  56 +++++-------
 tools/scripts/Makefile.include          |   1 +
 tools/testing/radix-tree/bitmap.c       |   4 +-
 22 files changed, 337 insertions(+), 225 deletions(-)

Comments

Yury Norov Feb. 15, 2021, 9:30 p.m. UTC | #1
[add David Laight <David.Laight@ACULAB.COM> ]

Comments so far:
 - increased image size (patch #8) - addressed by introducing
   CONFIG_FAST_PATH;
 - split tools and kernel parts - not clear why it's better.

 Anything else?

Andy Shevchenko Feb. 16, 2021, 9:14 a.m. UTC | #2
On Mon, Feb 15, 2021 at 01:30:44PM -0800, Yury Norov wrote:
> Comments so far:
>  - increased image size (patch #8) - addressed by introducing
>    CONFIG_FAST_PATH;

>  - split tools and kernel parts - not clear why it's better.

Because tools are user-space programs and sometimes may not follow kernel
specifics. They are logically different, so the changes should be separated.

>  Anything else?

Yury Norov Feb. 16, 2021, 6 p.m. UTC | #3
On Tue, Feb 16, 2021 at 11:14:23AM +0200, Andy Shevchenko wrote:
> On Mon, Feb 15, 2021 at 01:30:44PM -0800, Yury Norov wrote:
> > Comments so far:
> >  - increased image size (patch #8) - addressed by introducing
> >    CONFIG_FAST_PATH;
> 
> >  - split tools and kernel parts - not clear why it's better.
> 
> Because tools are user space programs and sometimes may not follow kernel
> specifics, so they are different logically and changes should be separated.

In this specific case, the tools follow the kernel closely.

Nevertheless, if you think it's a blocker for the series, I can split. Which
option for tools do you prefer: doubling the number of patches or
squashing everything into a patch bomb?

Andy Shevchenko Feb. 17, 2021, 10:33 a.m. UTC | #4
On Tue, Feb 16, 2021 at 10:00:42AM -0800, Yury Norov wrote:
> On Tue, Feb 16, 2021 at 11:14:23AM +0200, Andy Shevchenko wrote:
> > On Mon, Feb 15, 2021 at 01:30:44PM -0800, Yury Norov wrote:
> > > Comments so far:
> > >  - increased image size (patch #8) - addressed by introducing
> > >    CONFIG_FAST_PATH;
> > 
> > >  - split tools and kernel parts - not clear why it's better.
> > 
> > Because tools are user space programs and sometimes may not follow kernel
> > specifics, so they are different logically and changes should be separated.
> 
> In this specific case tools follow kernel well.
> 
> Nevertheless, if you think it's a blocker for the series, I can split.

It's not a blocker from my side. But you make it harder to push like this,
because you will need a tag from tools, which in my experience is quite
hard to get, and that becomes a blocker. My point is: don't create obstacles
where we can avoid them. If the changes are split and tools won't take
their part, it won't block us.

> What
> option for tools is better for you - doubling the number of patches or
> squashing everything in a patch bomb?

I'm not a tools guy, but common sense tells me that the best approach is to
mirror the changes made in the kernel (similar granularity).