[08/14] lib/Kconfig: introduce FAST_PATH option

Message ID 20210218040512.709186-9-yury.norov@gmail.com
State New
Series lib/find_bit: fast path for small bitmaps

Commit Message

Yury Norov Feb. 18, 2021, 4:05 a.m. UTC
This series introduces fast paths for find_bit() routines. It is
beneficial for typical systems, but those who are limited in I-cache
may be concerned about the increased .text size of the Image.

To address this concern, one can disable the FAST_PATH option in the
config and save some memory.

The effect of this option on my arm64 next-20210217 build is:

Before:
	Sections:
	Idx Name          Size      VMA               LMA               File off  Algn
	  0 .head.text    00010000  ffff800010000000  ffff800010000000  00010000  2**16
			  CONTENTS, ALLOC, LOAD, READONLY, CODE
	  1 .text         0115e3a8  ffff800010010000  ffff800010010000  00020000  2**16
			  CONTENTS, ALLOC, LOAD, READONLY, CODE
	  2 .got.plt      00000018  ffff80001116e3a8  ffff80001116e3a8  0117e3a8  2**3
			  CONTENTS, ALLOC, LOAD, DATA
	  3 .rodata       007a72ca  ffff800011170000  ffff800011170000  01180000  2**12
			  CONTENTS, ALLOC, LOAD, DATA
	  ...

After:
	Sections:
	Idx Name          Size      VMA               LMA               File off  Algn
	  0 .head.text    00010000  ffff800010000000  ffff800010000000  00010000  2**16
			  CONTENTS, ALLOC, LOAD, READONLY, CODE
	  1 .text         011623a8  ffff800010010000  ffff800010010000  00020000  2**16
			  CONTENTS, ALLOC, LOAD, READONLY, CODE
	  2 .got.plt      00000018  ffff8000111723a8  ffff8000111723a8  011823a8  2**3
			  CONTENTS, ALLOC, LOAD, DATA
	  3 .rodata       007a772a  ffff800011180000  ffff800011180000  01190000  2**12
			  CONTENTS, ALLOC, LOAD, DATA
	  ...

Note that this is the cumulative effect of the already existing fast paths
controlled by SMALL_CONST() together with the ones added by this series.
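The objdump "Size" column is a hexadecimal byte count, so the change is easiest to read by subtracting the two tables. Below is a small C sketch that does only that arithmetic. `parse_size()` and `section_delta()` are hypothetical helpers, not part of the series. The tables by themselves do not say which of the two builds has FAST_PATH enabled.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse one objdump "Size" field (bare hex digits, e.g. "0115e3a8"). */
static unsigned long parse_size(const char *hex)
{
	char *end;
	unsigned long v = strtoul(hex, &end, 16);

	assert(*hex != '\0' && *end == '\0');	/* the whole field must be hex */
	return v;
}

/* Signed size change of one section between two builds, in bytes. */
long section_delta(const char *before, const char *after)
{
	return (long)parse_size(after) - (long)parse_size(before);
}
```

For example, `section_delta("0115e3a8", "011623a8")` gives 16384 (0x4000) for .text and `section_delta("007a72ca", "007a772a")` gives 1120 (0x460) for .rodata. The .head.text and .got.plt sizes are unchanged.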

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/asm-generic/bitsperlong.h | 4 ++++
 lib/Kconfig                       | 7 +++++++
 2 files changed, 11 insertions(+)

Comments

Andy Shevchenko Feb. 18, 2021, 3:15 p.m. UTC | #1
On Wed, Feb 17, 2021 at 08:05:06PM -0800, Yury Norov wrote:
> This series introduces fast paths for find_bit() routines. It is
> beneficial for typical systems, but those who are limited in I-cache
> may be concerned about the increased .text size of the Image.
> 
> To address this concern, one can disable the FAST_PATH option in the
> config and save some memory.
> 
> The effect of this option on my arm64 next-20210217 build is:

(Maybe bloat-o-meter would give a better view of this, i.e. a more human-readable one.)

...

> +config FAST_PATH

I think the name is too broad for this case; perhaps BITS_FAST_PATH? Or BITMAP?

> +	bool "Enable fast path code generation"
> +	default y
> +	help
> +	  This option enables fast path optimization at the cost of a larger
> +	  text section.

Yury Norov Feb. 18, 2021, 7:24 p.m. UTC | #2
On Thu, Feb 18, 2021 at 05:15:43PM +0200, Andy Shevchenko wrote:
> On Wed, Feb 17, 2021 at 08:05:06PM -0800, Yury Norov wrote:
> > This series introduces fast paths for find_bit() routines. It is
> > beneficial for typical systems, but those who are limited in I-cache
> > may be concerned about the increased .text size of the Image.
> > 
> > To address this concern, one can disable the FAST_PATH option in the
> > config and save some memory.
> > 
> > The effect of this option on my arm64 next-20210217 build is:
> 
> (Maybe bloat-o-meter would give a better view of this, i.e. a more human-readable one.)

I had never heard of this tool; thanks for the hint.

scripts/bloat-o-meter vmlinux vmlinux.new
add/remove: 16/13 grow/shrink: 111/439 up/down: 3616/-19352 (-15736)
Function                                     old     new   delta
find_next_bit.constprop                        -     220    +220
apply_wqattrs_cleanup                          -     176    +176
memcg_free_shrinker_maps                       -     172    +172
...
cpuset_hotplug_workfn                       2584    2288    -296
task_numa_fault                             3640    3320    -320
kmem_cache_free_bulk                        1684    1280    -404
Total: Before=26085140, After=26069404, chg -0.06%

The complete output is here:
https://pastebin.com/kBSdVJcK
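The summary line packs several counters together. The C sketch below shows one way those fields relate. It assumes this reading of the bloat-o-meter output: add/remove count symbols present in only one build, grow/shrink count symbols whose size changed, and up/down sum the positive and negative deltas. The struct and function names are hypothetical, and this is not the script's own implementation. Under that reading the quoted totals agree: 3616 + (-19352) = -15736 = 26069404 - 26085140.

```c
#include <assert.h>
#include <stddef.h>

/* One symbol's size in each build; -1 means "absent from that build". */
struct sym_size {
	const char *name;
	long old_size;
	long new_size;
};

/* Counters matching the fields of the summary line. */
struct size_summary {
	int added, removed;	/* symbols present in only one build */
	int grown, shrunk;	/* symbols present in both, size changed */
	long up, down;		/* sums of positive / negative deltas */
};

struct size_summary summarize(const struct sym_size *rows, size_t n)
{
	struct size_summary s = { 0 };

	for (size_t i = 0; i < n; i++) {
		long old_sz = rows[i].old_size, new_sz = rows[i].new_size;
		long delta;

		assert(old_sz >= 0 || new_sz >= 0);	/* must exist somewhere */
		/* An absent symbol counts as size 0 on that side. */
		delta = (new_sz < 0 ? 0 : new_sz) - (old_sz < 0 ? 0 : old_sz);

		if (old_sz < 0)
			s.added++;
		else if (new_sz < 0)
			s.removed++;
		else if (delta > 0)
			s.grown++;
		else if (delta < 0)
			s.shrunk++;

		if (delta > 0)
			s.up += delta;
		else
			s.down += delta;
	}
	return s;
}
```

Fed only the six rows excerpted above, it reports 3 added and 3 shrunk symbols, with up = 568 and down = -1020. The real totals need the complete output.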

So if I understand the output correctly, the size of .text has decreased...
That looks weird, but if it's true, we don't need the FAST_PATH config at all,
because there is no tradeoff, and I should drop the patch.

Hmm...

> > +config FAST_PATH
> 
> I think the name is too broad for this case; perhaps BITS_FAST_PATH? Or BITMAP?

My logic was that since SMALL_CONST() is global, and FAST_PATH
controls SMALL_CONST(), FAST_PATH should also be global. I believe
Linux should have a global switch to control this behaviour in
such cases, similar to the -Os compiler option. And I was surprised
when I found nothing like FAST_PATH in the config.

What about having FAST_PATH as a global option, and later, if someone
requests finer granularity, introducing nested configs?
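To make the switch concrete: SMALL_CONST() gates the inline fast paths, so defining it as 0 sends every caller to the out-of-line function. Below is a userspace approximation. Only the two SMALL_CONST() definitions are copied from the patch. `sketch_find_next_bit()` and `slow_find_next_bit()` are hypothetical stand-ins, not the kernel's find_bit code, and GCC or Clang is assumed for `__builtin_constant_p()` and `__builtin_ctzl()`.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
static_assert(BITS_PER_LONG == 32 || BITS_PER_LONG == 64,
	      "sketch assumes a 32- or 64-bit unsigned long");

/* The two SMALL_CONST() definitions from the patch. */
#ifdef CONFIG_FAST_PATH
#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG)
#else
#define SMALL_CONST(n) (0)
#endif

/* Hypothetical out-of-line path: scans bit by bit, any number of words. */
unsigned long slow_find_next_bit(const unsigned long *addr,
				 unsigned long size, unsigned long offset)
{
	for (unsigned long i = offset; i < size; i++)
		if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			return i;
	return size;
}

/* Hypothetical inline wrapper with a SMALL_CONST()-gated fast path. */
static inline unsigned long sketch_find_next_bit(const unsigned long *addr,
						 unsigned long size,
						 unsigned long offset)
{
	if (SMALL_CONST(size)) {
		unsigned long val;

		if (offset >= size)
			return size;
		/* Keep only bits [offset, size) of the first (and only) word. */
		val = *addr & (~0UL << offset) & (~0UL >> (BITS_PER_LONG - size));
		return val ? (unsigned long)__builtin_ctzl(val) : size;
	}
	return slow_find_next_bit(addr, size, offset);
}
```

Built with `-O2 -DCONFIG_FAST_PATH`, a call with a constant `size` below BITS_PER_LONG can take the inline single-word branch. Without CONFIG_FAST_PATH, `SMALL_CONST()` is the constant 0, the compiler can discard that branch, and every call goes to `slow_find_next_bit()`. The results are identical either way; only code size and speed differ.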

> > +	bool "Enable fast path code generation"
> > +	default y
> > +	help
> > +	  This option enables fast path optimization at the cost of a larger
> > +	  text section.

Andy Shevchenko Feb. 19, 2021, 10:52 a.m. UTC | #3
On Thu, Feb 18, 2021 at 11:24:19AM -0800, Yury Norov wrote:
> On Thu, Feb 18, 2021 at 05:15:43PM +0200, Andy Shevchenko wrote:
> > On Wed, Feb 17, 2021 at 08:05:06PM -0800, Yury Norov wrote:
> > > This series introduces fast paths for find_bit() routines. It is
> > > beneficial for typical systems, but those who are limited in I-cache
> > > may be concerned about the increased .text size of the Image.
> > > 
> > > To address this concern, one can disable the FAST_PATH option in the
> > > config and save some memory.
> > > 
> > > The effect of this option on my arm64 next-20210217 build is:
> > 
> > (Maybe bloat-o-meter would give a better view of this, i.e. a more human-readable one.)
> 
> I had never heard of this tool; thanks for the hint.
> 
> scripts/bloat-o-meter vmlinux vmlinux.new
> add/remove: 16/13 grow/shrink: 111/439 up/down: 3616/-19352 (-15736)
> Function                                     old     new   delta
> find_next_bit.constprop                        -     220    +220
> apply_wqattrs_cleanup                          -     176    +176
> memcg_free_shrinker_maps                       -     172    +172
> ...
> cpuset_hotplug_workfn                       2584    2288    -296
> task_numa_fault                             3640    3320    -320
> kmem_cache_free_bulk                        1684    1280    -404
> Total: Before=26085140, After=26069404, chg -0.06%
> 
> The complete output is here:
> https://pastebin.com/kBSdVJcK
> 
> So if I understand the output correctly, the size of .text has decreased...
> That looks weird, but if it's true, we don't need the FAST_PATH config at all,
> because there is no tradeoff, and I should drop the patch.

I actually expected the text size to decrease when it's about constants.
I remember that in the PCI case we discussed with Bjorn the use of
for_each_set_bit(), which brought the entire function into the object file and
increased it by ~300 bytes (or so). But the code is something like

	for_each_set_bit(i, &addr, 32)
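With a constant size of 32, as in the line above, the iteration only ever touches one word. The macro below is a hypothetical sketch, not the kernel's for_each_set_bit(), of what such a loop can reduce to. It masks off bits at or above `size`, then repeatedly takes the lowest set bit with `__builtin_ctzl()` (GCC/Clang assumed) and clears it. No out-of-line search function is called, which is the kind of saving a constant-size fast path can give.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Mask of the low `size` bits; valid for 1 <= size <= BITS_PER_LONG. */
#define LOW_BITS_MASK(size) (~0UL >> (BITS_PER_LONG - (size)))

/*
 * Hypothetical single-word iterator: visits each set bit below `size`
 * in ascending order. rest_ holds the bits not yet visited.
 */
#define for_each_set_bit_small(bit, word, size)				\
	for (unsigned long rest_ = (word) & LOW_BITS_MASK(size);	\
	     rest_ != 0 && ((bit) = __builtin_ctzl(rest_), 1);		\
	     rest_ &= rest_ - 1)	/* clear the lowest set bit */

/* Stores the indices of set bits below `size` in out[]; returns the count. */
unsigned int set_bit_indices(unsigned long word, unsigned int size,
			     unsigned int *out)
{
	unsigned int bit, n = 0;

	assert(size >= 1 && size <= BITS_PER_LONG);
	for_each_set_bit_small(bit, word, size)
		out[n++] = bit;
	return n;
}
```

Calling `set_bit_indices(0x80000005UL, 32, idx)` returns 3 and fills `idx` with 0, 2 and 31.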

...

> > I think the name is too broad for this case; perhaps BITS_FAST_PATH? Or BITMAP?
> 
> My logic was that since SMALL_CONST() is global, and FAST_PATH
> controls SMALL_CONST(), FAST_PATH should also be global. I believe
> Linux should have a global switch to control this behaviour in
> such cases, similar to the -Os compiler option. And I was surprised
> when I found nothing like FAST_PATH in the config.
> 
> What about having FAST_PATH as a global option, and later, if someone
> requests finer granularity, introducing nested configs?

I think that is too far ahead for now. Let's take one step at a time.

Patch

diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h
index 0eeb77544f1d..209e531074c1 100644
--- a/include/asm-generic/bitsperlong.h
+++ b/include/asm-generic/bitsperlong.h
@@ -23,6 +23,10 @@ 
 #define BITS_PER_LONG_LONG 64
 #endif
 
+#ifdef CONFIG_FAST_PATH
 #define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG)
+#else
+#define SMALL_CONST(n) (0)
+#endif
 
 #endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/lib/Kconfig b/lib/Kconfig
index a38cc61256f1..7a1b9c8d2a32 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -39,6 +39,13 @@  config PACKING
 
 	  When in doubt, say N.
 
+config FAST_PATH
+	bool "Enable fast path code generation"
+	default y
+	help
+	  This option enables fast path optimization at the cost of a larger
+	  text section.
+
 config BITREVERSE
 	tristate