Message ID | 20220617144031.2549432-1-alexandr.lobakin@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | bitops: let optimize out non-atomic bitops on compile-time constants | expand |
Hi Olek, On Fri, Jun 17, 2022 at 6:51 PM Alexander Lobakin <alexandr.lobakin@intel.com> wrote: > While I was working on converting some structure fields from a fixed > type to a bitmap, I started observing code size increase not only in > places where the code works with the converted structure fields, but > also where the converted vars were on the stack. That said, the > following code: > > DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1]; > unsigned long bar = BIT(BAR_BIT); > unsigned long baz = 0; > > __set_bit(FOO_BIT, foo); > baz |= BIT(BAZ_BIT); > > BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)); > BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT)); > BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT)); > > triggers the first assertion on x86_64, which means that the > compiler is unable to evaluate it to a compile-time initializer > when the architecture-specific bitop is used even if it's obvious. > I found that this is due to that many architecture-specific > non-atomic bitop implementations use inline asm or other hacks which > are faster or more robust when working with "real" variables (i.e. > fields from the structures etc.), but the compilers have no clue how > to optimize them out when called on compile-time constants. > > So, in order to let the compiler optimize out such cases, expand the > test_bit() and __*_bit() definitions with a compile-time condition > check, so that they will pick the generic C non-atomic bitop > implementations when all of the arguments passed are compile-time > constants, which means that the result will be a compile-time > constant as well and the compiler will produce more efficient and > simple code in 100% cases (no changes when there's at least one > non-compile-time-constant argument). > The condition itself: > > if ( > __builtin_constant_p(nr) && /* <- bit position is constant */ > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is > always either NULL or not */ > addr && /* <- bitmap addr is not NULL */ > __builtin_constant_p(*addr) /* <- compiler knows the value of > the target bitmap */ > ) > /* then pick the generic C variant > else > /* old code path, arch-specific > > I also tried __is_constexpr() as suggested by Andy, but it was > always returning 0 ('not a constant') for the 2,3 and 4th > conditions. > > The savings are architecture, compiler and compiler flags dependent, > for example, on x86_64 -O2: > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) > > and ARM64 (courtesy of Mark[0]): > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) > > And the following: > > DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { }; > __be16 flags; > > __set_bit(IP_TUNNEL_CSUM_BIT, flags); > > tun_flags = cpu_to_be16(*flags & U16_MAX); > > if (test_bit(IP_TUNNEL_VTI_BIT, flags)) > tun_flags |= VTI_ISVTI; > > BUILD_BUG_ON(!__builtin_constant_p(tun_flags)); > > doesn't blow up anymore (which is being checked now at build time), > so that we can now e.g. use fixed bitmaps in compile-time assertions > etc. > > The series has been in intel-next for a while with no reported issues. > > From v2[1]: > * collect several Reviewed-bys (Andy, Yury); > * add a comment to generic_test_bit() that it is atomic-safe and > must always stay like that (the first version of this series > errorneously tried to change this) (Andy, Marco); > * unify the way how architectures define platform-specific bitops, > both supporting instrumentation and not: now they define only > 'arch_' versions and asm-generic includes take care of the rest; > * micro-optimize the diffstat of 0004/0007 (__check_bitop_pr()) > (Andy); > * add compile-time tests to lib/test_bitmap to make sure everything > works as expected on any setup (Yury). Thanks for the update! Still seeing add/remove: 49/13 grow/shrink: 280/137 up/down: 6464/-3328 (3136) on m68k atari_defconfig (i.e.CONFIG_CC_OPTIMIZE_FOR_SIZE=y) with gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
From: Geert Uytterhoeven <geert@linux-m68k.org> Date: Mon, 20 Jun 2022 14:07:27 +0200 > Hi Olek, Hey! > > On Fri, Jun 17, 2022 at 6:51 PM Alexander Lobakin > <alexandr.lobakin@intel.com> wrote: > > While I was working on converting some structure fields from a fixed > > type to a bitmap, I started observing code size increase not only in > > places where the code works with the converted structure fields, but > > also where the converted vars were on the stack. That said, the > > following code: > > > > DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1]; > > unsigned long bar = BIT(BAR_BIT); > > unsigned long baz = 0; > > > > __set_bit(FOO_BIT, foo); > > baz |= BIT(BAZ_BIT); > > > > BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)); > > BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT)); > > BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT)); > > > > triggers the first assertion on x86_64, which means that the > > compiler is unable to evaluate it to a compile-time initializer > > when the architecture-specific bitop is used even if it's obvious. > > I found that this is due to that many architecture-specific > > non-atomic bitop implementations use inline asm or other hacks which > > are faster or more robust when working with "real" variables (i.e. > > fields from the structures etc.), but the compilers have no clue how > > to optimize them out when called on compile-time constants. > > > > So, in order to let the compiler optimize out such cases, expand the > > test_bit() and __*_bit() definitions with a compile-time condition > > check, so that they will pick the generic C non-atomic bitop > > implementations when all of the arguments passed are compile-time > > constants, which means that the result will be a compile-time > > constant as well and the compiler will produce more efficient and > > simple code in 100% cases (no changes when there's at least one > > non-compile-time-constant argument). > > The condition itself: > > > > if ( > > __builtin_constant_p(nr) && /* <- bit position is constant */ > > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is > > always either NULL or not */ > > addr && /* <- bitmap addr is not NULL */ > > __builtin_constant_p(*addr) /* <- compiler knows the value of > > the target bitmap */ > > ) > > /* then pick the generic C variant > > else > > /* old code path, arch-specific > > > > I also tried __is_constexpr() as suggested by Andy, but it was > > always returning 0 ('not a constant') for the 2,3 and 4th > > conditions. > > > > The savings are architecture, compiler and compiler flags dependent, > > for example, on x86_64 -O2: > > > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) > > > > and ARM64 (courtesy of Mark[0]): > > > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) > > > > And the following: > > > > DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { }; > > __be16 flags; > > > > __set_bit(IP_TUNNEL_CSUM_BIT, flags); > > > > tun_flags = cpu_to_be16(*flags & U16_MAX); > > > > if (test_bit(IP_TUNNEL_VTI_BIT, flags)) > > tun_flags |= VTI_ISVTI; > > > > BUILD_BUG_ON(!__builtin_constant_p(tun_flags)); > > > > doesn't blow up anymore (which is being checked now at build time), > > so that we can now e.g. use fixed bitmaps in compile-time assertions > > etc. > > > > The series has been in intel-next for a while with no reported issues. > > > > From v2[1]: > > * collect several Reviewed-bys (Andy, Yury); > > * add a comment to generic_test_bit() that it is atomic-safe and > > must always stay like that (the first version of this series > > errorneously tried to change this) (Andy, Marco); > > * unify the way how architectures define platform-specific bitops, > > both supporting instrumentation and not: now they define only > > 'arch_' versions and asm-generic includes take care of the rest; > > * micro-optimize the diffstat of 0004/0007 (__check_bitop_pr()) > > (Andy); > > * add compile-time tests to lib/test_bitmap to make sure everything > > works as expected on any setup (Yury). > > Thanks for the update! > > Still seeing > add/remove: 49/13 grow/shrink: 280/137 up/down: 6464/-3328 (3136) Meh. What about -O2 (OPTIMIZE_FOR_PERFORMANCE)? I have a thought to make it depend on the config option above, but that would make code behave differently, so it's not safe. Are those 3 Kb critical for m68k machines? I'm asking because for some embedded systems they are :) Another thing, this could happen due to inlining rebalance. E.g. the compiler could inline or uninline some functions due to resolving bit{maps,ops} to compile-time constants. I was seeing such in the past several times. Also, IIRC you already sent some bloat-o-meter results here, and that was the case. On the other hand, if lib/test_bitmap.o builds successfully (I assume it does), the series works as expected. > > on m68k atari_defconfig (i.e.CONFIG_CC_OPTIMIZE_FOR_SIZE=y) > with gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04). > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds Thanks, Olek
From: kernel test robot <lkp@intel.com> Date: Sun, 19 Jun 2022 17:20:05 +0800 Also, could someone please help me with this? I don't get what went wrong with sparse, it's not even some new code, just moving old stuff. > tree: https://github.com/alobakin/linux bitops > head: 9bd39b17ce49d350eed93a031e0da6389067013e > commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures > config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/202206191726.wq70mbMK-lkp@intel.com/config) > compiler: mips64el-linux-gcc (GCC) 11.3.0 > reproduce: > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # apt-get install sparse > # sparse version: v0.6.4-30-g92122700-dirty > # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e > git remote add alobakin https://github.com/alobakin/linux > git fetch --no-tags alobakin bitops > git checkout 521611f961a7dda92eefa26e1afd3914c06af64e > # save the config file > mkdir build_dir && cp config build_dir/.config > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash > > If you fix the issue, kindly add following tag where applicable > Reported-by: kernel test robot <lkp@intel.com> > > > sparse warnings: (new ones prefixed by >>) > command-line: note: in included file: > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined > builtin:0:0: sparse: this was the original definition > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined > builtin:0:0: sparse: this was the original definition > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined > builtin:0:0: sparse: this was the original definition > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined > builtin:0:0: sparse: this was the original definition > block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h): > include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p' > include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old' > include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask' > include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return' > include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return' > >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t > > vim +222 block/elevator.c > > 9817064b68fef7 Jens Axboe 2006-07-28 217 > 70b3ea056f3074 Jens Axboe 2016-12-07 218 void elv_rqhash_add(struct request_queue *q, struct request *rq) > 9817064b68fef7 Jens Axboe 2006-07-28 219 { > b374d18a4bfce7 Jens Axboe 2008-10-31 220 struct elevator_queue *e = q->elevator; > 9817064b68fef7 Jens Axboe 2006-07-28 221 > 9817064b68fef7 Jens Axboe 2006-07-28 @222 BUG_ON(ELV_ON_HASH(rq)); > 242d98f077ac0a Sasha Levin 2012-12-17 223 hash_add(e->hash, &rq->hash, rq_hash_key(rq)); > e806402130c9c4 Christoph Hellwig 2016-10-20 224 rq->rq_flags |= RQF_HASHED; > 9817064b68fef7 Jens Axboe 2006-07-28 225 } > bd166ef183c263 Jens Axboe 2017-01-17 226 EXPORT_SYMBOL_GPL(elv_rqhash_add); > 9817064b68fef7 Jens Axboe 2006-07-28 227 > > :::::: The code at line 222 was first introduced by commit > :::::: 9817064b68fef7e4580c6df1ea597e106b9ff88b [PATCH] elevator: move the backmerging logic into the elevator core > > :::::: TO: Jens Axboe <axboe@suse.de> > :::::: CC: Jens Axboe <axboe@nelson.home.kernel.dk> > > -- > 0-DAY CI Kernel Test Service > https://01.org/lkp Thanks, Olek
On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote: > So, in order to let the compiler optimize out such cases, expand the > test_bit() and __*_bit() definitions with a compile-time condition > check, so that they will pick the generic C non-atomic bitop > implementations when all of the arguments passed are compile-time > constants, which means that the result will be a compile-time > constant as well and the compiler will produce more efficient and > simple code in 100% cases (no changes when there's at least one > non-compile-time-constant argument). > The savings are architecture, compiler and compiler flags dependent, > for example, on x86_64 -O2: > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) > > and ARM64 (courtesy of Mark[0]): > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) Hmm... with *this version* of the series, I'm not getting results nearly as good as that when building defconfig atop v5.19-rc3: GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804) GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632) GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548) GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444) GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400) GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608) LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356) LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792) LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712) ... and that now seems to be regressing codegen with recent versions of GCC as much as it improves it LLVM. I'm not sure if we've improved some other code and removed the benefit between v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't look as compelling as it did. Overall that's mostly hidden in the Image size, due to 64K alignment and padding requirements: Toolchain Before After Difference GCC 8.5.0 36178432 36178432 0 GCC 9.3.0 36112896 36112896 0 GCC 10.3.0 36442624 36377088 -65536 GCC 11.1.0 36311552 36377088 +65536 GCC 11.3.0 36311552 36311552 0 GCC 12.1.0 36377088 36377088 0 LLVM 12.0.1 31418880 31418880 0 LLVM 13.0.1 31418880 31418880 0 LLVM 14.0.0 31218176 31218176 0 ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive change overall (due to 64KiB alignment restrictions for portions of the kernel Image). Thanks, Mark.
From: Mark Rutland <mark.rutland@arm.com> Date: Mon, 20 Jun 2022 15:19:42 +0100 > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote: > > So, in order to let the compiler optimize out such cases, expand the > > test_bit() and __*_bit() definitions with a compile-time condition > > check, so that they will pick the generic C non-atomic bitop > > implementations when all of the arguments passed are compile-time > > constants, which means that the result will be a compile-time > > constant as well and the compiler will produce more efficient and > > simple code in 100% cases (no changes when there's at least one > > non-compile-time-constant argument). > > > The savings are architecture, compiler and compiler flags dependent, > > for example, on x86_64 -O2: > > > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) > > > > and ARM64 (courtesy of Mark[0]): > > > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) > > Hmm... with *this version* of the series, I'm not getting results nearly as > good as that when building defconfig atop v5.19-rc3: > > GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804) > GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632) > GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548) > GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444) > GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400) > GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608) > > LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356) > LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792) > LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712) > > ... and that now seems to be regressing codegen with recent versions of GCC as > much as it improves it LLVM. > > I'm not sure if we've improved some other code and removed the benefit between > v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't > look as compelling as it did. Mostly likely it's due to that in v1 I mistakingly removed `volatile` from gen[eric]_test_bit(), so there was an impact for non-constant cases as well. +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does it work? Probably the same reason as for m68k, more constant optimization -> more aggressive inlining or inlining rebalance -> larger code. OTOH I've no idea why sometimes compiler decides to uninline really tiny functions where due to this patch series some bitops have been converted to constants, like it goes on m68k. > > Overall that's mostly hidden in the Image size, due to 64K alignment and > padding requirements: > > Toolchain Before After Difference > > GCC 8.5.0 36178432 36178432 0 > GCC 9.3.0 36112896 36112896 0 > GCC 10.3.0 36442624 36377088 -65536 > GCC 11.1.0 36311552 36377088 +65536 > GCC 11.3.0 36311552 36311552 0 > GCC 12.1.0 36377088 36377088 0 > > LLVM 12.0.1 31418880 31418880 0 > LLVM 13.0.1 31418880 31418880 0 > LLVM 14.0.0 31218176 31218176 0 > > ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive > change overall (due to 64KiB alignment restrictions for portions of the kernel > Image). > > Thanks, > Mark. Thanks, Olek
On Mon, Jun 20, 2022 at 4:48 PM Alexander Lobakin <alexandr.lobakin@intel.com> wrote: > > From: kernel test robot <lkp@intel.com> > Date: Sun, 19 Jun 2022 17:20:05 +0800 > > Also, could someone please help me with this? I don't get what went > wrong with sparse, it's not even some new code, just moving old > stuff. > > > tree: https://github.com/alobakin/linux bitops > > head: 9bd39b17ce49d350eed93a031e0da6389067013e > > commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures > > config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/202206191726.wq70mbMK-lkp@intel.com/config) > > compiler: mips64el-linux-gcc (GCC) 11.3.0 > > reproduce: > > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > > chmod +x ~/bin/make.cross > > # apt-get install sparse > > # sparse version: v0.6.4-30-g92122700-dirty > > # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e > > git remote add alobakin https://github.com/alobakin/linux > > git fetch --no-tags alobakin bitops > > git checkout 521611f961a7dda92eefa26e1afd3914c06af64e > > # save the config file > > mkdir build_dir && cp config build_dir/.config > > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash > > > > If you fix the issue, kindly add following tag where applicable > > Reported-by: kernel test robot <lkp@intel.com> > > > > > > sparse warnings: (new ones prefixed by >>) > > command-line: note: in included file: > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined > > builtin:0:0: sparse: this was the original definition > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined > > builtin:0:0: sparse: this was the original definition > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined > > builtin:0:0: sparse: this was the original definition > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined > > builtin:0:0: sparse: this was the original definition > > block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h): > > include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p' > > include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old' > > include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask' > > include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return' > > include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return' > > >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t > > > > vim +222 block/elevator.c > > > > 9817064b68fef7 Jens Axboe 2006-07-28 217 > > 70b3ea056f3074 Jens Axboe 2016-12-07 218 void elv_rqhash_add(struct request_queue *q, struct request *rq) > > 9817064b68fef7 Jens Axboe 2006-07-28 219 { > > b374d18a4bfce7 Jens Axboe 2008-10-31 220 struct elevator_queue *e = q->elevator; > > 9817064b68fef7 Jens Axboe 2006-07-28 221 > > 9817064b68fef7 Jens Axboe 2006-07-28 @222 BUG_ON(ELV_ON_HASH(rq)); > > 242d98f077ac0a Sasha Levin 2012-12-17 223 hash_add(e->hash, &rq->hash, rq_hash_key(rq)); > > e806402130c9c4 Christoph Hellwig 2016-10-20 224 rq->rq_flags |= RQF_HASHED; > > 9817064b68fef7 Jens Axboe 2006-07-28 225 } > > bd166ef183c263 Jens Axboe 2017-01-17 226 EXPORT_SYMBOL_GPL(elv_rqhash_add); > > 9817064b68fef7 Jens Axboe 2006-07-28 227 It looks like a false positive for _your_ case, but if you want to fix here is the background. The sparse has an ability to control custom types that should never set bits outside of the limited range. For this the special annotation is given, i.e. __bitwise. Since the culprit type is defined that way it means the pure integer (signed or unsigned) that comes with pure definition can't be used in a safe way. To solve this each of such definitions should be converted to the very same type (req_flags_t). See serial core where some UART flags are defined in a similar way and how code copes with that.
From: Andy Shevchenko <andy.shevchenko@gmail.com> Date: Mon, 20 Jun 2022 17:18:40 +0200 > On Mon, Jun 20, 2022 at 4:48 PM Alexander Lobakin > <alexandr.lobakin@intel.com> wrote: > > > > From: kernel test robot <lkp@intel.com> > > Date: Sun, 19 Jun 2022 17:20:05 +0800 > > > > Also, could someone please help me with this? I don't get what went > > wrong with sparse, it's not even some new code, just moving old > > stuff. > > > > > tree: https://github.com/alobakin/linux bitops > > > head: 9bd39b17ce49d350eed93a031e0da6389067013e > > > commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures > > > config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/202206191726.wq70mbMK-lkp@intel.com/config) > > > compiler: mips64el-linux-gcc (GCC) 11.3.0 > > > reproduce: > > > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > > > chmod +x ~/bin/make.cross > > > # apt-get install sparse > > > # sparse version: v0.6.4-30-g92122700-dirty > > > # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e > > > git remote add alobakin https://github.com/alobakin/linux > > > git fetch --no-tags alobakin bitops > > > git checkout 521611f961a7dda92eefa26e1afd3914c06af64e > > > # save the config file > > > mkdir build_dir && cp config build_dir/.config > > > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash > > > > > > If you fix the issue, kindly add following tag where applicable > > > Reported-by: kernel test robot <lkp@intel.com> > > > > > > > > > sparse warnings: (new ones prefixed by >>) > > > command-line: note: in included file: > > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined > > > builtin:0:0: sparse: this was the original definition > > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined > > > builtin:0:0: sparse: this was the original definition > > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined > > > builtin:0:0: sparse: this was the original definition > > > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined > > > builtin:0:0: sparse: this was the original definition > > > block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h): > > > include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p' > > > include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old' > > > include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask' > > > include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return' > > > include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return' > > > >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t > > > > > > vim +222 block/elevator.c > > > > > > 9817064b68fef7 Jens Axboe 2006-07-28 217 > > > 70b3ea056f3074 Jens Axboe 2016-12-07 218 void elv_rqhash_add(struct request_queue *q, struct request *rq) > > > 9817064b68fef7 Jens Axboe 2006-07-28 219 { > > > b374d18a4bfce7 Jens Axboe 2008-10-31 220 struct elevator_queue *e = q->elevator; > > > 9817064b68fef7 Jens Axboe 2006-07-28 221 > > > 9817064b68fef7 Jens Axboe 2006-07-28 @222 BUG_ON(ELV_ON_HASH(rq)); > > > 242d98f077ac0a Sasha Levin 2012-12-17 223 hash_add(e->hash, &rq->hash, rq_hash_key(rq)); > > > e806402130c9c4 Christoph Hellwig 2016-10-20 224 rq->rq_flags |= RQF_HASHED; > > > 9817064b68fef7 Jens Axboe 2006-07-28 225 } > > > bd166ef183c263 Jens Axboe 2017-01-17 226 EXPORT_SYMBOL_GPL(elv_rqhash_add); > > > 9817064b68fef7 Jens Axboe 2006-07-28 227 > > It looks like a false positive for _your_ case, but if you want to fix > here is the background. > > The sparse has an ability to control custom types that should never > set bits outside of the limited range. For this the special annotation > is given, i.e. __bitwise. Since the culprit type is defined that way > it means the pure integer (signed or unsigned) that comes with pure > definition can't be used in a safe way. To solve this each of such > definitions should be converted to the very same type (req_flags_t). > See serial core where some UART flags are defined in a similar way and > how code copes with that. Ah, I haven't looked at the req_flags_t definition and didn't know it uses bitwise annotation. So it's a local blk layer problem (not casting from/to bitwise), not a generic header issue. Thanks for the hint! > > -- > With Best Regards, > Andy Shevchenko Thanks, Olek
On Mon, Jun 20, 2022 at 03:51:46PM +0200, Alexander Lobakin wrote: > From: kernel test robot <lkp@intel.com> > Date: Sun, 19 Jun 2022 17:20:05 +0800 > > Also, could someone please help me with this? I don't get what went > wrong with sparse, it's not even some new code, just moving old > stuff. Hi, The first sparse's warnings (sparse: preprocessor token __ATOMIC_*) are already fixed (and most probably, the bots have already taken the fix). I'm working on the second ones (sparse: unreplaced symbol). -- Luc (Sparse's maintainer)
On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote: > From: Mark Rutland <mark.rutland@arm.com> > Date: Mon, 20 Jun 2022 15:19:42 +0100 > > > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote: > > > > > The savings are architecture, compiler and compiler flags dependent, > > > for example, on x86_64 -O2: > > > > > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) > > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) > > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) > > > > > > and ARM64 (courtesy of Mark[0]): > > > > > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) > > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) > > > > Hmm... with *this version* of the series, I'm not getting results nearly as > > good as that when building defconfig atop v5.19-rc3: > > > > GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804) > > GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632) > > GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548) > > GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444) > > GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400) > > GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608) > > > > LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356) > > LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792) > > LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712) > > > > ... and that now seems to be regressing codegen with recent versions of GCC as > > much as it improves it LLVM. > > > > I'm not sure if we've improved some other code and removed the benefit between > > v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't > > look as compelling as it did. > > Mostly likely it's due to that in v1 I mistakingly removed > `volatile` from gen[eric]_test_bit(), so there was an impact for > non-constant cases as well. > +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does > it work? I didn't have it enabled, but I tried that just nw with GCC 12.1.0 and it builds cleanly, and test_bitmap_const_eval() gets optimized away entirely. If i remove the `static` from that, GCC generates: | <test_bitmap_const_eval>: | paciasp | autiasp | ret ... which is a trivial stub. > Probably the same reason as for m68k, more constant > optimization -> more aggressive inlining or inlining rebalance -> > larger code. OTOH I've no idea why sometimes compiler decides to > uninline really tiny functions where due to this patch series some > bitops have been converted to constants, like it goes on m68k. Hmm.... it'd be interesting to take a look at a few architectures and see what the common case is. Thanks, Mark.
On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote: > From: Mark Rutland <mark.rutland@arm.com> > Date: Mon, 20 Jun 2022 15:19:42 +0100 > > > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote: > > > So, in order to let the compiler optimize out such cases, expand the > > > test_bit() and __*_bit() definitions with a compile-time condition > > > check, so that they will pick the generic C non-atomic bitop > > > implementations when all of the arguments passed are compile-time > > > constants, which means that the result will be a compile-time > > > constant as well and the compiler will produce more efficient and > > > simple code in 100% cases (no changes when there's at least one > > > non-compile-time-constant argument). > > > > > The savings are architecture, compiler and compiler flags dependent, > > > for example, on x86_64 -O2: > > > > > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) > > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) > > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) > > > > > > and ARM64 (courtesy of Mark[0]): > > > > > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) > > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) > > > > Hmm... with *this version* of the series, I'm not getting results nearly as > > good as that when building defconfig atop v5.19-rc3: > > > > GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804) > > GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632) > > GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548) > > GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444) > > GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400) > > GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608) > > > > LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356) > > LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792) > > LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712) > > > > ... and that now seems to be regressing codegen with recent versions of GCC as > > much as it improves it LLVM. > > > > I'm not sure if we've improved some other code and removed the benefit between > > v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't > > look as compelling as it did. > > Mostly likely it's due to that in v1 I mistakingly removed > `volatile` from gen[eric]_test_bit(), so there was an impact for > non-constant cases as well. > +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does > it work? Probably the same reason as for m68k, more constant > optimization -> more aggressive inlining or inlining rebalance -> > larger code. OTOH I've no idea why sometimes compiler decides to > uninline really tiny functions where due to this patch series some > bitops have been converted to constants, like it goes on m68k. > > > > > Overall that's mostly hidden in the Image size, due to 64K alignment and > > padding requirements: > > > > Toolchain Before After Difference > > > > GCC 8.5.0 36178432 36178432 0 > > GCC 9.3.0 36112896 36112896 0 > > GCC 10.3.0 36442624 36377088 -65536 > > GCC 11.1.0 36311552 36377088 +65536 > > GCC 11.3.0 36311552 36311552 0 > > GCC 12.1.0 36377088 36377088 0 > > > > LLVM 12.0.1 31418880 31418880 0 > > LLVM 13.0.1 31418880 31418880 0 > > LLVM 14.0.0 31218176 31218176 0 > > > > ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive > > change overall (due to 64KiB alignment restrictions for portions of the kernel > > Image). I gave it a try on v5.19-rc3 for arm64 with my default GCC 11.2, and it's: add/remove: 89/33 grow/shrink: 1629/966 up/down: 51456/-46064 (5392) Which is not great in terms of layout size. But I don't think we should focus too much on those numbers. The goal of the series is not to shrink the image; the true goal is to provide more information to the compiler in a hope that it will make a better decision regarding optimizations. Looking at results provided by Mark, both GCC and LLVM have a tendency to inline and use other techniques that increase the image more aggressively in newer releases, comparing to old ones. From this perspective, unless we find some terribly wrong behavior, I'm OK with +5K for the Image, because I trust my compiler and believe it spent those 5K wisely. For the reasons said above, I think we shouldn't disable const bitops for -Os build. I think this series has total positive impact because it adds a lot of information for compiler with just a few lines of code. If no objections, I think it's good to try it in -next. Alexander, would you like me to fix gen/generic typo in comment and take it in bitmap-for-next, or you'd prefer to send v4? Thanks, Yury
From: Yury Norov <yury.norov@gmail.com> Date: Tue, 21 Jun 2022 10:39:44 -0700 > On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote: > > From: Mark Rutland <mark.rutland@arm.com> > > Date: Mon, 20 Jun 2022 15:19:42 +0100 > > > > > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote: > > > > So, in order to let the compiler optimize out such cases, expand the > > > > test_bit() and __*_bit() definitions with a compile-time condition > > > > check, so that they will pick the generic C non-atomic bitop > > > > implementations when all of the arguments passed are compile-time > > > > constants, which means that the result will be a compile-time > > > > constant as well and the compiler will produce more efficient and > > > > simple code in 100% cases (no changes when there's at least one > > > > non-compile-time-constant argument). > > > > > > > The savings are architecture, compiler and compiler flags dependent, > > > > for example, on x86_64 -O2: > > > > > > > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) > > > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) > > > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) > > > > > > > > and ARM64 (courtesy of Mark[0]): > > > > > > > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) > > > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) > > > > > > Hmm... with *this version* of the series, I'm not getting results nearly as > > > good as that when building defconfig atop v5.19-rc3: > > > > > > GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804) > > > GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632) > > > GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548) > > > GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444) > > > GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400) > > > GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608) > > > > > > LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356) > > > LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792) > > > LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712) > > > > > > ... and that now seems to be regressing codegen with recent versions of GCC as > > > much as it improves it LLVM. > > > > > > I'm not sure if we've improved some other code and removed the benefit between > > > v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't > > > look as compelling as it did. > > > > Mostly likely it's due to that in v1 I mistakingly removed > > `volatile` from gen[eric]_test_bit(), so there was an impact for > > non-constant cases as well. > > +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does > > it work? Probably the same reason as for m68k, more constant > > optimization -> more aggressive inlining or inlining rebalance -> > > larger code. OTOH I've no idea why sometimes compiler decides to > > uninline really tiny functions where due to this patch series some > > bitops have been converted to constants, like it goes on m68k. > > > > > > > > Overall that's mostly hidden in the Image size, due to 64K alignment and > > > padding requirements: > > > > > > Toolchain Before After Difference > > > > > > GCC 8.5.0 36178432 36178432 0 > > > GCC 9.3.0 36112896 36112896 0 > > > GCC 10.3.0 36442624 36377088 -65536 > > > GCC 11.1.0 36311552 36377088 +65536 > > > GCC 11.3.0 36311552 36311552 0 > > > GCC 12.1.0 36377088 36377088 0 > > > > > > LLVM 12.0.1 31418880 31418880 0 > > > LLVM 13.0.1 31418880 31418880 0 > > > LLVM 14.0.0 31218176 31218176 0 > > > > > > ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive > > > change overall (due to 64KiB alignment restrictions for portions of the kernel > > > Image). > > I gave it a try on v5.19-rc3 for arm64 with my default GCC 11.2, and it's: > add/remove: 89/33 grow/shrink: 1629/966 up/down: 51456/-46064 (5392) > > Which is not great in terms of layout size. But I don't think we should > focus too much on those numbers. The goal of the series is not to shrink > the image; the true goal is to provide more information to the compiler > in a hope that it will make a better decision regarding optimizations. > > Looking at results provided by Mark, both GCC and LLVM have a tendency > to inline and use other techniques that increase the image more > aggressively in newer releases, comparing to old ones. From this > perspective, unless we find some terribly wrong behavior, I'm OK with > +5K for the Image, because I trust my compiler and believe it spent > those 5K wisely. > > For the reasons said above, I think we shouldn't disable const > bitops for -Os build. > > I think this series has total positive impact because it adds a lot > of information for compiler with just a few lines of code. Right, that was the primary intention. But then I got some text size decreases and thought this applies to any setup :) > > If no objections, I think it's good to try it in -next. Alexander, > would you like me to fix gen/generic typo in comment and take it in > bitmap-for-next, or you'd prefer to send v4? I'm sending v4 in a couple minutes, lkp reported that on ARC GCC never expands mem*() builtins to plain assignments, which sucks, but failed my compile-time tests, so I adjusted code a bit. Hope that change will be okay for everyone, so that you could pick it. > > Thanks, > Yury Thanks, Olek