Message ID | 20230717080739.1000460-1-wangkefeng.wang@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: enable dead code elimination |
On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
> user to enable dead code elimination. In order for this to work, ensure
> that we keep the necessary tables by annotating them with KEEP, also it
> requires further changes to linker script to KEEP some tables and wildcard
> compiler generated sections into the right place.
>
> The following comparison is based 6.5-rc2 with defconfig,
>
> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
> Function                                     old     new   delta
> ...
> Total: Before=17888959, After=17824683, chg -0.36%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> Data                                         old     new   delta
> ...
> Total: Before=4820808, After=4820764, chg -0.00%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> RO Data                                      old     new   delta
> ...
> Total: Before=5179123, After=5178027, chg -0.02%
>
> $ size vmlinux-base vmlinux
>     text     data    bss      dec     hex filename
> 25433734 15385766 630656 41450156 2787aac vmlinux-base
> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>
> Memory available after booting, saving 704k on qemu,
> base: 8084532K/8388608K
> new:  8085236K/8388608K

Is that a 0.009% improvement? Is it really worth the hassle?

x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
like we're just creating a rod for our own back by selecting it.

Will

> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  arch/arm64/Kconfig              | 1 +
>  arch/arm64/kernel/vmlinux.lds.S | 5 +++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2511b30d0f6..73bb908ec62f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -148,6 +148,7 @@ config ARM64
>  	select GENERIC_VDSO_TIME_NS
>  	select HARDIRQS_SW_RESEND
>  	select HAS_IOPORT
> +	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
>  	select HAVE_MOVE_PMD
>  	select HAVE_MOVE_PUD
>  	select HAVE_PCI
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 3cd7e76cc562..bb4ce6cd6896 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -238,7 +238,7 @@ SECTIONS
>  	. = ALIGN(4);
>  	.altinstructions : {
>  		__alt_instructions = .;
> -		*(.altinstructions)
> +		KEEP(*(.altinstructions))
>  		__alt_instructions_end = .;
>  	}
>
> @@ -258,8 +258,9 @@ SECTIONS
>  		INIT_CALLS
>  		CON_INITCALL
>  		INIT_RAM_FS
> -		*(.init.altinstructions .init.bss)	/* from the EFI stub */
> +		KEEP(*(.init.altinstructions .init.bss*))	/* from the EFI stub */
>  	}
> +
>  	.exit.data : {
>  		EXIT_DATA
>  	}
> --
> 2.27.0
>

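For reference, the 0.009% figure can be reproduced from the quoted boot-time numbers. The sketch below only illustrates that arithmetic and was not posted in the thread. The variable names are made up here, and it assumes the percentage is taken relative to the memory available on the base kernel; measuring against total RAM gives a slightly smaller value.

```python
# Boot-time "Memory available" figures quoted above, in K.
base_avail_k = 8084532
new_avail_k = 8085236
total_k = 8388608

saved_k = new_avail_k - base_avail_k

# Assumption: the gain is measured against the base kernel's available memory.
gain_vs_available = 100 * saved_k / base_avail_k
# Alternative reading: the gain is measured against total RAM.
gain_vs_total = 100 * saved_k / total_k

print(f"saved: {saved_k}K")
print(f"vs available: {gain_vs_available:.3f}%")
print(f"vs total:     {gain_vs_total:.3f}%")
```

Either way the runtime saving on an 8G guest is well below a hundredth of a percent, which is the point of the question.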
On 2023-07-17 09:07, Kefeng Wang wrote:
> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
> user to enable dead code elimination. In order for this to work, ensure
> that we keep the necessary tables by annotating them with KEEP, also it
> requires further changes to linker script to KEEP some tables and wildcard
> compiler generated sections into the right place.
>
> The following comparison is based 6.5-rc2 with defconfig,
>
> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
> Function                                     old     new   delta
> ...
> Total: Before=17888959, After=17824683, chg -0.36%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> Data                                         old     new   delta
> ...
> Total: Before=4820808, After=4820764, chg -0.00%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> RO Data                                      old     new   delta
> ...
> Total: Before=5179123, After=5178027, chg -0.02%
>
> $ size vmlinux-base vmlinux
>     text     data    bss      dec     hex filename
> 25433734 15385766 630656 41450156 2787aac vmlinux-base
> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>
> Memory available after booting, saving 704k on qemu,
> base: 8084532K/8388608K
> new:  8085236K/8388608K
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

I took this patch for a spin in my tree, and ended up with:

  CC      .vmlinux.export.o
  UPD     include/generated/utsversion.h
  CC      init/version-timestamp.o
  LD      .tmp_vmlinux.kallsyms1
ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux] Error 2
make: *** [Makefile:234: __sub-make] Error 2

so it's probably not ready for prime time.

        M.

On 2023/7/17 17:24, Will Deacon wrote:
> On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function                                     old     new   delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data                                         old     new   delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data                                      old     new   delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>>     text     data    bss      dec     hex filename
>> 25433734 15385766 630656 41450156 2787aac vmlinux-base
>> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new:  8085236K/8388608K
>
> Is that a 0.009% improvement? Is it really worth the hassle?
>
> x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
> like we're just creating a rod for our own back by selecting it.

LD_DEAD_CODE_DATA_ELIMINATION is mainly useful for small configs on small
systems. RISC-V is aimed at resource-limited boards, while x86 may have no
strong need for it. We will try to use it on some embedded boards; if no
one tries it, this feature will never become stable :)

>
> Will
>

On 2023/7/17 17:42, Marc Zyngier wrote:
> On 2023-07-17 09:07, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function                                     old     new   delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data                                         old     new   delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data                                      old     new   delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>>     text     data    bss      dec     hex filename
>> 25433734 15385766 630656 41450156 2787aac vmlinux-base
>> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new:  8085236K/8388608K
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>
> I took this patch for a spin in my tree, and ended up with:
>
>   CC      .vmlinux.export.o
>   UPD     include/generated/utsversion.h
>   CC      init/version-timestamp.o
>   LD      .tmp_vmlinux.kallsyms1
> ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
> make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
> make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux] Error 2
> make: *** [Makefile:234: __sub-make] Error 2

I don't see this error with CONFIG_FTRACE_MCOUNT_RECORD or allyesconfig.
Does it need a special config or gcc version?

>
> so it's probably not ready for prime time.
>
>         M.

On Mon, 17 Jul 2023 12:56:39 +0100,
Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2023/7/17 17:42, Marc Zyngier wrote:
> > On 2023-07-17 09:07, Kefeng Wang wrote:
> >> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
> >> user to enable dead code elimination. In order for this to work, ensure
> >> that we keep the necessary tables by annotating them with KEEP, also it
> >> requires further changes to linker script to KEEP some tables and wildcard
> >> compiler generated sections into the right place.
> >>
> >> The following comparison is based 6.5-rc2 with defconfig,
> >>
> >> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> >> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
> >> Function                                     old     new   delta
> >> ...
> >> Total: Before=17888959, After=17824683, chg -0.36%
> >>
> >> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> >> Data                                         old     new   delta
> >> ...
> >> Total: Before=4820808, After=4820764, chg -0.00%
> >>
> >> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> >> RO Data                                      old     new   delta
> >> ...
> >> Total: Before=5179123, After=5178027, chg -0.02%
> >>
> >> $ size vmlinux-base vmlinux
> >>     text     data    bss      dec     hex filename
> >> 25433734 15385766 630656 41450156 2787aac vmlinux-base
> >> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
> >>
> >> Memory available after booting, saving 704k on qemu,
> >> base: 8084532K/8388608K
> >> new:  8085236K/8388608K
> >>
> >> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> >
> > I took this patch for a spin in my tree, and ended up with:
> >
> >   CC      .vmlinux.export.o
> >   UPD     include/generated/utsversion.h
> >   CC      init/version-timestamp.o
> >   LD      .tmp_vmlinux.kallsyms1
> > ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
> > make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
> > make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux] Error 2
> > make: *** [Makefile:234: __sub-make] Error 2
>
> I don't find this error with CONFIG_FTRACE_MCOUNT_RECORD or
> allyesconfig, does it need special config or gcc version?

You tell me!

gcc (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

so hardly something special. This is built with the current state of
my NV tree, available here[1]. As for the configuration, have a look
here[2].

        M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.6-WIP
[2] https://paste.debian.net/1286106/

On 2023/7/17 20:15, Marc Zyngier wrote:
> On Mon, 17 Jul 2023 12:56:39 +0100,
> Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>>
>>
>> On 2023/7/17 17:42, Marc Zyngier wrote:
>>> On 2023-07-17 09:07, Kefeng Wang wrote:
>>>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>>>> user to enable dead code elimination. In order for this to work, ensure
>>>> that we keep the necessary tables by annotating them with KEEP, also it
>>>> requires further changes to linker script to KEEP some tables and wildcard
>>>> compiler generated sections into the right place.
>>>>
>>>> The following comparison is based 6.5-rc2 with defconfig,
>>>> ...
>>>
>>> I took this patch for a spin in my tree, and ended up with:
>>>
>>>   CC      .vmlinux.export.o
>>>   UPD     include/generated/utsversion.h
>>>   CC      init/version-timestamp.o
>>>   LD      .tmp_vmlinux.kallsyms1
>>> ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
>>> make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
>>> make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux] Error 2
>>> make: *** [Makefile:234: __sub-make] Error 2
>>
>> I don't find this error with CONFIG_FTRACE_MCOUNT_RECORD or
>> allyesconfig, does it need special config or gcc version?
>
> You tell me!
>
> gcc (Debian 10.2.1-6) 10.2.1 20210110
> Copyright (C) 2020 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> so hardly something special. This is built with the current state of
> my NV tree, available here[1] As for the configuration, have a look
> here[2].

1) The error can be reproduced with gcc 10.3.1 / ld (GNU Binutils) 2.37,
   but there is no issue with the cross-compiler gcc 9.3 / ld (GNU Binutils
   for Ubuntu) 2.34.

2) With allyesconfig on arm64 there is the same issue that commit
   f7584322e4fe ("riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD")
   describes: linking takes too long, with most of the time spent in
   bfd_flavour_name():

Samples: 257K of event 'cycles', Event count (approx.): 203974259359
Overhead  Shared Object         Symbol                 IPC  [IPC Coverage]
-  61.11%  libbfd-2.34-arm64.so  [.] bfd_flavour_name   -    -
     bfd_flavour_name
-   6.55%  libbfd-2.34-arm64.so  [.] bfd_hash_traverse  -    -

Just like you said, it is not ready for prime time, so please ignore this
patch :(

>
>         M.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.6-WIP
> [2] https://paste.debian.net/1286106/
>

On 2023/7/17 17:24, Will Deacon wrote:
> On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function                                     old     new   delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data                                         old     new   delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data                                      old     new   delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>>     text     data    bss      dec     hex filename
>> 25433734 15385766 630656 41450156 2787aac vmlinux-base
>> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new:  8085236K/8388608K
>
> Is that a 0.009% improvement? Is it really worth the hassle?
>
> x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
> like we're just creating a rod for our own back by selecting it.

I tested this patch and found that the smaller the config, the larger the
relative reduction in build size. This may be useful in scenarios such as
embedded systems, where size is particularly critical.

As with selecting CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V,
this boots well on qemu. With defconfig it shrinks the build by ~1.6%, and
with tinyconfig it shrinks the build by ~18.7%.

defconfig:
    text     data    bss      dec     hex
26839348 16695234 629456 44164038 2a1e3c6  before
26140556 16667058 628880 43436494 296c9ce  after

tinyconfig:
   text   data    bss     dec    hex
1259568 272100 104312 1635980 18f68c  before
 967056 258716 103824 1329596 1449bc  after

        | tinyconfig       | defconfig
--------|------------------|----------------
No DCE  | 1635980          | 44164038
DCE     | 1329596          | 43436494
Shrink  | 306384 (~18.7%)  | 727544 (~1.6%)

>
> Will
>

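The shrink figures in the summary table follow directly from the `size` rows. As a rough cross-check (not part of the original mail), the sketch below recomputes them; `SizeRow` and `shrink` are hypothetical helper names introduced only for this illustration, and the percentage is taken relative to the build without DCE.

```python
from dataclasses import dataclass


@dataclass
class SizeRow:
    """One row of `size` output: text, data and bss in bytes."""
    text: int
    data: int
    bss: int

    @property
    def dec(self) -> int:
        # `size` reports dec as the sum of the three columns.
        return self.text + self.data + self.bss


def shrink(before: SizeRow, after: SizeRow):
    """Return (bytes saved, percent saved relative to `before`)."""
    saved = before.dec - after.dec
    return saved, 100 * saved / before.dec


# Numbers copied from the tables above: (before, after) per config.
builds = {
    "tinyconfig": (SizeRow(1259568, 272100, 104312),
                   SizeRow(967056, 258716, 103824)),
    "defconfig": (SizeRow(26839348, 16695234, 629456),
                  SizeRow(26140556, 16667058, 628880)),
}

results = {name: shrink(b, a) for name, (b, a) in builds.items()}
for name, (saved, pct) in results.items():
    print(f"{name}: {saved} bytes ({pct:.1f}%)")
```

The recomputed values agree with the table: 306384 bytes (18.7%) for tinyconfig and 727544 bytes (1.6%) for defconfig.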
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a2511b30d0f6..73bb908ec62f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,6 +148,7 @@ config ARM64
 	select GENERIC_VDSO_TIME_NS
 	select HARDIRQS_SW_RESEND
 	select HAS_IOPORT
+	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
 	select HAVE_MOVE_PMD
 	select HAVE_MOVE_PUD
 	select HAVE_PCI
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3cd7e76cc562..bb4ce6cd6896 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -238,7 +238,7 @@ SECTIONS
 	. = ALIGN(4);
 	.altinstructions : {
 		__alt_instructions = .;
-		*(.altinstructions)
+		KEEP(*(.altinstructions))
 		__alt_instructions_end = .;
 	}
 
@@ -258,8 +258,9 @@ SECTIONS
 		INIT_CALLS
 		CON_INITCALL
 		INIT_RAM_FS
-		*(.init.altinstructions .init.bss)	/* from the EFI stub */
+		KEEP(*(.init.altinstructions .init.bss*))	/* from the EFI stub */
 	}
+
 	.exit.data : {
 		EXIT_DATA
 	}

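To illustrate why the linker script needs KEEP once sections are garbage collected, here is a toy model of the mark phase. It is a simplification written for this note and not how ld is implemented: sections are nodes, references between them are edges, and a section survives if it is reachable from a root or is pinned by KEEP. The section names other than `.altinstructions` and the reference graph itself are invented for the example.

```python
def gc_sections(refs, roots, keep=()):
    """Return the set of sections that survive a simple mark phase.

    refs  : dict mapping a section to the sections it references
    roots : sections that are always live (e.g. the entry point)
    keep  : sections pinned by KEEP() in the linker script
    """
    live = set()
    stack = list(roots) + list(keep)
    while stack:
        sec = stack.pop()
        if sec in live:
            continue
        live.add(sec)
        # Everything a live section references is live as well.
        stack.extend(refs.get(sec, ()))
    return live


# Hypothetical layout: no code references the .altinstructions table
# directly; it is located through start/end symbols defined in the
# linker script, so the collector sees no edge pointing at it.
refs = {
    ".text.start": [".text.used"],
    ".text.used": [],
    ".text.unused": [],
    ".altinstructions": [".text.alt_replacement"],
    ".text.alt_replacement": [],
}

without_keep = gc_sections(refs, roots=[".text.start"])
with_keep = gc_sections(refs, roots=[".text.start"], keep=[".altinstructions"])

print(sorted(without_keep))
print(sorted(with_keep))
```

In this model the table and everything it points to are discarded unless the table is pinned, while genuinely unused code is dropped in both cases.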
Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
user to enable dead code elimination. In order for this to work, ensure
that we keep the necessary tables by annotating them with KEEP. This also
requires further changes to the linker script to KEEP some tables and to
wildcard compiler-generated sections into the right place.

The following comparison is based on 6.5-rc2 with defconfig:

$ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
Function                                     old     new   delta
...
Total: Before=17888959, After=17824683, chg -0.36%

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
Data                                         old     new   delta
...
Total: Before=4820808, After=4820764, chg -0.00%

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
RO Data                                      old     new   delta
...
Total: Before=5179123, After=5178027, chg -0.02%

$ size vmlinux-base vmlinux
    text     data    bss      dec     hex filename
25433734 15385766 630656 41450156 2787aac vmlinux-base
24756738 15360870 629888 40747496 26dc1e8 vmlinux-new

Memory available after booting on qemu (704K saved):
base: 8084532K/8388608K
new:  8085236K/8388608K

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/Kconfig              | 1 +
 arch/arm64/kernel/vmlinux.lds.S | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)
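The bloat-o-meter summary lines in the commit message are internally consistent, which can be checked with a few lines of arithmetic. This is only a sanity-check sketch; `check_bloat_line` is a hypothetical helper name, and the inputs are the up/down and Before/After values copied from the output above.

```python
def check_bloat_line(up, down, before, after):
    """Return (net delta, percent change) after verifying the totals agree."""
    net = up + down  # 'down' is already negative in bloat-o-meter output
    if after - before != net:
        raise ValueError("Before/After totals do not match up/down")
    return net, 100 * net / before


# (up, down, Before, After) for each symbol class reported above.
rows = {
    "Function": (35704, -99980, 17888959, 17824683),
    "Data": (0, -44, 4820808, 4820764),
    "RO Data": (0, -1096, 5179123, 5178027),
}

summary = {name: check_bloat_line(*vals) for name, vals in rows.items()}
for name, (net, pct) in summary.items():
    print(f"{name}: {net} bytes, chg {pct:.2f}%")
```

The Data change is so small that it rounds to -0.00% at two decimal places, matching the reported output.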