Message ID | 20240220081527.23408-1-liuyuntao12@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [-next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION | expand |
On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote: > The current arm32 architecture does not yet support the > HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in > embedded scenarios, and enabling this feature would be beneficial for > reducing the size of the kernel image. > > In order to make this work, we keep the necessary tables by annotating > them with KEEP, also it requires further changes to linker script to KEEP > some tables and wildcard compiler generated sections into the right place. Thanks for the patch, I think this is a very useful feature and we should get this upstream, especially if we can combine it with CONFIG_LTO_CLANG (which is supported on arm64 at the moment, but not on arm). > It boots normally with defconfig, vexpress_defconfig and tinyconfig. > > The size comparison of zImage is as follows: > defconfig vexpress_defconfig tinyconfig > 5137712 5138024 424192 no dce > 5032560 4997824 298384 dce > 2.0% 2.7% 29.7% shrink > > When using smaller config file, there is a significant reduction in the > size of the zImage. > > We also tested this patch on a commercially available single-board > computer, and the comparison is as follows: > a15eb_config > 2161384 no dce > 2092240 dce > 3.2% shrink > > The zImage size has been reduced by approximately 3.2%, which is 70KB on > 2.1M. Nice description! I do suspect that there will be additional bugs that we run into with some corner cases. > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S > b/arch/arm/boot/compressed/vmlinux.lds.S > index 3fcb3e62dc56..da21244aa892 100644 > --- a/arch/arm/boot/compressed/vmlinux.lds.S > +++ b/arch/arm/boot/compressed/vmlinux.lds.S > @@ -89,7 +89,7 @@ SECTIONS > * The EFI stub always executes from RAM, and runs strictly before > the > * decompressor, so we can make an exception for its r/w data, and > keep it > */ > - *(.data.efistub .bss.efistub) > + *(.data.* .bss.*) > __pecoff_data_end = .; > > /* This doesn't seem right to me, or maybe I misunderstand what the original version does. Have you tested with both CONFIG_EFI_STUB on and off, and booting with and without UEFI? If I read this right, you would move all .data and .bss into the stub here, not just the parts we actually want? > diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S > index bd9127c4b451..de373c6c2ae8 100644 > --- a/arch/arm/kernel/vmlinux.lds.S > +++ b/arch/arm/kernel/vmlinux.lds.S > @@ -74,7 +74,7 @@ SECTIONS > . = ALIGN(4); > __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { > __start___ex_table = .; > - ARM_MMU_KEEP(*(__ex_table)) > + ARM_MMU_KEEP(KEEP(*(__ex_table))) > __stop___ex_table = .; > } > > @@ -116,7 +116,7 @@ SECTIONS > #endif > .init.pv_table : { > __pv_table_begin = .; > - *(.pv_table) > + KEEP(*(.pv_table)) > __pv_table_end = .; > } I guess this prevents discarding any function that has a reference from pv_table or ex_table, even if there are no other references, right? I don't know how to solve this other than forcing all the uaccess and virt_to_phys functions to be out of line helpers. For uaccess, there are probably very few functions that need this, so it should make little difference. You might want to try changing CONFIG_ARM_PATCH_PHYS_VIRT into a method that just always adds an offset from C code instead of the boot time patching. That way the code would be a bit less efficient but you might be able to get a larger size reduction by dropping additional unused code. Maybe test your patch both with and without ARM_PATCH_PHYS_VIRT to see what the best-case impact would be. Arnd
在 2024/2/20 16:40, Arnd Bergmann 写道: > On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote: >> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S >> b/arch/arm/boot/compressed/vmlinux.lds.S >> index 3fcb3e62dc56..da21244aa892 100644 >> --- a/arch/arm/boot/compressed/vmlinux.lds.S >> +++ b/arch/arm/boot/compressed/vmlinux.lds.S >> @@ -89,7 +89,7 @@ SECTIONS >> * The EFI stub always executes from RAM, and runs strictly before >> the >> * decompressor, so we can make an exception for its r/w data, and >> keep it >> */ >> - *(.data.efistub .bss.efistub) >> + *(.data.* .bss.*) >> __pecoff_data_end = .; >> >> /* > > This doesn't seem right to me, or maybe I misunderstand what > the original version does. Have you tested with both > CONFIG_EFI_STUB on and off, and booting with and without > UEFI? Yes, I have tested with CONFIG_EFI_STUB on and off, and booting with UEFI on a single-board computer, and it boots well. > > If I read this right, you would move all .data and .bss > into the stub here, not just the parts we actually want? In the file "drivers/firmware/efi/libstub/Makefile", it is written: --- # # ARM discards the .data section because it disallows r/w data in the # decompressor. So move our .data to .data.efistub and .bss to .bss.efistub, # which are preserved explicitly by the decompressor linker script. # STUBCOPY_FLAGS-$(CONFIG_ARM) += --rename-section .data=.data.efistub \ --rename-section .bss=.bss.efistub,load,alloc --- I think that .data.efistub represents the entire .data section, the same applies to .bss as well, so i move all .data and .bss into the stub here. > >> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S >> index bd9127c4b451..de373c6c2ae8 100644 >> --- a/arch/arm/kernel/vmlinux.lds.S >> +++ b/arch/arm/kernel/vmlinux.lds.S >> @@ -74,7 +74,7 @@ SECTIONS >> . = ALIGN(4); >> __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { >> __start___ex_table = .; >> - ARM_MMU_KEEP(*(__ex_table)) >> + ARM_MMU_KEEP(KEEP(*(__ex_table))) >> __stop___ex_table = .; >> } >> >> @@ -116,7 +116,7 @@ SECTIONS >> #endif >> .init.pv_table : { >> __pv_table_begin = .; >> - *(.pv_table) >> + KEEP(*(.pv_table)) >> __pv_table_end = .; >> } > > I guess this prevents discarding any function that has a reference > from pv_table or ex_table, even if there are no other references, > right? Indeed so, if not keep ex_table, the compilation process will result in an error: no __ex_table in file: vmlinux Failed to sort kernel tables and if not keep pv_table, It can be compiled successfully, but the QEMU boots will fail. > > I don't know how to solve this other than forcing all the > uaccess and virt_to_phys functions to be out of line > helpers. For uaccess, there are probably very few functions > that need this, so it should make little difference. > > You might want to try changing CONFIG_ARM_PATCH_PHYS_VIRT > into a method that just always adds an offset from C code > instead of the boot time patching. That way the code would > be a bit less efficient but you might be able to get > a larger size reduction by dropping additional unused code. > > Maybe test your patch both with and without > ARM_PATCH_PHYS_VIRT to see what the best-case impact would > be. > This is a very good idea, I will give it a try. > Arnd
On Tue, Feb 20, 2024, at 10:53, liuyuntao (F) wrote: > 在 2024/2/20 16:40, Arnd Bergmann 写道: >> On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote: > # > # ARM discards the .data section because it disallows r/w data in the > # decompressor. So move our .data to .data.efistub and .bss to .bss.efistub, > # which are preserved explicitly by the decompressor linker script. > # > STUBCOPY_FLAGS-$(CONFIG_ARM) += --rename-section .data=.data.efistub \ > --rename-section .bss=.bss.efistub,load,alloc > > --- > > I think that .data.efistub represents the entire .data section, the same > applies to .bss as well, > > so i move all .data and .bss into the stub here. > Ok, I see. >> >> I guess this prevents discarding any function that has a reference >> from pv_table or ex_table, even if there are no other references, >> right? > > Indeed so, if not keep ex_table, the compilation process will result in > > an error: > > no __ex_table in file: vmlinux > > Failed to sort kernel tables Sure, and without the ex_table contents, it would not be able to recover from a failed uaccess either. > and if not keep pv_table, It can be compiled successfully, but the QEMU > boots will fail. Right. The pv_table isn't technically necessary since it can be disabled. I think it was originally introduced in order to avoid performance regressions when we introduced multiplatform kernels that can run at arbitrary physical addresses rather than having it as a build-time constant. I don't know how much difference that actually makes for performance, so turning it into a normal runtime lookup may or may not be a good compromise when building with HAVE_LD_DEAD_CODE_DATA_ELIMINATION. I have given your patch some build testing with random configurations in my build setup and it seems to work fine with gcc/binutils, but unfortunately I came across a link failure using clang/lld: ld.lld: error: ./arch/arm/kernel/vmlinux.lds:35: ( expected, but got } >>> __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } .vectors.bhb.loop8 { KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); __vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); __vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); __vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = __vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); __stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors)); >>> ^ I don't immediately see what the problem is here, I hope you can address it before you send a v2. Arnd
On Wed, Feb 21, 2024, at 16:51, Arnd Bergmann wrote: > I have given your patch some build testing with random > configurations in my build setup and it seems to work > fine with gcc/binutils, but unfortunately I came across > a link failure using clang/lld: I ran into another bug now, this time with CONFIG_XIP_KERNEL=y: no __ex_table in file: vmlinux Failed to sort kernel tables make[4]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1 Essentially you have to modify arch/arm/kernel/vmlinux-xip.lds.S the same way as vmlinux.lds.S: --- a/arch/arm/kernel/vmlinux-xip.lds.S +++ b/arch/arm/kernel/vmlinux-xip.lds.S @@ -63,7 +63,7 @@ SECTIONS . = ALIGN(4); __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { __start___ex_table = .; - ARM_MMU_KEEP(*(__ex_table)) + ARM_MMU_KEEP(KEEP(*(__ex_table))) __stop___ex_table = .; } @@ -83,7 +83,7 @@ SECTIONS } .init.arch.info : { __arch_info_begin = .; - *(.arch.info.init) + KEEP(*(.arch.info.init)) __arch_info_end = .; } .init.tagtable : { The pv_table is not needed for XIP_KERNEL=y because that requires not patching the kernel. Arnd
On 2024/2/21 23:51, Arnd Bergmann wrote: > I have given your patch some build testing with random > configurations in my build setup and it seems to work > fine with gcc/binutils, but unfortunately I came across > a link failure using clang/lld: > > ld.lld: error: ./arch/arm/kernel/vmlinux.lds:35: ( expected, but got } >>>> __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } .vectors.bhb.loop8 { KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); __vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); __vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); __vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = __vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); __stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors)); >>>> ^ > > I don't immediately see what the problem is here, I hope you > can address it before you send a v2. > > Arnd I reproduced this issue with make LLVM=1 ARCH=arm -j320 > ../make.txt. Based on the position of the caret, I speculate that the issue arises from lld's inability to recognize the KEEP() command preceding the OUTPUT section in the OVERLAY command. > The full syntax of the OVERLAY command is as follows: OVERLAY [start] : [NOCROSSREFS] [AT ( ldaddr )] { secname1 { output-section-command output-section-command … } [:phdr…] [=fill] secname2 { output-section-command output-section-command … } [:phdr…] [=fill] … } [>region] [:phdr…] [=fill] [,] > I attempted to modify KEEP(*(.vectors)) to KEEP(*(.vectors))(.vectors), and received the following error message: ld.lld: error: ./arch/arm/kernel/vmlinux.lds:36: ( expected, but got } > __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors))(.vectors) } .vectors.bhb.loop8 { KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); __vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); __vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); __vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = __vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); __stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors)); > ^ The position of the caret has been moved below the right brace of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating the entire `KEEP(*(.vectors))` as a file name. This could potentially be a bug in lld. Perhaps we can temporarily enable the DCE option only when option LD_IS_LLD is disabled, like risc-v: `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
On 2024/2/22 17:52, Arnd Bergmann wrote: > On Wed, Feb 21, 2024, at 16:51, Arnd Bergmann wrote: >> I have given your patch some build testing with random >> configurations in my build setup and it seems to work >> fine with gcc/binutils, but unfortunately I came across >> a link failure using clang/lld: > > I ran into another bug now, this time with CONFIG_XIP_KERNEL=y: > > no __ex_table in file: vmlinux > Failed to sort kernel tables > make[4]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1 > > Essentially you have to modify arch/arm/kernel/vmlinux-xip.lds.S > the same way as vmlinux.lds.S: > Thanks a lot. I didn't consider this situation. I will take your advice. Thanks again. > --- a/arch/arm/kernel/vmlinux-xip.lds.S > +++ b/arch/arm/kernel/vmlinux-xip.lds.S > @@ -63,7 +63,7 @@ SECTIONS > . = ALIGN(4); > __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { > __start___ex_table = .; > - ARM_MMU_KEEP(*(__ex_table)) > + ARM_MMU_KEEP(KEEP(*(__ex_table))) > __stop___ex_table = .; > } > > @@ -83,7 +83,7 @@ SECTIONS > } > .init.arch.info : { > __arch_info_begin = .; > - *(.arch.info.init) > + KEEP(*(.arch.info.init)) > __arch_info_end = .; > } > .init.tagtable : { > > > The pv_table is not needed for XIP_KERNEL=y because that > requires not patching the kernel. > > Arnd
On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote: > > The position of the caret has been moved below the right brace > of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating > the entire `KEEP(*(.vectors))` as a file name. This could potentially be > a bug in lld. Perhaps we can temporarily > enable the DCE option only when option LD_IS_LLD is disabled, > like risc-v: > > `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`. I would really like to see this working with lld if at all possible, as it allows the combination of gc-sections with lto and CONFIG_TRIM_UNUSED_KSYMS. I experimented with lld myself now and I did get a booting kernel even without the the KEEP() on the vectors. I also see that this is the only use of OVERLAY in the kernel, so I hope that we can find a way to make it work with existing lld after all, either without the KEEP or without the OVERLAY. Did you see any problems without the KEEP() on the vectors? Arnd
On 2024/2/23 0:04, Arnd Bergmann wrote: > On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote: >> >> The position of the caret has been moved below the right brace >> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating >> the entire `KEEP(*(.vectors))` as a file name. This could potentially be >> a bug in lld. Perhaps we can temporarily >> enable the DCE option only when option LD_IS_LLD is disabled, >> like risc-v: >> >> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`. > > I would really like to see this working with lld if at all > possible, as it allows the combination of gc-sections with > lto and CONFIG_TRIM_UNUSED_KSYMS. > > I experimented with lld myself now and I did get a booting > kernel even without the the KEEP() on the vectors. I also When the kernel booted up successfully, was config DCE enabled? > see that this is the only use of OVERLAY in the kernel, so > I hope that we can find a way to make it work with existing > lld after all, either without the KEEP or without the OVERLAY. Yeah, i will try other way to make it work. > > Did you see any problems without the KEEP() on the vectors? > > Arnd Without the KEEP(),disable DCE, qemu boots well, enable DCE,qemu hangs on startup. I experimented with lld using vexpress_defconfig.
On Fri, Feb 23, 2024, at 02:39, liuyuntao (F) wrote: > On 2024/2/23 0:04, Arnd Bergmann wrote: >> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote: >>> >>> The position of the caret has been moved below the right brace >>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating >>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be >>> a bug in lld. Perhaps we can temporarily >>> enable the DCE option only when option LD_IS_LLD is disabled, >>> like risc-v: >>> >>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`. >> >> I would really like to see this working with lld if at all >> possible, as it allows the combination of gc-sections with >> lto and CONFIG_TRIM_UNUSED_KSYMS. >> >> I experimented with lld myself now and I did get a booting >> kernel even without the the KEEP() on the vectors. I also > > When the kernel booted up successfully, was config DCE enabled? My mistake. I did have DCE enabled in the kernel I built, but ended up passing an older image to it in the end, and that did not boot. Arnd
On 2024/2/23 0:04, Arnd Bergmann wrote: > On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote: >> >> The position of the caret has been moved below the right brace >> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating >> the entire `KEEP(*(.vectors))` as a file name. This could potentially be >> a bug in lld. Perhaps we can temporarily >> enable the DCE option only when option LD_IS_LLD is disabled, >> like risc-v: >> >> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`. > > I would really like to see this working with lld if at all > possible, as it allows the combination of gc-sections with > lto and CONFIG_TRIM_UNUSED_KSYMS. > > I experimented with lld myself now and I did get a booting > kernel even without the the KEEP() on the vectors. I also > see that this is the only use of OVERLAY in the kernel, so > I hope that we can find a way to make it work with existing > lld after all, either without the KEEP or without the OVERLAY. > > Did you see any problems without the KEEP() on the vectors? > > Arnd Hi, Arnd. I have added a global symbol g_keep1 in .vectors, g_keep2 in .vectors.bhb.loop8 and g_keep3 in .vectors.bhb.bpiall respectively. I also added another section to reference these three global symbols, to prevent these sections from being removed during linking with --gc-sections. It worked,but there should be a better way to achieve it. the patch: diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h index f2ff79f740ab..d60f6e83a9f7 100644 --- a/arch/arm/include/asm/vmlinux.lds.h +++ b/arch/arm/include/asm/vmlinux.lds.h @@ -125,13 +125,13 @@ __vectors_lma = .; \ OVERLAY 0xffff0000 : NOCROSSREFS AT(__vectors_lma) { \ .vectors { \ - KEEP(*(.vectors)) \ + *(.vectors) \ } \ .vectors.bhb.loop8 { \ - KEEP(*(.vectors.bhb.loop8)) \ + *(.vectors.bhb.loop8) \ } \ .vectors.bhb.bpiall { \ - KEEP(*(.vectors.bhb.bpiall)) \ + *(.vectors.bhb.bpiall) \ } \ } \ ARM_LMA(__vectors, .vectors); \ diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index 6150a716828c..84536e805da0 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -1075,6 +1075,9 @@ THUMB( .reloc ., R_ARM_THM_PC12, .L__vector_swi ) W(b) vector_addrexcptn W(b) vector_irq W(b) vector_fiq + .global g_keep1 + g_keep1: + .word 0 #ifdef CONFIG_HARDEN_BRANCH_HISTORY .section .vectors.bhb.loop8, "ax", %progbits @@ -1088,6 +1091,9 @@ THUMB( .reloc ., R_ARM_THM_PC12, .L__vector_bhb_loop8_swi ) W(b) vector_addrexcptn W(b) vector_bhb_loop8_irq W(b) vector_bhb_loop8_fiq + .global g_keep2 + g_keep2: + .word 0 .section .vectors.bhb.bpiall, "ax", %progbits W(b) vector_rst @@ -1100,6 +1106,9 @@ THUMB( .reloc ., R_ARM_THM_PC12, .L__vector_bhb_bpiall_swi ) W(b) vector_addrexcptn W(b) vector_bhb_bpiall_irq W(b) vector_bhb_bpiall_fiq + .global g_keep3 + g_keep3: + .word 0 #endif .data @@ -1108,3 +1117,8 @@ THUMB( .reloc ., R_ARM_THM_PC12, .L__vector_bhb_bpiall_swi ) .globl cr_alignment cr_alignment: .space 4 + +.section .keep_vectors, "ax", %progbits + LDR r0, =g_keep1 + LDR r1, =g_keep2 + LDR r2, =g_keep3 diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index 01a887c1141a..5cdfb4ba3ac4 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -62,6 +62,7 @@ SECTIONS .text : { /* Real text segment */ _stext = .; /* Text and read-only data */ ARM_TEXT + KEEP(*(.keep_vectors)) } #ifdef CONFIG_DEBUG_ALIGN_RODATA
On 2024/2/27 16:06, liuyuntao (F) wrote: > > > On 2024/2/23 0:04, Arnd Bergmann wrote: >> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote: >>> >>> The position of the caret has been moved below the right brace >>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating >>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be >>> a bug in lld. Perhaps we can temporarily >>> enable the DCE option only when option LD_IS_LLD is disabled, >>> like risc-v: >>> >>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`. >> >> I would really like to see this working with lld if at all >> possible, as it allows the combination of gc-sections with >> lto and CONFIG_TRIM_UNUSED_KSYMS. > Another way to preserve .vector sections without using KEEP annotation. It boots well. How do you feel about this approach? and, could I submit a v2 patch? 1: Define ARM_VECTORS_TEXT to explicitly preserve .vectors section. > --- a/arch/arm/include/asm/vmlinux.lds.h > +++ b/arch/arm/include/asm/vmlinux.lds.h > @@ -87,6 +87,13 @@ > *(.vfp11_veneer) \ > *(.v4_bx) > > +#define ARM_VECTORS_TEXT \ > + .vectors.text : { \ > + KEEP(*(.vectors)) \ > + KEEP(*(.vectors.bhb.loop8)) \ > + KEEP(*(.vectors.bhb.bpiall)) \ > + } > + > #define ARM_TEXT \ > IDMAP_TEXT \ > __entry_text_start = .; \ 2: Ref ARM_VECTORS_TEXT if config HAVE_LD_DEAD_CODE_DATA_ELIMINATION is enabled, and the same to arch/arm/kernel/vmlinux.lds.S > --- a/arch/arm/kernel/vmlinux-xip.lds.S > +++ b/arch/arm/kernel/vmlinux-xip.lds.S > @@ -135,6 +135,10 @@ SECTIONS > ARM_TCM > #endif > > +#ifdef HAVE_LD_DEAD_CODE_DATA_ELIMINATION > + ARM_VECTORS_TEXT > +#endif > + > /* > * End of copied data. We need a dummy section to get its LMA. > * Also located before final ALIGN() as trailing padding is > not stored
On Thu, Mar 7, 2024, at 04:09, liuyuntao (F) wrote: > On 2024/2/27 16:06, liuyuntao (F) wrote: >> >> >> On 2024/2/23 0:04, Arnd Bergmann wrote: >>> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote: >>>> >>>> The position of the caret has been moved below the right brace >>>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating >>>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be >>>> a bug in lld. Perhaps we can temporarily >>>> enable the DCE option only when option LD_IS_LLD is disabled, >>>> like risc-v: >>>> >>>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`. >>> >>> I would really like to see this working with lld if at all >>> possible, as it allows the combination of gc-sections with >>> lto and CONFIG_TRIM_UNUSED_KSYMS. >> > Another way to preserve .vector sections without using KEEP annotation. > It boots well. How do you feel about this approach? and, could I submit > a v2 patch? Yes, if that works, please submit it as v2, we'll see if anyone has concerns about the new version then. Arnd
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 0af6709570d1..de78ceb821df 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -113,6 +113,7 @@ config ARM select HAVE_KERNEL_XZ select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M select HAVE_KRETPROBES if HAVE_KPROBES + select HAVE_LD_DEAD_CODE_DATA_ELIMINATION select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI select HAVE_OPTPROBES if !THUMB2_KERNEL diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S index 3fcb3e62dc56..da21244aa892 100644 --- a/arch/arm/boot/compressed/vmlinux.lds.S +++ b/arch/arm/boot/compressed/vmlinux.lds.S @@ -89,7 +89,7 @@ SECTIONS * The EFI stub always executes from RAM, and runs strictly before the * decompressor, so we can make an exception for its r/w data, and keep it */ - *(.data.efistub .bss.efistub) + *(.data.* .bss.*) __pecoff_data_end = .; /* @@ -125,7 +125,7 @@ SECTIONS . = BSS_START; __bss_start = .; - .bss : { *(.bss) } + .bss : { *(.bss .bss.*) } _end = .; . = ALIGN(8); /* the stack must be 64-bit aligned */ diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h index 4c8632d5c432..f2ff79f740ab 100644 --- a/arch/arm/include/asm/vmlinux.lds.h +++ b/arch/arm/include/asm/vmlinux.lds.h @@ -42,7 +42,7 @@ #define PROC_INFO \ . = ALIGN(4); \ __proc_info_begin = .; \ - *(.proc.info.init) \ + KEEP(*(.proc.info.init)) \ __proc_info_end = .; #define IDMAP_TEXT \ @@ -125,13 +125,13 @@ __vectors_lma = .; \ OVERLAY 0xffff0000 : NOCROSSREFS AT(__vectors_lma) { \ .vectors { \ - *(.vectors) \ + KEEP(*(.vectors)) \ } \ .vectors.bhb.loop8 { \ - *(.vectors.bhb.loop8) \ + KEEP(*(.vectors.bhb.loop8)) \ } \ .vectors.bhb.bpiall { \ - *(.vectors.bhb.bpiall) \ + KEEP(*(.vectors.bhb.bpiall)) \ } \ } \ ARM_LMA(__vectors, .vectors); \ diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index bd9127c4b451..de373c6c2ae8 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -74,7 +74,7 @@ SECTIONS . = ALIGN(4); __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { __start___ex_table = .; - ARM_MMU_KEEP(*(__ex_table)) + ARM_MMU_KEEP(KEEP(*(__ex_table))) __stop___ex_table = .; } @@ -99,7 +99,7 @@ SECTIONS } .init.arch.info : { __arch_info_begin = .; - *(.arch.info.init) + KEEP(*(.arch.info.init)) __arch_info_end = .; } .init.tagtable : { @@ -116,7 +116,7 @@ SECTIONS #endif .init.pv_table : { __pv_table_begin = .; - *(.pv_table) + KEEP(*(.pv_table)) __pv_table_end = .; }
The current arm32 architecture does not yet support the HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in embedded scenarios, and enabling this feature would be beneficial for reducing the size of the kernel image. In order to make this work, we keep the necessary tables by annotating them with KEEP, also it requires further changes to linker script to KEEP some tables and wildcard compiler generated sections into the right place. It boots normally with defconfig, vexpress_defconfig and tinyconfig. The size comparison of zImage is as follows: defconfig vexpress_defconfig tinyconfig 5137712 5138024 424192 no dce 5032560 4997824 298384 dce 2.0% 2.7% 29.7% shrink When using smaller config file, there is a significant reduction in the size of the zImage. We also tested this patch on a commercially available single-board computer, and the comparison is as follows: a15eb_config 2161384 no dce 2092240 dce 3.2% shrink The zImage size has been reduced by approximately 3.2%, which is 70KB on 2.1M. Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com> --- arch/arm/Kconfig | 1 + arch/arm/boot/compressed/vmlinux.lds.S | 4 ++-- arch/arm/include/asm/vmlinux.lds.h | 8 ++++---- arch/arm/kernel/vmlinux.lds.S | 6 +++--- 4 files changed, 10 insertions(+), 9 deletions(-)