diff mbox series

[-next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

Message ID 20240220081527.23408-1-liuyuntao12@huawei.com (mailing list archive)
State New, archived
Headers show
Series [-next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION | expand

Commit Message

Yuntao Liu Feb. 20, 2024, 8:15 a.m. UTC
The current arm32 architecture does not yet support the
HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in
embedded scenarios, and enabling this feature would be beneficial for
reducing the size of the kernel image.

In order to make this work, we keep the necessary tables by annotating
them with KEEP, also it requires further changes to linker script to KEEP
some tables and wildcard compiler generated sections into the right place.

It boots normally with defconfig, vexpress_defconfig and tinyconfig.

The size comparison of zImage is as follows:
defconfig       vexpress_defconfig      tinyconfig
5137712         5138024                 424192          no dce
5032560         4997824                 298384          dce
2.0%            2.7%                    29.7%           shrink

When using smaller config file, there is a significant reduction in the
size of the zImage.

We also tested this patch on a commercially available single-board
computer, and the comparison is as follows:
a15eb_config
2161384         no dce
2092240         dce
3.2%            shrink

The zImage size has been reduced by approximately 3.2%, which is 70KB on
2.1M.

Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com>
---
 arch/arm/Kconfig                       | 1 +
 arch/arm/boot/compressed/vmlinux.lds.S | 4 ++--
 arch/arm/include/asm/vmlinux.lds.h     | 8 ++++----
 arch/arm/kernel/vmlinux.lds.S          | 6 +++---
 4 files changed, 10 insertions(+), 9 deletions(-)

Comments

Arnd Bergmann Feb. 20, 2024, 8:40 a.m. UTC | #1
On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote:
> The current arm32 architecture does not yet support the
> HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in
> embedded scenarios, and enabling this feature would be beneficial for
> reducing the size of the kernel image.
>
> In order to make this work, we keep the necessary tables by annotating
> them with KEEP, also it requires further changes to linker script to KEEP
> some tables and wildcard compiler generated sections into the right place.

Thanks for the patch, I think this is a very useful feature
and we should get this upstream, especially if we can combine
it with CONFIG_LTO_CLANG (which is supported on arm64 at the
moment, but not on arm).

> It boots normally with defconfig, vexpress_defconfig and tinyconfig.
>
> The size comparison of zImage is as follows:
> defconfig       vexpress_defconfig      tinyconfig
> 5137712         5138024                 424192          no dce
> 5032560         4997824                 298384          dce
> 2.0%            2.7%                    29.7%           shrink
>
> When using smaller config file, there is a significant reduction in the
> size of the zImage.
>
> We also tested this patch on a commercially available single-board
> computer, and the comparison is as follows:
> a15eb_config
> 2161384         no dce
> 2092240         dce
> 3.2%            shrink
>
> The zImage size has been reduced by approximately 3.2%, which is 70KB on
> 2.1M.

Nice description! I do suspect that there will be additional
bugs that we run into with some corner cases.

> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S 
> b/arch/arm/boot/compressed/vmlinux.lds.S
> index 3fcb3e62dc56..da21244aa892 100644
> --- a/arch/arm/boot/compressed/vmlinux.lds.S
> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> @@ -89,7 +89,7 @@ SECTIONS
>       * The EFI stub always executes from RAM, and runs strictly before 
> the
>       * decompressor, so we can make an exception for its r/w data, and 
> keep it
>       */
> -    *(.data.efistub .bss.efistub)
> +    *(.data.* .bss.*)
>      __pecoff_data_end = .;
> 
>      /*

This doesn't seem right to me, or maybe I misunderstand what
the original version does. Have you tested with both
CONFIG_EFI_STUB on and off, and booting with and without
UEFI?

If I read this right, you would move all .data and .bss
into the stub here, not just the parts we actually want?

> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index bd9127c4b451..de373c6c2ae8 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -74,7 +74,7 @@ SECTIONS
>  	. = ALIGN(4);
>  	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
>  		__start___ex_table = .;
> -		ARM_MMU_KEEP(*(__ex_table))
> +		ARM_MMU_KEEP(KEEP(*(__ex_table)))
>  		__stop___ex_table = .;
>  	}
> 
> @@ -116,7 +116,7 @@ SECTIONS
>  #endif
>  	.init.pv_table : {
>  		__pv_table_begin = .;
> -		*(.pv_table)
> +		KEEP(*(.pv_table))
>  		__pv_table_end = .;
>  	}

I guess this prevents discarding any function that has a reference
from pv_table or ex_table, even if there are no other references,
right?

I don't know how to solve this other than forcing all the
uaccess and virt_to_phys functions to be out of line
helpers. For uaccess, there are probably very few functions
that need this, so it should make little difference.

You might want to try changing CONFIG_ARM_PATCH_PHYS_VIRT
into a method that just always adds an offset from C code
instead of the boot time patching. That way the code would
be a bit less efficient but you might be able to get
a larger size reduction by dropping additional unused code.

Maybe test your patch both with and without
ARM_PATCH_PHYS_VIRT to see what the best-case impact would
be.

      Arnd
Yuntao Liu Feb. 20, 2024, 9:53 a.m. UTC | #2
在 2024/2/20 16:40, Arnd Bergmann 写道:
> On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote:
>> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S
>> b/arch/arm/boot/compressed/vmlinux.lds.S
>> index 3fcb3e62dc56..da21244aa892 100644
>> --- a/arch/arm/boot/compressed/vmlinux.lds.S
>> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
>> @@ -89,7 +89,7 @@ SECTIONS
>>        * The EFI stub always executes from RAM, and runs strictly before
>> the
>>        * decompressor, so we can make an exception for its r/w data, and
>> keep it
>>        */
>> -    *(.data.efistub .bss.efistub)
>> +    *(.data.* .bss.*)
>>       __pecoff_data_end = .;
>>
>>       /*
> 
> This doesn't seem right to me, or maybe I misunderstand what
> the original version does. Have you tested with both
> CONFIG_EFI_STUB on and off, and booting with and without
> UEFI?

Yes, I have tested with CONFIG_EFI_STUB on and off, and booting with 
UEFI on a single-board computer, and it boots well.


> 
> If I read this right, you would move all .data and .bss
> into the stub here, not just the parts we actually want?

In the file "drivers/firmware/efi/libstub/Makefile", it is written:

---

#
# ARM discards the .data section because it disallows r/w data in the
# decompressor. So move our .data to .data.efistub and .bss to .bss.efistub,
# which are preserved explicitly by the decompressor linker script.
#
STUBCOPY_FLAGS-$(CONFIG_ARM)	+= --rename-section .data=.data.efistub	\
				   --rename-section .bss=.bss.efistub,load,alloc

---

I think that .data.efistub represents the entire .data section, the same 
applies to .bss as well,

so i move all .data and .bss into the stub here.


> 
>> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
>> index bd9127c4b451..de373c6c2ae8 100644
>> --- a/arch/arm/kernel/vmlinux.lds.S
>> +++ b/arch/arm/kernel/vmlinux.lds.S
>> @@ -74,7 +74,7 @@ SECTIONS
>>   	. = ALIGN(4);
>>   	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
>>   		__start___ex_table = .;
>> -		ARM_MMU_KEEP(*(__ex_table))
>> +		ARM_MMU_KEEP(KEEP(*(__ex_table)))
>>   		__stop___ex_table = .;
>>   	}
>>
>> @@ -116,7 +116,7 @@ SECTIONS
>>   #endif
>>   	.init.pv_table : {
>>   		__pv_table_begin = .;
>> -		*(.pv_table)
>> +		KEEP(*(.pv_table))
>>   		__pv_table_end = .;
>>   	}
> 
> I guess this prevents discarding any function that has a reference
> from pv_table or ex_table, even if there are no other references,
> right?

Indeed so, if not keep ex_table,  the compilation process will result in

an error:

     no __ex_table in file: vmlinux

     Failed to sort kernel tables

and if not keep pv_table, It can be compiled successfully, but the QEMU 
boots will fail.

> 
> I don't know how to solve this other than forcing all the
> uaccess and virt_to_phys functions to be out of line
> helpers. For uaccess, there are probably very few functions
> that need this, so it should make little difference.
> 
> You might want to try changing CONFIG_ARM_PATCH_PHYS_VIRT
> into a method that just always adds an offset from C code
> instead of the boot time patching. That way the code would
> be a bit less efficient but you might be able to get
> a larger size reduction by dropping additional unused code.
> 
> Maybe test your patch both with and without
> ARM_PATCH_PHYS_VIRT to see what the best-case impact would
> be.
> 

This is a very good idea, I will give it a try.

>        Arnd
Arnd Bergmann Feb. 21, 2024, 3:51 p.m. UTC | #3
On Tue, Feb 20, 2024, at 10:53, liuyuntao (F) wrote:
> 在 2024/2/20 16:40, Arnd Bergmann 写道:
>> On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote:
> #
> # ARM discards the .data section because it disallows r/w data in the
> # decompressor. So move our .data to .data.efistub and .bss to .bss.efistub,
> # which are preserved explicitly by the decompressor linker script.
> #
> STUBCOPY_FLAGS-$(CONFIG_ARM)	+= --rename-section .data=.data.efistub	\
> 				   --rename-section .bss=.bss.efistub,load,alloc
>
> ---
>
> I think that .data.efistub represents the entire .data section, the same 
> applies to .bss as well,
>
> so i move all .data and .bss into the stub here.
>

Ok, I see. 

>> 
>> I guess this prevents discarding any function that has a reference
>> from pv_table or ex_table, even if there are no other references,
>> right?
>
> Indeed so, if not keep ex_table,  the compilation process will result in
>
> an error:
>
>      no __ex_table in file: vmlinux
>
>      Failed to sort kernel tables

Sure, and without the ex_table contents, it would not be able
to recover from a failed uaccess either.

> and if not keep pv_table, It can be compiled successfully, but the QEMU 
> boots will fail.

Right. The pv_table isn't technically necessary since it can
be disabled.  I think it was originally introduced in order
to avoid performance regressions when we introduced multiplatform
kernels that can run at arbitrary physical addresses rather than
having it as a build-time constant.

I don't know how much difference that actually makes for performance,
so turning it into a normal runtime lookup may or may not be
a good compromise when building with HAVE_LD_DEAD_CODE_DATA_ELIMINATION.

I have given your patch some build testing with random
configurations in my build setup and it seems to work
fine with gcc/binutils, but unfortunately I came across
a link failure using clang/lld:

ld.lld: error: ./arch/arm/kernel/vmlinux.lds:35: ( expected, but got }
>>>  __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } .vectors.bhb.loop8 { KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); __vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); __vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); __vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = __vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); __stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors));
>>>                                                                                           ^

I don't immediately see what the problem is here, I hope you
can address it before you send a v2.

     Arnd
Arnd Bergmann Feb. 22, 2024, 9:52 a.m. UTC | #4
On Wed, Feb 21, 2024, at 16:51, Arnd Bergmann wrote:
> I have given your patch some build testing with random
> configurations in my build setup and it seems to work
> fine with gcc/binutils, but unfortunately I came across
> a link failure using clang/lld:

I ran into another bug now, this time with CONFIG_XIP_KERNEL=y:

no __ex_table in file: vmlinux
Failed to sort kernel tables
make[4]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1

Essentially you have to modify arch/arm/kernel/vmlinux-xip.lds.S
the same way as vmlinux.lds.S:

--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -63,7 +63,7 @@ SECTIONS
        . = ALIGN(4);
        __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
                __start___ex_table = .;
-               ARM_MMU_KEEP(*(__ex_table))
+               ARM_MMU_KEEP(KEEP(*(__ex_table)))
                __stop___ex_table = .;
        }
 
@@ -83,7 +83,7 @@ SECTIONS
        }
        .init.arch.info : {
                __arch_info_begin = .;
-               *(.arch.info.init)
+               KEEP(*(.arch.info.init))
                __arch_info_end = .;
        }
        .init.tagtable : {


The pv_table is not needed for XIP_KERNEL=y because that
requires not patching the kernel.

    Arnd
Yuntao Liu Feb. 22, 2024, 11:24 a.m. UTC | #5
On 2024/2/21 23:51, Arnd Bergmann wrote:
> I have given your patch some build testing with random
> configurations in my build setup and it seems to work
> fine with gcc/binutils, but unfortunately I came across
> a link failure using clang/lld:
> 
> ld.lld: error: ./arch/arm/kernel/vmlinux.lds:35: ( expected, but got }
>>>>   __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } .vectors.bhb.loop8 { KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); __vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); __vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); __vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = __vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); __stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors));
>>>>                                                                                            ^
> 
> I don't immediately see what the problem is here, I hope you
> can address it before you send a v2.
> 
>       Arnd

I reproduced this issue with make LLVM=1 ARCH=arm -j320 > ../make.txt.
Based on the position of the caret, I speculate that the issue arises from
lld's inability to recognize the KEEP() command preceding the OUTPUT
section in the OVERLAY command.
 >
The full syntax of the OVERLAY command is as follows: OVERLAY [start] : 
[NOCROSSREFS] [AT ( ldaddr )] { secname1 { output-section-command 
output-section-command … } [:phdr…] [=fill] secname2 { 
output-section-command output-section-command … } [:phdr…] [=fill] … } 
[>region] [:phdr…] [=fill] [,]
 >
I attempted to modify KEEP(*(.vectors)) to KEEP(*(.vectors))(.vectors),
and received the following error message:
ld.lld: error: ./arch/arm/kernel/vmlinux.lds:36: ( expected, but got } > 
__vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { 
KEEP(*(.vectors))(.vectors) } .vectors.bhb.loop8 { 
KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { 
KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); 
__vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); 
__vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); 
__vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + 
SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = 
LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = 
LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = 
__vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) + 
SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 
0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); 
__stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + 
SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - 
ADDR(.vectors)); > ^
The position of the caret has been moved below the right brace
of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
the entire `KEEP(*(.vectors))` as a file name. This could potentially be 
a bug in lld. Perhaps we can temporarily
enable the DCE option only when option LD_IS_LLD is disabled,
like risc-v:
`select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
Yuntao Liu Feb. 22, 2024, 11:32 a.m. UTC | #6
On 2024/2/22 17:52, Arnd Bergmann wrote:
> On Wed, Feb 21, 2024, at 16:51, Arnd Bergmann wrote:
>> I have given your patch some build testing with random
>> configurations in my build setup and it seems to work
>> fine with gcc/binutils, but unfortunately I came across
>> a link failure using clang/lld:
> 
> I ran into another bug now, this time with CONFIG_XIP_KERNEL=y:
> 
> no __ex_table in file: vmlinux
> Failed to sort kernel tables
> make[4]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1
> 
> Essentially you have to modify arch/arm/kernel/vmlinux-xip.lds.S
> the same way as vmlinux.lds.S:
>
Thanks a lot. I didn't consider this situation. I will take your advice. 
Thanks again.

> --- a/arch/arm/kernel/vmlinux-xip.lds.S
> +++ b/arch/arm/kernel/vmlinux-xip.lds.S
> @@ -63,7 +63,7 @@ SECTIONS
>          . = ALIGN(4);
>          __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
>                  __start___ex_table = .;
> -               ARM_MMU_KEEP(*(__ex_table))
> +               ARM_MMU_KEEP(KEEP(*(__ex_table)))
>                  __stop___ex_table = .;
>          }
>   
> @@ -83,7 +83,7 @@ SECTIONS
>          }
>          .init.arch.info : {
>                  __arch_info_begin = .;
> -               *(.arch.info.init)
> +               KEEP(*(.arch.info.init))
>                  __arch_info_end = .;
>          }
>          .init.tagtable : {
> 
> 
> The pv_table is not needed for XIP_KERNEL=y because that
> requires not patching the kernel.
> 
>      Arnd
Arnd Bergmann Feb. 22, 2024, 4:04 p.m. UTC | #7
On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>
> The position of the caret has been moved below the right brace
> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
> the entire `KEEP(*(.vectors))` as a file name. This could potentially be 
> a bug in lld. Perhaps we can temporarily
> enable the DCE option only when option LD_IS_LLD is disabled,
> like risc-v:
>
> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.

I would really like to see this working with lld if at all
possible, as it allows the combination of gc-sections with
lto and CONFIG_TRIM_UNUSED_KSYMS.

I experimented with lld myself now and I did get a booting
kernel even without the the KEEP() on the vectors. I also
see that this is the only use of OVERLAY in the kernel, so
I hope that we can find a way to make it work with existing
lld after all, either without the KEEP or without the OVERLAY.

Did you see any problems without the KEEP() on the vectors?

     Arnd
Yuntao Liu Feb. 23, 2024, 1:39 a.m. UTC | #8
On 2024/2/23 0:04, Arnd Bergmann wrote:
> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>
>> The position of the caret has been moved below the right brace
>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>> a bug in lld. Perhaps we can temporarily
>> enable the DCE option only when option LD_IS_LLD is disabled,
>> like risc-v:
>>
>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
> 
> I would really like to see this working with lld if at all
> possible, as it allows the combination of gc-sections with
> lto and CONFIG_TRIM_UNUSED_KSYMS.
> 
> I experimented with lld myself now and I did get a booting
> kernel even without the the KEEP() on the vectors. I also

When the kernel booted up successfully, was config DCE enabled?

> see that this is the only use of OVERLAY in the kernel, so
> I hope that we can find a way to make it work with existing
> lld after all, either without the KEEP or without the OVERLAY.

Yeah, i will try other way to make it work.

> 
> Did you see any problems without the KEEP() on the vectors?
> 
>       Arnd

Without the KEEP(),disable DCE, qemu boots well,
enable DCE,qemu hangs on startup.
I experimented with lld using vexpress_defconfig.
Arnd Bergmann Feb. 23, 2024, 6:32 a.m. UTC | #9
On Fri, Feb 23, 2024, at 02:39, liuyuntao (F) wrote:
> On 2024/2/23 0:04, Arnd Bergmann wrote:
>> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>>
>>> The position of the caret has been moved below the right brace
>>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>>> a bug in lld. Perhaps we can temporarily
>>> enable the DCE option only when option LD_IS_LLD is disabled,
>>> like risc-v:
>>>
>>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>> 
>> I would really like to see this working with lld if at all
>> possible, as it allows the combination of gc-sections with
>> lto and CONFIG_TRIM_UNUSED_KSYMS.
>> 
>> I experimented with lld myself now and I did get a booting
>> kernel even without the the KEEP() on the vectors. I also
>
> When the kernel booted up successfully, was config DCE enabled?

My mistake. I did have DCE enabled in the kernel I built,
but ended up passing an older image to it in the end,
and that did not boot.

     Arnd
Yuntao Liu Feb. 27, 2024, 8:06 a.m. UTC | #10
On 2024/2/23 0:04, Arnd Bergmann wrote:
> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>
>> The position of the caret has been moved below the right brace
>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>> a bug in lld. Perhaps we can temporarily
>> enable the DCE option only when option LD_IS_LLD is disabled,
>> like risc-v:
>>
>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
> 
> I would really like to see this working with lld if at all
> possible, as it allows the combination of gc-sections with
> lto and CONFIG_TRIM_UNUSED_KSYMS.
> 
> I experimented with lld myself now and I did get a booting
> kernel even without the the KEEP() on the vectors. I also
> see that this is the only use of OVERLAY in the kernel, so
> I hope that we can find a way to make it work with existing
> lld after all, either without the KEEP or without the OVERLAY.
> 
> Did you see any problems without the KEEP() on the vectors?
> 
>       Arnd

Hi, Arnd. I have added a global symbol g_keep1 in .vectors, g_keep2 in 
.vectors.bhb.loop8 and g_keep3 in .vectors.bhb.bpiall respectively. I 
also added another section to reference these three global symbols, to 
prevent these sections from being removed during linking with --gc-sections.

It worked,but there should be a better way to achieve it. the patch:

diff --git a/arch/arm/include/asm/vmlinux.lds.h 
b/arch/arm/include/asm/vmlinux.lds.h
index f2ff79f740ab..d60f6e83a9f7 100644
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -125,13 +125,13 @@
         __vectors_lma = .;                                              \
         OVERLAY 0xffff0000 : NOCROSSREFS AT(__vectors_lma) {            \
                 .vectors {                                              \
-                       KEEP(*(.vectors))                               \
+                       *(.vectors)                                     \
                 }                                                       \
                 .vectors.bhb.loop8 {                                    \
-                       KEEP(*(.vectors.bhb.loop8))                     \
+                       *(.vectors.bhb.loop8)                           \
                 }                                                       \
                 .vectors.bhb.bpiall {                                   \
-                       KEEP(*(.vectors.bhb.bpiall))                    \
+                       *(.vectors.bhb.bpiall)                          \
                 }                                                       \
         }                                                               \
         ARM_LMA(__vectors, .vectors);                                   \
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 6150a716828c..84536e805da0 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -1075,6 +1075,9 @@ THUMB(    .reloc  ., R_ARM_THM_PC12, 
.L__vector_swi               )
         W(b)    vector_addrexcptn
         W(b)    vector_irq
         W(b)    vector_fiq
+       .global g_keep1
+       g_keep1:
+               .word 0

  #ifdef CONFIG_HARDEN_BRANCH_HISTORY
         .section .vectors.bhb.loop8, "ax", %progbits
@@ -1088,6 +1091,9 @@ THUMB(    .reloc  ., R_ARM_THM_PC12, 
.L__vector_bhb_loop8_swi     )
         W(b)    vector_addrexcptn
         W(b)    vector_bhb_loop8_irq
         W(b)    vector_bhb_loop8_fiq
+       .global g_keep2
+       g_keep2:
+               .word 0

         .section .vectors.bhb.bpiall, "ax", %progbits
         W(b)    vector_rst
@@ -1100,6 +1106,9 @@ THUMB(    .reloc  ., R_ARM_THM_PC12, 
.L__vector_bhb_bpiall_swi    )
         W(b)    vector_addrexcptn
         W(b)    vector_bhb_bpiall_irq
         W(b)    vector_bhb_bpiall_fiq
+       .global g_keep3
+       g_keep3:
+               .word 0
  #endif

         .data
@@ -1108,3 +1117,8 @@ THUMB(    .reloc  ., R_ARM_THM_PC12, 
.L__vector_bhb_bpiall_swi    )
         .globl  cr_alignment
  cr_alignment:
         .space  4
+
+.section .keep_vectors, "ax", %progbits
+       LDR r0, =g_keep1
+       LDR r1, =g_keep2
+       LDR r2, =g_keep3
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 01a887c1141a..5cdfb4ba3ac4 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -62,6 +62,7 @@ SECTIONS
         .text : {                       /* Real text segment            */
                 _stext = .;             /* Text and read-only data      */
                 ARM_TEXT
+               KEEP(*(.keep_vectors))
         }

  #ifdef CONFIG_DEBUG_ALIGN_RODATA
Yuntao Liu March 7, 2024, 3:09 a.m. UTC | #11
On 2024/2/27 16:06, liuyuntao (F) wrote:
> 
> 
> On 2024/2/23 0:04, Arnd Bergmann wrote:
>> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>>
>>> The position of the caret has been moved below the right brace
>>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>>> a bug in lld. Perhaps we can temporarily
>>> enable the DCE option only when option LD_IS_LLD is disabled,
>>> like risc-v:
>>>
>>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>>
>> I would really like to see this working with lld if at all
>> possible, as it allows the combination of gc-sections with
>> lto and CONFIG_TRIM_UNUSED_KSYMS.
> 
Another way to preserve .vector sections without using KEEP annotation.
It boots well. How do you feel about this approach? and, could I submit 
a v2 patch?

1: Define ARM_VECTORS_TEXT to explicitly preserve .vectors section.

 > --- a/arch/arm/include/asm/vmlinux.lds.h
 > +++ b/arch/arm/include/asm/vmlinux.lds.h
 > @@ -87,6 +87,13 @@
 >                 *(.vfp11_veneer)                                \
 >                 *(.v4_bx)
 >
 > +#define ARM_VECTORS_TEXT                                       \
 > +       .vectors.text : {                                       \
 > +               KEEP(*(.vectors))                               \
 > +               KEEP(*(.vectors.bhb.loop8))                     \
 > +               KEEP(*(.vectors.bhb.bpiall))                    \
 > +       }
 > +
 >  #define ARM_TEXT                                               \
 >                 IDMAP_TEXT                                      \
 >                 __entry_text_start = .;                         \

2: Ref ARM_VECTORS_TEXT if config HAVE_LD_DEAD_CODE_DATA_ELIMINATION is 
enabled, and the same to arch/arm/kernel/vmlinux.lds.S

 > --- a/arch/arm/kernel/vmlinux-xip.lds.S 

 > +++ b/arch/arm/kernel/vmlinux-xip.lds.S 

 > @@ -135,6 +135,10 @@ SECTIONS 

 >         ARM_TCM 

 >  #endif 

 > 

 > +#ifdef HAVE_LD_DEAD_CODE_DATA_ELIMINATION 

 > +       ARM_VECTORS_TEXT 

 > +#endif 

 > + 

 >         /* 

 >          * End of copied data. We need a dummy section to get its 
LMA.
 >          * Also located before final ALIGN() as trailing padding is
 > not stored
Arnd Bergmann March 7, 2024, 7:29 a.m. UTC | #12
On Thu, Mar 7, 2024, at 04:09, liuyuntao (F) wrote:
> On 2024/2/27 16:06, liuyuntao (F) wrote:
>> 
>> 
>> On 2024/2/23 0:04, Arnd Bergmann wrote:
>>> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>>>
>>>> The position of the caret has been moved below the right brace
>>>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>>>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>>>> a bug in lld. Perhaps we can temporarily
>>>> enable the DCE option only when option LD_IS_LLD is disabled,
>>>> like risc-v:
>>>>
>>>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>>>
>>> I would really like to see this working with lld if at all
>>> possible, as it allows the combination of gc-sections with
>>> lto and CONFIG_TRIM_UNUSED_KSYMS.
>> 
> Another way to preserve .vector sections without using KEEP annotation.
> It boots well. How do you feel about this approach? and, could I submit 
> a v2 patch?

Yes, if that works, please submit it as v2, we'll see if anyone
has concerns about the new version then.

     Arnd
diff mbox series

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0af6709570d1..de78ceb821df 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -113,6 +113,7 @@  config ARM
 	select HAVE_KERNEL_XZ
 	select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M
 	select HAVE_KRETPROBES if HAVE_KPROBES
+	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_NMI
 	select HAVE_OPTPROBES if !THUMB2_KERNEL
diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
index 3fcb3e62dc56..da21244aa892 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.S
+++ b/arch/arm/boot/compressed/vmlinux.lds.S
@@ -89,7 +89,7 @@  SECTIONS
      * The EFI stub always executes from RAM, and runs strictly before the
      * decompressor, so we can make an exception for its r/w data, and keep it
      */
-    *(.data.efistub .bss.efistub)
+    *(.data.* .bss.*)
     __pecoff_data_end = .;
 
     /*
@@ -125,7 +125,7 @@  SECTIONS
 
   . = BSS_START;
   __bss_start = .;
-  .bss			: { *(.bss) }
+  .bss			: { *(.bss .bss.*) }
   _end = .;
 
   . = ALIGN(8);		/* the stack must be 64-bit aligned */
diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h
index 4c8632d5c432..f2ff79f740ab 100644
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -42,7 +42,7 @@ 
 #define PROC_INFO							\
 		. = ALIGN(4);						\
 		__proc_info_begin = .;					\
-		*(.proc.info.init)					\
+		KEEP(*(.proc.info.init))				\
 		__proc_info_end = .;
 
 #define IDMAP_TEXT							\
@@ -125,13 +125,13 @@ 
 	__vectors_lma = .;						\
 	OVERLAY 0xffff0000 : NOCROSSREFS AT(__vectors_lma) {		\
 		.vectors {						\
-			*(.vectors)					\
+			KEEP(*(.vectors))				\
 		}							\
 		.vectors.bhb.loop8 {					\
-			*(.vectors.bhb.loop8)				\
+			KEEP(*(.vectors.bhb.loop8))			\
 		}							\
 		.vectors.bhb.bpiall {					\
-			*(.vectors.bhb.bpiall)				\
+			KEEP(*(.vectors.bhb.bpiall))			\
 		}							\
 	}								\
 	ARM_LMA(__vectors, .vectors);					\
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index bd9127c4b451..de373c6c2ae8 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -74,7 +74,7 @@  SECTIONS
 	. = ALIGN(4);
 	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
 		__start___ex_table = .;
-		ARM_MMU_KEEP(*(__ex_table))
+		ARM_MMU_KEEP(KEEP(*(__ex_table)))
 		__stop___ex_table = .;
 	}
 
@@ -99,7 +99,7 @@  SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
@@ -116,7 +116,7 @@  SECTIONS
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}