[PATCH-next,v4] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

Message ID 20240316023932.700685-1-liuyuntao12@huawei.com (mailing list archive)
State New, archived

Commit Message

liuyuntao (F) March 16, 2024, 2:39 a.m. UTC
The arm32 architecture does not yet support the
HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in
embedded scenarios, where enabling this feature helps reduce the size
of the kernel image.

To make this work, the necessary tables are annotated with KEEP in the
linker scripts so that they are not discarded, and compiler-generated
sections are matched with wildcards so that they land in the right place.
ld.lld does not recognize KEEP within the OVERLAY command, so the
.vectors sections are kept using a concise method proposed by Ard.
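
The role of KEEP can be illustrated with a toy model of linker section
garbage collection (--gc-sections): the linker starts from a set of root
sections and discards every input section that is not reachable from them.
Tables such as .proc.info.init are only located through symbols defined in
the linker script, so nothing references them and they need KEEP. The sketch
below is Python, not linker code. The section graph and the helper
live_sections() are assumptions made for illustration, and it assumes that
the R_ARM_NONE relocation added in entry-armv.S behaves as an extra
reference from .text to .vectors.

```python
from collections import deque


def live_sections(refs, roots):
    """Return every section reachable from the roots (hypothetical helper)."""
    live = set(roots)
    queue = deque(roots)
    while queue:
        section = queue.popleft()
        for target in refs.get(section, ()):
            if target not in live:
                live.add(target)
                queue.append(target)
    return live


# Toy reference graph: section -> sections it refers to via relocations.
# The section names mirror the patch, but the edges are invented.
refs = {
    ".text": [".data"],
    ".text.unused_func": [".data.unused_var"],
    ".proc.info.init": [".text"],
    ".vectors": [".text"],
}
roots = [".text"]  # stands in for the entry point

# Without KEEP, the table is unreachable and would be discarded.
no_keep = live_sections(refs, roots)

# KEEP(*(.proc.info.init)) is modelled as an additional root.
with_keep = live_sections(refs, roots + [".proc.info.init"])

# ".reloc .text, R_ARM_NONE, ." is modelled as an edge .text -> .vectors.
refs_reloc = dict(refs)
refs_reloc[".text"] = refs[".text"] + [".vectors"]
with_reloc = live_sections(refs_reloc, roots)

print(sorted(no_keep))
print(sorted(with_keep))
print(sorted(with_reloc))
```

In this model, unreferenced sections such as .text.unused_func are dropped
in all three cases, which is where the size reduction comes from.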

It boots normally with defconfig, vexpress_defconfig and tinyconfig.

The size comparison of zImage (in bytes) is as follows:
            defconfig       vexpress_defconfig      tinyconfig
no dce      5137712         5138024                 424192
dce         5032560         4997824                 298384
shrink      2.0%            2.7%                    29.7%

With smaller configs, the reduction in zImage size is significant.

We also tested this patch on a commercially available single-board
computer, and the comparison is as follows:
            a15eb_config
no dce      2161384
dce         2092240
shrink      3.2%

The zImage size is reduced by approximately 3.2%, which is about 70 KB
out of 2.1 MB.
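
The shrink percentages above can be reproduced from the raw zImage byte
counts. The following is a minimal Python sketch. The helper name
shrink_percent() and the dictionary layout are illustrative and are not part
of the patch.

```python
# Recompute the zImage shrink figures from the byte counts quoted above.
# shrink_percent() is a hypothetical helper used only for illustration.

def shrink_percent(before: int, after: int) -> float:
    """Return the size reduction as a percentage of the original size."""
    return round(100.0 * (before - after) / before, 1)


# (size without DCE, size with DCE), in bytes, as reported in this message.
sizes = {
    "defconfig": (5137712, 5032560),
    "vexpress_defconfig": (5138024, 4997824),
    "tinyconfig": (424192, 298384),
    "a15eb_config": (2161384, 2092240),
}

results = {name: shrink_percent(b, a) for name, (b, a) in sizes.items()}
saved_bytes = {name: b - a for name, (b, a) in sizes.items()}

for name in sizes:
    print(f"{name}: {results[name]}% smaller, {saved_bytes[name]} bytes saved")
```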

Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
---
v4:
   - remove -fdata-sections flag from KBUILD_CFLAGS_KERNEL in drivers/firmware/efi/libstub/Makefile

v3:
   - A better way to KEEP .vectors section for ld.lld linking.
   - https://lore.kernel.org/all/20240315063154.696633-1-liuyuntao12@huawei.com/

v2:
   - Support config XIP_KERNEL.
   - Support LLVM compilation.
   - https://lore.kernel.org/all/20240307151231.654025-1-liuyuntao12@huawei.com/

v1: https://lore.kernel.org/all/20240220081527.23408-1-liuyuntao12@huawei.com/
---
 arch/arm/Kconfig                       | 1 +
 arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
 arch/arm/include/asm/vmlinux.lds.h     | 2 +-
 arch/arm/kernel/entry-armv.S           | 3 +++
 arch/arm/kernel/vmlinux-xip.lds.S      | 4 ++--
 arch/arm/kernel/vmlinux.lds.S          | 6 +++---
 drivers/firmware/efi/libstub/Makefile  | 4 ++++
 7 files changed, 15 insertions(+), 7 deletions(-)

Comments

Ard Biesheuvel March 19, 2024, 5:12 p.m. UTC | #1
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
liuyuntao (F) June 3, 2024, 12:55 p.m. UTC | #2
Gentle ping

Linus Walleij June 3, 2024, 1:47 p.m. UTC | #3
Hi liuyuntao,

On Mon, Jun 3, 2024 at 2:55 PM liuyuntao (F) <liuyuntao12@huawei.com> wrote:

> Gentle ping

FWIW:
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

I see you put versions into Russell's patch tracker, but please mark
these as "superseded" (you can do this in the web UI) and put in this
version of the patch based on v6.10-rc1 and tested. Basing on the
-rc1 is usually best for development work.

Yours,
Linus Walleij
liuyuntao (F) June 3, 2024, 2:15 p.m. UTC | #4
OK.
Thanks~

Geert Uytterhoeven June 11, 2024, 12:45 p.m. UTC | #5
Hi Yuntao,

Thanks for your patch, which is now commit ed0f941022515ff4 ("ARM:
9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION") in
arm/for-next (next-20240611).

I gave this a try on my custom configs for RSK+RZA1 (RZ/A1H)
and RZA2MEVB (RZ/A2M).  According to bloat-o-meter, enabling
HAVE_LD_DEAD_CODE_DATA_ELIMINATION reduced kernel size by almost
500 KiB (-8.3%).  The figures reported in "Memory: ... available"
were even more impressive: 1032 KiB more free memory than before.

As these boards have only 32 and 64 MiB of RAM, respectively, and some
products even use RZ/A1H with just the 10 MiB of on-chip SRAM, this is a
good improvement to have!
Thanks!

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert
liuyuntao (F) June 12, 2024, 1:54 a.m. UTC | #6
That's great, thanks for your testing.

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b14aed3a17ab..45f25f6e7a62 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -114,6 +114,7 @@  config ARM
 	select HAVE_KERNEL_XZ
 	select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M
 	select HAVE_KRETPROBES if HAVE_KPROBES
+	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_NMI
 	select HAVE_OPTPROBES if !THUMB2_KERNEL
diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
index 3fcb3e62dc56..d411abd4310e 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.S
+++ b/arch/arm/boot/compressed/vmlinux.lds.S
@@ -125,7 +125,7 @@  SECTIONS
 
   . = BSS_START;
   __bss_start = .;
-  .bss			: { *(.bss) }
+  .bss			: { *(.bss .bss.*) }
   _end = .;
 
   . = ALIGN(8);		/* the stack must be 64-bit aligned */
diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h
index 4c8632d5c432..d60f6e83a9f7 100644
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -42,7 +42,7 @@ 
 #define PROC_INFO							\
 		. = ALIGN(4);						\
 		__proc_info_begin = .;					\
-		*(.proc.info.init)					\
+		KEEP(*(.proc.info.init))				\
 		__proc_info_end = .;
 
 #define IDMAP_TEXT							\
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 6150a716828c..f01d23a220e6 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -1065,6 +1065,7 @@  vector_addrexcptn:
 	.globl	vector_fiq
 
 	.section .vectors, "ax", %progbits
+	.reloc  .text, R_ARM_NONE, .
 	W(b)	vector_rst
 	W(b)	vector_und
 ARM(	.reloc	., R_ARM_LDR_PC_G0, .L__vector_swi		)
@@ -1078,6 +1079,7 @@  THUMB(	.reloc	., R_ARM_THM_PC12, .L__vector_swi		)
 
 #ifdef CONFIG_HARDEN_BRANCH_HISTORY
 	.section .vectors.bhb.loop8, "ax", %progbits
+	.reloc  .text, R_ARM_NONE, .
 	W(b)	vector_rst
 	W(b)	vector_bhb_loop8_und
 ARM(	.reloc	., R_ARM_LDR_PC_G0, .L__vector_bhb_loop8_swi	)
@@ -1090,6 +1092,7 @@  THUMB(	.reloc	., R_ARM_THM_PC12, .L__vector_bhb_loop8_swi	)
 	W(b)	vector_bhb_loop8_fiq
 
 	.section .vectors.bhb.bpiall, "ax", %progbits
+	.reloc  .text, R_ARM_NONE, .
 	W(b)	vector_rst
 	W(b)	vector_bhb_bpiall_und
 ARM(	.reloc	., R_ARM_LDR_PC_G0, .L__vector_bhb_bpiall_swi	)
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index c16d196b5aad..5eddb75a7174 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -63,7 +63,7 @@  SECTIONS
 	. = ALIGN(4);
 	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
 		__start___ex_table = .;
-		ARM_MMU_KEEP(*(__ex_table))
+		ARM_MMU_KEEP(KEEP(*(__ex_table)))
 		__stop___ex_table = .;
 	}
 
@@ -83,7 +83,7 @@  SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index bd9127c4b451..de373c6c2ae8 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -74,7 +74,7 @@  SECTIONS
 	. = ALIGN(4);
 	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
 		__start___ex_table = .;
-		ARM_MMU_KEEP(*(__ex_table))
+		ARM_MMU_KEEP(KEEP(*(__ex_table)))
 		__stop___ex_table = .;
 	}
 
@@ -99,7 +99,7 @@  SECTIONS
 	}
 	.init.arch.info : {
 		__arch_info_begin = .;
-		*(.arch.info.init)
+		KEEP(*(.arch.info.init))
 		__arch_info_end = .;
 	}
 	.init.tagtable : {
@@ -116,7 +116,7 @@  SECTIONS
 #endif
 	.init.pv_table : {
 		__pv_table_begin = .;
-		*(.pv_table)
+		KEEP(*(.pv_table))
 		__pv_table_end = .;
 	}
 
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 31eb1e287ce1..619f1f9d9ba9 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -56,6 +56,10 @@  KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
 # disable LTO
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
 
+# The .data section would be renamed to .data.efistub, therefore, remove
+# `-fdata-sections` flag from KBUILD_CFLAGS_KERNEL
+KBUILD_CFLAGS_KERNEL := $(filter-out -fdata-sections, $(KBUILD_CFLAGS_KERNEL))
+
 GCOV_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n