diff mbox series

[RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

Message ID 20210225112122.2198845-1-arnd@kernel.org (mailing list archive)
State New, archived
Headers show
Series [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION | expand

Commit Message

Arnd Bergmann Feb. 25, 2021, 11:20 a.m. UTC
From: Arnd Bergmann <arnd@arndb.de>

When looking at kernel size optimizations, I found that arm64
does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
which enables the --gc-sections flag to the linker.

I see that for a defconfig build with llvm, there are some
notable improvements from enabling this, in particular when
combined with the recently added CONFIG_LTO_CLANG_THIN
and CONFIG_TRIM_UNUSED_KSYMS:

   text    data     bss     dec     hex filename
16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux

I needed a small change to the linker script to get clean randconfig
builds, but I have not done any meaningful boot testing on it to
see if it works. If there are no regressions, I wonder whether this
should be autmatically done for LTO builds, given that it improves
both kernel size and compile speed.

Link: https://lore.kernel.org/lkml/CAK8P3a05VZ9hSKRzVTxTn+1nf9E+gqebJWTj6N23nfm+ELHt9A@mail.gmail.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm64/Kconfig              | 1 +
 arch/arm64/kernel/vmlinux.lds.S | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

Kees Cook Feb. 25, 2021, 8:16 p.m. UTC | #1
On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> When looking at kernel size optimizations, I found that arm64
> does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
> which enables the --gc-sections flag to the linker.
> 
> I see that for a defconfig build with llvm, there are some
> notable improvements from enabling this, in particular when
> combined with the recently added CONFIG_LTO_CLANG_THIN
> and CONFIG_TRIM_UNUSED_KSYMS:
> 
>    text    data     bss     dec     hex filename
> 16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
> 16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
> 16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
> 16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
> 17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
> 16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
> 16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
> 15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux
> 
> I needed a small change to the linker script to get clean randconfig
> builds, but I have not done any meaningful boot testing on it to
> see if it works. If there are no regressions, I wonder whether this
> should be autmatically done for LTO builds, given that it improves
> both kernel size and compile speed.
> 
> Link: https://lore.kernel.org/lkml/CAK8P3a05VZ9hSKRzVTxTn+1nf9E+gqebJWTj6N23nfm+ELHt9A@mail.gmail.com/
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Reviewed-by: Kees Cook <keescook@chromium.org>
Sedat Dilek Feb. 26, 2021, 12:36 a.m. UTC | #2
On Thu, Feb 25, 2021 at 12:21 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> When looking at kernel size optimizations, I found that arm64
> does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
> which enables the --gc-sections flag to the linker.
>
> I see that for a defconfig build with llvm, there are some
> notable improvements from enabling this, in particular when
> combined with the recently added CONFIG_LTO_CLANG_THIN
> and CONFIG_TRIM_UNUSED_KSYMS:
>
>    text    data     bss     dec     hex filename
> 16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
> 16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
> 16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
> 16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
> 17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
> 16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
> 16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
> 15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux
>

Thanks for the numbers.
Does CONFIG_TRIM_UNUSED_KSYMS=y have an impact to the build-time (and
disc-usage - negative way means longer/bigger)?
Do you have any build-time for the above numbers?

BTW, is CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y setable for x86 (64bit)?
( Did not look or check for it. )

- Sedat -

> I needed a small change to the linker script to get clean randconfig
> builds, but I have not done any meaningful boot testing on it to
> see if it works. If there are no regressions, I wonder whether this
> should be autmatically done for LTO builds, given that it improves
> both kernel size and compile speed.
>
> Link: https://lore.kernel.org/lkml/CAK8P3a05VZ9hSKRzVTxTn+1nf9E+gqebJWTj6N23nfm+ELHt9A@mail.gmail.com/
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm64/Kconfig              | 1 +
>  arch/arm64/kernel/vmlinux.lds.S | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b94a678afce4..75e13cc52928 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2,6 +2,7 @@
>  config ARM64
>         def_bool y
>         select ACPI_CCA_REQUIRED if ACPI
> +       select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
>         select ACPI_GENERIC_GSI if ACPI
>         select ACPI_GTDT if ACPI
>         select ACPI_IORT if ACPI
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index bad2b9eaab22..926cdb597a45 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -217,7 +217,7 @@ SECTIONS
>                 INIT_CALLS
>                 CON_INITCALL
>                 INIT_RAM_FS
> -               *(.init.altinstructions .init.bss .init.bss.*)  /* from the EFI stub */
> +               *(.init.altinstructions .init.data.* .init.bss .init.bss.*)     /* from the EFI stub */
>         }
>         .exit.data : {
>                 EXIT_DATA
> --
> 2.29.2
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210225112122.2198845-1-arnd%40kernel.org.
Arnd Bergmann Feb. 26, 2021, 8:14 a.m. UTC | #3
On Fri, Feb 26, 2021 at 1:36 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Thu, Feb 25, 2021 at 12:21 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > When looking at kernel size optimizations, I found that arm64
> > does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
> > which enables the --gc-sections flag to the linker.
> >
> > I see that for a defconfig build with llvm, there are some
> > notable improvements from enabling this, in particular when
> > combined with the recently added CONFIG_LTO_CLANG_THIN
> > and CONFIG_TRIM_UNUSED_KSYMS:
> >
> >    text    data     bss     dec     hex filename
> > 16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
> > 16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
> > 16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
> > 16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
> > 17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
> > 16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
> > 16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
> > 15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux
> >
>
> Thanks for the numbers.
> Does CONFIG_TRIM_UNUSED_KSYMS=y have an impact to the build-time (and
> disc-usage - negative way means longer/bigger)?
> Do you have any build-time for the above numbers?

They are in the mailing list archive I linked to:

==== defconfig ====
     332.001786355 seconds time elapsed
    8599.464163000 seconds user
     676.919635000 seconds sys
==== trim_defconfig ====
     448.378576012 seconds time elapsed
   10735.489271000 seconds user
     964.006504000 seconds sys
==== gc_defconfig ====
     324.347492236 seconds time elapsed
    8465.785800000 seconds user
     614.899797000 seconds sys
==== gc+trim_defconfig ====
     429.188875620 seconds time elapsed
   10203.759658000 seconds user
     871.307973000 seconds sys
==== thinlto_defconfig ====
     389.793540200 seconds time elapsed
    9491.665320000 seconds user
     664.858109000 seconds sys
==== thinlto+trim_defconfig ====
     580.431820561 seconds time elapsed
   11429.515538000 seconds user
    1056.985745000 seconds sys
==== gc+thinlto_defconfig ====
     389.484364525 seconds time elapsed
    9473.831980000 seconds user
     675.057675000 seconds sys
==== gc+thinlto+trim_defconfig ====
     580.824912807 seconds time elapsed
   11433.650337000 seconds user
    1049.845569000 seconds sys

So HAVE_LD_DEAD_CODE_DATA_ELIMINATION is a small improvement
on build time (since it can spend less time linking), while
CONFIG_TRIM_UNUSED_KSYMS slows it down quite a bit. Combining
CONFIG_TRIM_UNUSED_KSYMS with CONFIG_THINLTO is really
slow because here most of the time is spent in the final link (especially
when you have many CPU cores to do the earlier bits quickly), but then
it does the link twice.

> BTW, is CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y setable for x86 (64bit)?
> ( Did not look or check for it. )

No, in mainline, HAVE_LD_DEAD_CODE_DATA_ELIMINATION is currently
only selected on MIPS and PowerPC. I only sent experimental patches to
enable it on arm64 and m68k, but have not tried booting them. If you
select the symbol on x86, you should see similar results.

       Arnd
Sedat Dilek Feb. 26, 2021, 9:05 a.m. UTC | #4
On Fri, Feb 26, 2021 at 9:14 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Fri, Feb 26, 2021 at 1:36 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Thu, Feb 25, 2021 at 12:21 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > From: Arnd Bergmann <arnd@arndb.de>
> > >
> > > When looking at kernel size optimizations, I found that arm64
> > > does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
> > > which enables the --gc-sections flag to the linker.
> > >
> > > I see that for a defconfig build with llvm, there are some
> > > notable improvements from enabling this, in particular when
> > > combined with the recently added CONFIG_LTO_CLANG_THIN
> > > and CONFIG_TRIM_UNUSED_KSYMS:
> > >
> > >    text    data     bss     dec     hex filename
> > > 16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
> > > 16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
> > > 16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
> > > 16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
> > > 17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
> > > 16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
> > > 16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
> > > 15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux
> > >
> >
> > Thanks for the numbers.
> > Does CONFIG_TRIM_UNUSED_KSYMS=y have an impact to the build-time (and
> > disc-usage - negative way means longer/bigger)?
> > Do you have any build-time for the above numbers?
>
> They are in the mailing list archive I linked to:
>
> ==== defconfig ====
>      332.001786355 seconds time elapsed
>     8599.464163000 seconds user
>      676.919635000 seconds sys
> ==== trim_defconfig ====
>      448.378576012 seconds time elapsed
>    10735.489271000 seconds user
>      964.006504000 seconds sys
> ==== gc_defconfig ====
>      324.347492236 seconds time elapsed
>     8465.785800000 seconds user
>      614.899797000 seconds sys
> ==== gc+trim_defconfig ====
>      429.188875620 seconds time elapsed
>    10203.759658000 seconds user
>      871.307973000 seconds sys
> ==== thinlto_defconfig ====
>      389.793540200 seconds time elapsed
>     9491.665320000 seconds user
>      664.858109000 seconds sys
> ==== thinlto+trim_defconfig ====
>      580.431820561 seconds time elapsed
>    11429.515538000 seconds user
>     1056.985745000 seconds sys
> ==== gc+thinlto_defconfig ====
>      389.484364525 seconds time elapsed
>     9473.831980000 seconds user
>      675.057675000 seconds sys
> ==== gc+thinlto+trim_defconfig ====
>      580.824912807 seconds time elapsed
>    11433.650337000 seconds user
>     1049.845569000 seconds sys
>

Thanks for the numbers Arnd.

> So HAVE_LD_DEAD_CODE_DATA_ELIMINATION is a small improvement
> on build time (since it can spend less time linking), while
> CONFIG_TRIM_UNUSED_KSYMS slows it down quite a bit. Combining
> CONFIG_TRIM_UNUSED_KSYMS with CONFIG_THINLTO is really
> slow because here most of the time is spent in the final link (especially
> when you have many CPU cores to do the earlier bits quickly), but then
> it does the link twice.
>

My first pre-v5.12-rc1 kernel-build was with Clang-ThinLTO enabled.
But with the next ones I jumped to Sami's Clang-CFI.

> > BTW, is CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y setable for x86 (64bit)?
> > ( Did not look or check for it. )
>
> No, in mainline, HAVE_LD_DEAD_CODE_DATA_ELIMINATION is currently
> only selected on MIPS and PowerPC. I only sent experimental patches to
> enable it on arm64 and m68k, but have not tried booting them. If you
> select the symbol on x86, you should see similar results.
>

OK, i see:

$ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/mips/
arch/mips/Kconfig:      select HAVE_LD_DEAD_CODE_DATA_ELIMINATION

$ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/powerpc/
arch/powerpc/Kconfig:   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION

So, I need to add this to arch/x86/Kconfig.

You happen to know if changes to arch/x86/kernel/vmlinux.lds.S
(sections) are needed?

Last question:
The last days I see a lot of fixes touching inlining with LLVM/Clang v13-git.
What git tag are you using?
What are your experiences?
Pending patches (kernel-side)?

I use:
$ /opt/llvm-toolchain/bin/clang --version
dileks clang version 13.0.0 (https://github.com/llvm/llvm-project.git
c465429f286f50e52a8d2b3b39f38344f3381cce)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm-toolchain/bin

My LLVM toolchain is ThinLTO+PGO optimized for Linux-kernel builds.

- Sedat -
Arnd Bergmann Feb. 26, 2021, 9:51 a.m. UTC | #5
On Fri, Feb 26, 2021 at 10:05 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> On Fri, Feb 26, 2021 at 9:14 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> > > BTW, is CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y setable for x86 (64bit)?
> > > ( Did not look or check for it. )
> >
> > No, in mainline, HAVE_LD_DEAD_CODE_DATA_ELIMINATION is currently
> > only selected on MIPS and PowerPC. I only sent experimental patches to
> > enable it on arm64 and m68k, but have not tried booting them. If you
> > select the symbol on x86, you should see similar results.
> >
>
> OK, i see:
>
> $ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/mips/
> arch/mips/Kconfig:      select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
>
> $ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/powerpc/
> arch/powerpc/Kconfig:   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
>
> So, I need to add this to arch/x86/Kconfig.
>
> You happen to know if changes to arch/x86/kernel/vmlinux.lds.S
> (sections) are needed?

No idea. I'm still debugging a possible regression on arm64, but both
issues I found for arm64 are specific to that architecture and won't
happen on x86. It's likely that something else breaks though.

> Last question:
> The last days I see a lot of fixes touching inlining with LLVM/Clang v13-git.
> What git tag are you using?
> What are your experiences?
> Pending patches (kernel-side)?
>
> I use:
> $ /opt/llvm-toolchain/bin/clang --version
> dileks clang version 13.0.0 (https://github.com/llvm/llvm-project.git
> c465429f286f50e52a8d2b3b39f38344f3381cce)

This is what I have on the build box:
Ubuntu clang version
13.0.0-++20210223104451+ebca13c66504-1~exp1~20210223095200.234

        Arnd
Sedat Dilek Feb. 26, 2021, 10:02 a.m. UTC | #6
On Fri, Feb 26, 2021 at 10:51 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Fri, Feb 26, 2021 at 10:05 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > On Fri, Feb 26, 2021 at 9:14 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > > > BTW, is CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y setable for x86 (64bit)?
> > > > ( Did not look or check for it. )
> > >
> > > No, in mainline, HAVE_LD_DEAD_CODE_DATA_ELIMINATION is currently
> > > only selected on MIPS and PowerPC. I only sent experimental patches to
> > > enable it on arm64 and m68k, but have not tried booting them. If you
> > > select the symbol on x86, you should see similar results.
> > >
> >
> > OK, i see:
> >
> > $ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/mips/
> > arch/mips/Kconfig:      select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> >
> > $ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/powerpc/
> > arch/powerpc/Kconfig:   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> >
> > So, I need to add this to arch/x86/Kconfig.
> >
> > You happen to know if changes to arch/x86/kernel/vmlinux.lds.S
> > (sections) are needed?
>
> No idea. I'm still debugging a possible regression on arm64, but both
> issues I found for arm64 are specific to that architecture and won't
> happen on x86. It's likely that something else breaks though.
>

My first experiences with gc were with the freetz project (mips
embedded router) and don't ask me what GCC version.

I will try with gc + trim + cfi later after my current build has finished.

> > Last question:
> > The last days I see a lot of fixes touching inlining with LLVM/Clang v13-git.
> > What git tag are you using?
> > What are your experiences?
> > Pending patches (kernel-side)?
> >
> > I use:
> > $ /opt/llvm-toolchain/bin/clang --version
> > dileks clang version 13.0.0 (https://github.com/llvm/llvm-project.git
> > c465429f286f50e52a8d2b3b39f38344f3381cce)
>
> This is what I have on the build box:
> Ubuntu clang version
> 13.0.0-++20210223104451+ebca13c66504-1~exp1~20210223095200.234
>

Distro-clang takes much longer here.
Selfmade stage1-only LLVM toolchain compiles here 10% faster.
cfi takes approx. 20% longer.
With trim + gc I suppose it will take much longer.

Let me test.
Will report later.

- Sedat -
Fangrui Song Feb. 26, 2021, 9:13 p.m. UTC | #7
On 2021-02-25, Arnd Bergmann wrote:
>From: Arnd Bergmann <arnd@arndb.de>
>
>When looking at kernel size optimizations, I found that arm64
>does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
>which enables the --gc-sections flag to the linker.
>
>I see that for a defconfig build with llvm, there are some
>notable improvements from enabling this, in particular when
>combined with the recently added CONFIG_LTO_CLANG_THIN
>and CONFIG_TRIM_UNUSED_KSYMS:
>
>   text    data     bss     dec     hex filename
>16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
>16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
>16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
>16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
>17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
>16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
>16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
>15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux
>
>I needed a small change to the linker script to get clean randconfig
>builds, but I have not done any meaningful boot testing on it to
>see if it works. If there are no regressions, I wonder whether this
>should be autmatically done for LTO builds, given that it improves
>both kernel size and compile speed.
>
>Link: https://lore.kernel.org/lkml/CAK8P3a05VZ9hSKRzVTxTn+1nf9E+gqebJWTj6N23nfm+ELHt9A@mail.gmail.com/
>Signed-off-by: Arnd Bergmann <arnd@arndb.de>

For folks who are interested in --gc-sections on metadata sections,
I want to bring you awareness of the implication of __start_/__stop_ symbols and C identifier name sections.
You can see https://github.com/ClangBuiltLinux/linux/issues/1307 for a summary.
(Its linked blog article has some examples.)

In the kernel linker scripts, most C identifier name sections begin with double-underscore __.
Some are surrounded by `KEEP(...)`, some are not.

* A `KEEP` keyword has GC root semantics and makes ld --gc-sections ineffectful.
* Without `KEEP`, __start_/__stop_ references from a live input section
   can unnecessarily retain all the associated C identifier name input
   sections. The new ld.lld option `-z start-stop-gc` can defeat this rule.

As an example, a __start___jump_table reference from a live section
causes all `__jump_table` input section to be retained, even if you
change `KEEP(__jump_table)` to `(__jump_table)`.
(If you change the symbol name from `__start_${section}` to something
else (e.g. `__start${section}`), the rule will not apply.)


There are a lot of KEEP usage. Perhaps some can be dropped to facilitate
ld --gc-sections.
Arnd Bergmann Feb. 27, 2021, 9:49 a.m. UTC | #8
On Fri, Feb 26, 2021 at 10:13 PM 'Fangrui Song' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> For folks who are interested in --gc-sections on metadata sections,
> I want to bring you awareness of the implication of __start_/__stop_ symbols and C identifier name sections.
> You can see https://github.com/ClangBuiltLinux/linux/issues/1307 for a summary.
> (Its linked blog article has some examples.)
>
> In the kernel linker scripts, most C identifier name sections begin with double-underscore __.
> Some are surrounded by `KEEP(...)`, some are not.
>
> * A `KEEP` keyword has GC root semantics and makes ld --gc-sections ineffectful.
> * Without `KEEP`, __start_/__stop_ references from a live input section
>    can unnecessarily retain all the associated C identifier name input
>    sections. The new ld.lld option `-z start-stop-gc` can defeat this rule.
>
> As an example, a __start___jump_table reference from a live section
> causes all `__jump_table` input section to be retained, even if you
> change `KEEP(__jump_table)` to `(__jump_table)`.
> (If you change the symbol name from `__start_${section}` to something
> else (e.g. `__start${section}`), the rule will not apply.)

I suspect the __start_* symbols are cargo-culted by many developers
copying stuff around between kernel linker scripts, that's certainly how I
approach making changes to it normally without a deeper understanding
of how the linker actually works or what the different bits of syntax mean
there.

I see the original vmlinux.lds linker script showed up in linux-2.1.23, and
it contained

+  . = ALIGN(16);               /* Exception table */
+  __start___ex_table = .;
+  __ex_table : { *(__ex_table) }
+  __stop___ex_table = .;
+
+  __start___ksymtab = .;       /* Kernel symbol table */
+  __ksymtab : { *(__ksymtab) }
+  __stop___ksymtab = .;

originally for arch/sparc, and shortly afterwards for i386. The magic
__ex_table section was first used in linux-2.1.7 without a linker
script. It's probably a good idea to try cleaning these up by using
non-magic start/stop symbols for all sections, and relying on KEEP()
instead where needed.

> There are a lot of KEEP usage. Perhaps some can be dropped to facilitate
> ld --gc-sections.

I see a lot of these were added by Nick Piggin (added to Cc) in this commit:

commit 266ff2a8f51f02b429a987d87634697eb0d01d6a
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Wed May 9 22:59:58 2018 +1000

    kbuild: Fix asm-generic/vmlinux.lds.h for LD_DEAD_CODE_DATA_ELIMINATION

    KEEP more tables, and add the function/data section wildcard to more
    section selections.

    This is a little ad-hoc at the moment, but kernel code should be moved
    to consistently use .text..x (note: double dots) for explicit sections
    and all references to it in the linker script can be made with
    TEXT_MAIN, and similarly for other sections.

    For now, let's see if major architectures move to enabling this option
    then we can do some refactoring passes. Otherwise if it remains unused
    or superseded by LTO, this may not be required.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

which apparently was intentionally cautious.

Unlike what Nick expected in his submission, I now think the annotations
will be needed for LTO just like they are for --gc-sections.

      Arnd
Nicholas Piggin March 1, 2021, 1:11 a.m. UTC | #9
Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
> On Fri, Feb 26, 2021 at 10:13 PM 'Fangrui Song' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
>>
>> For folks who are interested in --gc-sections on metadata sections,
>> I want to bring you awareness of the implication of __start_/__stop_ symbols and C identifier name sections.
>> You can see https://github.com/ClangBuiltLinux/linux/issues/1307 for a summary.
>> (Its linked blog article has some examples.)
>>
>> In the kernel linker scripts, most C identifier name sections begin with double-underscore __.
>> Some are surrounded by `KEEP(...)`, some are not.
>>
>> * A `KEEP` keyword has GC root semantics and makes ld --gc-sections ineffectful.
>> * Without `KEEP`, __start_/__stop_ references from a live input section
>>    can unnecessarily retain all the associated C identifier name input
>>    sections. The new ld.lld option `-z start-stop-gc` can defeat this rule.
>>
>> As an example, a __start___jump_table reference from a live section
>> causes all `__jump_table` input section to be retained, even if you
>> change `KEEP(__jump_table)` to `(__jump_table)`.
>> (If you change the symbol name from `__start_${section}` to something
>> else (e.g. `__start${section}`), the rule will not apply.)
> 
> I suspect the __start_* symbols are cargo-culted by many developers
> copying stuff around between kernel linker scripts, that's certainly how I
> approach making changes to it normally without a deeper understanding
> of how the linker actually works or what the different bits of syntax mean
> there.
> 
> I see the original vmlinux.lds linker script showed up in linux-2.1.23, and
> it contained
> 
> +  . = ALIGN(16);               /* Exception table */
> +  __start___ex_table = .;
> +  __ex_table : { *(__ex_table) }
> +  __stop___ex_table = .;
> +
> +  __start___ksymtab = .;       /* Kernel symbol table */
> +  __ksymtab : { *(__ksymtab) }
> +  __stop___ksymtab = .;
> 
> originally for arch/sparc, and shortly afterwards for i386. The magic
> __ex_table section was first used in linux-2.1.7 without a linker
> script. It's probably a good idea to try cleaning these up by using
> non-magic start/stop symbols for all sections, and relying on KEEP()
> instead where needed.
> 
>> There are a lot of KEEP usage. Perhaps some can be dropped to facilitate
>> ld --gc-sections.
> 
> I see a lot of these were added by Nick Piggin (added to Cc) in this commit:
> 
> commit 266ff2a8f51f02b429a987d87634697eb0d01d6a
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Wed May 9 22:59:58 2018 +1000
> 
>     kbuild: Fix asm-generic/vmlinux.lds.h for LD_DEAD_CODE_DATA_ELIMINATION
> 
>     KEEP more tables, and add the function/data section wildcard to more
>     section selections.
> 
>     This is a little ad-hoc at the moment, but kernel code should be moved
>     to consistently use .text..x (note: double dots) for explicit sections
>     and all references to it in the linker script can be made with
>     TEXT_MAIN, and similarly for other sections.
> 
>     For now, let's see if major architectures move to enabling this option
>     then we can do some refactoring passes. Otherwise if it remains unused
>     or superseded by LTO, this may not be required.
> 
>     Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>     Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> 
> which apparently was intentionally cautious.
> 
> Unlike what Nick expected in his submission, I now think the annotations
> will be needed for LTO just like they are for --gc-sections.

Yeah I wasn't sure exactly what LTO looks like or how it would work.
I thought perhaps LTO might be able to find dead code with circular / 
back references, we could put references from the code back to these 
tables or something so they would be kept without KEEP. I don't know, I 
was handwaving!

I managed to get powerpc (and IIRC x86?) working with gc sections with
those KEEP annotations, but effectiveness of course is far worse than 
what Nicolas was able to achieve with all his techniques and tricks.

But yes unless there is some other mechanism to handle these tables, 
then KEEP probably has to stay. I suggest this wants a very explicit and 
systematic way to handle it (maybe with some toolchain support) rather 
than trying to just remove things case by case and see what breaks.

I don't know if Nicolas is still been working on his shrinking patches
recenty but he probably knows more than anyone about this stuff.

Thanks,
Nick
Masahiro Yamada March 10, 2021, 8:49 p.m. UTC | #10
On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
> > On Fri, Feb 26, 2021 at 10:13 PM 'Fangrui Song' via Clang Built Linux
> > <clang-built-linux@googlegroups.com> wrote:
> >>
> >> For folks who are interested in --gc-sections on metadata sections,
> >> I want to bring you awareness of the implication of __start_/__stop_ symbols and C identifier name sections.
> >> You can see https://github.com/ClangBuiltLinux/linux/issues/1307 for a summary.
> >> (Its linked blog article has some examples.)
> >>
> >> In the kernel linker scripts, most C identifier name sections begin with double-underscore __.
> >> Some are surrounded by `KEEP(...)`, some are not.
> >>
> >> * A `KEEP` keyword has GC root semantics and makes ld --gc-sections ineffectful.
> >> * Without `KEEP`, __start_/__stop_ references from a live input section
> >>    can unnecessarily retain all the associated C identifier name input
> >>    sections. The new ld.lld option `-z start-stop-gc` can defeat this rule.
> >>
> >> As an example, a __start___jump_table reference from a live section
> >> causes all `__jump_table` input section to be retained, even if you
> >> change `KEEP(__jump_table)` to `(__jump_table)`.
> >> (If you change the symbol name from `__start_${section}` to something
> >> else (e.g. `__start${section}`), the rule will not apply.)
> >
> > I suspect the __start_* symbols are cargo-culted by many developers
> > copying stuff around between kernel linker scripts, that's certainly how I
> > approach making changes to it normally without a deeper understanding
> > of how the linker actually works or what the different bits of syntax mean
> > there.
> >
> > I see the original vmlinux.lds linker script showed up in linux-2.1.23, and
> > it contained
> >
> > +  . = ALIGN(16);               /* Exception table */
> > +  __start___ex_table = .;
> > +  __ex_table : { *(__ex_table) }
> > +  __stop___ex_table = .;
> > +
> > +  __start___ksymtab = .;       /* Kernel symbol table */
> > +  __ksymtab : { *(__ksymtab) }
> > +  __stop___ksymtab = .;
> >
> > originally for arch/sparc, and shortly afterwards for i386. The magic
> > __ex_table section was first used in linux-2.1.7 without a linker
> > script. It's probably a good idea to try cleaning these up by using
> > non-magic start/stop symbols for all sections, and relying on KEEP()
> > instead where needed.
> >
> >> There are a lot of KEEP usage. Perhaps some can be dropped to facilitate
> >> ld --gc-sections.
> >
> > I see a lot of these were added by Nick Piggin (added to Cc) in this commit:
> >
> > commit 266ff2a8f51f02b429a987d87634697eb0d01d6a
> > Author: Nicholas Piggin <npiggin@gmail.com>
> > Date:   Wed May 9 22:59:58 2018 +1000
> >
> >     kbuild: Fix asm-generic/vmlinux.lds.h for LD_DEAD_CODE_DATA_ELIMINATION
> >
> >     KEEP more tables, and add the function/data section wildcard to more
> >     section selections.
> >
> >     This is a little ad-hoc at the moment, but kernel code should be moved
> >     to consistently use .text..x (note: double dots) for explicit sections
> >     and all references to it in the linker script can be made with
> >     TEXT_MAIN, and similarly for other sections.
> >
> >     For now, let's see if major architectures move to enabling this option
> >     then we can do some refactoring passes. Otherwise if it remains unused
> >     or superseded by LTO, this may not be required.
> >
> >     Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >     Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> >
> > which apparently was intentionally cautious.
> >
> > Unlike what Nick expected in his submission, I now think the annotations
> > will be needed for LTO just like they are for --gc-sections.
>
> Yeah I wasn't sure exactly what LTO looks like or how it would work.
> I thought perhaps LTO might be able to find dead code with circular /
> back references, we could put references from the code back to these
> tables or something so they would be kept without KEEP. I don't know, I
> was handwaving!
>
> I managed to get powerpc (and IIRC x86?) working with gc sections with
> those KEEP annotations, but effectiveness of course is far worse than
> what Nicolas was able to achieve with all his techniques and tricks.
>
> But yes unless there is some other mechanism to handle these tables,
> then KEEP probably has to stay. I suggest this wants a very explicit and
> systematic way to handle it (maybe with some toolchain support) rather
> than trying to just remove things case by case and see what breaks.
>
> I don't know if Nicolas is still been working on his shrinking patches
> recenty but he probably knows more than anyone about this stuff.
>
> Thanks,
> Nick
>


I tested LD_DEAD_CODE_DATA_ELIMINATION for the latest kernel.

I added an unused function, this_func_is_unused(),
then built the ppc kernel with LD_DEAD_CODE_DATA_ELIMINATION.

It remained in vmlinux.


masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
>>  kernel/cpu.c
masahiro@oscar:~/ref/linux$ export
CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
masahiro@oscar:~/ref/linux$
~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
this_func
c000000000170560 T .this_func_is_unused
c000000001d8d560 D this_func_is_unused
masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y


If I remember correctly,
LD_DEAD_CODE_DATA_ELIMINATION dropped unused functions
when I tried it last time.


I also tried arm64 with a HAVE_LD_DEAD_CODE_DATA_ELIMINATION hack.
The result was the same.



Am I missing something?
Arnd Bergmann March 10, 2021, 9:08 p.m. UTC | #11
On Wed, Mar 10, 2021 at 9:50 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
> On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:

>
> masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
> >>  kernel/cpu.c
> masahiro@oscar:~/ref/linux$ export
> CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
> masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
> masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
> masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
> masahiro@oscar:~/ref/linux$
> ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
> this_func
> c000000000170560 T .this_func_is_unused
> c000000001d8d560 D this_func_is_unused
> masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
> CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
>
>
> If I remember correctly,
> LD_DEAD_CODE_DATA_ELIMINATION dropped unused functions
> when I tried it last time.
>
>
> I also tried arm64 with a HAVE_LD_DEAD_CODE_DATA_ELIMINATION hack.
> The result was the same.
>
>
>
> Am I missing something?

It's possible that it only works in combination with CLANG_LTO now
because something broke. I definitely saw a reduction in kernel
size when both options are enabled, but did not try a simple test
case like you did.

Maybe some other reference gets created that prevents the function
from being garbage-collected unless that other option is removed
as well?

         Arnd
Nicolas Pitre March 10, 2021, 9:19 p.m. UTC | #12
On Mon, 1 Mar 2021, Nicholas Piggin wrote:

> Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
> > Unlike what Nick expected in his submission, I now think the annotations
> > will be needed for LTO just like they are for --gc-sections.
> 
> Yeah I wasn't sure exactly what LTO looks like or how it would work.
> I thought perhaps LTO might be able to find dead code with circular / 
> back references, we could put references from the code back to these 
> tables or something so they would be kept without KEEP. I don't know, I 
> was handwaving!
> 
> I managed to get powerpc (and IIRC x86?) working with gc sections with
> those KEEP annotations, but effectiveness of course is far worse than 
> what Nicolas was able to achieve with all his techniques and tricks.
> 
> But yes unless there is some other mechanism to handle these tables, 
> then KEEP probably has to stay. I suggest this wants a very explicit and 
> systematic way to handle it (maybe with some toolchain support) rather 
> than trying to just remove things case by case and see what breaks.
> 
> I don't know if Nicolas is still been working on his shrinking patches
> recenty but he probably knows more than anyone about this stuff.

Looks like not much has changed since last time I played with this stuff.

There is a way to omit the KEEP() on tables, but something must create a 
dependency from the code being pointed to by those tables to the table 
entries themselves. I did write my findings in the following article 
(just skip over the introductory blurb): 

https://lwn.net/Articles/741494/

Once that dependency is there, then the KEEP() may go and 
garbage-collecting a function will also garbage-collect the table entry 
automatically.

OTOH this trickery is not needed with LTO as garbage collection happens 
at the source code optimization level. The KEEP() may remain in that 
case as unneeded table entries will simply not be created in the first 
place.


Nicolas
Sedat Dilek March 10, 2021, 9:24 p.m. UTC | #13
On Wed, Mar 10, 2021 at 10:08 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Wed, Mar 10, 2021 at 9:50 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
>
> >
> > masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
> > >>  kernel/cpu.c
> > masahiro@oscar:~/ref/linux$ export
> > CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
> > masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
> > masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
> > masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
> > masahiro@oscar:~/ref/linux$
> > ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
> > this_func
> > c000000000170560 T .this_func_is_unused
> > c000000001d8d560 D this_func_is_unused
> > masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
> > CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
> >
> >
> > If I remember correctly,
> > LD_DEAD_CODE_DATA_ELIMINATION dropped unused functions
> > when I tried it last time.
> >
> >
> > I also tried arm64 with a HAVE_LD_DEAD_CODE_DATA_ELIMINATION hack.
> > The result was the same.
> >
> >
> >
> > Am I missing something?
>
> It's possible that it only works in combination with CLANG_LTO now
> because something broke. I definitely saw a reduction in kernel
> size when both options are enabled, but did not try a simple test
> case like you did.
>
> Maybe some other reference gets created that prevents the function
> from being garbage-collected unless that other option is removed
> as well?
>

The best results on size-reduction of vmlinux I got with Clang-CFI on x86-64.

Clang-LTO and Clang-CFI:
I was able to build with CONFIG_TRIM_UNUSED_KSYMS=y which needs to add
a whitelist file or add a whitelist to scripts/gen_autoksyms.sh.
And boot on bare metal.
Furthermore, I was able to compile
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y with and without
CONFIG_TRIM_UNUSED_KSYMS=y.
Every kernel I had CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y does not boot.
Yes, there is a size reduction with both enabled but not that good as
with Clang-CFI.
All testings with several iterations of LLVM/Clang v13-git.
With CONFIG_TRIM_UNUSED_KSYMS=y I see a 3x-loops of building .version
and folowing steps - got no answer if this is intended.
Means longer build-time.
I did not follow this anymore as both Kconfigs with Clang-LTO consume
more build-time and the resulting vmlinux is some MiB bigger than with
Clang-CFI.

If someone is interested in x86-64 I can provide the whitelist files
and or (alternatively) changes to scripts/gen_autoksyms.sh.
AFAICS I had open a thread for this - damn digital dementia.

- Sedat -
Rasmus Villemoes March 10, 2021, 9:45 p.m. UTC | #14
On 10/03/2021 21.49, Masahiro Yamada wrote:
> On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin <npiggin@gmail.com> wrote:

> 
> I tested LD_DEAD_CODE_DATA_ELIMINATION for the latest kernel.
> 
> I added an unused function, this_func_is_unused(),
> then built the ppc kernel with LD_DEAD_CODE_DATA_ELIMINATION.
> 
> It remained in vmlinux.
> 
> 
> masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
>>>  kernel/cpu.c
> masahiro@oscar:~/ref/linux$ export
> CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
> masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
> masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
> masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
> masahiro@oscar:~/ref/linux$
> ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
> this_func
> c000000000170560 T .this_func_is_unused
> c000000001d8d560 D this_func_is_unused

Dunno, works just fine for my ppc32 target in v4.19 (i.e., the function
gets eliminated when enabling LD_DEAD_CODE_DATA_ELIMINATION).

But yes, I can reproduce for master ppc64 defconfig. kernel/.cpu.o.cmd
says that it wasn't even compiled with -ffunction-sections, nor does
.vmlinux.cmd mention --gc-sections.

> masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
> CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y

Ah, but scripts/config just blindly adds that config option - I don't
think ppc64 actually supports this, and
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y automagically vanishes from
.config when building.

Rasmus
Nicolas Pitre March 10, 2021, 9:47 p.m. UTC | #15
On Wed, 10 Mar 2021, Sedat Dilek wrote:

> The best results on size-reduction of vmlinux I got with Clang-CFI on x86-64.
> 
> Clang-LTO and Clang-CFI:
> I was able to build with CONFIG_TRIM_UNUSED_KSYMS=y which needs to add
> a whitelist file or add a whitelist to scripts/gen_autoksyms.sh.
> And boot on bare metal.
> Furthermore, I was able to compile
> CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y with and without
> CONFIG_TRIM_UNUSED_KSYMS=y.
> Every kernel I had CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y does not boot.
> Yes, there is a size reduction with both enabled but not that good as
> with Clang-CFI.
> All testings with several iterations of LLVM/Clang v13-git.
> With CONFIG_TRIM_UNUSED_KSYMS=y I see a 3x-loops of building .version
> and folowing steps - got no answer if this is intended.

Yes it is intended. I explained it here:

https://lkml.org/lkml/2021/3/9/1099

With CONFIG_TRIM_UNUSED_KSYMS some EXPORT_SYMBOL() are removed, which 
allows for optimizing away the corresponding code, which in turn opens 
the possibility for more EXPORT_SYMBOL() to be removed, etc. The process 
eventually converge to a stable build. Normally only 2 passes are needed 
to converge, but LTO opens the possibilities for extra passes.

> Means longer build-time.

Oh, absolutely.  LTO (at least when I played with it) is slow. Add the 
multi-pass from CONFIG_TRIM_UNUSED_KSYMS on top of that and your kernel 
build becomes agonizingly slow. This is not something you want when 
doing kernel development.

> I did not follow this anymore as both Kconfigs with Clang-LTO consume
> more build-time and the resulting vmlinux is some MiB bigger than with
> Clang-CFI.

That's rather strange. At least with gcc LTO I always obtained smaller 
kernels.


Nicolas
Sedat Dilek March 10, 2021, 9:57 p.m. UTC | #16
On Wed, Mar 10, 2021 at 10:47 PM Nicolas Pitre <nico@fluxnic.net> wrote:
...
> > With CONFIG_TRIM_UNUSED_KSYMS=y I see a 3x-loops of building .version
> > and folowing steps - got no answer if this is intended.
>
> Yes it is intended. I explained it here:
>
> https://lkml.org/lkml/2021/3/9/1099
>

Ah, cool.
Thanks for that link.

> With CONFIG_TRIM_UNUSED_KSYMS some EXPORT_SYMBOL() are removed, which
> allows for optimizing away the corresponding code, which in turn opens
> the possibility for more EXPORT_SYMBOL() to be removed, etc. The process
> eventually converge to a stable build. Normally only 2 passes are needed
> to converge, but LTO opens the possibilities for extra passes.
>
> > Means longer build-time.
>
> Oh, absolutely.  LTO (at least when I played with it) is slow. Add the
> multi-pass from CONFIG_TRIM_UNUSED_KSYMS on top of that and your kernel
> build becomes agonizingly slow. This is not something you want when
> doing kernel development.
>

Thanks for the feedback.

> > I did not follow this anymore as both Kconfigs with Clang-LTO consume
> > more build-time and the resulting vmlinux is some MiB bigger than with
> > Clang-CFI.
>
> That's rather strange. At least with gcc LTO I always obtained smaller
> kernels.
>

I cannot say much to GCC-LTO - I never used it.

If you are interested in Clang-CFI (see [1]) - which requires
Clang-LTO enabled and LLVM/Clang >= 12.
Some hours ago version 12.0.0-rc3 was released, see [2].

- Sedat -

[1] https://github.com/samitolvanen/linux/commits/clang-cfi
[2] https://github.com/ClangBuiltLinux/linux/issues/1259
Nick Desaulniers March 10, 2021, 10:02 p.m. UTC | #17
On Wed, Mar 10, 2021 at 1:08 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Wed, Mar 10, 2021 at 9:50 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
>
> >
> > masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
> > >>  kernel/cpu.c
> > masahiro@oscar:~/ref/linux$ export
> > CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
> > masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
> > masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
> > masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
> > masahiro@oscar:~/ref/linux$
> > ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
> > this_func
> > c000000000170560 T .this_func_is_unused
> > c000000001d8d560 D this_func_is_unused
> > masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
> > CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
> >
> >
> > If I remember correctly,
> > LD_DEAD_CODE_DATA_ELIMINATION dropped unused functions
> > when I tried it last time.
> >
> >
> > I also tried arm64 with a HAVE_LD_DEAD_CODE_DATA_ELIMINATION hack.
> > The result was the same.
> >
> >
> >
> > Am I missing something?
>
> It's possible that it only works in combination with CLANG_LTO now
> because something broke. I definitely saw a reduction in kernel
> size when both options are enabled, but did not try a simple test
> case like you did.
>
> Maybe some other reference gets created that prevents the function
> from being garbage-collected unless that other option is removed
> as well?

I wish the linker had a debug flag that could let developers discover
the decisions it made during --gc-sections as to why certain symbols
were retained/kept or not.
Nicolas Pitre March 10, 2021, 10:08 p.m. UTC | #18
On Wed, 10 Mar 2021, Nick Desaulniers wrote:

> On Wed, Mar 10, 2021 at 1:08 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Wed, Mar 10, 2021 at 9:50 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > > On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
> >
> > >
> > > masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
> > > >>  kernel/cpu.c
> > > masahiro@oscar:~/ref/linux$ export
> > > CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
> > > masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
> > > masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
> > > masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
> > > masahiro@oscar:~/ref/linux$
> > > ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
> > > this_func
> > > c000000000170560 T .this_func_is_unused
> > > c000000001d8d560 D this_func_is_unused
> > > masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
> > > CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
> > > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
> > >
> > >
> > > If I remember correctly,
> > > LD_DEAD_CODE_DATA_ELIMINATION dropped unused functions
> > > when I tried it last time.
> > >
> > >
> > > I also tried arm64 with a HAVE_LD_DEAD_CODE_DATA_ELIMINATION hack.
> > > The result was the same.
> > >
> > >
> > >
> > > Am I missing something?
> >
> > It's possible that it only works in combination with CLANG_LTO now
> > because something broke. I definitely saw a reduction in kernel
> > size when both options are enabled, but did not try a simple test
> > case like you did.
> >
> > Maybe some other reference gets created that prevents the function
> > from being garbage-collected unless that other option is removed
> > as well?
> 
> I wish the linker had a debug flag that could let developers discover
> the decisions it made during --gc-sections as to why certain symbols
> were retained/kept or not.

The GNU LD has --print-gc-sections to list those sections that were 
dropped. And normally you should be able to find why a section wasn't 
dropped by looking for dependencies in the linker map file.


Nicolas
Fangrui Song March 10, 2021, 10:29 p.m. UTC | #19
On 2021-03-10, Arnd Bergmann wrote:
>On Wed, Mar 10, 2021 at 9:50 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
>> On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>> > Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
>
>>
>> masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
>> >>  kernel/cpu.c
>> masahiro@oscar:~/ref/linux$ export
>> CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
>> masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
>> masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
>> masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
>> masahiro@oscar:~/ref/linux$
>> ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
>> this_func
>> c000000000170560 T .this_func_is_unused
>> c000000001d8d560 D this_func_is_unused
>> masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
>> CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
>> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
>>
>>
>> If I remember correctly,
>> LD_DEAD_CODE_DATA_ELIMINATION dropped unused functions
>> when I tried it last time.

--gc-sections drops unused sections.
If the unused function is part of a larger section which is retained due to other symbols (-fno-function-sections),
the unused section will be retained as well.

>>
>>
>> I also tried arm64 with a HAVE_LD_DEAD_CODE_DATA_ELIMINATION hack.
>> The result was the same.
>>
>>
>>
>> Am I missing something?
>
>It's possible that it only works in combination with CLANG_LTO now
>because something broke. I definitely saw a reduction in kernel
>size when both options are enabled, but did not try a simple test
>case like you did.
>
>Maybe some other reference gets created that prevents the function
>from being garbage-collected unless that other option is removed
>as well?
>
>         Arnd

I believe with LLVM regular LTO, --gc-sections has very little benefit
on compiler generated sections. It is still useful for assembly generated sections
(but most such sections are probably needed):

* Target specific optimizations can drop references on constants (e.g. `memcpy(..., &constant, sizeof(constant));`)
* Due to phase ordering issues some definitions are not discarded by the optimizer.

For ThinLTO there are more compiler generated sections discarded by `--gc-sections`:

* ThinLTO can cause a definition to be imported to other modules. The original definition may be unneeded after imports.
* The definition may survive after intra-module optimization. After imports, a round of (inter-module) IR optimizations after `computeDeadSymbolsWithConstProp` may make the definition unneeded.
* Symbol resolution is conservative.

Regarding symbol resolution, symbol resolution happens before LTO and LTO happens before --gc-sections. The symbol resolution process may be conservative: it may communicate to LTO that some symbols are referenced by regular object files while in the GC stage the references turn out to not exist because of discarded sections with more precise GC roots.

(I've added the above points to my https://maskray.me/blog/2021-02-28-linker-garbage-collection#link-time-optimization )
Fangrui Song March 10, 2021, 10:42 p.m. UTC | #20
On 2021-03-10, Nicolas Pitre wrote:
>On Mon, 1 Mar 2021, Nicholas Piggin wrote:
>
>> Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
>> > Unlike what Nick expected in his submission, I now think the annotations
>> > will be needed for LTO just like they are for --gc-sections.
>>
>> Yeah I wasn't sure exactly what LTO looks like or how it would work.
>> I thought perhaps LTO might be able to find dead code with circular /
>> back references, we could put references from the code back to these
>> tables or something so they would be kept without KEEP. I don't know, I
>> was handwaving!
>>
>> I managed to get powerpc (and IIRC x86?) working with gc sections with
>> those KEEP annotations, but effectiveness of course is far worse than
>> what Nicolas was able to achieve with all his techniques and tricks.
>>
>> But yes unless there is some other mechanism to handle these tables,
>> then KEEP probably has to stay. I suggest this wants a very explicit and
>> systematic way to handle it (maybe with some toolchain support) rather
>> than trying to just remove things case by case and see what breaks.
>>
>> I don't know if Nicolas is still been working on his shrinking patches
>> recenty but he probably knows more than anyone about this stuff.
>
>Looks like not much has changed since last time I played with this stuff.
>
>There is a way to omit the KEEP() on tables, but something must create a
>dependency from the code being pointed to by those tables to the table
>entries themselves. I did write my findings in the following article
>(just skip over the introductory blurb):
>
>https://lwn.net/Articles/741494/

Hey, this article taught me R_*_NONE which motivated me to add various R_*_NONE
support to LLVM 9!

In the weekend I noticed that with binutils>=2.26, one can use
.reloc ., BFD_RELOC_NONE, target
(https://sourceware.org/bugzilla/show_bug.cgi?id=27530 ).
I implemented it for many targets in LLVM, but that will require 13.0.0.

>Once that dependency is there, then the KEEP() may go and
>garbage-collecting a function will also garbage-collect the table entry
>automatically.
>
>OTOH this trickery is not needed with LTO as garbage collection happens
>at the source code optimization level. The KEEP() may remain in that
>case as unneeded table entries will simply not be created in the first
>place.

For Thin LTO, --gc-sections is still very useful.
I have more notes in https://maskray.me/blog/2021-02-28-linker-garbage-collection#link-time-optimization .
Catalin Marinas March 17, 2021, 2:37 p.m. UTC | #21
On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index bad2b9eaab22..926cdb597a45 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -217,7 +217,7 @@ SECTIONS
>  		INIT_CALLS
>  		CON_INITCALL
>  		INIT_RAM_FS
> -		*(.init.altinstructions .init.bss .init.bss.*)	/* from the EFI stub */
> +		*(.init.altinstructions .init.data.* .init.bss .init.bss.*)	/* from the EFI stub */

INIT_DATA already covers .init.data and .init.data.*, so I don't think
we need this change.

Also, mainline doesn't have .init.bss.*, do you know where this came
from? I can't find it in -next either.
Catalin Marinas March 17, 2021, 4:18 p.m. UTC | #22
On Wed, Mar 17, 2021 at 02:37:57PM +0000, Catalin Marinas wrote:
> On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > index bad2b9eaab22..926cdb597a45 100644
> > --- a/arch/arm64/kernel/vmlinux.lds.S
> > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > @@ -217,7 +217,7 @@ SECTIONS
> >  		INIT_CALLS
> >  		CON_INITCALL
> >  		INIT_RAM_FS
> > -		*(.init.altinstructions .init.bss .init.bss.*)	/* from the EFI stub */
> > +		*(.init.altinstructions .init.data.* .init.bss .init.bss.*)	/* from the EFI stub */
> 
> INIT_DATA already covers .init.data and .init.data.*, so I don't think
> we need this change.

Ah, INIT_DATA only covers init.data.* (so no dot in front). The above
is needed for the EFI stub.

However, I gave this a quick try and under Qemu with -cpu max and -smp 2
(or more) it fails as below. I haven't debugged but the lr points to
just after the switch_to() call. Maybe some section got discarded and we
patched in the wrong instructions. It is fine with -cpu host or -smp 1.

-------------------8<------------------------
smp: Bringing up secondary CPUs ...
Detected PIPT I-cache on CPU1
CPU1: Booted secondary processor 0x0000000001 [0x000f0510]
Unable to handle kernel paging request at virtual address eb91d81ad2971160
Mem abort info:
  ESR = 0x86000004
  EC = 0x21: IABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
[eb91d81ad2971160] address between user and kernel address ranges
Internal error: Oops: 86000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 16 Comm: migration/1 Not tainted 5.12.0-rc3-00002-g128e977c1322 #1
Stopper: 0x0 <- 0x0
pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
pc : 0xeb91d81ad2971160
lr : __schedule+0x230/0x6b8
sp : ffff80001009bd60
x29: ffff80001009bd60 x28: 0000000000000000 
x27: ffff0000000a6760 x26: ffff0000000b7540 
x25: 0080000000000000 x24: ffffd81ad3969000 
x23: ffff0000000a6200 x22: 6ee0d81ad2971658 
x21: ffff0000000a6200 x20: ffff000000080000 
x19: ffff00007fbc6bc0 x18: 0000000000000030 
x17: 0000000000000000 x16: 0000000000000000 
x15: 00008952b30a9a9e x14: 0000000000000366 
x13: 0000000000000192 x12: 0000000000000000 
x11: 0000000000000003 x10: 00000000000009b0 
x9 : ffff80001009bd30 x8 : ffff0000000a6c10 
x7 : ffff00007fbc6cc0 x6 : 00000000fffedb30 
x5 : 00000000ffffffff x4 : 0000000000000000 
x3 : 0000000000000008 x2 : 0000000000000000 
x1 : ffff0000000a6200 x0 : ffff0000000a3800 
Call trace:
 0xeb91d81ad2971160
 schedule+0x70/0x108
 schedule_preempt_disabled+0x24/0x40
 __kthread_parkme+0x68/0xd0
 kthread+0x138/0x170
 ret_from_fork+0x10/0x30
Code: bad PC value
---[ end trace af3481062ecef3e7 ]---
Arnd Bergmann March 18, 2021, 8:41 a.m. UTC | #23
On Wed, Mar 17, 2021 at 5:18 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Mar 17, 2021 at 02:37:57PM +0000, Catalin Marinas wrote:
> > On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > > index bad2b9eaab22..926cdb597a45 100644
> > > --- a/arch/arm64/kernel/vmlinux.lds.S
> > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > > @@ -217,7 +217,7 @@ SECTIONS
> > >             INIT_CALLS
> > >             CON_INITCALL
> > >             INIT_RAM_FS
> > > -           *(.init.altinstructions .init.bss .init.bss.*)  /* from the EFI stub */
> > > +           *(.init.altinstructions .init.data.* .init.bss .init.bss.*)     /* from the EFI stub */
> >
> > INIT_DATA already covers .init.data and .init.data.*, so I don't think
> > we need this change.
>
> Ah, INIT_DATA only covers init.data.* (so no dot in front). The above
> is needed for the EFI stub.

I wonder if that is just a typo in INIT_DATA. Nico introduced it as part of
266ff2a8f51f ("kbuild: Fix asm-generic/vmlinux.lds.h for
LD_DEAD_CODE_DATA_ELIMINATION"), so perhaps that should have
been .init.data.* instead.

> However, I gave this a quick try and under Qemu with -cpu max and -smp 2
> (or more) it fails as below. I haven't debugged but the lr points to
> just after the switch_to() call. Maybe some section got discarded and we
> patched in the wrong instructions. It is fine with -cpu host or -smp 1.

Ah, interesting.

> -------------------8<------------------------
> smp: Bringing up secondary CPUs ...
> Detected PIPT I-cache on CPU1
> CPU1: Booted secondary processor 0x0000000001 [0x000f0510]
> Unable to handle kernel paging request at virtual address eb91d81ad2971160
> Mem abort info:
>   ESR = 0x86000004
>   EC = 0x21: IABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> [eb91d81ad2971160] address between user and kernel address ranges
> Internal error: Oops: 86000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 1 PID: 16 Comm: migration/1 Not tainted 5.12.0-rc3-00002-g128e977c1322 #1
> Stopper: 0x0 <- 0x0
> pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> pc : 0xeb91d81ad2971160
> lr : __schedule+0x230/0x6b8
> sp : ffff80001009bd60
> x29: ffff80001009bd60 x28: 0000000000000000
> x27: ffff0000000a6760 x26: ffff0000000b7540
> x25: 0080000000000000 x24: ffffd81ad3969000
> x23: ffff0000000a6200 x22: 6ee0d81ad2971658
> x21: ffff0000000a6200 x20: ffff000000080000
> x19: ffff00007fbc6bc0 x18: 0000000000000030
> x17: 0000000000000000 x16: 0000000000000000
> x15: 00008952b30a9a9e x14: 0000000000000366
> x13: 0000000000000192 x12: 0000000000000000
> x11: 0000000000000003 x10: 00000000000009b0
> x9 : ffff80001009bd30 x8 : ffff0000000a6c10
> x7 : ffff00007fbc6cc0 x6 : 00000000fffedb30
> x5 : 00000000ffffffff x4 : 0000000000000000
> x3 : 0000000000000008 x2 : 0000000000000000
> x1 : ffff0000000a6200 x0 : ffff0000000a3800
> Call trace:
>  0xeb91d81ad2971160
>  schedule+0x70/0x108
>  schedule_preempt_disabled+0x24/0x40
>  __kthread_parkme+0x68/0xd0
>  kthread+0x138/0x170
>  ret_from_fork+0x10/0x30
> Code: bad PC value
> ---[ end trace af3481062ecef3e7 ]---

This looks like it has just returned from __schedule() to schedule()
and is trying to return from that as well, through code like this:

.L562:
// /git/arm-soc/kernel/sched/core.c:5159: }
        ldp     x19, x20, [sp, 16]      //,,
        ldp     x29, x30, [sp], 32      //,,,
        hint    29 // autiasp
        ret

It looks like pointer authentication gone wrong, which ended up
with dereferencing the broken pointer in x22, and it explains why
it only happens with -cpu max. Presumably this also only happens
on secondary CPUs, so maybe the bit that initializes PAC on
secondary CPUs got discarded?

        Arnd
Catalin Marinas March 19, 2021, 12:25 p.m. UTC | #24
On Thu, Mar 18, 2021 at 09:41:54AM +0100, Arnd Bergmann wrote:
> On Wed, Mar 17, 2021 at 5:18 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> >
> > On Wed, Mar 17, 2021 at 02:37:57PM +0000, Catalin Marinas wrote:
> > > On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> > > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > > > index bad2b9eaab22..926cdb597a45 100644
> > > > --- a/arch/arm64/kernel/vmlinux.lds.S
> > > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > > > @@ -217,7 +217,7 @@ SECTIONS
> > > >             INIT_CALLS
> > > >             CON_INITCALL
> > > >             INIT_RAM_FS
> > > > -           *(.init.altinstructions .init.bss .init.bss.*)  /* from the EFI stub */
> > > > +           *(.init.altinstructions .init.data.* .init.bss .init.bss.*)     /* from the EFI stub */
> > >
> > > INIT_DATA already covers .init.data and .init.data.*, so I don't think
> > > we need this change.
> >
> > Ah, INIT_DATA only covers init.data.* (so no dot in front). The above
> > is needed for the EFI stub.
> 
> I wonder if that is just a typo in INIT_DATA. Nico introduced it as part of
> 266ff2a8f51f ("kbuild: Fix asm-generic/vmlinux.lds.h for
> LD_DEAD_CODE_DATA_ELIMINATION"), so perhaps that should have
> been .init.data.* instead.

I think it was the other Nicholas ;) (with an 'h'). The vmlinux.lds.h
change indeed looks like a typo (it's been around since 4.18).

> > However, I gave this a quick try and under Qemu with -cpu max and -smp 2
> > (or more) it fails as below. I haven't debugged but the lr points to
> > just after the switch_to() call. Maybe some section got discarded and we
> > patched in the wrong instructions. It is fine with -cpu host or -smp 1.
> 
> Ah, interesting.
> 
> > -------------------8<------------------------
> > smp: Bringing up secondary CPUs ...
> > Detected PIPT I-cache on CPU1
> > CPU1: Booted secondary processor 0x0000000001 [0x000f0510]
> > Unable to handle kernel paging request at virtual address eb91d81ad2971160
> > Mem abort info:
> >   ESR = 0x86000004
> >   EC = 0x21: IABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> > [eb91d81ad2971160] address between user and kernel address ranges
> > Internal error: Oops: 86000004 [#1] PREEMPT SMP
> > Modules linked in:
> > CPU: 1 PID: 16 Comm: migration/1 Not tainted 5.12.0-rc3-00002-g128e977c1322 #1
> > Stopper: 0x0 <- 0x0
> > pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> > pc : 0xeb91d81ad2971160
> > lr : __schedule+0x230/0x6b8
> > sp : ffff80001009bd60
> > x29: ffff80001009bd60 x28: 0000000000000000
> > x27: ffff0000000a6760 x26: ffff0000000b7540
> > x25: 0080000000000000 x24: ffffd81ad3969000
> > x23: ffff0000000a6200 x22: 6ee0d81ad2971658
> > x21: ffff0000000a6200 x20: ffff000000080000
> > x19: ffff00007fbc6bc0 x18: 0000000000000030
> > x17: 0000000000000000 x16: 0000000000000000
> > x15: 00008952b30a9a9e x14: 0000000000000366
> > x13: 0000000000000192 x12: 0000000000000000
> > x11: 0000000000000003 x10: 00000000000009b0
> > x9 : ffff80001009bd30 x8 : ffff0000000a6c10
> > x7 : ffff00007fbc6cc0 x6 : 00000000fffedb30
> > x5 : 00000000ffffffff x4 : 0000000000000000
> > x3 : 0000000000000008 x2 : 0000000000000000
> > x1 : ffff0000000a6200 x0 : ffff0000000a3800
> > Call trace:
> >  0xeb91d81ad2971160
> >  schedule+0x70/0x108
> >  schedule_preempt_disabled+0x24/0x40
> >  __kthread_parkme+0x68/0xd0
> >  kthread+0x138/0x170
> >  ret_from_fork+0x10/0x30
> > Code: bad PC value
> > ---[ end trace af3481062ecef3e7 ]---
> 
> This looks like it has just returned from __schedule() to schedule()
> and is trying to return from that as well, through code like this:
> 
> .L562:
> // /git/arm-soc/kernel/sched/core.c:5159: }
>         ldp     x19, x20, [sp, 16]      //,,
>         ldp     x29, x30, [sp], 32      //,,,
>         hint    29 // autiasp
>         ret
> 
> It looks like pointer authentication gone wrong, which ended up
> with dereferencing the broken pointer in x22, and it explains why
> it only happens with -cpu max. Presumably this also only happens
> on secondary CPUs, so maybe the bit that initializes PAC on
> secondary CPUs got discarded?

I seems that the whole alternative instructions section is gone, so any
run-time code patching that the kernel does won't work. The kernel boots
with the diff below but I'm not convinced we don't miss anything else.
In some cases you get a linker warning about gc sections but not in this
case. Maybe we need some more asserts to ensure that certain sections
are not empty.

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 11909782ee3e..036cc59033d3 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -203,7 +203,7 @@ SECTIONS
 	. = ALIGN(4);
 	.altinstructions : {
 		__alt_instructions = .;
-		*(.altinstructions)
+		KEEP(*(.altinstructions))
 		__alt_instructions_end = .;
 	}

Do we need a KEEP(.init.altinstructions) as well? 

BTW, the build fails with CONFIG_FUNCTION_TRACER enabled:

aarch64-linux-gnu-ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections
Arnd Bergmann March 19, 2021, 2:01 p.m. UTC | #25
On Fri, Mar 19, 2021 at 1:25 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Mar 18, 2021 at 09:41:54AM +0100, Arnd Bergmann wrote:
> > On Wed, Mar 17, 2021 at 5:18 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Wed, Mar 17, 2021 at 02:37:57PM +0000, Catalin Marinas wrote:
> > > > On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> > > > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > > > > index bad2b9eaab22..926cdb597a45 100644
> > > > > --- a/arch/arm64/kernel/vmlinux.lds.S
> > > > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > > > > @@ -217,7 +217,7 @@ SECTIONS
> > > > >             INIT_CALLS
> > > > >             CON_INITCALL
> > > > >             INIT_RAM_FS
> > > > > -           *(.init.altinstructions .init.bss .init.bss.*)  /* from the EFI stub */
> > > > > +           *(.init.altinstructions .init.data.* .init.bss .init.bss.*)     /* from the EFI stub */
> > > >
> > > > INIT_DATA already covers .init.data and .init.data.*, so I don't think
> > > > we need this change.
> > >
> > > Ah, INIT_DATA only covers init.data.* (so no dot in front). The above
> > > is needed for the EFI stub.
> >
> > I wonder if that is just a typo in INIT_DATA. Nico introduced it as part of
> > 266ff2a8f51f ("kbuild: Fix asm-generic/vmlinux.lds.h for
> > LD_DEAD_CODE_DATA_ELIMINATION"), so perhaps that should have
> > been .init.data.* instead.
>
> I think it was the other Nicholas ;) (with an 'h'). The vmlinux.lds.h
> change indeed looks like a typo (it's been around since 4.18).

Right, my mistake.

> > It looks like pointer authentication gone wrong, which ended up
> > with dereferencing the broken pointer in x22, and it explains why
> > it only happens with -cpu max. Presumably this also only happens
> > on secondary CPUs, so maybe the bit that initializes PAC on
> > secondary CPUs got discarded?
>
> I seems that the whole alternative instructions section is gone, so any
> run-time code patching that the kernel does won't work. The kernel boots
> with the diff below but I'm not convinced we don't miss anything else.
> In some cases you get a linker warning about gc sections but not in this
> case. Maybe we need some more asserts to ensure that certain sections
> are not empty.
>
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 11909782ee3e..036cc59033d3 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -203,7 +203,7 @@ SECTIONS
>         . = ALIGN(4);
>         .altinstructions : {
>                 __alt_instructions = .;
> -               *(.altinstructions)
> +               KEEP(*(.altinstructions))
>                 __alt_instructions_end = .;
>         }
>
> Do we need a KEEP(.init.altinstructions) as well?

I would guess so. Whatever causes the .altinstructions to get dropped
presumably also leads to the same happening to .init.altinstructions.

Ideally each use of altinstructions would cause a reference to a
particular symbol so that one gets kept, while any .altinstructions
for unused functions get discarded.

        Arnd
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b94a678afce4..75e13cc52928 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ 
 config ARM64
 	def_bool y
 	select ACPI_CCA_REQUIRED if ACPI
+	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
 	select ACPI_GENERIC_GSI if ACPI
 	select ACPI_GTDT if ACPI
 	select ACPI_IORT if ACPI
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index bad2b9eaab22..926cdb597a45 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -217,7 +217,7 @@  SECTIONS
 		INIT_CALLS
 		CON_INITCALL
 		INIT_RAM_FS
-		*(.init.altinstructions .init.bss .init.bss.*)	/* from the EFI stub */
+		*(.init.altinstructions .init.data.* .init.bss .init.bss.*)	/* from the EFI stub */
 	}
 	.exit.data : {
 		EXIT_DATA