kbuild: Allow Clang global merging if !MODULES
diff mbox series

Message ID 20200702233929.181409-1-danny@kdrag0n.dev
State New
Headers show
Series
  • kbuild: Allow Clang global merging if !MODULES
Related show

Commit Message

Danny Lin July 2, 2020, 11:39 p.m. UTC
The old reasoning for disabling Clang's global merging optimization is
that it breaks modpost by coalescing many symbols into _MergedGlobals.
However, modpost is only used in builds with dynamic modules;
vmlinux.symvers is still created during standalone builds, but it's
effectively just an empty dummy file.

Enabling the optimization whenever possible allows us to reap the
benefits of reduced register pressure when many global variables are
used in the same function.

An x86 defconfig kernel built with this optimization boots fine in qemu,
and a downstream 4.14 kernel has been used on arm64 for nearly a year
without any issues caused by this optimization.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
---
 Makefile | 3 +++
 1 file changed, 3 insertions(+)

Comments

Nathan Chancellor July 5, 2020, 2:43 a.m. UTC | #1
Hi Danny,

On Thu, Jul 02, 2020 at 04:39:29PM -0700, Danny Lin wrote:
> The old reasoning for disabling Clang's global merging optimization is
> that it breaks modpost by coalescing many symbols into _MergedGlobals.
> However, modpost is only used in builds with dynamic modules;
> vmlinux.symvers is still created during standalone builds, but it's
> effectively just an empty dummy file.
> 
> Enabling the optimization whenever possible allows us to reap the
> benefits of reduced register pressure when many global variables are
> used in the same function.

Have you run into any place within the kernel that this is the case or
this more of a "could help if this ever happens" type of deal?

> An x86 defconfig kernel built with this optimization boots fine in qemu,
> and a downstream 4.14 kernel has been used on arm64 for nearly a year
> without any issues caused by this optimization.

If I am reading LLVM's source correctly, this option only seems relevant
for ARM and AArch64?

$ rg --no-heading createGlobalMergePass
llvm/lib/CodeGen/GlobalMerge.cpp:679:Pass *llvm::createGlobalMergePass(const TargetMachine *TM, unsigned Offset,
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp:524:    addPass(createGlobalMergePass(TM, 4095, OnlyOptimizeForSize,
llvm/lib/Target/ARM/ARMTargetMachine.cpp:456:    addPass(createGlobalMergePass(TM, 127, OnlyOptimizeForSize,
llvm/include/llvm/CodeGen/Passes.h:419:  Pass *createGlobalMergePass(const TargetMachine *TM, unsigned MaximalOffset,

Otherwise, I think this is probably okay. According to [1], when the
optimization level is less than -O3, we get a less aggressive version of
this optimization level, which is good for code size:

https://github.com/llvm/llvm-project/commit/8207641251706ea808df6d2a1ea8f87b8ee04c6d

However, we do potentially get merging of extern globals if we do not
specify -mglobal-merge (if I am reading the source correctly), which
this commit claims might hurt performance? Not sure if there is any way
to test or verify that?

https://github.com/llvm/llvm-project/commit/de73404b8c4332190750537eb93ce0d5b6451300

> Signed-off-by: Danny Lin <danny@kdrag0n.dev>
> ---
>  Makefile | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index a60c98519c37..f04c3639cf61 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -772,10 +772,13 @@ ifdef CONFIG_CC_IS_CLANG
>  KBUILD_CPPFLAGS += -Qunused-arguments
>  KBUILD_CFLAGS += -Wno-format-invalid-specifier
>  KBUILD_CFLAGS += -Wno-gnu
> +
> +ifdef CONFIG_MODULES
>  # CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the
>  # source of a reference will be _MergedGlobals and not on of the whitelisted names.
>  # See modpost pattern 2
>  KBUILD_CFLAGS += -mno-global-merge
> +endif
>  else
>  
>  # These warnings generated too much noise in a regular build.
> -- 
> 2.27.0
> 

Cheers,
Nathan
Masahiro Yamada Aug. 6, 2020, 9:59 p.m. UTC | #2
On Sun, Jul 5, 2020 at 11:43 AM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> Hi Danny,
>
> On Thu, Jul 02, 2020 at 04:39:29PM -0700, Danny Lin wrote:
> > The old reasoning for disabling Clang's global merging optimization is
> > that it breaks modpost by coalescing many symbols into _MergedGlobals.
> > However, modpost is only used in builds with dynamic modules;
> > vmlinux.symvers is still created during standalone builds, but it's
> > effectively just an empty dummy file.

I am not convinced with this statement.

modpost is used even if CONFIG_MODULES=n

You can see the following log for allnoconfig.

  MODPOST vmlinux.symvers


Yes, vmlinux.symver is empty.
This is because EXPORT_SYMBOL is no-op when CONFIG_MODULES=n.



> >
> > Enabling the optimization whenever possible allows us to reap the
> > benefits of reduced register pressure when many global variables are
> > used in the same function.
>
> Have you run into any place within the kernel that this is the case or
> this more of a "could help if this ever happens" type of deal?
>
> > An x86 defconfig kernel built with this optimization boots fine in qemu,
> > and a downstream 4.14 kernel has been used on arm64 for nearly a year
> > without any issues caused by this optimization.
>
> If I am reading LLVM's source correctly, this option only seems relevant
> for ARM and AArch64?
>
> $ rg --no-heading createGlobalMergePass
> llvm/lib/CodeGen/GlobalMerge.cpp:679:Pass *llvm::createGlobalMergePass(const TargetMachine *TM, unsigned Offset,
> llvm/lib/Target/AArch64/AArch64TargetMachine.cpp:524:    addPass(createGlobalMergePass(TM, 4095, OnlyOptimizeForSize,
> llvm/lib/Target/ARM/ARMTargetMachine.cpp:456:    addPass(createGlobalMergePass(TM, 127, OnlyOptimizeForSize,
> llvm/include/llvm/CodeGen/Passes.h:419:  Pass *createGlobalMergePass(const TargetMachine *TM, unsigned MaximalOffset,
>
> Otherwise, I think this is probably okay. According to [1], when the
> optimization level is less than -O3, we get a less aggressive version of
> this optimization level, which is good for code size:
>
> https://github.com/llvm/llvm-project/commit/8207641251706ea808df6d2a1ea8f87b8ee04c6d


I do not understand how -mglobal-merge works.
Do you have test code to generate _MergedGlobals ?


Thanks.





> However, we do potentially get merging of extern globals if we do not
> specify -mglobal-merge (if I am reading the source correctly), which
> this commit claims might hurt performance? Not sure if there is any way
> to test or verify that?
>
> https://github.com/llvm/llvm-project/commit/de73404b8c4332190750537eb93ce0d5b6451300
>
> > Signed-off-by: Danny Lin <danny@kdrag0n.dev>
> > ---
> >  Makefile | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/Makefile b/Makefile
> > index a60c98519c37..f04c3639cf61 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -772,10 +772,13 @@ ifdef CONFIG_CC_IS_CLANG
> >  KBUILD_CPPFLAGS += -Qunused-arguments
> >  KBUILD_CFLAGS += -Wno-format-invalid-specifier
> >  KBUILD_CFLAGS += -Wno-gnu
> > +
> > +ifdef CONFIG_MODULES
> >  # CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the
> >  # source of a reference will be _MergedGlobals and not on of the whitelisted names.
> >  # See modpost pattern 2
> >  KBUILD_CFLAGS += -mno-global-merge
> > +endif
> >  else
> >
> >  # These warnings generated too much noise in a regular build.
> > --
> > 2.27.0
> >
>
> Cheers,
> Nathan



--
Best Regards
Masahiro Yamada

Patch
diff mbox series

diff --git a/Makefile b/Makefile
index a60c98519c37..f04c3639cf61 100644
--- a/Makefile
+++ b/Makefile
@@ -772,10 +772,13 @@  ifdef CONFIG_CC_IS_CLANG
 KBUILD_CPPFLAGS += -Qunused-arguments
 KBUILD_CFLAGS += -Wno-format-invalid-specifier
 KBUILD_CFLAGS += -Wno-gnu
+
+ifdef CONFIG_MODULES
 # CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the
 # source of a reference will be _MergedGlobals and not on of the whitelisted names.
 # See modpost pattern 2
 KBUILD_CFLAGS += -mno-global-merge
+endif
 else
 
 # These warnings generated too much noise in a regular build.