[v7,00/17] Add support for Clang LTO

Message ID 20201118220731.925424-1-samitolvanen@google.com (mailing list archive)

Message

Sami Tolvanen Nov. 18, 2020, 10:07 p.m. UTC
This patch series adds support for building the kernel with Clang's
Link Time Optimization (LTO). In addition to performance, the primary
motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
be used in the kernel. Google has shipped millions of Pixel devices
running three major kernel versions with LTO+CFI since 2018.

Most of the patches are build system changes for handling LLVM bitcode,
which Clang produces with LTO instead of ELF object files, postponing
ELF processing until a later stage, and ensuring initcall ordering.
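
As a rough illustration of the bitcode-versus-ELF distinction mentioned above, the two formats can be told apart by their leading magic bytes. This is only a sketch and not part of the series; the function names classify_object and classify_file are hypothetical, and it assumes plain (non-archive) object files.

```python
# Sketch only: tell LLVM bitcode apart from ELF by the first four bytes.
# The helper names below are hypothetical, not taken from the series.

ELF_MAGIC = b"\x7fELF"                        # ELF identification bytes
BITCODE_MAGIC = b"BC\xc0\xde"                 # raw LLVM bitcode
BITCODE_WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"   # 0x0B17C0DE wrapper, little-endian


def classify_object(header: bytes) -> str:
    """Classify a file from its leading bytes."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith((BITCODE_MAGIC, BITCODE_WRAPPER_MAGIC)):
        return "llvm-bitcode"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first four bytes of a file and classify it."""
    with open(path, "rb") as f:
        return classify_object(f.read(4))
```

With LTO enabled, one would expect classify_file() on a compiled object to report "llvm-bitcode" where a non-LTO build reports "elf", which is why tools that parse ELF have to run at a later stage.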

Note that v7 brings back arm64 support as Will has now staged the
prerequisite memory ordering patches [1], and drops x86_64 while we work
on fixing the remaining objtool warnings [2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
[2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/

You can also pull this series from

  https://github.com/samitolvanen/linux.git lto-v7

---
Changes in v7:

  - Rebased to master again.

  - Added back arm64 patches as the prerequisites are now staged,
    and dropped x86_64 support until the remaining objtool issues
    are resolved.

  - Dropped ifdefs from module.lds.S.

Changes in v6:

  - Added the missing --mcount flag to patch 5.

  - Dropped the arm64 patches from this series and will repost them
    later.

Changes in v5:

  - Rebased on top of tip/master.

  - Changed the command line for objtool to use --vmlinux --duplicate
    to disable warnings about retpoline thunks and to fix .orc_unwind
    generation for vmlinux.o.

  - Added --noinstr flag to objtool, so we can use --vmlinux without
    also enabling noinstr validation.

  - Disabled objtool's unreachable instruction warnings with LTO to
    disable false positives for the int3 padding in vmlinux.o.

  - Added ANNOTATE_RETPOLINE_SAFE annotations to the indirect jumps
    in x86 assembly code to fix objtool warnings with retpoline.

  - Fixed modpost warnings about missing version information with
    CONFIG_MODVERSIONS.

  - Included Makefile.lib into Makefile.modpost for ld_flags. Thanks
    to Sedat for pointing this out.

  - Updated the help text for ThinLTO to better explain the trade-offs.

  - Updated commit messages with better explanations.

Changes in v4:

  - Fixed a typo in Makefile.lib to correctly pass --no-fp to objtool.

  - Moved ftrace configs related to generating __mcount_loc to Kconfig,
    so they are available also in Makefile.modfinal.

  - Dropped two prerequisite patches that were merged to Linus' tree.

Changes in v3:

  - Added a separate patch to remove the unused DISABLE_LTO treewide,
    as filtering out CC_FLAGS_LTO instead is preferred.

  - Updated the Kconfig help to explain why LTO is behind a choice
    and disabled by default.

  - Dropped CC_FLAGS_LTO_CLANG, compiler-specific LTO flags are now
    appended directly to CC_FLAGS_LTO.

  - Updated $(AR) flags as KBUILD_ARFLAGS was removed earlier.

  - Fixed ThinLTO cache handling for external module builds.

  - Rebased on top of Masahiro's patch for preprocessing module.lds,
    and moved the contents of module-lto.lds to module.lds.S.

  - Moved objtool_args to Makefile.lib to avoid duplication of the
    command line parameters in Makefile.modfinal.

  - Clarified in the commit message for the initcall ordering patch
    that the initcall order remains the same as without LTO.

  - Changed link-vmlinux.sh to use jobserver-exec to control the
    number of jobs started by generate_initcall_order.pl.

  - Dropped the x86/relocs patch to whitelist L4_PAGE_OFFSET as it's
    no longer needed with ToT kernel.

  - Disabled LTO for arch/x86/power/cpu.c to work around a Clang bug
    with stack protector attributes.

Changes in v2:

  - Fixed -Wmissing-prototypes warnings with W=1.

  - Dropped cc-option from -fsplit-lto-unit and added .thinlto-cache
    scrubbing to make distclean.

  - Added a comment about Clang >=11 being required.

  - Added a patch to disable LTO for the arm64 KVM nVHE code.

  - Disabled objtool's noinstr validation with LTO unless enabled.

  - Included Peter's proposed objtool mcount patch in the series
    and replaced recordmcount with the objtool pass to avoid
    whitelisting relocations that are not calls.

  - Updated several commit messages with better explanations.


Sami Tolvanen (17):
  tracing: move function tracer options to Kconfig
  kbuild: add support for Clang LTO
  kbuild: lto: fix module versioning
  kbuild: lto: limit inlining
  kbuild: lto: merge module sections
  kbuild: lto: remove duplicate dependencies from .mod files
  init: lto: ensure initcall ordering
  init: lto: fix PREL32 relocations
  PCI: Fix PREL32 relocations for LTO
  modpost: lto: strip .lto from module names
  scripts/mod: disable LTO for empty.c
  efi/libstub: disable LTO
  drivers/misc/lkdtm: disable LTO for rodata.o
  arm64: vdso: disable LTO
  KVM: arm64: disable LTO for the nVHE directory
  arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
  arm64: allow LTO_CLANG and THINLTO to be selected

 .gitignore                            |   1 +
 Makefile                              |  45 +++--
 arch/Kconfig                          |  74 +++++++
 arch/arm64/Kconfig                    |   4 +
 arch/arm64/kernel/vdso/Makefile       |   3 +-
 arch/arm64/kvm/hyp/nvhe/Makefile      |   4 +-
 drivers/firmware/efi/libstub/Makefile |   2 +
 drivers/misc/lkdtm/Makefile           |   1 +
 include/asm-generic/vmlinux.lds.h     |  11 +-
 include/linux/init.h                  |  79 +++++++-
 include/linux/pci.h                   |  19 +-
 kernel/trace/Kconfig                  |  16 ++
 scripts/Makefile.build                |  50 ++++-
 scripts/Makefile.lib                  |   6 +-
 scripts/Makefile.modfinal             |   9 +-
 scripts/Makefile.modpost              |  25 ++-
 scripts/generate_initcall_order.pl    | 270 ++++++++++++++++++++++++++
 scripts/link-vmlinux.sh               |  70 ++++++-
 scripts/mod/Makefile                  |   1 +
 scripts/mod/modpost.c                 |  16 +-
 scripts/mod/modpost.h                 |   9 +
 scripts/mod/sumversion.c              |   6 +-
 scripts/module.lds.S                  |  24 +++
 23 files changed, 677 insertions(+), 68 deletions(-)
 create mode 100755 scripts/generate_initcall_order.pl


base-commit: 0fa8ee0d9ab95c9350b8b84574824d9a384a9f7d

Comments

Nick Desaulniers Nov. 18, 2020, 11:42 p.m. UTC | #1
On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> be used in the kernel. Google has shipped millions of Pixel devices
> running three major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM bitcode,
> which Clang produces with LTO instead of ELF object files, postponing
> ELF processing until a later stage, and ensuring initcall ordering.
>
> Note that v7 brings back arm64 support as Will has now staged the
> prerequisite memory ordering patches [1], and drops x86_64 while we work
> on fixing the remaining objtool warnings [2].
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> [2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/
>
> You can also pull this series from
>
>   https://github.com/samitolvanen/linux.git lto-v7

Thanks for continuing to drive this series Sami.  For the series,

Tested-by: Nick Desaulniers <ndesaulniers@google.com>

I did virtualized boot tests with the series applied to aarch64
defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
with CONFIG_THINLTO.  If you make changes to the series in follow-ups,
please drop my Tested-by tag from the modified patches and I'll help
re-test.  I have some minor feedback on the Kconfig change, but I'll
post it in reply to that patch.

Josh Poimboeuf Nov. 20, 2020, 4:04 a.m. UTC | #2
On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> be used in the kernel. Google has shipped millions of Pixel devices
> running three major kernel versions with LTO+CFI since 2018.
> 
> Most of the patches are build system changes for handling LLVM bitcode,
> which Clang produces with LTO instead of ELF object files, postponing
> ELF processing until a later stage, and ensuring initcall ordering.
> 
> Note that v7 brings back arm64 support as Will has now staged the
> prerequisite memory ordering patches [1], and drops x86_64 while we work
> on fixing the remaining objtool warnings [2].

Sami,

Here are some patches to fix the objtool issues (other than crypto,
which I'll work on next).

  https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git objtool-vmlinux
Ard Biesheuvel Nov. 20, 2020, 10:29 a.m. UTC | #3
On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <samitolvanen@google.com> wrote:
> >
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > be used in the kernel. Google has shipped millions of Pixel devices
> > running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM bitcode,
> > which Clang produces with LTO instead of ELF object files, postponing
> > ELF processing until a later stage, and ensuring initcall ordering.
> >
> > Note that v7 brings back arm64 support as Will has now staged the
> > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > on fixing the remaining objtool warnings [2].
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> > [2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/
> >
> > You can also pull this series from
> >
> >   https://github.com/samitolvanen/linux.git lto-v7
>
> Thanks for continuing to drive this series Sami.  For the series,
>
> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
>
> I did virtualized boot tests with the series applied to aarch64
> defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> with CONFIG_THINLTO.  If you make changes to the series in follow ups,
> please drop my tested by tag from the modified patches and I'll help
> re-test.  Some minor feedback on the Kconfig change, but I'll post it
> off of that patch.
>

When you say "virtualized", do you mean QEMU on x86? Or actual
virtualization on an AArch64 KVM host?

The distinction is important here, given the potential impact of LTO
on things that QEMU simply does not model when it runs in TCG mode on
a foreign host architecture.
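
To make the TCG-versus-KVM distinction concrete, here is a small sketch of how one might decide which QEMU accelerator a given test setup ends up using. It is a simplification under stated assumptions: pick_accel and host_kvm_usable are hypothetical helper names, and the rule "same architecture plus an accessible /dev/kvm" ignores cases such as 32-bit guests on 64-bit hosts.

```python
# Sketch only: decide between hardware virtualization (KVM) and
# emulation (TCG) for a guest. Helper names are hypothetical.
import os
import platform


def host_kvm_usable(kvm_device: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node is readable and writable."""
    return os.access(kvm_device, os.R_OK | os.W_OK)


def pick_accel(guest_arch: str, host_arch: str, kvm_usable: bool) -> str:
    """Return "kvm" only when guest and host architectures match and KVM
    is usable; otherwise fall back to "tcg" (pure emulation)."""
    if guest_arch == host_arch and kvm_usable:
        return "kvm"
    return "tcg"


if __name__ == "__main__":
    # platform.machine() reports e.g. "x86_64" or "aarch64" on Linux.
    print(pick_accel("aarch64", platform.machine(), host_kvm_usable()))
```

Under this rule an aarch64 guest on an x86_64 host always lands on "tcg", so such a boot test exercises QEMU's emulation rather than real AArch64 hardware behaviour.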

Nick Desaulniers Nov. 20, 2020, 8:19 p.m. UTC | #4
On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > Thanks for continuing to drive this series Sami.  For the series,
> >
> > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
> >
> > I did virtualized boot tests with the series applied to aarch64
> > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > with CONFIG_THINLTO.  If you make changes to the series in follow ups,
> > please drop my tested by tag from the modified patches and I'll help
> > re-test.  Some minor feedback on the Kconfig change, but I'll post it
> > off of that patch.
> >
>
> When you say "virtualized", do you mean QEMU on x86? Or actual
> virtualization on an AArch64 KVM host?

aarch64 guest on x86_64 host.  If you have additional configurations
that are important to you, additional testing help would be
appreciated.

>
> The distinction is important here, given the potential impact of LTO
> on things that QEMU simply does not model when it runs in TCG mode on
> a foreign host architecture.
Sami Tolvanen Nov. 20, 2020, 8:25 p.m. UTC | #5
On Thu, Nov 19, 2020 at 8:04 PM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > be used in the kernel. Google has shipped millions of Pixel devices
> > running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM bitcode,
> > which Clang produces with LTO instead of ELF object files, postponing
> > ELF processing until a later stage, and ensuring initcall ordering.
> >
> > Note that v7 brings back arm64 support as Will has now staged the
> > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > on fixing the remaining objtool warnings [2].
>
> Sami,
>
> Here are some patches to fix the objtool issues (other than crypto which
> I'll work on next).
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git objtool-vmlinux

Thanks, Josh! I can confirm that these fix all the non-crypto objtool
warnings with LTO as well.

Sami
Ard Biesheuvel Nov. 20, 2020, 11:30 p.m. UTC | #6
On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > >
> > > Thanks for continuing to drive this series Sami.  For the series,
> > >
> > > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
> > >
> > > I did virtualized boot tests with the series applied to aarch64
> > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > > with CONFIG_THINLTO.  If you make changes to the series in follow ups,
> > > please drop my tested by tag from the modified patches and I'll help
> > > re-test.  Some minor feedback on the Kconfig change, but I'll post it
> > > off of that patch.
> > >
> >
> > When you say "virtualized", do you mean QEMU on x86? Or actual
> > virtualization on an AArch64 KVM host?
>
> aarch64 guest on x86_64 host.  If you have additional configurations
> that are important to you, additional testing help would be
> appreciated.
>

Could you run this on an actual phone? Or does Android already ship
with this stuff?


> >
> > The distinction is important here, given the potential impact of LTO
> > on things that QEMU simply does not model when it runs in TCG mode on
> > a foreign host architecture.
>
> --
> Thanks,
> ~Nick Desaulniers
Nick Desaulniers Nov. 20, 2020, 11:53 p.m. UTC | #7
On Fri, Nov 20, 2020 at 3:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > >
> > > > Thanks for continuing to drive this series Sami.  For the series,
> > > >
> > > > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
> > > >
> > > > I did virtualized boot tests with the series applied to aarch64
> > > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > > > with CONFIG_THINLTO.  If you make changes to the series in follow ups,
> > > > please drop my tested by tag from the modified patches and I'll help
> > > > re-test.  Some minor feedback on the Kconfig change, but I'll post it
> > > > off of that patch.
> > > >
> > >
> > > When you say "virtualized", do you mean QEMU on x86? Or actual
> > > virtualization on an AArch64 KVM host?
> >
> > aarch64 guest on x86_64 host.  If you have additional configurations
> > that are important to you, additional testing help would be
> > appreciated.
> >
>
> Could you run this on an actual phone? Or does Android already ship
> with this stuff?

If by "this" you mean "the LTO series", it has been shipping on
Android phones for years now; I think it's even required in the latest
release.

If you mean "the LTO series + mainline" on a phone, well, there's the
android-mainline branch of https://android.googlesource.com/kernel/common/,
from which this series was recently removed in order to facilitate
rebasing Android's patches on ToT-mainline until the series lands
upstream.  Bit of a chicken-and-egg problem there.

If you mean "the LTO series + mainline + KVM" on a phone, I don't know
the precise state of aarch64 KVM and Android (Will or Marc would
know).  We did experiment recently with RockPIs for aarch64 KVM, IIRC;
I think Android is tricky as it still requires A64+A32/T32 chipsets.
Alistair would know more.  It might be interesting to boot a virtualized
(or paravirtualized?) guest built with LTO on a host built with LTO,
but I don't know if we have tried that yet (I think we did
try LTO guests of Android kernels, but they were on the stock
RockPI host BSP image, IIRC).

> > > The distinction is important here, given the potential impact of LTO
> > > on things that QEMU simply does not model when it runs in TCG mode on
> > > a foreign host architecture.
Nathan Chancellor Nov. 21, 2020, 3:14 a.m. UTC | #8
On Fri, Nov 20, 2020 at 11:29:51AM +0100, Ard Biesheuvel wrote:
> On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <samitolvanen@google.com> wrote:
> > >
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > be used in the kernel. Google has shipped millions of Pixel devices
> > > running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM bitcode,
> > > which Clang produces with LTO instead of ELF object files, postponing
> > > ELF processing until a later stage, and ensuring initcall ordering.
> > >
> > > Note that v7 brings back arm64 support as Will has now staged the
> > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > on fixing the remaining objtool warnings [2].
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> > > [2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/
> > >
> > > You can also pull this series from
> > >
> > >   https://github.com/samitolvanen/linux.git lto-v7
> >
> > Thanks for continuing to drive this series Sami.  For the series,
> >
> > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
> >
> > I did virtualized boot tests with the series applied to aarch64
> > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > with CONFIG_THINLTO.  If you make changes to the series in follow ups,
> > please drop my tested by tag from the modified patches and I'll help
> > re-test.  Some minor feedback on the Kconfig change, but I'll post it
> > off of that patch.
> >
> 
> When you say "virtualized", do you mean QEMU on x86? Or actual
> virtualization on an AArch64 KVM host?
> 
> The distinction is important here, given the potential impact of LTO
> on things that QEMU simply does not model when it runs in TCG mode on
> a foreign host architecture.

I have booted this series on my Raspberry Pi 4 (ARCH=arm64 defconfig).

$ uname -r
5.10.0-rc4-00108-g830200082c74

$ zgrep LTO /proc/config.gz
CONFIG_LTO=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_THINLTO=y
CONFIG_THINLTO=y
# CONFIG_LTO_NONE is not set
CONFIG_LTO_CLANG=y
# CONFIG_HID_WALTOP is not set

and I have taken that same kernel and booted it under QEMU with
'-enable-kvm' without any visible issues.
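
Note that the bare `LTO` pattern above also matches unrelated symbols
such as CONFIG_HID_WALTOP. A narrower check could look like the sketch
below. The function name `lto_config` is made up for illustration, and
the usage line assumes the running kernel exposes /proc/config.gz
(CONFIG_IKCONFIG_PROC).

```shell
# Print only the LTO-related symbols from a kernel config read on stdin.
# Matches "CONFIG_X=..." and "# CONFIG_X is not set" lines where X is
# LTO or THINLTO, optionally prefixed with ARCH_SUPPORTS_ and optionally
# followed by a _SUFFIX (e.g. LTO_CLANG, LTO_NONE).
# NOTE: lto_config is a hypothetical helper, not part of the series.
lto_config() {
    grep -E '^(# )?CONFIG_(ARCH_SUPPORTS_)?(LTO|THINLTO)(_[A-Z0-9_]+)?[= ]'
}

# Usage on a running system:
#   zcat /proc/config.gz | lto_config
```

Anchoring the pattern at the start of the symbol name is what keeps
CONFIG_HID_WALTOP out of the result.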

I have tested four combinations:

clang 12 @ f9f0a4046e11c2b4c130640f343e3b2b5db08c1:
    * CONFIG_THINLTO=y
    * CONFIG_THINLTO=n

clang 11.0.0
    * CONFIG_THINLTO=y
    * CONFIG_THINLTO=n

Tested-by: Nathan Chancellor <natechancellor@gmail.com>

Cheers,
Nathan
Ard Biesheuvel Nov. 21, 2020, 7:35 a.m. UTC | #9
On Sat, 21 Nov 2020 at 00:53, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Fri, Nov 20, 2020 at 3:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > > >
> > > > > Thanks for continuing to drive this series Sami.  For the series,
> > > > >
> > > > > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
> > > > >
> > > > > I did virtualized boot tests with the series applied to aarch64
> > > > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > > > > with CONFIG_THINLTO.  If you make changes to the series in follow ups,
> > > > > please drop my tested by tag from the modified patches and I'll help
> > > > > re-test.  Some minor feedback on the Kconfig change, but I'll post it
> > > > > off of that patch.
> > > > >
> > > >
> > > > > When you say 'virtualized' do you mean QEMU on x86? Or actual
> > > > virtualization on an AArch64 KVM host?
> > >
> > > aarch64 guest on x86_64 host.  If you have additional configurations
> > > that are important to you, additional testing help would be
> > > appreciated.
> > >
> >
> > Could you run this on an actual phone? Or does Android already ship
> > with this stuff?
>
> By `this`, if you mean "the LTO series", it has been shipping on
> Android phones for years now, I think it's even required in the latest
> release.
>
> If you mean "the LTO series + mainline" on a phone, well there's the
> android-mainline of https://android.googlesource.com/kernel/common/,
> in which this series was recently removed in order to facilitate
> rebasing Android's patches on ToT-mainline until getting the series
> landed upstream.  Bit of a chicken and the egg problem there.
>
> If you mean "the LTO series + mainline + KVM" on a phone; I don't know
> the precise state of aarch64 KVM and Android (Will or Marc would
> know).  We did experiment recently with RockPI's for aarch64 KVM, IIRC;
> I think Android is tricky as it still requires A64+A32/T32 chipsets,
> Alistair would know more.  Might be interesting to boot a virtualized
> (or paravirtualized?) guest built with LTO in a host built with LTO
> for sure, but I don't know if we have tried that yet (I think we did
> try LTO guests of android kernels, but I think they were on the stock
> RockPI host BSP image IIRC).
>

I don't think testing under KVM gives us more confidence or coverage
than testing on bare metal. I was just pointing out that 'virtualized'
is misleading, and if you test things under QEMU/x86 + TCG, it is
better to be clear about this, and refer to it as 'under emulation'.
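
To make the distinction concrete, here is a rough sketch of how a
boot-test wrapper could pick the accelerator, so that the mode in use is
explicit. The function name `qemu_accel_args`, the CPU model and the
`Image` path are illustrative assumptions, not anyone's actual test
setup.

```shell
# Choose QEMU accelerator flags for an arm64 guest.
#   $1: host architecture, e.g. the output of `uname -m`
#   $2: "yes" if /dev/kvm is usable on the host, anything else otherwise
# KVM can only run an arm64 guest on an arm64 host; for any other host
# (or without KVM) this sketch selects TCG, i.e. emulation.
# NOTE: qemu_accel_args is a hypothetical helper.
qemu_accel_args() {
    if [ "$1" = "aarch64" ] && [ "$2" = "yes" ]; then
        echo "-accel kvm -cpu host"
    else
        echo "-accel tcg -cpu cortex-a57"
    fi
}

# Example: build (but do not run) a command line for the current host.
kvm=no
[ -w /dev/kvm ] && kvm=yes
echo "qemu-system-aarch64 -machine virt $(qemu_accel_args "$(uname -m)" "$kvm") -kernel Image"
```

On an x86 host this always prints the TCG variant, which is the
'under emulation' case.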
Marc Zyngier Nov. 21, 2020, 11:40 a.m. UTC | #10
On 2020-11-20 23:53, Nick Desaulniers wrote:
> On Fri, Nov 20, 2020 at 3:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>> 
>> On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers 
>> <ndesaulniers@google.com> wrote:
>> >
>> > On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>> > >
>> > > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <ndesaulniers@google.com> wrote:
>> > > >
>> > > > Thanks for continuing to drive this series Sami.  For the series,
>> > > >
>> > > > Tested-by: Nick Desaulniers <ndesaulniers@google.com>
>> > > >
>> > > > I did virtualized boot tests with the series applied to aarch64
>> > > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
>> > > > with CONFIG_THINLTO.  If you make changes to the series in follow ups,
>> > > > please drop my tested by tag from the modified patches and I'll help
>> > > > re-test.  Some minor feedback on the Kconfig change, but I'll post it
>> > > > off of that patch.
>> > > >
>> > >
>> > > When you say 'virtualized' do you mean QEMU on x86? Or actual
>> > > virtualization on an AArch64 KVM host?
>> >
>> > aarch64 guest on x86_64 host.  If you have additional configurations
>> > that are important to you, additional testing help would be
>> > appreciated.
>> >
>> 
>> Could you run this on an actual phone? Or does Android already ship
>> with this stuff?
> 
> By `this`, if you mean "the LTO series", it has been shipping on
> Android phones for years now, I think it's even required in the latest
> release.
> 
> If you mean "the LTO series + mainline" on a phone, well there's the
> android-mainline of https://android.googlesource.com/kernel/common/,
> in which this series was recently removed in order to facilitate
> rebasing Android's patches on ToT-mainline until getting the series
> landed upstream.  Bit of a chicken and the egg problem there.
> 
> If you mean "the LTO series + mainline + KVM" on a phone; I don't know
> the precise state of aarch64 KVM and Android (Will or Marc would
> know).

If you are lucky enough to have an Android system booting at EL2,
KVM should just work [1], though I haven't tried with this series.

> We did experiment recently with RockPI's for aarch64 KVM, IIRC;
> I think Android is tricky as it still requires A64+A32/T32 chipsets,

Which is about 100% of the Android systems at the moment (I don't think
any of the asymmetric SoCs are in the wild yet). It doesn't really
affect KVM anyway.

          M.

[1] with the broken firmware gotchas that I believed to be eradicated
8 years ago, but are still prevalent in the Android world: laughable
PSCI implementation, invalid CNTFRQ_EL0...
Will Deacon Nov. 30, 2020, 12:01 p.m. UTC | #11
Hi Sami,

On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> be used in the kernel. Google has shipped millions of Pixel devices
> running three major kernel versions with LTO+CFI since 2018.
> 
> Most of the patches are build system changes for handling LLVM bitcode,
> which Clang produces with LTO instead of ELF object files, postponing
> ELF processing until a later stage, and ensuring initcall ordering.
> 
> Note that v7 brings back arm64 support as Will has now staged the
> prerequisite memory ordering patches [1], and drops x86_64 while we work
> on fixing the remaining objtool warnings [2].

Sounds like you're going to post a v8, but what's the plan for merging
that? The arm64 parts look pretty good to me now.

Will
Kees Cook Dec. 1, 2020, 5:31 p.m. UTC | #12
On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> Hi Sami,
> 
> On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > be used in the kernel. Google has shipped millions of Pixel devices
> > running three major kernel versions with LTO+CFI since 2018.
> > 
> > Most of the patches are build system changes for handling LLVM bitcode,
> > which Clang produces with LTO instead of ELF object files, postponing
> > ELF processing until a later stage, and ensuring initcall ordering.
> > 
> > Note that v7 brings back arm64 support as Will has now staged the
> > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > on fixing the remaining objtool warnings [2].
> 
> Sounds like you're going to post a v8, but what's the plan for merging
> that? The arm64 parts look pretty good to me now.

I haven't seen Masahiro comment on this in a while, so given the review
history and its use (for years now) in Android, I will carry v8 (assuming
all is fine with it) in -next unless there are objections.
Nick Desaulniers Dec. 1, 2020, 7:51 p.m. UTC | #13
On Tue, Dec 1, 2020 at 9:31 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > Hi Sami,
> >
> > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > be used in the kernel. Google has shipped millions of Pixel devices
> > > running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM bitcode,
> > > which Clang produces with LTO instead of ELF object files, postponing
> > > ELF processing until a later stage, and ensuring initcall ordering.
> > >
> > > Note that v7 brings back arm64 support as Will has now staged the
> > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > on fixing the remaining objtool warnings [2].
> >
> > Sounds like you're going to post a v8, but what's the plan for merging
> > that? The arm64 parts look pretty good to me now.
>
> I haven't seen Masahiro comment on this in a while, so given the review
> history and its use (for years now) in Android, I will carry v8 (assuming
> all is fine with it) in -next unless there are objections.

I had some minor stylistic feedback on the Kconfig changes; I'm happy
for you to land the bulk of the changes and then I follow up with
patches to the Kconfig after.
Sami Tolvanen Dec. 1, 2020, 9:38 p.m. UTC | #14
On Tue, Dec 1, 2020 at 11:51 AM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> On Tue, Dec 1, 2020 at 9:31 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > > Hi Sami,
> > >
> > > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > > This patch series adds support for building the kernel with Clang's
> > > > Link Time Optimization (LTO). In addition to performance, the primary
> > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > > be used in the kernel. Google has shipped millions of Pixel devices
> > > > running three major kernel versions with LTO+CFI since 2018.
> > > >
> > > > Most of the patches are build system changes for handling LLVM bitcode,
> > > > which Clang produces with LTO instead of ELF object files, postponing
> > > > ELF processing until a later stage, and ensuring initcall ordering.
> > > >
> > > > Note that v7 brings back arm64 support as Will has now staged the
> > > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > > on fixing the remaining objtool warnings [2].
> > >
> > > Sounds like you're going to post a v8, but what's the plan for merging
> > > that? The arm64 parts look pretty good to me now.
> >
> > I haven't seen Masahiro comment on this in a while, so given the review
> > history and its use (for years now) in Android, I will carry v8 (assuming
> > all is fine with it) in -next unless there are objections.
>
> I had some minor stylistic feedback on the Kconfig changes; I'm happy
> for you to land the bulk of the changes and then I follow up with
> patches to the Kconfig after.

These are included in v8, which I just sent out.

Sami
Masahiro Yamada Dec. 2, 2020, 2:42 a.m. UTC | #15
On Wed, Dec 2, 2020 at 2:31 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > Hi Sami,
> >
> > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > be used in the kernel. Google has shipped millions of Pixel devices
> > > running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM bitcode,
> > > which Clang produces with LTO instead of ELF object files, postponing
> > > ELF processing until a later stage, and ensuring initcall ordering.
> > >
> > > Note that v7 brings back arm64 support as Will has now staged the
> > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > on fixing the remaining objtool warnings [2].
> >
> > Sounds like you're going to post a v8, but what's the plan for merging
> > that? The arm64 parts look pretty good to me now.
>
> I haven't seen Masahiro comment on this in a while, so given the review
> history and its use (for years now) in Android, I will carry v8 (assuming
> all is fine with it) in -next unless there are objections.


What I dislike about this implementation is
it cannot drop any unreachable function/data.
(and it is completely different from GCC LTO)

This is not real LTO.




Sami Tolvanen Dec. 2, 2020, 5:46 a.m. UTC | #16
On Tue, Dec 1, 2020 at 6:43 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Wed, Dec 2, 2020 at 2:31 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > > Hi Sami,
> > >
> > > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > > This patch series adds support for building the kernel with Clang's
> > > > Link Time Optimization (LTO). In addition to performance, the primary
> > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > > be used in the kernel. Google has shipped millions of Pixel devices
> > > > running three major kernel versions with LTO+CFI since 2018.
> > > >
> > > > Most of the patches are build system changes for handling LLVM bitcode,
> > > > which Clang produces with LTO instead of ELF object files, postponing
> > > > ELF processing until a later stage, and ensuring initcall ordering.
> > > >
> > > > Note that v7 brings back arm64 support as Will has now staged the
> > > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > > on fixing the remaining objtool warnings [2].
> > >
> > > Sounds like you're going to post a v8, but what's the plan for merging
> > > that? The arm64 parts look pretty good to me now.
> >
> > I haven't seen Masahiro comment on this in a while, so given the review
> > history and its use (for years now) in Android, I will carry v8 (assuming
> > all is fine with it) in -next unless there are objections.
>
>
> What I dislike about this implementation is
> it cannot drop any unreachable function/data.
> (and it is completely different from GCC LTO)
>
> This is not real LTO.

I'm not sure I understand your concern. LTO cannot drop functions or
data from vmlinux.o that may be referenced externally. However, with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, the linker certainly can drop
unused functions and data when linking vmlinux, and there's no reason
this option can't be used together with LTO. In fact, Pixel 3 does
enable this option, but in our experience, there isn't much unused
code or data to remove, so later devices no longer use it.

There's technically no reason why we couldn't postpone LTO until we
link vmlinux instead, and thus allow the linker to possibly remove
more unused code without the help of --gc-sections, but at least with
the current build process, that would involve performing the slow LTO
link step multiple times, which isn't worth it when we can get the
performance benefits (and CFI) already when linking vmlinux.o with
LTO.

Sami
Kees Cook Dec. 2, 2020, 6:54 p.m. UTC | #17
On Wed, Dec 02, 2020 at 11:42:21AM +0900, Masahiro Yamada wrote:
> On Wed, Dec 2, 2020 at 2:31 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > > Hi Sami,
> > >
> > > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > > This patch series adds support for building the kernel with Clang's
> > > > Link Time Optimization (LTO). In addition to performance, the primary
> > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > > be used in the kernel. Google has shipped millions of Pixel devices
> > > > running three major kernel versions with LTO+CFI since 2018.
> > > >
> > > > Most of the patches are build system changes for handling LLVM bitcode,
> > > > which Clang produces with LTO instead of ELF object files, postponing
> > > > ELF processing until a later stage, and ensuring initcall ordering.
> > > >
> > > > Note that v7 brings back arm64 support as Will has now staged the
> > > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > > on fixing the remaining objtool warnings [2].
> > >
> > > Sounds like you're going to post a v8, but what's the plan for merging
> > > that? The arm64 parts look pretty good to me now.
> >
> > I haven't seen Masahiro comment on this in a while, so given the review
> > history and its use (for years now) in Android, I will carry v8 (assuming
> > all is fine with it) in -next unless there are objections.
> 
> 
> What I dislike about this implementation is
> it cannot drop any unreachable function/data.
> (and it is completely different from GCC LTO)

This seems to be an orthogonal concern: the kernel doesn't have GCC LTO
support either (though much of Sami's work is required for GCC LTO too).

> This is not real LTO.

I don't know what you're defining as "real LTO", but this is, very much,
Link Time Optimization: the compiler has access to the entire code at
once, and it is therefore in a position to perform many manipulations to
the code. As Sami mentioned, perhaps you're thinking specifically of
dead code elimination? That's a specific optimization.

> [thread[1] merging]
> This help document is misleading.
> People who read the document would misunderstand how great this feature would.
> 
> This should be added in the commit log and Kconfig help:
> 
>            In contrast to the example in the documentation, Clang LTO
>            for the kernel cannot remove any unreachable function or data.
>            In fact, this results in even bigger vmlinux and modules.

Which LTO passes are happening, how optimizations are being performed,
etc, are endlessly tunable, but we can't work on that tuning without
the infrastructure to perform an LTO build in the first place. We need
to land the support, and go from there. As written, it works very well
for arm64 (which is what v8 targets specifically) and the results have
been running on millions of Android phones for years now. If further
tuning needs to happen for other architectures, config combinations, etc,
those can and will be developed. (For example, x86 is around the corner,
once some false positive warnings from objtool get hammered out, etc.)

I still want this in -next so we can build on it and improve it -- it
has been stuck in limbo for too long.

-Kees

[1] https://lore.kernel.org/kernel-hardening/CAK7LNASMh1KysAB4+gU7_iuTW+5GT2_yMDevwpLwx0iqjxwmWw@mail.gmail.com/