Message ID | 20200929214631.3516445-1-samitolvanen@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Add support for Clang LTO | expand |
On Tue, Sep 29, 2020 at 02:46:02PM -0700, Sami Tolvanen wrote: > Furthermore, patches 4-8 include Peter's patch for generating > __mcount_loc with objtool, and build system changes to enable it on > x86. With these patches, we no longer need to annotate functions > that have non-call references to __fentry__ with LTO, which makes > supporting dynamic ftrace much simpler. Peter, can you take patches 4-8 into -tip? I think it'd make sense to keep them together. Steven, it sounds like you're okay with the changes (i.e. Sami showed the one concern you had was already handled)? Getting these into v5.10 would be really really nice. I'd really like to get the tree-spanning prerequisites nailed down for this series. It's been difficult to coordinate given the multiple maintainers. :) Specifically these patches: Peter Zijlstra (1): objtool: Add a pass for generating __mcount_loc Sami Tolvanen (4): objtool: Don't autodetect vmlinux.o tracing: move function tracer options to Kconfig tracing: add support for objtool mcount x86, build: use objtool mcount https://lore.kernel.org/lkml/20200929214631.3516445-5-samitolvanen@google.com/ https://lore.kernel.org/lkml/20200929214631.3516445-6-samitolvanen@google.com/ https://lore.kernel.org/lkml/20200929214631.3516445-7-samitolvanen@google.com/ https://lore.kernel.org/lkml/20200929214631.3516445-8-samitolvanen@google.com/ https://lore.kernel.org/lkml/20200929214631.3516445-9-samitolvanen@google.com/ Thanks!
On Tue, Sep 29, 2020 at 2:46 PM Sami Tolvanen <samitolvanen@google.com> wrote: > > This patch series adds support for building x86_64 and arm64 kernels > with Clang's Link Time Optimization (LTO). > > In addition to performance, the primary motivation for LTO is > to allow Clang's Control-Flow Integrity (CFI) to be used in the > kernel. Google has shipped millions of Pixel devices running three > major kernel versions with LTO+CFI since 2018. > > Most of the patches are build system changes for handling LLVM > bitcode, which Clang produces with LTO instead of ELF object files, > postponing ELF processing until a later stage, and ensuring initcall > ordering. Sami, thanks for continuing to drive the series. I encourage you to keep resending with fixes accumulated or dropped on a weekly cadence. The series worked well for me on arm64, but for x86_64 on mainline I saw a stream of new objtool warnings: testing your LTO series; x86_64 defconfig + CONFIG_THINLTO: ``` LTO vmlinux.o OBJTOOL vmlinux.o vmlinux.o: warning: objtool: wakeup_long64()+0x61: indirect jump found in RETPOLINE build vmlinux.o: warning: objtool: .text+0x308a: indirect jump found in RETPOLINE build vmlinux.o: warning: objtool: .text+0x30c5: indirect jump found in RETPOLINE build vmlinux.o: warning: objtool: copy_user_enhanced_fast_string() falls through to next function copy_user_generic_unrolled() vmlinux.o: warning: objtool: __memcpy_mcsafe() falls through to next function mcsafe_handle_tail() vmlinux.o: warning: objtool: memset() falls through to next function memset_erms() vmlinux.o: warning: objtool: __memcpy() falls through to next function memcpy_erms() vmlinux.o: warning: objtool: __x86_indirect_thunk_rax() falls through to next function __x86_retpoline_rax() vmlinux.o: warning: objtool: __x86_indirect_thunk_rbx() falls through to next function __x86_retpoline_rbx() vmlinux.o: warning: objtool: __x86_indirect_thunk_rcx() falls through to next function __x86_retpoline_rcx() vmlinux.o: warning: objtool: __x86_indirect_thunk_rdx() falls through to next function __x86_retpoline_rdx() vmlinux.o: warning: objtool: __x86_indirect_thunk_rsi() falls through to next function __x86_retpoline_rsi() vmlinux.o: warning: objtool: __x86_indirect_thunk_rdi() falls through to next function __x86_retpoline_rdi() vmlinux.o: warning: objtool: __x86_indirect_thunk_rbp() falls through to next function __x86_retpoline_rbp() vmlinux.o: warning: objtool: __x86_indirect_thunk_r8() falls through to next function __x86_retpoline_r8() vmlinux.o: warning: objtool: __x86_indirect_thunk_r9() falls through to next function __x86_retpoline_r9() vmlinux.o: warning: objtool: __x86_indirect_thunk_r10() falls through to next function __x86_retpoline_r10() vmlinux.o: warning: objtool: __x86_indirect_thunk_r11() falls through to next function __x86_retpoline_r11() vmlinux.o: warning: objtool: __x86_indirect_thunk_r12() falls through to next function __x86_retpoline_r12() vmlinux.o: warning: objtool: __x86_indirect_thunk_r13() falls through to next function __x86_retpoline_r13() vmlinux.o: warning: objtool: __x86_indirect_thunk_r14() falls through to next function __x86_retpoline_r14() vmlinux.o: warning: objtool: __x86_indirect_thunk_r15() falls through to next function __x86_retpoline_r15() ``` I think those should be resolved before I provide any kind of tested by tag. My other piece of feedback was that I like the default ThinLTO, but I think the help text in the Kconfig which is visible during menuconfig could be improved by informing the user the tradeoffs. For example, if CONFIG_THINLTO is disabled, it should be noted that full LTO will be used instead. Also, that full LTO may produce slightly better optimized binaries than ThinLTO, at the cost of not utilizing multiple cores when linking and thus significantly slower to link. Maybe explaining that setting it to "n" implies a full LTO build, which will be much slower to link but possibly slightly faster would be good? It's not visible unless LTO_CLANG and ARCH_SUPPORTS_THINLTO is enabled, so I don't think you need to explain that THINLTO without those is *not* full LTO. I'll leave the precise wording to you. WDYT? Also, when I look at your treewide DISABLE_LTO patch, I think "does that need to be a part of this series, or is it a cleanup that can stand on its own?" I think it may be the latter? Maybe it would help shed one more patch than to have to carry it to just send it? Or did I miss something as to why it should remain a part of this series?
On Wed, Sep 30, 2020 at 2:58 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Tue, Sep 29, 2020 at 2:46 PM Sami Tolvanen <samitolvanen@google.com> wrote: > > > > This patch series adds support for building x86_64 and arm64 kernels > > with Clang's Link Time Optimization (LTO). > > > > In addition to performance, the primary motivation for LTO is > > to allow Clang's Control-Flow Integrity (CFI) to be used in the > > kernel. Google has shipped millions of Pixel devices running three > > major kernel versions with LTO+CFI since 2018. > > > > Most of the patches are build system changes for handling LLVM > > bitcode, which Clang produces with LTO instead of ELF object files, > > postponing ELF processing until a later stage, and ensuring initcall > > ordering. > > Sami, thanks for continuing to drive the series. I encourage you to > keep resending with fixes accumulated or dropped on a weekly cadence. > > The series worked well for me on arm64, but for x86_64 on mainline I > saw a stream of new objtool warnings: [...] Objtool normally won't print out these warnings when run on vmlinux.o, but we can't pass --vmlinux to objtool as that also implies noinstr validation right now. I think we'd have to split that from --vmlinux to avoid these. I can include a patch to add a --noinstr flag in v5. Peter, any thoughts about this? > I think those should be resolved before I provide any kind of tested > by tag. My other piece of feedback was that I like the default > ThinLTO, but I think the help text in the Kconfig which is visible > during menuconfig could be improved by informing the user the > tradeoffs. For example, if CONFIG_THINLTO is disabled, it should be > noted that full LTO will be used instead. Also, that full LTO may > produce slightly better optimized binaries than ThinLTO, at the cost > of not utilizing multiple cores when linking and thus significantly > slower to link. > > Maybe explaining that setting it to "n" implies a full LTO build, > which will be much slower to link but possibly slightly faster would > be good? It's not visible unless LTO_CLANG and ARCH_SUPPORTS_THINLTO > is enabled, so I don't think you need to explain that THINLTO without > those is *not* full LTO. I'll leave the precise wording to you. WDYT? Sure, sounds good. I'll update the help text in the next version. > Also, when I look at your treewide DISABLE_LTO patch, I think "does > that need to be a part of this series, or is it a cleanup that can > stand on its own?" I think it may be the latter? Maybe it would help > shed one more patch than to have to carry it to just send it? Or did > I miss something as to why it should remain a part of this series? I suppose it could be stand-alone, but as these patches are also disabling LTO by filtering out flags in some of the same files, removing the unused DISABLE_LTO flags first would reduce confusion. But I'm fine with sending it separately too if that's preferred. Sami