Message ID | 20201201213707.541432-1-samitolvanen@google.com (mailing list archive) |
---|---|
Series | Add support for Clang LTO |
On Tue, Dec 1, 2020 at 1:37 PM Sami Tolvanen <samitolvanen@google.com> wrote: > > This patch series adds support for building the kernel with Clang's > Link Time Optimization (LTO). In addition to performance, the primary > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > to be used in the kernel. Google has shipped millions of Pixel > devices running three major kernel versions with LTO+CFI since 2018. > > Most of the patches are build system changes for handling LLVM > bitcode, which Clang produces with LTO instead of ELF object files, > postponing ELF processing until a later stage, and ensuring initcall > ordering. > > Note that arm64 support depends on Will's memory ordering patches > [1]. I will post x86_64 patches separately after we have fixed the > remaining objtool warnings [2][3]. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/ > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux > > You can also pull this series from > > https://github.com/samitolvanen/linux.git lto-v8 > > --- > Changes in v8: > > - Cleaned up the LTO Kconfig options based on suggestions from > Nick and Kees. Thanks Sami, for the series: Tested-by: Nick Desaulniers <ndesaulniers@google.com> (build and boot tested under emulation with https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto additionally rebased on top). As with v7, if the series changes drastically for v9, please consider dropping my tested by tag for the individual patches that change and I will help re-test them.
Hi Sami, On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote: > This patch series adds support for building the kernel with Clang's > Link Time Optimization (LTO). In addition to performance, the primary > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > to be used in the kernel. Google has shipped millions of Pixel > devices running three major kernel versions with LTO+CFI since 2018. > > Most of the patches are build system changes for handling LLVM > bitcode, which Clang produces with LTO instead of ELF object files, > postponing ELF processing until a later stage, and ensuring initcall > ordering. > > Note that arm64 support depends on Will's memory ordering patches > [1]. I will post x86_64 patches separately after we have fixed the > remaining objtool warnings [2][3]. I took this series for a spin, with my for-next/lto branch merged in but I see a failure during the LTO stage with clang 11.0.5 because it doesn't understand the '.arch_extension rcpc' directive we throw out in READ_ONCE(). We actually check that this extension is available before using it in the arm64 Kconfig: config AS_HAS_LDAPR def_bool $(as-instr,.arch_extension rcpc) so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1 on my Make command line; with that, then the detection works correctly and the LTO step succeeds. Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it would be _much_ better if this was implicit (or if LTO depended on it). Cheers, Will
On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote: > > Hi Sami, > > On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote: > > This patch series adds support for building the kernel with Clang's > > Link Time Optimization (LTO). In addition to performance, the primary > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > > to be used in the kernel. Google has shipped millions of Pixel > > devices running three major kernel versions with LTO+CFI since 2018. > > > > Most of the patches are build system changes for handling LLVM > > bitcode, which Clang produces with LTO instead of ELF object files, > > postponing ELF processing until a later stage, and ensuring initcall > > ordering. > > > > Note that arm64 support depends on Will's memory ordering patches > > [1]. I will post x86_64 patches separately after we have fixed the > > remaining objtool warnings [2][3]. > > I took this series for a spin, with my for-next/lto branch merged in but > I see a failure during the LTO stage with clang 11.0.5 because it doesn't > understand the '.arch_extension rcpc' directive we throw out in READ_ONCE(). I just tested this with Clang 11.0.0, which I believe is the latest 11.x version, and the current Clang 12 development branch, and both work for me. Godbolt confirms that '.arch_extension rcpc' is supported by the integrated assembler starting with Clang 11 (the example fails with 10.0.1): https://godbolt.org/z/1csGcT What does running clang --version and ld.lld --version tell you? > We actually check that this extension is available before using it in > the arm64 Kconfig: > > config AS_HAS_LDAPR > def_bool $(as-instr,.arch_extension rcpc) > > so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1 > on my Make command line; with that, then the detection works correctly > and the LTO step succeeds. > > Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? 
I think it > would be _much_ better if this was implicit (or if LTO depended on it). Without LLVM_IAS=1, Clang uses two different assemblers when LTO is enabled: the external GNU assembler for stand-alone assembly, and LLVM's integrated assembler for inline assembly. as-instr tests the external assembler and makes an admittedly reasonable assumption that the test is also valid for inline assembly. I agree that it would reduce confusion in future if we just always enabled IAS with LTO. Nick, Nathan, any thoughts about this? Sami
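The mismatch Sami describes can be sketched in shell. This is a rough, hypothetical emulation of an as-instr style probe, not the kernel's actual macro: the function name `as_instr_probe` and its argument layout are made up for illustration. The probe hands a directive to the compiler as stand-alone assembly, so whichever assembler the compiler uses for stand-alone assembly decides the answer. Inline assembly that is compiled later, during the LTO link, is never consulted.

```shell
#!/bin/sh
# Hypothetical emulation of an as-instr style probe (illustration only).
# $1: compiler command, $2: extra flags (may be empty), $3: directive to test.
as_instr_probe() {
    cc=$1
    flags=$2
    directive=$3
    # Feed the directive to the compiler as a stand-alone assembly file.
    # $flags is intentionally unquoted so that several flags split into words.
    if printf '%b\n' "$directive" | $cc $flags -c -x assembler -o /dev/null - >/dev/null 2>&1; then
        echo y
    else
        echo n
    fi
}

# Example invocations (they need a clang that can target arm64, so they
# are left commented out here):
#   as_instr_probe clang "--target=aarch64-linux-gnu -no-integrated-as" ".arch_extension rcpc"
#   as_instr_probe clang "--target=aarch64-linux-gnu" ".arch_extension rcpc"
```

If the two invocations disagree for a given toolchain, a Kconfig symbol derived from the first one says nothing reliable about what the integrated assembler will accept for inline assembly.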
On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote: > On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote: > > > > Hi Sami, > > > > On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote: > > > This patch series adds support for building the kernel with Clang's > > > Link Time Optimization (LTO). In addition to performance, the primary > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > > > to be used in the kernel. Google has shipped millions of Pixel > > > devices running three major kernel versions with LTO+CFI since 2018. > > > > > > Most of the patches are build system changes for handling LLVM > > > bitcode, which Clang produces with LTO instead of ELF object files, > > > postponing ELF processing until a later stage, and ensuring initcall > > > ordering. > > > > > > Note that arm64 support depends on Will's memory ordering patches > > > [1]. I will post x86_64 patches separately after we have fixed the > > > remaining objtool warnings [2][3]. > > > > I took this series for a spin, with my for-next/lto branch merged in but > > I see a failure during the LTO stage with clang 11.0.5 because it doesn't > > understand the '.arch_extension rcpc' directive we throw out in READ_ONCE(). > > I just tested this with Clang 11.0.0, which I believe is the latest > 11.x version, and the current Clang 12 development branch, and both > work for me. Godbolt confirms that '.arch_extension rcpc' is supported > by the integrated assembler starting with Clang 11 (the example fails > with 10.0.1): > > https://godbolt.org/z/1csGcT > > What does running clang --version and ld.lld --version tell you? 11.0.5 is AOSP's clang, which is behind the upstream 11.0.0 release so it is most likely the case that it is missing the patch that added rcpc. I think that a version based on the development branch (12.0.0) is in the works but I am not sure. 
> > We actually check that this extension is available before using it in > > the arm64 Kconfig: > > > > config AS_HAS_LDAPR > > def_bool $(as-instr,.arch_extension rcpc) > > > > so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1 > > on my Make command line; with that, then the detection works correctly > > and the LTO step succeeds. > > > > Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it > > would be _much_ better if this was implicit (or if LTO depended on it). > > Without LLVM_IAS=1, Clang uses two different assemblers when LTO is > enabled: the external GNU assembler for stand-alone assembly, and > LLVM's integrated assembler for inline assembly. as-instr tests the > external assembler and makes an admittedly reasonable assumption that > the test is also valid for inline assembly. > > I agree that it would reduce confusion in future if we just always > enabled IAS with LTO. Nick, Nathan, any thoughts about this? I am personally fine with that. As far as I am aware, we are in a fairly good spot on arm64 and x86_64 when it comes to the integrated assembler. Should we make it so that the user has to pass LLVM_IAS=1 explicitly or we just make adding the no integrated assembler flag to CLANG_FLAGS depend on not LTO (although that will require extra handling because Kconfig is not available at that stage I think)? Cheers, Nathan
On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote: > On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote: > > On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote: > > > This patch series adds support for building the kernel with Clang's > > > Link Time Optimization (LTO). In addition to performance, the primary > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > > > to be used in the kernel. Google has shipped millions of Pixel > > > devices running three major kernel versions with LTO+CFI since 2018. > > > > > > Most of the patches are build system changes for handling LLVM > > > bitcode, which Clang produces with LTO instead of ELF object files, > > > postponing ELF processing until a later stage, and ensuring initcall > > > ordering. > > > > > > Note that arm64 support depends on Will's memory ordering patches > > > [1]. I will post x86_64 patches separately after we have fixed the > > > remaining objtool warnings [2][3]. > > > > I took this series for a spin, with my for-next/lto branch merged in but > > I see a failure during the LTO stage with clang 11.0.5 because it doesn't > > understand the '.arch_extension rcpc' directive we throw out in READ_ONCE(). > > I just tested this with Clang 11.0.0, which I believe is the latest > 11.x version, and the current Clang 12 development branch, and both > work for me. Godbolt confirms that '.arch_extension rcpc' is supported > by the integrated assembler starting with Clang 11 (the example fails > with 10.0.1): > > https://godbolt.org/z/1csGcT > > What does running clang --version and ld.lld --version tell you? 
I'm using some Android prebuilts I had kicking around: Android (6875598, based on r399163b) clang version 11.0.5 (https://android.googlesource.com/toolchain/llvm-project 87f1315dfbea7c137aa2e6d362dbb457e388158d) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/local/google/home/willdeacon/work/android/repo/android-kernel/prebuilts-master/clang/host/linux-x86/clang-r399163b/bin and: LLD 11.0.5 (/buildbot/tmp/tmpx1DlI_ 87f1315dfbea7c137aa2e6d362dbb457e388158d) (compatible with GNU linkers) > > We actually check that this extension is available before using it in > > the arm64 Kconfig: > > > > config AS_HAS_LDAPR > > def_bool $(as-instr,.arch_extension rcpc) > > > > so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1 > > on my Make command line; with that, then the detection works correctly > > and the LTO step succeeds. > > > > Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it > > would be _much_ better if this was implicit (or if LTO depended on it). > > Without LLVM_IAS=1, Clang uses two different assemblers when LTO is > enabled: the external GNU assembler for stand-alone assembly, and > LLVM's integrated assembler for inline assembly. as-instr tests the > external assembler and makes an admittedly reasonable assumption that > the test is also valid for inline assembly. > > I agree that it would reduce confusion in future if we just always > enabled IAS with LTO. Nick, Nathan, any thoughts about this? That works for me, although I'm happy with anything which means that the assembler checks via as-instr apply to the assembler which will ultimately be used. Will
On Thu, Dec 3, 2020 at 10:23 AM Will Deacon <will@kernel.org> wrote: > > On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote: > > On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote: > > > I took this series for a spin, with my for-next/lto branch merged in but > > > I see a failure during the LTO stage with clang 11.0.5 because it doesn't > > > understand the '.arch_extension rcpc' directive we throw out in READ_ONCE(). > > > > I just tested this with Clang 11.0.0, which I believe is the latest > > 11.x version, and the current Clang 12 development branch, and both > > work for me. Godbolt confirms that '.arch_extension rcpc' is supported > > by the integrated assembler starting with Clang 11 (the example fails > > with 10.0.1): > > > > https://godbolt.org/z/1csGcT > > > > What does running clang --version and ld.lld --version tell you? > > I'm using some Android prebuilts I had kicking around: > > Android (6875598, based on r399163b) clang version 11.0.5 (https://android.googlesource.com/toolchain/llvm-project 87f1315dfbea7c137aa2e6d362dbb457e388158d) > Target: x86_64-unknown-linux-gnu > Thread model: posix > InstalledDir: /usr/local/google/home/willdeacon/work/android/repo/android-kernel/prebuilts-master/clang/host/linux-x86/clang-r399163b/bin > > and: > > LLD 11.0.5 (/buildbot/tmp/tmpx1DlI_ 87f1315dfbea7c137aa2e6d362dbb457e388158d) (compatible with GNU linkers) On Thu, Dec 3, 2020 at 10:22 AM Nathan Chancellor <natechancellor@gmail.com> wrote: > > 11.0.5 is AOSP's clang, which is behind the upstream 11.0.0 release so > it is most likely the case that it is missing the patch that added rcpc. > I think that a version based on the development branch (12.0.0) is in > the works but I am not sure. Yep, I have a lot of thoughts on the AOSP LLVM versioning scheme, but they're not for LKML. That's yet another reason to prefer feature detection as opposed to brittle version checks. 
Of course, as Will points out, if your feature detection is broken, that helps no one...more thoughts below. > > > We actually check that this extension is available before using it in > > > the arm64 Kconfig: > > > > > > config AS_HAS_LDAPR > > > def_bool $(as-instr,.arch_extension rcpc) > > > > > > so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1 > > > on my Make command line; with that, then the detection works correctly > > > and the LTO step succeeds. > > > > > > Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it > > > would be _much_ better if this was implicit (or if LTO depended on it). > > > > Without LLVM_IAS=1, Clang uses two different assemblers when LTO is > > enabled: the external GNU assembler for stand-alone assembly, and > > LLVM's integrated assembler for inline assembly. as-instr tests the > > external assembler and makes an admittedly reasonable assumption that > > the test is also valid for inline assembly. > > > > I agree that it would reduce confusion in future if we just always > > enabled IAS with LTO. Nick, Nathan, any thoughts about this? > > That works for me, although I'm happy with anything which means that the > assembler checks via as-instr apply to the assembler which will ultimately > be used. I agree with Will. I think interoperability of tools is important. We should be able to mix tools from GNU and LLVM and produce working images. Specifically, combinations like gcc+llvm-nm+as+llvm-objcopy, or clang+nm+as+objcopy as two examples. There's a combinatorial explosion of combinations to test/validate, which we're not doing today, but if for some reason someone wants to use some varied combination and it doesn't work, it's worthwhile to understand the differences and issues and try to fix them. That is a win for optionality and loose coupling. That's not what's going on here though. 
While I think it's ok to select a compiler and assembler and linker etc from ecosystem or another, I think trying to support a build that mixes or uses different assemblers (or linkers, compilers, etc) from both for the same build is something we should draw a line in the sand and explicitly not support (except for the compat vdso's*...). ie. if I say `make LD=ld.bfd` and ld.lld gets invoked somehow (or vice versa); I consider that a bug in KBUILD. That is what's happening here, it's why as-instr feature detection is broken; because two different assemblers were used in the same build. One for inline asm, a different one for out of line asm. At the very least, it violates the Principle of Least Surprise (or is it the Law of Equivalent Exchange, I forget). In fact, lots of the work we've been doing to enable LLVM tools to build the kernel have been identifying places throughout KBUILD where tools were hardcoded rather than using what make was told to use, and we've been making progress fixing those. The ultimate test of Linux kernel build hermiticity IMO is that I should be able to build a kernel in an environment that only has one version of either GCC/binutils or LLVM, and the kernel should build without failure. That's not the case today for all arch's; cross compiling compat vdsos again are a major pain point*, but we're making progress. In that sense, the mixing of an individual GNU and LLVM utility is what I would consider a bug in KBUILD. I want to emphasize that's distinct from mixing and matching tools when invoking make, which I consider OK, if under-tested. 
Ok (mixes GNU and LLVM tools; gcc is the only compiler invoked, ld.lld is the only linker invoked):
$ make CC=gcc LD=ld.lld

Not ok (if ld.bfd or both are invoked)
$ make LD=ld.lld

Not ok (if ld.lld or both are invoked)
$ make LD=ld.bfd

Not ok (if clang's integrated assembler and GAS are invoked)
$ ./scripts/config -e LTO_CLANG
$ make LLVM=1 LLVM_IAS=1

The mixing of GAS and clang's integrated assembler for kernel LTO builds is a relic of a time when this series was first written when Clang's integrated assembler was in no form ready to assemble the entire Linux kernel, but could handle the inline asm for aarch64. Fortunately, ARM's LLVM team has done great work to ensure the latest extensions like RCpc are supported and compatible, and Jian has done the hard work ironing out the last mile issues in clang's assembler to get the ball in the end zone.

Removing mixing GAS and clang's IA here ups the ante and removes a fallback/pressure relief valve, but I'm fine with that. Requiring clang's integrated assembler here aligns incentives to keep this working and to continue investing here.

Just because it's possible to mix the use of clang's integrated assembler with GNU assembler for LTO (for some combination of versions of these tools) doesn't mean we should support it, or encourage it, for all of the reasons above. We should make this config depend on clang's integrated assembler, and not support the mixing of assemblers in one build. Thou shalt not support invoking of different tools than what's specified*. Do not pass go, do not collect $200. Full stop.

* The compat vdso's are again a special case; when cross compiling using GNU tools, a separate binary with a different target triple prefix will typically get invoked than what's used to build the rest of the kernel image. This still doesn't cross the GNU/LLVM boundary though, and most importantly doesn't involve linking together object files that were built with distinct assemblers (for example).
So I'd recommend to Sami to simply make the Kconfig also depend on clang's integrated assembler (not just llvm-nm and llvm-ar). If someone cares about LTO with Clang as the compiler but GAS as the assembler, then we can revisit supporting that combination (and the changes to KCONFIG), but it shouldn't be something we consider Tier 1 supported or a combination that need be supported in a minimum viable product. And at that point we should make it avoid clang's integrated assembler entirely (I suspect LTO won't work at all in that case, so maybe even considering it is a waste of time). One question I have to Will; if for aarch64 LTO will depend on RCpc, but RCpc is an ARMv8.3 extension, what are the implications for LTO on pre-ARMv8.3 aarch64 processors?
On Thu, Dec 03, 2020 at 02:32:13PM -0800, Nick Desaulniers wrote:
> On Thu, Dec 3, 2020 at 10:23 AM Will Deacon <will@kernel.org> wrote:
> > On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote:
> > > Without LLVM_IAS=1, Clang uses two different assemblers when LTO is
> > > enabled: the external GNU assembler for stand-alone assembly, and
> > > LLVM's integrated assembler for inline assembly. as-instr tests the
> > > external assembler and makes an admittedly reasonable assumption that
> > > the test is also valid for inline assembly.
> > >
> > > I agree that it would reduce confusion in future if we just always
> > > enabled IAS with LTO. Nick, Nathan, any thoughts about this?
> >
> > That works for me, although I'm happy with anything which means that the
> > assembler checks via as-instr apply to the assembler which will ultimately
> > be used.
>
> I agree with Will.

[...]

> So I'd recommend to Sami to simply make the Kconfig also depend on
> clang's integrated assembler (not just llvm-nm and llvm-ar). If
> someone cares about LTO with Clang as the compiler but GAS as the
> assembler, then we can revisit supporting that combination (and the
> changes to KCONFIG), but it shouldn't be something we consider Tier 1
> supported or a combination that need be supported in a minimum viable
> product. And at that point we should make it avoid clang's integrated
> assembler entirely (I suspect LTO won't work at all in that case, so
> maybe even considering it is a waste of time).
>
> One question I have to Will; if for aarch64 LTO will depend on RCpc,
> but RCpc is an ARMv8.3 extension, what are the implications for LTO on
> pre-ARMv8.3 aarch64 processors?

It doesn't depend on RCpc -- we just emit a more expensive instruction (an RCsc acquire) if the RCpc one is not supported by both the toolchain and the CPU. So the implication for those processors is that READ_ONCE() may be more expensive.

Will
On Thu, Dec 3, 2020 at 2:32 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> So I'd recommend to Sami to simply make the Kconfig also depend on
> clang's integrated assembler (not just llvm-nm and llvm-ar).

Sure, sounds good to me. What's the preferred way to test for this in Kconfig?

It looks like actually trying to test if we have an LLVM assembler (e.g. using $(as-instr,.section ".linker-options","e",@llvm_linker_options)) doesn't work as Kconfig doesn't pass -no-integrated-as to clang here. I could do something simple like $(success,echo $(LLVM) $(LLVM_IAS) | grep -q "1 1").

Thoughts?

Sami
On Fri, Dec 04, 2020 at 02:52:41PM -0800, Sami Tolvanen wrote:
> On Thu, Dec 3, 2020 at 2:32 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > So I'd recommend to Sami to simply make the Kconfig also depend on
> > clang's integrated assembler (not just llvm-nm and llvm-ar).
>
> Sure, sounds good to me. What's the preferred way to test for this in Kconfig?
>
> It looks like actually trying to test if we have an LLVM assembler
> (e.g. using $(as-instr,.section
> ".linker-options","e",@llvm_linker_options)) doesn't work as Kconfig
> doesn't pass -no-integrated-as to clang here. I could do something
> simple like $(success,echo $(LLVM) $(LLVM_IAS) | grep -q "1 1").
>
> Thoughts?
>
> Sami

I think

	depends on $(success,test $(LLVM_IAS) -eq 1)

should work, at least according to my brief test.

Cheers,
Nathan
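How the suggested `test $(LLVM_IAS) -eq 1` expression behaves for different values can be sketched in plain shell. This is a minimal illustration under stated assumptions: `success` and `ias_check` below are hypothetical helper names that only imitate a "did the command exit with status 0" check, and the empty string stands for LLVM_IAS not being passed to make at all.

```shell
#!/bin/sh
# Hypothetical stand-in for a $(success,<command>) style check:
# prints y if the command exits with status 0, n otherwise.
success() {
    if ( eval "$1" ) >/dev/null 2>&1; then
        echo y
    else
        echo n
    fi
}

# Evaluate the suggested test for a given LLVM_IAS value.
# An empty value stands for "LLVM_IAS not set on the make command line".
ias_check() {
    LLVM_IAS=$1
    # The variable is deliberately left unquoted, mirroring the suggested
    # expression, so an empty value turns the command into `test -eq 1`,
    # which fails instead of comparing anything.
    success 'test $LLVM_IAS -eq 1'
}

ias_check 1    # LLVM_IAS=1
ias_check 0    # LLVM_IAS=0
ias_check ""   # LLVM_IAS unset
```

Only the first call reports y. The unset case reports n because `test` rejects the malformed expression and its error output is discarded, which is presumably why the one-liner is good enough for a dependency check.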
On Sat, Dec 5, 2020 at 10:50 PM Nathan Chancellor <natechancellor@gmail.com> wrote:
>
> On Fri, Dec 04, 2020 at 02:52:41PM -0800, Sami Tolvanen wrote:
> > On Thu, Dec 3, 2020 at 2:32 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > >
> > > So I'd recommend to Sami to simply make the Kconfig also depend on
> > > clang's integrated assembler (not just llvm-nm and llvm-ar).
> >
> > Sure, sounds good to me. What's the preferred way to test for this in Kconfig?
> >
> > It looks like actually trying to test if we have an LLVM assembler
> > (e.g. using $(as-instr,.section
> > ".linker-options","e",@llvm_linker_options)) doesn't work as Kconfig
> > doesn't pass -no-integrated-as to clang here.

After a closer look, that's actually not correct, this seems to work with Clang+LLD no matter which assembler is used. I suppose we could test for .gasversion. to detect GNU as, but that's hardly ideal.

> > I could do something
> > simple like $(success,echo $(LLVM) $(LLVM_IAS) | grep -q "1 1").
> >
> > Thoughts?
> >
> > Sami
>
> I think
>
> depends on $(success,test $(LLVM_IAS) -eq 1)
>
> should work, at least according to my brief test.

Sure, looks good to me. However, I think we should also test for LLVM=1 to avoid possible further issues with mismatched toolchains instead of only checking for llvm-nm and llvm-ar.

Sami
On Sun, Dec 06, 2020 at 12:09:31PM -0800, Sami Tolvanen wrote:
> Sure, looks good to me. However, I think we should also test for
> LLVM=1 to avoid possible further issues with mismatched toolchains
> instead of only checking for llvm-nm and llvm-ar.

It might still be worth testing for $(AR) and $(NM) because in theory, a user could say 'make AR=ar LLVM=1'. Highly unlikely I suppose but worth considering.

Cheers,
Nathan
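Nathan's point is that LLVM=1 alone does not prove which archiver or symbol lister will actually run. One possible way to probe the tools themselves is sketched below. It is an assumption-laden illustration, not the check used by the series: `is_llvm_tool` and `check_tools` are hypothetical helpers, and the two `fake_*` functions are stand-ins with made-up banner text so the sketch runs without any toolchain installed.

```shell
#!/bin/sh
# Hypothetical check: does the first line of a tool's --help output
# mention LLVM? $1 is a command name (or a shell function standing in).
is_llvm_tool() {
    "$1" --help 2>/dev/null | head -n 1 | grep -qi llvm
}

# Stand-ins for real binaries; the banner strings are invented and only
# approximate what such tools might print.
fake_llvm_ar() { echo "OVERVIEW: LLVM Archiver"; }
fake_gnu_ar()  { echo "Usage: ar [options] archive files"; }

# Report y only if both the AR and NM candidates look like LLVM tools.
check_tools() {
    if is_llvm_tool "$1" && is_llvm_tool "$2"; then
        echo y
    else
        echo n
    fi
}

check_tools fake_llvm_ar fake_llvm_ar   # both look like LLVM tools
check_tools fake_gnu_ar fake_llvm_ar    # AR overridden, as in 'make AR=ar LLVM=1'
```

With a probe of this kind, the override case is caught even though LLVM=1 was passed, which is the scenario raised above.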
On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > This patch series adds support for building the kernel with Clang's > Link Time Optimization (LTO). In addition to performance, the primary > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > to be used in the kernel. Google has shipped millions of Pixel > devices running three major kernel versions with LTO+CFI since 2018. > > Most of the patches are build system changes for handling LLVM > bitcode, which Clang produces with LTO instead of ELF object files, > postponing ELF processing until a later stage, and ensuring initcall > ordering. > > Note that arm64 support depends on Will's memory ordering patches > [1]. I will post x86_64 patches separately after we have fixed the > remaining objtool warnings [2][3]. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/ > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux > > You can also pull this series from > > https://github.com/samitolvanen/linux.git lto-v8 I've tried pull this into my randconfig test tree to give it a spin. So far I have not managed to get a working build out of it, the main problem so far being that it is really slow to build because the link stage only uses one CPU. These are the other issues I've seen so far: - one build seems to take even longer to link. It's currently at 35GB RAM usage and 40 minutes into the final link, but I'm worried it might not complete before it runs out of memory. I only have 128GB installed, and google-chrome uses another 30GB of that, and I'm also doing some other builds in parallel. Is there a minimum recommended amount of memory for doing LTO builds? 
- One build failed with ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o -T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a security/built-in.a crypto/built-in.a block/built-in.a arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive --start-group arch/arm64/lib/lib.a lib/lib.a ./drivers/firmware/efi/libstub/lib.a --end-group "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index" after about 30 minutes - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO doesn't work with ld.bfd. I've added a CPU_LITTLE_ENDIAN dependency to ARCH_SUPPORTS_LTO_CLANG{,THIN} - one build failed with "ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')" Not sure how to debug this - one build seems to have dropped all symbols the string operations from vmlinux, so while the link goes through, modules cannot be loaded: ERROR: modpost: "memmove" [drivers/media/rc/rc-core.ko] undefined! ERROR: modpost: "memcpy" [net/wireless/cfg80211.ko] undefined! ERROR: modpost: "memcpy" [net/8021q/8021q.ko] undefined! ERROR: modpost: "memset" [net/8021q/8021q.ko] undefined! ERROR: modpost: "memcpy" [net/unix/unix.ko] undefined! ERROR: modpost: "memset" [net/sched/cls_u32.ko] undefined! ERROR: modpost: "memcpy" [net/sched/cls_u32.ko] undefined! ERROR: modpost: "memset" [net/sched/sch_skbprio.ko] undefined! ERROR: modpost: "memcpy" [net/802/garp.ko] undefined! I first thought this was related to a clang-12 bug I saw the other day, but this also happens with clang-11 - many builds complain about thousands of duplicate symbols in the kernel, e.g. 
ld.lld: error: duplicate symbol: qrtr_endpoint_post
>>> defined in net/qrtr/qrtr.lto.o
>>> defined in net/qrtr/qrtr.o
ld.lld: error: duplicate symbol: init_module
>>> defined in crypto/842.lto.o
>>> defined in crypto/842.o
ld.lld: error: duplicate symbol: init_module
>>> defined in net/netfilter/nfnetlink_log.lto.o
>>> defined in net/netfilter/nfnetlink_log.o
ld.lld: error: duplicate symbol: vli_from_be64
>>> defined in crypto/ecc.lto.o
>>> defined in crypto/ecc.o
ld.lld: error: duplicate symbol: __mod_of__plldig_clk_id_device_table
>>> defined in drivers/clk/clk-plldig.lto.o
>>> defined in drivers/clk/clk-plldig.o

Not sure if these are all known issues. If there is one you'd like me try take a closer look at for finding which config options break it, I can try

Arnd
On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
> On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
>
> - many builds complain about thousands of duplicate symbols in the kernel, e.g.
> ld.lld: error: duplicate symbol: qrtr_endpoint_post
> >>> defined in net/qrtr/qrtr.lto.o
> >>> defined in net/qrtr/qrtr.o
> ld.lld: error: duplicate symbol: init_module
> >>> defined in crypto/842.lto.o
> >>> defined in crypto/842.o
> ld.lld: error: duplicate symbol: init_module
> >>> defined in net/netfilter/nfnetlink_log.lto.o
> >>> defined in net/netfilter/nfnetlink_log.o
> ld.lld: error: duplicate symbol: vli_from_be64
> >>> defined in crypto/ecc.lto.o
> >>> defined in crypto/ecc.o
> ld.lld: error: duplicate symbol: __mod_of__plldig_clk_id_device_table
> >>> defined in drivers/clk/clk-plldig.lto.o
> >>> defined in drivers/clk/clk-plldig.o

A small update here: I see this behavior with every single module build, including 'tinyconfig' with one module enabled, and 'defconfig'.

I tuned the randconfig setting using KCONFIG_PROBABILITY=2:2:1 now, which only enables a few symbols. With this I see faster build times (obviously), around 30 seconds per kernel, and all small builds with CONFIG_MODULES disabled so far succeed.

It appears that the problems I saw originally only happen for larger configurations, or possibly a combination of Kconfig options that don't happen that often on randconfig builds with low KCONFIG_PROBABILITY.

Arnd
On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote: > > On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux > <clang-built-linux@googlegroups.com> wrote: > > > > This patch series adds support for building the kernel with Clang's > > Link Time Optimization (LTO). In addition to performance, the primary > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > > to be used in the kernel. Google has shipped millions of Pixel > > devices running three major kernel versions with LTO+CFI since 2018. > > > > Most of the patches are build system changes for handling LLVM > > bitcode, which Clang produces with LTO instead of ELF object files, > > postponing ELF processing until a later stage, and ensuring initcall > > ordering. > > > > Note that arm64 support depends on Will's memory ordering patches > > [1]. I will post x86_64 patches separately after we have fixed the > > remaining objtool warnings [2][3]. > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto > > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/ > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux > > > > You can also pull this series from > > > > https://github.com/samitolvanen/linux.git lto-v8 > > I've tried pull this into my randconfig test tree to give it a spin. Great, thank you for testing this! > So far I have > not managed to get a working build out of it, the main problem so far being > that it is really slow to build because the link stage only uses one CPU. > These are the other issues I've seen so far: You may want to limit your testing only to ThinLTO at first, because full LTO is going to be extremely slow with larger configs, especially when building arm64 kernels. > - one build seems to take even longer to link. 
It's currently at 35GB RAM > usage and 40 minutes into the final link, but I'm worried it might > not complete > before it runs out of memory. I only have 128GB installed, and google-chrome > uses another 30GB of that, and I'm also doing some other builds in parallel. > Is there a minimum recommended amount of memory for doing LTO builds? When building arm64 defconfig, the maximum memory usage I measured with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured larger configurations, but I believe LLD can easily consume 3-4x that much with full LTO allyesconfig. > - One build failed with > ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o > -T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o > init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a > certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a > security/built-in.a crypto/built-in.a block/built-in.a > arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a > sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive > --start-group arch/arm64/lib/lib.a lib/lib.a > ./drivers/firmware/efi/libstub/lib.a --end-group > "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index" > after about 30 minutes That's interesting. Did you use LLVM_IAS=1? > - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO > doesn't work with ld.bfd. > I've added a CPU_LITTLE_ENDIAN dependency to > ARCH_SUPPORTS_LTO_CLANG{,THIN} Ah, good point. I'll fix this in v9. [...] > Not sure if these are all known issues. If there is one you'd like me try > take a closer look at for finding which config options break it, I can try No, none of these are known issues. I would be happy to take a closer look if you can share configs that reproduce these. Sami
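As a side note on the LLVM_IAS=1 question above: one way to keep a build script from silently dropping the flag would be to make the LTO option depend on it in Kconfig. The fragment below is only a sketch of that idea. The first three lines repeat the HAS_LTO_CLANG definition quoted elsewhere in this thread; the final `depends on` line is an assumption about how such a check could look, not something taken from the v8 series.

```kconfig
config HAS_LTO_CLANG
	def_bool y
	depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
	# Assumption: hide LTO unless the build was started with LLVM_IAS=1,
	# so a wrapper script that forgets the flag cannot enable it.
	depends on $(success,test $(LLVM_IAS) -eq 1)
```

With a check like this, a build started without LLVM_IAS=1 would simply not offer the LTO options instead of failing late in the link.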
On Tue, Dec 8, 2020 at 5:55 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
> >
> > - many builds complain about thousands of duplicate symbols in the kernel, e.g.
> > ld.lld: error: duplicate symbol: qrtr_endpoint_post
> > >>> defined in net/qrtr/qrtr.lto.o
> > >>> defined in net/qrtr/qrtr.o
> > ld.lld: error: duplicate symbol: init_module
> > >>> defined in crypto/842.lto.o
> > >>> defined in crypto/842.o
> > ld.lld: error: duplicate symbol: init_module
> > >>> defined in net/netfilter/nfnetlink_log.lto.o
> > >>> defined in net/netfilter/nfnetlink_log.o
> > ld.lld: error: duplicate symbol: vli_from_be64
> > >>> defined in crypto/ecc.lto.o
> > >>> defined in crypto/ecc.o
> > ld.lld: error: duplicate symbol: __mod_of__plldig_clk_id_device_table
> > >>> defined in drivers/clk/clk-plldig.lto.o
> > >>> defined in drivers/clk/clk-plldig.o
>
> A small update here: I see this behavior with every single module
> build, including 'tinyconfig' with one module enabled, and 'defconfig'.

The .o file here is a thin archive of the bitcode files for the
module. We compile .lto.o from that before modpost, because we need an
ELF binary to process, and then reuse the .lto.o file when linking the
final module.

At no point should we link the .o file again, especially not with
.lto.o, because that would clearly cause every symbol to be
duplicated, so I'm not sure what goes wrong here. Here's the relevant
part of scripts/Makefile.modfinal:

    ifdef CONFIG_LTO_CLANG
    # With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to
    # avoid a second slow LTO link
    prelink-ext := .lto
    ...
    $(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE
            +$(call if_changed,ld_ko_o)

Sami
On Tue, Dec 8, 2020 at 5:53 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > > A small update here: I see this behavior with every single module > > build, including 'tinyconfig' with one module enabled, and 'defconfig'. > > The .o file here is a thin archive of the bitcode files for the > module. We compile .lto.o from that before modpost, because we need an > ELF binary to process, and then reuse the .lto.o file when linking the > final module. > > At no point should we link the .o file again, especially not with > .lto.o, because that would clearly cause every symbol to be > duplicated, so I'm not sure what goes wrong here. Here's the relevant > part of scripts/Makefile.modfinal: > > ifdef CONFIG_LTO_CLANG > # With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to > # avoid a second slow LTO link > prelink-ext := .lto > ... > $(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE > +$(call if_changed,ld_ko_o) Ah, it's probably a local problem now, as I had a merge conflict against linux-next in this Makefile and I must have resolved the conflict incorrectly. Arnd
On Tue, Dec 8, 2020 at 5:43 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote: > > > > On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux > > <clang-built-linux@googlegroups.com> wrote: > > > > > > This patch series adds support for building the kernel with Clang's > > > Link Time Optimization (LTO). In addition to performance, the primary > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) > > > to be used in the kernel. Google has shipped millions of Pixel > > > devices running three major kernel versions with LTO+CFI since 2018. > > > > > > Most of the patches are build system changes for handling LLVM > > > bitcode, which Clang produces with LTO instead of ELF object files, > > > postponing ELF processing until a later stage, and ensuring initcall > > > ordering. > > > > > > Note that arm64 support depends on Will's memory ordering patches > > > [1]. I will post x86_64 patches separately after we have fixed the > > > remaining objtool warnings [2][3]. > > > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto > > > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/ > > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux > > > > > > You can also pull this series from > > > > > > https://github.com/samitolvanen/linux.git lto-v8 > > > > I've tried pull this into my randconfig test tree to give it a spin. > > Great, thank you for testing this! > > > So far I have > > not managed to get a working build out of it, the main problem so far being > > that it is really slow to build because the link stage only uses one CPU. 
> > These are the other issues I've seen so far: > > You may want to limit your testing only to ThinLTO at first, because > full LTO is going to be extremely slow with larger configs, especially > when building arm64 kernels. Ok, that seems to solve most of the remaining problems after I fixed the module linking bug I introduced. > > - one build seems to take even longer to link. It's currently at 35GB RAM > > usage and 40 minutes into the final link, but I'm worried it might > > not complete > > before it runs out of memory. I only have 128GB installed, and google-chrome > > uses another 30GB of that, and I'm also doing some other builds in parallel. > > Is there a minimum recommended amount of memory for doing LTO builds? > > When building arm64 defconfig, the maximum memory usage I measured > with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured > larger configurations, but I believe LLD can easily consume 3-4x that > much with full LTO allyesconfig. Ok, that's not too bad then. Is there actually a reason to still support full-lto in your series? As I understand it, full LTO was the initial approach and used to work better, but thin LTO is actually what we want to use in the long run. Perhaps dropping the full LTO option from your series now that thin LTO works well enough and uses less resources would help avoid some of the problems. 
> > - One build failed with > > ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o > > -T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o > > init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a > > certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a > > security/built-in.a crypto/built-in.a block/built-in.a > > arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a > > sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive > > --start-group arch/arm64/lib/lib.a lib/lib.a > > ./drivers/firmware/efi/libstub/lib.a --end-group > > "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index" > > after about 30 minutes > > That's interesting. Did you use LLVM_IAS=1? I think I did, but it's possible that one of my build scripts didn't pass that along correctly. This one seems to be gone with thin LTO. > [...] > > Not sure if these are all known issues. If there is one you'd like me try > > take a closer look at for finding which config options break it, I can try > > No, none of these are known issues. I would be happy to take a closer > look if you can share configs that reproduce these. Attaching the config for "ld.lld: error: Never resolved function from blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')" Arnd
On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote: > > Attaching the config for "ld.lld: error: Never resolved function from > blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')" And here is a new one: "ld.lld: error: assignment to symbol init_pg_end does not converge" Arnd
On Tue, Dec 8, 2020 at 1:00 PM Arnd Bergmann <arnd@kernel.org> wrote: > > On Tue, Dec 8, 2020 at 5:43 PM 'Sami Tolvanen' via Clang Built Linux > <clang-built-linux@googlegroups.com> wrote: > > > > On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > - one build seems to take even longer to link. It's currently at 35GB RAM > > > usage and 40 minutes into the final link, but I'm worried it might > > > not complete > > > before it runs out of memory. I only have 128GB installed, and google-chrome > > > uses another 30GB of that, and I'm also doing some other builds in parallel. > > > Is there a minimum recommended amount of memory for doing LTO builds? > > > > When building arm64 defconfig, the maximum memory usage I measured > > with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured > > larger configurations, but I believe LLD can easily consume 3-4x that > > much with full LTO allyesconfig. > > Ok, that's not too bad then. Is there actually a reason to still > support full-lto > in your series? As I understand it, full LTO was the initial approach and > used to work better, but thin LTO is actually what we want to use in the > long run. Perhaps dropping the full LTO option from your series now > that thin LTO works well enough and uses less resources would help > avoid some of the problems. While all developers agree that ThinLTO is a much more palatable experience than full LTO, our product teams prefer the excessive build time and memory high water mark (at build time) costs in exchange for slightly better performance than ThinLTO in <benchmarks that I've been told are important>. Keeping support for full LTO in tree would help our product teams reduce the amount of out-of-tree code they have. As long as <benchmarks that I've been told are important> help sell/differentiate phones, I suspect our product teams will continue to ship full LTO in production.
On Tue, Dec 8, 2020 at 10:10 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Tue, Dec 8, 2020 at 1:00 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > On Tue, Dec 8, 2020 at 5:43 PM 'Sami Tolvanen' via Clang Built Linux > > <clang-built-linux@googlegroups.com> wrote: > > > > > > On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > > > - one build seems to take even longer to link. It's currently at 35GB RAM > > > > usage and 40 minutes into the final link, but I'm worried it might > > > > not complete > > > > before it runs out of memory. I only have 128GB installed, and google-chrome > > > > uses another 30GB of that, and I'm also doing some other builds in parallel. > > > > Is there a minimum recommended amount of memory for doing LTO builds? > > > > > > When building arm64 defconfig, the maximum memory usage I measured > > > with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured > > > larger configurations, but I believe LLD can easily consume 3-4x that > > > much with full LTO allyesconfig. > > > > Ok, that's not too bad then. Is there actually a reason to still > > support full-lto > > in your series? As I understand it, full LTO was the initial approach and > > used to work better, but thin LTO is actually what we want to use in the > > long run. Perhaps dropping the full LTO option from your series now > > that thin LTO works well enough and uses less resources would help > > avoid some of the problems. > > While all developers agree that ThinLTO is a much more palatable > experience than full LTO; our product teams prefer the excessive build > time and memory high water mark (at build time) costs in exchange for > slightly better performance than ThinLTO in <benchmarks that I've been > told are important>. Keeping support for full LTO in tree would help > our product teams reduce the amount of out of tree code they have. 
As > long as <benchmarks that I've been told are important> help > sell/differentiate phones, I suspect our product teams will continue > to ship full LTO in production. Ok, fair enough. How about marking FULL_LTO as 'depends on !COMPILE_TEST' then? I'll do that locally for my randconfig tests, but it would help the other build bots that also force-enable COMPILE_TEST. Arnd
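The COMPILE_TEST suggestion above could look roughly like the following Kconfig fragment. This is a sketch only: the symbol name LTO_CLANG_FULL and its prompt text are assumptions (the thread refers to the option informally as FULL_LTO), and in the actual series the option would sit inside the LTO choice block.

```kconfig
config LTO_CLANG_FULL
	bool "Clang Full LTO"
	depends on HAS_LTO_CLANG
	# Keep randconfig and other build bots that force-enable COMPILE_TEST
	# from selecting the slow, memory-hungry full LTO mode.
	depends on !COMPILE_TEST
```

Build bots that set COMPILE_TEST=y would then only ever see the ThinLTO option, while product builds that leave COMPILE_TEST unset could still choose full LTO.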
On 2020-12-08, 'Sami Tolvanen' via Clang Built Linux wrote: >On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote: >> >> On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux >> <clang-built-linux@googlegroups.com> wrote: >> > >> > This patch series adds support for building the kernel with Clang's >> > Link Time Optimization (LTO). In addition to performance, the primary >> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) >> > to be used in the kernel. Google has shipped millions of Pixel >> > devices running three major kernel versions with LTO+CFI since 2018. >> > >> > Most of the patches are build system changes for handling LLVM >> > bitcode, which Clang produces with LTO instead of ELF object files, >> > postponing ELF processing until a later stage, and ensuring initcall >> > ordering. >> > >> > Note that arm64 support depends on Will's memory ordering patches >> > [1]. I will post x86_64 patches separately after we have fixed the >> > remaining objtool warnings [2][3]. >> > >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto >> > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/ >> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux >> > >> > You can also pull this series from >> > >> > https://github.com/samitolvanen/linux.git lto-v8 >> >> I've tried pull this into my randconfig test tree to give it a spin. > >Great, thank you for testing this! > >> So far I have >> not managed to get a working build out of it, the main problem so far being >> that it is really slow to build because the link stage only uses one CPU. >> These are the other issues I've seen so far: ld.lld ThinLTO uses the number of (physical cores enabled by affinity) by default. 
>You may want to limit your testing only to ThinLTO at first, because >full LTO is going to be extremely slow with larger configs, especially >when building arm64 kernels. > >> - one build seems to take even longer to link. It's currently at 35GB RAM >> usage and 40 minutes into the final link, but I'm worried it might >> not complete >> before it runs out of memory. I only have 128GB installed, and google-chrome >> uses another 30GB of that, and I'm also doing some other builds in parallel. >> Is there a minimum recommended amount of memory for doing LTO builds? > >When building arm64 defconfig, the maximum memory usage I measured >with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured >larger configurations, but I believe LLD can easily consume 3-4x that >much with full LTO allyesconfig. > >> - One build failed with >> ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o >> -T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o >> init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a >> certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a >> security/built-in.a crypto/built-in.a block/built-in.a >> arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a >> sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive >> --start-group arch/arm64/lib/lib.a lib/lib.a >> ./drivers/firmware/efi/libstub/lib.a --end-group >> "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index" >> after about 30 minutes > >That's interesting. Did you use LLVM_IAS=1? May be worth checking which relocation or (SHT_GROUP section's sh_info) in arch/arm64/kernel/head.o is incorrect. >> - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO >> doesn't work with ld.bfd. >> I've added a CPU_LITTLE_ENDIAN dependency to >> ARCH_SUPPORTS_LTO_CLANG{,THIN} > >Ah, good point. I'll fix this in v9. 
Full/Thin LTO should work with GNU ld and gold with LLVMgold.so built from llvm-project (https://llvm.org/docs/GoldPlugin.html ). You'll need to make sure that LLVMgold.so is newer than clang. (Newer clang may introduce bitcode attributes which are unrecognizable by older LLVMgold.so/ld.lld) >[...] >> Not sure if these are all known issues. If there is one you'd like me try >> take a closer look at for finding which config options break it, I can try > >No, none of these are known issues. I would be happy to take a closer >look if you can share configs that reproduce these. > >Sami
On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote: > > On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > Attaching the config for "ld.lld: error: Never resolved function from > > blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')" > > And here is a new one: "ld.lld: error: assignment to symbol > init_pg_end does not converge" > > Arnd > This is interesting. I changed the symbol assignment to a separate loop in https://reviews.llvm.org/D66279 Does raising the limit help? Sometimes the kernel linker script can be rewritten to be more friendly to the linker...
On Wed, Dec 9, 2020 at 6:23 AM 'Fāng-ruì Sòng' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > Attaching the config for "ld.lld: error: Never resolved function from > > > blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')" > > > > And here is a new one: "ld.lld: error: assignment to symbol > > init_pg_end does not converge" > > This is interesting. I changed the symbol assignment to a separate > loop in https://reviews.llvm.org/D66279 > Does raising the limit help? Sometimes the kernel linker script can be > rewritten to be more friendly to the linker... If that requires rebuilding lld, testing it is beyond what I can help with right now. Hopefully someone can reproduce it with my .config. Arnd
On Wed, Dec 9, 2020 at 5:56 AM 'Fangrui Song' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
> On 2020-12-08, 'Sami Tolvanen' via Clang Built Linux wrote:
> >On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >> So far I have
> >> not managed to get a working build out of it, the main problem so far being
> >> that it is really slow to build because the link stage only uses one CPU.
> >> These are the other issues I've seen so far:
>
> ld.lld ThinLTO uses the number of (physical cores enabled by affinity) by default.

Ah, I see. Do you know if it's also possible to do something like
-flto=jobserver to integrate better with the kernel build system?

I tend to run multiple builds under a top-level makefile with 'make
-j30' in order to use 30 of the 32 threads and leave the scheduling to
jobserver instead of the kernel. If the linker itself is multithreaded
but the jobserver thinks it is a single thread, I could end up with 30
concurrent linkers each trying to use 16 cores.

> >> - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO
> >> doesn't work with ld.bfd.
> >> I've added a CPU_LITTLE_ENDIAN dependency to
> >> ARCH_SUPPORTS_LTO_CLANG{,THIN}
> >
> >Ah, good point. I'll fix this in v9.
>
> Full/Thin LTO should work with GNU ld and gold with LLVMgold.so built from
> llvm-project (https://llvm.org/docs/GoldPlugin.html ). You'll need to make sure
> that LLVMgold.so is newer than clang. (Newer clang may introduce bitcode
> attributes which are unrecognizable by older LLVMgold.so/ld.lld)

The current patch series requires LLD:

    config HAS_LTO_CLANG
            def_bool y
            depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD

Is this something we should change then, or try to keep it simple
with the current approach, leaving LTO disabled for big-endian
builds and hosts without a working lld?

Arnd
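For illustration, the "keep it simple" variant with the CPU_LITTLE_ENDIAN dependency mentioned above might be expressed in arch/arm64/Kconfig along these lines. This is a sketch under assumptions: the thread does not show the actual change, and both the use of `select ... if` and its placement under `config ARM64` are guesses.

```kconfig
# arch/arm64/Kconfig (sketch; placement and form are assumed)
config ARM64
	def_bool y
	# Only advertise Clang LTO support for little-endian kernels, since
	# big-endian links with ld.lld were reported as not working.
	select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
	select ARCH_SUPPORTS_LTO_CLANG_THIN if CPU_LITTLE_ENDIAN
```

Combined with the LD_IS_LLD requirement in HAS_LTO_CLANG, this would leave LTO unavailable for big-endian configurations without touching the linker requirement.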
On Tue, Dec 8, 2020 at 10:02 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > Attaching the config for "ld.lld: error: Never resolved function from
> > blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')"

After rerunning this one with ThinLTO, it disappeared.

> And here is a new one: "ld.lld: error: assignment to symbol
> init_pg_end does not converge"

This one is still there, and I reproduced another one with ThinLTO now:

    mm/highmem.o: no symbols
    lib/nmi_backtrace.o: no symbols
    lib/bitrev.o: no symbols
    mm/highmem.o: no symbols
    lib/nmi_backtrace.o: no symbols
    lib/bitrev.o: no symbols
    mm/highmem.o: no symbols
    lib/nmi_backtrace.o: no symbols
    lib/bitrev.o: no symbols
    ERROR: modpost: "memset" [drivers/most/most_cdev.ko] undefined!
    ERROR: modpost: "__stack_chk_guard" [drivers/most/most_cdev.ko] undefined!
    ERROR: modpost: "__stack_chk_fail" [drivers/most/most_cdev.ko] undefined!
    ERROR: modpost: "memset" [drivers/most/most_usb.ko] undefined!
    ERROR: modpost: "memmove" [drivers/most/most_usb.ko] undefined!
    ERROR: modpost: "__stack_chk_guard" [drivers/most/most_usb.ko] undefined!
    ERROR: modpost: "__stack_chk_fail" [drivers/most/most_usb.ko] undefined!
    ERROR: modpost: "__stack_chk_guard" [drivers/most/most_core.ko] undefined!
    ERROR: modpost: "__stack_chk_fail" [drivers/most/most_core.ko] undefined!
    ERROR: modpost: "memset" [drivers/ntb/ntb_transport.ko] undefined!
    ERROR: modpost: "memcpy" [drivers/ntb/ntb_transport.ko] undefined!
    ERROR: modpost: "__stack_chk_guard" [drivers/ntb/ntb_transport.ko] undefined!
    ERROR: modpost: "__stack_chk_fail" [drivers/ntb/ntb_transport.ko] undefined!
    ERROR: modpost: "__stack_chk_guard" [drivers/ntb/test/ntb_perf.ko] undefined!
    ERROR: modpost: "__stack_chk_fail" [drivers/ntb/test/ntb_perf.ko] undefined!
    ...

Arnd
On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote: > > - one build seems to have dropped all symbols the string operations > from vmlinux, > so while the link goes through, modules cannot be loaded: > ERROR: modpost: "memmove" [drivers/media/rc/rc-core.ko] undefined! > ERROR: modpost: "memcpy" [net/wireless/cfg80211.ko] undefined! > ERROR: modpost: "memcpy" [net/8021q/8021q.ko] undefined! > ERROR: modpost: "memset" [net/8021q/8021q.ko] undefined! > ERROR: modpost: "memcpy" [net/unix/unix.ko] undefined! > ERROR: modpost: "memset" [net/sched/cls_u32.ko] undefined! > ERROR: modpost: "memcpy" [net/sched/cls_u32.ko] undefined! > ERROR: modpost: "memset" [net/sched/sch_skbprio.ko] undefined! > ERROR: modpost: "memcpy" [net/802/garp.ko] undefined! > I first thought this was related to a clang-12 bug I saw the other day, but > this also happens with clang-11 It seems to happen because of CONFIG_TRIM_UNUSED_KSYMS, which is a shame, since I think that is an option we'd always want to have enabled with LTO, to allow more dead code to be eliminated. Arnd
On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote: > > On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > Attaching the config for "ld.lld: error: Never resolved function from > > blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')" > > And here is a new one: "ld.lld: error: assignment to symbol > init_pg_end does not converge" Thanks for these. I can reproduce the "Never resolved function from blockaddress" issue with full LTO, but I couldn't reproduce this one with ToT Clang, and the config doesn't have LTO enabled: $ grep LTO 0x2824F594_defconfig CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y Is this the correct config file? Sami
On Tue, Dec 8, 2020 at 2:20 PM Arnd Bergmann <arnd@kernel.org> wrote: > > On Tue, Dec 8, 2020 at 10:10 PM 'Nick Desaulniers' via Clang Built > Linux <clang-built-linux@googlegroups.com> wrote: > > > > On Tue, Dec 8, 2020 at 1:00 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > On Tue, Dec 8, 2020 at 5:43 PM 'Sami Tolvanen' via Clang Built Linux > > > <clang-built-linux@googlegroups.com> wrote: > > > > > > > > On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > > > > > - one build seems to take even longer to link. It's currently at 35GB RAM > > > > > usage and 40 minutes into the final link, but I'm worried it might > > > > > not complete > > > > > before it runs out of memory. I only have 128GB installed, and google-chrome > > > > > uses another 30GB of that, and I'm also doing some other builds in parallel. > > > > > Is there a minimum recommended amount of memory for doing LTO builds? > > > > > > > > When building arm64 defconfig, the maximum memory usage I measured > > > > with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured > > > > larger configurations, but I believe LLD can easily consume 3-4x that > > > > much with full LTO allyesconfig. > > > > > > Ok, that's not too bad then. Is there actually a reason to still > > > support full-lto > > > in your series? As I understand it, full LTO was the initial approach and > > > used to work better, but thin LTO is actually what we want to use in the > > > long run. Perhaps dropping the full LTO option from your series now > > > that thin LTO works well enough and uses less resources would help > > > avoid some of the problems. > > > > While all developers agree that ThinLTO is a much more palatable > > experience than full LTO; our product teams prefer the excessive build > > time and memory high water mark (at build time) costs in exchange for > > slightly better performance than ThinLTO in <benchmarks that I've been > > told are important>. 
Keeping support for full LTO in tree would help > > our product teams reduce the amount of out of tree code they have. As > > long as <benchmarks that I've been told are important> help > > sell/differentiate phones, I suspect our product teams will continue > > to ship full LTO in production. > > Ok, fair enough. How about marking FULL_LTO as 'depends on > !COMPILE_TEST' then? I'll do that locally for my randconfig tests, > but it would help the other build bots that also force-enable > COMPILE_TEST. Sure, that sounds reasonable to me. I'll add it in v9. Sami
On Wed, Dec 9, 2020 at 4:36 AM Arnd Bergmann <arnd@kernel.org> wrote: > > On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > - one build seems to have dropped all symbols the string operations > > from vmlinux, > > so while the link goes through, modules cannot be loaded: > > ERROR: modpost: "memmove" [drivers/media/rc/rc-core.ko] undefined! > > ERROR: modpost: "memcpy" [net/wireless/cfg80211.ko] undefined! > > ERROR: modpost: "memcpy" [net/8021q/8021q.ko] undefined! > > ERROR: modpost: "memset" [net/8021q/8021q.ko] undefined! > > ERROR: modpost: "memcpy" [net/unix/unix.ko] undefined! > > ERROR: modpost: "memset" [net/sched/cls_u32.ko] undefined! > > ERROR: modpost: "memcpy" [net/sched/cls_u32.ko] undefined! > > ERROR: modpost: "memset" [net/sched/sch_skbprio.ko] undefined! > > ERROR: modpost: "memcpy" [net/802/garp.ko] undefined! > > I first thought this was related to a clang-12 bug I saw the other day, but > > this also happens with clang-11 > > It seems to happen because of CONFIG_TRIM_UNUSED_KSYMS, > which is a shame, since I think that is an option we'd always want to > have enabled with LTO, to allow more dead code to be eliminated. Ah yes, this is a known issue. We use TRIM_UNUSED_KSYMS with LTO in Android's Generic Kernel Image and the problem is that bitcode doesn't yet contain calls to these functions, so autoksyms won't see them. The solution is to use a symbol whitelist with LTO to prevent these from being trimmed. I suspect we would need a default whitelist for LTO builds. Sami
On Wed, Dec 9, 2020 at 5:25 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Wed, Dec 9, 2020 at 4:36 AM Arnd Bergmann <arnd@kernel.org> wrote: > > > > On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > It seems to happen because of CONFIG_TRIM_UNUSED_KSYMS, > > which is a shame, since I think that is an option we'd always want to > > have enabled with LTO, to allow more dead code to be eliminated. > > Ah yes, this is a known issue. We use TRIM_UNUSED_KSYMS with LTO in > Android's Generic Kernel Image and the problem is that bitcode doesn't > yet contain calls to these functions, so autoksyms won't see them. The > solution is to use a symbol whitelist with LTO to prevent these from > being trimmed. I suspect we would need a default whitelist for LTO > builds. A built-in allowlist sounds good to me. FWIW, in the randconfigs so far, I only saw five symbols that would need to be on it: memcpy(), memmove(), memset(), __stack_chk_fail() and __stack_chk_guard Arnd
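A built-in allowlist along the lines discussed above could be wired up as a default for the existing UNUSED_KSYMS_WHITELIST option. The fragment below is a sketch, not the eventual fix: the file name scripts/lto-used-symbols.txt is hypothetical, the prompt text is reproduced from memory, and using LTO_CLANG as the condition is an assumption based on the CONFIG_LTO_CLANG symbol seen in scripts/Makefile.modfinal.

```kconfig
config UNUSED_KSYMS_WHITELIST
	string "Whitelist of symbols to keep in ksymtab"
	depends on TRIM_UNUSED_KSYMS
	# Hypothetical default for LTO builds: a text file with one symbol
	# per line, covering the five seen so far in randconfig builds:
	# memcpy, memmove, memset, __stack_chk_fail, __stack_chk_guard.
	default "scripts/lto-used-symbols.txt" if LTO_CLANG
```

Users who already supply their own list would be unaffected, since an explicit value overrides the default.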
On Wed, Dec 9, 2020 at 5:09 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote: > > On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > Attaching the config for "ld.lld: error: Never resolved function from > > > blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')" > > > > And here is a new one: "ld.lld: error: assignment to symbol > > init_pg_end does not converge" > > Thanks for these. I can reproduce the "Never resolved function from > blockaddress" issue with full LTO, but I couldn't reproduce this one > with ToT Clang, and the config doesn't have LTO enabled: > > $ grep LTO 0x2824F594_defconfig > CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y > > Is this the correct config file? It is the right file, and so far this is the only defconfig on which I see the "does not converge" error, so I don't have any other one. I suspect this might be an issue in the version of lld that I have here and unrelated to LTO, and I can confirm that I see the error with LTO still disabled. It seems to be completely random. I do see the bug on next-20201203 but not on a later one. I also tried bisecting through linux-next and arrived at "lib: stackdepot: add support to configure STACK_HASH_SIZE", which is almost certainly not related, other than just changing a few symbols around. Arnd