Message ID: 20230523165502.2592-1-jszhang@kernel.org (mailing list archive)
Series: riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
On Tue, 23 May 2023 09:54:58 PDT (-0700), jszhang@kernel.org wrote:
> When trying to run Linux with various open-source RISC-V cores on
> resource-limited FPGA platforms, for example FPGAs with less than
> 16MB of SDRAM, I want to save as much memory as possible. One of the
> major techniques is kernel size optimization. I found that riscv does
> not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which
> passes -fdata-sections and -ffunction-sections in CFLAGS and passes
> the --gc-sections flag to the linker.
>
> This not only benefits my case on FPGA but also benefits defconfigs.
> Here are some notable improvements from enabling this with defconfigs:
>
> nommu_k210_defconfig:
>     text     data     bss      dec     hex
>  1112009   410288   59837  1582134  182436  before
>   962838   376656   51285  1390779  1538bb  after
>
> rv32_defconfig:
>     text     data     bss      dec     hex
>  8804455  2816544  290577 11911576  b5c198  before
>  8692295  2779872  288977 11761144  b375f8  after
>
> defconfig:
>     text     data     bss      dec     hex
>  9438267  3391332  485333 13314932  cb2b74  before
>  9285914  3350052  483349 13119315  c82f53  after
>
> patch1 and patch2 are clean-ups.
> patch3 fixes a typo.
> patch4 finally enables HAVE_LD_DEAD_CODE_DATA_ELIMINATION for riscv.
>
> NOTE: Zhangjin Wu first sent out a patch to enable dead code
> elimination for riscv several months ago; I didn't notice it until
> yesterday. Although it missed some preparations and the keeping of
> some sections, he is the first person to enable this feature for
> riscv. To ease merging, this series takes his patch into my entire
> series and makes patch4 authored by him, after getting his ack, to
> reflect the above fact.
>
> Since v1:
>  - collect Reviewed-by, Tested-by tags
>  - Make patch4 authored by Zhangjin Wu, add my Co-developed-by tag
>
> Jisheng Zhang (3):
>   riscv: move options to keep entries sorted
>   riscv: vmlinux-xip.lds.S: remove .alternative section
>   vmlinux.lds.h: use correct .init.data.* section name
>
> Zhangjin Wu (1):
>   riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
>
>  arch/riscv/Kconfig                  | 13 +-
>  arch/riscv/kernel/vmlinux-xip.lds.S |  6 -
>  arch/riscv/kernel/vmlinux.lds.S     |  6 +-
>  include/asm-generic/vmlinux.lds.h   |  2 +-
>  4 files changed, 11 insertions(+), 16 deletions(-)

Do you have a base commit for this? It's not applying to 6.4-rc1 and the
patchwork bot couldn't find one either.
On Wed, Jun 14, 2023 at 07:49:17AM -0700, Palmer Dabbelt wrote:
> On Tue, 23 May 2023 09:54:58 PDT (-0700), jszhang@kernel.org wrote:
> > When trying to run linux with various opensource riscv core on
> > resource limited FPGA platforms [...]
>
> Do you have a base commit for this? It's not applying to 6.4-rc1 and the
> patchwork bot couldn't find one either.

Hi Palmer,

Commit 3b90b09af5be ("riscv: Fix orphan section warnings caused by
kernel/pi") touches vmlinux.lds.S, so to make the merge easy, this
series is based on 6.4-rc2.

Thanks
On Wed, 14 Jun 2023 09:25:49 PDT (-0700), jszhang@kernel.org wrote:
> On Wed, Jun 14, 2023 at 07:49:17AM -0700, Palmer Dabbelt wrote:
>> Do you have a base commit for this? It's not applying to 6.4-rc1 and the
>> patchwork bot couldn't find one either.
>
> Commit 3b90b09af5be ("riscv: Fix orphan section warnings caused by
> kernel/pi") touches vmlinux.lds.S, so to make the merge easy, this
> series is based on 6.4-rc2.

Thanks.
On Thu, 15 Jun 2023 06:54:33 PDT (-0700), Palmer Dabbelt wrote:
> On Wed, 14 Jun 2023 09:25:49 PDT (-0700), jszhang@kernel.org wrote:
>> Commit 3b90b09af5be ("riscv: Fix orphan section warnings caused by
>> kernel/pi") touches vmlinux.lds.S, so to make the merge easy, this
>> series is based on 6.4-rc2.
>
> Thanks.

Sorry to be so slow here, but I think this is causing LLD to hang on
allmodconfig. I'm still getting to the bottom of it, there's a few
other things I have in flight still.
On Mon, Jun 19, 2023 at 6:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> Sorry to be so slow here, but I think this is causing LLD to hang on
> allmodconfig. I'm still getting to the bottom of it, there's a few
> other things I have in flight still.

Confirmed with v3 on mainline (linux-next is pretty red at the moment).
https://lore.kernel.org/linux-riscv/20230517082936.37563-1-falcon@tinylab.org/

I was able to dump a backtrace of all of LLD's threads; all of them
seemed parked in a futex wait except for one thread with a more
interesting trace:

(gdb) bt
#0  0x0000555557ea01ce in lld::elf::LinkerScript::addOrphanSections()::$_0::operator()(lld::elf::InputSectionBase*) const ()
#1  0x0000555557e9fc3f in lld::elf::LinkerScript::addOrphanSections() ()
#2  0x0000555557dd0ca1 in lld::elf::LinkerDriver::link(llvm::opt::InputArgList&) ()
#3  0x0000555557dc19a8 in lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) ()
#4  0x0000555557dbfff9 in lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) ()
#5  0x0000555557c3ffcf in lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) ()
#6  0x0000555557c3f7aa in lld_main(int, char**, llvm::ToolContext const&) ()
#7  0x0000555557c41ee1 in main ()

Makes me wonder if there's some kind of loop adding orphan sections that
aren't referenced, so they're cleaned up. Though I don't think it's a
hang; IIRC dead code elimination adds a measurable amount of time to the
build. As code is unreferenced and removed, I think the linker is
reshuffling the layout and thus recomputing relocations.
Though triple-checking mainline without this patch vs mainline with this
patch, twice now I just got an error from LLD (in 2 minutes on my
system):

ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(efi-stub-entry.stub.o):(.init.bss.screen_info_offset) is being placed in '.init.bss.screen_info_offset'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.data.efi_nokaslr) is being placed in '.init.data.efi_nokaslr'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss.efi_noinitrd) is being placed in '.init.bss.efi_noinitrd'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss.efi_nochunk) is being placed in '.init.bss.efi_nochunk'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss.efi_novamap) is being placed in '.init.bss.efi_novamap'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss.efi_disable_pci_dma) is being placed in '.init.bss.efi_disable_pci_dma'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(file.stub.o):(.init.bss.efi_open_device_path.text_to_dp) is being placed in '.init.bss.efi_open_device_path.text_to_dp'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(gop.stub.o):(.init.bss.cmdline.0) is being placed in '.init.bss.cmdline.0'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(gop.stub.o):(.init.bss.cmdline.1) is being placed in '.init.bss.cmdline.1'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(gop.stub.o):(.init.bss.cmdline.2) is being placed in '.init.bss.cmdline.2'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(gop.stub.o):(.init.bss.cmdline.3) is being placed in '.init.bss.cmdline.3'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(gop.stub.o):(.init.bss.cmdline.4) is being placed in '.init.bss.cmdline.4'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(printk.stub.o):(.init.data.efi_loglevel) is being placed in '.init.data.efi_loglevel'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(riscv.stub.o):(.init.bss.hartid) is being placed in '.init.bss.hartid'
ld.lld: error: ./drivers/firmware/efi/libstub/lib.a(systable.stub.o):(.init.bss.efi_system_table) is being placed in '.init.bss.efi_system_table'

Is it perhaps that these sections need placement in the linker script?
This is from the orphan-section-warning linker command line flag. Does
the EFI stub have one linker script, or one per arch? (Or am I mistaken
and the EFI stub is part of vmlinux?)
On Tue, Jun 20, 2023 at 04:05:55PM -0400, Nick Desaulniers wrote:
> Confirmed with v3 on mainline (linux-next is pretty red at the moment).
> https://lore.kernel.org/linux-riscv/20230517082936.37563-1-falcon@tinylab.org/

Just FYI Nick, there's been some concurrent work here from different
people working on the same thing & the v3 you linked (from Zhangjin) was
superseded by this v2 (from Jisheng).

Cheers,
Conor.
On Tue, Jun 20, 2023 at 4:13 PM Conor Dooley <conor@kernel.org> wrote:
> Just FYI Nick, there's been some concurrent work here from different
> people working on the same thing & the v3 you linked (from Zhangjin) was
> superseded by this v2 (from Jisheng).

Ah! I've been testing the superseded patch set, sorry. I just looked on
lore for "dead code" on riscv-linux and grabbed the first thread, without
noticing the difference in authors or the new version numbers for the
distinct series. OK, never mind my noise. I'll follow up with the
correct patch set, sorry!
On Tue, 20 Jun 2023 13:32:32 PDT (-0700), ndesaulniers@google.com wrote:
> Ah! I've been testing the superseded patch set, sorry. [...] I'll
> follow up with the correct patch set, sorry!

Ya, I hadn't even noticed the v3 because I pretty much only look at
patchwork these days. Like we talked about in IRC, I'm going to go test
the merge of this one and see what's up -- I've got it staged at
<https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=1bd2963b21758a773206a1cb67c93e7a8ae8a195>,
though that won't be a stable hash if it's actually broken...
On Tue, Jun 20, 2023 at 4:41 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
> Ya, I hadn't even noticed the v3 because I pretty much only look at
> patchwork these days. Like we talked about in IRC, I'm going to go test
> the merge of this one and see what's up -- I've got it staged at
> <https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=1bd2963b21758a773206a1cb67c93e7a8ae8a195>,
> though that won't be a stable hash if it's actually broken...

Ok, https://lore.kernel.org/linux-riscv/20230523165502.2592-1-jszhang@kernel.org/
built for me. If you're seeing a hang, please let me know what version
of LLD you're using and I'll build that tag from source to see if I can
reproduce, then bisect if so.

$ ARCH=riscv LLVM=1 /usr/bin/time -v make -j128 allmodconfig vmlinux
...
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:35.68
...

Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build
On Tue, 20 Jun 2023 13:47:07 PDT (-0700), ndesaulniers@google.com wrote:
> Ok, https://lore.kernel.org/linux-riscv/20230523165502.2592-1-jszhang@kernel.org/
> built for me. If you're seeing a hang, please let me know what version
> of LLD you're using and I'll build that tag from source to see if I can
> reproduce, then bisect if so.
>
> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build

OK, it triggered enough of a rebuild that it might take a bit for
anything to filter out.

Thanks!
On Tue, 20 Jun 2023 14:08:33 PDT (-0700), Palmer Dabbelt wrote:
> OK, it triggered enough of a rebuild that it might take a bit for
> anything to filter out.

I'm on LLVM 16.0.2:

$ git describe
llvmorg-16.0.2
$ git log | head -n1
commit 18ddebe1a1a9bde349441631365f0472e9693520

That seems to hang for me -- or at least run for an hour without
completing, so I assume it's hung. I'm not wed to 16.0.2, it just
happens to be the last time I bumped the toolchain. I'm moving to
16.0.5 to see if that changes anything.
On Tue, 20 Jun 2023 17:13:17 PDT (-0700), Palmer Dabbelt wrote: > On Tue, 20 Jun 2023 14:08:33 PDT (-0700), Palmer Dabbelt wrote: >> On Tue, 20 Jun 2023 13:47:07 PDT (-0700), ndesaulniers@google.com wrote: >>> On Tue, Jun 20, 2023 at 4:41 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: >>>> >>>> On Tue, 20 Jun 2023 13:32:32 PDT (-0700), ndesaulniers@google.com wrote: >>>> > On Tue, Jun 20, 2023 at 4:13 PM Conor Dooley <conor@kernel.org> wrote: >>>> >> >>>> >> On Tue, Jun 20, 2023 at 04:05:55PM -0400, Nick Desaulniers wrote: >>>> >> > On Mon, Jun 19, 2023 at 6:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: >>>> >> > > On Thu, 15 Jun 2023 06:54:33 PDT (-0700), Palmer Dabbelt wrote: >>>> >> > > > On Wed, 14 Jun 2023 09:25:49 PDT (-0700), jszhang@kernel.org wrote: >>>> >> > > >> On Wed, Jun 14, 2023 at 07:49:17AM -0700, Palmer Dabbelt wrote: >>>> >> > > >>> On Tue, 23 May 2023 09:54:58 PDT (-0700), jszhang@kernel.org wrote: >>>> >> >>>> >> > > >> Commit 3b90b09af5be ("riscv: Fix orphan section warnings caused by >>>> >> > > >> kernel/pi") touches vmlinux.lds.S, so to make the merge easy, this >>>> >> > > >> series is based on 6.4-rc2. >>>> >> > > > >>>> >> > > > Thanks. >>>> >> > > >>>> >> > > Sorry to be so slow here, but I think this is causing LLD to hang on >>>> >> > > allmodconfig. I'm still getting to the bottom of it, there's a few >>>> >> > > other things I have in flight still. >>>> >> > >>>> >> > Confirmed with v3 on mainline (linux-next is pretty red at the moment). >>>> >> > https://lore.kernel.org/linux-riscv/20230517082936.37563-1-falcon@tinylab.org/ >>>> >> >>>> >> Just FYI Nick, there's been some concurrent work here from different >>>> >> people working on the same thing & the v3 you linked (from Zhangjin) was >>>> >> superseded by this v2 (from Jisheng). >>>> > >>>> > Ah! 
I've been testing the deprecated patch set, sorry I just looked on >>>> > lore for "dead code" on riscv-linux and grabbed the first thread, >>>> > without noticing the difference in authors or new version numbers for >>>> > distinct series. ok, nevermind my noise. I'll follow up with the >>>> > correct patch set, sorry! >>>> >>>> Ya, I hadn't even noticed the v3 because I pretty much only look at >>>> patchwork these days. Like we talked about in IRC, I'm going to go test >>>> the merge of this one and see what's up -- I've got it staged at >>>> <https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=1bd2963b21758a773206a1cb67c93e7a8ae8a195>, >>>> though that won't be a stable hash if it's actually broken... >>> >>> Ok, https://lore.kernel.org/linux-riscv/20230523165502.2592-1-jszhang@kernel.org/ >>> built for me. If you're seeing a hang, please let me know what >>> version of LLD you're using and I'll build that tag from source to see >>> if I can reproduce, then bisect if so. >>> >>> $ ARCH=riscv LLVM=1 /usr/bin/time -v make -j128 allmodconfig vmlinux >>> ... >>> Elapsed (wall clock) time (h:mm:ss or m:ss): 2:35.68 >>> ... >>> >>> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build >> >> OK, it triggered enough of a rebuild that it might take a bit for >> anything to filter out. > > I'm on LLVM 16.0.2 > > $ git describe > llvmorg-16.0.2 > $ git log | head -n1 > commit 18ddebe1a1a9bde349441631365f0472e9693520 > > that seems to hang for me -- or at least run for an hour without > completing, so I assume it's hung. I'm not wed to 16.0.2, it just > happens to be the last time I bumped the toolchain. I'm moving to > 16.0.5 to see if that changes anything. That also takes at least an hour to link. I tried running on LLVM trunk from last night $ git log | head -n1 commit 5e9173c43a9b97c8614e36d6f754317f731e71e9 and that completed. 
Just as a curiosity I tried to re-spin it to see how long it takes, and it's been running for 23 minutes so far. So I'm no longer actually sure there's a hang, just something slow. That's even more of a grey area, but I think it's sane to call a 1-hour link time a regression -- unless it's expected that this is just very slow to link? > >> >> Thanks! >> >>> >>>> >>>> > >>>> >> >>>> >> Cheers, >>>> >> Conor. >>>> > >>>> > >>>> > >>>> > -- >>>> > Thanks, >>>> > ~Nick Desaulniers >>> >>> >>> >>> -- >>> Thanks, >>> ~Nick Desaulniers
On Wed, Jun 21, 2023 at 07:53:59AM -0700, Palmer Dabbelt wrote: > On Tue, 20 Jun 2023 17:13:17 PDT (-0700), Palmer Dabbelt wrote: > > On Tue, 20 Jun 2023 14:08:33 PDT (-0700), Palmer Dabbelt wrote: > >> On Tue, 20 Jun 2023 13:47:07 PDT (-0700), ndesaulniers@google.com wrote: > >>> On Tue, Jun 20, 2023 at 4:41 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > >>>> > >>>> On Tue, 20 Jun 2023 13:32:32 PDT (-0700), ndesaulniers@google.com wrote: > >>>> > On Tue, Jun 20, 2023 at 4:13 PM Conor Dooley <conor@kernel.org> wrote: > >>>> >> > >>>> >> On Tue, Jun 20, 2023 at 04:05:55PM -0400, Nick Desaulniers wrote: > >>>> >> > On Mon, Jun 19, 2023 at 6:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > >>>> >> > > On Thu, 15 Jun 2023 06:54:33 PDT (-0700), Palmer Dabbelt wrote: > >>>> >> > > > On Wed, 14 Jun 2023 09:25:49 PDT (-0700), jszhang@kernel.org wrote: > >>>> >> > > >> On Wed, Jun 14, 2023 at 07:49:17AM -0700, Palmer Dabbelt wrote: > >>>> >> > > >>> On Tue, 23 May 2023 09:54:58 PDT (-0700), jszhang@kernel.org wrote: > >>>> >> > >>>> >> > > >> Commit 3b90b09af5be ("riscv: Fix orphan section warnings caused by > >>>> >> > > >> kernel/pi") touches vmlinux.lds.S, so to make the merge easy, this > >>>> >> > > >> series is based on 6.4-rc2. > >>>> >> > > > > >>>> >> > > > Thanks. > >>>> >> > > > >>>> >> > > Sorry to be so slow here, but I think this is causing LLD to hang on > >>>> >> > > allmodconfig. I'm still getting to the bottom of it, there's a few > >>>> >> > > other things I have in flight still. > >>>> >> > > >>>> >> > Confirmed with v3 on mainline (linux-next is pretty red at the moment). > >>>> >> > https://lore.kernel.org/linux-riscv/20230517082936.37563-1-falcon@tinylab.org/ > >>>> >> > >>>> >> Just FYI Nick, there's been some concurrent work here from different > >>>> >> people working on the same thing & the v3 you linked (from Zhangjin) was > >>>> >> superseded by this v2 (from Jisheng). > >>>> > > >>>> > Ah! 
I've been testing the deprecated patch set, sorry I just looked on > >>>> > lore for "dead code" on riscv-linux and grabbed the first thread, > >>>> > without noticing the difference in authors or new version numbers for > >>>> > distinct series. ok, nevermind my noise. I'll follow up with the > >>>> > correct patch set, sorry! > >>>> > >>>> Ya, I hadn't even noticed the v3 because I pretty much only look at > >>>> patchwork these days. Like we talked about in IRC, I'm going to go test > >>>> the merge of this one and see what's up -- I've got it staged at > >>>> <https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=1bd2963b21758a773206a1cb67c93e7a8ae8a195>, > >>>> though that won't be a stable hash if it's actually broken... > >>> > >>> Ok, https://lore.kernel.org/linux-riscv/20230523165502.2592-1-jszhang@kernel.org/ > >>> built for me. If you're seeing a hang, please let me know what > >>> version of LLD you're using and I'll build that tag from source to see > >>> if I can reproduce, then bisect if so. > >>> > >>> $ ARCH=riscv LLVM=1 /usr/bin/time -v make -j128 allmodconfig vmlinux > >>> ... > >>> Elapsed (wall clock) time (h:mm:ss or m:ss): 2:35.68 > >>> ... > >>> > >>> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build > >> > >> OK, it triggered enough of a rebuild that it might take a bit for > >> anything to filter out. > > > > I'm on LLVM 16.0.2 > > > > $ git describe > > llvmorg-16.0.2 > > $ git log | head -n1 > > commit 18ddebe1a1a9bde349441631365f0472e9693520 > > > > that seems to hang for me -- or at least run for an hour without > > completing, so I assume it's hung. I'm not wed to 16.0.2, it just > > happens to be the last time I bumped the toolchain. I'm moving to > > 16.0.5 to see if that changes anything. > > That also takes at least an hour to link. I tried running on LLVM trunk > from last night > > $ git log | head -n1 > commit 5e9173c43a9b97c8614e36d6f754317f731e71e9 > > and that completed. 
Just as a curiosity I tried to re-spin it to see > how long it takes, and it's been running for 23 minutes so far. After some misdirection through stupid user error, I have also reproduced this for an LLVM=1 build w/ llvmorg-16.0.0 > So I'm no longer actually sure there's a hang, just something slow. > That's even more of a grey area, but I think it's sane to call a 1-hour > link time a regression -- unless it's expected that this is just very > slow to link? I dunno, if it was only a thing for allyesconfig, then whatever - but it's gonna significantly increase build times for any large kernels if LLD is this much slower than LD. Regression in my book. I'm gonna go and experiment with mixed toolchain builds, I'll report back.. Cheers, Conor.
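The mixed-toolchain experiments mentioned above can be sketched with the kernel's `LD=` make-variable override, which swaps only the linker while keeping the compiler. A hedged sketch (the cross-toolchain prefix and job count are assumptions, not from the thread):

```sh
# Sketch only: GCC compile + LLD link (the combination Conor reports as slow)
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LD=ld.lld -j"$(nproc)" allyesconfig vmlinux

# Sketch only: clang compile + GNU ld link (reported to link normally)
make ARCH=riscv LLVM=1 LD=riscv64-linux-gnu-ld -j"$(nproc)" allyesconfig vmlinux
```

Isolating the linker this way distinguishes a codegen difference from a linker-side one.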
On Wed, Jun 21, 2023 at 05:42:08PM +0100, Conor Dooley wrote: > On Wed, Jun 21, 2023 at 07:53:59AM -0700, Palmer Dabbelt wrote: > > On Tue, 20 Jun 2023 17:13:17 PDT (-0700), Palmer Dabbelt wrote: > > > On Tue, 20 Jun 2023 14:08:33 PDT (-0700), Palmer Dabbelt wrote: > > >> On Tue, 20 Jun 2023 13:47:07 PDT (-0700), ndesaulniers@google.com wrote: > > >>> On Tue, Jun 20, 2023 at 4:41 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > >>>> > > >>>> On Tue, 20 Jun 2023 13:32:32 PDT (-0700), ndesaulniers@google.com wrote: > > >>>> > On Tue, Jun 20, 2023 at 4:13 PM Conor Dooley <conor@kernel.org> wrote: > > >>>> >> > > >>>> >> On Tue, Jun 20, 2023 at 04:05:55PM -0400, Nick Desaulniers wrote: > > >>>> >> > On Mon, Jun 19, 2023 at 6:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > >>>> >> > > On Thu, 15 Jun 2023 06:54:33 PDT (-0700), Palmer Dabbelt wrote: > > >>>> >> > > > On Wed, 14 Jun 2023 09:25:49 PDT (-0700), jszhang@kernel.org wrote: > > >>>> >> > > >> On Wed, Jun 14, 2023 at 07:49:17AM -0700, Palmer Dabbelt wrote: > > >>>> >> > > >>> On Tue, 23 May 2023 09:54:58 PDT (-0700), jszhang@kernel.org wrote: > > >>>> >> > > >>>> >> > > >> Commit 3b90b09af5be ("riscv: Fix orphan section warnings caused by > > >>>> >> > > >> kernel/pi") touches vmlinux.lds.S, so to make the merge easy, this > > >>>> >> > > >> series is based on 6.4-rc2. > > >>>> >> > > > > > >>>> >> > > > Thanks. > > >>>> >> > > > > >>>> >> > > Sorry to be so slow here, but I think this is causing LLD to hang on > > >>>> >> > > allmodconfig. I'm still getting to the bottom of it, there's a few > > >>>> >> > > other things I have in flight still. > > >>>> >> > > > >>>> >> > Confirmed with v3 on mainline (linux-next is pretty red at the moment). 
> > >>>> >> > https://lore.kernel.org/linux-riscv/20230517082936.37563-1-falcon@tinylab.org/ > > >>>> >> > > >>>> >> Just FYI Nick, there's been some concurrent work here from different > > >>>> >> people working on the same thing & the v3 you linked (from Zhangjin) was > > >>>> >> superseded by this v2 (from Jisheng). > > >>>> > > > >>>> > Ah! I've been testing the deprecated patch set, sorry I just looked on > > >>>> > lore for "dead code" on riscv-linux and grabbed the first thread, > > >>>> > without noticing the difference in authors or new version numbers for > > >>>> > distinct series. ok, nevermind my noise. I'll follow up with the > > >>>> > correct patch set, sorry! > > >>>> > > >>>> Ya, I hadn't even noticed the v3 because I pretty much only look at > > >>>> patchwork these days. Like we talked about in IRC, I'm going to go test > > >>>> the merge of this one and see what's up -- I've got it staged at > > >>>> <https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=for-next&id=1bd2963b21758a773206a1cb67c93e7a8ae8a195>, > > >>>> though that won't be a stable hash if it's actually broken... > > >>> > > >>> Ok, https://lore.kernel.org/linux-riscv/20230523165502.2592-1-jszhang@kernel.org/ > > >>> built for me. If you're seeing a hang, please let me know what > > >>> version of LLD you're using and I'll build that tag from source to see > > >>> if I can reproduce, then bisect if so. > > >>> > > >>> $ ARCH=riscv LLVM=1 /usr/bin/time -v make -j128 allmodconfig vmlinux > > >>> ... > > >>> Elapsed (wall clock) time (h:mm:ss or m:ss): 2:35.68 > > >>> ... > > >>> > > >>> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build > > >> > > >> OK, it triggered enough of a rebuild that it might take a bit for > > >> anything to filter out. 
> > > > > > I'm on LLVM 16.0.2 > > > > > > $ git describe > > > llvmorg-16.0.2 > > > $ git log | head -n1 > > > commit 18ddebe1a1a9bde349441631365f0472e9693520 > > > > > > that seems to hang for me -- or at least run for an hour without > > > completing, so I assume it's hung. I'm not wed to 16.0.2, it just > > > happens to be the last time I bumped the toolchain. I'm moving to > > > 16.0.5 to see if that changes anything. > > > > That also takes at least an hour to link. I tried running on LLVM trunk > > from last night > > > > $ git log | head -n1 > > commit 5e9173c43a9b97c8614e36d6f754317f731e71e9 > > > > and that completed. Just as a curiosity I tried to re-spin it to see > > how long it takes, and it's been running for 23 minutes so far. > > After some misdirection through stupid user error, I have also > reproduced this for an LLVM=1 build w/ llvmorg-16.0.0 > > > So I'm no longer actually sure there's a hang, just something slow. > > That's even more of a grey area, but I think it's sane to call a 1-hour > > link time a regression -- unless it's expected that this is just very > > slow to link? > > I dunno, if it was only a thing for allyesconfig, then whatever - but > it's gonna significantly increase build times for any large kernels if LLD > is this much slower than LD. Regression in my book. > > I'm gonna go and experiment with mixed toolchain builds, I'll report > back.. Probably as expected, swapping out LLD for LD linked normally & using gcc-13.1 + LLD hit the same problems with linking. Cheers, Conor.
Conor Dooley <conor@kernel.org> writes: [...] >> So I'm no longer actually sure there's a hang, just something slow. >> That's even more of a grey area, but I think it's sane to call a 1-hour >> link time a regression -- unless it's expected that this is just very >> slow to link? > > I dunno, if it was only a thing for allyesconfig, then whatever - but > it's gonna significantly increase build times for any large kernels if LLD > is this much slower than LD. Regression in my book. > > I'm gonna go and experiment with mixed toolchain builds, I'll report > back.. I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ | --toolchain=llvm-16 --runtime docker --directory . -k \ | allyesconfig Took forever, but passed after 2.5h. CONFIG_CC_VERSION_TEXT="Debian clang version 16.0.6 (++20230610113307+7cbf1a259152-1~exp1~20230610233402.106)" Björn
On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@kernel.org wrote: > Conor Dooley <conor@kernel.org> writes: > > [...] > >>> So I'm no longer actually sure there's a hang, just something slow. >>> That's even more of a grey area, but I think it's sane to call a 1-hour >>> link time a regression -- unless it's expected that this is just very >>> slow to link? >> >> I dunno, if it was only a thing for allyesconfig, then whatever - but >> it's gonna significantly increase build times for any large kernels if LLD >> is this much slower than LD. Regression in my book. >> >> I'm gonna go and experiment with mixed toolchain builds, I'll report >> back.. > > I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable > HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: > > | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ > | --toolchain=llvm-16 --runtime docker --directory . -k \ > | allyesconfig > > Took forever, but passed after 2.5h. Thanks. I just re-ran mine 17/trunk LLD under time (rather than just checking top sometimes), it's at 1.5h but even that seems quite long. I guess this is sort of up to the LLVM folks: if it's expected that DCE takes a very long time to link then I'm not opposed to allowing it, but if this is probably a bug in LLD then it seems best to turn it off until we sort things out over there. I think maybe Nick or Nathan is the best bet to know? > CONFIG_CC_VERSION_TEXT="Debian clang version 16.0.6 (++20230610113307+7cbf1a259152-1~exp1~20230610233402.106)" > > > Björn
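One way to approach the "expected cost or LLD bug?" question is to profile the link itself. A hedged sketch, reusing the vmlinux.o link command quoted later in the thread (`--time-trace` is available in recent LLD releases; its availability and the trace filename depend on the LLD version, so check `ld.lld --help`):

```sh
# Sketch: re-run only the slow relocatable link with LLD's time-trace enabled
time ld.lld -melf64lriscv -z noexecstack -r -o vmlinux.o --time-trace \
    --whole-archive vmlinux.a --no-whole-archive \
    --start-group ./drivers/firmware/efi/libstub/lib.a --end-group
# The emitted time-trace JSON can be loaded in chrome://tracing (or Speedscope)
# to see which linker phase dominates the multi-hour link.
```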
On Wed, 21 Jun 2023 11:19:31 PDT (-0700), Palmer Dabbelt wrote: > On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@kernel.org wrote: >> Conor Dooley <conor@kernel.org> writes: >> >> [...] >> >>>> So I'm no longer actually sure there's a hang, just something slow. >>>> That's even more of a grey area, but I think it's sane to call a 1-hour >>>> link time a regression -- unless it's expected that this is just very >>>> slow to link? >>> >>> I dunno, if it was only a thing for allyesconfig, then whatever - but >>> it's gonna significantly increase build times for any large kernels if LLD >>> is this much slower than LD. Regression in my book. >>> >>> I'm gonna go and experiment with mixed toolchain builds, I'll report >>> back.. >> >> I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable >> HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: >> >> | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ >> | --toolchain=llvm-16 --runtime docker --directory . -k \ >> | allyesconfig >> >> Took forever, but passed after 2.5h. > > Thanks. I just re-ran mine 17/trunk LLD under time (rather that just > checking top sometimes), it's at 1.5h but even that seems quite long. > > I guess this is sort of up to the LLVM folks: if it's expected that DCE > takes a very long time to link then I'm not opposed to allowing it, but > if this is probably a bug in LLD then it seems best to turn it off until > we sort things out over there. > > I think maybe Nick or Nathan is the best bet to know? Looks like it's about 2h for me. 
I'm going to drop these from my staging tree in the interest of making progress on other stuff, but if this is just expected behavior then I'm OK taking them (though that's too much compute for me to test regularly): $ time ../../../../llvm/install/bin/ld.lld -melf64lriscv -z noexecstack -r -o vmlinux.o --whole-archive vmlinux.a --no-whole-archive --start-group ./drivers/firmware/efi/libstub/lib.a --end-group real 111m50.678s user 111m18.739s sys 1m13.147s >> CONFIG_CC_VERSION_TEXT="Debian clang version 16.0.6 (++20230610113307+7cbf1a259152-1~exp1~20230610233402.106)" >> >> >> Björn
On Wed, Jun 21, 2023 at 12:46 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > On Wed, 21 Jun 2023 11:19:31 PDT (-0700), Palmer Dabbelt wrote: > > On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@kernel.org wrote: > >> Conor Dooley <conor@kernel.org> writes: > >> > >> [...] > >> > >>>> So I'm no longer actually sure there's a hang, just something slow. > >>>> That's even more of a grey area, but I think it's sane to call a 1-hour > >>>> link time a regression -- unless it's expected that this is just very > >>>> slow to link? > >>> > >>> I dunno, if it was only a thing for allyesconfig, then whatever - but > >>> it's gonna significantly increase build times for any large kernels if LLD > >>> is this much slower than LD. Regression in my book. > >>> > >>> I'm gonna go and experiment with mixed toolchain builds, I'll report > >>> back.. > >> > >> I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable > >> HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: > >> > >> | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ > >> | --toolchain=llvm-16 --runtime docker --directory . -k \ > >> | allyesconfig > >> > >> Took forever, but passed after 2.5h. > > > > Thanks. I just re-ran mine 17/trunk LLD under time (rather that just > > checking top sometimes), it's at 1.5h but even that seems quite long. > > > > I guess this is sort of up to the LLVM folks: if it's expected that DCE > > takes a very long time to link then I'm not opposed to allowing it, but > > if this is probably a bug in LLD then it seems best to turn it off until > > we sort things out over there. > > > > I think maybe Nick or Nathan is the best bet to know? > > Looks like it's about 2h for me. 
I'm going to drop these from my > staging tree in the interest of making progress on other stuff, but if > this is just expected behavior them I'm OK taking them (though that's > too much compute for me to test regularly): > > $ time ../../../../llvm/install/bin/ld.lld -melf64lriscv -z noexecstack -r -o vmlinux.o --whole-archive vmlinux.a --no-whole-archive --start-group ./drivers/firmware/efi/libstub/lib.a --end-group > > real 111m50.678s > user 111m18.739s > sys 1m13.147s Ah, I think you meant s/allmodconfig/allyesconfig/ in your initial report. That makes more sense, and I can reproduce. Let me work on a report. > > >> CONFIG_CC_VERSION_TEXT="Debian clang version 16.0.6 (++20230610113307+7cbf1a259152-1~exp1~20230610233402.106)" > >> > >> > >> Björn -- Thanks, ~Nick Desaulniers
On Thu, 22 Jun 2023 14:40:59 PDT (-0700), ndesaulniers@google.com wrote: > On Wed, Jun 21, 2023 at 12:46 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: >> >> On Wed, 21 Jun 2023 11:19:31 PDT (-0700), Palmer Dabbelt wrote: >> > On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@kernel.org wrote: >> >> Conor Dooley <conor@kernel.org> writes: >> >> >> >> [...] >> >> >> >>>> So I'm no longer actually sure there's a hang, just something slow. >> >>>> That's even more of a grey area, but I think it's sane to call a 1-hour >> >>>> link time a regression -- unless it's expected that this is just very >> >>>> slow to link? >> >>> >> >>> I dunno, if it was only a thing for allyesconfig, then whatever - but >> >>> it's gonna significantly increase build times for any large kernels if LLD >> >>> is this much slower than LD. Regression in my book. >> >>> >> >>> I'm gonna go and experiment with mixed toolchain builds, I'll report >> >>> back.. >> >> >> >> I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable >> >> HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: >> >> >> >> | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ >> >> | --toolchain=llvm-16 --runtime docker --directory . -k \ >> >> | allyesconfig >> >> >> >> Took forever, but passed after 2.5h. >> > >> > Thanks. I just re-ran mine 17/trunk LLD under time (rather that just >> > checking top sometimes), it's at 1.5h but even that seems quite long. >> > >> > I guess this is sort of up to the LLVM folks: if it's expected that DCE >> > takes a very long time to link then I'm not opposed to allowing it, but >> > if this is probably a bug in LLD then it seems best to turn it off until >> > we sort things out over there. >> > >> > I think maybe Nick or Nathan is the best bet to know? >> >> Looks like it's about 2h for me. 
I'm going to drop these from my >> staging tree in the interest of making progress on other stuff, but if >> this is just expected behavior them I'm OK taking them (though that's >> too much compute for me to test regularly): >> >> $ time ../../../../llvm/install/bin/ld.lld -melf64lriscv -z noexecstack -r -o vmlinux.o --whole-archive vmlinux.a --no-whole-archive --start-group ./drivers/firmware/efi/libstub/lib.a --end-group >> >> real 111m50.678s >> user 111m18.739s >> sys 1m13.147s > > Ah, I think you meant s/allmodconfig/allyesconfig/ in your initial > report. That makes more sense, and I can reproduce. Let me work on a > report. Awesome, thanks! > >> >> >> CONFIG_CC_VERSION_TEXT="Debian clang version 16.0.6 (++20230610113307+7cbf1a259152-1~exp1~20230610233402.106)" >> >> >> >> >> >> Björn
On Wed, Jun 21, 2023 at 11:19:31AM -0700, Palmer Dabbelt wrote: > On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@kernel.org wrote: > > Conor Dooley <conor@kernel.org> writes: > > > > [...] > > > > > > So I'm no longer actually sure there's a hang, just something > > > > slow. That's even more of a grey area, but I think it's sane to > > > > call a 1-hour link time a regression -- unless it's expected > > > > that this is just very slow to link? > > > > > > I dunno, if it was only a thing for allyesconfig, then whatever - but > > > it's gonna significantly increase build times for any large kernels if LLD > > > is this much slower than LD. Regression in my book. > > > > > > I'm gonna go and experiment with mixed toolchain builds, I'll report > > > back.. > > > > I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable > > HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: > > > > | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ > > | --toolchain=llvm-16 --runtime docker --directory . -k \ > > | allyesconfig > > > > Took forever, but passed after 2.5h. > > Thanks. I just re-ran mine 17/trunk LLD under time (rather that just > checking top sometimes), it's at 1.5h but even that seems quite long. > > I guess this is sort of up to the LLVM folks: if it's expected that DCE > takes a very long time to link then I'm not opposed to allowing it, but if > this is probably a bug in LLD then it seems best to turn it off until we > sort things out over there. > > I think maybe Nick or Nathan is the best bet to know? I can confirm a regression with allyesconfig but not allmodconfig using LLVM 16.0.6 on my 80-core Ampere Altra system. 
allmodconfig: 8m 4s allmodconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 7m 4s allyesconfig: 1h 58m 30s allyesconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 12m 41s I am sure there is something that ld.lld can do better, given GNU ld does not have any problems as earlier established, so that should definitely be explored further. I see Nick already had a response about writing up a report (I wrote most of this before that email so I am still sending this one). However, allyesconfig is pretty special and not really indicative of a "real world" kernel build in my opinion (which will either be a fully modular kernel to allow use on a wide range of hardware or a monolithic kernel with just the drivers needed for a specific platform, which will be much smaller than allyesconfig); it has given us problems with large kernels before on other architectures. CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is already marked with 'depends on EXPERT' and its help text mentions its perils, so it does not seem unreasonable to me to add an additional dependency on !COMPILE_TEST so that allmodconfig and allyesconfig cannot flip this on, something like the following perhaps? diff --git a/init/Kconfig b/init/Kconfig index 32c24950c4ce..25434cbd2a6e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1388,7 +1388,7 @@ config HAVE_LD_DEAD_CODE_DATA_ELIMINATION config LD_DEAD_CODE_DATA_ELIMINATION bool "Dead code and data elimination (EXPERIMENTAL)" depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION - depends on EXPERT + depends on EXPERT && !COMPILE_TEST depends on $(cc-option,-ffunction-sections -fdata-sections) depends on $(ld-option,--gc-sections) help If applying that dependency to all architectures is too much, the selection in arch/riscv/Kconfig could be gated on the same condition. Cheers, Nathan
On Thu, 22 Jun 2023 14:53:27 PDT (-0700), nathan@kernel.org wrote: > On Wed, Jun 21, 2023 at 11:19:31AM -0700, Palmer Dabbelt wrote: >> On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@kernel.org wrote: >> > Conor Dooley <conor@kernel.org> writes: >> > >> > [...] >> > >> > > > So I'm no longer actually sure there's a hang, just something >> > > > slow. That's even more of a grey area, but I think it's sane to >> > > > call a 1-hour link time a regression -- unless it's expected >> > > > that this is just very slow to link? >> > > >> > > I dunno, if it was only a thing for allyesconfig, then whatever - but >> > > it's gonna significantly increase build times for any large kernels if LLD >> > > is this much slower than LD. Regression in my book. >> > > >> > > I'm gonna go and experiment with mixed toolchain builds, I'll report >> > > back.. >> > >> > I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable >> > HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: >> > >> > | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ >> > | --toolchain=llvm-16 --runtime docker --directory . -k \ >> > | allyesconfig >> > >> > Took forever, but passed after 2.5h. >> >> Thanks. I just re-ran mine 17/trunk LLD under time (rather that just >> checking top sometimes), it's at 1.5h but even that seems quite long. >> >> I guess this is sort of up to the LLVM folks: if it's expected that DCE >> takes a very long time to link then I'm not opposed to allowing it, but if >> this is probably a bug in LLD then it seems best to turn it off until we >> sort things out over there. >> >> I think maybe Nick or Nathan is the best bet to know? > > I can confirm a regression with allyesconfig but not allmodconfig using > LLVM 16.0.6 on my 80-core Ampere Altra system. 
> > allmodconfig: 8m 4s > allmodconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 7m 4s > allyesconfig: 1h 58m 30s > allyesconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 12m 41s Are those backwards? I'm getting super slow builds after merging the patch set, not before -- though apologize in advance if I'm reading it wrong, I'm well on my way to falling asleep already ;) > I am sure there is something that ld.lld can do better, given GNU ld > does not have any problems as earlier established, so that should > definitely be explored further. I see Nick already had a response about > writing up a report (I wrote most of this before that email so I am > still sending this one). > > However, allyesconfig is pretty special and not really indicative of a > "real world" kernel build in my opinion (which will either be a fully > modular kernel to allow use on a wide range of hardware or a monolithic > kernel with just the drivers needed for a specific platform, which will > be much smaller than allyesconfig); it has given us problems with large > kernels before on other architectures. I totally agree that allyesconfig is an oddity, but it's something that does get regularly build tested so a big build time hit there is going to cause trouble -- maybe not for users, but it'll be a problem for maintainers and that's way more likely to get me yelled at ;) > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is already marked with 'depends on > EXPERT' and its help text mentions its perils, so it does not seem > unreasonable to me to add an additional dependency on !COMPILE_TEST so > that allmodconfig and allyesconfig cannot flip this on, something like > the following perhaps? 
> > diff --git a/init/Kconfig b/init/Kconfig > index 32c24950c4ce..25434cbd2a6e 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1388,7 +1388,7 @@ config HAVE_LD_DEAD_CODE_DATA_ELIMINATION > config LD_DEAD_CODE_DATA_ELIMINATION > bool "Dead code and data elimination (EXPERIMENTAL)" > depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION > - depends on EXPERT > + depends on EXPERT && !COMPILE_TEST > depends on $(cc-option,-ffunction-sections -fdata-sections) > depends on $(ld-option,--gc-sections) > help > > If applying that dependency to all architectures is too much, the > selection in arch/riscv/Kconfig could be gated on the same condition. Is the regression for all ports, or just RISC-V? I'm fine gating this with some sort of Kconfig flag, if it's just impacting RISC-V then it seems sane to keep it over here. > Cheers, > Nathan
On Thu, Jun 22, 2023 at 03:16:51PM -0700, Palmer Dabbelt wrote: > On Thu, 22 Jun 2023 14:53:27 PDT (-0700), nathan@kernel.org wrote: > > On Wed, Jun 21, 2023 at 11:19:31AM -0700, Palmer Dabbelt wrote: > > > On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@kernel.org wrote: > > > > Conor Dooley <conor@kernel.org> writes: > > > > > > > > [...] > > > > > > > > > > So I'm no longer actually sure there's a hang, just something > > > > > > slow. That's even more of a grey area, but I think it's sane to > > > > > > call a 1-hour link time a regression -- unless it's expected > > > > > > that this is just very slow to link? > > > > > > > > > > I dunno, if it was only a thing for allyesconfig, then whatever - but > > > > > it's gonna significantly increase build times for any large kernels if LLD > > > > > is this much slower than LD. Regression in my book. > > > > > > > > > > I'm gonna go and experiment with mixed toolchain builds, I'll report > > > > > back.. > > > > > > > > I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable > > > > HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16: > > > > > > > > | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \ > > > > | --toolchain=llvm-16 --runtime docker --directory . -k \ > > > > | allyesconfig > > > > > > > > Took forever, but passed after 2.5h. > > > > > > Thanks. I just re-ran mine 17/trunk LLD under time (rather that just > > > checking top sometimes), it's at 1.5h but even that seems quite long. > > > > > > I guess this is sort of up to the LLVM folks: if it's expected that DCE > > > takes a very long time to link then I'm not opposed to allowing it, but if > > > this is probably a bug in LLD then it seems best to turn it off until we > > > sort things out over there. > > > > > > I think maybe Nick or Nathan is the best bet to know? > > > > I can confirm a regression with allyesconfig but not allmodconfig using > > LLVM 16.0.6 on my 80-core Ampere Altra system. 
> >
> > allmodconfig: 8m 4s
> > allmodconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 7m 4s
> > allyesconfig: 1h 58m 30s
> > allyesconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 12m 41s
>
> Are those backwards? I'm getting super slow builds after merging the patch
> set, not before -- though I apologize in advance if I'm reading it wrong, I'm
> well on my way to falling asleep already ;)

I know I already responded to you about this on IRC but I will do it here
too for the benefit of others following this thread. These numbers are from
the patchset applied on top of dad9774deaf1 ("Merge tag
'timers-urgent-2023-06-21' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"); in other words,
allmodconfig and allyesconfig have CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y,
so turning it off is basically like building allmodconfig and allyesconfig
before the patchset was applied.

> > I am sure there is something that ld.lld can do better, given that GNU ld
> > does not have any problems as established earlier, so that should
> > definitely be explored further. I see Nick already had a response about
> > writing up a report (I wrote most of this before that email so I am
> > still sending this one).
> >
> > However, allyesconfig is pretty special and not really indicative of a
> > "real world" kernel build in my opinion (which will either be a fully
> > modular kernel to allow use on a wide range of hardware or a monolithic
> > kernel with just the drivers needed for a specific platform, which will
> > be much smaller than allyesconfig); it has given us problems with large
> > kernels before on other architectures.
>
> I totally agree that allyesconfig is an oddity, but it's something that does
> get regularly build tested, so a big build time hit there is going to cause
> trouble -- maybe not for users, but it'll be a problem for maintainers and
> that's way more likely to get me yelled at ;)

Agreed.
That comment was more around justification for opting out of
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION with these configurations, since
CONFIG_COMPILE_TEST has effectively become "am I allmodconfig or
allyesconfig?" nowadays.

> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is already marked with 'depends on
> > EXPERT' and its help text mentions its perils, so it does not seem
> > unreasonable to me to add an additional dependency on !COMPILE_TEST so
> > that allmodconfig and allyesconfig cannot flip this on, something like
> > the following perhaps?
> >
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 32c24950c4ce..25434cbd2a6e 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1388,7 +1388,7 @@ config HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> >  config LD_DEAD_CODE_DATA_ELIMINATION
> >  	bool "Dead code and data elimination (EXPERIMENTAL)"
> >  	depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> > -	depends on EXPERT
> > +	depends on EXPERT && !COMPILE_TEST
> >  	depends on $(cc-option,-ffunction-sections -fdata-sections)
> >  	depends on $(ld-option,--gc-sections)
> >  	help
> >
> > If applying that dependency to all architectures is too much, the
> > selection in arch/riscv/Kconfig could be gated on the same condition.
>
> Is the regression for all ports, or just RISC-V? I'm fine gating this with
> some sort of Kconfig flag, if it's just impacting RISC-V then it seems sane
> to keep it over here.

I am not sure. Only mips selects HAVE_LD_DEAD_CODE_DATA_ELIMINATION
unconditionally, and we don't test ARCH=mips all{mod,yes}config (not sure
why off the top of my head). powerpc selects it when using objtool for
mcount generation, which only happens for ppc32 (which we don't test
heavily or with large kernels) or when using '-mprofile-kernel', which
clang does not support.

If you wanted to restrict it to just LD_IS_BFD in arch/riscv/Kconfig,
that would be fine with me too.
  select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD

Nick said he would work on a report for the LLVM side, so as long as
this issue is handled in some way to avoid regressing LLD builds until
it is resolved, I don't think there is anything else for the kernel to
do. We like to have breadcrumbs via issue links; I am not sure whether
the report will be internal to Google or on LLVM's issue tracker though.
Regardless, we will have to touch this block to add a version check
later, at which point we can add a link to the fix in LLD.

Cheers,
Nathan
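[Editor's note] The version check Nathan anticipates could eventually look something like the fragment below. The cutoff value is a placeholder: no fixed LLD release existed at the time of this thread, so both the number and the exact condition are assumptions.

```kconfig
# Sketch only: once LLD is fixed, re-enable DCE for new-enough LLD.
# 170000 (i.e. 17.0.0) is a hypothetical cutoff, not a real fix version;
# LLD_VERSION and LD_IS_BFD are existing kbuild Kconfig symbols.
select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD || LLD_VERSION >= 170000
```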
On Thu, Jun 22, 2023 at 11:18:03PM +0000, Nathan Chancellor wrote:
> If you wanted to restrict it to just LD_IS_BFD in arch/riscv/Kconfig,
> that would be fine with me too.
>
>   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD

Hi Jisheng, would you mind sending a v3 with the attached patch applied
on top / at the end of your series?

> Nick said he would work on a report for the LLVM side, so as long as
> this issue is handled in some way to avoid regressing LLD builds until
> it is resolved, I don't think there is anything else for the kernel to
> do. We like to have breadcrumbs via issue links, not sure if the report
> will be internal to Google or on LLVM's issue tracker though;
> regardless, we will have to touch this block to add a version check
> later, at which point we can add a link to the fix in LLD.

https://github.com/ClangBuiltLinux/linux/issues/1881
On Fri, Jun 23, 2023 at 10:17:54AM -0700, Nick Desaulniers wrote:
> On Thu, Jun 22, 2023 at 11:18:03PM +0000, Nathan Chancellor wrote:
> > If you wanted to restrict it to just LD_IS_BFD in arch/riscv/Kconfig,
> > that would be fine with me too.
> >
> >   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD
>
> Hi Jisheng, would you mind sending a v3 with the attached patch applied
> on top / at the end of your series?

Hi Nick, Nathan, Palmer,

I saw the series has been applied to riscv-next, so I'm not sure which
solution it would be: Palmer applying Nick's patch to riscv-next, or me
sending out a v3. Any suggestion is appreciated.

Thanks

> > Nick said he would work on a report for the LLVM side, so as long as
> > this issue is handled in some way to avoid regressing LLD builds until
> > it is resolved, I don't think there is anything else for the kernel to
> > do. We like to have breadcrumbs via issue links, not sure if the report
> > will be internal to Google or on LLVM's issue tracker though;
> > regardless, we will have to touch this block to add a version check
> > later, at which point we can add a link to the fix in LLD.
>
> https://github.com/ClangBuiltLinux/linux/issues/1881

> From 3e5e010958ee41b9fb408cfade8fb017c2fe7169 Mon Sep 17 00:00:00 2001
> From: Nick Desaulniers <ndesaulniers@google.com>
> Date: Fri, 23 Jun 2023 10:06:17 -0700
> Subject: [PATCH] riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
>
> Linking allyesconfig with ld.lld-17 with CONFIG_DEAD_CODE_ELIMINATION=y
> takes hours. Assuming this is a performance regression that can be
> fixed, tentatively disable this for now so that allyesconfig builds
> don't start timing out. If and when there's a fix to ld.lld, this can
> be converted to a version check instead so that users of older but still
> supported versions of ld.lld don't hurt themselves by enabling
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1881
> Reported-by: Palmer Dabbelt <palmer@dabbelt.com>
> Suggested-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> Hi Jisheng, would you mind sending a v3 with this patch on top/at the
> end of your patch series?
>
>  arch/riscv/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 8effe5bb7788..0573991e9b78 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -116,7 +116,8 @@ config RISCV
>  	select HAVE_KPROBES if !XIP_KERNEL
>  	select HAVE_KPROBES_ON_FTRACE if !XIP_KERNEL
>  	select HAVE_KRETPROBES if !XIP_KERNEL
> -	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> +	# https://github.com/ClangBuiltLinux/linux/issues/1881
> +	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD
>  	select HAVE_MOVE_PMD
>  	select HAVE_MOVE_PUD
>  	select HAVE_PCI
> --
> 2.41.0.162.gfafddb0af9-goog
>
On Sun, Jun 25, 2023 at 08:24:56PM +0800, Jisheng Zhang wrote:
> On Fri, Jun 23, 2023 at 10:17:54AM -0700, Nick Desaulniers wrote:
> > On Thu, Jun 22, 2023 at 11:18:03PM +0000, Nathan Chancellor wrote:
> > > If you wanted to restrict it to just LD_IS_BFD in arch/riscv/Kconfig,
> > > that would be fine with me too.
> > >
> > >   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD
> >
> > Hi Jisheng, would you mind sending a v3 with the attached patch applied
> > on top / at the end of your series?
>
> Hi Nick, Nathan, Palmer,
>
> I saw the series has been applied to riscv-next, so I'm not sure which
> solution it would be: Palmer applying Nick's patch to riscv-next, or me
> sending out a v3. Any suggestion is appreciated.

I don't see what you are seeing w/ riscv/for-next. HEAD is currently at
4681dacadeef ("riscv: replace deprecated scall with ecall") and there
are no patches from your series in the branch:
https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=for-next

Cheers,
Conor.

> > > Nick said he would work on a report for the LLVM side, so as long as
> > > this issue is handled in some way to avoid regressing LLD builds until
> > > it is resolved, I don't think there is anything else for the kernel to
> > > do. We like to have breadcrumbs via issue links, not sure if the report
> > > will be internal to Google or on LLVM's issue tracker though;
> > > regardless, we will have to touch this block to add a version check
> > > later, at which point we can add a link to the fix in LLD.
> >
> > https://github.com/ClangBuiltLinux/linux/issues/1881
>
> > From 3e5e010958ee41b9fb408cfade8fb017c2fe7169 Mon Sep 17 00:00:00 2001
> > From: Nick Desaulniers <ndesaulniers@google.com>
> > Date: Fri, 23 Jun 2023 10:06:17 -0700
> > Subject: [PATCH] riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
> >
> > Linking allyesconfig with ld.lld-17 with CONFIG_DEAD_CODE_ELIMINATION=y
> > takes hours. Assuming this is a performance regression that can be
> > fixed, tentatively disable this for now so that allyesconfig builds
> > don't start timing out. If and when there's a fix to ld.lld, this can
> > be converted to a version check instead so that users of older but still
> > supported versions of ld.lld don't hurt themselves by enabling
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
> >
> > Link: https://github.com/ClangBuiltLinux/linux/issues/1881
> > Reported-by: Palmer Dabbelt <palmer@dabbelt.com>
> > Suggested-by: Nathan Chancellor <nathan@kernel.org>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> > ---
> > Hi Jisheng, would you mind sending a v3 with this patch on top/at the
> > end of your patch series?
> >
> >  arch/riscv/Kconfig | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 8effe5bb7788..0573991e9b78 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -116,7 +116,8 @@ config RISCV
> >  	select HAVE_KPROBES if !XIP_KERNEL
> >  	select HAVE_KPROBES_ON_FTRACE if !XIP_KERNEL
> >  	select HAVE_KRETPROBES if !XIP_KERNEL
> > -	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> > +	# https://github.com/ClangBuiltLinux/linux/issues/1881
> > +	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD
> >  	select HAVE_MOVE_PMD
> >  	select HAVE_MOVE_PUD
> >  	select HAVE_PCI
> > --
> > 2.41.0.162.gfafddb0af9-goog
> >
On Sun, 25 Jun 2023 05:43:13 PDT (-0700), Conor Dooley wrote:
> On Sun, Jun 25, 2023 at 08:24:56PM +0800, Jisheng Zhang wrote:
>> On Fri, Jun 23, 2023 at 10:17:54AM -0700, Nick Desaulniers wrote:
>> > On Thu, Jun 22, 2023 at 11:18:03PM +0000, Nathan Chancellor wrote:
>> > > If you wanted to restrict it to just LD_IS_BFD in arch/riscv/Kconfig,
>> > > that would be fine with me too.
>> > >
>> > >   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD
>> >
>> > Hi Jisheng, would you mind sending a v3 with the attached patch applied
>> > on top / at the end of your series?
>>
>> Hi Nick, Nathan, Palmer,
>>
>> I saw the series has been applied to riscv-next, so I'm not sure which
>> solution it would be: Palmer applying Nick's patch to riscv-next, or me
>> sending out a v3. Any suggestion is appreciated.
>
> I don't see what you are seeing w/ riscv/for-next. HEAD is currently at
> 4681dacadeef ("riscv: replace deprecated scall with ecall") and there
> are no patches from your series in the branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=for-next

It's been in and out of staging a few times as we tracked down the
performance regression, but it shouldn't have ever made it to linux-next
for real.

I'm fine just picking up the patch to disable DCE; I've got a few other
(hopefully small) things to work through first though.

> Cheers,
> Conor.
>
>> > > Nick said he would work on a report for the LLVM side, so as long as
>> > > this issue is handled in some way to avoid regressing LLD builds until
>> > > it is resolved, I don't think there is anything else for the kernel to
>> > > do. We like to have breadcrumbs via issue links, not sure if the report
>> > > will be internal to Google or on LLVM's issue tracker though;
>> > > regardless, we will have to touch this block to add a version check
>> > > later, at which point we can add a link to the fix in LLD.
>> >
>> > https://github.com/ClangBuiltLinux/linux/issues/1881
>>
>> > From 3e5e010958ee41b9fb408cfade8fb017c2fe7169 Mon Sep 17 00:00:00 2001
>> > From: Nick Desaulniers <ndesaulniers@google.com>
>> > Date: Fri, 23 Jun 2023 10:06:17 -0700
>> > Subject: [PATCH] riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
>> >
>> > Linking allyesconfig with ld.lld-17 with CONFIG_DEAD_CODE_ELIMINATION=y
>> > takes hours. Assuming this is a performance regression that can be
>> > fixed, tentatively disable this for now so that allyesconfig builds
>> > don't start timing out. If and when there's a fix to ld.lld, this can
>> > be converted to a version check instead so that users of older but still
>> > supported versions of ld.lld don't hurt themselves by enabling
>> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
>> >
>> > Link: https://github.com/ClangBuiltLinux/linux/issues/1881
>> > Reported-by: Palmer Dabbelt <palmer@dabbelt.com>
>> > Suggested-by: Nathan Chancellor <nathan@kernel.org>
>> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
>> > ---
>> > Hi Jisheng, would you mind sending a v3 with this patch on top/at the
>> > end of your patch series?
>> >
>> >  arch/riscv/Kconfig | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> > index 8effe5bb7788..0573991e9b78 100644
>> > --- a/arch/riscv/Kconfig
>> > +++ b/arch/riscv/Kconfig
>> > @@ -116,7 +116,8 @@ config RISCV
>> >  	select HAVE_KPROBES if !XIP_KERNEL
>> >  	select HAVE_KPROBES_ON_FTRACE if !XIP_KERNEL
>> >  	select HAVE_KRETPROBES if !XIP_KERNEL
>> > -	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
>> > +	# https://github.com/ClangBuiltLinux/linux/issues/1881
>> > +	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD
>> >  	select HAVE_MOVE_PMD
>> >  	select HAVE_MOVE_PUD
>> >  	select HAVE_PCI
>> > --
>> > 2.41.0.162.gfafddb0af9-goog
>> >
On Sun, Jun 25, 2023 at 1:06 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Sun, 25 Jun 2023 05:43:13 PDT (-0700), Conor Dooley wrote:
> > On Sun, Jun 25, 2023 at 08:24:56PM +0800, Jisheng Zhang wrote:
> >> On Fri, Jun 23, 2023 at 10:17:54AM -0700, Nick Desaulniers wrote:
> >> > On Thu, Jun 22, 2023 at 11:18:03PM +0000, Nathan Chancellor wrote:
> >> > > If you wanted to restrict it to just LD_IS_BFD in arch/riscv/Kconfig,
> >> > > that would be fine with me too.
> >> > >
> >> > >   select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD
> >> >
> >> > Hi Jisheng, would you mind sending a v3 with the attached patch applied
> >> > on top / at the end of your series?
> >>
> >> Hi Nick, Nathan, Palmer,
> >>
> >> I saw the series has been applied to riscv-next, so I'm not sure which
> >> solution it would be: Palmer applying Nick's patch to riscv-next, or me
> >> sending out a v3. Any suggestion is appreciated.
> >
> > I don't see what you are seeing w/ riscv/for-next. HEAD is currently at
> > 4681dacadeef ("riscv: replace deprecated scall with ecall") and there
> > are no patches from your series in the branch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=for-next
>
> It's been in and out of staging a few times as we tracked down the
> performance regression, but it shouldn't have ever made it to linux-next
> for real.
>
> I'm fine just picking up the patch to disable DCE, I've got a few other
> (hopefully small) things to work through first though.

Note: for GCC, -fpatchable-function-entry= (used by arch/riscv/Kconfig)
requires GCC 13 for correct garbage collection semantics.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110729

> > Cheers,
> > Conor.
> >
> >> > > Nick said he would work on a report for the LLVM side, so as long as
> >> > > this issue is handled in some way to avoid regressing LLD builds until
> >> > > it is resolved, I don't think there is anything else for the kernel to
> >> > > do. We like to have breadcrumbs via issue links, not sure if the report
> >> > > will be internal to Google or on LLVM's issue tracker though;
> >> > > regardless, we will have to touch this block to add a version check
> >> > > later, at which point we can add a link to the fix in LLD.
> >> >
> >> > https://github.com/ClangBuiltLinux/linux/issues/1881
> >>
> >> > From 3e5e010958ee41b9fb408cfade8fb017c2fe7169 Mon Sep 17 00:00:00 2001
> >> > From: Nick Desaulniers <ndesaulniers@google.com>
> >> > Date: Fri, 23 Jun 2023 10:06:17 -0700
> >> > Subject: [PATCH] riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
> >> >
> >> > Linking allyesconfig with ld.lld-17 with CONFIG_DEAD_CODE_ELIMINATION=y
> >> > takes hours. Assuming this is a performance regression that can be
> >> > fixed, tentatively disable this for now so that allyesconfig builds
> >> > don't start timing out. If and when there's a fix to ld.lld, this can
> >> > be converted to a version check instead so that users of older but still
> >> > supported versions of ld.lld don't hurt themselves by enabling
> >> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
> >> >
> >> > Link: https://github.com/ClangBuiltLinux/linux/issues/1881
> >> > Reported-by: Palmer Dabbelt <palmer@dabbelt.com>
> >> > Suggested-by: Nathan Chancellor <nathan@kernel.org>
> >> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> >> > ---
> >> > Hi Jisheng, would you mind sending a v3 with this patch on top/at the
> >> > end of your patch series?
> >> >
> >> >  arch/riscv/Kconfig | 3 ++-
> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >> > index 8effe5bb7788..0573991e9b78 100644
> >> > --- a/arch/riscv/Kconfig
> >> > +++ b/arch/riscv/Kconfig
> >> > @@ -116,7 +116,8 @@ config RISCV
> >> >  	select HAVE_KPROBES if !XIP_KERNEL
> >> >  	select HAVE_KPROBES_ON_FTRACE if !XIP_KERNEL
> >> >  	select HAVE_KRETPROBES if !XIP_KERNEL
> >> > -	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> >> > +	# https://github.com/ClangBuiltLinux/linux/issues/1881
> >> > +	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD
> >> >  	select HAVE_MOVE_PMD
> >> >  	select HAVE_MOVE_PUD
> >> >  	select HAVE_PCI
> >> > --
> >> > 2.41.0.162.gfafddb0af9-goog
> >> >
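[Editor's note] Combining the GCC 13 note above with the LLD gating discussed earlier in the thread would give something like the sketch below. The exact combination is an assumption (no such patch was posted in this thread); CC_IS_CLANG, GCC_VERSION, and LD_IS_BFD are existing Kconfig symbols, but the condition itself is hypothetical.

```kconfig
# Hypothetical combined gate: BFD ld only (to sidestep the LLD link-time
# regression), and for GCC builds require GCC >= 13 so that sections
# created by -fpatchable-function-entry= are garbage-collected correctly
# (see GCC PR110729).
select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if LD_IS_BFD && (CC_IS_CLANG || GCC_VERSION >= 130000)
```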