Message ID | 20210328201400.1426437-1-yhs@fb.com (mailing list archive) |
---|---|
Series | permit merging all dwarf cu's for clang lto built binary |
On Sun, Mar 28, 2021 at 01:14:00PM -0700, Yonghong Song wrote:
> For vmlinux built with clang thin-lto or lto for latest bpf-next,
> there exist cross cu debuginfo type references. For example,
>   compile unit 1:
>     tag 10: type A
>   compile unit 2:
>     ...
>     refer to type A (tag 10 in compile unit 1)
> I only checked a few but have seen type A may be a simple type
> like "unsigned char" or a complex type like an array of base types.

> I am using latest llvm trunk and bpf-next. I suspect llvm12 or
> linus tree >= 5.12 rc2 should be able to exhibit the issue as well.
> Both thin-lto and lto have the same issues.

Works, now we're again at:

[acme@five pahole]$ time btfdiff vmlinux

real	0m7.679s
user	0m7.337s
sys	0m0.303s
[acme@five pahole]$ time btfdiff vmlinux.clang.thin.LTO
--- /tmp/btfdiff.dwarf.Ls059V	2021-03-29 14:36:02.675859035 -0300
+++ /tmp/btfdiff.btf.rxRd6R	2021-03-29 14:36:02.935864663 -0300
@@ -67255,7 +67255,7 @@ struct cpu_rmap {
 	struct {
 		u16                index;                /*    16     2 */
 		u16                dist;                 /*    18     2 */
-	} near[0]; /*    16     0 */
+	} near[]; /*    16     0 */
 
 	/* size: 16, cachelines: 1, members: 5 */
 	/* last cacheline: 16 bytes */
@@ -101181,7 +101181,7 @@ struct linux_efi_memreserve {
 	struct {
 		phys_addr_t        base;                 /*    16     8 */
 		phys_addr_t        size;                 /*    24     8 */
-	} entry[0]; /*    16     0 */
+	} entry[]; /*    16     0 */
 
 	/* size: 16, cachelines: 1, members: 4 */
 	/* last cacheline: 16 bytes */
@@ -113516,7 +113516,7 @@ struct netlink_policy_dump_state {
 	struct {
 		const struct nla_policy  * policy;       /*    16     8 */
 		unsigned int       maxtype;              /*    24     4 */
-	} policies[0]; /*    16     0 */
+	} policies[]; /*    16     0 */
 
 	/* size: 16, cachelines: 1, members: 4 */
 	/* sum members: 12, holes: 1, sum holes: 4 */

real	0m20.402s
user	0m19.163s
sys	0m1.096s
[acme@five pahole]$

And:

[acme@five pahole]$ ulimit -c 10000000
[acme@five pahole]$
[acme@five pahole]$ file tcp_bbr.o
tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
[acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
    <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
[acme@five pahole]$ fullcircle tcp_bbr.o
/home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
/tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
 1435 |  /* si
      |  ^
/tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
 1433 |  u32 * saved_syn; /* 2184 8 */
      |  ^~~
codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
/home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
[acme@five pahole]$

Both seem unrelated to what you've done here, I'm investigating it now.

- Arnaldo
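A side note on the `near[0]` versus `near[]` lines in the btfdiff output: both spellings declare a trailing array that adds no bytes to the enclosing struct, which matches the identical offset and size comments (16 and 0) on both sides of the diff. The sketch below is only an illustration of that layout equivalence. The struct names `rmap_zero` and `rmap_flex` and their leading members are made up (they are not the kernel definitions), and the zero-length form relies on a GNU extension, so gcc or clang is assumed. It says nothing about why the DWARF and BTF printouts differ.

```c
#include <stddef.h>

/* Hypothetical miniature of the pattern in the diff; not a kernel struct. */
struct rmap_zero {
	unsigned long long refcount;
	unsigned long long size;
	struct {
		unsigned short index;
		unsigned short dist;
	} near[0];	/* GNU zero-length array extension */
};

struct rmap_flex {
	unsigned long long refcount;
	unsigned long long size;
	struct {
		unsigned short index;
		unsigned short dist;
	} near[];	/* C99 flexible array member */
};

/* Returns 1 when both declarations have the same total size and the
 * trailing array starts at the same offset. */
static int same_layout(void)
{
	return sizeof(struct rmap_zero) == sizeof(struct rmap_flex) &&
	       offsetof(struct rmap_zero, near) == offsetof(struct rmap_flex, near);
}
```

Calling `same_layout()` is expected to return 1 with gcc or clang on common targets, so a tool comparing only member offsets and sizes would treat the two declarations as equal.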
(replying manually to https://lore.kernel.org/dwarves/20210328201400.1426437-1-yhs@fb.com/)
I didn't validate or try to use the produced data, but with this and the
kernel patch
https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@fb.com/
I was able to build an x86_64 defconfig + CONFIG_LTO_CLANG_THIN +
CONFIG_DEBUG_INFO_BTF without further errors. Thank you for the series! FWIW:
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
On Mon, Mar 29, 2021 at 02:40:05PM -0300, Arnaldo Carvalho de Melo wrote:
> [acme@five pahole]$ ulimit -c 10000000
> [acme@five pahole]$
> [acme@five pahole]$ file tcp_bbr.o
> tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
> [acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
>     <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
> [acme@five pahole]$ fullcircle tcp_bbr.o
> /home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
> /tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
>  1435 |  /* si
>       |  ^
> /tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
>  1433 |  u32 * saved_syn; /* 2184 8 */
>       |  ^~~
> codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
> /home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
> [acme@five pahole]$
>
> Both seem unrelated to what you've done here, I'm investigating it now.

The fullcircle one, which crashes in the 'codiff' utility, is related to
the patch that makes dwarf_cu allocate space for the hash tables: you
introduced a destructor for the dwarf_cu hashtables, and the dwarf_cu
that was assigned to cu->priv was a local variable. That wasn't much of
a problem before because we were not freeing it, as it went away at each
loop iteration. The following patch on top of that first patch in the
series seems to cure it; I'm folding it into your patch + a committer
note.

- Arnaldo

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 5a1e860da079e04c..3e7875d4ab577f1b 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -150,6 +150,18 @@ static int dwarf_cu__init(struct dwarf_cu *dcu)
 	return 0;
 }
 
+static struct dwarf_cu *dwarf_cu__new(void)
+{
+	struct dwarf_cu *dwarf_cu = zalloc(sizeof(*dwarf_cu));
+
+	if (dwarf_cu != NULL && dwarf_cu__init(dwarf_cu) != 0) {
+		free(dwarf_cu);
+		dwarf_cu = NULL;
+	}
+
+	return dwarf_cu;
+}
+
 static void dwarf_cu__delete(struct cu *cu)
 {
 	struct dwarf_cu *dcu = cu->priv;
@@ -2542,21 +2554,20 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 		}
 		cu->little_endian = ehdr.e_ident[EI_DATA] == ELFDATA2LSB;
 
-		struct dwarf_cu dcu;
+		struct dwarf_cu *dcu = dwarf_cu__new();
 
-		if (dwarf_cu__init(&dcu) != 0)
+		if (dcu == NULL)
 			return DWARF_CB_ABORT;
 
-		dcu.cu = cu;
-		dcu.type_unit = type_cu ? &type_dcu : NULL;
-		cu->priv = &dcu;
+		dcu->cu = cu;
+		dcu->type_unit = type_cu ? &type_dcu : NULL;
+		cu->priv = dcu;
 		cu->dfops = &dwarf__ops;
 
 		if (die__process_and_recode(cu_die, cu) != 0)
 			return DWARF_CB_ABORT;
 
-		if (finalize_cu_immediately(cus, cu, &dcu, conf)
-		    == LSK__STOP_LOADING)
+		if (finalize_cu_immediately(cus, cu, dcu, conf) == LSK__STOP_LOADING)
 			return DWARF_CB_ABORT;
 
 		off = noff;
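
The ownership problem described in this message can be sketched in isolation. Once a destructor reaches the private data through cu->priv, that data has to outlive the loop iteration that created it, so it needs to live on the heap rather than in a local variable. The code below is a simplified stand-in, not pahole code: `toy_cu`, `toy_dcu` and the `toy_*` functions are hypothetical names, `calloc` replaces the `zalloc` helper used in the diff, and the single hash table is invented for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct dwarf_cu and struct cu. */
struct toy_dcu {
	void **hash_tags;	/* heap-allocated table released by the destructor */
	size_t nr_buckets;
};

struct toy_cu {
	struct toy_dcu *priv;	/* must stay valid until toy_dcu__delete() runs */
};

static int toy_dcu__init(struct toy_dcu *dcu)
{
	dcu->nr_buckets = 16;
	dcu->hash_tags = calloc(dcu->nr_buckets, sizeof(*dcu->hash_tags));
	return dcu->hash_tags != NULL ? 0 : -1;
}

/* Allocate and initialize in one step, mirroring the shape of dwarf_cu__new(). */
static struct toy_dcu *toy_dcu__new(void)
{
	struct toy_dcu *dcu = calloc(1, sizeof(*dcu));

	if (dcu != NULL && toy_dcu__init(dcu) != 0) {
		free(dcu);
		dcu = NULL;
	}

	return dcu;
}

/* Destructor: safe only because priv points to heap memory, not to a
 * local variable of a stack frame that may already be gone. */
static void toy_dcu__delete(struct toy_cu *cu)
{
	struct toy_dcu *dcu = cu->priv;

	if (dcu == NULL)
		return;

	free(dcu->hash_tags);
	free(dcu);
	cu->priv = NULL;
}

/* Returns 0 on success, -1 if the private data could not be allocated. */
static int toy_cu__attach(struct toy_cu *cu)
{
	cu->priv = toy_dcu__new();
	return cu->priv != NULL ? 0 : -1;
}
```

With this shape a caller can run `toy_cu__attach(&cu)` inside a loop body and call `toy_dcu__delete(&cu)` much later. Had `priv` pointed at a local `struct toy_dcu`, the later destructor call would have dereferenced a dangling pointer.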
On Tue, Mar 30, 2021 at 12:10:10PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Mar 29, 2021 at 02:40:05PM -0300, Arnaldo Carvalho de Melo wrote:
> > [acme@five pahole]$ ulimit -c 10000000
> > [acme@five pahole]$
> > [acme@five pahole]$ file tcp_bbr.o
> > tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
> > [acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
> >     <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
> > [acme@five pahole]$ fullcircle tcp_bbr.o
> > /home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
> > /tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
> >  1435 |  /* si
> >       |  ^
> > /tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
> >  1433 |  u32 * saved_syn; /* 2184 8 */
> >       |  ^~~
> > codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
> > /home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
> > [acme@five pahole]$
> >
> > Both seem unrelated to what you've done here, I'm investigating it now.
>
> The fullcircle one, that crashes at the 'codiff' utility is related to
> the patch that makes dwarf_cu to allocate space for the hash tables, as
> you introduced a destructor for the dwarf_cu hashtables and the dwarf_cu
> that was assigned to cu->priv was a local variable, which wasn't much of
> a problem because we were not freeing it, as it went away at each loop
> iteration, the following patch to that first patch in the series seems
> to cure it, I'm folding it into your patch + a committer note.

[acme@five pahole]$ codiff tcp_bbr.o /tmp/fullcircle.ceBLyj.o
/home/acme/git/linux/net/ipv4/tcp_bbr.c:
  bbr_unregister                     |    -6
  __compiletime_assert_691           |    +0
  bbr_register                       |   -11
  bbr_ssthresh                       |   -76
  bbr_undo_cwnd                      |  -101
  bbr_sndbuf_expand                  |   -11
  bbr_init                           |  -385
  bbr_main                           | -2640
  bbr_lt_bw_sampling                 |  -803
  bbr_packets_in_net_at_edt          |  -212
  bbr_inflight                       |  -172
  __compiletime_assert_655           |    +0
  bbr_set_pacing_rate                |  -182
  kcsan_check_access                 |    +6
  kasan_check_write                  |   +14
  tcp_unregister_congestion_control  |    +0
  tcp_register_congestion_control    |    +0
  minmax_running_max                 |    +0
  prandom_u32                        |    +0
  __warn_printk                      |    +0
  __stack_chk_fail                   |    +0
 21 functions changed, 20 bytes added, 4599 bytes removed, diff: -4579
[acme@five pahole]$
[acme@five pahole]$
[acme@five pahole]$ fullcircle tcp_bbr.o
[acme@five pahole]$

This one is dealt with, doing some more tests and looking at that
array[] versus array[0].

- Arnaldo
On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
> [acme@five pahole]$
> [acme@five pahole]$
> [acme@five pahole]$ fullcircle tcp_bbr.o
> [acme@five pahole]$
>
> This one is dealt with, doing some more tests and looking at that
> array[] versus array[0].

I've pushed what I have to the main repos at kernel.org and github,
please check, I'll continue from there.

- Arnaldo
On 3/30/21 8:10 AM, Arnaldo Carvalho de Melo wrote:
> On Mon, Mar 29, 2021 at 02:40:05PM -0300, Arnaldo Carvalho de Melo wrote:
>> [acme@five pahole]$ ulimit -c 10000000
>> [acme@five pahole]$
>> [acme@five pahole]$ file tcp_bbr.o
>> tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
>> [acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
>>     <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
>> [acme@five pahole]$ fullcircle tcp_bbr.o
>> /home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
>> /tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
>>  1435 |  /* si
>>       |  ^
>> /tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
>>  1433 |  u32 * saved_syn; /* 2184 8 */
>>       |  ^~~
>> codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
>> /home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
>> [acme@five pahole]$
>>
>> Both seem unrelated to what you've done here, I'm investigating it now.
>
> The fullcircle one, that crashes at the 'codiff' utility is related to
> the patch that makes dwarf_cu to allocate space for the hash tables, as
> you introduced a destructor for the dwarf_cu hashtables and the dwarf_cu
> that was assigned to cu->priv was a local variable, which wasn't much of
> a problem because we were not freeing it, as it went away at each loop
> iteration, the following patch to that first patch in the series seems
> to cure it, I'm folding it into your patch + a committer note.

Thanks for the fix!

>
> - Arnaldo
>
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index 5a1e860da079e04c..3e7875d4ab577f1b 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -150,6 +150,18 @@ static int dwarf_cu__init(struct dwarf_cu *dcu)
>   	return 0;
>   }
> [...]
On 3/30/21 11:24 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
>> [acme@five pahole]$
>> [acme@five pahole]$
>> [acme@five pahole]$ fullcircle tcp_bbr.o
>> [acme@five pahole]$
>>
>> This one is dealt with, doing some more tests and looking at that
>> array[] versus array[0].
>
> I've pushed what I have to the main repos at kernel.org and github,
> please check, I'll continue from there.

Looks good. Thanks!

I will try to experiment with an alternative way ([1]) to check whether
a cross-cu reference happens or not. But at least the flag-checking
approach can be adapted to gcc (if we want to, after comparing the
alternative), since gcc always has flags in dwarf.

[1] https://lore.kernel.org/bpf/d34a3d62-bae8-3a30-26b6-4e5e8efcd0af@fb.com/T/#m1b0b1206091c19a90b15d054aa26239101289f84

>
> - Arnaldo
>
On Tue, Mar 30, 2021 at 08:20:20PM -0700, Yonghong Song wrote:
> On 3/30/21 11:24 AM, Arnaldo Carvalho de Melo wrote:
> > On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
> > > [acme@five pahole]$ fullcircle tcp_bbr.o
> > > [acme@five pahole]$

> > > This one is dealt with, doing some more tests and looking at that
> > > array[] versus array[0].

> > I've pushed what I have to the main repos at kernel.org and github,
> > please check, I'll continue from there.

> Looks good. Thanks!

> I will try to experiment with an alternative way ([1]) to check whether
> cross-cu reference happens or not. But at least checking flags
> approach can be adapted to gcc (if we want after comparing the alternative)
> since gcc always has flags in dwarf.

> [1] https://lore.kernel.org/bpf/d34a3d62-bae8-3a30-26b6-4e5e8efcd0af@fb.com/T/#m1b0b1206091c19a90b15d054aa26239101289f84

I thought about some other method, like adding an ELF note to vmlinux
stating that this was built with LTO; that would be the fastest way, I
think. If that note wasn't there, then we would fall back to looking at
inter-CU references. That way we would have the best of both worlds and
wouldn't incur per-CU DW_AT_producer overheads with the flags for
each object file.

- Arnaldo
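
The "note first, scan as fallback" idea from this message could look roughly like the sketch below. It is only an illustration under stated assumptions. The owner name "Sketch", the note type value and the 4-byte boolean payload are invented here, since the thread does not define any of them. The buffer is assumed to hold the raw contents of a notes section in host byte order. The generic ELF note layout (three 32-bit words, then name and descriptor, each padded to 4 bytes) is the only part taken from the ELF specification. `notes__lto_hint` and `should_merge_cus` are hypothetical names, not pahole functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF note header: three 32-bit words, followed by the name and the
 * descriptor, each padded to a 4-byte boundary. */
struct note_hdr {
	uint32_t namesz;
	uint32_t descsz;
	uint32_t type;
};

/* Hypothetical owner name and note type, invented for this sketch. */
#define SKETCH_NOTE_NAME "Sketch"
#define SKETCH_NOTE_LTO  0x4c544fu

enum lto_hint { LTO_HINT_ABSENT, LTO_HINT_NO, LTO_HINT_YES };

static uint64_t pad4(uint32_t n)
{
	return ((uint64_t)n + 3) & ~(uint64_t)3;
}

/* Walk the raw contents of a notes section looking for the LTO note. */
static enum lto_hint notes__lto_hint(const unsigned char *buf, size_t len)
{
	size_t off = 0;

	while (len - off >= sizeof(struct note_hdr)) {
		struct note_hdr nh;
		uint64_t name_sz, desc_sz;

		memcpy(&nh, buf + off, sizeof(nh));
		off += sizeof(nh);
		name_sz = pad4(nh.namesz);
		desc_sz = pad4(nh.descsz);
		if (name_sz > len - off || desc_sz > len - off - name_sz)
			break;	/* truncated note: stop scanning */

		if (nh.type == SKETCH_NOTE_LTO &&
		    nh.namesz == sizeof(SKETCH_NOTE_NAME) &&
		    memcmp(buf + off, SKETCH_NOTE_NAME, nh.namesz) == 0 &&
		    nh.descsz == sizeof(uint32_t)) {
			uint32_t built_with_lto;

			memcpy(&built_with_lto, buf + off + name_sz,
			       sizeof(built_with_lto));
			return built_with_lto ? LTO_HINT_YES : LTO_HINT_NO;
		}
		off += (size_t)(name_sz + desc_sz);
	}

	return LTO_HINT_ABSENT;
}

/* Trust the note when it is present; otherwise fall back to the slower
 * scan for inter-CU references supplied by the caller. */
static int should_merge_cus(enum lto_hint hint,
			    int (*scan_cross_cu_refs)(void *), void *arg)
{
	if (hint != LTO_HINT_ABSENT)
		return hint == LTO_HINT_YES;

	return scan_cross_cu_refs(arg);
}
```

The point of the split is cost: reading one note is a bounded walk over a small section, while the fallback has to look at references in the debug info, so the expensive path only runs for binaries that carry no note.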
On 3/31/21 6:54 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 30, 2021 at 08:20:20PM -0700, Yonghong Song wrote:
>> On 3/30/21 11:24 AM, Arnaldo Carvalho de Melo wrote:
>>> On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
>>>> [acme@five pahole]$ fullcircle tcp_bbr.o
>>>> [acme@five pahole]$
>
>>>> This one is dealt with, doing some more tests and looking at that
>>>> array[] versus array[0].
>
>>> I've pushed what I have to the main repos at kernel.org and github,
>>> please check, I'll continue from there.
>
>> Looks good. Thanks!
>
>> I will try to experiment with an alternative way ([1]) to check whether
>> cross-cu reference happens or not. But at least checking flags
>> approach can be adapted to gcc (if we want after comparing the alternative)
>> since gcc always has flags in dwarf.
>
>> [1] https://lore.kernel.org/bpf/d34a3d62-bae8-3a30-26b6-4e5e8efcd0af@fb.com/T/#m1b0b1206091c19a90b15d054aa26239101289f84
>
> I thought about some other method, like adding an ELF note to vmlinux
> stating that this was built with LTO, that would be the fastest way, I

Adding to the ELF .notes is a great idea. Let me explore it. Thanks!

> think. If that note wasn't there, then we would fall back to looking at
> inter CU references, that way we would have the best of both worlds and
> wouldn't incur per-CU DW_AT_producer overheads with the flags for
> each object file.

Totally agree.

>
> - Arnaldo
>