[dwarves,v3,0/3] permit merging all dwarf cu's for clang lto built binary

Message ID 20210328201400.1426437-1-yhs@fb.com (mailing list archive)

Message

Yonghong Song March 28, 2021, 8:14 p.m. UTC
For vmlinux built with clang thin-lto or lto for latest bpf-next,
there exist cross cu debuginfo type references. For example,
      compile unit 1:
         tag 10:  type A
      compile unit 2:
         ...
           refer to type A (tag 10 in compile unit 1)
I only checked a few but have seen type A may be a simple type
like "unsigned char" or a complex type like an array of base types.
I am using latest llvm trunk and bpf-next. I suspect llvm12 or
linus tree >= 5.12 rc2 should be able to exhibit the issue as well.
Both thin-lto and lto have the same issues.

Current pahole cannot handle this. It reports that the types cannot
be found. Bill Wendling attempted to fix the issue with [1] by
hashing all tags/types into the same hash table and then
processing the cu's one by one. This does not really work.
The reason is that each cu resolves types locally,
so for the above example we may have
  compile unit 1:
    type A : type_id = 10
  compile unit 2:
    refer to type A : type A will be resolved as type id = 10
But id 10 is only valid within compile unit 1, so in compile
unit 2 we will get either an out-of-bound type id or an incorrect type.

This patch set is a continuation of Bill's work. We still
increase the hashtable size and traverse all cu's before
recoding and finalization. But instead of creating one-to-one
mapping between debuginfo cu and pahole cu, we just create
one pahole cu, which should solve the above incorrect type
id issue.
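
To make the type id problem concrete, here is a minimal sketch in C.
It is a simplified model, not pahole's real data structures: the names
type_table, type_table__add and type_table__find are hypothetical and
exist only for illustration. Each table hands out ids that are valid
only within that table, so an id taken from one cu's table is
meaningless in another cu's table, while a single merged table keeps
every id valid.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TYPES 16

/* Hypothetical, simplified stand-in for a per-cu type table. */
struct type_table {
	const char *names[MAX_TYPES];
	uint32_t nr;
};

/* Append a type; the returned id is only meaningful within this table. */
static uint32_t type_table__add(struct type_table *t, const char *name)
{
	if (t->nr >= MAX_TYPES)
		return UINT32_MAX;	/* table full */
	t->names[t->nr] = name;
	return t->nr++;
}

/* Look up an id; NULL models the "type cannot be found" error. */
static const char *type_table__find(const struct type_table *t, uint32_t id)
{
	return id < t->nr ? t->names[id] : NULL;
}
```

If cu 1 holds "int" and "unsigned char" and cu 2 holds only "long",
looking up the id of "unsigned char" in cu 2's table is out of bounds.
With all three types added to one table, the same kind of lookup
succeeds.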

This patch set depends on kernel patch [2]
to emit compilation flags for clang lto build so pahole
can properly discover whether to merge cu's or not.

Patches #1 and #2 refactor the existing code and
patch #3 adds logic to permit merging all debuginfo cu's
into one pahole cu. Whether merging is desirable is
detected by checking for the "clang" compiler and its
"lto" option in the dwarf producer tag.
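
The detection described above could look roughly like the sketch
below. This is an illustration under assumptions: the function name
producer_suggests_clang_lto is hypothetical, and the substrings
"clang" and "-flto" are guesses at what would be matched, since this
cover letter does not spell out the exact strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide from a DW_AT_producer string whether the
 * object was built by clang with an lto flag. The matched substrings
 * are assumptions made for this sketch.
 */
static bool producer_suggests_clang_lto(const char *producer)
{
	if (producer == NULL)
		return false;

	return strstr(producer, "clang") != NULL &&
	       strstr(producer, "-flto") != NULL;
}
```

A producer string such as the GNU C89 one quoted later in this thread
contains neither substring, so a check of this kind would leave the
existing one-cu-at-a-time processing in place for gcc builds.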

[1] https://lore.kernel.org/bpf/20210212211607.2890660-1-morbo@google.com/
[2] https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@fb.com/

Changelogs:
  v2 -> v3:
    . change "return 1" to "return DWARF_CB_ABORT" in
      cus__merge_and_process_cu().
    . add kbuild/bpf link (above [2]) for kernel patch reference.
  v1 -> v2:
    . removed the "--merge_cus" option, rely instead on detecting
      the clang compiler and its lto flags.

Yonghong Song (3):
  dwarf_loader: permits flexible HASHTAGS__BITS
  dwarf_loader: factor out common code to initialize a cu
  dwarf_loader: permit merging all dwarf cu's for clang lto built binary

 dwarf_loader.c | 209 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 175 insertions(+), 34 deletions(-)

Comments

Arnaldo Carvalho de Melo March 29, 2021, 5:40 p.m. UTC | #1
On Sun, Mar 28, 2021 at 01:14:00PM -0700, Yonghong Song wrote:
> For vmlinux built with clang thin-lto or lto for latest bpf-next,
> there exist cross cu debuginfo type references. For example,
>       compile unit 1:
>          tag 10:  type A
>       compile unit 2:
>          ...
>            refer to type A (tag 10 in compile unit 1)
> I only checked a few but have seen type A may be a simple type
> like "unsigned char" or a complex type like an array of base types.
> I am using latest llvm trunk and bpf-next. I suspect llvm12 or
> linus tree >= 5.12 rc2 should be able to exhibit the issue as well.
> Both thin-lto and lto have the same issues.

Works, now we're again at:

[acme@five pahole]$ time btfdiff vmlinux
real	0m7.679s
user	0m7.337s
sys	0m0.303s
[acme@five pahole]$ time btfdiff vmlinux.clang.thin.LTO
--- /tmp/btfdiff.dwarf.Ls059V	2021-03-29 14:36:02.675859035 -0300
+++ /tmp/btfdiff.btf.rxRd6R	2021-03-29 14:36:02.935864663 -0300
@@ -67255,7 +67255,7 @@ struct cpu_rmap {
 	struct {
 		u16                index;                /*    16     2 */
 		u16                dist;                 /*    18     2 */
-	} near[0]; /*    16     0 */
+	} near[]; /*    16     0 */

 	/* size: 16, cachelines: 1, members: 5 */
 	/* last cacheline: 16 bytes */
@@ -101181,7 +101181,7 @@ struct linux_efi_memreserve {
 	struct {
 		phys_addr_t        base;                 /*    16     8 */
 		phys_addr_t        size;                 /*    24     8 */
-	} entry[0]; /*    16     0 */
+	} entry[]; /*    16     0 */

 	/* size: 16, cachelines: 1, members: 4 */
 	/* last cacheline: 16 bytes */
@@ -113516,7 +113516,7 @@ struct netlink_policy_dump_state {
 	struct {
 		const struct nla_policy  * policy;       /*    16     8 */
 		unsigned int       maxtype;              /*    24     4 */
-	} policies[0]; /*    16     0 */
+	} policies[]; /*    16     0 */

 	/* size: 16, cachelines: 1, members: 4 */
 	/* sum members: 12, holes: 1, sum holes: 4 */

real	0m20.402s
user	0m19.163s
sys	0m1.096s
[acme@five pahole]$

And:

[acme@five pahole]$ ulimit -c 10000000
[acme@five pahole]$
[acme@five pahole]$ file tcp_bbr.o
tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
[acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
    <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
[acme@five pahole]$ fullcircle tcp_bbr.o
/home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
/tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
 1435 |  /* si
      |  ^
/tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
 1433 |  u32 *                      saved_syn;            /*  2184     8 */
      |  ^~~
codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
/home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
[acme@five pahole]$

Both seem unrelated to what you've done here, I'm investigating it now.

- Arnaldo
Nick Desaulniers March 29, 2021, 11:14 p.m. UTC | #2
(replying manually to https://lore.kernel.org/dwarves/20210328201400.1426437-1-yhs@fb.com/)

I didn't validate or try to use the produced data, but with this and the
kernel patch
https://lore.kernel.org/bpf/20210328064121.2062927-1-yhs@fb.com/

I was able to build an x86_64 defconfig + CONFIG_LTO_CLANG_THIN +
CONFIG_DEBUG_INFO_BTF without further errors.  Thank you for the series! FWIW:

Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Arnaldo Carvalho de Melo March 30, 2021, 3:10 p.m. UTC | #3
On Mon, Mar 29, 2021 at 02:40:05PM -0300, Arnaldo Carvalho de Melo wrote:
> [acme@five pahole]$ ulimit -c 10000000
> [acme@five pahole]$
> [acme@five pahole]$ file tcp_bbr.o
> tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
> [acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
>     <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
> [acme@five pahole]$ fullcircle tcp_bbr.o
> /home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
> /tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
>  1435 |  /* si
>       |  ^
> /tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
>  1433 |  u32 *                      saved_syn;            /*  2184     8 */
>       |  ^~~
> codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
> /home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
> [acme@five pahole]$
> 
> Both seem unrelated to what you've done here, I'm investigating it now.

The fullcircle one, which crashes in the 'codiff' utility, is related to
the patch that makes dwarf_cu allocate space for the hash tables. You
introduced a destructor for the dwarf_cu hashtables, but the dwarf_cu
that was assigned to cu->priv was a local variable. That wasn't much of
a problem before because we were not freeing it, as it went away at each
loop iteration. The following patch, on top of the first patch in the
series, seems to cure it. I'm folding it into your patch + a committer note.

- Arnaldo

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 5a1e860da079e04c..3e7875d4ab577f1b 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -150,6 +150,18 @@ static int dwarf_cu__init(struct dwarf_cu *dcu)
 	return 0;
 }
 
+static struct dwarf_cu *dwarf_cu__new(void)
+{
+	struct dwarf_cu *dwarf_cu = zalloc(sizeof(*dwarf_cu));
+
+	if (dwarf_cu != NULL && dwarf_cu__init(dwarf_cu) != 0) {
+		free(dwarf_cu);
+		dwarf_cu = NULL;
+	}
+
+	return dwarf_cu;
+}
+
 static void dwarf_cu__delete(struct cu *cu)
 {
 	struct dwarf_cu *dcu = cu->priv;
@@ -2542,21 +2554,20 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 		}
 		cu->little_endian = ehdr.e_ident[EI_DATA] == ELFDATA2LSB;
 
-		struct dwarf_cu dcu;
+		struct dwarf_cu *dcu = dwarf_cu__new();
 
-		if (dwarf_cu__init(&dcu) != 0)
+		if (dcu == NULL)
 			return DWARF_CB_ABORT;
 
-		dcu.cu = cu;
-		dcu.type_unit = type_cu ? &type_dcu : NULL;
-		cu->priv = &dcu;
+		dcu->cu = cu;
+		dcu->type_unit = type_cu ? &type_dcu : NULL;
+		cu->priv = dcu;
 		cu->dfops = &dwarf__ops;
 
 		if (die__process_and_recode(cu_die, cu) != 0)
 			return DWARF_CB_ABORT;
 
-		if (finalize_cu_immediately(cus, cu, &dcu, conf)
-		    == LSK__STOP_LOADING)
+		if (finalize_cu_immediately(cus, cu, dcu, conf) == LSK__STOP_LOADING)
 			return DWARF_CB_ABORT;
 
 		off = noff;
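
The lifetime issue this patch addresses can be sketched in isolation as
follows. The names here (struct priv, struct obj, priv__new,
obj__delete_priv) are hypothetical stand-ins for struct dwarf_cu,
struct cu, dwarf_cu__new() and the destructor. That the destructor
frees the private struct itself is an assumption of this sketch, since
the body of dwarf_cu__delete() is not shown in the diff above.

```c
#include <stdlib.h>

/* Stand-in for struct dwarf_cu: owns a heap-allocated table. */
struct priv {
	int *table;
};

/* Stand-in for struct cu: keeps a pointer that outlives the load loop. */
struct obj {
	struct priv *priv;
};

/* Heap-allocate so the pointer stays valid after the caller returns. */
static struct priv *priv__new(void)
{
	struct priv *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return NULL;

	p->table = calloc(16, sizeof(*p->table));
	if (p->table == NULL) {
		free(p);
		return NULL;
	}
	return p;
}

/*
 * The destructor runs long after the loading loop has finished. If
 * o->priv pointed at a loop-local variable, this would dereference a
 * dangling pointer.
 */
static void obj__delete_priv(struct obj *o)
{
	if (o->priv == NULL)
		return;
	free(o->priv->table);
	free(o->priv);
	o->priv = NULL;
}
```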
Arnaldo Carvalho de Melo March 30, 2021, 6:08 p.m. UTC | #4
On Tue, Mar 30, 2021 at 12:10:10PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Mar 29, 2021 at 02:40:05PM -0300, Arnaldo Carvalho de Melo wrote:
> > [acme@five pahole]$ ulimit -c 10000000
> > [acme@five pahole]$
> > [acme@five pahole]$ file tcp_bbr.o
> > tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
> > [acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
> >     <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
> > [acme@five pahole]$ fullcircle tcp_bbr.o
> > /home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
> > /tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
> >  1435 |  /* si
> >       |  ^
> > /tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
> >  1433 |  u32 *                      saved_syn;            /*  2184     8 */
> >       |  ^~~
> > codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
> > /home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
> > [acme@five pahole]$
> > 
> > Both seem unrelated to what you've done here, I'm investigating it now.
> 
> The fullcircle one, that crashes at the 'codiff' utility is related to
> the patch that makes dwarf_cu to allocate space for the hash tables, as
> you introduced a destructor for the dwarf_cu hashtables and the dwarf_cu
> that was assigned to cu->priv was a local variable, which wasn't much of
> a problem because we were not freeing it, as it went away at each loop
> iteration, the following patch to that first patch in the series seems
> to cure it, I'm folding it into your patch + a committer note.

[acme@five pahole]$ codiff tcp_bbr.o /tmp/fullcircle.ceBLyj.o
/home/acme/git/linux/net/ipv4/tcp_bbr.c:
  bbr_unregister                    |   -6
  __compiletime_assert_691          |   +0
  bbr_register                      |  -11
  bbr_ssthresh                      |  -76
  bbr_undo_cwnd                     | -101
  bbr_sndbuf_expand                 |  -11
  bbr_init                          | -385
  bbr_main                          | -2640
  bbr_lt_bw_sampling                | -803
  bbr_packets_in_net_at_edt         | -212
  bbr_inflight                      | -172
  __compiletime_assert_655          |   +0
  bbr_set_pacing_rate               | -182
  kcsan_check_access                |   +6
  kasan_check_write                 |  +14
  tcp_unregister_congestion_control |   +0
  tcp_register_congestion_control   |   +0
  minmax_running_max                |   +0
  prandom_u32                       |   +0
  __warn_printk                     |   +0
  __stack_chk_fail                  |   +0
 21 functions changed, 20 bytes added, 4599 bytes removed, diff: -4579
[acme@five pahole]$
[acme@five pahole]$
[acme@five pahole]$ fullcircle tcp_bbr.o
[acme@five pahole]$

This one is dealt with, doing some more tests and looking at that
array[] versus array[0].

- Arnaldo
Arnaldo Carvalho de Melo March 30, 2021, 6:24 p.m. UTC | #5
On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
> [acme@five pahole]$
> [acme@five pahole]$
> [acme@five pahole]$ fullcircle tcp_bbr.o
> [acme@five pahole]$
> 
> This one is dealt with, doing some more tests and looking at that
> array[] versus array[0].

I've pushed what I have to the main repos at kernel.org and github,
please check, I'll continue from there.

- Arnaldo
Yonghong Song March 31, 2021, 12:29 a.m. UTC | #6
On 3/30/21 8:10 AM, Arnaldo Carvalho de Melo wrote:
> On Mon, Mar 29, 2021 at 02:40:05PM -0300, Arnaldo Carvalho de Melo wrote:
>> [acme@five pahole]$ ulimit -c 10000000
>> [acme@five pahole]$
>> [acme@five pahole]$ file tcp_bbr.o
>> tcp_bbr.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), with debug_info, not stripped
>> [acme@five pahole]$ readelf -wi tcp_bbr.o | grep DW_AT_producer
>>      <d>   DW_AT_producer    : (indirect string, offset: 0x4a97): GNU C89 10.2.1 20200723 (Red Hat 10.2.1-1) -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -mindirect-branch=thunk-extern -mindirect-branch-register -mrecord-mcount -mfentry -march=x86-64 -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -falign-jumps=1 -falign-loops=1 -fno-asynchronous-unwind-tables -fno-jump-tables -fno-delete-null-pointer-checks -fno-allow-store-data-races -fstack-protector-strong -fno-var-tracking-assignments -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fcf-protection=none
>> [acme@five pahole]$ fullcircle tcp_bbr.o
>> /home/acme/bin/fullcircle: line 38: 3969006 Segmentation fault      (core dumped) ${pfunct_bin} --compile $file > $c_output
>> /tmp/fullcircle.4XujnI.c:1435:2: error: unterminated comment
>>   1435 |  /* si
>>        |  ^
>> /tmp/fullcircle.4XujnI.c:1433:2: error: expected specifier-qualifier-list at end of input
>>   1433 |  u32 *                      saved_syn;            /*  2184     8 */
>>        |  ^~~
>> codiff: couldn't load debugging info from /tmp/fullcircle.ZOVXGv.o
>> /home/acme/bin/fullcircle: line 40: 3969019 Segmentation fault      (core dumped) ${codiff_bin} -q -s $file $o_output
>> [acme@five pahole]$
>>
>> Both seem unrelated to what you've done here, I'm investigating it now.
> 
> The fullcircle one, that crashes at the 'codiff' utility is related to
> the patch that makes dwarf_cu to allocate space for the hash tables, as
> you introduced a destructor for the dwarf_cu hashtables and the dwarf_cu
> that was assigned to cu->priv was a local variable, which wasn't much of
> a problem because we were not freeing it, as it went away at each loop
> iteration, the following patch to that first patch in the series seems
> to cure it, I'm folding it into your patch + a committer note.

Thanks for the fix!

> 
> - Arnaldo
> 
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index 5a1e860da079e04c..3e7875d4ab577f1b 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -150,6 +150,18 @@ static int dwarf_cu__init(struct dwarf_cu *dcu)
>   	return 0;
>   }
>   
[...]
Yonghong Song March 31, 2021, 3:20 a.m. UTC | #7
On 3/30/21 11:24 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
>> [acme@five pahole]$
>> [acme@five pahole]$
>> [acme@five pahole]$ fullcircle tcp_bbr.o
>> [acme@five pahole]$
>>
>> This one is dealt with, doing some more tests and looking at that
>> array[] versus array[0].
> 
> I've pushed what I have to the main repos at kernel.org and github,
> please check, I'll continue from there.

Looks good. Thanks!

I will try to experiment with an alternative way ([1]) to check whether
cross-cu references happen or not. But at least the flag-checking
approach can be adapted to gcc (if we still want that after comparing
the alternative), since gcc always has flags in dwarf.

[1] https://lore.kernel.org/bpf/d34a3d62-bae8-3a30-26b6-4e5e8efcd0af@fb.com/T/#m1b0b1206091c19a90b15d054aa26239101289f84

> 
> - Arnaldo
>
Arnaldo Carvalho de Melo March 31, 2021, 1:54 p.m. UTC | #8
On Tue, Mar 30, 2021 at 08:20:20PM -0700, Yonghong Song wrote:
> On 3/30/21 11:24 AM, Arnaldo Carvalho de Melo wrote:
> > On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
> > > [acme@five pahole]$ fullcircle tcp_bbr.o
> > > [acme@five pahole]$

> > > This one is dealt with, doing some more tests and looking at that
> > > array[] versus array[0].

> > I've pushed what I have to the main repos at kernel.org and github,
> > please check, I'll continue from there.

> Looks good. Thanks!

> I will try to experiment with an alternative way ([1]) to check whether
> cross-cu reference happens or not. But at least checking flags
> approach can be adapted to gcc (if we want after comparing the alternative)
> since gcc always has flags in dwarf.
 
> [1] https://lore.kernel.org/bpf/d34a3d62-bae8-3a30-26b6-4e5e8efcd0af@fb.com/T/#m1b0b1206091c19a90b15d054aa26239101289f84

I thought about some other method, like adding an ELF note to vmlinux
stating that this was built with LTO, that would be the fastest way, I
think. If that note wasn't there, then we would fall back to looking at
inter-CU references, that way we would have the best of both worlds and
wouldn't incur the per-CU DW_AT_producer overhead of the flags for
each object file.
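
A rough sketch of what checking for such a note could involve, under
several assumptions: no such note exists today, so the owner name and
type a caller would pass are hypothetical; the raw contents of the
note section are assumed to be already in memory, in host byte order,
with 4-byte alignment; and notes__has and struct note_hdr are
illustrative names, not pahole or libelf API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the ELF note header: three 32-bit words. */
struct note_hdr {
	uint32_t namesz;	/* owner name length, including the NUL */
	uint32_t descsz;	/* descriptor length */
	uint32_t type;
};

/* Name and descriptor are each padded to a 4-byte boundary. */
#define NOTE_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Scan a raw note section for a note with the given owner and type. */
static bool notes__has(const unsigned char *buf, size_t len,
		       const char *name, uint32_t type)
{
	size_t off = 0;

	while (len - off >= sizeof(struct note_hdr)) {
		struct note_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		size_t name_len = NOTE_ALIGN(hdr.namesz);
		size_t desc_len = NOTE_ALIGN(hdr.descsz);

		/* Stop on a truncated or corrupt note. */
		if (name_len > len - off || desc_len > len - off - name_len)
			return false;

		if (hdr.type == type && hdr.namesz == strlen(name) + 1 &&
		    memcmp(buf + off, name, hdr.namesz) == 0)
			return true;

		off += name_len + desc_len;
	}

	return false;
}
```

One lookup like this per binary would replace a string search per CU,
which is the overhead difference described above.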

- Arnaldo
Yonghong Song March 31, 2021, 3:08 p.m. UTC | #9
On 3/31/21 6:54 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 30, 2021 at 08:20:20PM -0700, Yonghong Song wrote:
>> On 3/30/21 11:24 AM, Arnaldo Carvalho de Melo wrote:
>>> On Tue, Mar 30, 2021 at 03:08:06PM -0300, Arnaldo Carvalho de Melo wrote:
>>>> [acme@five pahole]$ fullcircle tcp_bbr.o
>>>> [acme@five pahole]$
> 
>>>> This one is dealt with, doing some more tests and looking at that
>>>> array[] versus array[0].
> 
>>> I've pushed what I have to the main repos at kernel.org and github,
>>> please check, I'll continue from there.
> 
>> Looks good. Thanks!
> 
>> I will try to experiment with an alternative way ([1]) to check whether
>> cross-cu reference happens or not. But at least checking flags
>> approach can be adapted to gcc (if we want after comparing the alternative)
>> since gcc always has flags in dwarf.
>   
>> [1] https://lore.kernel.org/bpf/d34a3d62-bae8-3a30-26b6-4e5e8efcd0af@fb.com/T/#m1b0b1206091c19a90b15d054aa26239101289f84
> 
> I thought about some other method, like adding an ELF note to vmlinux
> stating that this was built with LTO, that would be the fastest way, I

Adding to the ELF .notes is a great idea. Let me explore it. Thanks!

> think. If that note wasn't there, then we would fall back to looking at
> inter-CU references, that way we would have the best of both worlds and
> wouldn't incur the per-CU DW_AT_producer overhead of the flags for
> each object file.

Totally agree.

> 
> - Arnaldo
>