Message ID | 20221025222802.2295103-2-eddyz87@gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | Use uapi kernel headers with vmlinux.h | expand |
On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > Deduplicate forward declarations that don't take part in type graphs > comparisons if declaration name is unambiguous. Example: > > CU #1: > > struct foo; // standalone forward declaration > struct foo *some_global; > > CU #2: > > struct foo { int x; }; > struct foo *another_global; > > The `struct foo` from CU #1 is not a part of any definition that is > compared against another definition while `btf_dedup_struct_types` > processes structural types. The the BTF after `btf_dedup_struct_types` > the BTF looks as follows: > > [1] STRUCT 'foo' size=4 vlen=1 ... > [2] INT 'int' size=4 ... > [3] PTR '(anon)' type_id=1 > [4] FWD 'foo' fwd_kind=struct > [5] PTR '(anon)' type_id=4 > > This commit adds a new pass `btf_dedup_standalone_fwds`, that maps > such forward declarations to structs or unions with identical name in > case if the name is not ambiguous. > > The pass is positioned before `btf_dedup_ref_types` so that types > [3] and [5] could be merged as a same type after [1] and [4] are merged. > The final result for the example above looks as follows: > > [1] STRUCT 'foo' size=4 vlen=1 > 'x' type_id=2 bits_offset=0 > [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED > [3] PTR '(anon)' type_id=1 > > For defconfig kernel with BTF enabled this removes 63 forward > declarations. Examples of removed declarations: `pt_regs`, `in6_addr`. > The running time of `btf__dedup` function is increased by about 3%. > What about modules, can you share stats for module BTFs? Also cc Alan as he was looking at BTF dedup improvements for kernel module BTF dedup. > Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> > --- > tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 174 insertions(+), 4 deletions(-) > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c > index d88647da2c7f..c34c68d8e8a0 100644 > --- a/tools/lib/bpf/btf.c > +++ b/tools/lib/bpf/btf.c > @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d); > static int btf_dedup_prim_types(struct btf_dedup *d); > static int btf_dedup_struct_types(struct btf_dedup *d); > static int btf_dedup_ref_types(struct btf_dedup *d); > +static int btf_dedup_standalone_fwds(struct btf_dedup *d); > static int btf_dedup_compact_types(struct btf_dedup *d); > static int btf_dedup_remap_types(struct btf_dedup *d); > > @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d); > * Algorithm summary > * ================= > * > - * Algorithm completes its work in 6 separate passes: > + * Algorithm completes its work in 7 separate passes: > * > * 1. Strings deduplication. > * 2. Primitive types deduplication (int, enum, fwd). > * 3. Struct/union types deduplication. > - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func > + * 4. Standalone fwd declarations deduplication. Let's call this "Resolve unambiguous forward declarations", we don't really deduplicate anything. And call the function btf_dedup_resolve_fwds()? > + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func > * protos, and const/volatile/restrict modifiers). > - * 5. Types compaction. > - * 6. Types remapping. > + * 6. Types compaction. > + * 7. Types remapping. > * > * Algorithm determines canonical type descriptor, which is a single > * representative type for each truly unique type. This canonical type is the > @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts) > pr_debug("btf_dedup_struct_types failed:%d\n", err); > goto done; > } > + err = btf_dedup_standalone_fwds(d); > + if (err < 0) { > + pr_debug("btf_dedup_standalone_fwd failed:%d\n", err); > + goto done; > + } > err = btf_dedup_ref_types(d); > if (err < 0) { > pr_debug("btf_dedup_ref_types failed:%d\n", err); > @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d) > return 0; > } > > +/* > + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32). > + * > + * The __u32 key/value representations are cast to `void *` before passing > + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced. > + * > + */ > +static struct hashmap *name_off_map__new(void) > +{ > + return hashmap__new(btf_dedup_identity_hash_fn, > + btf_dedup_equal_fn, > + NULL); > +} is there a point in name_off_map__new and name_off_map__find wrappers except to add one extra function to jump through when reading the code? If you look at other uses of hashmaps in this file, we use the directly. Let's drop those. > + > +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id) > +{ > + /* This has to be sizeof(void *) in order to be passed to hashmap__find */ > + void *tmp; > + int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp); but this (void *) casting everything was an error in API design, mea culpa. I've been wanting to switch hashmap to use long as key/value type for a long while, maybe let's do it now, as we are adding even more code that looks weird? It seems like accepting long will make hashmap API usage cleaner in most cases. There are not a lot of places where we use hashmap APIs in libbpf, but we'll also need to fix up bpftool usage, and I believe perf copy/pasted hashmap.h (cc Arnaldo), so we'd need to make sure to not break all that. But good thing it's all in the same repo and we can convert them at the same time with no breakage. WDYT? > + /* > + * __u64 cast is necessary to avoid pointer to integer conversion size warning. > + * It is fine to get rid of this warning as `void *` is used as an integer value. > + */ > + if (found) > + *type_id = (__u64)tmp; > + return found; > +} > + > +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id) > +{ > + return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id, > + NULL, NULL); > +} this function will also be completely unnecessary with longs > + > +/* > + * Collect a `name_off_map` that maps type names to type ids for all > + * canonical structs and unions. If the same name is shared by several > + * canonical types use a special value 0 to indicate this fact. > + */ > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map) > +{ > + int i, err = 0; > + __u32 type_id, collision_id; > + __u16 kind; > + struct btf_type *t; > + > + for (i = 0; i < d->btf->nr_types; i++) { > + type_id = d->btf->start_id + i; > + t = btf_type_by_id(d->btf, type_id); > + kind = btf_kind(t); > + > + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) > + continue; let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0 > + > + /* Skip non-canonical types */ > + if (type_id != d->map[type_id]) > + continue; > + > + err = 0; > + if (name_off_map__find(names_map, t->name_off, &collision_id)) { > + /* Mark non-unique names with 0 */ > + if (collision_id != 0 && collision_id != type_id) > + err = name_off_map__set(names_map, t->name_off, 0); > + } else { > + err = name_off_map__set(names_map, t->name_off, type_id); > + } err = hashmap__add(..., t->name_off, type_id); if (err == -EEXISTS) { hashmap__set(..., 0); return 0; } see comment for hashmap_insert_strategy in hashmap.h > + > + if (err < 0) > + return err; > + } > + > + return 0; > +} > + > +static int btf_dedup_standalone_fwd(struct btf_dedup *d, > + struct hashmap *names_map, > + __u32 type_id) > +{ > + struct btf_type *t = btf_type_by_id(d->btf, type_id); > + __u16 kind = btf_kind(t); > + enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info); > + nit: don't break variables block in two parts, there shouldn't be empty lines also please use btf_kflag(t) > + struct btf_type *cand_t; > + __u16 cand_kind; > + __u32 cand_id = 0; > + > + if (kind != BTF_KIND_FWD) > + return 0; > + > + /* Skip if this FWD already has a mapping */ > + if (type_id != d->map[type_id]) > + return 0; > + [...]
On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote: > On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > Deduplicate forward declarations that don't take part in type graphs > > comparisons if declaration name is unambiguous. Example: > > > > CU #1: > > > > struct foo; // standalone forward declaration > > struct foo *some_global; > > > > CU #2: > > > > struct foo { int x; }; > > struct foo *another_global; > > > > The `struct foo` from CU #1 is not a part of any definition that is > > compared against another definition while `btf_dedup_struct_types` > > processes structural types. The the BTF after `btf_dedup_struct_types` > > the BTF looks as follows: > > > > [1] STRUCT 'foo' size=4 vlen=1 ... > > [2] INT 'int' size=4 ... > > [3] PTR '(anon)' type_id=1 > > [4] FWD 'foo' fwd_kind=struct > > [5] PTR '(anon)' type_id=4 > > > > This commit adds a new pass `btf_dedup_standalone_fwds`, that maps > > such forward declarations to structs or unions with identical name in > > case if the name is not ambiguous. > > > > The pass is positioned before `btf_dedup_ref_types` so that types > > [3] and [5] could be merged as a same type after [1] and [4] are merged. > > The final result for the example above looks as follows: > > > > [1] STRUCT 'foo' size=4 vlen=1 > > 'x' type_id=2 bits_offset=0 > > [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED > > [3] PTR '(anon)' type_id=1 > > > > For defconfig kernel with BTF enabled this removes 63 forward > > declarations. Examples of removed declarations: `pt_regs`, `in6_addr`. > > The running time of `btf__dedup` function is increased by about 3%. > > > > What about modules, can you share stats for module BTFs? > > Also cc Alan as he was looking at BTF dedup improvements for kernel > module BTF dedup. > > > Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> > > --- > > tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 174 insertions(+), 4 deletions(-) > > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c > > index d88647da2c7f..c34c68d8e8a0 100644 > > --- a/tools/lib/bpf/btf.c > > +++ b/tools/lib/bpf/btf.c > > @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d); > > static int btf_dedup_prim_types(struct btf_dedup *d); > > static int btf_dedup_struct_types(struct btf_dedup *d); > > static int btf_dedup_ref_types(struct btf_dedup *d); > > +static int btf_dedup_standalone_fwds(struct btf_dedup *d); > > static int btf_dedup_compact_types(struct btf_dedup *d); > > static int btf_dedup_remap_types(struct btf_dedup *d); > > > > @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d); > > * Algorithm summary > > * ================= > > * > > - * Algorithm completes its work in 6 separate passes: > > + * Algorithm completes its work in 7 separate passes: > > * > > * 1. Strings deduplication. > > * 2. Primitive types deduplication (int, enum, fwd). > > * 3. Struct/union types deduplication. > > - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func > > + * 4. Standalone fwd declarations deduplication. > > Let's call this "Resolve unambiguous forward declarations", we don't > really deduplicate anything. And call the function > btf_dedup_resolve_fwds()? > > > + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func > > * protos, and const/volatile/restrict modifiers). > > - * 5. Types compaction. > > - * 6. Types remapping. > > + * 6. Types compaction. > > + * 7. Types remapping. > > * > > * Algorithm determines canonical type descriptor, which is a single > > * representative type for each truly unique type. This canonical type is the > > @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts) > > pr_debug("btf_dedup_struct_types failed:%d\n", err); > > goto done; > > } > > + err = btf_dedup_standalone_fwds(d); > > + if (err < 0) { > > + pr_debug("btf_dedup_standalone_fwd failed:%d\n", err); > > + goto done; > > + } > > err = btf_dedup_ref_types(d); > > if (err < 0) { > > pr_debug("btf_dedup_ref_types failed:%d\n", err); > > @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d) > > return 0; > > } > > > > +/* > > + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32). > > + * > > + * The __u32 key/value representations are cast to `void *` before passing > > + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced. > > + * > > + */ > > +static struct hashmap *name_off_map__new(void) > > +{ > > + return hashmap__new(btf_dedup_identity_hash_fn, > > + btf_dedup_equal_fn, > > + NULL); > > +} > > is there a point in name_off_map__new and name_off_map__find wrappers > except to add one extra function to jump through when reading the > code? If you look at other uses of hashmaps in this file, we use the > directly. Let's drop those. > > > + > > +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id) > > +{ > > + /* This has to be sizeof(void *) in order to be passed to hashmap__find */ > > + void *tmp; > > + int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp); > > but this (void *) casting everything was an error in API design, mea > culpa. I've been wanting to switch hashmap to use long as key/value > type for a long while, maybe let's do it now, as we are adding even > more code that looks weird? It seems like accepting long will make > hashmap API usage cleaner in most cases. There are not a lot of places > where we use hashmap APIs in libbpf, but we'll also need to fix up > bpftool usage, and I believe perf copy/pasted hashmap.h (cc Arnaldo), > so we'd need to make sure to not break all that. But good thing it's > all in the same repo and we can convert them at the same time with no > breakage. > > WDYT? Well, I did the change, excluding tests it amounts to: - 15 files changed, 114 insertions(+), 137 deletions(-); - 45 casts removed; - 30 casts added. TBH, it seems like I should just use "u32_as_hash_field" and be done with it. In any case I'll post this as a part of v1 series for "libbpf: Resolve unambigous forward declarations". To account for a case when map has to store pointers and pointers are 32 bit I chose to update the map interface to be "uintptr_t -> uintptr_t". Had it been "u64 -> u64" the additional temporary variable would be necessary for "old" values, e.g. while working with hashmap__insert. (Contrary to what we discussed on Friday). > > > + /* > > + * __u64 cast is necessary to avoid pointer to integer conversion size warning. > > + * It is fine to get rid of this warning as `void *` is used as an integer value. > > + */ > > + if (found) > > + *type_id = (__u64)tmp; > > + return found; > > +} > > + > > +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id) > > +{ > > + return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id, > > + NULL, NULL); > > +} > > this function will also be completely unnecessary with longs > > > + > > +/* > > + * Collect a `name_off_map` that maps type names to type ids for all > > + * canonical structs and unions. If the same name is shared by several > > + * canonical types use a special value 0 to indicate this fact. > > + */ > > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map) > > +{ > > + int i, err = 0; > > + __u32 type_id, collision_id; > > + __u16 kind; > > + struct btf_type *t; > > + > > + for (i = 0; i < d->btf->nr_types; i++) { > > + type_id = d->btf->start_id + i; > > + t = btf_type_by_id(d->btf, type_id); > > + kind = btf_kind(t); > > + > > + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) > > + continue; > > let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0 > > > + > > + /* Skip non-canonical types */ > > + if (type_id != d->map[type_id]) > > + continue; > > + > > + err = 0; > > + if (name_off_map__find(names_map, t->name_off, &collision_id)) { > > + /* Mark non-unique names with 0 */ > > + if (collision_id != 0 && collision_id != type_id) > > + err = name_off_map__set(names_map, t->name_off, 0); > > + } else { > > + err = name_off_map__set(names_map, t->name_off, type_id); > > + } > > err = hashmap__add(..., t->name_off, type_id); > if (err == -EEXISTS) { > hashmap__set(..., 0); > return 0; > } > > see comment for hashmap_insert_strategy in hashmap.h > > > + > > + if (err < 0) > > + return err; > > + } > > + > > + return 0; > > +} > > + > > +static int btf_dedup_standalone_fwd(struct btf_dedup *d, > > + struct hashmap *names_map, > > + __u32 type_id) > > +{ > > + struct btf_type *t = btf_type_by_id(d->btf, type_id); > > + __u16 kind = btf_kind(t); > > + enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info); > > + > > nit: don't break variables block in two parts, there shouldn't be empty lines > > also please use btf_kflag(t) > > > > + struct btf_type *cand_t; > > + __u16 cand_kind; > > + __u32 cand_id = 0; > > + > > + if (kind != BTF_KIND_FWD) > > + return 0; > > + > > + /* Skip if this FWD already has a mapping */ > > + if (type_id != d->map[type_id]) > > + return 0; > > + > > [...]
On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote: > On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote: [...] > > + > > +/* > > + * Collect a `name_off_map` that maps type names to type ids for all > > + * canonical structs and unions. If the same name is shared by several > > + * canonical types use a special value 0 to indicate this fact. > > + */ > > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map) > > +{ > > + int i, err = 0; > > + __u32 type_id, collision_id; > > + __u16 kind; > > + struct btf_type *t; > > + > > + for (i = 0; i < d->btf->nr_types; i++) { > > + type_id = d->btf->start_id + i; > > + t = btf_type_by_id(d->btf, type_id); > > + kind = btf_kind(t); > > + > > + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) > > + continue; > > let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0 Interestingly this is necessary only for mixed enum / enum64 case. Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type: case BTF_KIND_ENUM: h = btf_hash_enum(t); for_each_dedup_cand(d, hash_entry, h) { cand_id = (__u32)(long)hash_entry->value; cand = btf_type_by_id(d->btf, cand_id); if (btf_equal_enum(t, cand)) { new_id = cand_id; break; } if (btf_compat_enum(t, cand)) { if (btf_is_enum_fwd(t)) { /* resolve fwd to full enum */ new_id = cand_id; break; } /* resolve canonical enum fwd to full enum */ d->map[cand_id] = type_id; } } break; // ... similar logic for ENUM64 ... - btf_hash_enum ignores vlen when hashing; - btf_compat_enum compares only names and sizes. So, if forward and main declaration kinds match (either BTF_KIND_ENUM or BTF_KIND_ENUM64) the forward declaration would be removed. But if the kinds are different the forward declaration would remain. E.g.: CU #1: enum foo; enum foo *a; CU #2: enum foo { x = 0xfffffffff }; enum foo *b; BTF: [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1 'x' val=68719476735ULL [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none) [3] PTR '(anon)' type_id=1 [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0 [5] PTR '(anon)' type_id=4 BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum forward declarations are not. So it would be incorrect to add enum forward declaration unification logic to btf_dedup_resolve_fwds, because the following case would not be covered: CU #1: enum foo; struct s { enum foo *a; } *a; CU #2: enum foo { x = 0xfffffffff }; struct s { enum foo *a; } *b; Currently STRUCTs 's' are not de-duplicated. I think that btf_dedup_prim_type should be adjusted to handle this case. [...]
On 31/10/2022 15:49, Eduard Zingerman wrote: > On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote: >> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > [...] >>> + >>> +/* >>> + * Collect a `name_off_map` that maps type names to type ids for all >>> + * canonical structs and unions. If the same name is shared by several >>> + * canonical types use a special value 0 to indicate this fact. >>> + */ >>> +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map) >>> +{ >>> + int i, err = 0; >>> + __u32 type_id, collision_id; >>> + __u16 kind; >>> + struct btf_type *t; >>> + >>> + for (i = 0; i < d->btf->nr_types; i++) { >>> + type_id = d->btf->start_id + i; >>> + t = btf_type_by_id(d->btf, type_id); >>> + kind = btf_kind(t); >>> + >>> + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) >>> + continue; >> >> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0 > > Interestingly this is necessary only for mixed enum / enum64 case. > Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type: > Ah, great catch! A forward can look like an enum to one CU but another CU can specify values that make it an enum64. > case BTF_KIND_ENUM: > h = btf_hash_enum(t); > for_each_dedup_cand(d, hash_entry, h) { > cand_id = (__u32)(long)hash_entry->value; > cand = btf_type_by_id(d->btf, cand_id); > if (btf_equal_enum(t, cand)) { > new_id = cand_id; > break; > } > if (btf_compat_enum(t, cand)) { > if (btf_is_enum_fwd(t)) { > /* resolve fwd to full enum */ > new_id = cand_id; > break; > } > /* resolve canonical enum fwd to full enum */ > d->map[cand_id] = type_id; > } > } > break; > // ... similar logic for ENUM64 ... > > - btf_hash_enum ignores vlen when hashing; > - btf_compat_enum compares only names and sizes. > > So, if forward and main declaration kinds match (either BTF_KIND_ENUM > or BTF_KIND_ENUM64) the forward declaration would be removed. But if > the kinds are different the forward declaration would remain. E.g.: > > CU #1: > enum foo; > enum foo *a; > > CU #2: > enum foo { x = 0xfffffffff }; > enum foo *b; > > BTF: > [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1 > 'x' val=68719476735ULL > [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none) > [3] PTR '(anon)' type_id=1 > [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0 > [5] PTR '(anon)' type_id=4 > > BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum > forward declarations are not. So it would be incorrect to add enum > forward declaration unification logic to btf_dedup_resolve_fwds, > because the following case would not be covered: > > CU #1: > enum foo; > struct s { enum foo *a; } *a; > > CU #2: > enum foo { x = 0xfffffffff }; > struct s { enum foo *a; } *b; > > Currently STRUCTs 's' are not de-duplicated. > What if CU#1 is in base BTF and CU#2 in split module BTF? I think we'd explicitly want to avoid deduping "struct s" then since we can't be sure that it is the same enum they are pointing at. That's the logic we employ for structs at least, based upon the rationale that we can't feed back knowledge of types from module to kernel BTF since the latter is now fixed (Andrii, do correct me if I have this wrong). In such a case the enum is no longer standalone; it serves the purpose of allowing us to define a pointer to a module-specific type. We recently found some examples of this sort of thing with structs, where the struct was defined in module BTF, making dedup fail for some core kernel data types, but the problem was restricted to modules which _did_ define the type so wasn't a major driver of dedup failures. Not sure if there's many (any?) enum cases of this in practice. I suppose if we could guarantee the dedup happened within the same object (kernel or module) we could relax this constraint though? Alan
On Tue, 2022-11-01 at 17:08 +0000, Alan Maguire wrote: > On 31/10/2022 15:49, Eduard Zingerman wrote: > > On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote: > > > On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > [...] > > > > + > > > > +/* > > > > + * Collect a `name_off_map` that maps type names to type ids for all > > > > + * canonical structs and unions. If the same name is shared by several > > > > + * canonical types use a special value 0 to indicate this fact. > > > > + */ > > > > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map) > > > > +{ > > > > + int i, err = 0; > > > > + __u32 type_id, collision_id; > > > > + __u16 kind; > > > > + struct btf_type *t; > > > > + > > > > + for (i = 0; i < d->btf->nr_types; i++) { > > > > + type_id = d->btf->start_id + i; > > > > + t = btf_type_by_id(d->btf, type_id); > > > > + kind = btf_kind(t); > > > > + > > > > + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) > > > > + continue; > > > > > > let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0 > > > > Interestingly this is necessary only for mixed enum / enum64 case. > > Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type: > > > > Ah, great catch! A forward can look like an enum to one CU but another CU can > specify values that make it an enum64. > > > case BTF_KIND_ENUM: > > h = btf_hash_enum(t); > > for_each_dedup_cand(d, hash_entry, h) { > > cand_id = (__u32)(long)hash_entry->value; > > cand = btf_type_by_id(d->btf, cand_id); > > if (btf_equal_enum(t, cand)) { > > new_id = cand_id; > > break; > > } > > if (btf_compat_enum(t, cand)) { > > if (btf_is_enum_fwd(t)) { > > /* resolve fwd to full enum */ > > new_id = cand_id; > > break; > > } > > /* resolve canonical enum fwd to full enum */ > > d->map[cand_id] = type_id; > > } > > } > > break; > > // ... similar logic for ENUM64 ... > > > > - btf_hash_enum ignores vlen when hashing; > > - btf_compat_enum compares only names and sizes. > > > > So, if forward and main declaration kinds match (either BTF_KIND_ENUM > > or BTF_KIND_ENUM64) the forward declaration would be removed. But if > > the kinds are different the forward declaration would remain. E.g.: > > > > CU #1: > > enum foo; > > enum foo *a; > > > > CU #2: > > enum foo { x = 0xfffffffff }; > > enum foo *b; > > > > BTF: > > [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1 > > 'x' val=68719476735ULL > > [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none) > > [3] PTR '(anon)' type_id=1 > > [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0 > > [5] PTR '(anon)' type_id=4 > > > > BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum > > forward declarations are not. So it would be incorrect to add enum > > forward declaration unification logic to btf_dedup_resolve_fwds, > > because the following case would not be covered: > > > > CU #1: > > enum foo; > > struct s { enum foo *a; } *a; > > > > CU #2: > > enum foo { x = 0xfffffffff }; > > struct s { enum foo *a; } *b; > > > > Currently STRUCTs 's' are not de-duplicated. > > > > What if CU#1 is in base BTF and CU#2 in split module BTF? I think we'd explicitly > want to avoid deduping "struct s" then since we can't be sure that it is the > same enum they are pointing at. That's the logic we employ for structs at > least, based upon the rationale that we can't feed back knowledge of types > from module to kernel BTF since the latter is now fixed (Andrii, do correct me > if I have this wrong). In such a case the enum is no longer standalone; it > serves the purpose of allowing us to define a pointer to a module-specific > type. We recently found some examples of this sort of thing with structs, > where the struct was defined in module BTF, making dedup fail for some core > kernel data types, but the problem was restricted to modules which _did_ > define the type so wasn't a major driver of dedup failures. Not sure if > there's many (any?) enum cases of this in practice. Hi Alan, As far as I understand the loop in `btf_dedup_prim_types` guarantees that only ids from the split module would be remapped: struct btf { ... /* BTF type ID of the first type in this BTF instance: * - for base BTF it's equal to 1; * - for split BTF it's equal to biggest type ID of base BTF plus 1. */ int start_id; ... } ... for (i = 0; i < d->btf->nr_types; i++) { err = btf_dedup_prim_type(d, d->btf->start_id + i); if (err) return err; } Thus CU1:foo won't be updated to be CU2:foo and CU1:s will not be the same as CU2:s. Is that right or am I confused? Thanks, Eduard > > I suppose if we could guarantee the dedup happened within the same object > (kernel or module) we could relax this constraint though? > > Alan
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index d88647da2c7f..c34c68d8e8a0 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d); static int btf_dedup_prim_types(struct btf_dedup *d); static int btf_dedup_struct_types(struct btf_dedup *d); static int btf_dedup_ref_types(struct btf_dedup *d); +static int btf_dedup_standalone_fwds(struct btf_dedup *d); static int btf_dedup_compact_types(struct btf_dedup *d); static int btf_dedup_remap_types(struct btf_dedup *d); @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d); * Algorithm summary * ================= * - * Algorithm completes its work in 6 separate passes: + * Algorithm completes its work in 7 separate passes: * * 1. Strings deduplication. * 2. Primitive types deduplication (int, enum, fwd). * 3. Struct/union types deduplication. - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func + * 4. Standalone fwd declarations deduplication. + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func * protos, and const/volatile/restrict modifiers). - * 5. Types compaction. - * 6. Types remapping. + * 6. Types compaction. + * 7. Types remapping. * * Algorithm determines canonical type descriptor, which is a single * representative type for each truly unique type. This canonical type is the @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts) pr_debug("btf_dedup_struct_types failed:%d\n", err); goto done; } + err = btf_dedup_standalone_fwds(d); + if (err < 0) { + pr_debug("btf_dedup_standalone_fwd failed:%d\n", err); + goto done; + } err = btf_dedup_ref_types(d); if (err < 0) { pr_debug("btf_dedup_ref_types failed:%d\n", err); @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d) return 0; } +/* + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32). + * + * The __u32 key/value representations are cast to `void *` before passing + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced. + * + */ +static struct hashmap *name_off_map__new(void) +{ + return hashmap__new(btf_dedup_identity_hash_fn, + btf_dedup_equal_fn, + NULL); +} + +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id) +{ + /* This has to be sizeof(void *) in order to be passed to hashmap__find */ + void *tmp; + int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp); + /* + * __u64 cast is necessary to avoid pointer to integer conversion size warning. + * It is fine to get rid of this warning as `void *` is used as an integer value. + */ + if (found) + *type_id = (__u64)tmp; + return found; +} + +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id) +{ + return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id, + NULL, NULL); +} + +/* + * Collect a `name_off_map` that maps type names to type ids for all + * canonical structs and unions. If the same name is shared by several + * canonical types use a special value 0 to indicate this fact. + */ +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map) +{ + int i, err = 0; + __u32 type_id, collision_id; + __u16 kind; + struct btf_type *t; + + for (i = 0; i < d->btf->nr_types; i++) { + type_id = d->btf->start_id + i; + t = btf_type_by_id(d->btf, type_id); + kind = btf_kind(t); + + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) + continue; + + /* Skip non-canonical types */ + if (type_id != d->map[type_id]) + continue; + + err = 0; + if (name_off_map__find(names_map, t->name_off, &collision_id)) { + /* Mark non-unique names with 0 */ + if (collision_id != 0 && collision_id != type_id) + err = name_off_map__set(names_map, t->name_off, 0); + } else { + err = name_off_map__set(names_map, t->name_off, type_id); + } + + if (err < 0) + return err; + } + + return 0; +} + +static int btf_dedup_standalone_fwd(struct btf_dedup *d, + struct hashmap *names_map, + __u32 type_id) +{ + struct btf_type *t = btf_type_by_id(d->btf, type_id); + __u16 kind = btf_kind(t); + enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info); + + struct btf_type *cand_t; + __u16 cand_kind; + __u32 cand_id = 0; + + if (kind != BTF_KIND_FWD) + return 0; + + /* Skip if this FWD already has a mapping */ + if (type_id != d->map[type_id]) + return 0; + + name_off_map__find(names_map, t->name_off, &cand_id); + if (!cand_id) + return 0; + + cand_t = btf_type_by_id(d->btf, cand_id); + cand_kind = btf_kind(cand_t); + if (!(cand_kind == BTF_KIND_STRUCT && fwd_kind == BTF_FWD_STRUCT) && + !(cand_kind == BTF_KIND_UNION && fwd_kind == BTF_FWD_UNION)) + return 0; + + d->map[type_id] = cand_id; + + return 0; +} + +/* + * Standalone fwd declarations deduplication. + * + * The lion's share of all FWD declarations is resolved during + * `btf_dedup_struct_types` phase when different type graphs are + * compared against each other. However, if in some compilation unit a + * FWD declaration is not a part of a type graph compared against + * another type graph that declaration's canonical type would not be + * changed. Example: + * + * CU #1: + * + * struct foo; + * struct foo *some_global; + * + * CU #2: + * + * struct foo { int u; }; + * struct foo *another_global; + * + * After `btf_dedup_struct_types` the BTF looks as follows: + * + * [1] STRUCT 'foo' size=4 vlen=1 ... + * [2] INT 'int' size=4 ... + * [3] PTR '(anon)' type_id=1 + * [4] FWD 'foo' fwd_kind=struct + * [5] PTR '(anon)' type_id=4 + * + * This pass assumes that such FWD declarations should be mapped to + * structs or unions with identical name in case if the name is not + * ambiguous. + */ +static int btf_dedup_standalone_fwds(struct btf_dedup *d) +{ + int i, err; + struct hashmap *names_map = name_off_map__new(); + + if (!names_map) + return -ENOMEM; + + err = btf_dedup_fill_unique_names_map(d, names_map); + if (err < 0) + goto exit; + + for (i = 0; i < d->btf->nr_types; i++) { + err = btf_dedup_standalone_fwd(d, names_map, d->btf->start_id + i); + if (err < 0) + goto exit; + } + +exit: + hashmap__free(names_map); + return err; +} + /* * Compact types. *
Deduplicate forward declarations that don't take part in type graphs comparisons if declaration name is unambiguous. Example: CU #1: struct foo; // standalone forward declaration struct foo *some_global; CU #2: struct foo { int x; }; struct foo *another_global; The `struct foo` from CU #1 is not a part of any definition that is compared against another definition while `btf_dedup_struct_types` processes structural types. The the BTF after `btf_dedup_struct_types` the BTF looks as follows: [1] STRUCT 'foo' size=4 vlen=1 ... [2] INT 'int' size=4 ... [3] PTR '(anon)' type_id=1 [4] FWD 'foo' fwd_kind=struct [5] PTR '(anon)' type_id=4 This commit adds a new pass `btf_dedup_standalone_fwds`, that maps such forward declarations to structs or unions with identical name in case if the name is not ambiguous. The pass is positioned before `btf_dedup_ref_types` so that types [3] and [5] could be merged as a same type after [1] and [4] are merged. The final result for the example above looks as follows: [1] STRUCT 'foo' size=4 vlen=1 'x' type_id=2 bits_offset=0 [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED [3] PTR '(anon)' type_id=1 For defconfig kernel with BTF enabled this removes 63 forward declarations. Examples of removed declarations: `pt_regs`, `in6_addr`. The running time of `btf__dedup` function is increased by about 3%. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> --- tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 174 insertions(+), 4 deletions(-)