diff mbox series

[RFC,bpf-next,01/12] libbpf: Deduplicate unambigous standalone forward declarations

Message ID 20221025222802.2295103-2-eddyz87@gmail.com (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series Use uapi kernel headers with vmlinux.h | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 pending Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 7 maintainers not CCed: sdf@google.com john.fastabend@gmail.com haoluo@google.com jolsa@kernel.org kpsingh@kernel.org song@kernel.org martin.lau@linux.dev
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eduard Zingerman Oct. 25, 2022, 10:27 p.m. UTC
Deduplicate forward declarations that don't take part in type graphs
comparisons if declaration name is unambiguous. Example:

CU #1:

struct foo;              // standalone forward declaration
struct foo *some_global;

CU #2:

struct foo { int x; };
struct foo *another_global;

The `struct foo` from CU #1 is not a part of any definition that is
compared against another definition while `btf_dedup_struct_types`
processes structural types. The the BTF after `btf_dedup_struct_types`
the BTF looks as follows:

[1] STRUCT 'foo' size=4 vlen=1 ...
[2] INT 'int' size=4 ...
[3] PTR '(anon)' type_id=1
[4] FWD 'foo' fwd_kind=struct
[5] PTR '(anon)' type_id=4

This commit adds a new pass `btf_dedup_standalone_fwds`, that maps
such forward declarations to structs or unions with identical name in
case if the name is not ambiguous.

The pass is positioned before `btf_dedup_ref_types` so that types
[3] and [5] could be merged as a same type after [1] and [4] are merged.
The final result for the example above looks as follows:

[1] STRUCT 'foo' size=4 vlen=1
	'x' type_id=2 bits_offset=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] PTR '(anon)' type_id=1

For defconfig kernel with BTF enabled this removes 63 forward
declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
The running time of `btf__dedup` function is increased by about 3%.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 174 insertions(+), 4 deletions(-)

Comments

Andrii Nakryiko Oct. 27, 2022, 10:07 p.m. UTC | #1
On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Deduplicate forward declarations that don't take part in type graphs
> comparisons if declaration name is unambiguous. Example:
>
> CU #1:
>
> struct foo;              // standalone forward declaration
> struct foo *some_global;
>
> CU #2:
>
> struct foo { int x; };
> struct foo *another_global;
>
> The `struct foo` from CU #1 is not a part of any definition that is
> compared against another definition while `btf_dedup_struct_types`
> processes structural types. The the BTF after `btf_dedup_struct_types`
> the BTF looks as follows:
>
> [1] STRUCT 'foo' size=4 vlen=1 ...
> [2] INT 'int' size=4 ...
> [3] PTR '(anon)' type_id=1
> [4] FWD 'foo' fwd_kind=struct
> [5] PTR '(anon)' type_id=4
>
> This commit adds a new pass `btf_dedup_standalone_fwds`, that maps
> such forward declarations to structs or unions with identical name in
> case if the name is not ambiguous.
>
> The pass is positioned before `btf_dedup_ref_types` so that types
> [3] and [5] could be merged as a same type after [1] and [4] are merged.
> The final result for the example above looks as follows:
>
> [1] STRUCT 'foo' size=4 vlen=1
>         'x' type_id=2 bits_offset=0
> [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> [3] PTR '(anon)' type_id=1
>
> For defconfig kernel with BTF enabled this removes 63 forward
> declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
> The running time of `btf__dedup` function is increased by about 3%.
>

What about modules, can you share stats for module BTFs?

Also cc Alan as he was looking at BTF dedup improvements for kernel
module BTF dedup.

> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
>  tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 174 insertions(+), 4 deletions(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index d88647da2c7f..c34c68d8e8a0 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d);
>  static int btf_dedup_prim_types(struct btf_dedup *d);
>  static int btf_dedup_struct_types(struct btf_dedup *d);
>  static int btf_dedup_ref_types(struct btf_dedup *d);
> +static int btf_dedup_standalone_fwds(struct btf_dedup *d);
>  static int btf_dedup_compact_types(struct btf_dedup *d);
>  static int btf_dedup_remap_types(struct btf_dedup *d);
>
> @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d);
>   * Algorithm summary
>   * =================
>   *
> - * Algorithm completes its work in 6 separate passes:
> + * Algorithm completes its work in 7 separate passes:
>   *
>   * 1. Strings deduplication.
>   * 2. Primitive types deduplication (int, enum, fwd).
>   * 3. Struct/union types deduplication.
> - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> + * 4. Standalone fwd declarations deduplication.

Let's call this "Resolve unambiguous forward declarations", we don't
really deduplicate anything. And call the function
btf_dedup_resolve_fwds()?

> + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
>   *    protos, and const/volatile/restrict modifiers).
> - * 5. Types compaction.
> - * 6. Types remapping.
> + * 6. Types compaction.
> + * 7. Types remapping.
>   *
>   * Algorithm determines canonical type descriptor, which is a single
>   * representative type for each truly unique type. This canonical type is the
> @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
>                 pr_debug("btf_dedup_struct_types failed:%d\n", err);
>                 goto done;
>         }
> +       err = btf_dedup_standalone_fwds(d);
> +       if (err < 0) {
> +               pr_debug("btf_dedup_standalone_fwd failed:%d\n", err);
> +               goto done;
> +       }
>         err = btf_dedup_ref_types(d);
>         if (err < 0) {
>                 pr_debug("btf_dedup_ref_types failed:%d\n", err);
> @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
>         return 0;
>  }
>
> +/*
> + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32).
> + *
> + * The __u32 key/value representations are cast to `void *` before passing
> + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced.
> + *
> + */
> +static struct hashmap *name_off_map__new(void)
> +{
> +       return hashmap__new(btf_dedup_identity_hash_fn,
> +                           btf_dedup_equal_fn,
> +                           NULL);
> +}

is there a point in name_off_map__new and name_off_map__find wrappers
except to add one extra function to jump through when reading the
code? If you look at other uses of hashmaps in this file, we use the
directly. Let's drop those.

> +
> +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id)
> +{
> +       /* This has to be sizeof(void *) in order to be passed to hashmap__find */
> +       void *tmp;
> +       int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp);

but this (void *) casting everything was an error in API design, mea
culpa. I've been wanting to switch hashmap to use long as key/value
type for a long while, maybe let's do it now, as we are adding even
more code that looks weird? It seems like accepting long will make
hashmap API usage cleaner in most cases. There are not a lot of places
where we use hashmap APIs in libbpf, but we'll also need to fix up
bpftool usage, and I believe perf copy/pasted hashmap.h (cc Arnaldo),
so we'd need to make sure to not break all that. But good thing it's
all in the same repo and we can convert them at the same time with no
breakage.

WDYT?

> +       /*
> +        * __u64 cast is necessary to avoid pointer to integer conversion size warning.
> +        * It is fine to get rid of this warning as `void *` is used as an integer value.
> +        */
> +       if (found)
> +               *type_id = (__u64)tmp;
> +       return found;
> +}
> +
> +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id)
> +{
> +       return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id,
> +                           NULL, NULL);
> +}

this function will also be completely unnecessary with longs

> +
> +/*
> + * Collect a `name_off_map` that maps type names to type ids for all
> + * canonical structs and unions. If the same name is shared by several
> + * canonical types use a special value 0 to indicate this fact.
> + */
> +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> +{
> +       int i, err = 0;
> +       __u32 type_id, collision_id;
> +       __u16 kind;
> +       struct btf_type *t;
> +
> +       for (i = 0; i < d->btf->nr_types; i++) {
> +               type_id = d->btf->start_id + i;
> +               t = btf_type_by_id(d->btf, type_id);
> +               kind = btf_kind(t);
> +
> +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> +                       continue;

let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0

> +
> +               /* Skip non-canonical types */
> +               if (type_id != d->map[type_id])
> +                       continue;
> +
> +               err = 0;
> +               if (name_off_map__find(names_map, t->name_off, &collision_id)) {
> +                       /* Mark non-unique names with 0 */
> +                       if (collision_id != 0 && collision_id != type_id)
> +                               err = name_off_map__set(names_map, t->name_off, 0);
> +               } else {
> +                       err = name_off_map__set(names_map, t->name_off, type_id);
> +               }

err = hashmap__add(..., t->name_off, type_id);
if (err == -EEXISTS) {
    hashmap__set(..., 0);
    return 0;
}

see comment for hashmap_insert_strategy in hashmap.h

> +
> +               if (err < 0)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static int btf_dedup_standalone_fwd(struct btf_dedup *d,
> +                                   struct hashmap *names_map,
> +                                   __u32 type_id)
> +{
> +       struct btf_type *t = btf_type_by_id(d->btf, type_id);
> +       __u16 kind = btf_kind(t);
> +       enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info);
> +

nit: don't break variables block in two parts, there shouldn't be empty lines

also please use btf_kflag(t)


> +       struct btf_type *cand_t;
> +       __u16 cand_kind;
> +       __u32 cand_id = 0;
> +
> +       if (kind != BTF_KIND_FWD)
> +               return 0;
> +
> +       /* Skip if this FWD already has a mapping */
> +       if (type_id != d->map[type_id])
> +               return 0;
> +

[...]
Eduard Zingerman Oct. 31, 2022, 1 a.m. UTC | #2
On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > Deduplicate forward declarations that don't take part in type graphs
> > comparisons if declaration name is unambiguous. Example:
> > 
> > CU #1:
> > 
> > struct foo;              // standalone forward declaration
> > struct foo *some_global;
> > 
> > CU #2:
> > 
> > struct foo { int x; };
> > struct foo *another_global;
> > 
> > The `struct foo` from CU #1 is not a part of any definition that is
> > compared against another definition while `btf_dedup_struct_types`
> > processes structural types. The the BTF after `btf_dedup_struct_types`
> > the BTF looks as follows:
> > 
> > [1] STRUCT 'foo' size=4 vlen=1 ...
> > [2] INT 'int' size=4 ...
> > [3] PTR '(anon)' type_id=1
> > [4] FWD 'foo' fwd_kind=struct
> > [5] PTR '(anon)' type_id=4
> > 
> > This commit adds a new pass `btf_dedup_standalone_fwds`, that maps
> > such forward declarations to structs or unions with identical name in
> > case if the name is not ambiguous.
> > 
> > The pass is positioned before `btf_dedup_ref_types` so that types
> > [3] and [5] could be merged as a same type after [1] and [4] are merged.
> > The final result for the example above looks as follows:
> > 
> > [1] STRUCT 'foo' size=4 vlen=1
> >         'x' type_id=2 bits_offset=0
> > [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> > [3] PTR '(anon)' type_id=1
> > 
> > For defconfig kernel with BTF enabled this removes 63 forward
> > declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
> > The running time of `btf__dedup` function is increased by about 3%.
> > 
> 
> What about modules, can you share stats for module BTFs?
> 
> Also cc Alan as he was looking at BTF dedup improvements for kernel
> module BTF dedup.
> 
> > Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> > ---
> >  tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 174 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index d88647da2c7f..c34c68d8e8a0 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d);
> >  static int btf_dedup_prim_types(struct btf_dedup *d);
> >  static int btf_dedup_struct_types(struct btf_dedup *d);
> >  static int btf_dedup_ref_types(struct btf_dedup *d);
> > +static int btf_dedup_standalone_fwds(struct btf_dedup *d);
> >  static int btf_dedup_compact_types(struct btf_dedup *d);
> >  static int btf_dedup_remap_types(struct btf_dedup *d);
> > 
> > @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d);
> >   * Algorithm summary
> >   * =================
> >   *
> > - * Algorithm completes its work in 6 separate passes:
> > + * Algorithm completes its work in 7 separate passes:
> >   *
> >   * 1. Strings deduplication.
> >   * 2. Primitive types deduplication (int, enum, fwd).
> >   * 3. Struct/union types deduplication.
> > - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> > + * 4. Standalone fwd declarations deduplication.
> 
> Let's call this "Resolve unambiguous forward declarations", we don't
> really deduplicate anything. And call the function
> btf_dedup_resolve_fwds()?
> 
> > + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> >   *    protos, and const/volatile/restrict modifiers).
> > - * 5. Types compaction.
> > - * 6. Types remapping.
> > + * 6. Types compaction.
> > + * 7. Types remapping.
> >   *
> >   * Algorithm determines canonical type descriptor, which is a single
> >   * representative type for each truly unique type. This canonical type is the
> > @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
> >                 pr_debug("btf_dedup_struct_types failed:%d\n", err);
> >                 goto done;
> >         }
> > +       err = btf_dedup_standalone_fwds(d);
> > +       if (err < 0) {
> > +               pr_debug("btf_dedup_standalone_fwd failed:%d\n", err);
> > +               goto done;
> > +       }
> >         err = btf_dedup_ref_types(d);
> >         if (err < 0) {
> >                 pr_debug("btf_dedup_ref_types failed:%d\n", err);
> > @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
> >         return 0;
> >  }
> > 
> > +/*
> > + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32).
> > + *
> > + * The __u32 key/value representations are cast to `void *` before passing
> > + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced.
> > + *
> > + */
> > +static struct hashmap *name_off_map__new(void)
> > +{
> > +       return hashmap__new(btf_dedup_identity_hash_fn,
> > +                           btf_dedup_equal_fn,
> > +                           NULL);
> > +}
> 
> is there a point in name_off_map__new and name_off_map__find wrappers
> except to add one extra function to jump through when reading the
> code? If you look at other uses of hashmaps in this file, we use the
> directly. Let's drop those.
> 
> > +
> > +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id)
> > +{
> > +       /* This has to be sizeof(void *) in order to be passed to hashmap__find */
> > +       void *tmp;
> > +       int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp);
> 
> but this (void *) casting everything was an error in API design, mea
> culpa. I've been wanting to switch hashmap to use long as key/value
> type for a long while, maybe let's do it now, as we are adding even
> more code that looks weird? It seems like accepting long will make
> hashmap API usage cleaner in most cases. There are not a lot of places
> where we use hashmap APIs in libbpf, but we'll also need to fix up
> bpftool usage, and I believe perf copy/pasted hashmap.h (cc Arnaldo),
> so we'd need to make sure to not break all that. But good thing it's
> all in the same repo and we can convert them at the same time with no
> breakage.
> 
> WDYT?

Well, I did the change, excluding tests it amounts to:
- 15 files changed, 114 insertions(+), 137 deletions(-);
- 45 casts removed;
- 30 casts added.

TBH, it seems like I should just use "u32_as_hash_field" and be done
with it. In any case I'll post this as a part of v1 series for
"libbpf: Resolve unambigous forward declarations".

To account for a case when map has to store pointers and pointers are
32 bit I chose to update the map interface to be "uintptr_t -> uintptr_t".
Had it been "u64 -> u64" the additional temporary variable would be
necessary for "old" values, e.g. while working with hashmap__insert.
(Contrary to what we discussed on Friday).

> 
> > +       /*
> > +        * __u64 cast is necessary to avoid pointer to integer conversion size warning.
> > +        * It is fine to get rid of this warning as `void *` is used as an integer value.
> > +        */
> > +       if (found)
> > +               *type_id = (__u64)tmp;
> > +       return found;
> > +}
> > +
> > +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id)
> > +{
> > +       return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id,
> > +                           NULL, NULL);
> > +}
> 
> this function will also be completely unnecessary with longs
> 
> > +
> > +/*
> > + * Collect a `name_off_map` that maps type names to type ids for all
> > + * canonical structs and unions. If the same name is shared by several
> > + * canonical types use a special value 0 to indicate this fact.
> > + */
> > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> > +{
> > +       int i, err = 0;
> > +       __u32 type_id, collision_id;
> > +       __u16 kind;
> > +       struct btf_type *t;
> > +
> > +       for (i = 0; i < d->btf->nr_types; i++) {
> > +               type_id = d->btf->start_id + i;
> > +               t = btf_type_by_id(d->btf, type_id);
> > +               kind = btf_kind(t);
> > +
> > +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> > +                       continue;
> 
> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0
> 
> > +
> > +               /* Skip non-canonical types */
> > +               if (type_id != d->map[type_id])
> > +                       continue;
> > +
> > +               err = 0;
> > +               if (name_off_map__find(names_map, t->name_off, &collision_id)) {
> > +                       /* Mark non-unique names with 0 */
> > +                       if (collision_id != 0 && collision_id != type_id)
> > +                               err = name_off_map__set(names_map, t->name_off, 0);
> > +               } else {
> > +                       err = name_off_map__set(names_map, t->name_off, type_id);
> > +               }
> 
> err = hashmap__add(..., t->name_off, type_id);
> if (err == -EEXISTS) {
>     hashmap__set(..., 0);
>     return 0;
> }
> 
> see comment for hashmap_insert_strategy in hashmap.h
> 
> > +
> > +               if (err < 0)
> > +                       return err;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int btf_dedup_standalone_fwd(struct btf_dedup *d,
> > +                                   struct hashmap *names_map,
> > +                                   __u32 type_id)
> > +{
> > +       struct btf_type *t = btf_type_by_id(d->btf, type_id);
> > +       __u16 kind = btf_kind(t);
> > +       enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info);
> > +
> 
> nit: don't break variables block in two parts, there shouldn't be empty lines
> 
> also please use btf_kflag(t)
> 
> 
> > +       struct btf_type *cand_t;
> > +       __u16 cand_kind;
> > +       __u32 cand_id = 0;
> > +
> > +       if (kind != BTF_KIND_FWD)
> > +               return 0;
> > +
> > +       /* Skip if this FWD already has a mapping */
> > +       if (type_id != d->map[type_id])
> > +               return 0;
> > +
> 
> [...]
Eduard Zingerman Oct. 31, 2022, 3:49 p.m. UTC | #3
On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
[...] 
> > +
> > +/*
> > + * Collect a `name_off_map` that maps type names to type ids for all
> > + * canonical structs and unions. If the same name is shared by several
> > + * canonical types use a special value 0 to indicate this fact.
> > + */
> > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> > +{
> > +       int i, err = 0;
> > +       __u32 type_id, collision_id;
> > +       __u16 kind;
> > +       struct btf_type *t;
> > +
> > +       for (i = 0; i < d->btf->nr_types; i++) {
> > +               type_id = d->btf->start_id + i;
> > +               t = btf_type_by_id(d->btf, type_id);
> > +               kind = btf_kind(t);
> > +
> > +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> > +                       continue;
> 
> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0

Interestingly this is necessary only for mixed enum / enum64 case.
Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type:

	case BTF_KIND_ENUM:
		h = btf_hash_enum(t);
		for_each_dedup_cand(d, hash_entry, h) {
			cand_id = (__u32)(long)hash_entry->value;
			cand = btf_type_by_id(d->btf, cand_id);
			if (btf_equal_enum(t, cand)) {
				new_id = cand_id;
				break;
			}
			if (btf_compat_enum(t, cand)) {
				if (btf_is_enum_fwd(t)) {
					/* resolve fwd to full enum */
					new_id = cand_id;
					break;
				}
				/* resolve canonical enum fwd to full enum */
				d->map[cand_id] = type_id;
			}
		}
		break;
    // ... similar logic for ENUM64 ...

- btf_hash_enum ignores vlen when hashing;
- btf_compat_enum compares only names and sizes.

So, if forward and main declaration kinds match (either BTF_KIND_ENUM
or BTF_KIND_ENUM64) the forward declaration would be removed. But if
the kinds are different the forward declaration would remain. E.g.:

CU #1:
enum foo;
enum foo *a;

CU #2:
enum foo { x = 0xfffffffff };
enum foo *b;

BTF:
[1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
	'x' val=68719476735ULL
[2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[3] PTR '(anon)' type_id=1
[4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
[5] PTR '(anon)' type_id=4

BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum
forward declarations are not. So it would be incorrect to add enum
forward declaration unification logic to btf_dedup_resolve_fwds,
because the following case would not be covered:

CU #1:
enum foo;
struct s { enum foo *a; } *a;

CU #2:
enum foo { x = 0xfffffffff };
struct s { enum foo *a; } *b;

Currently STRUCTs 's' are not de-duplicated.

I think that btf_dedup_prim_type should be adjusted to handle this case.

[...]
Alan Maguire Nov. 1, 2022, 5:08 p.m. UTC | #4
On 31/10/2022 15:49, Eduard Zingerman wrote:
> On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> [...] 
>>> +
>>> +/*
>>> + * Collect a `name_off_map` that maps type names to type ids for all
>>> + * canonical structs and unions. If the same name is shared by several
>>> + * canonical types use a special value 0 to indicate this fact.
>>> + */
>>> +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
>>> +{
>>> +       int i, err = 0;
>>> +       __u32 type_id, collision_id;
>>> +       __u16 kind;
>>> +       struct btf_type *t;
>>> +
>>> +       for (i = 0; i < d->btf->nr_types; i++) {
>>> +               type_id = d->btf->start_id + i;
>>> +               t = btf_type_by_id(d->btf, type_id);
>>> +               kind = btf_kind(t);
>>> +
>>> +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
>>> +                       continue;
>>
>> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0
> 
> Interestingly this is necessary only for mixed enum / enum64 case.
> Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type:
>

Ah, great catch! A forward can look like an enum to one CU but another CU can
specify values that make it an enum64.

> 	case BTF_KIND_ENUM:
> 		h = btf_hash_enum(t);
> 		for_each_dedup_cand(d, hash_entry, h) {
> 			cand_id = (__u32)(long)hash_entry->value;
> 			cand = btf_type_by_id(d->btf, cand_id);
> 			if (btf_equal_enum(t, cand)) {
> 				new_id = cand_id;
> 				break;
> 			}
> 			if (btf_compat_enum(t, cand)) {
> 				if (btf_is_enum_fwd(t)) {
> 					/* resolve fwd to full enum */
> 					new_id = cand_id;
> 					break;
> 				}
> 				/* resolve canonical enum fwd to full enum */
> 				d->map[cand_id] = type_id;
> 			}
> 		}
> 		break;
>     // ... similar logic for ENUM64 ...
> 
> - btf_hash_enum ignores vlen when hashing;
> - btf_compat_enum compares only names and sizes.
> 
> So, if forward and main declaration kinds match (either BTF_KIND_ENUM
> or BTF_KIND_ENUM64) the forward declaration would be removed. But if
> the kinds are different the forward declaration would remain. E.g.:
> 
> CU #1:
> enum foo;
> enum foo *a;
> 
> CU #2:
> enum foo { x = 0xfffffffff };
> enum foo *b;
> 
> BTF:
> [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
> 	'x' val=68719476735ULL
> [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> [3] PTR '(anon)' type_id=1
> [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
> [5] PTR '(anon)' type_id=4
> 
> BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum
> forward declarations are not. So it would be incorrect to add enum
> forward declaration unification logic to btf_dedup_resolve_fwds,
> because the following case would not be covered:
> 
> CU #1:
> enum foo;
> struct s { enum foo *a; } *a;
> 
> CU #2:
> enum foo { x = 0xfffffffff };
> struct s { enum foo *a; } *b;
> 
> Currently STRUCTs 's' are not de-duplicated.
> 

What if CU#1 is in base BTF and CU#2 in split module BTF? I think we'd explicitly
want to avoid deduping "struct s" then since we can't be sure that it is the
same enum they are pointing at.  That's the logic we employ for structs at 
least, based upon the rationale that we can't feed back knowledge of types
from module to kernel BTF since the latter is now fixed (Andrii, do correct me
if I have this wrong). In such a case the enum is no longer standalone; it
serves the purpose of allowing us to define a pointer to a module-specific
type. We recently found some examples of this sort of thing with structs,
where the struct was defined in module BTF, making dedup fail for some core
kernel data types, but the problem was restricted to modules which _did_
define the type so wasn't a major driver of dedup failures. Not sure if
there's many (any?) enum cases of this in practice.

I suppose if we could guarantee the dedup happened within the same object
(kernel or module) we could relax this constraint though?

Alan
Eduard Zingerman Nov. 1, 2022, 5:37 p.m. UTC | #5
On Tue, 2022-11-01 at 17:08 +0000, Alan Maguire wrote:
> On 31/10/2022 15:49, Eduard Zingerman wrote:
> > On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
> > > On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > [...] 
> > > > +
> > > > +/*
> > > > + * Collect a `name_off_map` that maps type names to type ids for all
> > > > + * canonical structs and unions. If the same name is shared by several
> > > > + * canonical types use a special value 0 to indicate this fact.
> > > > + */
> > > > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> > > > +{
> > > > +       int i, err = 0;
> > > > +       __u32 type_id, collision_id;
> > > > +       __u16 kind;
> > > > +       struct btf_type *t;
> > > > +
> > > > +       for (i = 0; i < d->btf->nr_types; i++) {
> > > > +               type_id = d->btf->start_id + i;
> > > > +               t = btf_type_by_id(d->btf, type_id);
> > > > +               kind = btf_kind(t);
> > > > +
> > > > +               if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> > > > +                       continue;
> > > 
> > > let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0
> > 
> > Interestingly this is necessary only for mixed enum / enum64 case.
> > Forward enum declarations are resolved by bpf/btf.c:btf_dedup_prim_type:
> > 
> 
> Ah, great catch! A forward can look like an enum to one CU but another CU can
> specify values that make it an enum64.
> 
> > 	case BTF_KIND_ENUM:
> > 		h = btf_hash_enum(t);
> > 		for_each_dedup_cand(d, hash_entry, h) {
> > 			cand_id = (__u32)(long)hash_entry->value;
> > 			cand = btf_type_by_id(d->btf, cand_id);
> > 			if (btf_equal_enum(t, cand)) {
> > 				new_id = cand_id;
> > 				break;
> > 			}
> > 			if (btf_compat_enum(t, cand)) {
> > 				if (btf_is_enum_fwd(t)) {
> > 					/* resolve fwd to full enum */
> > 					new_id = cand_id;
> > 					break;
> > 				}
> > 				/* resolve canonical enum fwd to full enum */
> > 				d->map[cand_id] = type_id;
> > 			}
> > 		}
> > 		break;
> >     // ... similar logic for ENUM64 ...
> > 
> > - btf_hash_enum ignores vlen when hashing;
> > - btf_compat_enum compares only names and sizes.
> > 
> > So, if forward and main declaration kinds match (either BTF_KIND_ENUM
> > or BTF_KIND_ENUM64) the forward declaration would be removed. But if
> > the kinds are different the forward declaration would remain. E.g.:
> > 
> > CU #1:
> > enum foo;
> > enum foo *a;
> > 
> > CU #2:
> > enum foo { x = 0xfffffffff };
> > enum foo *b;
> > 
> > BTF:
> > [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
> > 	'x' val=68719476735ULL
> > [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> > [3] PTR '(anon)' type_id=1
> > [4] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
> > [5] PTR '(anon)' type_id=4
> > 
> > BTF_KIND_FWDs are unified during btf_dedup_struct_types but enum
> > forward declarations are not. So it would be incorrect to add enum
> > forward declaration unification logic to btf_dedup_resolve_fwds,
> > because the following case would not be covered:
> > 
> > CU #1:
> > enum foo;
> > struct s { enum foo *a; } *a;
> > 
> > CU #2:
> > enum foo { x = 0xfffffffff };
> > struct s { enum foo *a; } *b;
> > 
> > Currently STRUCTs 's' are not de-duplicated.
> > 
> 
> What if CU#1 is in base BTF and CU#2 in split module BTF? I think we'd explicitly
> want to avoid deduping "struct s" then since we can't be sure that it is the
> same enum they are pointing at.  That's the logic we employ for structs at 
> least, based upon the rationale that we can't feed back knowledge of types
> from module to kernel BTF since the latter is now fixed (Andrii, do correct me
> if I have this wrong). In such a case the enum is no longer standalone; it
> serves the purpose of allowing us to define a pointer to a module-specific
> type. We recently found some examples of this sort of thing with structs,
> where the struct was defined in module BTF, making dedup fail for some core
> kernel data types, but the problem was restricted to modules which _did_
> define the type so wasn't a major driver of dedup failures. Not sure if
> there's many (any?) enum cases of this in practice.

Hi Alan,

As far as I understand the loop in `btf_dedup_prim_types` guarantees
that only ids from the split module would be remapped:

	struct btf {
    	...
		/* BTF type ID of the first type in this BTF instance:
		 *   - for base BTF it's equal to 1;
		 *   - for split BTF it's equal to biggest type ID of base BTF plus 1.
		 */
		int start_id;
    	...
	}

    ...
	for (i = 0; i < d->btf->nr_types; i++) {
		err = btf_dedup_prim_type(d, d->btf->start_id + i);
		if (err)
			return err;
	}

Thus CU1:foo won't be updated to be CU2:foo and CU1:s will not be the
same as CU2:s. Is that right or am I confused?

Thanks,
Eduard

> 
> I suppose if we could guarantee the dedup happened within the same object
> (kernel or module) we could relax this constraint though?
> 
> Alan
diff mbox series

Patch

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index d88647da2c7f..c34c68d8e8a0 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -2881,6 +2881,7 @@  static int btf_dedup_strings(struct btf_dedup *d);
 static int btf_dedup_prim_types(struct btf_dedup *d);
 static int btf_dedup_struct_types(struct btf_dedup *d);
 static int btf_dedup_ref_types(struct btf_dedup *d);
+static int btf_dedup_standalone_fwds(struct btf_dedup *d);
 static int btf_dedup_compact_types(struct btf_dedup *d);
 static int btf_dedup_remap_types(struct btf_dedup *d);
 
@@ -2988,15 +2989,16 @@  static int btf_dedup_remap_types(struct btf_dedup *d);
  * Algorithm summary
  * =================
  *
- * Algorithm completes its work in 6 separate passes:
+ * Algorithm completes its work in 7 separate passes:
  *
  * 1. Strings deduplication.
  * 2. Primitive types deduplication (int, enum, fwd).
  * 3. Struct/union types deduplication.
- * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
+ * 4. Standalone fwd declarations deduplication.
+ * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
  *    protos, and const/volatile/restrict modifiers).
- * 5. Types compaction.
- * 6. Types remapping.
+ * 6. Types compaction.
+ * 7. Types remapping.
  *
  * Algorithm determines canonical type descriptor, which is a single
  * representative type for each truly unique type. This canonical type is the
@@ -3060,6 +3062,11 @@  int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
 		pr_debug("btf_dedup_struct_types failed:%d\n", err);
 		goto done;
 	}
+	err = btf_dedup_standalone_fwds(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_standalone_fwd failed:%d\n", err);
+		goto done;
+	}
 	err = btf_dedup_ref_types(d);
 	if (err < 0) {
 		pr_debug("btf_dedup_ref_types failed:%d\n", err);
@@ -4525,6 +4532,169 @@  static int btf_dedup_ref_types(struct btf_dedup *d)
 	return 0;
 }
 
+/*
+ * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32).
+ *
+ * The __u32 key/value representations are cast to `void *` before passing
+ * to `hashmap__*` functions. These pseudo-pointers are never dereferenced.
+ *
+ */
+static struct hashmap *name_off_map__new(void)
+{
+	return hashmap__new(btf_dedup_identity_hash_fn,
+			    btf_dedup_equal_fn,
+			    NULL);
+}
+
+static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id)
+{
+	/* This has to be sizeof(void *) in order to be passed to hashmap__find */
+	void *tmp;
+	int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp);
+	/*
+	 * __u64 cast is necessary to avoid pointer to integer conversion size warning.
+	 * It is fine to get rid of this warning as `void *` is used as an integer value.
+	 */
+	if (found)
+		*type_id = (__u64)tmp;
+	return found;
+}
+
+static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id)
+{
+	return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id,
+			    NULL, NULL);
+}
+
+/*
+ * Collect a `name_off_map` that maps type names to type ids for all
+ * canonical structs and unions. If the same name is shared by several
+ * canonical types use a special value 0 to indicate this fact.
+ */
+static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
+{
+	int i, err = 0;
+	__u32 type_id, collision_id;
+	__u16 kind;
+	struct btf_type *t;
+
+	for (i = 0; i < d->btf->nr_types; i++) {
+		type_id = d->btf->start_id + i;
+		t = btf_type_by_id(d->btf, type_id);
+		kind = btf_kind(t);
+
+		if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
+			continue;
+
+		/* Skip non-canonical types */
+		if (type_id != d->map[type_id])
+			continue;
+
+		err = 0;
+		if (name_off_map__find(names_map, t->name_off, &collision_id)) {
+			/* Mark non-unique names with 0 */
+			if (collision_id != 0 && collision_id != type_id)
+				err = name_off_map__set(names_map, t->name_off, 0);
+		} else {
+			err = name_off_map__set(names_map, t->name_off, type_id);
+		}
+
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
+static int btf_dedup_standalone_fwd(struct btf_dedup *d,
+				    struct hashmap *names_map,
+				    __u32 type_id)
+{
+	struct btf_type *t = btf_type_by_id(d->btf, type_id);
+	__u16 kind = btf_kind(t);
+	enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info);
+
+	struct btf_type *cand_t;
+	__u16 cand_kind;
+	__u32 cand_id = 0;
+
+	if (kind != BTF_KIND_FWD)
+		return 0;
+
+	/* Skip if this FWD already has a mapping */
+	if (type_id != d->map[type_id])
+		return 0;
+
+	name_off_map__find(names_map, t->name_off, &cand_id);
+	if (!cand_id)
+		return 0;
+
+	cand_t = btf_type_by_id(d->btf, cand_id);
+	cand_kind = btf_kind(cand_t);
+	if (!(cand_kind == BTF_KIND_STRUCT && fwd_kind == BTF_FWD_STRUCT) &&
+	    !(cand_kind == BTF_KIND_UNION && fwd_kind == BTF_FWD_UNION))
+		return 0;
+
+	d->map[type_id] = cand_id;
+
+	return 0;
+}
+
+/*
+ * Standalone fwd declarations deduplication.
+ *
+ * The lion's share of all FWD declarations is resolved during
+ * `btf_dedup_struct_types` phase when different type graphs are
+ * compared against each other. However, if in some compilation unit a
+ * FWD declaration is not a part of a type graph compared against
+ * another type graph that declaration's canonical type would not be
+ * changed. Example:
+ *
+ * CU #1:
+ *
+ * struct foo;
+ * struct foo *some_global;
+ *
+ * CU #2:
+ *
+ * struct foo { int u; };
+ * struct foo *another_global;
+ *
+ * After `btf_dedup_struct_types` the BTF looks as follows:
+ *
+ * [1] STRUCT 'foo' size=4 vlen=1 ...
+ * [2] INT 'int' size=4 ...
+ * [3] PTR '(anon)' type_id=1
+ * [4] FWD 'foo' fwd_kind=struct
+ * [5] PTR '(anon)' type_id=4
+ *
+ * This pass assumes that such FWD declarations should be mapped to
+ * structs or unions with identical name in case if the name is not
+ * ambiguous.
+ */
+static int btf_dedup_standalone_fwds(struct btf_dedup *d)
+{
+	int i, err;
+	struct hashmap *names_map = name_off_map__new();
+
+	if (!names_map)
+		return -ENOMEM;
+
+	err = btf_dedup_fill_unique_names_map(d, names_map);
+	if (err < 0)
+		goto exit;
+
+	for (i = 0; i < d->btf->nr_types; i++) {
+		err = btf_dedup_standalone_fwd(d, names_map, d->btf->start_id + i);
+		if (err < 0)
+			goto exit;
+	}
+
+exit:
+	hashmap__free(names_map);
+	return err;
+}
+
 /*
  * Compact types.
  *