
[RFC,v3] bpf: Using binary search to improve the performance of btf_find_by_name_kind

Message ID 20240608140835.965949-1-dolinux.peng@gmail.com (mailing list archive)
State RFC
Delegated to: BPF
Series [RFC,v3] bpf: Using binary search to improve the performance of btf_find_by_name_kind

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-12 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-14 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-18 / test
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7458 this patch: 7458
netdev/build_tools success Errors and warnings before: 1 this patch: 1
netdev/cc_maintainers warning 5 maintainers not CCed: john.fastabend@gmail.com kpsingh@kernel.org martin.lau@linux.dev sdf@google.com jolsa@kernel.org
netdev/build_clang success Errors and warnings before: 1176 this patch: 1176
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 7829 this patch: 7829
netdev/checkpatch warning CHECK: Alignment should match open parenthesis CHECK: multiple assignments should be avoided CHECK: spaces preferred around that '+' (ctx:VxV) CHECK: spaces preferred around that '-' (ctx:VxV) WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline fail Was 0 now: 1

Commit Message

Donglin Peng June 8, 2024, 2:08 p.m. UTC
Currently, we only use linear search to find a type id by name, which
has a time complexity of O(n). This change sorts the names of btf types
in ascending order and uses binary search, which has a time complexity
of O(log(n)). The idea was inspired by the following patch:

60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_name()").

At present, this improvement only applies to searching in vmlinux's and
modules' BTFs.

Another change is the search direction: we search the BTF first and
then its base; the type id of the first matching btf_type is returned.

Here are the timing results for finding 87590 type ids by their names
in vmlinux's BTF.

Before: 158426 ms
After:     114 ms

The average lookup performance has improved more than 1000x in the above scenario.

Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
---
Changes in RFC v3:
 - Sort the btf types during the build process in order to reduce memory usage
   and decrease boot time.

RFC v2:
 - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@sangfor.com.cn
---
 include/linux/btf.h |   1 +
 kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---
 tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 345 insertions(+), 11 deletions(-)

Comments

Masami Hiramatsu (Google) June 9, 2024, 7:23 a.m. UTC | #1
Hi

On Sat,  8 Jun 2024 07:08:35 -0700
Donglin Peng <dolinux.peng@gmail.com> wrote:

> Currently, we only use linear search to find a type id by name, which
> has a time complexity of O(n). This change sorts the names of btf types
> in ascending order and uses binary search, which has a time complexity
> of O(log(n)). The idea was inspired by the following patch:
> 
> 60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_name()").
> 
> At present, this improvement only applies to searching in vmlinux's and
> modules' BTFs.
> 
> Another change is the search direction: we search the BTF first and
> then its base; the type id of the first matching btf_type is returned.
> 
> Here are the timing results for finding 87590 type ids by their names
> in vmlinux's BTF.
> 
> Before: 158426 ms
> After:     114 ms
> 
> The average lookup performance has improved more than 1000x in the above scenario.

This looks like a great improvement! So searching for a function entry takes ~2us?
BTW, I have some comments below, but basically it looks good to me and
it passed my ftracetest.

Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

> 
> Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
> ---
> Changes in RFC v3:
>  - Sort the btf types during the build process in order to reduce memory usage
>    and decrease boot time.
> 
> RFC v2:
>  - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@sangfor.com.cn
> ---
>  include/linux/btf.h |   1 +
>  kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---
>  tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 345 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index f9e56fd12a9f..1dc1000a7dc9 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -214,6 +214,7 @@ bool btf_is_kernel(const struct btf *btf);
>  bool btf_is_module(const struct btf *btf);
>  struct module *btf_try_get_module(const struct btf *btf);
>  u32 btf_nr_types(const struct btf *btf);
> +u32 btf_type_cnt(const struct btf *btf);
>  bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
>  			   const struct btf_member *m,
>  			   u32 expected_offset, u32 expected_size);
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 821063660d9f..5b7b464204bf 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -262,6 +262,7 @@ struct btf {
>  	u32 data_size;
>  	refcount_t refcnt;
>  	u32 id;
> +	u32 nr_types_sorted;
>  	struct rcu_head rcu;
>  	struct btf_kfunc_set_tab *kfunc_set_tab;
>  	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
> @@ -542,23 +543,102 @@ u32 btf_nr_types(const struct btf *btf)
>  	return total;
>  }
>  
> -s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
> +u32 btf_type_cnt(const struct btf *btf)
> +{
> +	return btf->start_id + btf->nr_types;
> +}
> +
> +static s32 btf_find_by_name_bsearch(const struct btf *btf, const char *name,
> +				    int *start, int *end)
>  {
>  	const struct btf_type *t;
> -	const char *tname;
> -	u32 i, total;
> +	const char *name_buf;
> +	int low, low_start, mid, high, high_end;
> +	int ret, start_id;
> +
> +	start_id = btf->base_btf ? btf->start_id : 1;
> +	low_start = low = start_id;
> +	high_end = high = start_id + btf->nr_types_sorted - 1;
> +
> +	while (low <= high) {
> +		mid = low + (high - low) / 2;
> +		t = btf_type_by_id(btf, mid);
> +		name_buf = btf_name_by_offset(btf, t->name_off);
> +		ret = strcmp(name, name_buf);
> +		if (ret > 0)
> +			low = mid + 1;
> +		else if (ret < 0)
> +			high = mid - 1;
> +		else
> +			break;
> +	}
>  
> -	total = btf_nr_types(btf);
> -	for (i = 1; i < total; i++) {
> -		t = btf_type_by_id(btf, i);
> -		if (BTF_INFO_KIND(t->info) != kind)
> -			continue;
> +	if (low > high)
> +		return -ESRCH;

nit: -ENOENT ?

>  
> -		tname = btf_name_by_offset(btf, t->name_off);
> -		if (!strcmp(tname, name))
> -			return i;
> +	if (start) {
> +		low = mid;
> +		while (low > low_start) {
> +			t = btf_type_by_id(btf, low-1);
> +			name_buf = btf_name_by_offset(btf, t->name_off);
> +			if (strcmp(name, name_buf))
> +				break;
> +			low--;
> +		}
> +		*start = low;
>  	}
>  
> +	if (end) {
> +		high = mid;
> +		while (high < high_end) {
> +			t = btf_type_by_id(btf, high+1);
> +			name_buf = btf_name_by_offset(btf, t->name_off);
> +			if (strcmp(name, name_buf))
> +				break;
> +			high++;
> +		}
> +		*end = high;
> +	}
> +
> +	return mid;
> +}
> +
> +s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
> +{
> +	const struct btf_type *t;
> +	const char *tname;
> +	int start, end;
> +	s32 id, total;
> +
> +	do {
> +		if (btf->nr_types_sorted) {
> +			/* binary search */
> +			id = btf_find_by_name_bsearch(btf, name, &start, &end);
> +			if (id > 0) {
> +				while (start <= end) {
> +					t = btf_type_by_id(btf, start);
> +					if (BTF_INFO_KIND(t->info) == kind)
> +						return start;
> +					start++;
> +				}
> +			}
> +		} else {
> +			/* linear search */
> +			total = btf_type_cnt(btf);
> +			for (id = btf->base_btf ? btf->start_id : 1;
> +				id < total; id++) {
> +				t = btf_type_by_id(btf, id);
> +				if (BTF_INFO_KIND(t->info) != kind)
> +					continue;
> +
> +				tname = btf_name_by_offset(btf, t->name_off);
> +				if (!strcmp(tname, name))
> +					return id;
> +			}
> +		}
> +		btf = btf->base_btf;
> +	} while (btf);
> +
>  	return -ENOENT;
>  }
>  
> @@ -5979,6 +6059,56 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
>  	return kctx_type_id;
>  }
>  
> +static int btf_check_sort(struct btf *btf, int start_id)
> +{
> +	int i, n, nr_names = 0;
> +
> +	n = btf_nr_types(btf);
> +	for (i = start_id; i < n; i++) {
> +		const struct btf_type *t;
> +		const char *name;
> +
> +		t = btf_type_by_id(btf, i);
> +		if (!t)
> +			return -EINVAL;
> +
> +		name = btf_str_by_offset(btf, t->name_off);
> +		if (!str_is_empty(name))
> +			nr_names++;

else {
	goto out; 
}
? (*)

> +	}
> +
> +	if (nr_names < 3)
> +		goto out;

What does this `3` mean?

> +
> +	for (i = 0; i < nr_names - 1; i++) {
> +		const struct btf_type *t1, *t2;
> +		const char *s1, *s2;
> +
> +		t1 = btf_type_by_id(btf, start_id + i);
> +		if (!t1)
> +			return -EINVAL;
> +
> +		s1 = btf_str_by_offset(btf, t1->name_off);
> +		if (str_is_empty(s1))
> +			goto out;

Why not continue? This case is expected, or the previous block (*)
must go `out` at that point.

> +
> +		t2 = btf_type_by_id(btf, start_id + i + 1);
> +		if (!t2)
> +			return -EINVAL;
> +
> +		s2 = btf_str_by_offset(btf, t2->name_off);
> +		if (str_is_empty(s2))
> +			goto out;

Ditto.

> +
> +		if (strcmp(s1, s2) > 0)
> +			goto out;
> +	}
> +
> +	btf->nr_types_sorted = nr_names;
> +out:
> +	return 0;
> +}
> +
>  BTF_ID_LIST(bpf_ctx_convert_btf_id)
>  BTF_ID(struct, bpf_ctx_convert)
>  
> @@ -6029,6 +6159,10 @@ struct btf *btf_parse_vmlinux(void)
>  	if (err)
>  		goto errout;
>  
> +	err = btf_check_sort(btf, 1);

Why `1`?

> +	if (err)
> +		goto errout;
> +
>  	/* btf_parse_vmlinux() runs under bpf_verifier_lock */
>  	bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
>  
> @@ -6111,6 +6245,10 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
>  	if (err)
>  		goto errout;
>  
> +	err = btf_check_sort(btf, btf_nr_types(base_btf));
> +	if (err)
> +		goto errout;
> +
>  	btf_verifier_env_free(env);
>  	refcount_set(&btf->refcnt, 1);
>  	return btf;
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 2d0840ef599a..93c1ab677bfa 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -1,6 +1,9 @@
>  // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>  /* Copyright (c) 2018 Facebook */
>  
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
>  #include <byteswap.h>
>  #include <endian.h>
>  #include <stdio.h>
> @@ -3072,6 +3075,7 @@ static int btf_dedup_ref_types(struct btf_dedup *d);
>  static int btf_dedup_resolve_fwds(struct btf_dedup *d);
>  static int btf_dedup_compact_types(struct btf_dedup *d);
>  static int btf_dedup_remap_types(struct btf_dedup *d);
> +static int btf_sort_type_by_name(struct btf *btf);
>  
>  /*
>   * Deduplicate BTF types and strings.
> @@ -3270,6 +3274,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
>  		pr_debug("btf_dedup_remap_types failed:%d\n", err);
>  		goto done;
>  	}
> +	err = btf_sort_type_by_name(btf);
> +	if (err < 0) {
> +		pr_debug("btf_sort_type_by_name failed:%d\n", err);
> +		goto done;
> +	}
>  
>  done:
>  	btf_dedup_free(d);
> @@ -5212,3 +5221,189 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
>  
>  	return 0;
>  }
> +
> +static int btf_compare_type_name(const void *a, const void *b, void *priv)
> +{
> +	struct btf *btf = (struct btf *)priv;
> +	__u32 ta = *(const __u32 *)a;
> +	__u32 tb = *(const __u32 *)b;
> +	struct btf_type *bta, *btb;
> +	const char *na, *nb;
> +
> +	bta = (struct btf_type *)(btf->types_data + ta);
> +	btb = (struct btf_type *)(btf->types_data + tb);
> +	na = btf__str_by_offset(btf, bta->name_off);
> +	nb = btf__str_by_offset(btf, btb->name_off);
> +
> +	return strcmp(na, nb);
> +}
> +
> +static int btf_compare_offs(const void *o1, const void *o2)
> +{
> +	__u32 *offs1 = (__u32 *)o1;
> +	__u32 *offs2 = (__u32 *)o2;
> +
> +	return *offs1 - *offs2;
> +}
> +
> +static inline __u32 btf_get_mapped_type(struct btf *btf, __u32 *maps, __u32 type)
> +{
> +	if (type < btf->start_id)
> +		return type;
> +	return maps[type - btf->start_id] + btf->start_id;
> +}
> +
> +/*
> + * Collect and move the btf_types with names to the start location, and
> + * sort them in ascending order by name, so we can use the binary search
> + * method.
> + */
> +static int btf_sort_type_by_name(struct btf *btf)
> +{
> +	struct btf_type *bt;
> +	__u32 *new_type_offs = NULL, *new_type_offs_noname = NULL;
> +	__u32 *maps = NULL, *found_offs;
> +	void *new_types_data = NULL, *loc_data;
> +	int i, j, k, type_cnt, ret = 0, type_size;
> +	__u32 data_size;
> +
> +	if (btf_ensure_modifiable(btf))
> +		return libbpf_err(-ENOMEM);
> +
> +	type_cnt = btf->nr_types;
> +	data_size = btf->type_offs_cap * sizeof(*new_type_offs);
> +
> +	maps = (__u32 *)malloc(type_cnt * sizeof(__u32));
> +	if (!maps) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	new_type_offs = (__u32 *)malloc(data_size);

nit:
new_type_offs = (__u32 *)calloc(btf->type_offs_cap, sizeof(*new_type_offs))

> +	if (!new_type_offs) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	new_type_offs_noname = (__u32 *)malloc(data_size);
> +	if (!new_type_offs_noname) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	new_types_data = malloc(btf->types_data_cap);
> +	if (!new_types_data) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	memset(new_type_offs, 0, data_size);

Then this memset is not required.

Thank you,
Donglin Peng June 9, 2024, 4:12 p.m. UTC | #2
On Sun, Jun 9, 2024 at 3:23 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi
>
> On Sat,  8 Jun 2024 07:08:35 -0700
> Donglin Peng <dolinux.peng@gmail.com> wrote:
>
> > Currently, we only use linear search to find a type id by name, which
> > has a time complexity of O(n). This change sorts the names of btf types
> > in ascending order and uses binary search, which has a time complexity
> > of O(log(n)). The idea was inspired by the following patch:
> >
> > 60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_name()").
> >
> > At present, this improvement only applies to searching in vmlinux's and
> > modules' BTFs.
> >
> > Another change is the search direction: we search the BTF first and
> > then its base; the type id of the first matching btf_type is returned.
> >
> > Here are the timing results for finding 87590 type ids by their names
> > in vmlinux's BTF.
> >
> > Before: 158426 ms
> > After:     114 ms
> >
> > The average lookup performance has improved more than 1000x in the above scenario.
>
> This looks like a great improvement! So searching for a function entry takes ~2us?

Yes, it is the average test result on an Ubuntu VM. However, if you are
specifically searching for a particular btf_type, it may take more time
due to cache misses.

The data is obtained from the following test code:

 /*
  * total: the number of btf_types
  * nr_names: the number of btf_types that have a name
  * tn[]: a pre-collected array of (name, kind) pairs
  */

 pr_info("Begin to test ... total: %d, nr_names: %d\n", total, nr_names);

 t0 = ktime_get_ns();
 for (i = 0; i < nr_names; i++) {
         if (btf_find_by_name_kind(btf, tn[i].name, tn[i].kind) <= 0) {
                 pr_err("Find failed: name: %s, kind: %d\n",
                        tn[i].name, tn[i].kind);
                 break;
         }

         /* avoid watchdog warnings during the slow linear-search run */
         if ((i & 0xfff) == 0)
                 touch_nmi_watchdog();
 }
 delta = ktime_get_ns() - t0;

 pr_info("Consume total: %llu ms,  Per btf_type: %llu ns\n",
         delta / NSEC_PER_MSEC, delta / nr_names);


Using binary search:
[   40.381095] Begin to test ... total: 155010, nr_names: 87596
[   40.493024] Consume total: 111 ms,  Per btf_type: 1275 ns

Using linear search:
[   56.464520] Begin to test ... total: 155003, nr_names: 87591
[  213.344919] Consume total: 156880 ms,  Per btf_type: 1791054 ns

> BTW, I have some comments below, but basically it looks good to me and
> it passed my ftracetest.

Thank you.

>
> Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>

I will include this in v4.

> >
> > Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
> > ---
> > Changes in RFC v3:
> >  - Sort the btf types during the build process in order to reduce memory usage
> >    and decrease boot time.
> >
> > RFC v2:
> >  - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@sangfor.com.cn
> > ---
> >  include/linux/btf.h |   1 +
> >  kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---
> >  tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 345 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index f9e56fd12a9f..1dc1000a7dc9 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -214,6 +214,7 @@ bool btf_is_kernel(const struct btf *btf);
> >  bool btf_is_module(const struct btf *btf);
> >  struct module *btf_try_get_module(const struct btf *btf);
> >  u32 btf_nr_types(const struct btf *btf);
> > +u32 btf_type_cnt(const struct btf *btf);
> >  bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
> >                          const struct btf_member *m,
> >                          u32 expected_offset, u32 expected_size);
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 821063660d9f..5b7b464204bf 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -262,6 +262,7 @@ struct btf {
> >       u32 data_size;
> >       refcount_t refcnt;
> >       u32 id;
> > +     u32 nr_types_sorted;
> >       struct rcu_head rcu;
> >       struct btf_kfunc_set_tab *kfunc_set_tab;
> >       struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
> > @@ -542,23 +543,102 @@ u32 btf_nr_types(const struct btf *btf)
> >       return total;
> >  }
> >
> > -s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
> > +u32 btf_type_cnt(const struct btf *btf)
> > +{
> > +     return btf->start_id + btf->nr_types;
> > +}
> > +
> > +static s32 btf_find_by_name_bsearch(const struct btf *btf, const char *name,
> > +                                 int *start, int *end)
> >  {
> >       const struct btf_type *t;
> > -     const char *tname;
> > -     u32 i, total;
> > +     const char *name_buf;
> > +     int low, low_start, mid, high, high_end;
> > +     int ret, start_id;
> > +
> > +     start_id = btf->base_btf ? btf->start_id : 1;
> > +     low_start = low = start_id;
> > +     high_end = high = start_id + btf->nr_types_sorted - 1;
> > +
> > +     while (low <= high) {
> > +             mid = low + (high - low) / 2;
> > +             t = btf_type_by_id(btf, mid);
> > +             name_buf = btf_name_by_offset(btf, t->name_off);
> > +             ret = strcmp(name, name_buf);
> > +             if (ret > 0)
> > +                     low = mid + 1;
> > +             else if (ret < 0)
> > +                     high = mid - 1;
> > +             else
> > +                     break;
> > +     }
> >
> > -     total = btf_nr_types(btf);
> > -     for (i = 1; i < total; i++) {
> > -             t = btf_type_by_id(btf, i);
> > -             if (BTF_INFO_KIND(t->info) != kind)
> > -                     continue;
> > +     if (low > high)
> > +             return -ESRCH;
>
> nit: -ENOENT ?

Good, I will update it in v4.

>
> >
> > -             tname = btf_name_by_offset(btf, t->name_off);
> > -             if (!strcmp(tname, name))
> > -                     return i;
> > +     if (start) {
> > +             low = mid;
> > +             while (low > low_start) {
> > +                     t = btf_type_by_id(btf, low-1);
> > +                     name_buf = btf_name_by_offset(btf, t->name_off);
> > +                     if (strcmp(name, name_buf))
> > +                             break;
> > +                     low--;
> > +             }
> > +             *start = low;
> >       }
> >
> > +     if (end) {
> > +             high = mid;
> > +             while (high < high_end) {
> > +                     t = btf_type_by_id(btf, high+1);
> > +                     name_buf = btf_name_by_offset(btf, t->name_off);
> > +                     if (strcmp(name, name_buf))
> > +                             break;
> > +                     high++;
> > +             }
> > +             *end = high;
> > +     }
> > +
> > +     return mid;
> > +}
> > +
> > +s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
> > +{
> > +     const struct btf_type *t;
> > +     const char *tname;
> > +     int start, end;
> > +     s32 id, total;
> > +
> > +     do {
> > +             if (btf->nr_types_sorted) {
> > +                     /* binary search */
> > +                     id = btf_find_by_name_bsearch(btf, name, &start, &end);
> > +                     if (id > 0) {
> > +                             while (start <= end) {
> > +                                     t = btf_type_by_id(btf, start);
> > +                                     if (BTF_INFO_KIND(t->info) == kind)
> > +                                             return start;
> > +                                     start++;
> > +                             }
> > +                     }
> > +             } else {
> > +                     /* linear search */
> > +                     total = btf_type_cnt(btf);
> > +                     for (id = btf->base_btf ? btf->start_id : 1;
> > +                             id < total; id++) {
> > +                             t = btf_type_by_id(btf, id);
> > +                             if (BTF_INFO_KIND(t->info) != kind)
> > +                                     continue;
> > +
> > +                             tname = btf_name_by_offset(btf, t->name_off);
> > +                             if (!strcmp(tname, name))
> > +                                     return id;
> > +                     }
> > +             }
> > +             btf = btf->base_btf;
> > +     } while (btf);
> > +
> >       return -ENOENT;
> >  }
> >
> > @@ -5979,6 +6059,56 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
> >       return kctx_type_id;
> >  }
> >
> > +static int btf_check_sort(struct btf *btf, int start_id)
> > +{
> > +     int i, n, nr_names = 0;
> > +
> > +     n = btf_nr_types(btf);
> > +     for (i = start_id; i < n; i++) {
> > +             const struct btf_type *t;
> > +             const char *name;
> > +
> > +             t = btf_type_by_id(btf, i);
> > +             if (!t)
> > +                     return -EINVAL;
> > +
> > +             name = btf_str_by_offset(btf, t->name_off);
> > +             if (!str_is_empty(name))
> > +                     nr_names++;
>
> else {
>         goto out;
> }
> ? (*)
>

No need for that. The purpose of the for loop here is to count the
number of btf_types with names. If the BTF file is sorted, the
btf_types with names will be placed at the beginning, while the
btf_types without names will be appended at the end.

> > +     }
> > +
> > +     if (nr_names < 3)
> > +             goto out;
>
> What does this `3` mean?

It is just my own opinion. Binary search does not offer any advantage
over linear search if there are only 1 or 2 btf_types with names.

>
> > +
> > +     for (i = 0; i < nr_names - 1; i++) {
> > +             const struct btf_type *t1, *t2;
> > +             const char *s1, *s2;
> > +
> > +             t1 = btf_type_by_id(btf, start_id + i);
> > +             if (!t1)
> > +                     return -EINVAL;
> > +
> > +             s1 = btf_str_by_offset(btf, t1->name_off);
> > +             if (str_is_empty(s1))
> > +                     goto out;
>
> Why not continue? This case is expected, or the previous block (*)
> must go `out` at that point.

The purpose of the for loop here is to verify whether the first
nr_names btf_types are sorted or not. If the BTF file is indeed
sorted, btf_types with names will be placed at the beginning, and the
number of btf_types with names is equal to nr_names. However, if a
btf_type without a name is found within the first nr_names btf_types,
it indicates that the BTF file is not sorted.

>
> > +
> > +             t2 = btf_type_by_id(btf, start_id + i + 1);
> > +             if (!t2)
> > +                     return -EINVAL;
> > +
> > +             s2 = btf_str_by_offset(btf, t2->name_off);
> > +             if (str_is_empty(s2))
> > +                     goto out;
>
> Ditto.
>

Ditto.

> > +
> > +             if (strcmp(s1, s2) > 0)
> > +                     goto out;
> > +     }
> > +
> > +     btf->nr_types_sorted = nr_names;
> > +out:
> > +     return 0;
> > +}
> > +
> >  BTF_ID_LIST(bpf_ctx_convert_btf_id)
> >  BTF_ID(struct, bpf_ctx_convert)
> >
> > @@ -6029,6 +6159,10 @@ struct btf *btf_parse_vmlinux(void)
> >       if (err)
> >               goto errout;
> >
> > +     err = btf_check_sort(btf, 1);
>
> Why `1`?

The start ID for the sorted btf_types in vmlinux's BTF is 1, as the
btf_void is placed before the sorted types.
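
I.e. the layout btf_check_sort() expects for vmlinux's BTF is roughly:

  id 0               void (btf_void)
  id 1..nr_names     named types, sorted by name in ascending order
  id nr_names+1..    types without a name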

>
> > +     if (err)
> > +             goto errout;
> > +
> >       /* btf_parse_vmlinux() runs under bpf_verifier_lock */
> >       bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
> >
> > @@ -6111,6 +6245,10 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
> >       if (err)
> >               goto errout;
> >
> > +     err = btf_check_sort(btf, btf_nr_types(base_btf));
> > +     if (err)
> > +             goto errout;
> > +
> >       btf_verifier_env_free(env);
> >       refcount_set(&btf->refcnt, 1);
> >       return btf;
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 2d0840ef599a..93c1ab677bfa 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -1,6 +1,9 @@
> >  // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> >  /* Copyright (c) 2018 Facebook */
> >
> > +#ifndef _GNU_SOURCE
> > +#define _GNU_SOURCE
> > +#endif
> >  #include <byteswap.h>
> >  #include <endian.h>
> >  #include <stdio.h>
> > @@ -3072,6 +3075,7 @@ static int btf_dedup_ref_types(struct btf_dedup *d);
> >  static int btf_dedup_resolve_fwds(struct btf_dedup *d);
> >  static int btf_dedup_compact_types(struct btf_dedup *d);
> >  static int btf_dedup_remap_types(struct btf_dedup *d);
> > +static int btf_sort_type_by_name(struct btf *btf);
> >
> >  /*
> >   * Deduplicate BTF types and strings.
> > @@ -3270,6 +3274,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
> >               pr_debug("btf_dedup_remap_types failed:%d\n", err);
> >               goto done;
> >       }
> > +     err = btf_sort_type_by_name(btf);
> > +     if (err < 0) {
> > +             pr_debug("btf_sort_type_by_name failed:%d\n", err);
> > +             goto done;
> > +     }
> >
> >  done:
> >       btf_dedup_free(d);
> > @@ -5212,3 +5221,189 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
> >
> >       return 0;
> >  }
> > +
> > +static int btf_compare_type_name(const void *a, const void *b, void *priv)
> > +{
> > +     struct btf *btf = (struct btf *)priv;
> > +     __u32 ta = *(const __u32 *)a;
> > +     __u32 tb = *(const __u32 *)b;
> > +     struct btf_type *bta, *btb;
> > +     const char *na, *nb;
> > +
> > +     bta = (struct btf_type *)(btf->types_data + ta);
> > +     btb = (struct btf_type *)(btf->types_data + tb);
> > +     na = btf__str_by_offset(btf, bta->name_off);
> > +     nb = btf__str_by_offset(btf, btb->name_off);
> > +
> > +     return strcmp(na, nb);
> > +}
> > +
> > +static int btf_compare_offs(const void *o1, const void *o2)
> > +{
> > +     __u32 *offs1 = (__u32 *)o1;
> > +     __u32 *offs2 = (__u32 *)o2;
> > +
> > +     return *offs1 - *offs2;
> > +}
> > +
> > +static inline __u32 btf_get_mapped_type(struct btf *btf, __u32 *maps, __u32 type)
> > +{
> > +     if (type < btf->start_id)
> > +             return type;
> > +     return maps[type - btf->start_id] + btf->start_id;
> > +}
> > +
> > +/*
> > + * Collect and move the btf_types with names to the start location, and
> > + * sort them in ascending order by name, so we can use the binary search
> > + * method.
> > + */
> > +static int btf_sort_type_by_name(struct btf *btf)
> > +{
> > +     struct btf_type *bt;
> > +     __u32 *new_type_offs = NULL, *new_type_offs_noname = NULL;
> > +     __u32 *maps = NULL, *found_offs;
> > +     void *new_types_data = NULL, *loc_data;
> > +     int i, j, k, type_cnt, ret = 0, type_size;
> > +     __u32 data_size;
> > +
> > +     if (btf_ensure_modifiable(btf))
> > +             return libbpf_err(-ENOMEM);
> > +
> > +     type_cnt = btf->nr_types;
> > +     data_size = btf->type_offs_cap * sizeof(*new_type_offs);
> > +
> > +     maps = (__u32 *)malloc(type_cnt * sizeof(__u32));
> > +     if (!maps) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
> > +
> > +     new_type_offs = (__u32 *)malloc(data_size);
>
> nit:
> new_type_offs = (__u32 *)calloc(btf->type_offs_cap, sizeof(*new_type_offs))

Okay, I will update it in v4.

>
> > +     if (!new_type_offs) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
> > +
> > +     new_type_offs_noname = (__u32 *)malloc(data_size);
> > +     if (!new_type_offs_noname) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
> > +
> > +     new_types_data = malloc(btf->types_data_cap);
> > +     if (!new_types_data) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
> > +
> > +     memset(new_type_offs, 0, data_size);
>
> Then this memset is not required.

Good, I will update it in v4.

>
> Thank you,
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
Eduard Zingerman June 11, 2024, 10:13 a.m. UTC | #3
On Sat, 2024-06-08 at 07:08 -0700, Donglin Peng wrote:

[...]

> Changes in RFC v3:
>  - Sort the btf types during the build process in order to reduce memory usage
>    and decrease boot time.
> 
> RFC v2:
>  - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@sangfor.com.cn
> ---
>  include/linux/btf.h |   1 +
>  kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---

I think the kernel part is in good shape;
please split it out as a separate commit.

>  tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 345 insertions(+), 11 deletions(-)

[...]

> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 2d0840ef599a..93c1ab677bfa 100644

I'm not sure that libbpf is the best place to put this functionality,
as there might be different kinds of orderings
(e.g. see a fresh commit to bpftool to output stable vmlinux.h:
 94133cf24bb3 "bpftool: Introduce btf c dump sorting").

I'm curious what Andrii, Alan and Arnaldo think on libbpf vs pahole
for this feature.

Also, I have a selftests build failure with this patch-set
(and I suspect that a bunch of dedup test cases would need an update):

$ pwd
/home/eddy/work/bpf-next/tools/testing/selftests/bpf
$ make -j14 test_progs
...

  GEN-SKEL [test_progs] access_map_in_map.skel.h
Binary files /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked2.o and /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked3.o differ
make: *** [Makefile:658: /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.skel.h] Error 1
make: *** Waiting for unfinished jobs....

If this change remains in libbpf, I think it would be great to update
btf_find_by_name_kind() to work the same way as the kernel one.

> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c

[...]

> +static int btf_sort_type_by_name(struct btf *btf)
> +{
> +	struct btf_type *bt;
> +	__u32 *new_type_offs = NULL, *new_type_offs_noname = NULL;
> +	__u32 *maps = NULL, *found_offs;
> +	void *new_types_data = NULL, *loc_data;
> +	int i, j, k, type_cnt, ret = 0, type_size;
> +	__u32 data_size;
> +
> +	if (btf_ensure_modifiable(btf))
> +		return libbpf_err(-ENOMEM);
> +
> +	type_cnt = btf->nr_types;
> +	data_size = btf->type_offs_cap * sizeof(*new_type_offs);
> +
> +	maps = (__u32 *)malloc(type_cnt * sizeof(__u32));
> +	if (!maps) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	new_type_offs = (__u32 *)malloc(data_size);
> +	if (!new_type_offs) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	new_type_offs_noname = (__u32 *)malloc(data_size);
> +	if (!new_type_offs_noname) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}

What is the point of separating offsets in new_type_offs vs
new_type_offs_noname? It should be possible to use a single offsets
array and have a comparison function that puts all named types before
unnamed.
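
E.g. a rough, untested sketch (keeping the patch's naming and the
qsort_r() callback convention it already uses):

static int btf_compare_type_name(const void *a, const void *b, void *priv)
{
	struct btf *btf = (struct btf *)priv;
	__u32 ta = *(const __u32 *)a;
	__u32 tb = *(const __u32 *)b;
	struct btf_type *bta, *btb;
	const char *na, *nb;
	int a_anon, b_anon;

	bta = (struct btf_type *)(btf->types_data + ta);
	btb = (struct btf_type *)(btf->types_data + tb);
	na = btf__str_by_offset(btf, bta->name_off);
	nb = btf__str_by_offset(btf, btb->name_off);
	a_anon = !na || !na[0];
	b_anon = !nb || !nb[0];

	/* all named types sort before all anonymous ones */
	if (a_anon != b_anon)
		return a_anon - b_anon;
	if (a_anon)
		return 0;
	return strcmp(na, nb);
}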

> +
> +	new_types_data = malloc(btf->types_data_cap);
> +	if (!new_types_data) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	memset(new_type_offs, 0, data_size);
> +
> +	for (i = 0, j = 0, k = 0; i < type_cnt; i++) {
> +		const char *name;
> +
> +		bt = (struct btf_type *)(btf->types_data + btf->type_offs[i]);
> +		name = btf__str_by_offset(btf, bt->name_off);
> +		if (!name || !name[0])
> +			new_type_offs_noname[k++] = btf->type_offs[i];
> +		else
> +			new_type_offs[j++] = btf->type_offs[i];
> +	}
> +
> +	memmove(new_type_offs + j, new_type_offs_noname, sizeof(__u32) * k);
> +
> +	qsort_r(new_type_offs, j, sizeof(*new_type_offs),
> +		btf_compare_type_name, btf);
> +
> +	for (i = 0; i < type_cnt; i++) {
> +		found_offs = bsearch(&new_type_offs[i], btf->type_offs, type_cnt,
> +					sizeof(__u32), btf_compare_offs);
> +		if (!found_offs) {
> +			ret = -EINVAL;
> +			goto err_out;
> +		}
> +		maps[found_offs - btf->type_offs] = i;
> +	}
> +
> +	loc_data = new_types_data;
> +	for (i = 0; i < type_cnt; i++, loc_data += type_size) {
> +		bt = (struct btf_type *)(btf->types_data + new_type_offs[i]);
> +		type_size = btf_type_size(bt);
> +		if (type_size < 0) {
> +			ret = type_size;
> +			goto err_out;
> +		}
> +
> +		memcpy(loc_data, bt, type_size);
> +
> +		bt = (struct btf_type *)loc_data;
> +		switch (btf_kind(bt)) {

Please take a look at btf_dedup_remap_types(): it uses newly added
iterator interface to enumerate all ID references in the type.
It could be used here to avoid enumerating every BTF kind.
Also, the d->hypot_map could be used instead of `maps`.
And if so, I think that it should be possible to put this pass before
btf_dedup_remap_types() in order for it to do the remapping.
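
Roughly, an untested sketch using btf_type_visit_type_ids() from
libbpf_internal.h (the remap_ctx struct is made up for illustration):

struct remap_ctx {
	struct btf *btf;
	__u32 *maps;
};

static int btf_sort_remap_type_id(__u32 *type_id, void *ctx)
{
	struct remap_ctx *c = ctx;

	*type_id = btf_get_mapped_type(c->btf, c->maps, *type_id);
	return 0;
}

/* ... then, for each copied type, instead of the per-kind switch: */
struct remap_ctx ctx = { .btf = btf, .maps = maps };

err = btf_type_visit_type_ids(bt, btf_sort_remap_type_id, &ctx);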

Alternatively, it might make sense to merge this pass with
btf_dedup_compact_types() in order to minimize number of operations,
e.g. as in my crude attempt:
https://github.com/eddyz87/bpf/tree/binsort-btf-dedup
(fails with similar selftests issue).

> +		case BTF_KIND_PTR:
> +		case BTF_KIND_CONST:
> +		case BTF_KIND_VOLATILE:
> +		case BTF_KIND_RESTRICT:
> +		case BTF_KIND_TYPEDEF:
> +		case BTF_KIND_TYPE_TAG:
> +		case BTF_KIND_FUNC:
> +		case BTF_KIND_VAR:
> +		case BTF_KIND_DECL_TAG:
> +			bt->type = btf_get_mapped_type(btf, maps, bt->type);
> +			break;

[...]
Donglin Peng June 15, 2024, 11:49 a.m. UTC | #4
On Tue, Jun 11, 2024 at 6:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Sat, 2024-06-08 at 07:08 -0700, Donglin Peng wrote:
>
> [...]
>
> > Changes in RFC v3:
> >  - Sort the btf types during the build process in order to reduce memory usage
> >    and decrease boot time.
> >
> > RFC v2:
> >  - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@sangfor.com.cn
> > ---
> >  include/linux/btf.h |   1 +
> >  kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---
>
> I think the kernel part is in good shape;
> please split it out as a separate commit.

Okay, thanks.

>
> >  tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 345 insertions(+), 11 deletions(-)
>
> [...]
>
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 2d0840ef599a..93c1ab677bfa 100644
>
> I'm not sure that libbpf is the best place to put this functionality,
> as there might be different kinds of orderings
> (e.g. see a fresh commit to bpftool to output stable vmlinux.h:
>  94133cf24bb3 "bpftool: Introduce btf c dump sorting").

Thanks, I think it would be better to put it into libbpf. However, I
would also like to hear the opinions of others.

>
> I'm curious what Andrii, Alan and Arnaldo think on libbpf vs pahole
> for this feature.
>
> Also, I have a selftests build failure with this patch-set
> (and I suspect that a bunch of dedup test cases would need an update):

I apologize for the bug in my patch that caused the issue. I will fix it.

>
> $ pwd
> /home/eddy/work/bpf-next/tools/testing/selftests/bpf
> $ make -j14 test_progs
> ...
>
>   GEN-SKEL [test_progs] access_map_in_map.skel.h
> Binary files /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked2.o and /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked3.o differ
> make: *** [Makefile:658: /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.skel.h] Error 1
> make: *** Waiting for unfinished jobs....

Sorry, I neglected to perform an ID remap for the btf_types in the BTF.ext
section. I will fix it.

>
> If this change remains in libbpf, I think it would be great to update
> btf_find_by_name_kind() to work the same way as the kernel one.

Sounds good, we might do it later.

>
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
>
> [...]
>
> > +static int btf_sort_type_by_name(struct btf *btf)
> > +{
> > +     struct btf_type *bt;
> > +     __u32 *new_type_offs = NULL, *new_type_offs_noname = NULL;
> > +     __u32 *maps = NULL, *found_offs;
> > +     void *new_types_data = NULL, *loc_data;
> > +     int i, j, k, type_cnt, ret = 0, type_size;
> > +     __u32 data_size;
> > +
> > +     if (btf_ensure_modifiable(btf))
> > +             return libbpf_err(-ENOMEM);
> > +
> > +     type_cnt = btf->nr_types;
> > +     data_size = btf->type_offs_cap * sizeof(*new_type_offs);
> > +
> > +     maps = (__u32 *)malloc(type_cnt * sizeof(__u32));
> > +     if (!maps) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
> > +
> > +     new_type_offs = (__u32 *)malloc(data_size);
> > +     if (!new_type_offs) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
> > +
> > +     new_type_offs_noname = (__u32 *)malloc(data_size);
> > +     if (!new_type_offs_noname) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
>
> What is the point of separating offsets in new_type_offs vs
> new_type_offs_noname? It should be possible to use a single offsets
> array and have a comparison function that puts all named types before
> unnamed.

Great, you are right.

>
> > +
> > +     new_types_data = malloc(btf->types_data_cap);
> > +     if (!new_types_data) {
> > +             ret = -ENOMEM;
> > +             goto err_out;
> > +     }
> > +
> > +     memset(new_type_offs, 0, data_size);
> > +
> > +     for (i = 0, j = 0, k = 0; i < type_cnt; i++) {
> > +             const char *name;
> > +
> > +             bt = (struct btf_type *)(btf->types_data + btf->type_offs[i]);
> > +             name = btf__str_by_offset(btf, bt->name_off);
> > +             if (!name || !name[0])
> > +                     new_type_offs_noname[k++] = btf->type_offs[i];
> > +             else
> > +                     new_type_offs[j++] = btf->type_offs[i];
> > +     }
> > +
> > +     memmove(new_type_offs + j, new_type_offs_noname, sizeof(__u32) * k);
> > +
> > +     qsort_r(new_type_offs, j, sizeof(*new_type_offs),
> > +             btf_compare_type_name, btf);
> > +
> > +     for (i = 0; i < type_cnt; i++) {
> > +             found_offs = bsearch(&new_type_offs[i], btf->type_offs, type_cnt,
> > +                                     sizeof(__u32), btf_compare_offs);
> > +             if (!found_offs) {
> > +                     ret = -EINVAL;
> > +                     goto err_out;
> > +             }
> > +             maps[found_offs - btf->type_offs] = i;
> > +     }
> > +
> > +     loc_data = new_types_data;
> > +     for (i = 0; i < type_cnt; i++, loc_data += type_size) {
> > +             bt = (struct btf_type *)(btf->types_data + new_type_offs[i]);
> > +             type_size = btf_type_size(bt);
> > +             if (type_size < 0) {
> > +                     ret = type_size;
> > +                     goto err_out;
> > +             }
> > +
> > +             memcpy(loc_data, bt, type_size);
> > +
> > +             bt = (struct btf_type *)loc_data;
> > +             switch (btf_kind(bt)) {
>
> Please take a look at btf_dedup_remap_types(): it uses newly added
> iterator interface to enumerate all ID references in the type.
> It could be used here to avoid enumerating every BTF kind.
> Also, the d->hypot_map could be used instead of `maps`.
> And if so, I think that it should be possible to put this pass before
> btf_dedup_remap_types() in order for it to do the remapping.

Thank you. I will revise the code.

>
> Alternatively, it might make sense to merge this pass with
> btf_dedup_compact_types() in order to minimize number of operations,
> e.g. as in my crude attempt:
> https://github.com/eddyz87/bpf/tree/binsort-btf-dedup

Thank you. I will refer to your patch.

> (fails with similar selftests issue).

In addition to the bug in my patch, I have also identified a bug in
linker_fixup_btf in libbpf. After resolving that issue, the selftests
passed successfully, and I will create a new patch to address the bug.

>
> > +             case BTF_KIND_PTR:
> > +             case BTF_KIND_CONST:
> > +             case BTF_KIND_VOLATILE:
> > +             case BTF_KIND_RESTRICT:
> > +             case BTF_KIND_TYPEDEF:
> > +             case BTF_KIND_TYPE_TAG:
> > +             case BTF_KIND_FUNC:
> > +             case BTF_KIND_VAR:
> > +             case BTF_KIND_DECL_TAG:
> > +                     bt->type = btf_get_mapped_type(btf, maps, bt->type);
> > +                     break;
>
> [...]
Donglin Peng June 15, 2024, 2:59 p.m. UTC | #5
On Sat, Jun 15, 2024 at 7:49 PM Donglin Peng <dolinux.peng@gmail.com> wrote:
>
> On Tue, Jun 11, 2024 at 6:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Sat, 2024-06-08 at 07:08 -0700, Donglin Peng wrote:
> >
> > [...]
> >
> > > Changes in RFC v3:
> > >  - Sort the btf types during the build process in order to reduce memory usage
> > >    and decrease boot time.
> > >
> > > RFC v2:
> > >  - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@sangfor.com.cn
> > > ---
> > >  include/linux/btf.h |   1 +
> > >  kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---
> >
> > I think the kernel part is in good shape;
> > please split it out as a separate commit.
>
> Okay, thanks.
>
> >
> > >  tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 345 insertions(+), 11 deletions(-)
> >
> > [...]
> >
> > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > index 2d0840ef599a..93c1ab677bfa 100644
> >
> > I'm not sure that libbpf is the best place to put this functionality,
> > as there might be different kinds of orderings
> > (e.g. see a fresh commit to bpftool to output stable vmlinux.h:
> >  94133cf24bb3 "bpftool: Introduce btf c dump sorting").
>
> Thanks, I think it would be better to put it into libbpf. However, I
> would also like to hear the opinions of others.
>
> >
> > I'm curious what Andrii, Alan and Arnaldo think on libbpf vs pahole
> > for this feature.
> >
> > Also, I have a selftests build failure with this patch-set
> > (and I suspect that a bunch of dedup test cases would need an update):

Yes, many test cases need to be updated as the BTF layout is modified
unconditionally.

>
> I apologize for the bug in my patch that caused the issue. I will fix it.
>
> >
> > $ pwd
> > /home/eddy/work/bpf-next/tools/testing/selftests/bpf
> > $ make -j14 test_progs
> > ...
> >
> >   GEN-SKEL [test_progs] access_map_in_map.skel.h
> > Binary files /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked2.o and /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.bpf.linked3.o differ
> > make: *** [Makefile:658: /home/eddy/work/bpf-next/tools/testing/selftests/bpf/access_map_in_map.skel.h] Error 1
> > make: *** Waiting for unfinished jobs....
>
> Sorry, I neglected to perform an ID remap for the btf_types in the BTF.ext
> section. I will fix it.
>
> >
> > If this change remains in libbpf, I think it would be great to update
> > btf_find_by_name_kind() to work the same way as the kernel one.
>
> Sounds good, we might do it later.
>
> >
> > > --- a/tools/lib/bpf/btf.c
> > > +++ b/tools/lib/bpf/btf.c
> >
> > [...]
> >
> > > +static int btf_sort_type_by_name(struct btf *btf)
> > > +{
> > > +     struct btf_type *bt;
> > > +     __u32 *new_type_offs = NULL, *new_type_offs_noname = NULL;
> > > +     __u32 *maps = NULL, *found_offs;
> > > +     void *new_types_data = NULL, *loc_data;
> > > +     int i, j, k, type_cnt, ret = 0, type_size;
> > > +     __u32 data_size;
> > > +
> > > +     if (btf_ensure_modifiable(btf))
> > > +             return libbpf_err(-ENOMEM);
> > > +
> > > +     type_cnt = btf->nr_types;
> > > +     data_size = btf->type_offs_cap * sizeof(*new_type_offs);
> > > +
> > > +     maps = (__u32 *)malloc(type_cnt * sizeof(__u32));
> > > +     if (!maps) {
> > > +             ret = -ENOMEM;
> > > +             goto err_out;
> > > +     }
> > > +
> > > +     new_type_offs = (__u32 *)malloc(data_size);
> > > +     if (!new_type_offs) {
> > > +             ret = -ENOMEM;
> > > +             goto err_out;
> > > +     }
> > > +
> > > +     new_type_offs_noname = (__u32 *)malloc(data_size);
> > > +     if (!new_type_offs_noname) {
> > > +             ret = -ENOMEM;
> > > +             goto err_out;
> > > +     }
> >
> > What is the point of separating offsets in new_type_offs vs
> > new_type_offs_noname? It should be possible to use a single offsets
> > array and have a comparison function that puts all named types before
> > unnamed.
>
> Great, you are right.
>
> >
> > > +
> > > +     new_types_data = malloc(btf->types_data_cap);
> > > +     if (!new_types_data) {
> > > +             ret = -ENOMEM;
> > > +             goto err_out;
> > > +     }
> > > +
> > > +     memset(new_type_offs, 0, data_size);
> > > +
> > > +     for (i = 0, j = 0, k = 0; i < type_cnt; i++) {
> > > +             const char *name;
> > > +
> > > +             bt = (struct btf_type *)(btf->types_data + btf->type_offs[i]);
> > > +             name = btf__str_by_offset(btf, bt->name_off);
> > > +             if (!name || !name[0])
> > > +                     new_type_offs_noname[k++] = btf->type_offs[i];
> > > +             else
> > > +                     new_type_offs[j++] = btf->type_offs[i];
> > > +     }
> > > +
> > > +     memmove(new_type_offs + j, new_type_offs_noname, sizeof(__u32) * k);
> > > +
> > > +     qsort_r(new_type_offs, j, sizeof(*new_type_offs),
> > > +             btf_compare_type_name, btf);
> > > +
> > > +     for (i = 0; i < type_cnt; i++) {
> > > +             found_offs = bsearch(&new_type_offs[i], btf->type_offs, type_cnt,
> > > +                                     sizeof(__u32), btf_compare_offs);
> > > +             if (!found_offs) {
> > > +                     ret = -EINVAL;
> > > +                     goto err_out;
> > > +             }
> > > +             maps[found_offs - btf->type_offs] = i;
> > > +     }
> > > +
> > > +     loc_data = new_types_data;
> > > +     for (i = 0; i < type_cnt; i++, loc_data += type_size) {
> > > +             bt = (struct btf_type *)(btf->types_data + new_type_offs[i]);
> > > +             type_size = btf_type_size(bt);
> > > +             if (type_size < 0) {
> > > +                     ret = type_size;
> > > +                     goto err_out;
> > > +             }
> > > +
> > > +             memcpy(loc_data, bt, type_size);
> > > +
> > > +             bt = (struct btf_type *)loc_data;
> > > +             switch (btf_kind(bt)) {
> >
> > Please take a look at btf_dedup_remap_types(): it uses newly added
> > iterator interface to enumerate all ID references in the type.
> > It could be used here to avoid enumerating every BTF kind.
> > Also, the d->hypot_map could be used instead of `maps`.
> > And if so, I think that it should be possible to put this pass before
> > btf_dedup_remap_types() in order for it to do the remapping.
>
> Thank you. I will revise the code.
>
> >
> > Alternatively, it might make sense to merge this pass with
> > btf_dedup_compact_types() in order to minimize number of operations,
> > e.g. as in my crude attempt:
> > https://github.com/eddyz87/bpf/tree/binsort-btf-dedup

Could you please provide me with the patch?

>
> Thank you. I will refer to your patch.
>
> > (fails with similar selftests issue).
>
> In addition to the bug in my patch, I have also identified a bug in
> linker_fixup_btf in libbpf. After resolving that issue, the selftests
> passed successfully, and I will create a new patch to address the bug.

After fixing the bug, the "make test_progs" command passes
successfully. However, the dedup test cases are still failing.

>
> >
> > > +             case BTF_KIND_PTR:
> > > +             case BTF_KIND_CONST:
> > > +             case BTF_KIND_VOLATILE:
> > > +             case BTF_KIND_RESTRICT:
> > > +             case BTF_KIND_TYPEDEF:
> > > +             case BTF_KIND_TYPE_TAG:
> > > +             case BTF_KIND_FUNC:
> > > +             case BTF_KIND_VAR:
> > > +             case BTF_KIND_DECL_TAG:
> > > +                     bt->type = btf_get_mapped_type(btf, maps, bt->type);
> > > +                     break;
> >
> > [...]
Alan Maguire June 17, 2024, 2:18 p.m. UTC | #6
On 15/06/2024 15:59, Donglin Peng wrote:
> On Sat, Jun 15, 2024 at 7:49 PM Donglin Peng <dolinux.peng@gmail.com> wrote:
>>
>> On Tue, Jun 11, 2024 at 6:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>
>>> On Sat, 2024-06-08 at 07:08 -0700, Donglin Peng wrote:
>>>
>>> [...]
>>>
>>>> Changes in RFC v3:
>>>>  - Sort the btf types during the build process in order to reduce memory usage
>>>>    and decrease boot time.
>>>>
>>>> RFC v2:
>>>>  - https://lore.kernel.org/all/20230909091646.420163-1-pengdonglin@sangfor.com.cn
>>>> ---
>>>>  include/linux/btf.h |   1 +
>>>>  kernel/bpf/btf.c    | 160 +++++++++++++++++++++++++++++++++---
>>>
>>> I think the kernel part is in good shape;
>>> please split it out as a separate commit.
>>
>> Okay, thanks.
>>
>>>
>>>>  tools/lib/bpf/btf.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 345 insertions(+), 11 deletions(-)
>>>
>>> [...]
>>>
>>>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
>>>> index 2d0840ef599a..93c1ab677bfa 100644
>>>
>>> I'm not sure that libbpf is the best place to put this functionality,
>>> as there might be different kinds of orderings
>>> (e.g. see a fresh commit to bpftool to output stable vmlinux.h:
>>>  94133cf24bb3 "bpftool: Introduce btf c dump sorting").
>>
>> Thanks, I think it would be better to put it into libbpf. However, I
>> would also like to hear the opinions of others.
>>
>>>
>>> I'm curious what Andrii, Alan and Arnaldo think on libbpf vs pahole
>>> for this feature.
>>>
>>> Also, I have a selftests build failure with this patch-set
>>> (and I suspect that a bunch of dedup test cases would need an update):
> 
> Yes, many test cases need to be updated as the BTF layout is modified
> unconditionally.
>

If the plan is to fold the sorting into dedup, pahole will inherit it by
default I suppose. Would it be worth making sorting optional (or at
least providing a way to switch it off) via a dedup_opts option? If we
had an on/off switch we could control sorting via a --btf_features
option to pahole.
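
Something along these lines on the pahole/libbpf boundary, say (the
sort_by_name field is hypothetical, purely to illustrate the knob):

LIBBPF_OPTS(btf_dedup_opts, opts,
	.btf_ext = btf_ext,
	.sort_by_name = false,	/* hypothetical knob, not in libbpf today */
);
err = btf__dedup(btf, &opts);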

One thing we lose with sorting is that currently the base and often-used
types tend to cluster at initial BTF ids, so in some cases linear
searches find what they're looking for pretty quickly. Would it be worth
maintaining a name-sorted index for BTF perhaps? That would mean not
changing type id order (so linear search is unaffected), but for
btf_find_by_name_kind() searches the index could be used.
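
Rough sketch of the index idea (all names made up):

/* side index kept next to the type data; type ID order is unchanged */
struct btf_name_idx {
	__u32 cnt;
	__u32 ids[];	/* type IDs, sorted by the name each name_off points at */
};

btf_find_by_name_kind() would then bsearch() ids[] and scan the
neighbouring equal-name entries for the requested kind, much like
btf_find_by_name_bsearch() in this patch does over name-sorted IDs.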

See the btf_relocate.c code at [1] for an example of this where a
name-based sort index is constructed for the smaller distilled base BTF.

[1]
https://lore.kernel.org/bpf/20240613095014.357981-4-alan.maguire@oracle.com/
Eduard Zingerman June 17, 2024, 10:27 p.m. UTC | #7
On Mon, 2024-06-17 at 15:18 +0100, Alan Maguire wrote:

[...]

> If the plan is to fold the sorting into dedup, pahole will inherit it by
> default I suppose. Would it be worth making sorting optional (or at
> least providing a way to switch it off) via a dedup_opts option? If we
> had an on/off switch we could control sorting via a --btf_features
> option to pahole.

I'd avoid adding more flags if not absolutely necessary.

> One thing we lose with sorting is that currently the base and often-used
> types tend to cluster at initial BTF ids, so in some cases linear
> searches find what they're looking for pretty quickly. Would it be worth
> maintaining a name-sorted index for BTF perhaps? That would mean not
> changing type id order (so linear search is unaffected), but for
> btf_find_by_name_kind() searches the index could be used.

I instrumented the kernel code as follows:

--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -555,10 +555,13 @@ s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
                        continue;
 
                tname = btf_name_by_offset(btf, t->name_off);
-               if (!strcmp(tname, name))
+               if (!strcmp(tname, name)) {
+                       printk("btf_find_by_name_kind: kind=%d, name='%s', id=%d\n", kind, name, i);
                        return i;
+               }
        }
 
+       printk("btf_find_by_name_kind: kind=%d, name='%s', id=-1\n", kind, name);
        return -ENOENT;
 }

Then I analyzed the log generated when test_progs is run.
The summary of results is as follows:

| # of lookups | kind | name                | id     |
|--------------+------+---------------------+--------|
|         3480 |    4 | bpf_refcount        |     -1 |
|         3439 |    4 | bpf_rb_root         |     -1 |
|         3434 |    4 | bpf_rb_node         |     -1 |
|         3340 |    4 | bpf_list_head       |     -1 |
|         3339 |    4 | bpf_list_node       |     -1 |
|         3165 |    4 | bpf_spin_lock       |     -1 |
|          759 |    4 | foo                 |     30 |
|          659 |    4 | bpf_cpumask         |  65569 |
|          194 |    4 | prog_test_ref_kfunc |  29619 |
|          146 |    4 | bpf_spin_lock       |      8 |
|          123 |    4 | bpf_list_node       |     31 |
|          123 |    4 | bpf_list_head       |     11 |
|          116 |    4 | bar                 |     38 |
|      ...  59 rows, 10<N<100 lookups each ...       |
|      ... 227 rows,    N<10  lookups each ...       |

(24680 lookups in total)
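
(Kind 4 in the table is BTF_KIND_STRUCT; a typical lookup producing
these log lines would be, illustratively:

	s32 id = btf_find_by_name_kind(btf, "bpf_spin_lock", BTF_KIND_STRUCT);

where a negative result means the type was not found.)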

I'd say that 'bpf_spin_lock', 'bpf_list_node' and 'bpf_list_head'
could be considered types that the linear search would find quickly.
Their total share (146 + 123 + 123 = 392 out of 24680) is ~1.6% of all
lookups. I don't think we should add a special case for such a low
number of "hot" cases.
Also, the share of -1 results is surprising.

[...]

Patch

diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9e56fd12a9f..1dc1000a7dc9 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -214,6 +214,7 @@  bool btf_is_kernel(const struct btf *btf);
 bool btf_is_module(const struct btf *btf);
 struct module *btf_try_get_module(const struct btf *btf);
 u32 btf_nr_types(const struct btf *btf);
+u32 btf_type_cnt(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   const struct btf_member *m,
 			   u32 expected_offset, u32 expected_size);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 821063660d9f..5b7b464204bf 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -262,6 +262,7 @@  struct btf {
 	u32 data_size;
 	refcount_t refcnt;
 	u32 id;
+	u32 nr_types_sorted;
 	struct rcu_head rcu;
 	struct btf_kfunc_set_tab *kfunc_set_tab;
 	struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
@@ -542,23 +543,102 @@  u32 btf_nr_types(const struct btf *btf)
 	return total;
 }
 
-s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
+u32 btf_type_cnt(const struct btf *btf)
+{
+	return btf->start_id + btf->nr_types;
+}
+
+static s32 btf_find_by_name_bsearch(const struct btf *btf, const char *name,
+				    int *start, int *end)
 {
 	const struct btf_type *t;
-	const char *tname;
-	u32 i, total;
+	const char *name_buf;
+	int low, low_start, mid, high, high_end;
+	int ret, start_id;
+
+	start_id = btf->base_btf ? btf->start_id : 1;
+	low_start = low = start_id;
+	high_end = high = start_id + btf->nr_types_sorted - 1;
+
+	while (low <= high) {
+		mid = low + (high - low) / 2;
+		t = btf_type_by_id(btf, mid);
+		name_buf = btf_name_by_offset(btf, t->name_off);
+		ret = strcmp(name, name_buf);
+		if (ret > 0)
+			low = mid + 1;
+		else if (ret < 0)
+			high = mid - 1;
+		else
+			break;
+	}
 
-	total = btf_nr_types(btf);
-	for (i = 1; i < total; i++) {
-		t = btf_type_by_id(btf, i);
-		if (BTF_INFO_KIND(t->info) != kind)
-			continue;
+	if (low > high)
+		return -ESRCH;
 
-		tname = btf_name_by_offset(btf, t->name_off);
-		if (!strcmp(tname, name))
-			return i;
+	if (start) {
+		low = mid;
+		while (low > low_start) {
+			t = btf_type_by_id(btf, low-1);
+			name_buf = btf_name_by_offset(btf, t->name_off);
+			if (strcmp(name, name_buf))
+				break;
+			low--;
+		}
+		*start = low;
 	}
 
+	if (end) {
+		high = mid;
+		while (high < high_end) {
+			t = btf_type_by_id(btf, high+1);
+			name_buf = btf_name_by_offset(btf, t->name_off);
+			if (strcmp(name, name_buf))
+				break;
+			high++;
+		}
+		*end = high;
+	}
+
+	return mid;
+}
+
+s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
+{
+	const struct btf_type *t;
+	const char *tname;
+	int start, end;
+	s32 id, total;
+
+	do {
+		if (btf->nr_types_sorted) {
+			/* binary search */
+			id = btf_find_by_name_bsearch(btf, name, &start, &end);
+			if (id > 0) {
+				while (start <= end) {
+					t = btf_type_by_id(btf, start);
+					if (BTF_INFO_KIND(t->info) == kind)
+						return start;
+					start++;
+				}
+			}
+		} else {
+			/* linear search */
+			total = btf_type_cnt(btf);
+			for (id = btf->base_btf ? btf->start_id : 1;
+				id < total; id++) {
+				t = btf_type_by_id(btf, id);
+				if (BTF_INFO_KIND(t->info) != kind)
+					continue;
+
+				tname = btf_name_by_offset(btf, t->name_off);
+				if (!strcmp(tname, name))
+					return id;
+			}
+		}
+		btf = btf->base_btf;
+	} while (btf);
+
 	return -ENOENT;
 }
 
@@ -5979,6 +6059,56 @@  int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
 	return kctx_type_id;
 }
 
+static int btf_check_sort(struct btf *btf, int start_id)
+{
+	int i, n, nr_names = 0;
+
+	n = btf_nr_types(btf);
+	for (i = start_id; i < n; i++) {
+		const struct btf_type *t;
+		const char *name;
+
+		t = btf_type_by_id(btf, i);
+		if (!t)
+			return -EINVAL;
+
+		name = btf_str_by_offset(btf, t->name_off);
+		if (!str_is_empty(name))
+			nr_names++;
+	}
+
+	if (nr_names < 3)
+		goto out;
+
+	for (i = 0; i < nr_names - 1; i++) {
+		const struct btf_type *t1, *t2;
+		const char *s1, *s2;
+
+		t1 = btf_type_by_id(btf, start_id + i);
+		if (!t1)
+			return -EINVAL;
+
+		s1 = btf_str_by_offset(btf, t1->name_off);
+		if (str_is_empty(s1))
+			goto out;
+
+		t2 = btf_type_by_id(btf, start_id + i + 1);
+		if (!t2)
+			return -EINVAL;
+
+		s2 = btf_str_by_offset(btf, t2->name_off);
+		if (str_is_empty(s2))
+			goto out;
+
+		if (strcmp(s1, s2) > 0)
+			goto out;
+	}
+
+	btf->nr_types_sorted = nr_names;
+out:
+	return 0;
+}
+
 BTF_ID_LIST(bpf_ctx_convert_btf_id)
 BTF_ID(struct, bpf_ctx_convert)
 
@@ -6029,6 +6159,10 @@  struct btf *btf_parse_vmlinux(void)
 	if (err)
 		goto errout;
 
+	err = btf_check_sort(btf, 1);
+	if (err)
+		goto errout;
+
 	/* btf_parse_vmlinux() runs under bpf_verifier_lock */
 	bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
 
@@ -6111,6 +6245,10 @@  static struct btf *btf_parse_module(const char *module_name, const void *data, u
 	if (err)
 		goto errout;
 
+	err = btf_check_sort(btf, btf_nr_types(base_btf));
+	if (err)
+		goto errout;
+
 	btf_verifier_env_free(env);
 	refcount_set(&btf->refcnt, 1);
 	return btf;
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 2d0840ef599a..93c1ab677bfa 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1,6 +1,9 @@ 
 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 /* Copyright (c) 2018 Facebook */
 
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
 #include <byteswap.h>
 #include <endian.h>
 #include <stdio.h>
@@ -3072,6 +3075,7 @@  static int btf_dedup_ref_types(struct btf_dedup *d);
 static int btf_dedup_resolve_fwds(struct btf_dedup *d);
 static int btf_dedup_compact_types(struct btf_dedup *d);
 static int btf_dedup_remap_types(struct btf_dedup *d);
+static int btf_sort_type_by_name(struct btf *btf);
 
 /*
  * Deduplicate BTF types and strings.
@@ -3270,6 +3274,11 @@  int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
 		pr_debug("btf_dedup_remap_types failed:%d\n", err);
 		goto done;
 	}
+	err = btf_sort_type_by_name(btf);
+	if (err < 0) {
+		pr_debug("btf_sort_type_by_name failed:%d\n", err);
+		goto done;
+	}
 
 done:
 	btf_dedup_free(d);
@@ -5212,3 +5221,189 @@  int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
 
 	return 0;
 }
+
+static int btf_compare_type_name(const void *a, const void *b, void *priv)
+{
+	struct btf *btf = (struct btf *)priv;
+	__u32 ta = *(const __u32 *)a;
+	__u32 tb = *(const __u32 *)b;
+	struct btf_type *bta, *btb;
+	const char *na, *nb;
+
+	bta = (struct btf_type *)(btf->types_data + ta);
+	btb = (struct btf_type *)(btf->types_data + tb);
+	na = btf__str_by_offset(btf, bta->name_off);
+	nb = btf__str_by_offset(btf, btb->name_off);
+
+	return strcmp(na, nb);
+}
+
+static int btf_compare_offs(const void *o1, const void *o2)
+{
+	__u32 *offs1 = (__u32 *)o1;
+	__u32 *offs2 = (__u32 *)o2;
+
+	return *offs1 - *offs2;
+}
+
+static inline __u32 btf_get_mapped_type(struct btf *btf, __u32 *maps, __u32 type)
+{
+	if (type < btf->start_id)
+		return type;
+	return maps[type - btf->start_id] + btf->start_id;
+}
+
+/*
+ * Collect and move the btf_types with names to the start location, and
+ * sort them in ascending order by name, so we can use the binary search
+ * method.
+ */
+static int btf_sort_type_by_name(struct btf *btf)
+{
+	struct btf_type *bt;
+	__u32 *new_type_offs = NULL, *new_type_offs_noname = NULL;
+	__u32 *maps = NULL, *found_offs;
+	void *new_types_data = NULL, *loc_data;
+	int i, j, k, type_cnt, ret = 0, type_size;
+	__u32 data_size;
+
+	if (btf_ensure_modifiable(btf))
+		return libbpf_err(-ENOMEM);
+
+	type_cnt = btf->nr_types;
+	data_size = btf->type_offs_cap * sizeof(*new_type_offs);
+
+	maps = (__u32 *)malloc(type_cnt * sizeof(__u32));
+	if (!maps) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	new_type_offs = (__u32 *)malloc(data_size);
+	if (!new_type_offs) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	new_type_offs_noname = (__u32 *)malloc(data_size);
+	if (!new_type_offs_noname) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	new_types_data = malloc(btf->types_data_cap);
+	if (!new_types_data) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	memset(new_type_offs, 0, data_size);
+
+	for (i = 0, j = 0, k = 0; i < type_cnt; i++) {
+		const char *name;
+
+		bt = (struct btf_type *)(btf->types_data + btf->type_offs[i]);
+		name = btf__str_by_offset(btf, bt->name_off);
+		if (!name || !name[0])
+			new_type_offs_noname[k++] = btf->type_offs[i];
+		else
+			new_type_offs[j++] = btf->type_offs[i];
+	}
+
+	memmove(new_type_offs + j, new_type_offs_noname, sizeof(__u32) * k);
+
+	qsort_r(new_type_offs, j, sizeof(*new_type_offs),
+		btf_compare_type_name, btf);
+
+	for (i = 0; i < type_cnt; i++) {
+		found_offs = bsearch(&new_type_offs[i], btf->type_offs, type_cnt,
+					sizeof(__u32), btf_compare_offs);
+		if (!found_offs) {
+			ret = -EINVAL;
+			goto err_out;
+		}
+		maps[found_offs - btf->type_offs] = i;
+	}
+
+	loc_data = new_types_data;
+	for (i = 0; i < type_cnt; i++, loc_data += type_size) {
+		bt = (struct btf_type *)(btf->types_data + new_type_offs[i]);
+		type_size = btf_type_size(bt);
+		if (type_size < 0) {
+			ret = type_size;
+			goto err_out;
+		}
+
+		memcpy(loc_data, bt, type_size);
+
+		bt = (struct btf_type *)loc_data;
+		switch (btf_kind(bt)) {
+		case BTF_KIND_PTR:
+		case BTF_KIND_CONST:
+		case BTF_KIND_VOLATILE:
+		case BTF_KIND_RESTRICT:
+		case BTF_KIND_TYPEDEF:
+		case BTF_KIND_TYPE_TAG:
+		case BTF_KIND_FUNC:
+		case BTF_KIND_VAR:
+		case BTF_KIND_DECL_TAG:
+			bt->type = btf_get_mapped_type(btf, maps, bt->type);
+			break;
+		case BTF_KIND_ARRAY: {
+			struct btf_array *arr = (void *)(bt + 1);
+
+			arr->type = btf_get_mapped_type(btf, maps, arr->type);
+			arr->index_type = btf_get_mapped_type(btf, maps, arr->index_type);
+			break;
+		}
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION: {
+			struct btf_member *m = (void *)(bt + 1);
+			__u16 vlen = BTF_INFO_VLEN(bt->info);
+
+			for (j = 0; j < vlen; j++, m++)
+				m->type = btf_get_mapped_type(btf, maps, m->type);
+			break;
+		}
+		case BTF_KIND_FUNC_PROTO: {
+			struct btf_param *p = (void *)(bt + 1);
+			__u16 vlen = BTF_INFO_VLEN(bt->info);
+
+			bt->type = btf_get_mapped_type(btf, maps, bt->type);
+			for (j = 0; j < vlen; j++, p++)
+				p->type = btf_get_mapped_type(btf, maps, p->type);
+
+			break;
+		}
+		case BTF_KIND_DATASEC: {
+			struct btf_var_secinfo *v = (void *)(bt + 1);
+			__u16 vlen = BTF_INFO_VLEN(bt->info);
+
+			for (j = 0; j < vlen; j++, v++)
+				v->type = btf_get_mapped_type(btf, maps, v->type);
+			break;
+		}
+		default:
+			break;
+		}
+	}
+
+	free(btf->type_offs);
+	btf->type_offs = new_type_offs;
+	free(btf->types_data);
+	btf->types_data = new_types_data;
+	free(maps);
+	free(new_type_offs_noname);
+	return 0;
+
+err_out:
+	if (maps)
+		free(maps);
+	if (new_type_offs)
+		free(new_type_offs);
+	if (new_type_offs_noname)
+		free(new_type_offs_noname);
+	if (new_types_data)
+		free(new_types_data);
+	return libbpf_err(ret);
+}