Message ID | 20240828174608.377204-2-ihor.solodrai@pm.me (mailing list archive) |
---|---|
State | Accepted |
Commit | 2ad6d23f465a4f851e3bcf6d74c315ce7b2c205b |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next,1/2] selftests/bpf: specify libbpf headers required for %.bpf.o progs | expand |
On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote: > %.bpf.o objects depend on vmlinux.h, which makes them transitively > dependent on unnecessary libbpf headers. However vmlinux.h doesn't > actually change as often. > > When generating vmlinux.h, compare it to a previous version and update > it only if there are changes. > > Example of build time improvement (after first clean build): > $ touch ../../../lib/bpf/bpf.h > $ time make -j8 > Before: real 1m37.592s > After: real 0m27.310s > > Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. > > Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com > > Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> > --- Unfortunately, I think that this is a half-measure. E.g. the following command forces tests rebuild for me: touch ../../../../kernel/bpf/verifier.c; \ make -j22 -C ../../../../; \ time make test_progs To workaround this we need to enable reproducible_build option: diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf index b75f09f3f424..8cd648f3e32b 100644 --- a/scripts/Makefile.btf +++ b/scripts/Makefile.btf @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsis else # Switch to using --btf_features for v1.26 and later. -pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build ifneq ($(KBUILD_EXTMOD),) module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base Question to the mailing list: do we want this? [...]
On Wed, Aug 28, 2024 at 3:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote: > > %.bpf.o objects depend on vmlinux.h, which makes them transitively > > dependent on unnecessary libbpf headers. However vmlinux.h doesn't > > actually change as often. > > > > When generating vmlinux.h, compare it to a previous version and update > > it only if there are changes. > > > > Example of build time improvement (after first clean build): > > $ touch ../../../lib/bpf/bpf.h > > $ time make -j8 > > Before: real 1m37.592s > > After: real 0m27.310s > > > > Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. > > > > Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com > > > > Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> > > --- > > Unfortunately, I think that this is a half-measure. > E.g. the following command forces tests rebuild for me: > > touch ../../../../kernel/bpf/verifier.c; \ > make -j22 -C ../../../../; \ > time make test_progs > > To workaround this we need to enable reproducible_build option: > > diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf > index b75f09f3f424..8cd648f3e32b 100644 > --- a/scripts/Makefile.btf > +++ b/scripts/Makefile.btf > @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsis > else > > # Switch to using --btf_features for v1.26 and later. > -pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs > +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build > > ifneq ($(KBUILD_EXTMOD),) > module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base > > Question to the mailing list: do we want this? I don't think so. Too drastic just to save rebuild time. imo the patch is good enough as-is.
On Wed, Aug 28, 2024 at 3:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote: > > %.bpf.o objects depend on vmlinux.h, which makes them transitively > > dependent on unnecessary libbpf headers. However vmlinux.h doesn't > > actually change as often. > > > > When generating vmlinux.h, compare it to a previous version and update > > it only if there are changes. > > > > Example of build time improvement (after first clean build): > > $ touch ../../../lib/bpf/bpf.h > > $ time make -j8 > > Before: real 1m37.592s > > After: real 0m27.310s > > > > Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. > > > > Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com > > > > Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> > > --- > > Unfortunately, I think that this is a half-measure. > E.g. the following command forces tests rebuild for me: > > touch ../../../../kernel/bpf/verifier.c; \ > make -j22 -C ../../../../; \ > time make test_progs > > To workaround this we need to enable reproducible_build option: > > diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf > index b75f09f3f424..8cd648f3e32b 100644 > --- a/scripts/Makefile.btf > +++ b/scripts/Makefile.btf > @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsis > else > > # Switch to using --btf_features for v1.26 and later. > -pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs > +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build > > ifneq ($(KBUILD_EXTMOD),) > module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base > > Question to the mailing list: do we want this? Alan, can you please give us a summary of what are the consequences of the reproducible_build pahole option? In terms of performance and otherwise. I've applied patches as is, despite them not solving the issue completely, as they are moving us in the right direction anyways. I do get slightly different BTF every single time I rebuild my kernel, so the change in patch #2 doesn't yet help me. For libbpf headers, Ihor, can you please follow up with adding bpf_helper_defs.h as a dependency? I have some ideas on how to make BTF regeneration in vmlinux.h itself unnecessary, that might help with this issue. Separately (depending on what are the negatives of the reproducible_build option) we can look into making pahole have more consistent internal BTF type ordering without negatively affecting the overall BTF dedup performance in pahole. Hopefully I can work with Ihor on this as follow ups. P.S. I also spent more time than I'm willing to admit trying to improve bpftool's BTF sorting to minimize the chance of vmlinux.h contents being different, and I think I removed a bunch of cases where we had unnecessary differences, but still, it's fundamentally non-deterministic to do everything based on type and field names, unfortunately. Anyways, Mykyta (cc'ed), what do you think about the changes below? Note that I'm also fixing the incorrect handling of enum64 (would be nice to prepare a proper patch and send it upstream, if you get a chance). diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c index 6789c7a4d5ca..e8a244b09d56 100644 --- a/tools/bpf/bpftool/btf.c +++ b/tools/bpf/bpftool/btf.c @@ -50,6 +50,7 @@ struct sort_datum { int type_rank; const char *sort_name; const char *own_name; + __u64 disambig_hash; }; static const char *btf_int_enc_str(__u8 encoding) @@ -552,35 +553,92 @@ static int btf_type_rank(const struct btf *btf, __u32 index, bool has_name) } } -static const char *btf_type_sort_name(const struct btf *btf, __u32 index, bool from_ref) +static const char *btf_type_sort_name(const struct btf *btf, __u32 index, bool from_ref, const char *typedef_name) { const struct btf_type *t = btf__type_by_id(btf, index); + int name_off; switch (btf_kind(t)) { case BTF_KIND_ENUM: - case BTF_KIND_ENUM64: { - int name_off = t->name_off; - /* Use name of the first element for anonymous enums if allowed */ if (!from_ref && !t->name_off && btf_vlen(t)) name_off = btf_enum(t)->name_off; + else + name_off = t->name_off; + + return btf__name_by_offset(btf, name_off); + case BTF_KIND_ENUM64: + /* Use name of the first element for anonymous enums if allowed */ + if (!from_ref && !t->name_off && btf_vlen(t)) + name_off = btf_enum64(t)->name_off; + else + name_off = t->name_off; return btf__name_by_offset(btf, name_off); - } case BTF_KIND_ARRAY: - return btf_type_sort_name(btf, btf_array(t)->type, true); + return btf_type_sort_name(btf, btf_array(t)->type, true, typedef_name); + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + if (t->name_off == 0) + return typedef_name; + return btf__name_by_offset(btf, t->name_off); + case BTF_KIND_TYPEDEF: + return btf_type_sort_name(btf, t->type, true, + btf__name_by_offset(btf, t->name_off)); case BTF_KIND_TYPE_TAG: case BTF_KIND_CONST: case BTF_KIND_PTR: case BTF_KIND_VOLATILE: case BTF_KIND_RESTRICT: - case BTF_KIND_TYPEDEF: case BTF_KIND_DECL_TAG: - return btf_type_sort_name(btf, t->type, true); + return btf_type_sort_name(btf, t->type, true, typedef_name); default: return btf__name_by_offset(btf, t->name_off); } - return NULL; +} + +static __u64 hasher(__u64 hash, __u64 val) +{ + return hash * 31 + val; +} + +static __u64 btf_type_disambig_hash(const struct btf *btf, __u32 index) +{ + const struct btf_type *t = btf__type_by_id(btf, index); + int i; + size_t hash = 0; + + switch (btf_kind(t)) { + case BTF_KIND_ENUM: + hash = hasher(hash, t->size); + for (i = 0; i < btf_vlen(t); i++) + hash = hasher(hash, btf_enum(t)[i].name_off); + break; + case BTF_KIND_ENUM64: + hash = hasher(hash, t->size); + for (i = 0; i < btf_vlen(t); i++) + hash = hasher(hash, btf_enum64(t)[i].name_off); + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: { + const struct btf_member *m; + const char *ftname; + + hash = hasher(hash, t->size); + for (i = 0; i < btf_vlen(t); i++) { + m = btf_members(t) + i; + hash = hasher(hash, m->name_off); + + /* resolve field type's name and hash it as well */ + ftname = btf_type_sort_name(btf, m->type, false, ""); + hash = hasher(hash, str_hash(ftname)); + } + break; + } + default: + break; + } + return hash; } static int btf_type_compare(const void *left, const void *right) @@ -596,7 +654,14 @@ static int btf_type_compare(const void *left, const void *right) if (r) return r; - return strcmp(d1->own_name, d2->own_name); + r = strcmp(d1->own_name, d2->own_name); + if (r) + return r; + + if (d1->disambig_hash != d2->disambig_hash) + return d1->disambig_hash < d2->disambig_hash ? -1 : 1; + + return d1->index < d2->index ? -1 : 1; } static struct sort_datum *sort_btf_c(const struct btf *btf) @@ -615,8 +680,9 @@ static struct sort_datum *sort_btf_c(const struct btf *btf) d->index = i; d->type_rank = btf_type_rank(btf, i, false); - d->sort_name = btf_type_sort_name(btf, i, false); + d->sort_name = btf_type_sort_name(btf, i, false, ""); d->own_name = btf__name_by_offset(btf, t->name_off); + d->disambig_hash = btf_type_disambig_hash(btf, i); } qsort(datums, n, sizeof(struct sort_datum), btf_type_compare); > > [...] >
On 30/08/2024 21:34, Andrii Nakryiko wrote: > On Wed, Aug 28, 2024 at 3:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: >> >> On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote: >>> %.bpf.o objects depend on vmlinux.h, which makes them transitively >>> dependent on unnecessary libbpf headers. However vmlinux.h doesn't >>> actually change as often. >>> >>> When generating vmlinux.h, compare it to a previous version and update >>> it only if there are changes. >>> >>> Example of build time improvement (after first clean build): >>> $ touch ../../../lib/bpf/bpf.h >>> $ time make -j8 >>> Before: real 1m37.592s >>> After: real 0m27.310s >>> >>> Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. >>> >>> Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com >>> >>> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> >>> --- >> >> Unfortunately, I think that this is a half-measure. >> E.g. the following command forces tests rebuild for me: >> >> touch ../../../../kernel/bpf/verifier.c; \ >> make -j22 -C ../../../../; \ >> time make test_progs >> >> To workaround this we need to enable reproducible_build option: >> >> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf >> index b75f09f3f424..8cd648f3e32b 100644 >> --- a/scripts/Makefile.btf >> +++ b/scripts/Makefile.btf >> @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsis >> else >> >> # Switch to using --btf_features for v1.26 and later. >> -pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs >> +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build >> >> ifneq ($(KBUILD_EXTMOD),) >> module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base >> >> Question to the mailing list: do we want this? > > Alan, can you please give us a summary of what are the consequences of > the reproducible_build pahole option? In terms of performance and > otherwise. > Sure. The original context was that the folks trying to do reproducible builds were being impacted by the fact that BTF generation was non-deterministic when done in parallel; i.e. same kernel would give different BTF ids when rebuilding vmlinux BTF; the reason was largely as I understand it that when pahole partitioned CUs between multiple threads, that partitioning could vary. If it varied, when BTF was merged across threads we could end up with differing id assignments. Since BTF then was baked into the vmlinux binary, unstable BTF ids meant non-identical vmlinux. The first approach to solve this was to remove parallel BTF generation to support reproducibility. Arnaldo however added support that retained parallelism while supporting determinism through using the DWARF CU order. He did some great analysis on the overheads for vmlinux generation too [1]; summary is that the overhead in runtime is approx 33% versus parallel non-reproducible encoding. Those numbers might not 100% translate to the vmlinux build during kernel since it was a detached pahole generation and the options might differ slightly, but they give a sense of things. I don't _think_ there should be additional memory overheads during pahole generation (we really can't afford any more memory usage), since it's really more about making order of CU processing consistent. Would be good to get Arnaldo's perspective too if we're considering switching this on by default, as he knows this stuff much better than I do. [1] https://lore.kernel.org/dwarves/20240412211604.789632-12-acme@kernel.org/
On 30/08/2024 21:34, Andrii Nakryiko wrote: > On Wed, Aug 28, 2024 at 3:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: >> On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote: >>> %.bpf.o objects depend on vmlinux.h, which makes them transitively >>> dependent on unnecessary libbpf headers. However vmlinux.h doesn't >>> actually change as often. >>> >>> When generating vmlinux.h, compare it to a previous version and update >>> it only if there are changes. >>> >>> Example of build time improvement (after first clean build): >>> $ touch ../../../lib/bpf/bpf.h >>> $ time make -j8 >>> Before: real 1m37.592s >>> After: real 0m27.310s >>> >>> Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. >>> >>> Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com >>> >>> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> >>> --- >> Unfortunately, I think that this is a half-measure. >> E.g. the following command forces tests rebuild for me: >> >> touch ../../../../kernel/bpf/verifier.c; \ >> make -j22 -C ../../../../; \ >> time make test_progs >> >> To workaround this we need to enable reproducible_build option: >> >> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf >> index b75f09f3f424..8cd648f3e32b 100644 >> --- a/scripts/Makefile.btf >> +++ b/scripts/Makefile.btf >> @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsis >> else >> >> # Switch to using --btf_features for v1.26 and later. >> -pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs >> +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build >> >> ifneq ($(KBUILD_EXTMOD),) >> module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base >> >> Question to the mailing list: do we want this? > Alan, can you please give us a summary of what are the consequences of > the reproducible_build pahole option? In terms of performance and > otherwise. > > I've applied patches as is, despite them not solving the issue > completely, as they are moving us in the right direction anyways. I do > get slightly different BTF every single time I rebuild my kernel, so > the change in patch #2 doesn't yet help me. > > For libbpf headers, Ihor, can you please follow up with adding > bpf_helper_defs.h as a dependency? > > I have some ideas on how to make BTF regeneration in vmlinux.h itself > unnecessary, that might help with this issue. Separately (depending on > what are the negatives of the reproducible_build option) we can look > into making pahole have more consistent internal BTF type ordering > without negatively affecting the overall BTF dedup performance in > pahole. Hopefully I can work with Ihor on this as follow ups. > > P.S. I also spent more time than I'm willing to admit trying to > improve bpftool's BTF sorting to minimize the chance of vmlinux.h > contents being different, and I think I removed a bunch of cases where > we had unnecessary differences, but still, it's fundamentally > non-deterministic to do everything based on type and field names, > unfortunately. > > Anyways, Mykyta (cc'ed), what do you think about the changes below? > Note that I'm also fixing the incorrect handling of enum64 (would be > nice to prepare a proper patch and send it upstream, if you get a > chance). > > diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c > index 6789c7a4d5ca..e8a244b09d56 100644 > --- a/tools/bpf/bpftool/btf.c > +++ b/tools/bpf/bpftool/btf.c > @@ -50,6 +50,7 @@ struct sort_datum { > int type_rank; > const char *sort_name; > const char *own_name; > + __u64 disambig_hash; > }; > > static const char *btf_int_enc_str(__u8 encoding) > @@ -552,35 +553,92 @@ static int btf_type_rank(const struct btf *btf, > __u32 index, bool has_name) > } > } > > -static const char *btf_type_sort_name(const struct btf *btf, __u32 > index, bool from_ref) > +static const char *btf_type_sort_name(const struct btf *btf, __u32 > index, bool from_ref, const char *typedef_name) > { > const struct btf_type *t = btf__type_by_id(btf, index); > + int name_off; > > switch (btf_kind(t)) { > case BTF_KIND_ENUM: > - case BTF_KIND_ENUM64: { > - int name_off = t->name_off; > - > /* Use name of the first element for anonymous enums > if allowed */ > if (!from_ref && !t->name_off && btf_vlen(t)) > name_off = btf_enum(t)->name_off; > + else > + name_off = t->name_off; > + > + return btf__name_by_offset(btf, name_off); > + case BTF_KIND_ENUM64: > + /* Use name of the first element for anonymous enums > if allowed */ > + if (!from_ref && !t->name_off && btf_vlen(t)) > + name_off = btf_enum64(t)->name_off; > + else > + name_off = t->name_off; > > return btf__name_by_offset(btf, name_off); > - } > case BTF_KIND_ARRAY: > - return btf_type_sort_name(btf, btf_array(t)->type, true); > + return btf_type_sort_name(btf, btf_array(t)->type, > true, typedef_name); > + case BTF_KIND_STRUCT: > + case BTF_KIND_UNION: > + if (t->name_off == 0) > + return typedef_name; > + return btf__name_by_offset(btf, t->name_off); > + case BTF_KIND_TYPEDEF: > + return btf_type_sort_name(btf, t->type, true, > + btf__name_by_offset(btf, > t->name_off)); > case BTF_KIND_TYPE_TAG: > case BTF_KIND_CONST: > case BTF_KIND_PTR: > case BTF_KIND_VOLATILE: > case BTF_KIND_RESTRICT: > - case BTF_KIND_TYPEDEF: > case BTF_KIND_DECL_TAG: > - return btf_type_sort_name(btf, t->type, true); > + return btf_type_sort_name(btf, t->type, true, typedef_name); > default: > return btf__name_by_offset(btf, t->name_off); > } > - return NULL; > +} > + > +static __u64 hasher(__u64 hash, __u64 val) > +{ > + return hash * 31 + val; > +} > + > +static __u64 btf_type_disambig_hash(const struct btf *btf, __u32 index) > +{ > + const struct btf_type *t = btf__type_by_id(btf, index); > + int i; > + size_t hash = 0; > + > + switch (btf_kind(t)) { > + case BTF_KIND_ENUM: > + hash = hasher(hash, t->size); > + for (i = 0; i < btf_vlen(t); i++) > + hash = hasher(hash, btf_enum(t)[i].name_off); > + break; > + case BTF_KIND_ENUM64: > + hash = hasher(hash, t->size); > + for (i = 0; i < btf_vlen(t); i++) > + hash = hasher(hash, btf_enum64(t)[i].name_off); > + break; > + case BTF_KIND_STRUCT: > + case BTF_KIND_UNION: { > + const struct btf_member *m; > + const char *ftname; > + > + hash = hasher(hash, t->size); > + for (i = 0; i < btf_vlen(t); i++) { > + m = btf_members(t) + i; > + hash = hasher(hash, m->name_off); > + > + /* resolve field type's name and hash it as well */ > + ftname = btf_type_sort_name(btf, m->type, false, ""); > + hash = hasher(hash, str_hash(ftname)); > + } > + break; > + } > + default: > + break; > + } > + return hash; > } > > static int btf_type_compare(const void *left, const void *right) > @@ -596,7 +654,14 @@ static int btf_type_compare(const void *left, > const void *right) > if (r) > return r; > > - return strcmp(d1->own_name, d2->own_name); > + r = strcmp(d1->own_name, d2->own_name); > + if (r) > + return r; > + > + if (d1->disambig_hash != d2->disambig_hash) > + return d1->disambig_hash < d2->disambig_hash ? -1 : 1; > + > + return d1->index < d2->index ? -1 : 1; > } > > static struct sort_datum *sort_btf_c(const struct btf *btf) > @@ -615,8 +680,9 @@ static struct sort_datum *sort_btf_c(const struct btf *btf) > > d->index = i; > d->type_rank = btf_type_rank(btf, i, false); > - d->sort_name = btf_type_sort_name(btf, i, false); > + d->sort_name = btf_type_sort_name(btf, i, false, ""); > d->own_name = btf__name_by_offset(btf, t->name_off); > + d->disambig_hash = btf_type_disambig_hash(btf, i); > } > > qsort(datums, n, sizeof(struct sort_datum), btf_type_compare); > Thanks for pointing to the bug of enum64 handling. I'll create a patch. Reading the rest of the code, hashing struct/union/enum fields is introduced: this is only useful for disambiguating ordering of the anonymous structs/unions/enums. I suspect the biggest source of the issues are structs and unions, though. Are definitions like this create problems? typedef struct {...} foo_t; ? I'll check what other differences this change makes. >> [...] >>
On Fri, Aug 30, 2024 at 2:23 PM Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> wrote: > > On 30/08/2024 21:34, Andrii Nakryiko wrote: > > On Wed, Aug 28, 2024 at 3:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > >> On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote: > >>> %.bpf.o objects depend on vmlinux.h, which makes them transitively > >>> dependent on unnecessary libbpf headers. However vmlinux.h doesn't > >>> actually change as often. > >>> > >>> When generating vmlinux.h, compare it to a previous version and update > >>> it only if there are changes. > >>> > >>> Example of build time improvement (after first clean build): > >>> $ touch ../../../lib/bpf/bpf.h > >>> $ time make -j8 > >>> Before: real 1m37.592s > >>> After: real 0m27.310s > >>> > >>> Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. > >>> > >>> Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com > >>> > >>> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> > >>> --- > >> Unfortunately, I think that this is a half-measure. > >> E.g. the following command forces tests rebuild for me: > >> > >> touch ../../../../kernel/bpf/verifier.c; \ > >> make -j22 -C ../../../../; \ > >> time make test_progs > >> > >> To workaround this we need to enable reproducible_build option: > >> > >> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf > >> index b75f09f3f424..8cd648f3e32b 100644 > >> --- a/scripts/Makefile.btf > >> +++ b/scripts/Makefile.btf > >> @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsis > >> else > >> > >> # Switch to using --btf_features for v1.26 and later. > >> -pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs > >> +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build > >> > >> ifneq ($(KBUILD_EXTMOD),) > >> module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base > >> > >> Question to the mailing list: do we want this? > > Alan, can you please give us a summary of what are the consequences of > > the reproducible_build pahole option? In terms of performance and > > otherwise. > > > > I've applied patches as is, despite them not solving the issue > > completely, as they are moving us in the right direction anyways. I do > > get slightly different BTF every single time I rebuild my kernel, so > > the change in patch #2 doesn't yet help me. > > > > For libbpf headers, Ihor, can you please follow up with adding > > bpf_helper_defs.h as a dependency? > > > > I have some ideas on how to make BTF regeneration in vmlinux.h itself > > unnecessary, that might help with this issue. Separately (depending on > > what are the negatives of the reproducible_build option) we can look > > into making pahole have more consistent internal BTF type ordering > > without negatively affecting the overall BTF dedup performance in > > pahole. Hopefully I can work with Ihor on this as follow ups. > > > > P.S. I also spent more time than I'm willing to admit trying to > > improve bpftool's BTF sorting to minimize the chance of vmlinux.h > > contents being different, and I think I removed a bunch of cases where > > we had unnecessary differences, but still, it's fundamentally > > non-deterministic to do everything based on type and field names, > > unfortunately. > > > > Anyways, Mykyta (cc'ed), what do you think about the changes below? > > Note that I'm also fixing the incorrect handling of enum64 (would be > > nice to prepare a proper patch and send it upstream, if you get a > > chance). > > [...] > > static struct sort_datum *sort_btf_c(const struct btf *btf) > > @@ -615,8 +680,9 @@ static struct sort_datum *sort_btf_c(const struct btf *btf) > > > > d->index = i; > > d->type_rank = btf_type_rank(btf, i, false); > > - d->sort_name = btf_type_sort_name(btf, i, false); > > + d->sort_name = btf_type_sort_name(btf, i, false, ""); > > d->own_name = btf__name_by_offset(btf, t->name_off); > > + d->disambig_hash = btf_type_disambig_hash(btf, i); > > } > > > > qsort(datums, n, sizeof(struct sort_datum), btf_type_compare); > > > Thanks for pointing to the bug of enum64 handling. I'll create a patch. great, thanks > Reading the rest of the code, hashing struct/union/enum fields is > introduced: > this is only useful for disambiguating ordering of the anonymous > structs/unions/enums. yes, I was trying to have some sort of fixed order. What I was doing was basically: touch kernel/bpf/verifier.c && make -j$(nproc) and after that generated vmlinux.h and compared with the previous version. And every single time it differs a bit due to a) there being types that are named the same, but really are different types and b) as a consequence, different order of types will cause different ___2, ___3 suffixes, which causes tiny changes to vmlinux.h) Reproducible build would help with the ordering, but I was hoping to achieve similar effect with bpftool (and I did remove some of the unnecessary changes, but not all). > > I suspect the biggest source of the issues are structs and unions, though. > Are definitions like this create problems? > typedef struct {...} foo_t; > ? > I'll check what other differences this change makes. just try the kernel rebuild and vmlinux.h generation steps above, you'll see :) > >> [...] > >> >
On Fri, Aug 30, 2024 at 10:03:40PM +0100, Alan Maguire wrote: > On 30/08/2024 21:34, Andrii Nakryiko wrote: > > On Wed, Aug 28, 2024 at 3:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote: > >> > >> On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote: > >>> %.bpf.o objects depend on vmlinux.h, which makes them transitively > >>> dependent on unnecessary libbpf headers. However vmlinux.h doesn't > >>> actually change as often. > >>> > >>> When generating vmlinux.h, compare it to a previous version and update > >>> it only if there are changes. > >>> > >>> Example of build time improvement (after first clean build): > >>> $ touch ../../../lib/bpf/bpf.h > >>> $ time make -j8 > >>> Before: real 1m37.592s > >>> After: real 0m27.310s > >>> > >>> Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. > >>> > >>> Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com > >>> > >>> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> > >>> --- > >> > >> Unfortunately, I think that this is a half-measure. > >> E.g. the following command forces tests rebuild for me: > >> > >> touch ../../../../kernel/bpf/verifier.c; \ > >> make -j22 -C ../../../../; \ > >> time make test_progs > >> > >> To workaround this we need to enable reproducible_build option: > >> > >> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf > >> index b75f09f3f424..8cd648f3e32b 100644 > >> --- a/scripts/Makefile.btf > >> +++ b/scripts/Makefile.btf > >> @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsis > >> else > >> > >> # Switch to using --btf_features for v1.26 and later. > >> -pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs > >> +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build > >> > >> ifneq ($(KBUILD_EXTMOD),) > >> module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base > >> > >> Question to the mailing list: do we want this? > > > > Alan, can you please give us a summary of what are the consequences of > > the reproducible_build pahole option? In terms of performance and > > otherwise. > > > > Sure. The original context was that the folks trying to do reproducible > builds were being impacted by the fact that BTF generation was > non-deterministic when done in parallel; i.e. same kernel would give > different BTF ids when rebuilding vmlinux BTF; the reason was largely as > I understand it that when pahole partitioned CUs between multiple > threads, that partitioning could vary. If it varied, when BTF was merged > across threads we could end up with differing id assignments. Since BTF > then was baked into the vmlinux binary, unstable BTF ids meant > non-identical vmlinux. > > The first approach to solve this was to remove parallel BTF generation > to support reproducibility. Arnaldo however added support that retained > parallelism while supporting determinism through using the DWARF CU > order. He did some great analysis on the overheads for vmlinux > generation too [1]; summary is that the overhead in runtime is approx > 33% versus parallel non-reproducible encoding. Those numbers might not > 100% translate to the vmlinux build during kernel since it was a > detached pahole generation and the options might differ slightly, but > they give a sense of things. I don't _think_ there should be additional > memory overheads during pahole generation (we really can't afford any > more memory usage), since it's really more about making order of CU > processing consistent. > > Would be good to get Arnaldo's perspective too if we're considering > switching this on by default, as he knows this stuff much better than I do. You described it nicely! And on top of that there was recent work that will be available in 1.28 to reduce the memory footprint of pahole, using it to find things to pack in itself, reducing the number of allocations, not keeping unreferenced CUs around (you did it), part of the work to have it working on 32-bit architectures, where we had reports of it not working. There is certainly more optimizations to be made to reduce its memory footprint while allowing it to run in parallel, but at this point it seems to have addressed the problems that were reported. More people trying it and measuring the impacts, to confirm the tests and analysis we did and you alluded too can be only a good thing in getting us all informed and confortable with using this option by default. BTW, we have now a tests/ directory with a regression test for this feature and another for the --prettify feature in pahole (use DWARF or BTF to pretty print raw data with several tricks on finding the right data structure based on enumerations when the conventions used in a project allow for that, that is the case with tools/perf, also for using header sizes to traverse variable sized records, etc), please see: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/tests And: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=fd14dc67cb6aaead553074afb4a1ddad10209892 https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=be638365781ed0c843249c5bcebe90a01e74b2fe This should help us in detecting problems earlier, brownie point to whoever gets this hooked up in CI systems, existing or new 8-) Thanks, - Arnaldo > [1] > https://lore.kernel.org/dwarves/20240412211604.789632-12-acme@kernel.org/
Andrii, Eduard, On Friday, August 30th, 2024 at 1:34 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: [...] > I've applied patches as is, despite them not solving the issue > completely, as they are moving us in the right direction anyways. I do > get slightly different BTF every single time I rebuild my kernel, so > the change in patch #2 doesn't yet help me. Thanks for applying the patches. I didn't realize vmlinux.h generation is non-deterministic. Interesting. > > For libbpf headers, Ihor, can you please follow up with adding > bpf_helper_defs.h as a dependency? I've tried tracking down where bpf_helper_defs.h is coming from and (assuming my analysis is correct) this header is generated by `scripts/bpf_doc.py`. From the selftests/bpf point of view the dependency chain is as follows: 1. vmlinux.h depends on bpftool: $(INCLUDE_DIR)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL) | $(INCLUDE_DIR) 2. bpftool is installed for selftests via `make -C tools/bpf/bpftool install-bin`: BPFTOOL ?= $(DEFAULT_BPFTOOL) $(DEFAULT_BPFTOOL): ... $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) ... install-bin 3. bpftool install-bin depends on libbpf: $(OUTPUT)bpftool: $(OBJS) $(LIBBPF) ... install-bin: $(OUTPUT)bpftool 4. $(LIBBPF) recipe runs `make -C tools/lib/bpf install_headers`, which depends on $(BPF_GENERATED) which equals to $(BPF_HELPER_DEFS) BPF_GENERATED := $(BPF_HELPER_DEFS) ... install_headers: $(BPF_GENERATED) $(INSTALL_SRC_HDRS) $(INSTALL_GEN_HDRS) 5. Finally $(BPF_HELPER_DEFS) recipe executes the python script (in lib/bpf): $(BPF_HELPER_DEFS): $(srctree)/tools/include/uapi/linux/bpf.h $(QUIET_GEN)$(srctree)/scripts/bpf_doc.py --header \ --file $(srctree)/tools/include/uapi/linux/bpf.h > $(BPF_HELPER_DEFS) I don't see any benefit to adding bpf_helper_defs.h as a direct dependency of anything in selftests/bpf. %.bpf.o already depend on vmlinux.h, and unless we somehow get rid of vmlinux.h dependency on bpftool, bpf_helper_defs.h should always be there at a point when %.bpf.o objects are compiled. > > I have some ideas on how to make BTF regeneration in vmlinux.h itself > unnecessary, that might help with this issue. Separately (depending on > what are the negatives of the reproducible_build option) we can look > into making pahole have more consistent internal BTF type ordering > without negatively affecting the overall BTF dedup performance in > pahole. Hopefully I can work with Ihor on this as follow ups. I still know little about how all this machinery works, but I'd be glad to help. > > P.S. I also spent more time than I'm willing to admit trying to > improve bpftool's BTF sorting to minimize the chance of vmlinux.h > contents being different, and I think I removed a bunch of cases where > we had unnecessary differences, but still, it's fundamentally > non-deterministic to do everything based on type and field names, > unfortunately. [...]
On Sat, Aug 31, 2024 at 11:18 AM Ihor Solodrai <ihor.solodrai@pm.me> wrote: > > Andrii, Eduard, > > On Friday, August 30th, 2024 at 1:34 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > [...] > > > I've applied patches as is, despite them not solving the issue > > completely, as they are moving us in the right direction anyways. I do > > get slightly different BTF every single time I rebuild my kernel, so > > the change in patch #2 doesn't yet help me. > > Thanks for applying the patches. > I didn't realize vmlinux.h generation is non-deterministic. Interesting. > > > > > For libbpf headers, Ihor, can you please follow up with adding > > bpf_helper_defs.h as a dependency? > > I've tried tracking down where bpf_helper_defs.h is coming from and > (assuming my analysis is correct) this header is generated by > `scripts/bpf_doc.py`. From the selftests/bpf point of view the > dependency chain is as follows: > > 1. vmlinux.h depends on bpftool: > > $(INCLUDE_DIR)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL) | $(INCLUDE_DIR) > > 2. bpftool is installed for selftests via `make -C tools/bpf/bpftool install-bin`: > > BPFTOOL ?= $(DEFAULT_BPFTOOL) > $(DEFAULT_BPFTOOL): ... > $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) ... install-bin > > 3. bpftool install-bin depends on libbpf: > > $(OUTPUT)bpftool: $(OBJS) $(LIBBPF) > ... > install-bin: $(OUTPUT)bpftool > > > 4. $(LIBBPF) recipe runs `make -C tools/lib/bpf install_headers`, > which depends on $(BPF_GENERATED) which equals to $(BPF_HELPER_DEFS) > > BPF_GENERATED := $(BPF_HELPER_DEFS) > ... > install_headers: $(BPF_GENERATED) $(INSTALL_SRC_HDRS) $(INSTALL_GEN_HDRS) > > 5. Finally $(BPF_HELPER_DEFS) recipe executes the python script (in lib/bpf): > > $(BPF_HELPER_DEFS): $(srctree)/tools/include/uapi/linux/bpf.h > $(QUIET_GEN)$(srctree)/scripts/bpf_doc.py --header \ > --file $(srctree)/tools/include/uapi/linux/bpf.h > $(BPF_HELPER_DEFS) > > > I don't see any benefit to adding bpf_helper_defs.h as a direct > dependency of anything in selftests/bpf. %.bpf.o already depend on > vmlinux.h, and unless we somehow get rid of vmlinux.h dependency on > bpftool, bpf_helper_defs.h should always be there at a point when > %.bpf.o objects are compiled. > Making sure that bpf_helper_defs.h is generated by the time .bpf.o is being compiled is one thing. Triggering .bpf.o regeneration when bpf_helper_defs.h changes is another. The second used to be important when we were adding new helpers, otherwise we can get compilation error because of missing helper definitions. This is much less of an issue today, we we might just leave it as is. Making sure bpf_helper_defs.h is there might be good enough. > > > > > I have some ideas on how to make BTF regeneration in vmlinux.h itself > > unnecessary, that might help with this issue. Separately (depending on > > what are the negatives of the reproducible_build option) we can look > > into making pahole have more consistent internal BTF type ordering > > without negatively affecting the overall BTF dedup performance in > > pahole. Hopefully I can work with Ihor on this as follow ups. > > I still know little about how all this machinery works, but I'd be > glad to help. I'd like to avoid regenerating BTF inside the vmlinux image, if possible. This will cut down significantly on incremental kernel build times. We can chat about this separately a bit later, don't worry. > > > > > P.S. I also spent more time than I'm willing to admit trying to > > improve bpftool's BTF sorting to minimize the chance of vmlinux.h > > contents being different, and I think I removed a bunch of cases where > > we had unnecessary differences, but still, it's fundamentally > > non-deterministic to do everything based on type and field names, > > unfortunately. > > [...] > >
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 53cc13b92ee2..7660d19b66c2 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -399,10 +399,14 @@ $(HOST_BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \ DESTDIR=$(HOST_SCRATCH_DIR)/ prefix= all install_headers endif +# vmlinux.h is first dumped to a temprorary file and then compared to +# the previous version. This helps to avoid unnecessary re-builds of +# $(TRUNNER_BPF_OBJS) $(INCLUDE_DIR)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL) | $(INCLUDE_DIR) ifeq ($(VMLINUX_H),) $(call msg,GEN,,$@) - $(Q)$(BPFTOOL) btf dump file $(VMLINUX_BTF) format c > $@ + $(Q)$(BPFTOOL) btf dump file $(VMLINUX_BTF) format c > $(INCLUDE_DIR)/.vmlinux.h.tmp + $(Q)cmp -s $(INCLUDE_DIR)/.vmlinux.h.tmp $@ || mv $(INCLUDE_DIR)/.vmlinux.h.tmp $@ else $(call msg,CP,,$@) $(Q)cp "$(VMLINUX_H)" $@
%.bpf.o objects depend on vmlinux.h, which makes them transitively dependent on unnecessary libbpf headers. However vmlinux.h doesn't actually change as often. When generating vmlinux.h, compare it to a previous version and update it only if there are changes. Example of build time improvement (after first clean build): $ touch ../../../lib/bpf/bpf.h $ time make -j8 Before: real 1m37.592s After: real 0m27.310s Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed. Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> --- tools/testing/selftests/bpf/Makefile | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)