Message ID | 20250410083359.198724-1-tony.ambardar@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [dwarves,v1] dwarf_loader: Fix skipped encoding of function BTF on 32-bit systems | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On 10/04/2025 09:33, Tony Ambardar wrote: > While doing JIT development on armhf BTF kernels, I hit a strange issue > where some functions were missing in BTF data. This required considerable > debugging but can be reproduced simply: > > $ bpftool --version > bpftool v7.6.0 > using libbpf v1.6 > features: llvm, skeletons > > $ pahole --version > v1.29 > > $ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf > btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF > btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl' > > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > <nothing> > > $ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); > > $ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf > > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); > > The key things to note are the pahole 'consistent_func' feature and the u64 > 'wake_flags' parameter vs. arm 32-bit registers. These point to existing > code handling arguments larger than register-size, but only structs. > > Generalize the code for any type of argument exceeding register size (i.e. > cu->addr_size). This should work for integral or aggregate types, and also > avoids a bug in the current code where a register-sized struct could be > mistaken for larger. > > Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params") > Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Thanks for investigating this! I've tested this versus baseline on x86_64 and aarch64. I'm seeing some small divergence in functions encoded; for example on aarch64 we don't get a representation for static int __io_run_local_work(struct io_ring_ctx *ctx, io_tw_token_t tw, int min_events, int max_events); The reason for that is the second argument is a typedef io_tw_token_t, which is in turn a typedef for: struct io_tw_state { }; i.e. an empty struct. The reason is with your patch we've moved from type-centric to size-centric criteria used to allow functions into BTF that have unexpected register usage; because the above function uses unexpected registers _and_ does not exceed the address size, the function is marked as having an inconsistent reg mapping. In this case, that seems reasonable since it is true; there is no register needed to represent the second argument. The deeper rationale here in allowing functions that have structs that may be represented by multiple registers is that we can handle this outcome; the BPF_PROG2() macro was added to handle such cases and seems to handle multi-register representation but _not_ representations where a register is not needed at all. I'm basing that on the ___bpf_union_arg() macro in bpf_tracing.h so please correct me if I'm wrong (we could potentially add a sizeof(t) == 0 clause here perhaps). So in other words, though we see small divergences in representation I _think_ they are consistent with our expectations. I'd really like to see wider testing of this patch before it lands however so we can shake out other problematic cases if any. If folks could try this and compare BTF representations to baseline that would be great! In particular comparing raw BTF is necessary since vmlinux.h representations don't include functions (aside from kfuncs). Now that we have always-reproducible BTF a simple diff of "bpftool btf dump file vmlinux" can be used to make such comparisons. However perhaps we could also think about enhancing the bpf_tracing.h macro to handle zero-sized parameters like empty structs such that later parameters are mapped to registers correctly (presuming that's possible)? Yonghong, what do you think? Thanks! Alan > --- > dwarf_loader.c | 37 ++++++++++++------------------------- > 1 file changed, 12 insertions(+), 25 deletions(-) > > diff --git a/dwarf_loader.c b/dwarf_loader.c > index e1ba7bc..22abfdb 100644 > --- a/dwarf_loader.c > +++ b/dwarf_loader.c > @@ -2914,23 +2914,9 @@ out: > return 0; > } > > -static bool param__is_struct(struct cu *cu, struct tag *tag) > +static bool param__is_wide(struct cu *cu, struct tag *tag) > { > - struct tag *type = cu__type(cu, tag->type); > - > - if (!type) > - return false; > - > - switch (type->tag) { > - case DW_TAG_structure_type: > - return true; > - case DW_TAG_const_type: > - case DW_TAG_typedef: > - /* handle "typedef struct", const parameter */ > - return param__is_struct(cu, type); > - default: > - return false; > - } > + return tag__size(tag, cu) > cu->addr_size; > } > > static int cu__resolve_func_ret_types_optimized(struct cu *cu) > @@ -2942,9 +2928,9 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > struct tag *tag = pt->entries[i]; > struct parameter *pos; > struct function *fn = tag__function(tag); > - bool has_unexpected_reg = false, has_struct_param = false; > + bool has_unexpected_reg = false, has_wide_param = false; > > - /* mark function as optimized if parameter is, or > + /* Mark function as optimized if parameter is, or > * if parameter does not have a location; at this > * point location presence has been marked in > * abstract origins for cases where a parameter > @@ -2953,10 +2939,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > * > * Also mark functions which, due to optimization, > * use an unexpected register for a parameter. > - * Exception is functions which have a struct > - * as a parameter, as multiple registers may > - * be used to represent it, throwing off register > - * to parameter mapping. > + * Exception is functions which have a wide > + * parameter, as multiple registers may be used > + * to represent it, throwing off register to > + * parameter mapping. Examples could include > + * structs or 64-bit types on a 32-bit arch. > */ > ftype__for_each_parameter(&fn->proto, pos) { > if (pos->optimized || !pos->has_loc) > @@ -2967,11 +2954,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > } > if (has_unexpected_reg) { > ftype__for_each_parameter(&fn->proto, pos) { > - has_struct_param = param__is_struct(cu, &pos->tag); > - if (has_struct_param) > + has_wide_param = param__is_wide(cu, &pos->tag); > + if (has_wide_param) > break; > } > - if (!has_struct_param) > + if (!has_wide_param) > fn->proto.unexpected_reg = 1; > } >
On Thu, Apr 10, 2025 at 01:20:45PM +0100, Alan Maguire wrote: > On 10/04/2025 09:33, Tony Ambardar wrote: > > While doing JIT development on armhf BTF kernels, I hit a strange issue > > where some functions were missing in BTF data. This required considerable > > debugging but can be reproduced simply: > > > > $ bpftool --version > > bpftool v7.6.0 > > using libbpf v1.6 > > features: llvm, skeletons > > > > $ pahole --version > > v1.29 > > > > $ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf > > btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF > > btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl' > > > > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > > <nothing> > > > > $ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > > s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); > > > > $ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf > > > > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > > bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); > > > > The key things to note are the pahole 'consistent_func' feature and the u64 > > 'wake_flags' parameter vs. arm 32-bit registers. These point to existing > > code handling arguments larger than register-size, but only structs. > > > > Generalize the code for any type of argument exceeding register size (i.e. > > cu->addr_size). This should work for integral or aggregate types, and also > > avoids a bug in the current code where a register-sized struct could be > > mistaken for larger. > > > > Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params") > > Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> > > Thanks for investigating this! I've tested this versus baseline on > x86_64 and aarch64. I'm seeing some small divergence in functions > encoded; for example on aarch64 we don't get a representation for > > static int __io_run_local_work(struct io_ring_ctx *ctx, io_tw_token_t > tw, int min_events, int max_events); > > The reason for that is the second argument is a typedef io_tw_token_t, > which is in turn a typedef for: > > struct io_tw_state { > }; > > i.e. an empty struct. > > The reason is with your patch we've moved from type-centric to > size-centric criteria used to allow functions into BTF that have > unexpected register usage; because the above function uses unexpected > registers _and_ does not exceed the address size, the function is marked > as having an inconsistent reg mapping. In this case, that seems > reasonable since it is true; there is no register needed to represent > the second argument. > > The deeper rationale here in allowing functions that have structs that > may be represented by multiple registers is that we can handle this > outcome; the BPF_PROG2() macro was added to handle such cases and seems > to handle multi-register representation but _not_ representations where > a register is not needed at all. I'm basing that on the > ___bpf_union_arg() macro in bpf_tracing.h so please correct me if I'm > wrong (we could potentially add a sizeof(t) == 0 clause here perhaps). > > So in other words, though we see small divergences in representation I > _think_ they are consistent with our expectations. > > I'd really like to see wider testing of this patch before it lands > however so we can shake out other problematic cases if any. If folks > could try this and compare BTF representations to baseline that would be > great! In particular comparing raw BTF is necessary since vmlinux.h > representations don't include functions (aside from kfuncs). Now that we > have always-reproducible BTF a simple diff of "bpftool btf dump file > vmlinux" can be used to make such comparisons. > > However perhaps we could also think about enhancing the bpf_tracing.h > macro to handle zero-sized parameters like empty structs such that later > parameters are mapped to registers correctly (presuming that's > possible)? Yonghong, what do you think? Hi Alan, Thanks so much for the additional context. I pressed pause to consider this while waiting for further testing news or feedback, but haven't seen anything since. Have you heard anything OOB? I also understood dwarves could have CI working now, so wondering how those tests with the patch might have gone. In fact, it would be great to have a regular arm32 CI running if that's possible. Could you share how the CI changes are being managed? I've recently been trying to update the arm32 JIT and test_progs in tandem, with the goal of having a working 32-bit target for kernel-patches/bpf CI, but some baby-steps with dwarves or libbpf could be very helpful. As far as type-based vs size-based criteria, I'm not wedded to either, and did look at the type-based route as currently exists. I needed to add cases for DW_TAG_base_type (for ints), DW_TAG_volatile_type (recursive), DW_TAG_union_type (same issues as structs), and then we still need size tests anyway. Sticking with size-based (and a zero-test as you suggested) seemed the simplest and preserved the functions you noticed missing. Cheers, Tony > > Thanks! > > Alan > > > --- > > dwarf_loader.c | 37 ++++++++++++------------------------- > > 1 file changed, 12 insertions(+), 25 deletions(-) > > > > diff --git a/dwarf_loader.c b/dwarf_loader.c > > index e1ba7bc..22abfdb 100644 > > --- a/dwarf_loader.c > > +++ b/dwarf_loader.c > > @@ -2914,23 +2914,9 @@ out: > > return 0; > > } > > > > -static bool param__is_struct(struct cu *cu, struct tag *tag) > > +static bool param__is_wide(struct cu *cu, struct tag *tag) > > { > > - struct tag *type = cu__type(cu, tag->type); > > - > > - if (!type) > > - return false; > > - > > - switch (type->tag) { > > - case DW_TAG_structure_type: > > - return true; > > - case DW_TAG_const_type: > > - case DW_TAG_typedef: > > - /* handle "typedef struct", const parameter */ > > - return param__is_struct(cu, type); > > - default: > > - return false; > > - } > > + return tag__size(tag, cu) > cu->addr_size; > > } > > > > static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > @@ -2942,9 +2928,9 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > struct tag *tag = pt->entries[i]; > > struct parameter *pos; > > struct function *fn = tag__function(tag); > > - bool has_unexpected_reg = false, has_struct_param = false; > > + bool has_unexpected_reg = false, has_wide_param = false; > > > > - /* mark function as optimized if parameter is, or > > + /* Mark function as optimized if parameter is, or > > * if parameter does not have a location; at this > > * point location presence has been marked in > > * abstract origins for cases where a parameter > > @@ -2953,10 +2939,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > * > > * Also mark functions which, due to optimization, > > * use an unexpected register for a parameter. > > - * Exception is functions which have a struct > > - * as a parameter, as multiple registers may > > - * be used to represent it, throwing off register > > - * to parameter mapping. > > + * Exception is functions which have a wide > > + * parameter, as multiple registers may be used > > + * to represent it, throwing off register to > > + * parameter mapping. Examples could include > > + * structs or 64-bit types on a 32-bit arch. > > */ > > ftype__for_each_parameter(&fn->proto, pos) { > > if (pos->optimized || !pos->has_loc) > > @@ -2967,11 +2954,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > } > > if (has_unexpected_reg) { > > ftype__for_each_parameter(&fn->proto, pos) { > > - has_struct_param = param__is_struct(cu, &pos->tag); > > - if (has_struct_param) > > + has_wide_param = param__is_wide(cu, &pos->tag); > > + if (has_wide_param) > > break; > > } > > - if (!has_struct_param) > > + if (!has_wide_param) > > fn->proto.unexpected_reg = 1; > > } > > >
diff --git a/dwarf_loader.c b/dwarf_loader.c index e1ba7bc..22abfdb 100644 --- a/dwarf_loader.c +++ b/dwarf_loader.c @@ -2914,23 +2914,9 @@ out: return 0; } -static bool param__is_struct(struct cu *cu, struct tag *tag) +static bool param__is_wide(struct cu *cu, struct tag *tag) { - struct tag *type = cu__type(cu, tag->type); - - if (!type) - return false; - - switch (type->tag) { - case DW_TAG_structure_type: - return true; - case DW_TAG_const_type: - case DW_TAG_typedef: - /* handle "typedef struct", const parameter */ - return param__is_struct(cu, type); - default: - return false; - } + return tag__size(tag, cu) > cu->addr_size; } static int cu__resolve_func_ret_types_optimized(struct cu *cu) @@ -2942,9 +2928,9 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) struct tag *tag = pt->entries[i]; struct parameter *pos; struct function *fn = tag__function(tag); - bool has_unexpected_reg = false, has_struct_param = false; + bool has_unexpected_reg = false, has_wide_param = false; - /* mark function as optimized if parameter is, or + /* Mark function as optimized if parameter is, or * if parameter does not have a location; at this * point location presence has been marked in * abstract origins for cases where a parameter @@ -2953,10 +2939,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) * * Also mark functions which, due to optimization, * use an unexpected register for a parameter. - * Exception is functions which have a struct - * as a parameter, as multiple registers may - * be used to represent it, throwing off register - * to parameter mapping. + * Exception is functions which have a wide + * parameter, as multiple registers may be used + * to represent it, throwing off register to + * parameter mapping. Examples could include + * structs or 64-bit types on a 32-bit arch. */ ftype__for_each_parameter(&fn->proto, pos) { if (pos->optimized || !pos->has_loc) @@ -2967,11 +2954,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) } if (has_unexpected_reg) { ftype__for_each_parameter(&fn->proto, pos) { - has_struct_param = param__is_struct(cu, &pos->tag); - if (has_struct_param) + has_wide_param = param__is_wide(cu, &pos->tag); + if (has_wide_param) break; } - if (!has_struct_param) + if (!has_wide_param) fn->proto.unexpected_reg = 1; }
While doing JIT development on armhf BTF kernels, I hit a strange issue where some functions were missing in BTF data. This required considerable debugging but can be reproduced simply: $ bpftool --version bpftool v7.6.0 using libbpf v1.6 features: llvm, skeletons $ pahole --version v1.29 $ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl' $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf <nothing> $ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); $ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); The key things to note are the pahole 'consistent_func' feature and the u64 'wake_flags' parameter vs. arm 32-bit registers. These point to existing code handling arguments larger than register-size, but only structs. Generalize the code for any type of argument exceeding register size (i.e. cu->addr_size). This should work for integral or aggregate types, and also avoids a bug in the current code where a register-sized struct could be mistaken for larger. Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> --- dwarf_loader.c | 37 ++++++++++++------------------------- 1 file changed, 12 insertions(+), 25 deletions(-)