diff mbox series

[dwarves,v1] dwarf_loader: Fix skipped encoding of function BTF on 32-bit systems

Message ID 20250410083359.198724-1-tony.ambardar@gmail.com (mailing list archive)
State Not Applicable
Headers show
Series [dwarves,v1] dwarf_loader: Fix skipped encoding of function BTF on 32-bit systems | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Tony Ambardar April 10, 2025, 8:33 a.m. UTC
While doing JIT development on armhf BTF kernels, I hit a strange issue
where some functions were missing in BTF data. This required considerable
debugging but can be reproduced simply:

$ bpftool --version
bpftool v7.6.0
using libbpf v1.6
features: llvm, skeletons

$ pahole --version
v1.29

$ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf
btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF
btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl'

$ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
<nothing>

$ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);

$ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf

$ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);

The key things to note are the pahole 'consistent_func' feature and the u64
'wake_flags' parameter vs. arm 32-bit registers. These point to existing
code handling arguments larger than register-size, but only structs.

Generalize the code for any type of argument exceeding register size (i.e.
cu->addr_size). This should work for integral or aggregate types, and also
avoids a bug in the current code where a register-sized struct could be
mistaken for larger.

Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
---
 dwarf_loader.c | 37 ++++++++++++-------------------------
 1 file changed, 12 insertions(+), 25 deletions(-)

Comments

Alan Maguire April 10, 2025, 12:20 p.m. UTC | #1
On 10/04/2025 09:33, Tony Ambardar wrote:
> While doing JIT development on armhf BTF kernels, I hit a strange issue
> where some functions were missing in BTF data. This required considerable
> debugging but can be reproduced simply:
> 
> $ bpftool --version
> bpftool v7.6.0
> using libbpf v1.6
> features: llvm, skeletons
> 
> $ pahole --version
> v1.29
> 
> $ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf
> btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF
> btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl'
> 
> $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> <nothing>
> 
> $ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);
> 
> $ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf
> 
> $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);
> 
> The key things to note are the pahole 'consistent_func' feature and the u64
> 'wake_flags' parameter vs. arm 32-bit registers. These point to existing
> code handling arguments larger than register-size, but only structs.
> 
> Generalize the code for any type of argument exceeding register size (i.e.
> cu->addr_size). This should work for integral or aggregate types, and also
> avoids a bug in the current code where a register-sized struct could be
> mistaken for larger.
> 
> Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params")
> Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>

Thanks for investigating this! I've tested this versus baseline on
x86_64 and aarch64. I'm seeing some small divergence in functions
encoded; for example on aarch64 we don't get a representation for

static int __io_run_local_work(struct io_ring_ctx *ctx, io_tw_token_t
tw, int min_events, int max_events);

The reason for that is the second argument is a typedef io_tw_token_t,
which is in turn a typedef for:

struct io_tw_state {
};

i.e. an empty struct.

The reason is with your patch we've moved from type-centric to
size-centric criteria used to allow functions into BTF that have
unexpected register usage; because the above function uses unexpected
registers _and_ does not exceed the address size, the function is marked
as having an inconsistent reg mapping. In this case, that seems
reasonable since it is true; there is no register needed to represent
the second argument.

The deeper rationale here in allowing functions that have structs that
may be represented by multiple registers is that we can handle this
outcome; the BPF_PROG2() macro was added to handle such cases and seems
to handle multi-register representation but _not_ representations where
a register is not needed at all. I'm basing that on the
___bpf_union_arg() macro in bpf_tracing.h so please correct me if I'm
wrong (we could potentially add a sizeof(t) == 0 clause here perhaps).

So in other words, though we see small divergences in representation I
_think_ they are consistent with our expectations.

I'd really like to see wider testing of this patch before it lands
however so we can shake out other problematic cases if any. If folks
could try this and compare BTF representations to baseline that would be
great! In particular comparing raw BTF is necessary since vmlinux.h
representations don't include functions (aside from kfuncs). Now that we
have always-reproducible BTF a simple diff of "bpftool btf dump file
vmlinux" can be used to make such comparisons.

However perhaps we could also think about enhancing the bpf_tracing.h
macro to handle zero-sized parameters like empty structs such that later
parameters are mapped to registers correctly (presuming that's
possible)? Yonghong, what do you think?

Thanks!

Alan

> ---
>  dwarf_loader.c | 37 ++++++++++++-------------------------
>  1 file changed, 12 insertions(+), 25 deletions(-)
> 
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index e1ba7bc..22abfdb 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -2914,23 +2914,9 @@ out:
>  	return 0;
>  }
>  
> -static bool param__is_struct(struct cu *cu, struct tag *tag)
> +static bool param__is_wide(struct cu *cu, struct tag *tag)
>  {
> -	struct tag *type = cu__type(cu, tag->type);
> -
> -	if (!type)
> -		return false;
> -
> -	switch (type->tag) {
> -	case DW_TAG_structure_type:
> -		return true;
> -	case DW_TAG_const_type:
> -	case DW_TAG_typedef:
> -		/* handle "typedef struct", const parameter */
> -		return param__is_struct(cu, type);
> -	default:
> -		return false;
> -	}
> +	return tag__size(tag, cu) > cu->addr_size;
>  }
>  
>  static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> @@ -2942,9 +2928,9 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
>  		struct tag *tag = pt->entries[i];
>  		struct parameter *pos;
>  		struct function *fn = tag__function(tag);
> -		bool has_unexpected_reg = false, has_struct_param = false;
> +		bool has_unexpected_reg = false, has_wide_param = false;
>  
> -		/* mark function as optimized if parameter is, or
> +		/* Mark function as optimized if parameter is, or
>  		 * if parameter does not have a location; at this
>  		 * point location presence has been marked in
>  		 * abstract origins for cases where a parameter
> @@ -2953,10 +2939,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
>  		 *
>  		 * Also mark functions which, due to optimization,
>  		 * use an unexpected register for a parameter.
> -		 * Exception is functions which have a struct
> -		 * as a parameter, as multiple registers may
> -		 * be used to represent it, throwing off register
> -		 * to parameter mapping.
> +		 * Exception is functions which have a wide
> +		 * parameter, as multiple registers may be used
> +		 * to represent it, throwing off register to
> +		 * parameter mapping. Examples could include
> +		 * structs or 64-bit types on a 32-bit arch.
>  		 */
>  		ftype__for_each_parameter(&fn->proto, pos) {
>  			if (pos->optimized || !pos->has_loc)
> @@ -2967,11 +2954,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
>  		}
>  		if (has_unexpected_reg) {
>  			ftype__for_each_parameter(&fn->proto, pos) {
> -				has_struct_param = param__is_struct(cu, &pos->tag);
> -				if (has_struct_param)
> +				has_wide_param = param__is_wide(cu, &pos->tag);
> +				if (has_wide_param)
>  					break;
>  			}
> -			if (!has_struct_param)
> +			if (!has_wide_param)
>  				fn->proto.unexpected_reg = 1;
>  		}
>
Tony Ambardar April 16, 2025, 10:33 a.m. UTC | #2
On Thu, Apr 10, 2025 at 01:20:45PM +0100, Alan Maguire wrote:
> On 10/04/2025 09:33, Tony Ambardar wrote:
> > While doing JIT development on armhf BTF kernels, I hit a strange issue
> > where some functions were missing in BTF data. This required considerable
> > debugging but can be reproduced simply:
> > 
> > $ bpftool --version
> > bpftool v7.6.0
> > using libbpf v1.6
> > features: llvm, skeletons
> > 
> > $ pahole --version
> > v1.29
> > 
> > $ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf
> > btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF
> > btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl'
> > 
> > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> > <nothing>
> > 
> > $ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> > s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);
> > 
> > $ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf
> > 
> > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> > bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);
> > 
> > The key things to note are the pahole 'consistent_func' feature and the u64
> > 'wake_flags' parameter vs. arm 32-bit registers. These point to existing
> > code handling arguments larger than register-size, but only structs.
> > 
> > Generalize the code for any type of argument exceeding register size (i.e.
> > cu->addr_size). This should work for integral or aggregate types, and also
> > avoids a bug in the current code where a register-sized struct could be
> > mistaken for larger.
> > 
> > Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params")
> > Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
> 
> Thanks for investigating this! I've tested this versus baseline on
> x86_64 and aarch64. I'm seeing some small divergence in functions
> encoded; for example on aarch64 we don't get a representation for
> 
> static int __io_run_local_work(struct io_ring_ctx *ctx, io_tw_token_t
> tw, int min_events, int max_events);
> 
> The reason for that is the second argument is a typedef io_tw_token_t,
> which is in turn a typedef for:
> 
> struct io_tw_state {
> };
> 
> i.e. an empty struct.
> 
> The reason is with your patch we've moved from type-centric to
> size-centric criteria used to allow functions into BTF that have
> unexpected register usage; because the above function uses unexpected
> registers _and_ does not exceed the address size, the function is marked
> as having an inconsistent reg mapping. In this case, that seems
> reasonable since it is true; there is no register needed to represent
> the second argument.
> 
> The deeper rationale here in allowing functions that have structs that
> may be represented by multiple registers is that we can handle this
> outcome; the BPF_PROG2() macro was added to handle such cases and seems
> to handle multi-register representation but _not_ representations where
> a register is not needed at all. I'm basing that on the
> ___bpf_union_arg() macro in bpf_tracing.h so please correct me if I'm
> wrong (we could potentially add a sizeof(t) == 0 clause here perhaps).
> 
> So in other words, though we see small divergences in representation I
> _think_ they are consistent with our expectations.
> 
> I'd really like to see wider testing of this patch before it lands
> however so we can shake out other problematic cases if any. If folks
> could try this and compare BTF representations to baseline that would be
> great! In particular comparing raw BTF is necessary since vmlinux.h
> representations don't include functions (aside from kfuncs). Now that we
> have always-reproducible BTF a simple diff of "bpftool btf dump file
> vmlinux" can be used to make such comparisons.
> 
> However perhaps we could also think about enhancing the bpf_tracing.h
> macro to handle zero-sized parameters like empty structs such that later
> parameters are mapped to registers correctly (presuming that's
> possible)? Yonghong, what do you think?

Hi Alan,

Thanks so much for the additional context. I pressed pause to consider
this while waiting for further testing news or feedback, but haven't seen
anything since. Have you heard anything OOB?

I also understood dwarves could have CI working now, so wondering how
those tests with the patch might have gone. In fact, it would be great to
have a regular arm32 CI running if that's possible. Could you share how
the CI changes are being managed? I've recently been trying to update
the arm32 JIT and test_progs in tandem, with the goal of having a working
32-bit target for kernel-patches/bpf CI, but some baby-steps with dwarves
or libbpf could be very helpful.

As far as type-based vs size-based criteria, I'm not wedded to either, and
did look at the type-based route as currently exists. I needed to add
cases for DW_TAG_base_type (for ints), DW_TAG_volatile_type (recursive),
DW_TAG_union_type (same issues as structs), and then we still need size
tests anyway. Sticking with size-based (and a zero-test as you suggested)
seemed the simplest and preserved the functions you noticed missing.

Cheers,
Tony

> 
> Thanks!
> 
> Alan
> 
> > ---
> >  dwarf_loader.c | 37 ++++++++++++-------------------------
> >  1 file changed, 12 insertions(+), 25 deletions(-)
> > 
> > diff --git a/dwarf_loader.c b/dwarf_loader.c
> > index e1ba7bc..22abfdb 100644
> > --- a/dwarf_loader.c
> > +++ b/dwarf_loader.c
> > @@ -2914,23 +2914,9 @@ out:
> >  	return 0;
> >  }
> >  
> > -static bool param__is_struct(struct cu *cu, struct tag *tag)
> > +static bool param__is_wide(struct cu *cu, struct tag *tag)
> >  {
> > -	struct tag *type = cu__type(cu, tag->type);
> > -
> > -	if (!type)
> > -		return false;
> > -
> > -	switch (type->tag) {
> > -	case DW_TAG_structure_type:
> > -		return true;
> > -	case DW_TAG_const_type:
> > -	case DW_TAG_typedef:
> > -		/* handle "typedef struct", const parameter */
> > -		return param__is_struct(cu, type);
> > -	default:
> > -		return false;
> > -	}
> > +	return tag__size(tag, cu) > cu->addr_size;
> >  }
> >  
> >  static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> > @@ -2942,9 +2928,9 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> >  		struct tag *tag = pt->entries[i];
> >  		struct parameter *pos;
> >  		struct function *fn = tag__function(tag);
> > -		bool has_unexpected_reg = false, has_struct_param = false;
> > +		bool has_unexpected_reg = false, has_wide_param = false;
> >  
> > -		/* mark function as optimized if parameter is, or
> > +		/* Mark function as optimized if parameter is, or
> >  		 * if parameter does not have a location; at this
> >  		 * point location presence has been marked in
> >  		 * abstract origins for cases where a parameter
> > @@ -2953,10 +2939,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> >  		 *
> >  		 * Also mark functions which, due to optimization,
> >  		 * use an unexpected register for a parameter.
> > -		 * Exception is functions which have a struct
> > -		 * as a parameter, as multiple registers may
> > -		 * be used to represent it, throwing off register
> > -		 * to parameter mapping.
> > +		 * Exception is functions which have a wide
> > +		 * parameter, as multiple registers may be used
> > +		 * to represent it, throwing off register to
> > +		 * parameter mapping. Examples could include
> > +		 * structs or 64-bit types on a 32-bit arch.
> >  		 */
> >  		ftype__for_each_parameter(&fn->proto, pos) {
> >  			if (pos->optimized || !pos->has_loc)
> > @@ -2967,11 +2954,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> >  		}
> >  		if (has_unexpected_reg) {
> >  			ftype__for_each_parameter(&fn->proto, pos) {
> > -				has_struct_param = param__is_struct(cu, &pos->tag);
> > -				if (has_struct_param)
> > +				has_wide_param = param__is_wide(cu, &pos->tag);
> > +				if (has_wide_param)
> >  					break;
> >  			}
> > -			if (!has_struct_param)
> > +			if (!has_wide_param)
> >  				fn->proto.unexpected_reg = 1;
> >  		}
> >  
>
diff mbox series

Patch

diff --git a/dwarf_loader.c b/dwarf_loader.c
index e1ba7bc..22abfdb 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -2914,23 +2914,9 @@  out:
 	return 0;
 }
 
-static bool param__is_struct(struct cu *cu, struct tag *tag)
+static bool param__is_wide(struct cu *cu, struct tag *tag)
 {
-	struct tag *type = cu__type(cu, tag->type);
-
-	if (!type)
-		return false;
-
-	switch (type->tag) {
-	case DW_TAG_structure_type:
-		return true;
-	case DW_TAG_const_type:
-	case DW_TAG_typedef:
-		/* handle "typedef struct", const parameter */
-		return param__is_struct(cu, type);
-	default:
-		return false;
-	}
+	return tag__size(tag, cu) > cu->addr_size;
 }
 
 static int cu__resolve_func_ret_types_optimized(struct cu *cu)
@@ -2942,9 +2928,9 @@  static int cu__resolve_func_ret_types_optimized(struct cu *cu)
 		struct tag *tag = pt->entries[i];
 		struct parameter *pos;
 		struct function *fn = tag__function(tag);
-		bool has_unexpected_reg = false, has_struct_param = false;
+		bool has_unexpected_reg = false, has_wide_param = false;
 
-		/* mark function as optimized if parameter is, or
+		/* Mark function as optimized if parameter is, or
 		 * if parameter does not have a location; at this
 		 * point location presence has been marked in
 		 * abstract origins for cases where a parameter
@@ -2953,10 +2939,11 @@  static int cu__resolve_func_ret_types_optimized(struct cu *cu)
 		 *
 		 * Also mark functions which, due to optimization,
 		 * use an unexpected register for a parameter.
-		 * Exception is functions which have a struct
-		 * as a parameter, as multiple registers may
-		 * be used to represent it, throwing off register
-		 * to parameter mapping.
+		 * Exception is functions which have a wide
+		 * parameter, as multiple registers may be used
+		 * to represent it, throwing off register to
+		 * parameter mapping. Examples could include
+		 * structs or 64-bit types on a 32-bit arch.
 		 */
 		ftype__for_each_parameter(&fn->proto, pos) {
 			if (pos->optimized || !pos->has_loc)
@@ -2967,11 +2954,11 @@  static int cu__resolve_func_ret_types_optimized(struct cu *cu)
 		}
 		if (has_unexpected_reg) {
 			ftype__for_each_parameter(&fn->proto, pos) {
-				has_struct_param = param__is_struct(cu, &pos->tag);
-				if (has_struct_param)
+				has_wide_param = param__is_wide(cu, &pos->tag);
+				if (has_wide_param)
 					break;
 			}
-			if (!has_struct_param)
+			if (!has_wide_param)
 				fn->proto.unexpected_reg = 1;
 		}