[bpf-next] libbpf: support module BTF for BPF_TYPE_ID_TARGET CO-RE relocation

Message ID 20201205025140.443115-1-andrii@kernel.org
State Superseded
Delegated to: BPF

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 50 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Andrii Nakryiko Dec. 5, 2020, 2:51 a.m. UTC
When Clang emits an ldimm64 instruction for a BPF_TYPE_ID_TARGET CO-RE
relocation, put the FD of the module BTF containing the target type into the
upper 32 bits of the imm64.

Because this FD is internal to libbpf, it's very cumbersome to test this in
selftests. Manual testing was performed with debug log messages sprinkled
across selftests and libbpf, confirming that the expected values are
substituted. Better testing will be performed as part of the work adding
module BTF type support to the bpf_snprintf_btf() helper.

Cc: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/libbpf.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)
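
For context: an ldimm64 load spans two consecutive BPF instructions, and by
convention the imm field of the second instruction carries the upper 32 bits
of the 64-bit immediate. A minimal sketch (not the actual patch code) of the
patching the commit message describes:

#include <linux/bpf.h>

/* sketch: patch a BPF_TYPE_ID_TARGET ldimm64 so that its lower 32 bits
 * carry the target BTF type ID and its upper 32 bits carry the module
 * BTF FD (0 when the target type lives in vmlinux BTF)
 */
static void patch_type_id_target(struct bpf_insn *insn,
				 __u32 targ_type_id, int btf_obj_fd)
{
	insn[0].imm = targ_type_id;	/* lower 32 bits of imm64 */
	insn[1].imm = btf_obj_fd;	/* upper 32 bits of imm64 */
}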

Comments

Alan Maguire Dec. 6, 2020, 12:37 a.m. UTC | #1
On Fri, 4 Dec 2020, Andrii Nakryiko wrote:

> When Clang emits ldimm64 instruction for BPF_TYPE_ID_TARGET CO-RE relocation,
> put module BTF FD, containing target type, into upper 32 bits of imm64.
> 
> Because this FD is internal to libbpf, it's very cumbersome to test this in
> selftests. Manual testing was performed with debug log messages sprinkled
> across selftests and libbpf, confirming expected values are substituted.
> Better testing will be performed as part of the work adding module BTF types
> support to  bpf_snprintf_btf() helpers.
> 
> Cc: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Thanks so much for doing this Andrii! When I tested, I ran into a problem;
it turns out that when a module struct such as "veth_stats" is used, it's
classified as BTF_KIND_FWD, and as a result, when we iterate over
the modules and look in the veth module, "struct veth_stats" does not
match, since its module kind (BTF_KIND_STRUCT) does not match the candidate
kind (BTF_KIND_FWD). I'm kind of out of my depth here, but the patch
below (on top of your patch) worked.  However, without it, when we find
0 candidate matches, then on top of not substituting the module object
id/type id, we hit a segfault:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000480bf9 in bpf_core_calc_relo (prog=0x4d6ba40, relo=0x4d70e7c,
    relo_idx=0, local_spec=0x7ffe2cf17b00, targ_spec=0x0, res=0x7ffe2cf17ae0)
    at libbpf.c:4408
4408		switch (kind) {
Missing separate debuginfos, use: debuginfo-install
elfutils-libelf-0.172-2.el7.x86_64 glibc-2.17-196.el7.x86_64
libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64
libgcc-4.8.5-36.0.1.el7_6.2.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x0000000000480bf9 in bpf_core_calc_relo (prog=0x4d6ba40, relo=0x4d70e7c,
    relo_idx=0, local_spec=0x7ffe2cf17b00, targ_spec=0x0, res=0x7ffe2cf17ae0)
    at libbpf.c:4408
 

The dereferences of targ_spec in bpf_core_calc_relo() seem
to be the cause; that function is called with a NULL targ_spec
when 0 candidates are found, so it's possible we'd need to
guard those accesses for cases where a bogus type was passed
in and no candidates were found.  If the below looks good, would
it make sense to roll it into your patch, or should I add it to my
v3 patch series?
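
Roughly the kind of guard I mean (untested, and the exact error code is
a guess):

	/* with 0 candidates targ_spec is NULL here; bail out before
	 * the switch below dereferences it
	 */
	if (!targ_spec)
		return -ESRCH;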

Thanks again for your help with this!

Alan

From 08040730dbff6c5d7636927777ac85a71c10827f Mon Sep 17 00:00:00 2001
From: Alan Maguire <alan.maguire@oracle.com>
Date: Sun, 6 Dec 2020 01:10:28 +0100
Subject: [PATCH] libbpf: handle fwd kinds when checking candidate relocations
 for modules

When a struct belonging to a module is being assessed, it will be
designated a fwd kind (BTF_KIND_FWD); when matching candidate
types, the constraints on exact kind matching need to be relaxed to
ensure that such structures are found successfully.  Introduce a
kinds_match() function to handle this comparison.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/libbpf.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 539956f..00fdb30 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4673,6 +4673,24 @@ static void bpf_core_free_cands(struct core_cand_list *cands)
 	free(cands);
 }
 
+/* module-specific structs will have their relo kind set to fwd, so in
+ * addition to handling exact matches, a struct has to match the fwd kind.
+ */
+static bool kinds_match(const struct btf_type *type1,
+			const struct btf_type *type2)
+{
+	__u8 kind1 = btf_kind(type1);
+	__u8 kind2 = btf_kind(type2);
+
+	if (kind1 == kind2)
+		return true;
+	if (kind1 == BTF_KIND_STRUCT && kind2 == BTF_KIND_FWD)
+		return true;
+	if (kind1 == BTF_KIND_FWD && kind2 == BTF_KIND_STRUCT)
+		return true;
+	return false;
+}
+
 static int bpf_core_add_cands(struct core_cand *local_cand,
 			      size_t local_essent_len,
 			      const struct btf *targ_btf,
@@ -4689,7 +4707,7 @@ static int bpf_core_add_cands(struct core_cand *local_cand,
 	n = btf__get_nr_types(targ_btf);
 	for (i = targ_start_id; i <= n; i++) {
 		t = btf__type_by_id(targ_btf, i);
-		if (btf_kind(t) != btf_kind(local_cand->t))
+		if (!kinds_match(t, local_cand->t))
 			continue;
 
 		targ_name = btf__name_by_offset(targ_btf, t->name_off);
@@ -5057,7 +5075,7 @@ static int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id
 	/* caller made sure that names match (ignoring flavor suffix) */
 	local_type = btf__type_by_id(local_btf, local_id);
 	targ_type = btf__type_by_id(targ_btf, targ_id);
-	if (btf_kind(local_type) != btf_kind(targ_type))
+	if (!kinds_match(local_type, targ_type))
 		return 0;
 
 recur:
@@ -5070,7 +5088,7 @@ static int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id
 	if (!local_type || !targ_type)
 		return -EINVAL;
 
-	if (btf_kind(local_type) != btf_kind(targ_type))
+	if (!kinds_match(local_type, targ_type))
 		return 0;
 
 	switch (btf_kind(local_type)) {
Alan Maguire Dec. 7, 2020, 4:38 p.m. UTC | #2
On Fri, 4 Dec 2020, Andrii Nakryiko wrote:

> When Clang emits ldimm64 instruction for BPF_TYPE_ID_TARGET CO-RE relocation,
> put module BTF FD, containing target type, into upper 32 bits of imm64.
> 
> Because this FD is internal to libbpf, it's very cumbersome to test this in
> selftests. Manual testing was performed with debug log messages sprinkled
> across selftests and libbpf, confirming expected values are substituted.
> Better testing will be performed as part of the work adding module BTF types
> support to  bpf_snprintf_btf() helpers.
> 
> Cc: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/libbpf.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 9be88a90a4aa..539956f7920a 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4795,6 +4795,7 @@ static int load_module_btfs(struct bpf_object *obj)
>  
>  		mod_btf = &obj->btf_modules[obj->btf_module_cnt++];
>  
> +		btf__set_fd(btf, fd);
>  		mod_btf->btf = btf;
>  		mod_btf->id = id;
>  		mod_btf->fd = fd;
> @@ -5445,6 +5446,10 @@ struct bpf_core_relo_res
>  	__u32 orig_type_id;
>  	__u32 new_sz;
>  	__u32 new_type_id;
> +	/* FD of the module BTF containing the target candidate, or 0 for
> +	 * vmlinux BTF
> +	 */
> +	int btf_obj_fd;
>  };
>  
>  /* Calculate original and target relocation values, given local and target
> @@ -5469,6 +5474,7 @@ static int bpf_core_calc_relo(const struct bpf_program *prog,
>  	res->fail_memsz_adjust = false;
>  	res->orig_sz = res->new_sz = 0;
>  	res->orig_type_id = res->new_type_id = 0;
> +	res->btf_obj_fd = 0;
>  
>  	if (core_relo_is_field_based(relo->kind)) {
>  		err = bpf_core_calc_field_relo(prog, relo, local_spec,
> @@ -5519,6 +5525,9 @@ static int bpf_core_calc_relo(const struct bpf_program *prog,
>  	} else if (core_relo_is_type_based(relo->kind)) {
>  		err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val);
>  		err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val);
> +		if (!err && relo->kind == BPF_TYPE_ID_TARGET &&
> +		    targ_spec->btf != prog->obj->btf_vmlinux) 
> +			res->btf_obj_fd = btf__fd(targ_spec->btf);

Sorry about this Andrii, but I'm a bit stuck here.

I'm struggling to get tests working where the obj fd is used to designate
the module BTF. Unless I'm missing something there are a few problems:

- the fd association is removed by libbpf when the BPF program has loaded; 
the module fds are closed and the module BTF is discarded.  However, even if 
that isn't done (and as you mentioned, we could hold onto BTF that is in 
use; I commented out the code that does that to test), there's 
another problem:
- I can't see a way to use the object fd value we set here later in BPF 
program context; btf_get_by_fd() returns -EBADF, as the fd is associated 
with the module BTF in the test's process context, not necessarily in 
the context in which the BPF program is running.  Would it be possible in 
this case to use the object id? Or is there another way to handle the 
fd->module BTF association that we need to make in BPF program context 
that I'm missing?
- A more long-term issue: if we use fds to specify module BTFs and write 
the object fd into the program, we can pin the BPF program such that it 
outlives the fds that refer to its associated BTF.  So unless we pinned the 
BTF too, any code that assumed the BTF fd->module mapping was valid would 
start to break once the user-space side went away and the pinned program 
persisted.

Maybe there are solutions to these problems that I'm missing of course, 
but for now I'm not sure how to get things working.

Thanks again for your help with this!

Alan
Alexei Starovoitov Dec. 8, 2020, 3:12 a.m. UTC | #3
On Mon, Dec 07, 2020 at 04:38:16PM +0000, Alan Maguire wrote:
> On Fri, 4 Dec 2020, Andrii Nakryiko wrote:
> 
> > When Clang emits ldimm64 instruction for BPF_TYPE_ID_TARGET CO-RE relocation,
> > put module BTF FD, containing target type, into upper 32 bits of imm64.
> > 
> > Because this FD is internal to libbpf, it's very cumbersome to test this in
> > selftests. Manual testing was performed with debug log messages sprinkled
> > across selftests and libbpf, confirming expected values are substituted.
> > Better testing will be performed as part of the work adding module BTF types
> > support to  bpf_snprintf_btf() helpers.
> > 
> > Cc: Alan Maguire <alan.maguire@oracle.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  tools/lib/bpf/libbpf.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 9be88a90a4aa..539956f7920a 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -4795,6 +4795,7 @@ static int load_module_btfs(struct bpf_object *obj)
> >  
> >  		mod_btf = &obj->btf_modules[obj->btf_module_cnt++];
> >  
> > +		btf__set_fd(btf, fd);
> >  		mod_btf->btf = btf;
> >  		mod_btf->id = id;
> >  		mod_btf->fd = fd;
> > @@ -5445,6 +5446,10 @@ struct bpf_core_relo_res
> >  	__u32 orig_type_id;
> >  	__u32 new_sz;
> >  	__u32 new_type_id;
> > +	/* FD of the module BTF containing the target candidate, or 0 for
> > +	 * vmlinux BTF
> > +	 */
> > +	int btf_obj_fd;
> >  };
> >  
> >  /* Calculate original and target relocation values, given local and target
> > @@ -5469,6 +5474,7 @@ static int bpf_core_calc_relo(const struct bpf_program *prog,
> >  	res->fail_memsz_adjust = false;
> >  	res->orig_sz = res->new_sz = 0;
> >  	res->orig_type_id = res->new_type_id = 0;
> > +	res->btf_obj_fd = 0;
> >  
> >  	if (core_relo_is_field_based(relo->kind)) {
> >  		err = bpf_core_calc_field_relo(prog, relo, local_spec,
> > @@ -5519,6 +5525,9 @@ static int bpf_core_calc_relo(const struct bpf_program *prog,
> >  	} else if (core_relo_is_type_based(relo->kind)) {
> >  		err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val);
> >  		err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val);
> > +		if (!err && relo->kind == BPF_TYPE_ID_TARGET &&
> > +		    targ_spec->btf != prog->obj->btf_vmlinux) 
> > +			res->btf_obj_fd = btf__fd(targ_spec->btf);
> 
> Sorry about this Andrii, but I'm a bit stuck here.
> 
> I'm struggling to get tests working where the obj fd is used to designate
> the module BTF. Unless I'm missing something there are a few problems:
> 
> - the fd association is removed by libbpf when the BPF program has loaded; 
> the module fds are closed and the module BTF is discarded.  However even if 
> that isn't done (and as you mentioned, we could hold onto BTF that is in 
> use, and I commented out the code that does that to test) - there's 
> another problem:
> - I can't see a way to use the object fd value we set here later in BPF 
> program context; btf_get_by_fd() returns -EBADF as the fd is associated 
> with the module BTF in the test's process context, not necessarily in 
> the context that the BPF program is running.  Would it be possible in this 
> case to use object id? Or is there another way to handle the fd->module 
> BTF association that we need to make in BPF program context that I'm 
> missing?
> - A more long-term issue; if we use fds to specify module BTFs and write 
> the object fd into the program, we can pin the BPF program such that it 
> outlives fds that refer to its associated BTF.  So unless we pinned the 
> BTF too, any code that assumed the BTF fd-> module mapping was valid would 
> start to break once the user-space side went away and the pinned program 
> persisted. 

All of the above are not issues. They are features of FD based approach.
When the program refers to btf via fd the verifier needs to increment btf's refcnt
so it won't go away while the prog is running. For module's BTF it means
that the module can be unloaded, but its BTF may stay around if there is a prog
that needs to access it.
I think the missing piece in the above is that btf_get_by_fd() should be
done at load time instead of program run-time.
Everything FD based needs to behave similar to map_fds where ld_imm64 insn
contains map_fd that gets converted to map_ptr by the verifier at load time.
In this case single ld_imm64 with 32-bit FD + 32-bit btf_id is not enough.
So either libbpf or the verifier need to insert additional instruction.
I'm not sure yet how to extend 'struct btf_ptr' cleanly so that it looks
good from the C side.
In the other patch I saw:
struct btf_ptr {
        void *ptr;
        __u32 type_id;
-       __u32 flags;            /* BTF ptr flags; unused at present. */
+       __u32 obj_id;           /* BTF object; vmlinux if 0 */
 };
The removal of flags cannot be done, since it will break progs.
Probably something like this:
struct btf_ptr {
  void *ptr;
  __u32 type_id;
  __u32 flags;
  __u64 btf_obj_fd; /* this is 32-bit FD for libbpf which will become pointer after load */
};
would be the most convenient from the bpf prog side. The ld_imm64 init of
btf_obj_fd will be replaced with an absolute btf pointer by the verifier. So
when bpf_snprintf_btf() is called, the prog will pass the kernel-internal
pointer to struct btf to the helper. No extra run-time checks needed.
bpf_snprintf_btf() would print that type_id within the given struct btf
object. libbpf would need to deal with two relos: one to store the btf_id
from bpf_core_type_id_kernel() into type_id, and another to find the
module's BTF and store its FD into btf_obj_fd with ld_imm64. I'm still
thinking about how to frame that cleanly from the C side.
Other ideas?
Andrii Nakryiko Dec. 8, 2020, 3:28 a.m. UTC | #4
On Sat, Dec 5, 2020 at 4:38 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 4 Dec 2020, Andrii Nakryiko wrote:
>
> > When Clang emits ldimm64 instruction for BPF_TYPE_ID_TARGET CO-RE relocation,
> > put module BTF FD, containing target type, into upper 32 bits of imm64.
> >
> > Because this FD is internal to libbpf, it's very cumbersome to test this in
> > selftests. Manual testing was performed with debug log messages sprinkled
> > across selftests and libbpf, confirming expected values are substituted.
> > Better testing will be performed as part of the work adding module BTF types
> > support to  bpf_snprintf_btf() helpers.
> >
> > Cc: Alan Maguire <alan.maguire@oracle.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Thanks so much for doing this Andrii! When I tested, I ran into a problem;
> it turns out when a module struct such as "veth_stats" is used, it's
> classified as BTF_KIND_FWD, and as a result when we iterate over
> the modules and look in the veth module, "struct veth_stats" does not
> match since its module kind (BTF_KIND_STRUCT) does not match the candidate
> kind (BTF_KIND_FWD). I'm kind of out of my depth here, but the below
> patch (on top of your patch) worked.

I'm not quite clear on the situation. Is BTF_KIND_FWD for the local
type or the remote type? Maybe a small example would help, before we
go straight to assuming FWD can always be resolved into a concrete
STRUCT/UNION.


>  However without it - when we find
> 0  candidate matches - as well as not substituting the module object
> id/type id - we hit a segfault:

Yep, I missed the null check in:

targ_spec->btf != prog->obj->btf_vmlinux

I'll fix that.

>
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000480bf9 in bpf_core_calc_relo (prog=0x4d6ba40,
> relo=0x4d70e7c,
>     relo_idx=0, local_spec=0x7ffe2cf17b00, targ_spec=0x0,
> res=0x7ffe2cf17ae0)
>     at libbpf.c:4408
> 4408            switch (kind) {
> Missing separate debuginfos, use: debuginfo-install
> elfutils-libelf-0.172-2.el7.x86_64 glibc-2.17-196.el7.x86_64
> libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64
> libgcc-4.8.5-36.0.1.el7_6.2.x86_64 zlib-1.2.7-18.el7.x86_64
> (gdb) bt
> #0  0x0000000000480bf9 in bpf_core_calc_relo (prog=0x4d6ba40,
> relo=0x4d70e7c,
>     relo_idx=0, local_spec=0x7ffe2cf17b00, targ_spec=0x0,
> res=0x7ffe2cf17ae0)
>     at libbpf.c:4408
>
>
> The dereferences of targ_spec in bpf_core_recalc_relo() seem
> to be the cause; that function is called with a NULL targ_spec
> when 0 candidates are found, so it's possible we'd need to
> guard those accesses for cases where a bogus type was passed
> in and no candidates were found.  If the below looks good would
> it make sense to roll it into your patch or will I add it to my
> v3 patch series?
>
> Thanks again for your help with this!
>
> Alan
>
> From 08040730dbff6c5d7636927777ac85a71c10827f Mon Sep 17 00:00:00 2001
> From: Alan Maguire <alan.maguire@oracle.com>
> Date: Sun, 6 Dec 2020 01:10:28 +0100
> Subject: [PATCH] libbpf: handle fwd kinds when checking candidate relocations
>  for modules
>
> when a struct belonging to a module is being assessed, it will be
> designated a fwd kind (BTF_KIND_FWD); when matching candidate
> types constraints on exact type matching need to be relaxed to
> ensure that such structures are found successfully.  Introduce
> kinds_match() function to handle this comparison.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/libbpf.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>

[...]
Andrii Nakryiko Dec. 8, 2020, 3:40 a.m. UTC | #5
On Mon, Dec 7, 2020 at 7:12 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Dec 07, 2020 at 04:38:16PM +0000, Alan Maguire wrote:
> > On Fri, 4 Dec 2020, Andrii Nakryiko wrote:
> >
> > > When Clang emits ldimm64 instruction for BPF_TYPE_ID_TARGET CO-RE relocation,
> > > put module BTF FD, containing target type, into upper 32 bits of imm64.
> > >
> > > Because this FD is internal to libbpf, it's very cumbersome to test this in
> > > selftests. Manual testing was performed with debug log messages sprinkled
> > > across selftests and libbpf, confirming expected values are substituted.
> > > Better testing will be performed as part of the work adding module BTF types
> > > support to  bpf_snprintf_btf() helpers.
> > >
> > > Cc: Alan Maguire <alan.maguire@oracle.com>
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  tools/lib/bpf/libbpf.c | 19 ++++++++++++++++---
> > >  1 file changed, 16 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 9be88a90a4aa..539956f7920a 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -4795,6 +4795,7 @@ static int load_module_btfs(struct bpf_object *obj)
> > >
> > >             mod_btf = &obj->btf_modules[obj->btf_module_cnt++];
> > >
> > > +           btf__set_fd(btf, fd);
> > >             mod_btf->btf = btf;
> > >             mod_btf->id = id;
> > >             mod_btf->fd = fd;
> > > @@ -5445,6 +5446,10 @@ struct bpf_core_relo_res
> > >     __u32 orig_type_id;
> > >     __u32 new_sz;
> > >     __u32 new_type_id;
> > > +   /* FD of the module BTF containing the target candidate, or 0 for
> > > +    * vmlinux BTF
> > > +    */
> > > +   int btf_obj_fd;
> > >  };
> > >
> > >  /* Calculate original and target relocation values, given local and target
> > > @@ -5469,6 +5474,7 @@ static int bpf_core_calc_relo(const struct bpf_program *prog,
> > >     res->fail_memsz_adjust = false;
> > >     res->orig_sz = res->new_sz = 0;
> > >     res->orig_type_id = res->new_type_id = 0;
> > > +   res->btf_obj_fd = 0;
> > >
> > >     if (core_relo_is_field_based(relo->kind)) {
> > >             err = bpf_core_calc_field_relo(prog, relo, local_spec,
> > > @@ -5519,6 +5525,9 @@ static int bpf_core_calc_relo(const struct bpf_program *prog,
> > >     } else if (core_relo_is_type_based(relo->kind)) {
> > >             err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val);
> > >             err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val);
> > > +           if (!err && relo->kind == BPF_TYPE_ID_TARGET &&
> > > +               targ_spec->btf != prog->obj->btf_vmlinux)
> > > +                   res->btf_obj_fd = btf__fd(targ_spec->btf);
> >
> > Sorry about this Andrii, but I'm a bit stuck here.
> >
> > I'm struggling to get tests working where the obj fd is used to designate
> > the module BTF. Unless I'm missing something there are a few problems:
> >
> > - the fd association is removed by libbpf when the BPF program has loaded;
> > the module fds are closed and the module BTF is discarded.  However even if
> > that isn't done (and as you mentioned, we could hold onto BTF that is in
> > use, and I commented out the code that does that to test) - there's
> > another problem:
> > - I can't see a way to use the object fd value we set here later in BPF
> > program context; btf_get_by_fd() returns -EBADF as the fd is associated
> > with the module BTF in the test's process context, not necessarily in
> > the context that the BPF program is running.  Would it be possible in this
> > case to use object id? Or is there another way to handle the fd->module
> > BTF association that we need to make in BPF program context that I'm
> > missing?
> > - A more long-term issue; if we use fds to specify module BTFs and write
> > the object fd into the program, we can pin the BPF program such that it
> > outlives fds that refer to its associated BTF.  So unless we pinned the
> > BTF too, any code that assumed the BTF fd-> module mapping was valid would
> > start to break once the user-space side went away and the pinned program
> > persisted.
>
> All of the above are not issues. They are features of FD based approach.
> When the program refers to btf via fd the verifier needs to increment btf's refcnt
> so it won't go away while the prog is running. For module's BTF it means
> that the module can be unloaded, but its BTF may stay around if there is a prog
> that needs to access it.
> I think the missing piece in the above is that btf_get_by_fd() should be
> done at load time instead of program run-time.
> Everything FD based needs to behave similar to map_fds where ld_imm64 insn
> contains map_fd that gets converted to map_ptr by the verifier at load time.

Right. I was going to extend the verifier to do the same for all used BTF
objects as part of ksym support for module BTFs. So totally agree.
Just didn't need it so far.

> In this case single ld_imm64 with 32-bit FD + 32-bit btf_id is not enough.
> So either libbpf or the verifier need to insert additional instruction.

So this part I haven't investigated in detail yet. But if we are just
talking about keeping a struct btf * pointer + BTF type id (u32) in a
single ldimm64, we actually have enough space by using both the off and
imm fields in both parts of the ldimm64 instruction. That gives exactly
8 + 4 bytes. But I don't know if the problem you are referring to is in the
JIT part.
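
Something like this encoding, hypothetically (a sketch, not tested,
assuming the JITs are fine with it):

/* hypothetical: the two imm fields (2 x 4 bytes) hold the 64-bit
 * struct btf pointer, the two off fields (2 x 2 bytes) hold the
 * 32-bit BTF type ID
 */
insn[0].imm = (__u32)(__u64)btf;
insn[1].imm = (__u32)((__u64)btf >> 32);
insn[0].off = (__s16)(type_id & 0xffff);
insn[1].off = (__s16)(type_id >> 16);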

Also, for the ldimm64 instruction generated by
__builtin_btf_type_id(), btf fd + btf type id are always the same,
regardless of code path, so we can easily use bpf_insn_aux_data to
keep any extra data there, no?

> I'm not sure yet how to extend 'struct btf_ptr' cleanly, so it looks good
> from C side.
> In the other patch I saw:
> struct btf_ptr {
>         void *ptr;
>         __u32 type_id;
> -       __u32 flags;            /* BTF ptr flags; unused at present. */
> +       __u32 obj_id;           /* BTF object; vmlinux if 0 */
>  };
> The removal of flags cannot be done, since it will break progs.

This was something that I suggested to avoid extra logic based on the
size of btf_ptr. Not super critical. The idea was that flags so far
were always enforced to be zero, which makes it backwards compatible,
and we can now re-use the field for the module BTF fd instead. If we need
flags later, then we can extend it. But as I said, it's not a critical
part of the design, so I won't fight it :)

> Probably something like this:
> struct btf_ptr {
>   void *ptr;
>   __u32 type_id;
>   __u32 flags;
>   __u64 btf_obj_fd; /* this is 32-bit FD for libbpf which will become pointer after load */
> };
> would be the most convenient from the bpf prog side. The ld_imm64 init of
> btf_obj_fd will be replaced with absolute btf pointer by the verifier. So when
> bpf_snprintf_btf() is called the prog will pass the kernel internal pointer
> of struct btf to the helper. No extra run-time checks needed.
> bpf_snprintf_btf() would print that type_id within given struct btf object.
> libbpf would need to deal with two relos. One to store btf_id from
> bpf_core_type_id_kernel() into type_id. And another to find module's BTF and
> store its FD into btf_obj_fd with ld_imm64. I'm still thinking to how to frame

So the latter we can do as yet another type of type-based CO-RE
relocation, if needed. But if we do that, we should probably revert the
current __builtin_type_id(TYPE_ID_REMOTE) to just emit a 32-bit register
assignment (no ldimm64).

As for how the verifier would translate such FD into struct btf *
pointer. We have something similar today with ldimm64 with
BPF_PSEUDO_BTF_ID, which resolves into kernel variables. Let's think
if we can re-use that, or we can just add another BPF_PSEUDO_xxx
"flavor"?

> that cleanly from C side.
> Other ideas?

I'll need to think a bit more about this. But some thoughts I've got so far.
Alan Maguire Dec. 8, 2020, 10:02 p.m. UTC | #6
On Mon, 7 Dec 2020, Andrii Nakryiko wrote:

> On Sat, Dec 5, 2020 at 4:38 PM Alan Maguire <alan.maguire@oracle.com> wrote:
> > Thanks so much for doing this Andrii! When I tested, I ran into a problem;
> > it turns out when a module struct such as "veth_stats" is used, it's
> > classified as BTF_KIND_FWD, and as a result when we iterate over
> > the modules and look in the veth module, "struct veth_stats" does not
> > match since its module kind (BTF_KIND_STRUCT) does not match the candidate
> > kind (BTF_KIND_FWD). I'm kind of out of my depth here, but the below
> > patch (on top of your patch) worked.
> 
> I'm not quite clear on the situation. BTF_KIND_FWD is for the local
> type or the remote type? Maybe a small example would help, before we
> go straight to assuming FWD can be always resolved into a concrete
> STRUCT/UNION.
>

The local type was BTF_KIND_FWD, and the target type was BTF_KIND_STRUCT,
IIRC; I'll try and get some libbpf debug output for you showing the
relocation info.  If it helps, I think the situation was this: I was
referencing __builtin_btf_type_id(struct veth_stats), and hadn't
included a BTF-generated veth header, so I'm guessing libbpf classified
it as a fwd declaration.  My patch was a bit too general, I suspect, in
that it assumed that either target or local could be BTF_KIND_FWD and
should match BTF_KIND_STRUCT in local/target, whereas I _think_ only the
local should permit BTF_KIND_FWD.  Does that make sense?
> 
> >  However without it - when we find
> > 0  candidate matches - as well as not substituting the module object
> > id/type id - we hit a segfault:
> 
> Yep, I missed the null check in:
> 
> targ_spec->btf != prog->obj->btf_vmlinux
> 
> I'll fix that.
> 

Thanks! I think the core_reloc selftests also trigger the segfault,
if you need a test case to verify.

Alan
Alan Maguire Dec. 8, 2020, 10:13 p.m. UTC | #7
On Mon, 7 Dec 2020, Andrii Nakryiko wrote:

> On Mon, Dec 7, 2020 at 7:12 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Dec 07, 2020 at 04:38:16PM +0000, Alan Maguire wrote:
> > > Sorry about this Andrii, but I'm a bit stuck here.
> > >
> > > I'm struggling to get tests working where the obj fd is used to designate
> > > the module BTF. Unless I'm missing something there are a few problems:
> > >
> > > - the fd association is removed by libbpf when the BPF program has loaded;
> > > the module fds are closed and the module BTF is discarded.  However even if
> > > that isn't done (and as you mentioned, we could hold onto BTF that is in
> > > use, and I commented out the code that does that to test) - there's
> > > another problem:
> > > - I can't see a way to use the object fd value we set here later in BPF
> > > program context; btf_get_by_fd() returns -EBADF as the fd is associated
> > > with the module BTF in the test's process context, not necessarily in
> > > the context that the BPF program is running.  Would it be possible in this
> > > case to use object id? Or is there another way to handle the fd->module
> > > BTF association that we need to make in BPF program context that I'm
> > > missing?
> > > - A more long-term issue; if we use fds to specify module BTFs and write
> > > the object fd into the program, we can pin the BPF program such that it
> > > outlives fds that refer to its associated BTF.  So unless we pinned the
> > > BTF too, any code that assumed the BTF fd-> module mapping was valid would
> > > start to break once the user-space side went away and the pinned program
> > > persisted.
> >
> > All of the above are not issues. They are features of FD based approach.
> > When the program refers to btf via fd the verifier needs to increment btf's refcnt
> > so it won't go away while the prog is running. For module's BTF it means
> > that the module can be unloaded, but its BTF may stay around if there is a prog
> > that needs to access it.
> > I think the missing piece in the above is that btf_get_by_fd() should be
> > done at load time instead of program run-time.
> > Everything FD based needs to behave similar to map_fds where ld_imm64 insn
> > contains map_fd that gets converted to map_ptr by the verifier at load time.
> 
> Right. I was going to extend verifier to do the same for all used BTF
> objects as part of ksym support for module BTFs. So totally agree.
> Just didn't need it so far.
> 

Does this approach prevent more complex run-time specification of the BTF 
object fd though?  For example, I've been working on a simple tracer 
focused on kernel debugging; it uses a BPF map entry for each kernel 
function that is traced. User-space populates the map entry with the BTF 
type ids for the function arguments/return value, and when the BPF program 
runs it uses the instruction pointer to look up the map entry for that
function, and uses bpf_snprintf_btf() to write the string representations 
of the function arguments/return values.  I'll send out an RFC soon, 
but longer-term I was hoping to extend it to support module-specific 
types.  Would a dynamic case like that - where the BTF module fd is looked 
up in a map entry during program execution (rather than derived via 
__builtin_btf_type_id()) - work too? Thanks!

Alan
Alexei Starovoitov Dec. 8, 2020, 11:39 p.m. UTC | #8
On Tue, Dec 08, 2020 at 10:13:35PM +0000, Alan Maguire wrote:
> On Mon, 7 Dec 2020, Andrii Nakryiko wrote:
> 
> > On Mon, Dec 7, 2020 at 7:12 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Dec 07, 2020 at 04:38:16PM +0000, Alan Maguire wrote:
> > > > Sorry about this Andrii, but I'm a bit stuck here.
> > > >
> > > > I'm struggling to get tests working where the obj fd is used to designate
> > > > the module BTF. Unless I'm missing something there are a few problems:
> > > >
> > > > - the fd association is removed by libbpf when the BPF program has loaded;
> > > > the module fds are closed and the module BTF is discarded.  However even if
> > > > that isn't done (and as you mentioned, we could hold onto BTF that is in
> > > > use, and I commented out the code that does that to test) - there's
> > > > another problem:
> > > > - I can't see a way to use the object fd value we set here later in BPF
> > > > program context; btf_get_by_fd() returns -EBADF as the fd is associated
> > > > with the module BTF in the test's process context, not necessarily in
> > > > the context that the BPF program is running.  Would it be possible in this
> > > > case to use object id? Or is there another way to handle the fd->module
> > > > BTF association that we need to make in BPF program context that I'm
> > > > missing?
> > > > - A more long-term issue; if we use fds to specify module BTFs and write
> > > > the object fd into the program, we can pin the BPF program such that it
> > > > outlives fds that refer to its associated BTF.  So unless we pinned the
> > > > BTF too, any code that assumed the BTF fd-> module mapping was valid would
> > > > start to break once the user-space side went away and the pinned program
> > > > persisted.
> > >
> > > All of the above are not issues. They are features of FD based approach.
> > > When the program refers to btf via fd the verifier needs to increment btf's refcnt
> > > so it won't go away while the prog is running. For module's BTF it means
> > > that the module can be unloaded, but its BTF may stay around if there is a prog
> > > that needs to access it.
> > > I think the missing piece in the above is that btf_get_by_fd() should be
> > > done at load time instead of program run-time.
> > > Everything FD based needs to behave similar to map_fds where ld_imm64 insn
> > > contains map_fd that gets converted to map_ptr by the verifier at load time.
> > 
> > Right. I was going to extend verifier to do the same for all used BTF
> > objects as part of ksym support for module BTFs. So totally agree.
> > Just didn't need it so far.
> > 
> 
> Does this approach prevent more complex run-time specification of BTF 
> object fd though?  For example, I've been working on a simple tracer 
> focused on kernel debugging; it uses a BPF map entry for each kernel 
> function that is traced. User-space populates the map entry with BTF type 
> ids for the function arguments/return value, and when the BPF program 
> runs it uses the instruction pointer to look up the map entry for that
> function, and uses bpf_snprintf_btf() to write the string representations 
> of the function arguments/return values.  I'll send out an RFC soon, 
> but longer-term I was hoping to extend it to support module-specific 
> types.  Would a dynamic case like that - where the BTF module fd is looked 
> up in a map entry during program execution (rather than derived via 
> __btf_builtin_type_id()) work too? Thanks!

fd has to be resolved in the process context. A bpf prog can read an fd
number from the map, but that number is meaningless.
Say we allow using btf_obj_id+btf_id; how will user space know these
two numbers? Some new libbpf api that searches for them?
An extension to libbpf_find_vmlinux_btf_id()? I was hoping that this api
would stay semi-internal. But say it's extended.
User space will store a pair of numbers into a map, and
what is the program going to do with it?
If it's printing struct veth_stats contents, it should have attached to
a corresponding function in the veth module via fentry or something.
The prog has hard-coded logic in C with a specific pointer to print.
The prog has its type right there. Why would the prog take a pointer
from one place, but its type_id from a map? That's not realistic.
Where it would potentially make sense is what I think you're describing:
a single kprobe-style prog attached to many places, with the args of
those places stored in a map, and the prog selecting them with
map_lookup with key=PT_REGS_IP?
And passing pointers into bpf_snprintf_btf() from PT_REGS_PARM1()?
I see why that is useful, but it's so racy. By the time the map
is populated, those btf_obj_id+btf_id could be invalid.
I think that instead of doing this in user space, the program needs access
to vmlinux+mods BTFs. Sort of like the proposed bpf helper to return a ksym
based on IP, there could be a helper to figure out btf_id+btf_obj_POINTER
based on IP. Then there would be no need for an external map to populate.
Would that solve your use case?
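
Something like this (the signature is hypothetical):

/* hypothetical kernel helper: given the IP of the traced function,
 * return the kernel-internal struct btf pointer and the BTF type id
 * of that function, so no user space map would be needed
 */
long bpf_get_btf_by_ip(u64 ip, struct btf **btf_obj, u32 *btf_id);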
Alan Maguire Dec. 9, 2020, 11:21 p.m. UTC | #9
On Tue, 8 Dec 2020, Alexei Starovoitov wrote:

> On Tue, Dec 08, 2020 at 10:13:35PM +0000, Alan Maguire wrote:
> > 
> > Does this approach prevent more complex run-time specification of BTF 
> > object fd though?  For example, I've been working on a simple tracer 
> > focused on kernel debugging; it uses a BPF map entry for each kernel 
> > function that is traced. User-space populates the map entry with BTF type 
> > ids for the function arguments/return value, and when the BPF program 
> > runs it uses the instruction pointer to look up the map entry for that
> > function, and uses bpf_snprintf_btf() to write the string representations 
> > of the function arguments/return values.  I'll send out an RFC soon, 
> > but longer-term I was hoping to extend it to support module-specific 
> > types.  Would a dynamic case like that - where the BTF module fd is looked 
> > up in a map entry during program execution (rather than derived via 
> > __btf_builtin_type_id()) work too? Thanks!
> 
> fd has to be resolved in the process context. bpf prog can read fd
> number from the map, but that number is meaningless.
> Say we allow using btf_obj_id+btf_id, how user space will know these
> two numbers? Some new libbpf api that searches for it?
> An extension to libbpf_find_vmlinux_btf_id() ? I was hoping that this api
> will stay semi-internal. But say it's extended.
> The user space will store a pair of numbers into a map and
> what program are going to do with it?
> If it's printing struct veth_stats contents it should have attached to
> a corresponding function in the veth module via fentry or something.
> The prog has hard coded logic in C with specific pointer to print.
> The prog has its type right there. Why would the prog take a pointer
> from one place, but it's type_id from the map? That's not realistic.
> Where it would potentially make sense is what I think you're descring
> where single kprobe style prog attached to many places and args of
> those places are stored in a map and the prog selects them with
> map_lookup with key=PT_REGS_IP ?

Right, that's exactly it.  A pair of generic tracing BPF programs are
used, and they attach to kprobe/kretprobes, and when they run they use 
the arguments plus the map details about BTF ids of those arguments to 
run bpf_snprintf_btf(), and send perf events to userspace containing
the results.

> And passes pointers into bpf_snprintf_btf() from PT_REGS_PARM1() ?

Exactly.

> I see why that is useful, but it's so racy. By the time the map
> is populated those btf_obj_id+btf_id could be invalid.
> I think instead of doing this in user space the program needs an access
> to vmlinux+mods BTFs. Sort-of like proposed bpf helper to return ksym
> based on IP there could be a helper to figure out btf_id+btf_obj_POINTER
> based on IP. Then there will no need for external map to populate.
> Would that solve your use case?

That would be fantastic! We could do that from the context passed into a
kprobe program, as the IP in struct pt_regs points at the function.
kretprobes seem a bit trickier, as in that case the IP in struct pt_regs
is actually set to kretprobe_trampoline rather than the function we're
returning from, due to how kretprobes work; maybe there's another way to
get it in that case, though...

Alan
Alexei Starovoitov Dec. 15, 2020, 10:38 p.m. UTC | #10
On Wed, Dec 09, 2020 at 11:21:43PM +0000, Alan Maguire wrote:
> Right, that's exactly it.  A pair of generic tracing BPF programs are
> used, and they attach to kprobe/kretprobes, and when they run they
> use the arguments plus the map details about BTF ids of those 
> arguments to run bpf_snprintf_btf(), and send perf events to 
> userspace containing the results.
...
> That would be fantastic! We could do that from the context passed 
> into a kprobe program as the IP in struct pt_regs points at the 
> function. kretprobes seems a bit trickier as in that case the IP in 
> struct pt_regs is actually set to kretprobe_trampoline rather than 
> the function we're returning from due to how kretprobes work; maybe 
> there's another way to get it in that case though..

Yeah. kprobe's IP doesn't match kretprobe's IP, which makes such tracing
use cases more complicated. Also kretprobe is quite slow. See
prog_tests/test_overhead and selftests/bpf/bench.
I think the key realization is that user space knows all the IPs
it will attach to. It has to know all IPs, otherwise
hashmap{key=ip, value=btf_data} is not possible.
Obvious, right? It means that we can use this key observation
to build better interfaces at all layers. kprobes are slow to
set up one by one. They are also slow to execute. fentry/fexit is slow
to set up, but fast to execute. Jiri proposed a batching api for
fentry, but it doesn't quite make sense from an api perspective,
since user space has to give a different bpf prog for every fentry:
a bpf trampoline is unique for every target fentry kernel function.
Batched attach would make sense for kprobe, because one prog
can be attached everywhere. But kprobe is slow.
This thought process justifies the addition of a new program
type where one program can attach to multiple fentry points.
Since the fentry ctx is no longer fixed, the verifier won't be able to
track the btf_ids of arguments, but btf-based pointer walking is fast
and powerful, so if btf is passed into the program there could
be a helper that does a dynamic cast from long to PTR_TO_BTF_ID.
Such a new fentry prog will have btf in its context, so there
will be no need for user space to populate a hashmap and
mess with IPs. And the best part is that batched attach will be not
only desired, but a mandatory part of the api.
So I'm proposing to extend the BPF_PROG_LOAD cmd with an array of
pairs (attach_obj_fd, attach_btf_id).
The fentry prog in a .c file might even have a regex in its attach pattern:
SEC("fentry/sys_*")
int BPF_PROG(test, struct btf *btf_obj, __u32 btf_id, __u64 arg1,
              __u64 arg2, ...__u64 arg6)
{
   struct btf_ptr ptr1 = {
     .ptr = arg1,
     .type_id = bpf_core_type_id_kernel(struct foo),
     .btf_obj = btf_obj,
   },
   ptr2 = {
     .ptr = arg2,
     .type_id = bpf_core_type_id_kernel(struct bar),
     .btf_obj = btf_obj,
   };
   bpf_snprintf_btf(,, &ptr1, sizeof(ptr1), );
   bpf_snprintf_btf(,, &ptr2, sizeof(ptr2), );
}

libbpf will process the attach regex and find all matching functions in
the kernel and in the kernel modules. Then it will pass this list of
(fd,btf_id) pairs to the kernel. The kernel will find IP addresses and
BTFs of all functions. It will generate single bpf trampoline to handle
all the functions. Either one trampoline or multiple trampolines is an
implementation detail. It could be one trampoline that does lookup based
on IP to find btf_obj, btf_id to pass into the program or multiple
trampolines that share most of the code with N unique trampoline
prefixes with hardcoded btf_obj, btf_id. The argument save/restore code
can be the same for all fentries. The same way we can support single
fexit prog attaching to multiple kernel functions. And even single
fmod_ret prog attaching to multiple. The batching part will make
attaching to thousands of functions efficient. We can use batched
text_poke_bp, etc.
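
Roughly (a uapi sketch; all names are made up):

/* hypothetical uapi extension for BPF_PROG_LOAD */
struct bpf_attach_pair {
	__u32 attach_obj_fd;	/* BTF object fd: module BTF, 0 for vmlinux */
	__u32 attach_btf_id;	/* BTF id of the target kernel function */
};
/* bpf_attr would then carry a pointer plus a count, e.g.
 *	__aligned_u64	attach_pairs;
 *	__u32		attach_pair_cnt;
 */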

As far as the dynamic btf casting helper goes, we could do something like this:
SEC("fentry/sys_*")
int BPF_PROG(test, struct btf *btf_obj, __u32 btf_id, __u64 arg1, __u64
arg2, ...__u64 arg6)
{
   struct sk_buff *skb;
   struct task_struct *task;

   skb = bpf_dynamic_cast(btf_obj, btf_id, 1, arg1,
                          bpf_core_type_id_kernel(skb));
   task = bpf_dynamic_cast(btf_obj, btf_id, 2, arg2,
                           bpf_core_type_id_kernel(task));
   skb->len + task->status;
}
The dynamic part of the helper will walk the btf of the func_proto
pointed to by the 'btf_id' argument. It will find the Nth argument and,
if the argument's btf_id matches the last u32 passed into bpf_dynamic_cast(),
it will return a ptr_to_btf_id. The verifier needs the 5th u32 arg to know
the const value of btf_id during verification.
The execution time of this casting helper will be pretty fast.
Thoughts?
Alan Maguire Dec. 16, 2020, 4:18 p.m. UTC | #11
On Tue, 15 Dec 2020, Alexei Starovoitov wrote:

> On Wed, Dec 09, 2020 at 11:21:43PM +0000, Alan Maguire wrote:
> > Right, that's exactly it.  A pair of generic tracing BPF programs are
> > used, and they attach to kprobe/kretprobes, and when they run they
> > use the arguments plus the map details about BTF ids of those arguments to
> > run bpf_snprintf_btf(), and send perf events to userspace containing the
> > results.
> ...
> > That would be fantastic! We could do that from the context passed into a
> > kprobe program as the IP in struct pt_regs points at the function.
> > kretprobes seems a bit trickier as in that case the IP in struct pt_regs is
> > actually set to kretprobe_trampoline rather than the function we're
> > returning from due to how kretprobes work; maybe there's another way to get
> > it in that case though..
> 
> Yeah. kprobe's IP doesn't match kretprobe's IP which makes such tracing
> use cases more complicated. Also kretprobe is quite slow. See
> prog_tests/test_overhead and selftests/bpf/bench.
> I think the key realization is that the user space knows all IPs
> it will attach to. It has to know all IPs otherwise
> hashmap{key=ip, value=btf_data} is not possible.
> Obvious, right ? What it means that we can use this key observation
> to build better interfaces at all layers. kprobes are slow to
> setup one by one. It's also slow to execute. fentry/fexit is slow
> to setup, but fast to execute. Jiri proposed a batching api for
> fentry, but it doesn't quite make sense from api perspective
> since user space has to give different bpf prog for every fentry.
> bpf trampoline is unique for every target fentry kernel function.
> The batched attach would make sense for kprobe because one prog
> can be attached everywhere. But kprobe is slow.
> This thought process justifies an addition of a new program
> type where one program can attach to multiple fentry.
> Since fentry ctx is no longer fixed the verifier won't be able to
> track btf_id-s of arguments, but btf based pointer walking is fast
> and powerful, so if btf is passed into the program there could
> be a helper that does dynamic cast from long to PTR_TO_BTF_ID.
> Since such new fentry prog will have btf in the context and
> there will be no need for user space to populate hashmap and
> mess with IPs. And the best part that batched attach will not
> only be desired, but mandatory part of the api.
> So I'm proposing to extend BPF_PROG_LOAD cmd with an array of
> pairs (attach_obj_fd, attach_btf_id).
> The fentry prog in .c file might even have a regex in attach pattern:
> SEC("fentry/sys_*")
> int BPF_PROG(test, struct btf *btf_obj, __u32 btf_id, __u64 arg1,
>              __u64 arg2, ...__u64 arg6)
> {
>   struct btf_ptr ptr1 = {
>     .ptr = arg1,
>     .type_id = bpf_core_type_id_kernel(struct foo),
>     .btf_obj = btf_obj,
>   },
>   ptr2 = {
>     .ptr = arg2,
>     .type_id = bpf_core_type_id_kernel(struct bar),
>     .btf_obj = btf_obj,
>   };
>   bpf_snprintf_btf(,, &ptr1, sizeof(ptr1), );
> >   bpf_snprintf_btf(,, &ptr2, sizeof(ptr2), );
> }
> 
> libbpf will process the attach regex and find all matching functions in
> the kernel and in the kernel modules. Then it will pass this list of
> (fd,btf_id) pairs to the kernel. The kernel will find IP addresses and
> BTFs of all functions. It will generate single bpf trampoline to handle
> all the functions. Either one trampoline or multiple trampolines is an
> implementation detail. It could be one trampoline that does lookup based
> on IP to find btf_obj, btf_id to pass into the program or multiple
> trampolines that share most of the code with N unique trampoline
> prefixes with hardcoded btf_obj, btf_id. The argument save/restore code
> can be the same for all fentries. The same way we can support single
> fexit prog attaching to multiple kernel functions. And even single
> fmod_ret prog attaching to multiple. The batching part will make
> attaching to thousands of functions efficient. We can use batched
> text_poke_bp, etc.
> 
> As far as dynamic btf casting helper we could do something like this:
> SEC("fentry/sys_*")
> int BPF_PROG(test, struct btf *btf_obj, __u32 btf_id, __u64 arg1, __u64
> arg2, ...__u64 arg6)
> {
>   struct sk_buff *skb;
>   struct task_struct *task;
> 
>   skb = bpf_dynamic_cast(btf_obj, btf_id, 1, arg1,
>                          bpf_core_type_id_kernel(skb));
>   task = bpf_dynamic_cast(btf_obj, btf_id, 2, arg2,
>                           bpf_core_type_id_kernel(task));
>   skb->len + task->status;
> }
> The dynamic part of the helper will walk btf of func_proto that was
> pointed to by 'btf_id' argument. It will find Nth argument and
> if argument's btf_id matches the last u32 passed into bpf_dynamic_cast()
> it will return ptr_to_btf_id. The verifier needs 5th u32 arg to know
> const value of btf_id during verification.
> The execution time of this casting helper will be pretty fast.
> Thoughts?
> 
> 

From a bpf programmer's perspective, the above sounds fantastic and opens 
up a bunch of new possibilities.  For example, a program that attaches to 
a bunch of networking functions at once and uses dynamic casts to find the 
skb argument could help trace packet flow through the stack without having 
to match exact function signatures.  From a mechanics perspective I 
wonder if we could take a similar approach to the cgroup storage, and use 
the bpf prog array structure to store a struct btf * and any other 
site-specific metadata at attach time? Then when the array runs we could 
set a per-cpu variable such that helpers could pick up that info if 
needed.
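
Very roughly, something like this (all names and layout are made up):

/* purely illustrative; names are invented */
struct bpf_attach_info {
	struct btf *btf;	/* module BTF stashed at attach time */
	u64 cookie;		/* other site-specific metadata */
};
DEFINE_PER_CPU(const struct bpf_attach_info *, bpf_cur_attach_info);

/* the prog-array runner would set bpf_cur_attach_info before invoking
 * the prog, and helpers like bpf_snprintf_btf() could read it back
 * via this_cpu_read()
 */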

Having the function argument BTF ids gets us nearly the whole way there 
from a generic tracer perspective - I can now attach my generic tracing 
program to an arbitrary function via fentry/fexit and get the BTF ids 
of the arguments or return value, and even better, can do it with wildcarding.
There is an additional use case though - at least for the ksnoop program
I'm working on - and it's where we access members and need their 
type ids too.  The ksnoop program (which I wanted to send out last week, but 
due to a system move I'm temporarily not able to access the code - sorry 
about that; hoping it'll be back online early next week) operates in two modes:

- default mode where we trace function arguments for kprobe and return value
  for kretprobe; that's covered by the above; and
- a mode where the user specifies what they want. For example running

$ ksnoop "ip_send_skb" 

...is an example of default mode, this will trace entry/return and print 
arguments and return values, while

$ ksnoop "ip_send_skb(skb)"

...will trace the skb argument only, and

$ ksnoop "ip_send_skb(skb->sk)"

...will trace the skb->sk value.  The user-space side of the program 
matches the function/arg name and looks up the referenced type, setting it 
in the function's map entry.  For field references such as skb->sk, it also 
records the offset and whether the value at that offset is a pointer (as is 
the case for skb->sk) - in such cases we need to read the value at that 
offset via bpf_probe_read() and use it in bpf_snprintf_btf() along with the 
referenced type.  Only a single simple reference like the above is supported 
currently, but multiple levels of reference could be made to work too.
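
For reference, the per-function map value looks roughly like this
(a simplified sketch; the real layout may differ):

struct func_trace {
	__u32 arg_type_id[6];	/* BTF type ids of traced args; 0 = untraced */
	__u32 ret_type_id;	/* BTF type id of return value; 0 = untraced */
	__u32 member_offset;	/* offset for refs like skb->sk; 0 = none */
	__u8  member_is_ptr;	/* value at offset needs bpf_probe_read() */
};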

This latter case would be awkward to support programmatically in BPF 
program context I think, though I'm sure it could be done.  To turn back 
to our earlier conversation, your concern as I understand it was that 
pre-specifying module BTF type ids in a map is racy, and I'd like to dig 
into that a bit more if you don't mind because I think some form of
user-space-specified BTF ids may be the easiest approach for more flexible
generic tracing that covers more than function arguments.

I assume the race you're concerned about is caused by the module unloading 
after the BTF ids have been set in the map? And then the module reappears 
with different BTF/type id mappings? Perhaps a helper for retrieving
the struct btf * which was set at attach time would be enough?

So for example something like:

	struct btf_ptr ptr;

	ptr.type_id = /* retrieve from map */
	ptr.obj_id = bpf_get_btf(THIS_MODULE);

...where we don't actually specify a type but a libbpf-specified fd
is used to stash the associated "struct btf *" for the module in the 
prog array at attach.  Are there still race conditions we need to worry
about in a scenario like this? Thanks!

Alan
Andrii Nakryiko Dec. 16, 2020, 10:27 p.m. UTC | #12
On Wed, Dec 16, 2020 at 8:18 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Tue, 15 Dec 2020, Alexei Starovoitov wrote:
>
> > On Wed, Dec 09, 2020 at 11:21:43PM +0000, Alan Maguire wrote:
> > > Right, that's exactly it.  A pair of generic tracing BPF programs are
> > > used, and they attach to kprobe/kretprobes, and when they run they
> > > use the arguments plus the map details about BTF ids of those arguments to
> > > run bpf_snprintf_btf(), and send perf events to userspace containing the
> > > results.
> > ...
> > > That would be fantastic! We could do that from the context passed into a
> > > kprobe program as the IP in struct pt_regs points at the function.
> > > kretprobes seems a bit trickier as in that case the IP in struct pt_regs is
> > > actually set to kretprobe_trampoline rather than the function we're
> > > returning from due to how kretprobes work; maybe there's another way to get
> > > it in that case though..
> >
> > Yeah. kprobe's IP doesn't match kretprobe's IP which makes such tracing
> > use cases more complicated. Also kretprobe is quite slow. See
> > prog_tests/test_overhead and selftests/bpf/bench.
> > I think the key realization is that the user space knows all IPs
> > it will attach to. It has to know all IPs otherwise
> > hashmap{key=ip, value=btf_data} is not possible.
> > Obvious, right? What it means is that we can use this key observation
> > to build better interfaces at all layers. kprobes are slow to
> > set up one by one. They are also slow to execute. fentry/fexit is slow
> > to set up, but fast to execute. Jiri proposed a batching api for
> > fentry, but it doesn't quite make sense from an api perspective
> > since user space has to give a different bpf prog for every fentry.
> > A bpf trampoline is unique for every target fentry kernel function.
> > The batched attach would make sense for kprobe because one prog
> > can be attached everywhere. But kprobe is slow.
> > This thought process justifies the addition of a new program
> > type where one program can attach to multiple fentry points.
> > Since the fentry ctx is no longer fixed, the verifier won't be able to
> > track btf_id-s of arguments, but btf-based pointer walking is fast
> > and powerful, so if btf is passed into the program there could
> > be a helper that does a dynamic cast from long to PTR_TO_BTF_ID.
> > Such a new fentry prog will have btf in the context, so
> > there will be no need for user space to populate a hashmap and
> > mess with IPs. And the best part is that batched attach will not
> > only be desired, but a mandatory part of the api.
> > So I'm proposing to extend BPF_PROG_LOAD cmd with an array of
> > pairs (attach_obj_fd, attach_btf_id).

Beyond knowing its attach btf_id, such BPF programs probably need to
store some additional application-specific data for each attached
function, so if we can extend this to also accept an arbitrary long,
in addition to fd+id, that would be great. It can be used as an array
index for some extra configuration, or to correlate fentry with fexit,
etc. No need for an arbitrary amount of data; one long is enough
to then use other means to get to arbitrary program data.
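
As a strawman (the names here are made up, just to illustrate the
shape of the API), the per-attachment spec could be something like:

	struct bpf_attach_spec {
		__u32 attach_obj_fd;	/* BTF object FD: 0 for vmlinux BTF,
					 * >0 for module BTF
					 */
		__u32 attach_btf_id;	/* BTF type ID of the target function */
		__u64 cookie;		/* opaque user value, e.g. an index into
					 * the program's own per-function config
					 */
	};

with BPF_PROG_LOAD taking a pointer to an array of those plus a count.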

> > The fentry prog in .c file might even have a regex in attach pattern:
> > SEC("fentry/sys_*")
> > int BPF_PROG(test, struct btf *btf_obj, __u32 btf_id, __u64 arg1,
> >              __u64 arg2, ...__u64 arg6)

So I like the overall idea, but I'm going to nitpick on some details :)

This still looks, feels and behaves like a fentry/fexit program, so
hopefully it won't really be an entirely new type of BPF program (at
least from a user-space point of view).

Also, not a big fan of having struct btf * and separate btf_id. Maybe
we can just abstract it into an opaque (or not opaque) struct
representing a fully-qualified BTF type. Further, if instead of passing
it as an input argument we had a BPF helper to return
this, it would be even better and could actually be generalized to
other BTF-powered BPF programs. So something like:

struct btf_type_id *id = bpf_get_btf_type_id(ctx);

...

.type_id = id->type_id;
.btf_obj = id->btf;

As Alan proposed below, this btf and type id could be stored at the
attachment point and just fetched through program context.

> > {
> >   struct btf_ptr ptr1 = {
> >     .ptr = arg1,
> >     .type_id = bpf_core_type_id_kernel(struct foo),
> >     .btf_obj = btf_obj,
> >   },
> >   ptr2 = {
> >     .ptr = arg2,
> >     .type_id = bpf_core_type_id_kernel(struct bar),
> >     .btf_obj = btf_obj,
> >   };
> >   bpf_snprintf_btf(,, &ptr1, sizeof(ptr1), );
> >   bpf_snprintf_btf(,, &ptr2, sizeof(ptr2), );
> > }
> >
> > libbpf will process the attach regex and find all matching functions in
> > the kernel and in the kernel modules. Then it will pass this list of
> > (fd,btf_id) pairs to the kernel. The kernel will find IP addresses and
> > BTFs of all functions. It will generate a single bpf trampoline to handle
> > all the functions. Whether it's one trampoline or multiple trampolines is an
> > implementation detail. It could be one trampoline that does a lookup based
> > on IP to find btf_obj, btf_id to pass into the program or multiple
> > trampolines that share most of the code with N unique trampoline
> > prefixes with hardcoded btf_obj, btf_id. The argument save/restore code
> > can be the same for all fentries. The same way we can support single
> > fexit prog attaching to multiple kernel functions. And even single
> > fmod_ret prog attaching to multiple. The batching part will make
> > attaching to thousands of functions efficient. We can use batched
> > text_poke_bp, etc.

Yep, agree. And the extra information we'd have to store per attachment
is more than offset by the overhead we'd otherwise pay for the
thousands of BPF program copies needed to achieve the same
functionality if attached through existing fentry/fexit APIs.

> >
> > As far as dynamic btf casting helper we could do something like this:
> > SEC("fentry/sys_*")
> > int BPF_PROG(test, struct btf *btf_obj, __u32 btf_id, __u64 arg1, __u64
> > arg2, ...__u64 arg6)
> > {
> >   struct sk_buff *skb;
> >   struct task_struct *task;
> >
> >   skb = bpf_dynamic_cast(btf_obj, btf_id, 1, arg1,
> >                          bpf_core_type_id_kernel(skb));
> >   task = bpf_dynamic_cast(btf_obj, btf_id, 2, arg2,
> >                           bpf_core_type_id_kernel(task));
> >   skb->len + task->status;
> > }
> > The dynamic part of the helper will walk the btf of the func_proto that
> > was pointed to by the 'btf_id' argument. It will find the Nth argument and,
> > if the argument's btf_id matches the last u32 passed into bpf_dynamic_cast(),
> > it will return ptr_to_btf_id. The verifier needs the 5th u32 arg to know
> > the const value of btf_id during verification.
> > The execution time of this casting helper will be pretty fast.
> > Thoughts?

So I haven't dug through the code to see if it's doable, but this API
seems unnecessarily complicated and error-prone. Plus, it's not clear
how the verifier can validate that btf_obj and btf_id correctly describe
arg1 and that it matches the "1" argument position. I.e., if I mess up and
for skb specify bpf_dynamic_cast(btf_obj, 123123, 2, arg3,
bpf_core_type_id_kernel(skb)), will the verifier stop me? Note that I
specified an arbitrary integer for the second argument. Or at least,
would we get a NULL at runtime? Or would we just re-interpret arg3
(incorrectly) as a sk_buff? I might be missing something here, of
course.

But this seems more "verifiable" and nicer to use, even though it
won't allow substituting an arbitrary btf_id and btf_obj (but that's sort of
a goal, I think):

skb = bpf_get_btf_arg(ctx, 1, bpf_core_type_id_kernel(skb));

So the btf_obj and btf_id of the attach point will be taken from the
program's context, as well as the btf info of argument #1 (additionally
verifying that the argument number makes sense at all). Note also that
bpf_core_type_id_kernel(skb) ideally would encode the type ID and BTF
object FD of a module, if the destination type is defined in a module.
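
(Incidentally, that is what the patch below already does for
BPF_TYPE_ID_TARGET: the patched ldimm64 ends up carrying
((__u64)btf_obj_fd << 32) | btf_type_id, with fd 0 standing for
vmlinux BTF.)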

If we do want to explicitly provide btf_obj+btf_id, then I think having
it in a separate struct that the verifier can track as a PTR_TO_BTF_ID
and verify is valid would make this API better (see above).

> >
> >
>
> From a bpf programmer's perspective, the above sounds fantastic and opens
> up a bunch of new possibilities.  For example, a program that attaches to
> a bunch of networking functions at once and uses dynamic casts to find the
> skb argument could help trace packet flow through the stack without having
> to match exact function signatures.  From a mechanics perspective I
> wonder if we could take a similar approach to the cgroup storage, and use
> the bpf prog array structure to store a struct btf * and any other
> site-specific metadata at attach time? Then when the array runs we could
> set a per-cpu variable such that helpers could pick up that info if
> needed.

I don't think a per-cpu variable will work for sleepable BPF programs. I
think storing it in whatever BPF context struct is a more reliable
way. But I agree that remembering the struct btf * and attach btf ID at
the attachment point makes a lot of stuff simpler.

>
> Having the function argument BTF ids gets us nearly the whole way there
> from a generic tracer perspective - I can now attach my generic tracing
> program to an arbitrary function via fentry/fexit and get the BTF ids
> of the arguments or return value, and even better can do it with wildcarding.

Yeah, it's a nice interface and libbpf can support that. I think in
more advanced cases (e.g., matching fentry with fexit), libbpf should
still provide APIs to specify an extra user-provided long for each
attachment point, so that it's possible to correlate "related" BPF
programs.

> There is an additional use case though - at least for the ksnoop program
> I'm working on - where we access members and need their
> type ids too.  The ksnoop program (which I wanted to send out last week but due
> to a system move I'm temporarily not able to access the code, sorry about that,
> hoping it'll be back online early next week) operates in two modes:
>
> - default mode where we trace function arguments for kprobe and return value
>   for kretprobe; that's covered by the above; and
> - a mode where the user specifies what they want. For example running
>
> $ ksnoop "ip_send_skb"
>
> ...is an example of default mode, this will trace entry/return and print
> arguments and return values, while
>
> $ ksnoop "ip_send_skb(skb)"
>
> ...will trace the skb argument only, and
>
> $ ksnoop "ip_send_skb(skb->sk)"
>
> ...will trace the skb->sk value.  The user-space side of the program
> matches the function/arg name and looks up the referenced type, setting it
> in the function's map.  For field references such as skb->sk, it also
> records offset and whether that offset is a pointer (as is the case for
> skb->sk) - in such cases we need to read the offset value via bpf_probe_read()
> and use it in bpf_snprintf_btf() along with the referenced type.  Only a
> single simple reference like the above is supported currently, but
> multiple levels of reference could be made to work too.

I think this is normally done with Clang compiling the code at runtime
(like trace.py), right? I assume you are trying to find a way to do
that without code generation at runtime, am I right? I honestly don't
yet see how this can be done easily...

>
> This latter case would be awkward to support programmatically in BPF
> program context I think, though I'm sure it could be done.  To turn back
> to our earlier conversation, your concern as I understand it was that
> pre-specifying module BTF type ids in a map is racy, and I'd like to dig
> into that a bit more if you don't mind because I think some form of
> user-space-specified BTF ids may be the easiest approach for more flexible
> generic tracing that covers more than function arguments.
>
> I assume the race you're concerned about is caused by the module unloading
> after the BTF ids have been set in the map? And then the module reappears
> with different BTF/type id mappings? Perhaps a helper for retrieving
> the struct btf * which was set at attach time would be enough?
>
> So for example something like
>
>
>         struct btf_ptr ptr;
>
>         ptr.type_id = /* retrieve from map */
>         ptr.obj_id = bpf_get_btf(THIS_MODULE);

On re-reading this, it seems like something I was proposing above
with bpf_get_btf_type_id(), right? The THIS_MODULE part is confusing,
though.

>
> ...where we don't actually specify a type but a libbpf-specified fd
> is used to stash the associated "struct btf *" for the module in the
> prog array at attach.  Are there still race conditions we need to worry
> about in a scenario like this? Thanks!
>
> Alan
Alexei Starovoitov Dec. 17, 2020, 7:16 a.m. UTC | #13
On Wed, Dec 16, 2020 at 02:27:23PM -0800, Andrii Nakryiko wrote:
> 
> But this seems more "verifiable" and nicer to use, even though it
> won't allow substituting an arbitrary btf_id and btf_obj (but that's sort of
> a goal, I think):
> 
> skb = bpf_get_btf_arg(ctx, 1, bpf_core_type_id_kernel(skb));

yep. makes sense to me.
Assuming that ctx has both:
- BTF of the func, so the helper can follow it to the arg's BTF at run-time
  to check that it matches the 3rd arg's btf_id;
- and the actual arg values, so that the helper can return them.
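
Usage could then look roughly like this (bpf_get_btf_arg and the
section name are hypothetical, just to illustrate the contract):

	SEC("fentry.multi/sys_*")	/* hypothetical multi-attach section */
	int trace(void *ctx)
	{
		struct sk_buff *skb;

		skb = bpf_get_btf_arg(ctx, 1, bpf_core_type_id_kernel(struct sk_buff));
		if (!skb)
			return 0;	/* arg #1 isn't an sk_buff at this attach point */
		/* skb is now a verifier-tracked PTR_TO_BTF_ID */
		return skb->len == 0;
	}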

> > - default mode where we trace function arguments for kprobe and return value
> >   for kretprobe; that's covered by the above; and
> > - a mode where the user specifies what they want. For example running
> >
> > $ ksnoop "ip_send_skb"
> >
> > ...is an example of default mode, this will trace entry/return and print
> > arguments and return values, while
> >
> > $ ksnoop "ip_send_skb(skb)"
> >
> > ...will trace the skb argument only, and
> >
> > $ ksnoop "ip_send_skb(skb->sk)"
> >
> > ...will trace the skb->sk value.  The user-space side of the program
> > matches the function/arg name and looks up the referenced type, setting it
> > in the function's map.  For field references such as skb->sk, it also
> > records offset and whether that offset is a pointer (as is the case for
> > skb->sk) - in such cases we need to read the offset value via bpf_probe_read()
> > and use it in bpf_snprintf_btf() along with the referenced type.  Only a
> > single simple reference like the above is supported currently, but
> > multiple levels of reference could be made to work too.

Alan,

I'm not sure why the last example is so different from the first two.
I think the ksnoop tool will generate the program on the fly, right?
So it can generate a normal LDX insn with a CO-RE relocation (instead of bpf_probe_read)
to access skb->sk. It can also add a relo for that LDX to point to
struct sk_buff's btf_id defined inside the prog's BTF.
The 'sk' offset inside the bpf program and inside BTF can be anything: 0, 4, ...
libbpf's relocation logic will find the right offset in the kernel's sk_buff.
If ksnoop doesn't have the ability to parse the vmlinux.h file or the kernel's BTF,
it can 'cheat'.
If the cmdline looks like:
$ ksnoop "ip_send_skb(skb->sk)"
It can generate BTF:
struct sk_buff {
   struct sock *sk;
};

If cmdline looks like:
$ ksnoop "ip_send_skb(skb->sock)"
It can generate BTF:
struct sk_buff {
   struct sock *sock;
};
Obviously there is no 'sock' field inside the kernel's struct sk_buff, but the tool
doesn't need to care. It can let libbpf do the checking and match
fields properly.

> > into that a bit more if you don't mind because I think some form of
> > user-space-specified BTF ids may be the easiest approach for more flexible
> > generic tracing that covers more than function arguments.

I think you're trying to figure out the kernel's btf_ids in the ksnoop tool.
I suggest leaving that job to libbpf. Generate local BTFs in ksnoop
with CO-RE relocs and let libbpf handle insn patching.
No FDs to worry about on the ksnoop side either.
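
For the local BTF half of that, a minimal sketch using libbpf's
btf__add_*() APIs (this builds only the types; emitting the matching
CO-RE relocation records is a separate problem):

	struct btf *btf = btf__new_empty();
	int fwd_id, ptr_id;

	/* struct sock; a forward declaration is enough behind a pointer */
	fwd_id = btf__add_fwd(btf, "sock", BTF_FWD_STRUCT);
	/* struct sock * */
	ptr_id = btf__add_ptr(btf, fwd_id);
	/* struct sk_buff { struct sock *sk; }; size and offset can be
	 * anything, libbpf's CO-RE logic matches by names and kinds
	 * against the kernel's BTF
	 */
	btf__add_struct(btf, "sk_buff", 8);
	btf__add_field(btf, "sk", ptr_id, 0, 0);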
Alan Maguire Dec. 17, 2020, 5:46 p.m. UTC | #14
On Wed, 16 Dec 2020, Alexei Starovoitov wrote:

> > > $ ksnoop "ip_send_skb(skb->sk)"
> > >
> > > ...will trace the skb->sk value.  The user-space side of the program
> > > matches the function/arg name and looks up the referenced type, setting it
> > > in the function's map.  For field references such as skb->sk, it also
> > > records offset and whether that offset is a pointer (as is the case for
> > > skb->sk) - in such cases we need to read the offset value via bpf_probe_read()
> > > and use it in bpf_snprintf_btf() along with the referenced type.  Only a
> > > single simple reference like the above is supported currently, but
> > > multiple levels of reference could be made to work too.
> 
> Alan,
> 
> I'm not sure why the last example is so different from the first two.
> I think the ksnoop tool will generate the program on the fly, right?

Nope, the BPF program is hard-coded; it adapts to different functions
through use of the map entries describing function signatures and their
BTF ids, and other associated tracing info.  The aim is to provide a
generic tracing tool which displays kernel function arguments but
doesn't require LLVM/clang on the target, just a kernel built with BTF 
and libbpf.  Sorry this wasn't clearer in my explanation; I'm working
on rewriting the code and will send it out ASAP.

> So it can generate normal LDX insn with CO-RE relocation (instead of bpf_probe_read)
> to access skb->sk. It can also add relo for that LDX to point to
> struct sk_buff's btf_id defined inside prog's BTF.
> The 'sk' offset inside bpf program and inside BTF can be anything: 0, 4, ...
> libbpf relocation logic will find the right offset in kernel's sk_buff.
> If ksnoop doesn't have an ability to parse vmlinux.h file or kernel's BTF
> it can 'cheat'.
> If the cmdline looks like:
> $ ksnoop "ip_send_skb(skb->sk)"
> It can generate BTF:
> struct sk_buff {
>    struct sock *sk;
> };
> 
> If cmdline looks like:
> $ ksnoop "ip_send_skb(skb->sock)"
> It can generate BTF:
> struct sk_buff {
>    struct sock *sock;
> };
> Obviously there is no 'sock' field inside kernel's struct sk_buff, but tool
> doesn't need to care. It can let libbpf do the checking and match
> fields properly.
> 
> > > into that a bit more if you don't mind because I think some form of
> > > user-space-specified BTF ids may be the easiest approach for more flexible
> > > generic tracing that covers more than function arguments.
> 
> I think you're trying to figure out kernel's btf_ids in ksnoop tool.

Yep.

> I suggest to leave that job to libbpf. Generate local BTFs in ksnoop
> with CO-RE relocs and let libbpf handle insn patching.
> No FDs to worry about from ksnoop side either.
> 

The current approach doesn't rely on instruction patching outside
of limited CO-RE use around struct pt_regs fields (args, IP, etc),
which shouldn't require LLVM/clang availability on the target system.
I'll try and get it ready for RFC submission by the weekend so you
can see more details of the approach.
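
(The limited CO-RE use in question is roughly of this shape; the
PT_REGS_*_CORE macros relocate the pt_regs field offsets, so no clang
is needed on the target:)

	SEC("kprobe/ip_send_skb")
	int trace_entry(struct pt_regs *ctx)
	{
		__u64 ip = PT_REGS_IP_CORE(ctx);	/* key into the per-function map */
		__u64 arg1 = PT_REGS_PARM1_CORE(ctx);	/* first argument as a raw u64 */

		/* look up this function's BTF info by ip and format arg1 */
		return 0;
	}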

Thanks!

Alan
Alexei Starovoitov Dec. 17, 2020, 6:22 p.m. UTC | #15
On Thu, Dec 17, 2020 at 05:46:42PM +0000, Alan Maguire wrote:
> 
> 
> On Wed, 16 Dec 2020, Alexei Starovoitov wrote:
> 
> > > > $ ksnoop "ip_send_skb(skb->sk)"
> > > >
> > > > ...will trace the skb->sk value.  The user-space side of the program
> > > > matches the function/arg name and looks up the referenced type, setting it
> > > > in the function's map.  For field references such as skb->sk, it also
> > > > records offset and whether that offset is a pointer (as is the case for
> > > > skb->sk) - in such cases we need to read the offset value via bpf_probe_read()
> > > > and use it in bpf_snprintf_btf() along with the referenced type.  Only a
> > > > single simple reference like the above is supported currently, but
> > > > multiple levels of reference could be made to work too.
> > 
> > Alan,
> > 
> > I'm not sure why the last example is so different from the first two.
> > I think the ksnoop tool will generate the program on the fly, right?
> 
> Nope, the BPF program is hard-coded; it adapts to different functions
> through use of the map entries describing function signatures and their
> BTF ids, and other associated tracing info.  The aim is to provide a
> generic tracing tool which displays kernel function arguments but
> doesn't require LLVM/clang on the target, just a kernel built with BTF 
> and libbpf.  Sorry this wasn't clearer in my explanation; I'm working
> on rewriting the code and will send it out ASAP.
> 
> > So it can generate normal LDX insn with CO-RE relocation (instead of bpf_probe_read)
> > to access skb->sk. It can also add relo for that LDX to point to
> > struct sk_buff's btf_id defined inside prog's BTF.
> > The 'sk' offset inside bpf program and inside BTF can be anything: 0, 4, ...
> > libbpf relocation logic will find the right offset in kernel's sk_buff.
> > If ksnoop doesn't have an ability to parse vmlinux.h file or kernel's BTF
> > it can 'cheat'.
> > If the cmdline looks like:
> > $ ksnoop "ip_send_skb(skb->sk)"
> > It can generate BTF:
> > struct sk_buff {
> >    struct sock *sk;
> > };
> > 
> > If cmdline looks like:
> > $ ksnoop "ip_send_skb(skb->sock)"
> > It can generate BTF:
> > struct sk_buff {
> >    struct sock *sock;
> > };
> > Obviously there is no 'sock' field inside kernel's struct sk_buff, but tool
> > doesn't need to care. It can let libbpf do the checking and match
> > fields properly.
> > 
> > > > into that a bit more if you don't mind because I think some form of
> > > > user-space-specified BTF ids may be the easiest approach for more flexible
> > > > generic tracing that covers more than function arguments.
> > 
> > I think you're trying to figure out kernel's btf_ids in ksnoop tool.
> 
> Yep.
> 
> > I suggest to leave that job to libbpf. Generate local BTFs in ksnoop
> > with CO-RE relocs and let libbpf handle insn patching.
> > No FDs to worry about from ksnoop side either.
> > 
> 
> The current approach doesn't rely on instruction patching outside
> of limited CO-RE use around struct pt_regs fields (args, IP, etc)
> which shouldn't require LLVM/clang availability on the target system. 

I'm not suggesting to use clang.
Everything I proposed above is for ksnoop to do, not for clang.
diff mbox series

Patch

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9be88a90a4aa..539956f7920a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4795,6 +4795,7 @@  static int load_module_btfs(struct bpf_object *obj)
 
 		mod_btf = &obj->btf_modules[obj->btf_module_cnt++];
 
+		btf__set_fd(btf, fd);
 		mod_btf->btf = btf;
 		mod_btf->id = id;
 		mod_btf->fd = fd;
@@ -5445,6 +5446,10 @@  struct bpf_core_relo_res
 	__u32 orig_type_id;
 	__u32 new_sz;
 	__u32 new_type_id;
+	/* FD of the module BTF containing the target candidate, or 0 for
+	 * vmlinux BTF
+	 */
+	int btf_obj_fd;
 };
 
 /* Calculate original and target relocation values, given local and target
@@ -5469,6 +5474,7 @@  static int bpf_core_calc_relo(const struct bpf_program *prog,
 	res->fail_memsz_adjust = false;
 	res->orig_sz = res->new_sz = 0;
 	res->orig_type_id = res->new_type_id = 0;
+	res->btf_obj_fd = 0;
 
 	if (core_relo_is_field_based(relo->kind)) {
 		err = bpf_core_calc_field_relo(prog, relo, local_spec,
@@ -5519,6 +5525,9 @@  static int bpf_core_calc_relo(const struct bpf_program *prog,
 	} else if (core_relo_is_type_based(relo->kind)) {
 		err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val);
 		err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val);
+		if (!err && relo->kind == BPF_TYPE_ID_TARGET &&
+		    targ_spec->btf != prog->obj->btf_vmlinux)
+			res->btf_obj_fd = btf__fd(targ_spec->btf);
 	} else if (core_relo_is_enumval_based(relo->kind)) {
 		err = bpf_core_calc_enumval_relo(relo, local_spec, &res->orig_val);
 		err = err ?: bpf_core_calc_enumval_relo(relo, targ_spec, &res->new_val);
@@ -5725,10 +5734,14 @@  static int bpf_core_patch_insn(struct bpf_program *prog,
 		}
 
 		insn[0].imm = new_val;
-		insn[1].imm = 0; /* currently only 32-bit values are supported */
-		pr_debug("prog '%s': relo #%d: patched insn #%d (LDIMM64) imm64 %llu -> %u\n",
+		/* btf_obj_fd is zero for all relos but BPF_TYPE_ID_TARGET
+		 * with target type in the kernel module BTF
+		 */
+		insn[1].imm = res->btf_obj_fd;
+		pr_debug("prog '%s': relo #%d: patched insn #%d (LDIMM64) imm64 %llu -> %llu\n",
 			 prog->name, relo_idx, insn_idx,
-			 (unsigned long long)imm, new_val);
+			 (unsigned long long)imm,
+			 ((unsigned long long)res->btf_obj_fd << 32) | new_val);
 		break;
 	}
 	default: