diff mbox series

[bpf-next,v2,19/25] bpf: Introduce bpf_kptr_new

Message ID 20221013062303.896469-20-memxor@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Local kptrs, BPF linked lists | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs_no_alu32 on s390x with gcc
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count fail Series longer than 15 patches (and no cover letter)
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1373 this patch: 1355
netdev/cc_maintainers warning 11 maintainers not CCed: sdf@google.com john.fastabend@gmail.com yhs@fb.com haoluo@google.com linux-kselftest@vger.kernel.org jolsa@kernel.org kpsingh@kernel.org song@kernel.org shuah@kernel.org mykolal@fb.com martin.lau@linux.dev
netdev/build_clang success Errors and warnings before: 159 this patch: 159
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 1367 this patch: 1368
netdev/checkpatch warning CHECK: extern prototypes should be avoided in .h files CHECK: multiple assignments should be avoided WARNING: 'unitialized' may be misspelled - perhaps 'uninitialized'? WARNING: line length of 101 exceeds 80 columns WARNING: line length of 103 exceeds 80 columns WARNING: line length of 105 exceeds 80 columns WARNING: line length of 106 exceeds 80 columns WARNING: line length of 108 exceeds 80 columns WARNING: line length of 111 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 96 exceeds 80 columns WARNING: line length of 97 exceeds 80 columns WARNING: line length of 98 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix

Commit Message

Kumar Kartikeya Dwivedi Oct. 13, 2022, 6:22 a.m. UTC
Introduce a type-safe memory allocator bpf_kptr_new for BPF programs. The
kernel side kfunc is named bpf_kptr_new_impl, as passing hidden
arguments to kfuncs still requires having them in the prototype, unlike
BPF helpers, which always take 5 arguments and have them checked using
bpf_func_proto in the verifier, ignoring unset argument types.

Introduce the __ign suffix to ignore a specific kfunc argument during
type checks, then use it to introduce support for passing type metadata
to the bpf_kptr_new_impl kfunc.

The user passes the BTF ID of the type it wants to allocate in program
BTF; the verifier then rewrites the first argument as the size of this
type, after performing some sanity checks (to ensure the type exists and
is a struct type).

The second argument, flags, is reserved and must be 0 for now.

The third argument is also fixed up and passed by the verifier. It is
the btf_struct_meta for the type being allocated. It is needed mostly
for the offset array, which is required for zero-initializing special
fields while leaving the rest of the storage in an uninitialized state.

It will also be needed in the next patch to perform proper destruction
of the object's special fields.

A convenience macro is included in the bpf_experimental.h header to hide
the ugly details of the implementation, making user code look similar to
a language-level extension that allocates and constructs fields of a
user type.

struct bar {
	struct bpf_list_node node;
};

struct foo {
	struct bpf_spin_lock lock;
	struct bpf_list_head head __contains(bar, node);
};

void prog(void) {
	struct foo *f;

	f = bpf_kptr_new(typeof(*f));
	if (!f)
		return;
	...
}

A key piece of this story is still missing, i.e. the free function,
which will come in the next patch.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  21 ++--
 include/linux/bpf_verifier.h                  |   2 +
 kernel/bpf/core.c                             |  14 +++
 kernel/bpf/helpers.c                          |  41 +++++--
 kernel/bpf/verifier.c                         | 107 ++++++++++++++++--
 .../testing/selftests/bpf/bpf_experimental.h  |  19 ++++
 6 files changed, 181 insertions(+), 23 deletions(-)

Comments

Alexei Starovoitov Oct. 19, 2022, 2:31 a.m. UTC | #1
On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> +{
> +	struct btf_struct_meta *meta = meta__ign;
> +	u64 size = local_type_id__k;
> +	void *p;
> +
> +	if (unlikely(flags || !bpf_global_ma_set))
> +		return NULL;

Unused 'flags' looks weird in unstable api. Just drop it?
And keep it as:
void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);

and in bpf_experimental.h:

extern void *bpf_kptr_new(__u64 local_type_id) __ksym;

since __ign args are ignored during kfunc type match
the bpf progs can use it without #define.

> +	p = bpf_mem_alloc(&bpf_global_ma, size);
> +	if (!p)
> +		return NULL;
> +	if (meta)
> +		bpf_obj_init(meta->off_arr, p);

I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
It reminds me of programming style where people tried to add types into
variable names. imo dropping _arr would be just fine.
Kumar Kartikeya Dwivedi Oct. 19, 2022, 5:58 a.m. UTC | #2
On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > +{
> > +	struct btf_struct_meta *meta = meta__ign;
> > +	u64 size = local_type_id__k;
> > +	void *p;
> > +
> > +	if (unlikely(flags || !bpf_global_ma_set))
> > +		return NULL;
>
> Unused 'flags' looks weird in unstable api. Just drop it?
> And keep it as:
> void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
>
> and in bpf_experimental.h:
>
> extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
>
> since __ign args are ignored during kfunc type match
> the bpf progs can use it without #define.
>

It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
prototypes will not be the same. I guess I'll have to teach it to do that during
type match, but IDK how you feel about that.

Otherwise unless you want people to manually pass something to the ignored
argument, we have to hide it behind a macro.

I actually like the macro on top, then I don't even pass the type ID but the
type. But that's a personal preference, and I don't feel strongly about it.

So in C one does malloc(sizeof(*p)), here we'll just write
bpf_kptr_new(typeof(*p)). YMMV.

> > +	p = bpf_mem_alloc(&bpf_global_ma, size);
> > +	if (!p)
> > +		return NULL;
> > +	if (meta)
> > +		bpf_obj_init(meta->off_arr, p);
>
> I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> It reminds me of programming style where people tried to add types into
> variable names. imo dropping _arr would be just fine.

Ack, I'll do it in v3.

Also, I'd like to invite people to please bikeshed a bit over the naming of the
APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.

In the BPF list API, it's named bpf_list_del but it's actually distinct from how
list_del in the kernel works. So does it make sense to give them a different
name (like pop_front/pop_back and push_front/push_back)?

Because even bpf_list_add takes a bpf_list_head; in the kernel there's no
distinction between node and head, so you can do list_add on a node as well,
but that won't be possible with the kfunc (unless we overload the head
argument to also work with nodes).

Later we'll probably have to add bpf_list_node_add etc. that add before or after
a node to make that work.

The main question is whether it should closely resemble the linked list API in
the kernel, or can it steer away considerably from that?
Alexei Starovoitov Oct. 19, 2022, 4:31 p.m. UTC | #3
On Tue, Oct 18, 2022 at 10:58 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> > On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > > +{
> > > +   struct btf_struct_meta *meta = meta__ign;
> > > +   u64 size = local_type_id__k;
> > > +   void *p;
> > > +
> > > +   if (unlikely(flags || !bpf_global_ma_set))
> > > +           return NULL;
> >
> > Unused 'flags' looks weird in unstable api. Just drop it?
> > And keep it as:
> > void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
> >
> > and in bpf_experimental.h:
> >
> > extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
> >
> > since __ign args are ignored during kfunc type match
> > the bpf progs can use it without #define.
> >
>
> It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
> prototypes will not be the same. I guess I'll have to teach it to do that during
> type match, but IDK how you feel about that.

libbpf does the full type match, really?
Could you point me to the code?

> Otherwise unless you want people to manually pass something to the ignored
> argument, we have to hide it behind a macro.
>
> I actually like the macro on top, then I don't even pass the type ID but the
> type. But that's a personal preference, and I don't feel strongly about it.
>
> So in C one does malloc(sizeof(*p)), here we'll just write
> bpf_kptr_new(typeof(*p)). YMMV.

bpf_kptr_new(typeof(*p)) is cleaner.

> > > +   p = bpf_mem_alloc(&bpf_global_ma, size);
> > > +   if (!p)
> > > +           return NULL;
> > > +   if (meta)
> > > +           bpf_obj_init(meta->off_arr, p);
> >
> > I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> > It reminds me of programming style where people tried to add types into
> > variable names. imo dropping _arr would be just fine.
>
> Ack, I'll do it in v3.
>
> Also, I'd like to invite people to please bikeshed a bit over the naming of the
> APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.

bpf_kptr_drop is more precise.
delete assumes instant free which is not the case here.

How about
extern void *__bpf_obj_new(__u64 local_type_id) __ksym;
extern void bpf_obj_drop(void *obj) __ksym;
#define bpf_obj_new(t) \
 (t *)__bpf_obj_new(bpf_core_type_id_local(t));

kptr means 'kernel pointer'.
Here we have program supplied object.
It feels 'obj' is better than 'kptr' in this context.

> In the BPF list API, it's named bpf_list_del but it's actually distinct from how
> list_del in the kernel works. So it does make sense to give them a different
> name (like pop_front/pop_back and push_front/push_back)?
>
> Because even bpf_list_add takes bpf_list_head, in the kernel there's no
> distinction between node and head, so you can do list_add on a node as well, but
> it won't be possible with the kfunc (unless we overload the head argument to
> also work with nodes).
>
> Later we'll probably have to add bpf_list_node_add etc. that add before or after
> a node to make that work.
>
> The main question is whether it should closely resemble the linked list API in
> the kernel, or can it steer away considerably from that?

If we do doubly linked list we should allow delete in
the middle with
bpf_list_del_any(head, node)

and
bpf_list_pop_front/pop_back(head)

bpf_list_add(node, head) would match kernel style,
but I think it's cleaner to have head as 1st arg.
In that sense new pop/push/_front/_back are cleaner.
And similar for rbtree.

If we keep (node, head) and (rb_node, rb_root) order
we should keep kernel names.
Kumar Kartikeya Dwivedi Oct. 20, 2022, 12:44 a.m. UTC | #4
On Wed, Oct 19, 2022 at 10:01:21PM IST, Alexei Starovoitov wrote:
> On Tue, Oct 18, 2022 at 10:58 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> > > On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > > > +{
> > > > +   struct btf_struct_meta *meta = meta__ign;
> > > > +   u64 size = local_type_id__k;
> > > > +   void *p;
> > > > +
> > > > +   if (unlikely(flags || !bpf_global_ma_set))
> > > > +           return NULL;
> > >
> > > Unused 'flags' looks weird in unstable api. Just drop it?
> > > And keep it as:
> > > void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
> > >
> > > and in bpf_experimental.h:
> > >
> > > extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
> > >
> > > since __ign args are ignored during kfunc type match
> > > the bpf progs can use it without #define.
> > >
> >
> > It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
> > prototypes will not be the same. I guess I'll have to teach it to do that during
> > type match, but IDK how you feel about that.
>
> libbpf does the full type match, really?
> Could you point me to the code?
>

Not full type match, but the number of arguments must be the same, so it won't
allow having the kfunc as:

void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);

in the kernel and ksym declaration in the program as:

extern void *bpf_kptr_new(__u64 local_type_id) __ksym;

I get:

libbpf: extern (func ksym) 'bpf_kptr_new_impl': func_proto [25] incompatible with kernel [60043]

The vlen of the func_proto in the kernel type is 2; for us it will be 1.

> > Otherwise unless you want people to manually pass something to the ignored
> > argument, we have to hide it behind a macro.
> >
> > I actually like the macro on top, then I don't even pass the type ID but the
> > type. But that's a personal preference, and I don't feel strongly about it.
> >
> > So in C one does malloc(sizeof(*p)), here we'll just write
> > bpf_kptr_new(typeof(*p)). YMMV.
>
> bpf_kptr_new(typeof(*p)) is cleaner.
>

So if we're having a macro anyway, the thing above can be hidden behind it.

> > > > +   p = bpf_mem_alloc(&bpf_global_ma, size);
> > > > +   if (!p)
> > > > +           return NULL;
> > > > +   if (meta)
> > > > +           bpf_obj_init(meta->off_arr, p);
> > >
> > > I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> > > It reminds me of programming style where people tried to add types into
> > variable names. imo dropping _arr would be just fine.
> >
> > Ack, I'll do it in v3.
> >
> > Also, I'd like to invite people to please bikeshed a bit over the naming of the
> > APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.
>
> bpf_kptr_drop is more precise.
> delete assumes instant free which is not the case here.
>
> How about
> extern void *__bpf_obj_new(__u64 local_type_id) __ksym;
> extern void bpf_obj_drop(void *obj) __ksym;
> #define bpf_obj_new(t) \
>  (t *)__bpf_obj_new(bpf_core_type_id_local(t));
>
> kptr means 'kernel pointer'.
> Here we have program supplied object.
> It feels 'obj' is better than 'kptr' in this context.
>

I agree, I'll rename it to bpf_obj_*.

Also, that __bpf_obj_new doesn't work yet in clang [0] but I think we can rename
it to that once clang is fixed.

 [0]: https://reviews.llvm.org/D136041

> > In the BPF list API, it's named bpf_list_del but it's actually distinct from how
> > list_del in the kernel works. So it does make sense to give them a different
> > name (like pop_front/pop_back and push_front/push_back)?
> >
> > Because even bpf_list_add takes bpf_list_head, in the kernel there's no
> > distinction between node and head, so you can do list_add on a node as well, but
> > it won't be possible with the kfunc (unless we overload the head argument to
> > also work with nodes).
> >
> > Later we'll probably have to add bpf_list_node_add etc. that add before or after
> > a node to make that work.
> >
> > The main question is whether it should closely resemble the linked list API in
> > the kernel, or can it steer away considerably from that?
>
> If we do doubly linked list we should allow delete in
> the middle with
> bpf_list_del_any(head, node)
>
> and
> bpf_list_pop_front/pop_back(head)
>
> bpf_list_add(node, head) would match kernel style,
> but I think it's cleaner to have head as 1st arg.
> In that sense new pop/push/_front/_back are cleaner.
> And similar for rbtree.
>
> If we keep (node, head) and (rb_node, rb_root) order
> we should keep kernel names.

There's also tradeoffs in how various operations are done.

Right now we have bpf_list_del and bpf_list_del_tail doing pop_front and
pop_back.

To replicate the same in a kernel-style API, you would do:

struct foo *f = bpf_list_first_entry_or_null(head);
if (!f) {}
bpf_list_del(&f->node);

Between those two calls, you might do bpf_list_last_entry -> bpf_list_del; the
verifier is going to have a hard time proving the aliasing of the two nodes, so
it will have to invalidate all pointers peeking into the list whenever something
modifies it. You would still be able to access the memory but cannot pass it to
list ops anymore.

But I think we will pay this cost eventually anyway, people will add a peek
operation and such analysis would have to be done then. So I think it's unlikely
we can avoid this, and it might be better to make things more consistent and end
up mirroring the kernel list APIs.

WDYT?
Alexei Starovoitov Oct. 20, 2022, 1:11 a.m. UTC | #5
On Wed, Oct 19, 2022 at 5:44 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Oct 19, 2022 at 10:01:21PM IST, Alexei Starovoitov wrote:
> > On Tue, Oct 18, 2022 at 10:58 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Wed, Oct 19, 2022 at 08:01:24AM IST, Alexei Starovoitov wrote:
> > > > On Thu, Oct 13, 2022 at 11:52:57AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > +void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
> > > > > +{
> > > > > +   struct btf_struct_meta *meta = meta__ign;
> > > > > +   u64 size = local_type_id__k;
> > > > > +   void *p;
> > > > > +
> > > > > +   if (unlikely(flags || !bpf_global_ma_set))
> > > > > +           return NULL;
> > > >
> > > > Unused 'flags' looks weird in unstable api. Just drop it?
> > > > And keep it as:
> > > > void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
> > > >
> > > > and in bpf_experimental.h:
> > > >
> > > > extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
> > > >
> > > > since __ign args are ignored during kfunc type match
> > > > the bpf progs can use it without #define.
> > > >
> > >
> > > It's ignored during check_kfunc_call, but libbpf doesn't ignore that. The
> > > prototypes will not be the same. I guess I'll have to teach it to do that during
> > > type match, but IDK how you feel about that.
> >
> > libbpf does the full type match, really?
> > Could you point me to the code?
> >
>
> Not full type match, but the number of arguments must be same, so it won't allow
> having kfunc as:
>
> void *bpf_kptr_new(u64 local_type_id__k, struct btf_struct_meta *meta__ign);
>
> in the kernel and ksym declaration in the program as:
>
> extern void *bpf_kptr_new(__u64 local_type_id) __ksym;
>
> I get:
>
> libbpf: extern (func ksym) 'bpf_kptr_new_impl': func_proto [25] incompatible with kernel [60043]
>
> vlen of func_proto in kernel type is 2, for us it will be 1.

Ahh. Found it.
__bpf_core_types_are_compat() runs in both kernel and libbpf.
We could hack it for this case, but let's not complicate it.

> > > Otherwise unless you want people to manually pass something to the ignored
> > > argument, we have to hide it behind a macro.
> > >
> > > I actually like the macro on top, then I don't even pass the type ID but the
> > > type. But that's a personal preference, and I don't feel strongly about it.
> > >
> > > So in C one does malloc(sizeof(*p)), here we'll just write
> > > bpf_kptr_new(typeof(*p)). YMMV.
> >
> > bpf_kptr_new(typeof(*p)) is cleaner.
> >
>
> So if we're having a macro anyway, the thing above can be hidden behind it.

Indeed. Since #define is there let's not mess with
__bpf_core_types_are_compat().

> > > > > +   p = bpf_mem_alloc(&bpf_global_ma, size);
> > > > > +   if (!p)
> > > > > +           return NULL;
> > > > > +   if (meta)
> > > > > +           bpf_obj_init(meta->off_arr, p);
> > > >
> > > > I'm starting to dislike all that _arr and _tab suffixes in the verifier code base.
> > > > It reminds me of programming style where people tried to add types into
> > > > variable names. imo dropping _arr would be just fine.
> > >
> > > Ack, I'll do it in v3.
> > >
> > > Also, I'd like to invite people to please bikeshed a bit over the naming of the
> > > APIs, e.g. whether it should be bpf_kptr_drop vs bpf_kptr_delete.
> >
> > bpf_kptr_drop is more precise.
> > delete assumes instant free which is not the case here.
> >
> > How about
> > extern void *__bpf_obj_new(__u64 local_type_id) __ksym;
> > extern void bpf_obj_drop(void *obj) __ksym;
> > #define bpf_obj_new(t) \
> >  (t *)__bpf_obj_new(bpf_core_type_id_local(t));
> >
> > kptr means 'kernel pointer'.
> > Here we have program supplied object.
> > It feels 'obj' is better than 'kptr' in this context.
> >
>
> I agree, I'll rename it to bpf_obj_*.
>
> Also, that __bpf_obj_new doesn't work yet in clang [0] but I think we can rename
> it to that once clang is fixed.

argh. ok. let's keep bpf_obj_new_impl() for now.

>
>  [0]: https://reviews.llvm.org/D136041
>
> > > In the BPF list API, it's named bpf_list_del but it's actually distinct from how
> > > list_del in the kernel works. So it does make sense to give them a different
> > > name (like pop_front/pop_back and push_front/push_back)?
> > >
> > > Because even bpf_list_add takes bpf_list_head, in the kernel there's no
> > > distinction between node and head, so you can do list_add on a node as well, but
> > > it won't be possible with the kfunc (unless we overload the head argument to
> > > also work with nodes).
> > >
> > > Later we'll probably have to add bpf_list_node_add etc. that add before or after
> > > a node to make that work.
> > >
> > > The main question is whether it should closely resemble the linked list API in
> > > the kernel, or can it steer away considerably from that?
> >
> > If we do doubly linked list we should allow delete in
> > the middle with
> > bpf_list_del_any(head, node)
> >
> > and
> > bpf_list_pop_front/pop_back(head)
> >
> > bpf_list_add(node, head) would match kernel style,
> > but I think it's cleaner to have head as 1st arg.
> > In that sense new pop/push/_front/_back are cleaner.
> > And similar for rbtree.
> >
> > If we keep (node, head) and (rb_node, rb_root) order
> > we should keep kernel names.
>
> There's also tradeoffs in how various operations are done.
>
> Right now we have bpf_list_del and bpf_list_del_tail doing pop_front and
> pop_back.
>
> To replicate the same in kernel style API, you would do:
>
> struct foo *f = bpf_list_first_entry_or_null(head);
> if (!f) {}
> bpf_list_del(&f->node);
>
> Between those two calls, you might do bpf_list_last_entry -> bpf_list_del, the
> verifier is going to have a hard time proving the aliasing of the two nodes, so
> it will have to invalidate all pointers peeking into the list whenever something
> modifies it. You would still be able to access memory but cannot pass it to list
> ops anymore.

yeah. ugly.

>
> But I think we will pay this cost eventually anyway, people will add a peek
> operation and such analysis would have to be done then. So I think it's unlikely
> we can avoid this, and it might be better to make things more consistent and end
> up mirroring the kernel list APIs.

Let's keep the verifier simpler for now.
We can go with pop/push/_front/_back, since they are simpler
and later add kernel-style accessors as well with swapped
node/head order.
One doesn't preclude the other.

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7ffafa5bb866..29fccf7c8505 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -55,6 +55,8 @@  struct cgroup;
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 extern struct kobject *btf_kobj;
+extern struct bpf_mem_alloc bpf_global_ma;
+extern bool bpf_global_ma_set;
 
 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
@@ -335,16 +337,19 @@  static inline bool btf_type_fields_has_field(const struct btf_type_fields *tab,
 	return tab->field_mask & type;
 }
 
-static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
+static inline void bpf_obj_init(const struct btf_type_fields_off *off_arr, void *obj)
 {
-	if (!IS_ERR_OR_NULL(map->fields_tab)) {
-		struct btf_field *fields = map->fields_tab->fields;
-		u32 cnt = map->fields_tab->cnt;
-		int i;
+	int i;
 
-		for (i = 0; i < cnt; i++)
-			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
-	}
+	if (!off_arr)
+		return;
+	for (i = 0; i < off_arr->cnt; i++)
+		memset(obj + off_arr->field_off[i], 0, off_arr->field_sz[i]);
+}
+
+static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
+{
+	bpf_obj_init(map->off_arr, dst);
 }
 
 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 8b09c3f82071..0cc4679f3f42 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -426,6 +426,8 @@  struct bpf_insn_aux_data {
 		 */
 		struct bpf_loop_inline_state loop_inline_state;
 	};
+	u64 kptr_new_size; /* remember the size of type passed to bpf_kptr_new to rewrite R1 */
+	struct btf_struct_meta *kptr_struct_meta;
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 711fd293b6de..a8b3263a9a45 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -34,6 +34,7 @@ 
 #include <linux/log2.h>
 #include <linux/bpf_verifier.h>
 #include <linux/nodemask.h>
+#include <linux/bpf_mem_alloc.h>
 
 #include <asm/barrier.h>
 #include <asm/unaligned.h>
@@ -60,6 +61,9 @@ 
 #define CTX	regs[BPF_REG_CTX]
 #define IMM	insn->imm
 
+struct bpf_mem_alloc bpf_global_ma;
+bool bpf_global_ma_set;
+
 /* No hurry in this branch
  *
  * Exported for the bpf jit load helper.
@@ -2740,6 +2744,16 @@  int __weak bpf_arch_text_invalidate(void *dst, size_t len)
 	return -ENOTSUPP;
 }
 
+static int __init bpf_global_ma_init(void)
+{
+	int ret;
+
+	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
+	bpf_global_ma_set = !ret;
+	return ret;
+}
+late_initcall(bpf_global_ma_init);
+
 DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 EXPORT_SYMBOL(bpf_stats_enabled_key);
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 238103dc6c5e..954e0bf18269 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -19,6 +19,7 @@ 
 #include <linux/proc_ns.h>
 #include <linux/security.h>
 #include <linux/btf_ids.h>
+#include <linux/bpf_mem_alloc.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -1725,8 +1726,11 @@  void bpf_list_head_free(const struct btf_field *field, void *list_head,
 
 		obj -= field->list_head.node_offset;
 		head = head->next;
-		/* TODO: Rework later */
-		kfree(obj);
+		/* The contained type can also have resources, including a
+		 * bpf_list_head which needs to be freed.
+		 */
+		bpf_obj_free_fields(field->list_head.value_tab, obj);
+		bpf_mem_free(&bpf_global_ma, obj);
 	}
 unlock:
 	INIT_LIST_HEAD(head);
@@ -1734,20 +1738,43 @@  void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	local_irq_restore(flags);
 }
 
-BTF_SET8_START(tracing_btf_ids)
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+void *bpf_kptr_new_impl(u64 local_type_id__k, u64 flags, void *meta__ign)
+{
+	struct btf_struct_meta *meta = meta__ign;
+	u64 size = local_type_id__k;
+	void *p;
+
+	if (unlikely(flags || !bpf_global_ma_set))
+		return NULL;
+	p = bpf_mem_alloc(&bpf_global_ma, size);
+	if (!p)
+		return NULL;
+	if (meta)
+		bpf_obj_init(meta->off_arr, p);
+	return p;
+}
+
+__diag_pop();
+
+BTF_SET8_START(generic_btf_ids)
 #ifdef CONFIG_KEXEC_CORE
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
-BTF_SET8_END(tracing_btf_ids)
+BTF_ID_FLAGS(func, bpf_kptr_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_SET8_END(generic_btf_ids)
 
-static const struct btf_kfunc_id_set tracing_kfunc_set = {
+static const struct btf_kfunc_id_set generic_kfunc_set = {
 	.owner = THIS_MODULE,
-	.set   = &tracing_btf_ids,
+	.set   = &generic_btf_ids,
 };
 
 static int __init kfunc_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
 }
 
 late_initcall(kfunc_init);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b324c1042fb8..9cc01535e391 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7766,6 +7766,11 @@  static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_pa
 	return __kfunc_param_match_suffix(btf, arg, "__k");
 }
 
+static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__ign");
+}
+
 static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
 				      const struct btf_param *arg,
 				      const struct bpf_reg_state *reg,
@@ -8035,6 +8040,10 @@  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		int kf_arg_type;
 
 		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+
+		if (is_kfunc_arg_sfx_ignore(btf, &args[i]))
+			continue;
+
 		if (btf_type_is_scalar(t)) {
 			if (reg->type != SCALAR_VALUE) {
 				verbose(env, "R%d is not a scalar\n", regno);
@@ -8212,6 +8221,17 @@  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	return 0;
 }
 
+enum special_kfunc_type {
+	KF_bpf_kptr_new_impl,
+};
+
+BTF_SET_START(special_kfunc_set)
+BTF_ID(func, bpf_kptr_new_impl)
+BTF_SET_END(special_kfunc_set)
+
+BTF_ID_LIST(special_kfunc_list)
+BTF_ID(func, bpf_kptr_new_impl)
+
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
@@ -8286,17 +8306,64 @@  static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
 
 	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
-		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
-		return -EINVAL;
+		/* Only exception is bpf_kptr_new_impl */
+		if (meta.btf != btf_vmlinux || meta.func_id != special_kfunc_list[KF_bpf_kptr_new_impl]) {
+			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
+			return -EINVAL;
+		}
 	}
 
 	if (btf_type_is_scalar(t)) {
 		mark_reg_unknown(env, regs, BPF_REG_0);
 		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
 	} else if (btf_type_is_ptr(t)) {
-		ptr_type = btf_type_skip_modifiers(desc_btf, t->type,
-						   &ptr_type_id);
-		if (!btf_type_is_struct(ptr_type)) {
+		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
+
+		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
+			if (!btf_type_is_void(ptr_type)) {
+				verbose(env, "kernel function %s must have void * return type\n",
+					meta.func_name);
+				return -EINVAL;
+			}
+			if (meta.func_id == special_kfunc_list[KF_bpf_kptr_new_impl]) {
+				const struct btf_type *ret_t;
+				struct btf *ret_btf;
+				u32 ret_btf_id;
+
+				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
+					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
+					return -EINVAL;
+				}
+
+				ret_btf = env->prog->aux->btf;
+				ret_btf_id = meta.arg_constant.value;
+
+				/* This may be NULL due to user not supplying a BTF */
+				if (!ret_btf) {
+					verbose(env, "bpf_kptr_new requires prog BTF\n");
+					return -EINVAL;
+				}
+
+				ret_t = btf_type_by_id(ret_btf, ret_btf_id);
+				if (!ret_t || !__btf_type_is_struct(ret_t)) {
+					verbose(env, "bpf_kptr_new type ID argument must be of a struct\n");
+					return -EINVAL;
+				}
+
+				mark_reg_known_zero(env, regs, BPF_REG_0);
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_TYPE_LOCAL;
+				regs[BPF_REG_0].btf = ret_btf;
+				regs[BPF_REG_0].btf_id = ret_btf_id;
+
+				env->insn_aux_data[insn_idx].kptr_new_size = ret_t->size;
+				env->insn_aux_data[insn_idx].kptr_struct_meta =
+					btf_find_struct_meta(ret_btf, ret_btf_id);
+			} else {
+				verbose(env, "kernel function %s unhandled dynamic return type\n",
+					meta.func_name);
+				return -EFAULT;
+			}
+		} else if (!__btf_type_is_struct(ptr_type)) {
 			if (!meta.r0_size) {
 				ptr_type_name = btf_name_by_offset(desc_btf,
 								   ptr_type->name_off);
@@ -14455,8 +14523,8 @@  static int fixup_call_args(struct bpf_verifier_env *env)
 	return err;
 }
 
-static int fixup_kfunc_call(struct bpf_verifier_env *env,
-			    struct bpf_insn *insn)
+static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
+			    struct bpf_insn *insn_buf, int insn_idx, int *cnt)
 {
 	const struct bpf_kfunc_desc *desc;
 
@@ -14475,8 +14543,21 @@  static int fixup_kfunc_call(struct bpf_verifier_env *env,
 		return -EFAULT;
 	}
 
+	*cnt = 0;
 	insn->imm = desc->imm;
+	if (insn->off)
+		return 0;
+	if (desc->func_id == special_kfunc_list[KF_bpf_kptr_new_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
+		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_3, (long)kptr_struct_meta) };
+		u64 kptr_new_size = env->insn_aux_data[insn_idx].kptr_new_size;
 
+		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_1, kptr_new_size);
+		insn_buf[1] = addr[0];
+		insn_buf[2] = addr[1];
+		insn_buf[3] = *insn;
+		*cnt = 4;
+	}
 	return 0;
 }
 
@@ -14618,9 +14699,19 @@  static int do_misc_fixups(struct bpf_verifier_env *env)
 		if (insn->src_reg == BPF_PSEUDO_CALL)
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
-			ret = fixup_kfunc_call(env, insn);
+			ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt);
 			if (ret)
 				return ret;
+			if (cnt == 0)
+				continue;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta	 += cnt - 1;
+			env->prog = prog = new_prog;
+			insn	  = new_prog->insnsi + i + delta;
 			continue;
 		}
 
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4e31790e433d..9c7d0badb02e 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -20,4 +20,23 @@  struct bpf_list_node {
 #endif
 
 #ifndef __KERNEL__
+
+/* Description
+ *	Allocates a local kptr of the type represented by 'local_type_id' in
+ *	program BTF. Use the bpf_core_type_id_local macro to obtain the type
+ *	ID of a struct defined in program BTF.
+ *
+ *	The 'local_type_id' parameter must be a known constant. The 'flags'
+ *	parameter must be 0.
+ *
+ *	The 'meta__ign' parameter is a hidden argument that is ignored.
+ * Returns
+ *	A local kptr of the type corresponding to 'local_type_id', or NULL on
+ *	allocation failure.
+ */
+extern void *bpf_kptr_new_impl(__u64 local_type_id, __u64 flags, void *meta__ign) __ksym;
+
+/* Convenience macro to wrap over bpf_kptr_new_impl */
+#define bpf_kptr_new(type) bpf_kptr_new_impl(bpf_core_type_id_local(type), 0, NULL)
+
 #endif