[bpf-next,v1,00/15] Introduce typed pointer support in BPF maps

Message ID 20220220134813.3411982-1-memxor@gmail.com (mailing list archive)

Message

Kumar Kartikeya Dwivedi Feb. 20, 2022, 1:47 p.m. UTC
Introduction
------------

This set enables storing pointers of a certain type in a BPF map, and extends
the verifier to enforce type safety and lifetime correctness properties.

The infrastructure being added is generic enough to allow storing, in the
future, any kind of pointer whose type is available through BTF (user or
kernel), e.g. from strongly typed memory allocation in a BPF program. Such
pointers are tracked internally in the verifier as PTR_TO_BTF_ID, but for now
the series limits them to four kinds of pointers obtained from the kernel.

Obviously, use of this feature depends on map BTF.

1. Unreferenced kernel pointer

In this case, there are very few restrictions. The pointer type being stored
must match the type declared in the map value. However, such a pointer when
loaded from the map can only be dereferenced, but not passed to any in-kernel
helpers or kernel functions available to the program. This is because while the
verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM loads,
which are then handled specially by the JIT implementation, the same liberty is
not available to accesses inside the kernel. The pointer by the time it is
passed into a helper has no lifetime related guarantees about the object it is
pointing to, and may well be referencing invalid memory.

2. Referenced kernel pointer

This case imposes a lot of restrictions on the programmer, to ensure safety. To
transfer the ownership of a reference in the BPF program to the map, the user
must use the BPF_XCHG instruction, which returns the old pointer contained in
the map, as an acquired reference, and releases verifier state for the
referenced pointer being exchanged, as it moves into the map.

The pointer returned by BPF_XCHG is a normal PTR_TO_BTF_ID that can be used with
in-kernel helpers and kernel functions callable by the program.
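
To make the ownership transfer concrete, here is a minimal userspace sketch of
the exchange semantics, using C11 atomics in place of the BPF_XCHG instruction.
It is only a model under stated assumptions: struct obj, struct map_value,
obj_put and slot_xchg are hypothetical names that do not appear in the series,
and the verifier's reference tracking is represented only by comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a kernel object. */
struct obj {
	atomic_int refcnt;
};

/* Hypothetical map value with a single referenced-pointer slot. */
struct map_value {
	_Atomic(struct obj *) ptr;
};

/* Drop one reference and return the count that is left afterwards. */
static int obj_put(struct obj *o)
{
	int left = atomic_fetch_sub(&o->refcnt, 1) - 1;

	assert(left >= 0);
	return left;
}

/*
 * Move the caller's reference on @new into the slot and hand back whatever
 * the slot held before. After the call the caller no longer owns @new, and
 * it owns the returned pointer (if non-NULL), which it must release or move
 * into a slot again.
 */
static struct obj *slot_xchg(struct map_value *v, struct obj *new)
{
	assert(v != NULL);
	return atomic_exchange(&v->ptr, new);
}
```

In this model, a caller that stores a second object gets the first one back and
is responsible for dropping it, which mirrors the "acquired reference" returned
by the exchange described above.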

However, if BPF_LDX is used to load a referenced pointer from the map, it is
still not permitted to pass it to in-kernel helpers or kernel functions. To
obtain a reference usable with helpers, the user must invoke a kfunc helper
which returns a usable reference (which also must be eventually released before
BPF_EXIT, or moved into a map).

Since the load of the pointer (preserving data dependency ordering) must happen
inside the RCU read section, the kfunc helper will take a pointer to the map
value, which must point to the actual pointer of the object whose reference is
to be raised. The type will be verified from the BTF information of the kfunc,
as the prototype must be:

	T *func(T **, ... /* other arguments */);

Then, the verifier checks whether the pointer at that offset of the map value
points to the type T, and if so permits the call.

This convention is followed so that such helpers may also be called from
sleepable BPF programs, where the RCU read lock is not necessarily held in the
BPF program context, which necessitates passing in a pointer to the actual
pointer so that the load is performed inside the RCU read section.
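
As an illustration of the T *func(T **) convention, the following is a small
userspace sketch of what such a helper does with the slot address. It is not
the code from the series: struct obj, obj_get_unless_zero and obj_kptr_get are
hypothetical names, there is no RCU in this model, and an acquire load stands
in for the dependency-ordered load mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a kernel object. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the object is still live (count above zero). */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old > 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Follows the T *func(T **) shape: it receives the address of the slot, not
 * a pointer value the caller loaded earlier. In the kernel, the load and the
 * refcount bump would both sit inside an RCU read-side section; this model
 * only shows the data flow.
 */
static struct obj *obj_kptr_get(_Atomic(struct obj *) *slot)
{
	struct obj *o;

	assert(slot != NULL);
	o = atomic_load_explicit(slot, memory_order_acquire);
	if (!o || !obj_get_unless_zero(o))
		return NULL;
	return o;
}
```

The point of passing the slot address is that the helper itself performs the
load, so the caller never hands in a pointer value that may already be stale.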

3. per-CPU kernel pointer

These have very few restrictions. The user can store a PTR_TO_PERCPU_BTF_ID
into the map, and when loading from the map, they must NULL check it before use,
because while a non-zero value stored into the map should always be valid, it can
still be reset to zero on updates. After checking it to be non-NULL, it can be
passed to bpf_per_cpu_ptr and bpf_this_cpu_ptr helpers to obtain a PTR_TO_BTF_ID
to the underlying per-CPU object.

It is also permitted to write 0 and reset the value.

4. Userspace pointer

The verifier recently gained support for annotating BTF with the __user type
tag. This marks pointers to memory which must be read using the
bpf_probe_read_user helper to ensure correct results. The set also permits
storing such pointers in a BPF map, and ensures that a user pointer cannot be
stored into a field declared as one of the other kinds of pointers mentioned
above.

When loaded from the map, the only thing that can be done is to pass this
pointer to bpf_probe_read_user. No dereference is allowed.

Notes
-----

This set requires the following LLVM fix to pass the BPF CI:

  https://reviews.llvm.org/D119799

Also, I applied Alexei's suggestion of removing callback for btf_find_field, but
that 'ugly' is still required, since bad offset alignment etc. can return an
error, and we don't want to leave a partial ptr_off_tab around in that case. The
other option is freeing inside btf_find_field, but that would be more code
conditional on BTF_FIELD_KPTR, when the caller can do it based on ret < 0.

TODO
----

Needs a lot more testing, especially for stuff apart from verifier correctness.
Will work on that in parallel during v1 review. The idea was to get a little
more feedback (esp. for kptr_get stuff) before moving forward with adding more
tests. Posting it now to just get discussion started. The verifier tests fairly
comprehensively test many edge cases I could think of.

Kumar Kartikeya Dwivedi (15):
  bpf: Factor out fd returning from bpf_btf_find_by_name_kind
  bpf: Make btf_find_field more generic
  bpf: Allow storing PTR_TO_BTF_ID in map
  bpf: Allow storing referenced PTR_TO_BTF_ID in map
  bpf: Allow storing PTR_TO_PERCPU_BTF_ID in map
  bpf: Allow storing __user PTR_TO_BTF_ID in map
  bpf: Prevent escaping of pointers loaded from maps
  bpf: Adapt copy_map_value for multiple offset case
  bpf: Populate pairs of btf_id and destructor kfunc in btf
  bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map
  bpf: Teach verifier about kptr_get style kfunc helpers
  net/netfilter: Add bpf_ct_kptr_get helper
  libbpf: Add __kptr* macros to bpf_helpers.h
  selftests/bpf: Add C tests for PTR_TO_BTF_ID in map
  selftests/bpf: Add verifier tests for PTR_TO_BTF_ID in map

 include/linux/bpf.h                           |  90 ++-
 include/linux/btf.h                           |  24 +
 include/net/netfilter/nf_conntrack_core.h     |  17 +
 kernel/bpf/arraymap.c                         |  13 +-
 kernel/bpf/btf.c                              | 565 ++++++++++++++--
 kernel/bpf/hashtab.c                          |  27 +-
 kernel/bpf/map_in_map.c                       |   5 +-
 kernel/bpf/syscall.c                          | 227 ++++++-
 kernel/bpf/verifier.c                         | 311 ++++++++-
 net/bpf/test_run.c                            |  17 +-
 net/netfilter/nf_conntrack_bpf.c              | 132 +++-
 net/netfilter/nf_conntrack_core.c             |  17 -
 tools/lib/bpf/bpf_helpers.h                   |   4 +
 .../selftests/bpf/prog_tests/map_btf_ptr.c    |  13 +
 .../testing/selftests/bpf/progs/map_btf_ptr.c | 105 +++
 .../testing/selftests/bpf/progs/test_bpf_nf.c |  31 +
 tools/testing/selftests/bpf/test_verifier.c   |  57 +-
 .../selftests/bpf/verifier/map_btf_ptr.c      | 624 ++++++++++++++++++
 18 files changed, 2144 insertions(+), 135 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/map_btf_ptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/map_btf_ptr.c
 create mode 100644 tools/testing/selftests/bpf/verifier/map_btf_ptr.c

Comments

Song Liu Feb. 22, 2022, 6:05 a.m. UTC | #1
On Sun, Feb 20, 2022 at 5:48 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> [...]
>

I guess I missed some context here. Could you please provide some reference
to the use cases of these features?

For Unreferenced kernel pointer and userspace pointer, it seems that there is
no guarantee the pointer will still be valid during access (we only know it is
valid when it is stored in the map). Is this correct?

Thanks,
Song

[...]
Kumar Kartikeya Dwivedi Feb. 22, 2022, 8:21 a.m. UTC | #2
On Tue, Feb 22, 2022 at 11:35:14AM IST, Song Liu wrote:
> On Sun, Feb 20, 2022 at 5:48 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > [...]
> >
>
> I guess I missed some context here. Could you please provide some reference
> to the use cases of these features?
>

The common use case is caching references to objects inside BPF maps to avoid
costly lookups, and being able to raise a reference once for the duration of a
program invocation when passing the object to multiple helpers (to avoid
further re-lookups). Storing references also allows you to control object
lifetime.

One other use case is enabling xdp_frame queueing in XDP using this, but that
still needs some integration work after this lands, so it's a bit early to
comment on the specifics.

Other than that, I think Alexei already mentioned this could be easily extended
to do memory allocation returning a PTR_TO_BTF_ID in a BPF program [0] in the
future.

  [0]: https://lore.kernel.org/bpf/20220216230615.po6huyrgkswk7u67@ast-mbp.dhcp.thefacebook.com

> For Unreferenced kernel pointer and userspace pointer, it seems that there is
> no guarantee the pointer will still be valid during access (we only know it is
> valid when it is stored in the map). Is this correct?
>

That is correct. In the case of unreferenced and referenced kernel pointers,
when you do a BPF_LDX, both are marked as PTR_UNTRUSTED, and it is not allowed
to pass them into helpers or kfuncs, because from that point onwards we cannot
claim that the object is still alive when the pointer is used later. Still,
dereference is permitted because the verifier handles faults for bad accesses
using PROBE_MEM conversion for PTR_TO_BTF_ID loads in convert_ctx_accesses
(which is then later detected by the JIT to build the exception table used by
the exception handler).

When reading an unreferenced pointer, in some cases you know that the pointer
will stay valid, so you can just store it in the map, load it, and access it
directly; this imposes very few restrictions.

For the referenced case, with BPF_LDX marking it as PTR_UNTRUSTED, you could say
that this makes it a lot less useful, because even if the BPF program already
holds a reference, just to make sure you _read valid data_, you still have to
use the kptr_get style helper to raise and put a reference to ensure the object
is alive when it is accessed.

In that case, for RCU protected objects, the actual release should still wait
for the BPF program to hit BPF_EXIT. For other cases, such as sleepable
programs or objects where the refcount alone manages lifetime, you can also
detect the presence of a writer in another BPF program (i.e. whether the
pointer was xchg'd out during our access) using a seqlock style scheme:

	v = bpf_map_lookup_elem(&map, ...);
	if (!v)
		return 0;
	seq_begin = v->seq;
	atomic_thread_fence(memory_order_acquire); // A
	<do access>
	atomic_thread_fence(memory_order_acquire); // B
	seq_end = v->seq;
	if (seq_begin & 1 || seq_begin != seq_end)
		goto bad_read;
	<use data>

Of course, barriers are not yet in BPF, but you get the idea (it should work on
x86). The updater BPF program will increment v->seq before and after xchg,
ensuring proper ordering. v->seq starts as 0, so an odd seq indicates that a
writer update is in progress.

This would allow you to avoid raising the refcount, while still ensuring that
whenever the object was accessed, it was still valid between A and B. Even if
raising an uncontended refcount is cheap, this is much cheaper.
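
For illustration, both sides of the seqlock style scheme can be modelled in
userspace with C11 atomics. This is a sketch under assumptions, not BPF code:
struct obj, struct map_value, value_update and value_read are hypothetical
names, and the model assumes the object's memory stays mapped while it is read
(in BPF the PROBE_MEM load would tolerate a fault instead).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object reachable through the map value. */
struct obj {
	int data;
};

/* Hypothetical map value: sequence counter plus the pointer slot. */
struct map_value {
	atomic_uint seq;		/* starts at 0; odd: update in progress */
	_Atomic(struct obj *) ptr;
};

/* Writer: make seq odd, swap the pointer, then make seq even again. */
static struct obj *value_update(struct map_value *v, struct obj *new)
{
	struct obj *old;

	assert(v != NULL);
	atomic_fetch_add(&v->seq, 1);
	old = atomic_exchange(&v->ptr, new);
	atomic_fetch_add(&v->seq, 1);
	return old;
}

/* Reader: succeeds only if no update overlapped the access. */
static bool value_read(struct map_value *v, int *out)
{
	unsigned int seq_begin, seq_end;
	struct obj *o;
	int data;

	assert(v != NULL && out != NULL);
	seq_begin = atomic_load_explicit(&v->seq, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* A */
	o = atomic_load_explicit(&v->ptr, memory_order_relaxed);
	if (!o)
		return false;
	data = o->data;					/* the access */
	atomic_thread_fence(memory_order_acquire);	/* B */
	seq_end = atomic_load_explicit(&v->seq, memory_order_relaxed);
	if ((seq_begin & 1) || seq_begin != seq_end)
		return false;				/* bad read */
	*out = data;
	return true;
}
```

The writer here uses sequentially consistent operations for simplicity; a real
implementation could likely use weaker orderings, but that is outside what
this sketch tries to show.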

The case of the userspace pointer is different: it sets the MEM_USER flag, so
the only useful thing to do is to call bpf_probe_read_user; you can't even
dereference it. You are right that in most cases the userspace pointer won't be
useful, but for some cooperative cases between a BPF program and a userspace
thread, it can act as a way to share certain thread local areas/userspace
memory that the BPF program can then store keyed by the task_struct *, where
using a BPF map to share memory is not always possible.

> Thanks,
> Song
>
> [...]

--
Kartikeya
Song Liu Feb. 23, 2022, 7:29 a.m. UTC | #3
On Tue, Feb 22, 2022 at 12:21 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
[...]



Thanks for the explanation! I can see the referenced kernel pointer being very
powerful in many use cases. The per-CPU pointer is also interesting.

Song