mbox series

[v1,bpf-next,0/9] Shared ownership for local kptrs

Message ID 20230410190753.2012798-1-davemarchevsky@fb.com (mailing list archive)
Headers show
Series Shared ownership for local kptrs | expand

Message

Dave Marchevsky April 10, 2023, 7:07 p.m. UTC
This series adds support for refcounted local kptrs to the verifier. A local
kptr is 'refcounted' if its type contains a struct bpf_refcount field:

  struct refcounted_node {
    long data;
    struct bpf_list_node ll;
    struct bpf_refcount ref;
  };

bpf_refcount is used to implement shared ownership for local kptrs.

Motivating usecase
==================

If a struct has two collection node fields, e.g.:

  struct node {
    long key;
    long val;
    struct bpf_rb_node rb;
    struct bpf_list_node ll;
  };

It's not currently possible to add a node to both the list and rbtree:

  long bpf_prog(void *ctx)
  {
    struct node *n = bpf_obj_new(typeof(*n));
    if (!n) { /* ... */ }

    bpf_spin_lock(&lock);

    bpf_list_push_back(&head, &n->ll);
    bpf_rbtree_add(&root, &n->rb, less); /* Assume a resonable less() */
    bpf_spin_unlock(&lock);
  }

The above program will fail verification due to current owning / non-owning ref
logic: after bpf_list_push_back, n is a non-owning reference and thus cannot be
passed to bpf_rbtree_add. The only way to get an owning reference for the node
that was added is to bpf_list_pop_{front,back} it.

More generally, verifier ownership semantics expect that a node has one
owner (program, collection, or stashed in map) with exclusive ownership
of the node's lifetime. The owner free's the node's underlying memory when it
itself goes away.

Without a shared ownership concept it's impossible to express many real-world
usecases such that they pass verification.

Semantic Changes
================

Before this series, the verifier could make this statement: "whoever has the
owning reference has exclusive ownership of the referent's lifetime". As
demonstrated in the previous section, this implies that a BPF program can't
have an owning reference to some node if that node is in a collection. If
such a state were possible, the node would have multiple owners, each thinking
they have exclusive ownership. In order to support shared ownership it's
necessary to modify the exclusive ownership semantic.

After this series' changes, an owning reference has ownership of the referent's
lifetime, but it's not necessarily exclusive. The referent's underlying memory
is guaranteed to be valid (i.e. not free'd) until the reference is dropped or
used for collection insert.

This change doesn't affect UX of owning or non-owning references much:

  * insert kfuncs (bpf_rbtree_add, bpf_list_push_{front,back}) still require
    an owning reference arg, as ownership still must be passed to the
    collection in a shared-ownership world.

  * non-owning references still refer to valid memory without claiming
    any ownership.

One important conclusion that followed from "exclusive ownership" statement
is no longer valid, though. In exclusive-ownership world, if a BPF prog has
an owning reference to a node, the verifier can conclude that no collection has
ownership of it. This conclusion was used to avoid runtime checking in the
implementations of insert and remove operations (""has the node already been
{inserted, removed}?").

In a shared-ownership world the aforementioned conclusion is no longer valid,
which necessitates doing runtime checking in insert and remove operation
kfuncs, and those functions possibly failing to insert or remove anything.

Luckily the verifier changes necessary to go from exclusive to shared ownership
were fairly minimal. Patches in this series which do change verifier semantics
generally have some summary dedicated to explaining why certain usecases
Just Work for shared ownership without verifier changes.

Implementation
==============

The changes in this series can be categorized as follows:

  * struct bpf_refcount opaque field + plumbing
  * support for refcounted kptrs in bpf_obj_new and bpf_obj_drop
  * bpf_refcount_acquire kfunc
    * enables shared ownershp by bumping refcount + acquiring owning ref
  * support for possibly-failing collection insertion and removal
    * insertion changes are more complex

If a patch's changes have some nuance to their effect - or lack of effect - on
verifier behavior, the patch summary talks about it at length.

Patch contents:
  * Patch 1 removes btf_field_offs struct
  * Patch 2 adds struct bpf_refcount and associated plumbing
  * Patch 3 modifies semantics of bpf_obj_drop and bpf_obj_new to handle
    refcounted kptrs
  * Patch 4 adds bpf_refcount_acquire
  * Patches 5-7 add support for possibly-failing collection insert and remove
  * Patch 8 centralizes constructor-like functionality for local kptr types
  * Patch 9 adds tests for new functionality

base-commit: c4d3b488a90be95f4f9413dc7eae5fc113d15fe9

Dave Marchevsky (9):
  bpf: Remove btf_field_offs, use btf_record's fields instead
  bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing
  bpf: Support refcounted local kptrs in existing semantics
  bpf: Add bpf_refcount_acquire kfunc
  bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly
    fail
  selftests/bpf: Modify linked_list tests to work with macro-ified
    inserts
  bpf: Migrate bpf_rbtree_remove to possibly fail
  bpf: Centralize btf_field-specific initialization logic
  selftests/bpf: Add refcounted_kptr tests

 include/linux/bpf.h                           |  85 ++--
 include/linux/bpf_verifier.h                  |   7 +-
 include/linux/btf.h                           |   2 -
 include/uapi/linux/bpf.h                      |   4 +
 kernel/bpf/btf.c                              | 126 ++----
 kernel/bpf/helpers.c                          | 116 +++--
 kernel/bpf/map_in_map.c                       |  15 -
 kernel/bpf/syscall.c                          |  23 +-
 kernel/bpf/verifier.c                         | 153 +++++--
 tools/include/uapi/linux/bpf.h                |   4 +
 .../testing/selftests/bpf/bpf_experimental.h  |  60 ++-
 .../selftests/bpf/prog_tests/linked_list.c    |   8 +-
 .../testing/selftests/bpf/prog_tests/rbtree.c |  25 ++
 .../bpf/prog_tests/refcounted_kptr.c          |  18 +
 .../testing/selftests/bpf/progs/linked_list.c |  34 +-
 .../testing/selftests/bpf/progs/linked_list.h |   4 +-
 .../selftests/bpf/progs/linked_list_fail.c    |  96 ++--
 tools/testing/selftests/bpf/progs/rbtree.c    |  74 +++-
 .../testing/selftests/bpf/progs/rbtree_fail.c |  77 ++--
 .../selftests/bpf/progs/refcounted_kptr.c     | 410 ++++++++++++++++++
 .../bpf/progs/refcounted_kptr_fail.c          |  72 +++
 21 files changed, 1066 insertions(+), 347 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c