diff mbox series

[bpf-next,v5,06/25] bpf: Introduce local kptrs

Message ID 20221107230950.7117-7-memxor@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Local kptrs, BPF linked lists | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR pending PR summary
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count fail Series longer than 15 patches (and no cover letter)
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1362 this patch: 1362
netdev/cc_maintainers warning 20 maintainers not CCed: sdf@google.com pablo@netfilter.org coreteam@netfilter.org pabeni@redhat.com yhs@fb.com jolsa@kernel.org martin.lau@linux.dev netfilter-devel@vger.kernel.org kadlec@netfilter.org davem@davemloft.net yoshfuji@linux-ipv6.org netdev@vger.kernel.org fw@strlen.de song@kernel.org kpsingh@kernel.org haoluo@google.com edumazet@google.com kuba@kernel.org dsahern@kernel.org john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 157 this patch: 157
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1352 this patch: 1352
netdev/checkpatch warning WARNING: 'paramters' may be misspelled - perhaps 'parameters'? WARNING: line length of 100 exceeds 80 columns WARNING: line length of 110 exceeds 80 columns WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 92 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 96 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-1 pending Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-9 success Logs for set-matrix

Commit Message

Kumar Kartikeya Dwivedi Nov. 7, 2022, 11:09 p.m. UTC
Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
program BTF. This is indicated by the presence of MEM_ALLOC type flag in
reg->type to avoid having to check btf_is_kernel when trying to match
argument types in helpers.

Refactor btf_struct_access callback to just take bpf_reg_state instead
of btf and btf_type paramters. Note that the call site in
check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
dummy reg on stack. Since only the type, btf, and btf_id of the register
matter for the checks, it can be done so without complicating the usual
cases elsewhere in the verifier where reg->btf and reg->btf_id is used
verbatim.

Whenever walking such types, any pointers being walked will always yield
a SCALAR instead of pointer. In the future we might permit kptr inside
local kptr (either kernel or local), and it would be permitted only in
that case.

For now, these local kptrs will always be referenced in verifier
context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
to such objects, as long fields that are special are not touched
(support for which will be added in subsequent patches). Note that once
such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
write to it.

No PROBE_MEM handling is therefore done for loads into this type unless
PTR_UNTRUSTED is part of the register type, since they can never be in
an undefined state, and their lifetime will always be valid.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h              | 28 ++++++++++++++++--------
 include/linux/filter.h           |  8 +++----
 kernel/bpf/btf.c                 | 16 ++++++++++----
 kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
 net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
 net/core/filter.c                | 34 ++++++++++++-----------------
 net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
 net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
 8 files changed, 99 insertions(+), 68 deletions(-)

Comments

Andrii Nakryiko Nov. 8, 2022, 11:29 p.m. UTC | #1
On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
> program BTF. This is indicated by the presence of MEM_ALLOC type flag in
> reg->type to avoid having to check btf_is_kernel when trying to match
> argument types in helpers.
>
> Refactor btf_struct_access callback to just take bpf_reg_state instead
> of btf and btf_type paramters. Note that the call site in
> check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
> dummy reg on stack. Since only the type, btf, and btf_id of the register
> matter for the checks, it can be done so without complicating the usual
> cases elsewhere in the verifier where reg->btf and reg->btf_id is used
> verbatim.
>
> Whenever walking such types, any pointers being walked will always yield
> a SCALAR instead of pointer. In the future we might permit kptr inside
> local kptr (either kernel or local), and it would be permitted only in
> that case.
>
> For now, these local kptrs will always be referenced in verifier
> context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> to such objects, as long fields that are special are not touched
> (support for which will be added in subsequent patches). Note that once
> such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
> write to it.
>
> No PROBE_MEM handling is therefore done for loads into this type unless
> PTR_UNTRUSTED is part of the register type, since they can never be in
> an undefined state, and their lifetime will always be valid.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h              | 28 ++++++++++++++++--------
>  include/linux/filter.h           |  8 +++----
>  kernel/bpf/btf.c                 | 16 ++++++++++----
>  kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
>  net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
>  net/core/filter.c                | 34 ++++++++++++-----------------
>  net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
>  net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
>  8 files changed, 99 insertions(+), 68 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index afc1c51b59ff..75dbd2ecf80a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -524,6 +524,11 @@ enum bpf_type_flag {
>         /* Size is known at compile time. */
>         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
>
> +       /* MEM is of a type from program BTF, not kernel BTF. This is used to
> +        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> +        */
> +       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
> +

you fixed one naming confusion with RINGBUF and basically are creating
a new one, where "ALLOC" means "local kptr"... If we are stuck with
"local kptr" (which I find very confusing as well, but that's beside
the point), why not stick to the whole "local" terminology here?
MEM_LOCAL?

>         __BPF_TYPE_FLAG_MAX,
>         __BPF_TYPE_LAST_FLAG    = __BPF_TYPE_FLAG_MAX - 1,
>  };
> @@ -771,6 +776,7 @@ struct bpf_prog_ops {
>                         union bpf_attr __user *uattr);
>  };
>

[...]

> -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> -                     const struct btf_type *t, int off, int size,
> -                     enum bpf_access_type atype __maybe_unused,
> +int btf_struct_access(struct bpf_verifier_log *log,
> +                     const struct bpf_reg_state *reg,
> +                     int off, int size, enum bpf_access_type atype __maybe_unused,
>                       u32 *next_btf_id, enum bpf_type_flag *flag)
>  {
> +       const struct btf *btf = reg->btf;
>         enum bpf_type_flag tmp_flag = 0;
> +       const struct btf_type *t;
> +       u32 id = reg->btf_id;
>         int err;
> -       u32 id;
>
> +       t = btf_type_by_id(btf, id);
>         do {
>                 err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
>
>                 switch (err) {
>                 case WALK_PTR:
> +                       /* For local types, the destination register cannot
> +                        * become a pointer again.
> +                        */
> +                       if (type_is_local_kptr(reg->type))
> +                               return SCALAR_VALUE;

passing the entire bpf_reg_state just to differentiate between local
vs kernel pointer seems like a huge overkill. bpf_reg_state is quite a
complicated and extensive amount of state, and it seems cleaner to
just pass it as a flag whether to allow pointer chasing or not. At
least then we know we only care about that specific aspect, not about
dozens of other possible fields of bpf_reg_state.


>                         /* If we found the pointer or scalar on t+off,
>                          * we're done.
>                          */

[...]
Kumar Kartikeya Dwivedi Nov. 9, 2022, midnight UTC | #2
On Wed, Nov 09, 2022 at 04:59:41AM IST, Andrii Nakryiko wrote:
> On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
> > program BTF. This is indicated by the presence of MEM_ALLOC type flag in
> > reg->type to avoid having to check btf_is_kernel when trying to match
> > argument types in helpers.
> >
> > Refactor btf_struct_access callback to just take bpf_reg_state instead
> > of btf and btf_type paramters. Note that the call site in
> > check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
> > dummy reg on stack. Since only the type, btf, and btf_id of the register
> > matter for the checks, it can be done so without complicating the usual
> > cases elsewhere in the verifier where reg->btf and reg->btf_id is used
> > verbatim.
> >
> > Whenever walking such types, any pointers being walked will always yield
> > a SCALAR instead of pointer. In the future we might permit kptr inside
> > local kptr (either kernel or local), and it would be permitted only in
> > that case.
> >
> > For now, these local kptrs will always be referenced in verifier
> > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > to such objects, as long fields that are special are not touched
> > (support for which will be added in subsequent patches). Note that once
> > such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
> > write to it.
> >
> > No PROBE_MEM handling is therefore done for loads into this type unless
> > PTR_UNTRUSTED is part of the register type, since they can never be in
> > an undefined state, and their lifetime will always be valid.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h              | 28 ++++++++++++++++--------
> >  include/linux/filter.h           |  8 +++----
> >  kernel/bpf/btf.c                 | 16 ++++++++++----
> >  kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
> >  net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
> >  net/core/filter.c                | 34 ++++++++++++-----------------
> >  net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
> >  net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
> >  8 files changed, 99 insertions(+), 68 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index afc1c51b59ff..75dbd2ecf80a 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -524,6 +524,11 @@ enum bpf_type_flag {
> >         /* Size is known at compile time. */
> >         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
> >
> > +       /* MEM is of a type from program BTF, not kernel BTF. This is used to
> > +        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> > +        */
> > +       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
> > +
>
> you fixed one naming confusion with RINGBUF and basically are creating
> a new one, where "ALLOC" means "local kptr"... If we are stuck with
> "local kptr" (which I find very confusing as well, but that's beside
> the point), why not stick to the whole "local" terminology here?
> MEM_LOCAL?
>

See the discussion about this in v4:
https://lore.kernel.org/bpf/20221104075113.5ighwdvero4mugu7@apollo

It was MEM_TYPE_LOCAL before. Also, better naming suggestions are always
welcome, I asked the same in that message as well.

> >         __BPF_TYPE_FLAG_MAX,
> >         __BPF_TYPE_LAST_FLAG    = __BPF_TYPE_FLAG_MAX - 1,
> >  };
> > @@ -771,6 +776,7 @@ struct bpf_prog_ops {
> >                         union bpf_attr __user *uattr);
> >  };
> >
>
> [...]
>
> > -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> > -                     const struct btf_type *t, int off, int size,
> > -                     enum bpf_access_type atype __maybe_unused,
> > +int btf_struct_access(struct bpf_verifier_log *log,
> > +                     const struct bpf_reg_state *reg,
> > +                     int off, int size, enum bpf_access_type atype __maybe_unused,
> >                       u32 *next_btf_id, enum bpf_type_flag *flag)
> >  {
> > +       const struct btf *btf = reg->btf;
> >         enum bpf_type_flag tmp_flag = 0;
> > +       const struct btf_type *t;
> > +       u32 id = reg->btf_id;
> >         int err;
> > -       u32 id;
> >
> > +       t = btf_type_by_id(btf, id);
> >         do {
> >                 err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
> >
> >                 switch (err) {
> >                 case WALK_PTR:
> > +                       /* For local types, the destination register cannot
> > +                        * become a pointer again.
> > +                        */
> > +                       if (type_is_local_kptr(reg->type))
> > +                               return SCALAR_VALUE;
>
> passing the entire bpf_reg_state just to differentiate between local
> vs kernel pointer seems like a huge overkill. bpf_reg_state is quite a
> complicated and extensive amount of state, and it seems cleaner to
> just pass it as a flag whether to allow pointer chasing or not. At
> least then we know we only care about that specific aspect, not about
> dozens of other possible fields of bpf_reg_state.
>

I agree that the separation is usually better, especially because this is also a
callback. I don't feel too strong about this though, we certainly do pass the
whole reg to functions which only work on a specific type of pointer. Though the
concern in this case is justified as it's not only an internal function but also
a callback.

It was just a bool in the RFC.
But in https://lore.kernel.org/bpf/20220907233023.x3uclwlnjuhftvtb@macbook-pro-4.dhcp.thefacebook.com
Alexei suggested passing reg instead.
From the link:
> imo it's cleaner to pass 'reg' instead of 'reg->btf',
> so we don't have to pass another boolean.
> And check type_is_local(reg) inside btf_struct_access().
Andrii Nakryiko Nov. 9, 2022, 12:36 a.m. UTC | #3
On Tue, Nov 8, 2022 at 4:00 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, Nov 09, 2022 at 04:59:41AM IST, Andrii Nakryiko wrote:
> > On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
> > > program BTF. This is indicated by the presence of MEM_ALLOC type flag in
> > > reg->type to avoid having to check btf_is_kernel when trying to match
> > > argument types in helpers.
> > >
> > > Refactor btf_struct_access callback to just take bpf_reg_state instead
> > > of btf and btf_type paramters. Note that the call site in
> > > check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
> > > dummy reg on stack. Since only the type, btf, and btf_id of the register
> > > matter for the checks, it can be done so without complicating the usual
> > > cases elsewhere in the verifier where reg->btf and reg->btf_id is used
> > > verbatim.
> > >
> > > Whenever walking such types, any pointers being walked will always yield
> > > a SCALAR instead of pointer. In the future we might permit kptr inside
> > > local kptr (either kernel or local), and it would be permitted only in
> > > that case.
> > >
> > > For now, these local kptrs will always be referenced in verifier
> > > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > > to such objects, as long fields that are special are not touched
> > > (support for which will be added in subsequent patches). Note that once
> > > such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
> > > write to it.
> > >
> > > No PROBE_MEM handling is therefore done for loads into this type unless
> > > PTR_UNTRUSTED is part of the register type, since they can never be in
> > > an undefined state, and their lifetime will always be valid.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  include/linux/bpf.h              | 28 ++++++++++++++++--------
> > >  include/linux/filter.h           |  8 +++----
> > >  kernel/bpf/btf.c                 | 16 ++++++++++----
> > >  kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
> > >  net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
> > >  net/core/filter.c                | 34 ++++++++++++-----------------
> > >  net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
> > >  net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
> > >  8 files changed, 99 insertions(+), 68 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index afc1c51b59ff..75dbd2ecf80a 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -524,6 +524,11 @@ enum bpf_type_flag {
> > >         /* Size is known at compile time. */
> > >         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
> > >
> > > +       /* MEM is of a type from program BTF, not kernel BTF. This is used to
> > > +        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> > > +        */
> > > +       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
> > > +
> >
> > you fixed one naming confusion with RINGBUF and basically are creating
> > a new one, where "ALLOC" means "local kptr"... If we are stuck with
> > "local kptr" (which I find very confusing as well, but that's beside
> > the point), why not stick to the whole "local" terminology here?
> > MEM_LOCAL?
> >
>
> See the discussion about this in v4:
> https://lore.kernel.org/bpf/20221104075113.5ighwdvero4mugu7@apollo
>
> It was MEM_TYPE_LOCAL before. Also, better naming suggestions are always
> welcome, I asked the same in that message as well.

Sorry, I haven't followed <v5. Don't have perfect name, but logically
this is BPF program memory. So "prog_kptr" would be something to
convert this, but I'm not super happy with such a name. "user_kptr"
would be too confusing, drawing incorrect "kernel space vs user space"
comparison, while both are kernel memory. It's BPF-side kptr, so
"bpf_kptr", but also not great.

So that's why didn't suggest anything specific, but at least as far as
MEM_xxx flag goes, MEM_LOCAL_KPTR is better than MEM_ALLOC, IMO. It's
at least consistent with the official name of the concept it
represents.

>
> > >         __BPF_TYPE_FLAG_MAX,
> > >         __BPF_TYPE_LAST_FLAG    = __BPF_TYPE_FLAG_MAX - 1,
> > >  };
> > > @@ -771,6 +776,7 @@ struct bpf_prog_ops {
> > >                         union bpf_attr __user *uattr);
> > >  };
> > >
> >
> > [...]
> >
> > > -int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
> > > -                     const struct btf_type *t, int off, int size,
> > > -                     enum bpf_access_type atype __maybe_unused,
> > > +int btf_struct_access(struct bpf_verifier_log *log,
> > > +                     const struct bpf_reg_state *reg,
> > > +                     int off, int size, enum bpf_access_type atype __maybe_unused,
> > >                       u32 *next_btf_id, enum bpf_type_flag *flag)
> > >  {
> > > +       const struct btf *btf = reg->btf;
> > >         enum bpf_type_flag tmp_flag = 0;
> > > +       const struct btf_type *t;
> > > +       u32 id = reg->btf_id;
> > >         int err;
> > > -       u32 id;
> > >
> > > +       t = btf_type_by_id(btf, id);
> > >         do {
> > >                 err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
> > >
> > >                 switch (err) {
> > >                 case WALK_PTR:
> > > +                       /* For local types, the destination register cannot
> > > +                        * become a pointer again.
> > > +                        */
> > > +                       if (type_is_local_kptr(reg->type))
> > > +                               return SCALAR_VALUE;
> >
> > passing the entire bpf_reg_state just to differentiate between local
> > vs kernel pointer seems like a huge overkill. bpf_reg_state is quite a
> > complicated and extensive amount of state, and it seems cleaner to
> > just pass it as a flag whether to allow pointer chasing or not. At
> > least then we know we only care about that specific aspect, not about
> > dozens of other possible fields of bpf_reg_state.
> >
>
> I agree that the separation is usually better, especially because this is also a
> callback. I don't feel too strong about this though, we certainly do pass the
> whole reg to functions which only work on a specific type of pointer. Though the

Yeah, and then it takes a lot of grepping and jumping around the code
to verify that only one simple field out of the entire bpf_reg_state
is actually used. I'd say this is actually bad that we do pass it
around so willy-nilly. Verifier code is already a hot complex mess,
let's not actively make it harder than necessary.

> concern in this case is justified as it's not only an internal function but also
> a callback.
>
> It was just a bool in the RFC.
> But in https://lore.kernel.org/bpf/20220907233023.x3uclwlnjuhftvtb@macbook-pro-4.dhcp.thefacebook.com
> Alexei suggested passing reg instead.
> From the link:
> > imo it's cleaner to pass 'reg' instead of 'reg->btf',
> > so we don't have to pass another boolean.
> > And check type_is_local(reg) inside btf_struct_access().

I sympathize with "too many input args" (especially if it's a bunch of
bools) argument, but see above, I find it increasingly harder to know
what parts of complex internal register state is used by helper
functions and which are not.

And the fact that we have to construct a fake register state in some
case is a red flag to me. Pass enum bpf_reg_type type to avoid passing
true/false. Or let's invent a new enum. Or extend enum bpf_access_type
to have READ_NOPTR/WRITE_NOPTR or something like that. Don't know.

This isn't a major issue, I can live with this just fine, but this
definitely doesn't feel like a clean approach.
Alexei Starovoitov Nov. 9, 2022, 1:32 a.m. UTC | #4
On Tue, Nov 8, 2022 at 4:36 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Nov 8, 2022 at 4:00 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Wed, Nov 09, 2022 at 04:59:41AM IST, Andrii Nakryiko wrote:
> > > On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > >
> > > > Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
> > > > program BTF. This is indicated by the presence of MEM_ALLOC type flag in
> > > > reg->type to avoid having to check btf_is_kernel when trying to match
> > > > argument types in helpers.
> > > >
> > > > Refactor btf_struct_access callback to just take bpf_reg_state instead
> > > > of btf and btf_type paramters. Note that the call site in
> > > > check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
> > > > dummy reg on stack. Since only the type, btf, and btf_id of the register
> > > > matter for the checks, it can be done so without complicating the usual
> > > > cases elsewhere in the verifier where reg->btf and reg->btf_id is used
> > > > verbatim.
> > > >
> > > > Whenever walking such types, any pointers being walked will always yield
> > > > a SCALAR instead of pointer. In the future we might permit kptr inside
> > > > local kptr (either kernel or local), and it would be permitted only in
> > > > that case.
> > > >
> > > > For now, these local kptrs will always be referenced in verifier
> > > > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > > > to such objects, as long fields that are special are not touched
> > > > (support for which will be added in subsequent patches). Note that once
> > > > such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
> > > > write to it.
> > > >
> > > > No PROBE_MEM handling is therefore done for loads into this type unless
> > > > PTR_UNTRUSTED is part of the register type, since they can never be in
> > > > an undefined state, and their lifetime will always be valid.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > >  include/linux/bpf.h              | 28 ++++++++++++++++--------
> > > >  include/linux/filter.h           |  8 +++----
> > > >  kernel/bpf/btf.c                 | 16 ++++++++++----
> > > >  kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
> > > >  net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
> > > >  net/core/filter.c                | 34 ++++++++++++-----------------
> > > >  net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
> > > >  net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
> > > >  8 files changed, 99 insertions(+), 68 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index afc1c51b59ff..75dbd2ecf80a 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -524,6 +524,11 @@ enum bpf_type_flag {
> > > >         /* Size is known at compile time. */
> > > >         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
> > > >
> > > > +       /* MEM is of a type from program BTF, not kernel BTF. This is used to
> > > > +        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> > > > +        */
> > > > +       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
> > > > +
> > >
> > > you fixed one naming confusion with RINGBUF and basically are creating
> > > a new one, where "ALLOC" means "local kptr"... If we are stuck with
> > > "local kptr" (which I find very confusing as well, but that's beside
> > > the point), why not stick to the whole "local" terminology here?
> > > MEM_LOCAL?
> > >
> >
> > See the discussion about this in v4:
> > https://lore.kernel.org/bpf/20221104075113.5ighwdvero4mugu7@apollo
> >
> > It was MEM_TYPE_LOCAL before. Also, better naming suggestions are always
> > welcome, I asked the same in that message as well.
>
> Sorry, I haven't followed <v5. Don't have perfect name, but logically
> this is BPF program memory. So "prog_kptr" would be something to
> convert this, but I'm not super happy with such a name. "user_kptr"
> would be too confusing, drawing incorrect "kernel space vs user space"
> comparison, while both are kernel memory. It's BPF-side kptr, so
> "bpf_kptr", but also not great.

yep. I went through the same thinking process.

> So that's why didn't suggest anything specific, but at least as far as
> MEM_xxx flag goes, MEM_LOCAL_KPTR is better than MEM_ALLOC, IMO. It's
> at least consistent with the official name of the concept it
> represents.

"local kptr" doesn't fit here.
In libbpf, "local" is equally badly named.
If "local" was a good name we wouldn't have had this discussion.
So we need to fix it libbpf, but we should start with a proper
name in the kernel.
And "local kptr" is not it.

We must avoid exhausting bikeshedding too.
MEM_ALLOC is something we can use right now and
as long as "local kptr" doesn't appear in docs, comments and
commit logs we're good.
We can rename MEM_ALLOC to something else later.

In commit logs we can just say that this is
a pointer to an object allocated by the bpf program.
It's crystal clear definition whereas "local kptr" is nonsensical.

Going back to the kptr definition.
kptr was supposed to mean a pointer to a kernel object.
In that light "pointer to an object allocated by the bpf prog"
is something else.
Maybe "bptr" ?
In some ways bpf is a layer different from kernel space and user space.
Some people joked that there is ring-0 for kernel, ring-3 for user space
while bpf runs in ring-B.
Two new btf_tags __bptr and __bptr_ref (or may be just one?)
might be necessary as well to make it easier to distinguish
kernel and bpf prog allocated objects.

> >
> > It was just a bool in the RFC.
> > But in https://lore.kernel.org/bpf/20220907233023.x3uclwlnjuhftvtb@macbook-pro-4.dhcp.thefacebook.com
> > Alexei suggested passing reg instead.
> > From the link:
> > > imo it's cleaner to pass 'reg' instead of 'reg->btf',
> > > so we don't have to pass another boolean.
> > > And check type_is_local(reg) inside btf_struct_access().
>
> I sympathize with "too many input args" (especially if it's a bunch of
> bools) argument, but see above, I find it increasingly harder to know
> what parts of complex internal register state is used by helper
> functions and which are not.
>
> And the fact that we have to construct a fake register state in some
> case is a red flag to me. Pass enum bpf_reg_type type to avoid passing
> true/false. Or let's invent a new enum. Or extend enum bpf_access_type
> to have READ_NOPTR/WRITE_NOPTR or something like that. Don't know.
>
> This isn't a major issue, I can live with this just fine, but this
> definitely doesn't feel like a clean approach.

I think passing bpf_reg_state is a lesser evil _right_now_ and
I prefer we proceed this way, since other works are blocked on
this patch set.
We did plenty of refactoring of the verifier in the past.
There will be more in the future.
Kumar Kartikeya Dwivedi Nov. 9, 2022, 5 p.m. UTC | #5
On Wed, Nov 09, 2022 at 07:02:11AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 8, 2022 at 4:36 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Nov 8, 2022 at 4:00 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > On Wed, Nov 09, 2022 at 04:59:41AM IST, Andrii Nakryiko wrote:
> > > > On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > > >
> > > > > Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
> > > > > program BTF. This is indicated by the presence of MEM_ALLOC type flag in
> > > > > reg->type to avoid having to check btf_is_kernel when trying to match
> > > > > argument types in helpers.
> > > > >
> > > > > Refactor btf_struct_access callback to just take bpf_reg_state instead
> > > > > of btf and btf_type paramters. Note that the call site in
> > > > > check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
> > > > > dummy reg on stack. Since only the type, btf, and btf_id of the register
> > > > > matter for the checks, it can be done so without complicating the usual
> > > > > cases elsewhere in the verifier where reg->btf and reg->btf_id is used
> > > > > verbatim.
> > > > >
> > > > > Whenever walking such types, any pointers being walked will always yield
> > > > > a SCALAR instead of pointer. In the future we might permit kptr inside
> > > > > local kptr (either kernel or local), and it would be permitted only in
> > > > > that case.
> > > > >
> > > > > For now, these local kptrs will always be referenced in verifier
> > > > > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > > > > to such objects, as long fields that are special are not touched
> > > > > (support for which will be added in subsequent patches). Note that once
> > > > > such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
> > > > > write to it.
> > > > >
> > > > > No PROBE_MEM handling is therefore done for loads into this type unless
> > > > > PTR_UNTRUSTED is part of the register type, since they can never be in
> > > > > an undefined state, and their lifetime will always be valid.
> > > > >
> > > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > > ---
> > > > >  include/linux/bpf.h              | 28 ++++++++++++++++--------
> > > > >  include/linux/filter.h           |  8 +++----
> > > > >  kernel/bpf/btf.c                 | 16 ++++++++++----
> > > > >  kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
> > > > >  net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
> > > > >  net/core/filter.c                | 34 ++++++++++++-----------------
> > > > >  net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
> > > > >  net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
> > > > >  8 files changed, 99 insertions(+), 68 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index afc1c51b59ff..75dbd2ecf80a 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -524,6 +524,11 @@ enum bpf_type_flag {
> > > > >         /* Size is known at compile time. */
> > > > >         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
> > > > >
> > > > > +       /* MEM is of a type from program BTF, not kernel BTF. This is used to
> > > > > +        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> > > > > +        */
> > > > > +       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
> > > > > +
> > > >
> > > > you fixed one naming confusion with RINGBUF and basically are creating
> > > > a new one, where "ALLOC" means "local kptr"... If we are stuck with
> > > > "local kptr" (which I find very confusing as well, but that's beside
> > > > the point), why not stick to the whole "local" terminology here?
> > > > MEM_LOCAL?
> > > >
> > >
> > > See the discussion about this in v4:
> > > https://lore.kernel.org/bpf/20221104075113.5ighwdvero4mugu7@apollo
> > >
> > > It was MEM_TYPE_LOCAL before. Also, better naming suggestions are always
> > > welcome, I asked the same in that message as well.
> >
> > Sorry, I haven't followed <v5. Don't have perfect name, but logically
> > this is BPF program memory. So "prog_kptr" would be something to
> > convert this, but I'm not super happy with such a name. "user_kptr"
> > would be too confusing, drawing incorrect "kernel space vs user space"
> > comparison, while both are kernel memory. It's BPF-side kptr, so
> > "bpf_kptr", but also not great.
>
> yep. I went through the same thinking process.
>
> > So that's why didn't suggest anything specific, but at least as far as
> > MEM_xxx flag goes, MEM_LOCAL_KPTR is better than MEM_ALLOC, IMO. It's
> > at least consistent with the official name of the concept it
> > represents.
>
> "local kptr" doesn't fit here.
> In libbpf, "local" is equally badly named.
> If "local" was a good name we wouldn't have had this discussion.
> So we need to fix it libbpf, but we should start with a proper
> name in the kernel.
> And "local kptr" is not it.
>
> We must avoid exhausting bikeshedding too.
> MEM_ALLOC is something we can use right now and
> as long as "local kptr" doesn't appear in docs, comments and
> commit logs we're good.
> We can rename MEM_ALLOC to something else later.
>
> In commit logs we can just say that this is
> a pointer to an object allocated by the bpf program.
> It's crystal clear definition whereas "local kptr" is nonsensical.
>

Ok, I'll drop the naming everywhere.

> Going back to the kptr definition.
> kptr was supposed to mean a pointer to a kernel object.
> In that light "pointer to an object allocated by the bpf prog"
> is something else.
> Maybe "bptr" ?
> In some ways bpf is a layer different from kernel space and user space.
> Some people joked that there is ring-0 for kernel, ring-3 for user space
> while bpf runs in ring-B.
> Two new btf_tags __bptr and __bptr_ref (or may be just one?)
> might be necessary as well to make it easier to distinguish
> kernel and bpf prog allocated objects.
>

There's also the option of simply using __kptr and __kptr_ref for these (without
__local tag in BPF maps) and doing two stage name lookup for the types. Kernel
BTF takes precedence, if not found there, then it searches program BTF for a
local type. It would probably the simplest for users.

struct map_value {
	struct nf_conn __kptr_ref *ct; // kernel
	struct foo __kptr_ref *f; // local
	struct task_struct __kptr_ref *t; // kernel
	struct bar __kptr_ref *b; // local
}

We can revisit this again once the post the follow up to store them in maps.
Andrii Nakryiko Nov. 9, 2022, 11:21 p.m. UTC | #6
On Tue, Nov 8, 2022 at 5:32 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Nov 8, 2022 at 4:36 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Nov 8, 2022 at 4:00 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > On Wed, Nov 09, 2022 at 04:59:41AM IST, Andrii Nakryiko wrote:
> > > > On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > > >
> > > > > Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
> > > > > program BTF. This is indicated by the presence of MEM_ALLOC type flag in
> > > > > reg->type to avoid having to check btf_is_kernel when trying to match
> > > > > argument types in helpers.
> > > > >
> > > > > Refactor btf_struct_access callback to just take bpf_reg_state instead
> > > > > of btf and btf_type paramters. Note that the call site in
> > > > > check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
> > > > > dummy reg on stack. Since only the type, btf, and btf_id of the register
> > > > > matter for the checks, it can be done so without complicating the usual
> > > > > cases elsewhere in the verifier where reg->btf and reg->btf_id is used
> > > > > verbatim.
> > > > >
> > > > > Whenever walking such types, any pointers being walked will always yield
> > > > > a SCALAR instead of pointer. In the future we might permit kptr inside
> > > > > local kptr (either kernel or local), and it would be permitted only in
> > > > > that case.
> > > > >
> > > > > For now, these local kptrs will always be referenced in verifier
> > > > > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > > > > to such objects, as long fields that are special are not touched
> > > > > (support for which will be added in subsequent patches). Note that once
> > > > > such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
> > > > > write to it.
> > > > >
> > > > > No PROBE_MEM handling is therefore done for loads into this type unless
> > > > > PTR_UNTRUSTED is part of the register type, since they can never be in
> > > > > an undefined state, and their lifetime will always be valid.
> > > > >
> > > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > > ---
> > > > >  include/linux/bpf.h              | 28 ++++++++++++++++--------
> > > > >  include/linux/filter.h           |  8 +++----
> > > > >  kernel/bpf/btf.c                 | 16 ++++++++++----
> > > > >  kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
> > > > >  net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
> > > > >  net/core/filter.c                | 34 ++++++++++++-----------------
> > > > >  net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
> > > > >  net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
> > > > >  8 files changed, 99 insertions(+), 68 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index afc1c51b59ff..75dbd2ecf80a 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -524,6 +524,11 @@ enum bpf_type_flag {
> > > > >         /* Size is known at compile time. */
> > > > >         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
> > > > >
> > > > > +       /* MEM is of a type from program BTF, not kernel BTF. This is used to
> > > > > +        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> > > > > +        */
> > > > > +       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
> > > > > +
> > > >
> > > > you fixed one naming confusion with RINGBUF and basically are creating
> > > > a new one, where "ALLOC" means "local kptr"... If we are stuck with
> > > > "local kptr" (which I find very confusing as well, but that's beside
> > > > the point), why not stick to the whole "local" terminology here?
> > > > MEM_LOCAL?
> > > >
> > >
> > > See the discussion about this in v4:
> > > https://lore.kernel.org/bpf/20221104075113.5ighwdvero4mugu7@apollo
> > >
> > > It was MEM_TYPE_LOCAL before. Also, better naming suggestions are always
> > > welcome, I asked the same in that message as well.
> >
> > Sorry, I haven't followed <v5. Don't have perfect name, but logically
> > this is BPF program memory. So "prog_kptr" would be something to
> > convert this, but I'm not super happy with such a name. "user_kptr"
> > would be too confusing, drawing incorrect "kernel space vs user space"
> > comparison, while both are kernel memory. It's BPF-side kptr, so
> > "bpf_kptr", but also not great.
>
> yep. I went through the same thinking process.
>
> > So that's why didn't suggest anything specific, but at least as far as
> > MEM_xxx flag goes, MEM_LOCAL_KPTR is better than MEM_ALLOC, IMO. It's
> > at least consistent with the official name of the concept it
> > represents.
>
> "local kptr" doesn't fit here.
> In libbpf, "local" is equally badly named.
> If "local" was a good name we wouldn't have had this discussion.
> So we need to fix it libbpf, but we should start with a proper
> name in the kernel.
> And "local kptr" is not it.
>
> We must avoid exhausting bikeshedding too.
> MEM_ALLOC is something we can use right now and
> as long as "local kptr" doesn't appear in docs, comments and
> commit logs we're good.
> We can rename MEM_ALLOC to something else later.

agreed

>
> In commit logs we can just say that this is
> a pointer to an object allocated by the bpf program.
> It's crystal clear definition whereas "local kptr" is nonsensical.
>
> Going back to the kptr definition.
> kptr was supposed to mean a pointer to a kernel object.
> In that light "pointer to an object allocated by the bpf prog"
> is something else.
> Maybe "bptr" ?
> In some ways bpf is a layer different from kernel space and user space.
> Some people joked that there is ring-0 for kernel, ring-3 for user space
> while bpf runs in ring-B.
> Two new btf_tags __bptr and __bptr_ref (or may be just one?)
> might be necessary as well to make it easier to distinguish
> kernel and bpf prog allocated objects.

bptr isn't terrific, but I don't have any good suggestions. bptr,
bpfptr, tptr (for "typed pointer" as opposed to untyped dynptr), all
kind of meh. I'm fine with whatever.

>
> > >
> > > It was just a bool in the RFC.
> > > But in https://lore.kernel.org/bpf/20220907233023.x3uclwlnjuhftvtb@macbook-pro-4.dhcp.thefacebook.com
> > > Alexei suggested passing reg instead.
> > > From the link:
> > > > imo it's cleaner to pass 'reg' instead of 'reg->btf',
> > > > so we don't have to pass another boolean.
> > > > And check type_is_local(reg) inside btf_struct_access().
> >
> > I sympathize with "too many input args" (especially if it's a bunch of
> > bools) argument, but see above, I find it increasingly harder to know
> > what parts of complex internal register state is used by helper
> > functions and which are not.
> >
> > And the fact that we have to construct a fake register state in some
> > case is a red flag to me. Pass enum bpf_reg_type type to avoid passing
> > true/false. Or let's invent a new enum. Or extend enum bpf_access_type
> > to have READ_NOPTR/WRITE_NOPTR or something like that. Don't know.
> >
> > This isn't a major issue, I can live with this just fine, but this
> > definitely doesn't feel like a clean approach.
>
> I think passing bpf_reg_state is a lesser evil _right_now_ and
> I prefer we proceed this way, since other works are blocked on
> this patch set.
> We did plenty of refactoring of the verifier in the past.
> There will be more in the future.

sure, not crucial
Andrii Nakryiko Nov. 9, 2022, 11:23 p.m. UTC | #7
On Wed, Nov 9, 2022 at 9:00 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, Nov 09, 2022 at 07:02:11AM IST, Alexei Starovoitov wrote:
> > On Tue, Nov 8, 2022 at 4:36 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Tue, Nov 8, 2022 at 4:00 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 09, 2022 at 04:59:41AM IST, Andrii Nakryiko wrote:
> > > > > On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > > > >
> > > > > > Introduce local kptrs, i.e. PTR_TO_BTF_ID that point to a type in
> > > > > > program BTF. This is indicated by the presence of MEM_ALLOC type flag in
> > > > > > reg->type to avoid having to check btf_is_kernel when trying to match
> > > > > > argument types in helpers.
> > > > > >
> > > > > > Refactor btf_struct_access callback to just take bpf_reg_state instead
> > > > > > of btf and btf_type paramters. Note that the call site in
> > > > > > check_map_access now simulates access to a PTR_TO_BTF_ID by creating a
> > > > > > dummy reg on stack. Since only the type, btf, and btf_id of the register
> > > > > > matter for the checks, it can be done so without complicating the usual
> > > > > > cases elsewhere in the verifier where reg->btf and reg->btf_id is used
> > > > > > verbatim.
> > > > > >
> > > > > > Whenever walking such types, any pointers being walked will always yield
> > > > > > a SCALAR instead of pointer. In the future we might permit kptr inside
> > > > > > local kptr (either kernel or local), and it would be permitted only in
> > > > > > that case.
> > > > > >
> > > > > > For now, these local kptrs will always be referenced in verifier
> > > > > > context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
> > > > > > to such objects, as long fields that are special are not touched
> > > > > > (support for which will be added in subsequent patches). Note that once
> > > > > > such a local kptr is marked PTR_UNTRUSTED, it is no longer allowed to
> > > > > > write to it.
> > > > > >
> > > > > > No PROBE_MEM handling is therefore done for loads into this type unless
> > > > > > PTR_UNTRUSTED is part of the register type, since they can never be in
> > > > > > an undefined state, and their lifetime will always be valid.
> > > > > >
> > > > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > > > ---
> > > > > >  include/linux/bpf.h              | 28 ++++++++++++++++--------
> > > > > >  include/linux/filter.h           |  8 +++----
> > > > > >  kernel/bpf/btf.c                 | 16 ++++++++++----
> > > > > >  kernel/bpf/verifier.c            | 37 ++++++++++++++++++++++++++------
> > > > > >  net/bpf/bpf_dummy_struct_ops.c   | 14 ++++++------
> > > > > >  net/core/filter.c                | 34 ++++++++++++-----------------
> > > > > >  net/ipv4/bpf_tcp_ca.c            | 13 ++++++-----
> > > > > >  net/netfilter/nf_conntrack_bpf.c | 17 ++++++---------
> > > > > >  8 files changed, 99 insertions(+), 68 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index afc1c51b59ff..75dbd2ecf80a 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -524,6 +524,11 @@ enum bpf_type_flag {
> > > > > >         /* Size is known at compile time. */
> > > > > >         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
> > > > > >
> > > > > > +       /* MEM is of a type from program BTF, not kernel BTF. This is used to
> > > > > > +        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
> > > > > > +        */
> > > > > > +       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
> > > > > > +
> > > > >
> > > > > you fixed one naming confusion with RINGBUF and basically are creating
> > > > > a new one, where "ALLOC" means "local kptr"... If we are stuck with
> > > > > "local kptr" (which I find very confusing as well, but that's beside
> > > > > the point), why not stick to the whole "local" terminology here?
> > > > > MEM_LOCAL?
> > > > >
> > > >
> > > > See the discussion about this in v4:
> > > > https://lore.kernel.org/bpf/20221104075113.5ighwdvero4mugu7@apollo
> > > >
> > > > It was MEM_TYPE_LOCAL before. Also, better naming suggestions are always
> > > > welcome, I asked the same in that message as well.
> > >
> > > Sorry, I haven't followed <v5. Don't have perfect name, but logically
> > > this is BPF program memory. So "prog_kptr" would be something to
> > > convert this, but I'm not super happy with such a name. "user_kptr"
> > > would be too confusing, drawing incorrect "kernel space vs user space"
> > > comparison, while both are kernel memory. It's BPF-side kptr, so
> > > "bpf_kptr", but also not great.
> >
> > yep. I went through the same thinking process.
> >
> > > So that's why didn't suggest anything specific, but at least as far as
> > > MEM_xxx flag goes, MEM_LOCAL_KPTR is better than MEM_ALLOC, IMO. It's
> > > at least consistent with the official name of the concept it
> > > represents.
> >
> > "local kptr" doesn't fit here.
> > In libbpf, "local" is equally badly named.
> > If "local" was a good name we wouldn't have had this discussion.
> > So we need to fix it libbpf, but we should start with a proper
> > name in the kernel.
> > And "local kptr" is not it.
> >
> > We must avoid exhausting bikeshedding too.
> > MEM_ALLOC is something we can use right now and
> > as long as "local kptr" doesn't appear in docs, comments and
> > commit logs we're good.
> > We can rename MEM_ALLOC to something else later.
> >
> > In commit logs we can just say that this is
> > a pointer to an object allocated by the bpf program.
> > It's crystal clear definition whereas "local kptr" is nonsensical.
> >
>
> Ok, I'll drop the naming everywhere.
>
> > Going back to the kptr definition.
> > kptr was supposed to mean a pointer to a kernel object.
> > In that light "pointer to an object allocated by the bpf prog"
> > is something else.
> > Maybe "bptr" ?
> > In some ways bpf is a layer different from kernel space and user space.
> > Some people joked that there is ring-0 for kernel, ring-3 for user space
> > while bpf runs in ring-B.
> > Two new btf_tags __bptr and __bptr_ref (or may be just one?)
> > might be necessary as well to make it easier to distinguish
> > kernel and bpf prog allocated objects.
> >
>
> There's also the option of simply using __kptr and __kptr_ref for these (without
> __local tag in BPF maps) and doing two stage name lookup for the types. Kernel
> BTF takes precedence, if not found there, then it searches program BTF for a
> local type. It would probably the simplest for users.

Disagree about the simplest for users. It's going to be quite
confusing for anyone reading the code and trying to understand what's
going on. Explicit tag seems better to me. But it's subjective.

>
> struct map_value {
>         struct nf_conn __kptr_ref *ct; // kernel
>         struct foo __kptr_ref *f; // local
>         struct task_struct __kptr_ref *t; // kernel
>         struct bar __kptr_ref *b; // local
> }
>
> We can revisit this again once the post the follow up to store them in maps.
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index afc1c51b59ff..75dbd2ecf80a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -524,6 +524,11 @@  enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	/* MEM is of a type from program BTF, not kernel BTF. This is used to
+	 * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
+	 */
+	MEM_ALLOC		= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
@@ -771,6 +776,7 @@  struct bpf_prog_ops {
 			union bpf_attr __user *uattr);
 };
 
+struct bpf_reg_state;
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
 	const struct bpf_func_proto *
@@ -792,9 +798,8 @@  struct bpf_verifier_ops {
 				  struct bpf_insn *dst,
 				  struct bpf_prog *prog, u32 *target_size);
 	int (*btf_struct_access)(struct bpf_verifier_log *log,
-				 const struct btf *btf,
-				 const struct btf_type *t, int off, int size,
-				 enum bpf_access_type atype,
+				 const struct bpf_reg_state *reg,
+				 int off, int size, enum bpf_access_type atype,
 				 u32 *next_btf_id, enum bpf_type_flag *flag);
 };
 
@@ -2080,9 +2085,9 @@  static inline bool bpf_tracing_btf_ctx_access(int off, int size,
 	return btf_ctx_access(off, size, type, prog, info);
 }
 
-int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
-		      const struct btf_type *t, int off, int size,
-		      enum bpf_access_type atype,
+int btf_struct_access(struct bpf_verifier_log *log,
+		      const struct bpf_reg_state *reg,
+		      int off, int size, enum bpf_access_type atype,
 		      u32 *next_btf_id, enum bpf_type_flag *flag);
 bool btf_struct_ids_match(struct bpf_verifier_log *log,
 			  const struct btf *btf, u32 id, int off,
@@ -2333,9 +2338,8 @@  static inline struct bpf_prog *bpf_prog_by_id(u32 id)
 }
 
 static inline int btf_struct_access(struct bpf_verifier_log *log,
-				    const struct btf *btf,
-				    const struct btf_type *t, int off, int size,
-				    enum bpf_access_type atype,
+				    const struct bpf_reg_state *reg,
+				    int off, int size, enum bpf_access_type atype,
 				    u32 *next_btf_id, enum bpf_type_flag *flag)
 {
 	return -EACCES;
@@ -2792,4 +2796,10 @@  struct bpf_key {
 	bool has_ref;
 };
 #endif /* CONFIG_KEYS */
+
+static inline bool type_is_local_kptr(u32 type)
+{
+	return type & MEM_ALLOC;
+}
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index efc42a6e3aed..787d35dbf5b0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -568,10 +568,10 @@  struct sk_filter {
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
 extern struct mutex nf_conn_btf_access_lock;
-extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
-				     const struct btf_type *t, int off, int size,
-				     enum bpf_access_type atype, u32 *next_btf_id,
-				     enum bpf_type_flag *flag);
+extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
+				     const struct bpf_reg_state *reg,
+				     int off, int size, enum bpf_access_type atype,
+				     u32 *next_btf_id, enum bpf_type_flag *flag);
 
 typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
 					  const struct bpf_insn *insnsi,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index d8f083b09e5e..4d6c8577bf17 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6015,20 +6015,28 @@  static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 	return -EINVAL;
 }
 
-int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
-		      const struct btf_type *t, int off, int size,
-		      enum bpf_access_type atype __maybe_unused,
+int btf_struct_access(struct bpf_verifier_log *log,
+		      const struct bpf_reg_state *reg,
+		      int off, int size, enum bpf_access_type atype __maybe_unused,
 		      u32 *next_btf_id, enum bpf_type_flag *flag)
 {
+	const struct btf *btf = reg->btf;
 	enum bpf_type_flag tmp_flag = 0;
+	const struct btf_type *t;
+	u32 id = reg->btf_id;
 	int err;
-	u32 id;
 
+	t = btf_type_by_id(btf, id);
 	do {
 		err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag);
 
 		switch (err) {
 		case WALK_PTR:
+			/* For local types, the destination register cannot
+			 * become a pointer again.
+			 */
+			if (type_is_local_kptr(reg->type))
+				return SCALAR_VALUE;
 			/* If we found the pointer or scalar on t+off,
 			 * we're done.
 			 */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fca156eca43..7dcb4629f764 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4682,17 +4682,28 @@  static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	if (env->ops->btf_struct_access) {
-		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
-						  off, size, atype, &btf_id, &flag);
+	if (env->ops->btf_struct_access && !type_is_local_kptr(reg->type)) {
+		if (!btf_is_kernel(reg->btf)) {
+			verbose(env, "verifier internal error: reg->btf must be kernel btf\n");
+			return -EFAULT;
+		}
+		ret = env->ops->btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
 	} else {
-		if (atype != BPF_READ) {
+		/* Writes are permitted with default btf_struct_access for local
+		 * kptrs (which always have ref_obj_id > 0), but not for
+		 * _untrusted_ local kptrs.
+		 */
+		if (atype != BPF_READ && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
 			verbose(env, "only read is supported\n");
 			return -EACCES;
 		}
 
-		ret = btf_struct_access(&env->log, reg->btf, t, off, size,
-					atype, &btf_id, &flag);
+		if (type_is_local_kptr(reg->type) && !reg->ref_obj_id) {
+			verbose(env, "verifier internal error: ref_obj_id for local kptr must be non-zero\n");
+			return -EFAULT;
+		}
+
+		ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag);
 	}
 
 	if (ret < 0)
@@ -4718,6 +4729,7 @@  static int check_ptr_to_map_access(struct bpf_verifier_env *env,
 {
 	struct bpf_reg_state *reg = regs + regno;
 	struct bpf_map *map = reg->map_ptr;
+	struct bpf_reg_state map_reg;
 	enum bpf_type_flag flag = 0;
 	const struct btf_type *t;
 	const char *tname;
@@ -4756,7 +4768,10 @@  static int check_ptr_to_map_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	ret = btf_struct_access(&env->log, btf_vmlinux, t, off, size, atype, &btf_id, &flag);
+	/* Simulate access to a PTR_TO_BTF_ID */
+	memset(&map_reg, 0, sizeof(map_reg));
+	mark_btf_ld_reg(env, &map_reg, 0, PTR_TO_BTF_ID, btf_vmlinux, *map->ops->map_btf_id, 0);
+	ret = btf_struct_access(&env->log, &map_reg, off, size, atype, &btf_id, &flag);
 	if (ret < 0)
 		return ret;
 
@@ -5966,6 +5981,7 @@  int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | MEM_ALLOC:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0.	In the other cases, fixed offset
 		 * can be non-zero.
@@ -13648,6 +13664,13 @@  static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			break;
 		case PTR_TO_BTF_ID:
 		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
+		/* PTR_TO_BTF_ID | MEM_ALLOC always has a valid lifetime, unlike
+		 * PTR_TO_BTF_ID, and an active ref_obj_id, but the same cannot
+		 * be said once it is marked PTR_UNTRUSTED, hence we must handle
+		 * any faults for loads into such types. BPF_WRITE is disallowed
+		 * for this case.
+		 */
+		case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index e78dadfc5829..2d434c1f4617 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -156,29 +156,29 @@  static bool bpf_dummy_ops_is_valid_access(int off, int size,
 }
 
 static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log,
-					   const struct btf *btf,
-					   const struct btf_type *t, int off,
-					   int size, enum bpf_access_type atype,
+					   const struct bpf_reg_state *reg,
+					   int off, int size, enum bpf_access_type atype,
 					   u32 *next_btf_id,
 					   enum bpf_type_flag *flag)
 {
 	const struct btf_type *state;
+	const struct btf_type *t;
 	s32 type_id;
 	int err;
 
-	type_id = btf_find_by_name_kind(btf, "bpf_dummy_ops_state",
+	type_id = btf_find_by_name_kind(reg->btf, "bpf_dummy_ops_state",
 					BTF_KIND_STRUCT);
 	if (type_id < 0)
 		return -EINVAL;
 
-	state = btf_type_by_id(btf, type_id);
+	t = btf_type_by_id(reg->btf, reg->btf_id);
+	state = btf_type_by_id(reg->btf, type_id);
 	if (t != state) {
 		bpf_log(log, "only access to bpf_dummy_ops_state is supported\n");
 		return -EACCES;
 	}
 
-	err = btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-				flag);
+	err = btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 	if (err < 0)
 		return err;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index cb3b635e35be..199632e6a7cb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8651,28 +8651,25 @@  static bool tc_cls_act_is_valid_access(int off, int size,
 DEFINE_MUTEX(nf_conn_btf_access_lock);
 EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock);
 
-int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
-			      const struct btf_type *t, int off, int size,
-			      enum bpf_access_type atype, u32 *next_btf_id,
-			      enum bpf_type_flag *flag);
+int (*nfct_btf_struct_access)(struct bpf_verifier_log *log,
+			      const struct bpf_reg_state *reg,
+			      int off, int size, enum bpf_access_type atype,
+			      u32 *next_btf_id, enum bpf_type_flag *flag);
 EXPORT_SYMBOL_GPL(nfct_btf_struct_access);
 
 static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
-					const struct btf *btf,
-					const struct btf_type *t, int off,
-					int size, enum bpf_access_type atype,
-					u32 *next_btf_id,
-					enum bpf_type_flag *flag)
+					const struct bpf_reg_state *reg,
+					int off, int size, enum bpf_access_type atype,
+					u32 *next_btf_id, enum bpf_type_flag *flag)
 {
 	int ret = -EACCES;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-					 flag);
+		return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 
 	mutex_lock(&nf_conn_btf_access_lock);
 	if (nfct_btf_struct_access)
-		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
+		ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 	mutex_unlock(&nf_conn_btf_access_lock);
 
 	return ret;
@@ -8738,21 +8735,18 @@  void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
 static int xdp_btf_struct_access(struct bpf_verifier_log *log,
-				 const struct btf *btf,
-				 const struct btf_type *t, int off,
-				 int size, enum bpf_access_type atype,
-				 u32 *next_btf_id,
-				 enum bpf_type_flag *flag)
+				 const struct bpf_reg_state *reg,
+				 int off, int size, enum bpf_access_type atype,
+				 u32 *next_btf_id, enum bpf_type_flag *flag)
 {
 	int ret = -EACCES;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-					 flag);
+		return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 
 	mutex_lock(&nf_conn_btf_access_lock);
 	if (nfct_btf_struct_access)
-		ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
+		ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 	mutex_unlock(&nf_conn_btf_access_lock);
 
 	return ret;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 6da16ae6a962..d15c91de995f 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -69,18 +69,17 @@  static bool bpf_tcp_ca_is_valid_access(int off, int size,
 }
 
 static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
-					const struct btf *btf,
-					const struct btf_type *t, int off,
-					int size, enum bpf_access_type atype,
-					u32 *next_btf_id,
-					enum bpf_type_flag *flag)
+					const struct bpf_reg_state *reg,
+					int off, int size, enum bpf_access_type atype,
+					u32 *next_btf_id, enum bpf_type_flag *flag)
 {
+	const struct btf_type *t;
 	size_t end;
 
 	if (atype == BPF_READ)
-		return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
-					 flag);
+		return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag);
 
+	t = btf_type_by_id(reg->btf, reg->btf_id);
 	if (t != tcp_sock_type) {
 		bpf_log(log, "only read is supported\n");
 		return -EACCES;
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 8639e7efd0e2..24002bc61e07 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -191,19 +191,16 @@  BTF_ID(struct, nf_conn___init)
 
 /* Check writes into `struct nf_conn` */
 static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
-					   const struct btf *btf,
-					   const struct btf_type *t, int off,
-					   int size, enum bpf_access_type atype,
-					   u32 *next_btf_id,
-					   enum bpf_type_flag *flag)
+					   const struct bpf_reg_state *reg,
+					   int off, int size, enum bpf_access_type atype,
+					   u32 *next_btf_id, enum bpf_type_flag *flag)
 {
-	const struct btf_type *ncit;
-	const struct btf_type *nct;
+	const struct btf_type *ncit, *nct, *t;
 	size_t end;
 
-	ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
-	nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
-
+	ncit = btf_type_by_id(reg->btf, btf_nf_conn_ids[1]);
+	nct = btf_type_by_id(reg->btf, btf_nf_conn_ids[0]);
+	t = btf_type_by_id(reg->btf, reg->btf_id);
 	if (t != nct && t != ncit) {
 		bpf_log(log, "only read is supported\n");
 		return -EACCES;