[v2,2/2] bpf/selftests: Add selftests for new task kfuncs

Message ID	20221001144716.3403120-3-void@manifault.com (mailing list archive)
State	Superseded
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@kernel.org> From: David Vernet <void@manifault.com> To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev Cc: kernel-team@fb.com, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, yhs@fb.com, song@kernel.org, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, tj@kernel.org, memxor@gmail.com Subject: [PATCH v2 2/2] bpf/selftests: Add selftests for new task kfuncs Date: Sat, 1 Oct 2022 09:47:16 -0500 Message-Id: <20221001144716.3403120-3-void@manifault.com> In-Reply-To: <20221001144716.3403120-1-void@manifault.com> References: <20221001144716.3403120-1-void@manifault.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Support storing struct task_struct objects as kptrs \| expand [v2,0/2] Support storing struct task_struct objects as kptrs [v2,1/2] bpf: Add kfuncs for storing struct task_struct * as a kptr [v2,2/2] bpf/selftests: Add selftests for new task kfuncs

Context	Check	Description
netdev/tree_selection	success	Guessing tree name failed - patch did not apply
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11	fail	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9	fail	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12	fail	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-PR	fail	PR summary
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for test_maps on s390x with gcc

David Vernet Oct. 1, 2022, 2:47 p.m. UTC

A previous change added a series of kfuncs for storing struct
task_struct objects as referenced kptrs. This patch adds a new
task_kfunc test suite for validating their expected behavior.

Signed-off-by: David Vernet <void@manifault.com>
---
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/task_kfunc.c     | 155 ++++++++++++
 .../selftests/bpf/progs/task_kfunc_common.h   |  83 +++++++
 .../selftests/bpf/progs/task_kfunc_failure.c  | 225 ++++++++++++++++++
 .../selftests/bpf/progs/task_kfunc_success.c  | 113 +++++++++
 5 files changed, 577 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c

Kumar Kartikeya Dwivedi Oct. 3, 2022, 12:32 a.m. UTC | #1

On Sat, 1 Oct 2022 at 16:47, David Vernet <void@manifault.com> wrote:
>
> A previous change added a series of kfuncs for storing struct
> task_struct objects as referenced kptrs. This patch adds a new
> task_kfunc test suite for validating their expected behavior.
>
> Signed-off-by: David Vernet <void@manifault.com>
> ---
>  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
>  .../selftests/bpf/prog_tests/task_kfunc.c     | 155 ++++++++++++
>  .../selftests/bpf/progs/task_kfunc_common.h   |  83 +++++++
>  .../selftests/bpf/progs/task_kfunc_failure.c  | 225 ++++++++++++++++++
>  .../selftests/bpf/progs/task_kfunc_success.c  | 113 +++++++++
>  5 files changed, 577 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
>  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c
>
> diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> index 17e074eb42b8..4c34818ec1ee 100644
> --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> @@ -75,3 +75,4 @@ user_ringbuf                             # failed to find kernel BTF type ID of
>  lookup_key                               # JIT does not support calling kernel function                                (kfunc)
>  verify_pkcs7_sig                         # JIT does not support calling kernel function                                (kfunc)
>  kfunc_dynptr_param                       # JIT does not support calling kernel function                                (kfunc)
> +task_kfunc                               # JIT does not support calling kernel function
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> new file mode 100644
> index 000000000000..6c577fbca8f7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> @@ -0,0 +1,155 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#define _GNU_SOURCE
> +#include <sys/wait.h>
> +#include <test_progs.h>
> +#include <unistd.h>
> +
> +#include "task_kfunc_failure.skel.h"
> +#include "task_kfunc_success.skel.h"
> +
> +static size_t log_buf_sz = 1 << 20; /* 1 MB */
> +static char obj_log_buf[1048576];
> +
> +static struct task_kfunc_success *open_load_task_kfunc_skel(void)
> +{
> +       struct task_kfunc_success *skel;
> +       int err;
> +
> +       skel = task_kfunc_success__open();
> +       if (!ASSERT_OK_PTR(skel, "skel_open"))
> +               return NULL;
> +
> +       skel->bss->pid = getpid();
> +
> +       err = task_kfunc_success__load(skel);
> +       if (!ASSERT_OK(err, "skel_load"))
> +               goto cleanup;
> +
> +       return skel;
> +
> +cleanup:
> +       task_kfunc_success__destroy(skel);
> +       return NULL;
> +}
> +
> +static void run_success_test(const char *prog_name)
> +{
> +       struct task_kfunc_success *skel;
> +       int status;
> +       pid_t child_pid;
> +       struct bpf_program *prog;
> +       struct bpf_link *link = NULL;
> +
> +       skel = open_load_task_kfunc_skel();
> +       if (!ASSERT_OK_PTR(skel, "open_load_skel"))
> +               return;
> +
> +       if (!ASSERT_OK(skel->bss->err, "pre_spawn_err"))
> +               goto cleanup;
> +
> +       prog = bpf_object__find_program_by_name(skel->obj, prog_name);
> +       if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +               goto cleanup;
> +
> +       link = bpf_program__attach(prog);
> +       if (!ASSERT_OK_PTR(link, "attached_link"))
> +               goto cleanup;
> +
> +       child_pid = fork();
> +       if (!ASSERT_GT(child_pid, -1, "child_pid"))
> +               goto cleanup;
> +       if (child_pid == 0)
> +               _exit(0);
> +       waitpid(child_pid, &status, 0);
> +
> +       ASSERT_OK(skel->bss->err, "post_wait_err");
> +
> +cleanup:
> +       bpf_link__destroy(link);
> +       task_kfunc_success__destroy(skel);
> +}
> +
> +static const char * const success_tests[] = {
> +       "test_task_acquire_release",
> +       "test_task_acquire_leave_in_map",
> +       "test_task_xchg_release",
> +       "test_task_get_release",
> +};
> +
> +static struct {
> +       const char *prog_name;
> +       const char *expected_err_msg;
> +} failure_tests[] = {
> +       {"task_kfunc_acquire_untrusted", "arg#0 pointer type STRUCT task_struct must point"},
> +       {"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
> +       {"task_kfunc_acquire_unreleased", "Unreleased reference"},
> +       {"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
> +       {"task_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
> +       {"task_kfunc_get_null", "arg#0 expected pointer to map value"},
> +       {"task_kfunc_xchg_unreleased", "Unreleased reference"},
> +       {"task_kfunc_get_unreleased", "Unreleased reference"},
> +       {"task_kfunc_release_untrusted", "arg#0 pointer type STRUCT task_struct must point"},
> +       {"task_kfunc_release_null", "arg#0 pointer type STRUCT task_struct must point"},
> +       {"task_kfunc_release_unacquired", "R1 must be referenced"},
> +};
> +
> +static void verify_fail(const char *prog_name, const char *expected_err_msg)
> +{
> +       LIBBPF_OPTS(bpf_object_open_opts, opts);
> +       struct task_kfunc_failure *skel;
> +       int err, i;
> +
> +       opts.kernel_log_buf = obj_log_buf;
> +       opts.kernel_log_size = log_buf_sz;
> +       opts.kernel_log_level = 1;
> +
> +       skel = task_kfunc_failure__open_opts(&opts);
> +       if (!ASSERT_OK_PTR(skel, "task_kfunc_failure__open_opts"))
> +               goto cleanup;
> +
> +       skel->bss->pid = getpid();
> +
> +       for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
> +               struct bpf_program *prog;
> +               const char *curr_name = failure_tests[i].prog_name;
> +
> +               prog = bpf_object__find_program_by_name(skel->obj, curr_name);
> +               if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +                       goto cleanup;
> +
> +               bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
> +       }
> +
> +       err = task_kfunc_failure__load(skel);
> +       if (!ASSERT_ERR(err, "unexpected load success"))
> +               goto cleanup;
> +
> +       if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
> +               fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
> +               fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
> +       }
> +
> +cleanup:
> +       task_kfunc_failure__destroy(skel);
> +}
> +
> +void test_task_kfunc(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
> +               if (!test__start_subtest(success_tests[i]))
> +                       continue;
> +
> +               run_success_test(success_tests[i]);
> +       }
> +
> +       for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
> +               if (!test__start_subtest(failure_tests[i].prog_name))
> +                       continue;
> +
> +               verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
> +       }
> +}
> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
> new file mode 100644
> index 000000000000..bbb0a40572fd
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
> @@ -0,0 +1,83 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#ifndef _TASK_KFUNC_COMMON_H
> +#define _TASK_KFUNC_COMMON_H
> +
> +#include <errno.h>
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +
> +struct __tasks_kfunc_map_value {
> +       struct task_struct __kptr_ref * task;
> +};
> +
> +struct hash_map {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       __type(key, int);
> +       __type(value, struct __tasks_kfunc_map_value);
> +       __uint(max_entries, 1);
> +} __tasks_kfunc_map SEC(".maps");
> +
> +struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
> +struct task_struct *bpf_task_kptr_get(struct task_struct **pp) __ksym;
> +void bpf_task_release(struct task_struct *p) __ksym;
> +
> +#define TEST_NAME_SZ 128
> +
> +/* The pid of the test process used to determine if a newly created task is the test task. */
> +int pid;
> +
> +static inline struct __tasks_kfunc_map_value *tasks_kfunc_map_value_lookup(struct task_struct *p)
> +{
> +       s32 pid;
> +       long status;
> +
> +       status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
> +       if (status)
> +               return NULL;
> +
> +       return bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> +}
> +
> +static inline int tasks_kfunc_map_insert(struct task_struct *p)
> +{
> +       struct __tasks_kfunc_map_value local, *v;
> +       long status;
> +       struct task_struct *acquired, *old;
> +       s32 pid;
> +
> +       status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
> +       if (status)
> +               return status;
> +
> +       local.task = NULL;
> +       status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
> +       if (status)
> +               return status;
> +
> +       v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> +       if (!v) {
> +               bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
> +               return status;
> +       }
> +
> +       acquired = bpf_task_acquire(p);
> +       old = bpf_kptr_xchg(&v->task, acquired);
> +       if (old) {
> +               bpf_task_release(old);
> +               return -EEXIST;
> +       }
> +
> +       return 0;
> +}
> +
> +static inline bool is_test_kfunc_task(struct task_struct *task)
> +{
> +       int cur_pid = bpf_get_current_pid_tgid() >> 32;
> +
> +       return pid == cur_pid;
> +}
> +
> +#endif /* _TASK_KFUNC_COMMON_H */
> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> new file mode 100644
> index 000000000000..4cf01bbc8a16
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#include "task_kfunc_common.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +/* Prototype for all of the program trace events below:
> + *
> + * TRACE_EVENT(task_newtask,
> + *         TP_PROTO(struct task_struct *p, u64 clone_flags)
> + */
> +
> +SEC("tp_btf/task_newtask")
> +int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
> +{
> +       struct task_struct *acquired, *stack_ptr;
> +
> +       if (!is_test_kfunc_task(task))
> +               return 0;
> +
> +       /* Can't invoke bpf_task_acquire() on an untrusted, random pointer. */
> +       stack_ptr = (struct task_struct *)0xcafef00d;

This seems like a misleading comment. 'stack_ptr' would just be a
scalar, not a pointer. Maybe you should be testing this for an actual
PTR_UNTRUSTED pointer instead. Load of a __kptr tagged pointer would
be a good way.

Very soon a lot of other pointers obtained from pointer walking are
going to be marked PTR_UNTRUSTED, so then we would cover those as well
similar to this test.

Also, could you include a test to make sure sleepable programs cannot
call bpf_task_acquire? It seems to assume RCU read lock is held while
that may not be true. If already not possible, maybe a WARN_ON_ONCE
inside the helper to ensure future cases don't creep in.

> [...]

David Vernet Oct. 3, 2022, 3:35 p.m. UTC | #2

On Mon, Oct 03, 2022 at 02:32:41AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Sat, 1 Oct 2022 at 16:47, David Vernet <void@manifault.com> wrote:
> >
> > A previous change added a series of kfuncs for storing struct
> > task_struct objects as referenced kptrs. This patch adds a new
> > task_kfunc test suite for validating their expected behavior.
> >
> > Signed-off-by: David Vernet <void@manifault.com>
> > ---
> >  tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
> >  .../selftests/bpf/prog_tests/task_kfunc.c     | 155 ++++++++++++
> >  .../selftests/bpf/progs/task_kfunc_common.h   |  83 +++++++
> >  .../selftests/bpf/progs/task_kfunc_failure.c  | 225 ++++++++++++++++++
> >  .../selftests/bpf/progs/task_kfunc_success.c  | 113 +++++++++
> >  5 files changed, 577 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
> >  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c
> >
> > diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
> > index 17e074eb42b8..4c34818ec1ee 100644
> > --- a/tools/testing/selftests/bpf/DENYLIST.s390x
> > +++ b/tools/testing/selftests/bpf/DENYLIST.s390x
> > @@ -75,3 +75,4 @@ user_ringbuf                             # failed to find kernel BTF type ID of
> >  lookup_key                               # JIT does not support calling kernel function                                (kfunc)
> >  verify_pkcs7_sig                         # JIT does not support calling kernel function                                (kfunc)
> >  kfunc_dynptr_param                       # JIT does not support calling kernel function                                (kfunc)
> > +task_kfunc                               # JIT does not support calling kernel function
> > diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> > new file mode 100644
> > index 000000000000..6c577fbca8f7
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> > @@ -0,0 +1,155 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +
> > +#define _GNU_SOURCE
> > +#include <sys/wait.h>
> > +#include <test_progs.h>
> > +#include <unistd.h>
> > +
> > +#include "task_kfunc_failure.skel.h"
> > +#include "task_kfunc_success.skel.h"
> > +
> > +static size_t log_buf_sz = 1 << 20; /* 1 MB */
> > +static char obj_log_buf[1048576];
> > +
> > +static struct task_kfunc_success *open_load_task_kfunc_skel(void)
> > +{
> > +       struct task_kfunc_success *skel;
> > +       int err;
> > +
> > +       skel = task_kfunc_success__open();
> > +       if (!ASSERT_OK_PTR(skel, "skel_open"))
> > +               return NULL;
> > +
> > +       skel->bss->pid = getpid();
> > +
> > +       err = task_kfunc_success__load(skel);
> > +       if (!ASSERT_OK(err, "skel_load"))
> > +               goto cleanup;
> > +
> > +       return skel;
> > +
> > +cleanup:
> > +       task_kfunc_success__destroy(skel);
> > +       return NULL;
> > +}
> > +
> > +static void run_success_test(const char *prog_name)
> > +{
> > +       struct task_kfunc_success *skel;
> > +       int status;
> > +       pid_t child_pid;
> > +       struct bpf_program *prog;
> > +       struct bpf_link *link = NULL;
> > +
> > +       skel = open_load_task_kfunc_skel();
> > +       if (!ASSERT_OK_PTR(skel, "open_load_skel"))
> > +               return;
> > +
> > +       if (!ASSERT_OK(skel->bss->err, "pre_spawn_err"))
> > +               goto cleanup;
> > +
> > +       prog = bpf_object__find_program_by_name(skel->obj, prog_name);
> > +       if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> > +               goto cleanup;
> > +
> > +       link = bpf_program__attach(prog);
> > +       if (!ASSERT_OK_PTR(link, "attached_link"))
> > +               goto cleanup;
> > +
> > +       child_pid = fork();
> > +       if (!ASSERT_GT(child_pid, -1, "child_pid"))
> > +               goto cleanup;
> > +       if (child_pid == 0)
> > +               _exit(0);
> > +       waitpid(child_pid, &status, 0);
> > +
> > +       ASSERT_OK(skel->bss->err, "post_wait_err");
> > +
> > +cleanup:
> > +       bpf_link__destroy(link);
> > +       task_kfunc_success__destroy(skel);
> > +}
> > +
> > +static const char * const success_tests[] = {
> > +       "test_task_acquire_release",
> > +       "test_task_acquire_leave_in_map",
> > +       "test_task_xchg_release",
> > +       "test_task_get_release",
> > +};
> > +
> > +static struct {
> > +       const char *prog_name;
> > +       const char *expected_err_msg;
> > +} failure_tests[] = {
> > +       {"task_kfunc_acquire_untrusted", "arg#0 pointer type STRUCT task_struct must point"},
> > +       {"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
> > +       {"task_kfunc_acquire_unreleased", "Unreleased reference"},
> > +       {"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
> > +       {"task_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
> > +       {"task_kfunc_get_null", "arg#0 expected pointer to map value"},
> > +       {"task_kfunc_xchg_unreleased", "Unreleased reference"},
> > +       {"task_kfunc_get_unreleased", "Unreleased reference"},
> > +       {"task_kfunc_release_untrusted", "arg#0 pointer type STRUCT task_struct must point"},
> > +       {"task_kfunc_release_null", "arg#0 pointer type STRUCT task_struct must point"},
> > +       {"task_kfunc_release_unacquired", "R1 must be referenced"},
> > +};
> > +
> > +static void verify_fail(const char *prog_name, const char *expected_err_msg)
> > +{
> > +       LIBBPF_OPTS(bpf_object_open_opts, opts);
> > +       struct task_kfunc_failure *skel;
> > +       int err, i;
> > +
> > +       opts.kernel_log_buf = obj_log_buf;
> > +       opts.kernel_log_size = log_buf_sz;
> > +       opts.kernel_log_level = 1;
> > +
> > +       skel = task_kfunc_failure__open_opts(&opts);
> > +       if (!ASSERT_OK_PTR(skel, "task_kfunc_failure__open_opts"))
> > +               goto cleanup;
> > +
> > +       skel->bss->pid = getpid();
> > +
> > +       for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
> > +               struct bpf_program *prog;
> > +               const char *curr_name = failure_tests[i].prog_name;
> > +
> > +               prog = bpf_object__find_program_by_name(skel->obj, curr_name);
> > +               if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> > +                       goto cleanup;
> > +
> > +               bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
> > +       }
> > +
> > +       err = task_kfunc_failure__load(skel);
> > +       if (!ASSERT_ERR(err, "unexpected load success"))
> > +               goto cleanup;
> > +
> > +       if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
> > +               fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
> > +               fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
> > +       }
> > +
> > +cleanup:
> > +       task_kfunc_failure__destroy(skel);
> > +}
> > +
> > +void test_task_kfunc(void)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
> > +               if (!test__start_subtest(success_tests[i]))
> > +                       continue;
> > +
> > +               run_success_test(success_tests[i]);
> > +       }
> > +
> > +       for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
> > +               if (!test__start_subtest(failure_tests[i].prog_name))
> > +                       continue;
> > +
> > +               verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
> > +       }
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
> > new file mode 100644
> > index 000000000000..bbb0a40572fd
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
> > @@ -0,0 +1,83 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +
> > +#ifndef _TASK_KFUNC_COMMON_H
> > +#define _TASK_KFUNC_COMMON_H
> > +
> > +#include <errno.h>
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +struct __tasks_kfunc_map_value {
> > +       struct task_struct __kptr_ref * task;
> > +};
> > +
> > +struct hash_map {
> > +       __uint(type, BPF_MAP_TYPE_HASH);
> > +       __type(key, int);
> > +       __type(value, struct __tasks_kfunc_map_value);
> > +       __uint(max_entries, 1);
> > +} __tasks_kfunc_map SEC(".maps");
> > +
> > +struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
> > +struct task_struct *bpf_task_kptr_get(struct task_struct **pp) __ksym;
> > +void bpf_task_release(struct task_struct *p) __ksym;
> > +
> > +#define TEST_NAME_SZ 128
> > +
> > +/* The pid of the test process used to determine if a newly created task is the test task. */
> > +int pid;
> > +
> > +static inline struct __tasks_kfunc_map_value *tasks_kfunc_map_value_lookup(struct task_struct *p)
> > +{
> > +       s32 pid;
> > +       long status;
> > +
> > +       status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
> > +       if (status)
> > +               return NULL;
> > +
> > +       return bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> > +}
> > +
> > +static inline int tasks_kfunc_map_insert(struct task_struct *p)
> > +{
> > +       struct __tasks_kfunc_map_value local, *v;
> > +       long status;
> > +       struct task_struct *acquired, *old;
> > +       s32 pid;
> > +
> > +       status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
> > +       if (status)
> > +               return status;
> > +
> > +       local.task = NULL;
> > +       status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
> > +       if (status)
> > +               return status;
> > +
> > +       v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
> > +       if (!v) {
> > +               bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
> > +               return status;
> > +       }
> > +
> > +       acquired = bpf_task_acquire(p);
> > +       old = bpf_kptr_xchg(&v->task, acquired);
> > +       if (old) {
> > +               bpf_task_release(old);
> > +               return -EEXIST;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static inline bool is_test_kfunc_task(struct task_struct *task)
> > +{
> > +       int cur_pid = bpf_get_current_pid_tgid() >> 32;
> > +
> > +       return pid == cur_pid;
> > +}
> > +
> > +#endif /* _TASK_KFUNC_COMMON_H */
> > diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> > new file mode 100644
> > index 000000000000..4cf01bbc8a16
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> > @@ -0,0 +1,225 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +#include "task_kfunc_common.h"
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +/* Prototype for all of the program trace events below:
> > + *
> > + * TRACE_EVENT(task_newtask,
> > + *         TP_PROTO(struct task_struct *p, u64 clone_flags)
> > + */
> > +
> > +SEC("tp_btf/task_newtask")
> > +int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
> > +{
> > +       struct task_struct *acquired, *stack_ptr;
> > +
> > +       if (!is_test_kfunc_task(task))
> > +               return 0;
> > +
> > +       /* Can't invoke bpf_task_acquire() on an untrusted, random pointer. */
> > +       stack_ptr = (struct task_struct *)0xcafef00d;
> 
> This seems like a misleading comment. 'stack_ptr' would just be a
> scalar, not a pointer. Maybe you should be testing this for an actual
> PTR_UNTRUSTED pointer instead. Load of a __kptr tagged pointer would
> be a good way.
> 
> Very soon a lot of other pointers obtained from pointer walking are
> going to be marked PTR_UNTRUSTED, so then we would cover those as well
> similar to this test.

Good point and good suggestion, let me update the test to do this.

> Also, could you include a test to make sure sleepable programs cannot
> call bpf_task_acquire? It seems to assume RCU read lock is held while
> that may not be true. If already not possible, maybe a WARN_ON_ONCE
> inside the helper to ensure future cases don't creep in.

I don't _think_ it's unsafe for a sleepable program to call
bpf_task_acquire(). My understanding is that the struct task_struct *
parameter to bpf_task_acquire() is not PTR_UNTRUSTED, so it's safe to
dereference directly in the kfunc. The implicit assumption here is that
the task was either passed to the BPF program (which is calling
bpf_task_acquire()) from the main kernel in something like a trace or
struct_ops callback, or it was a referenced kptr that was removed from a
map with bpf_kptr_xchg(), and is now owned by the BPF program. Given
that the ptr type is not PTR_UNTRUSTED, it seemed correct to assume that
the task was valid in bpf_task_acquire() regardless of whether we were
in an RCU read region or not, but please let me know if I'm wrong about
that.  Other kfuncs I saw such as bpf_xdp_ct_lookup() assumed that the
parameter passed by the BPF program (which itself was passing on a
pointer given to it by the main kernel) is valid as well.

Note that the difference between bpf_task_acquire() and
bpf_task_kptr_get() is that for bpf_task_kptr_get(), we're passing a
pointer to a kptr which could be swapped out and invalidated at any
moment by a bpf_kptr_xchg() elsewhere in the program.  That's why it's
necessary to enter an RCU read region before calling READ_ONCE(), and
why we have to do a refcount_inc_not_zero() rather than just a
refcount_inc() as we do in bpf_task_acquire().

Thanks,
David

Kumar Kartikeya Dwivedi Oct. 3, 2022, 3:56 p.m. UTC | #3

On Mon, 3 Oct 2022 at 17:35, David Vernet <void@manifault.com> wrote:
>
> On Mon, Oct 03, 2022 at 02:32:41AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > [...]
> > > +#endif /* _TASK_KFUNC_COMMON_H */
> > > diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> > > new file mode 100644
> > > index 000000000000..4cf01bbc8a16
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> > > @@ -0,0 +1,225 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > > +
> > > +#include <vmlinux.h>
> > > +#include <bpf/bpf_tracing.h>
> > > +#include <bpf/bpf_helpers.h>
> > > +
> > > +#include "task_kfunc_common.h"
> > > +
> > > +char _license[] SEC("license") = "GPL";
> > > +
> > > +/* Prototype for all of the program trace events below:
> > > + *
> > > + * TRACE_EVENT(task_newtask,
> > > + *         TP_PROTO(struct task_struct *p, u64 clone_flags)
> > > + */
> > > +
> > > +SEC("tp_btf/task_newtask")
> > > +int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
> > > +{
> > > +       struct task_struct *acquired, *stack_ptr;
> > > +
> > > +       if (!is_test_kfunc_task(task))
> > > +               return 0;
> > > +
> > > +       /* Can't invoke bpf_task_acquire() on an untrusted, random pointer. */
> > > +       stack_ptr = (struct task_struct *)0xcafef00d;
> >
> > This seems like a misleading comment. 'stack_ptr' would just be a
> > scalar, not a pointer. Maybe you should be testing this for an actual
> > PTR_UNTRUSTED pointer instead. Load of a __kptr tagged pointer would
> > be a good way.
> >
> > Very soon a lot of other pointers obtained from pointer walking are
> > going to be marked PTR_UNTRUSTED, so then we would cover those as well
> > similar to this test.
>
> Good point and good suggestion, let me update the test to do this.
>
> > Also, could you include a test to make sure sleepable programs cannot
> > call bpf_task_acquire? It seems to assume RCU read lock is held while
> > that may not be true. If already not possible, maybe a WARN_ON_ONCE
> > inside the helper to ensure future cases don't creep in.
>
> I don't _think_ it's unsafe for a sleepable program to call
> bpf_task_acquire(). My understanding is that the struct task_struct *
> parameter to bpf_task_acquire() is not PTR_UNTRUSTED, so it's safe to
> dereference directly in the kfunc. The implicit assumption here is that
> the task was either passed to the BPF program (which is calling
> bpf_task_acquire()) from the main kernel in something like a trace or
> struct_ops callback, or it was a referenced kptr that was removed from a
> map with bpf_kptr_xchg(), and is now owned by the BPF program. Given
> that the ptr type is not PTR_UNTRUSTED, it seemed correct to assume that
> the task was valid in bpf_task_acquire() regardless of whether we were
> in an RCU read region or not, but please let me know if I'm wrong about

I don't think it's correct. You can just walk arbitrary structures and
obtain a normal PTR_TO_BTF_ID that looks seemingly ok to the verifier
but has no lifetime guarantees. It's a separate pre-existing problem
unrelated to your series [0]. PTR_UNTRUSTED is not set for those cases
currently.

So the argument to your bpf_task_acquire may already be freed by then.
This issue would be exacerbated in sleepable BPF programs, where RCU
read lock is not held, so some cases of pointer walking where it may
be safe no longer holds.

I am planning to clean this up, but I'd still prefer if we don't allow
this helper in sleepable programs, yet. kptr_get is ok to allow.
Once you have explicit BPF RCU read sections, sleepable programs can
take it, do loads, and operate on the RCU pointer safely until they
invalidate it with the outermost bpf_rcu_read_unlock. It's needed for
local kptrs as well, not just this. I plan to post this very soon, so
we should be able to fix it up in the current cycle after landing your
series.

__rcu tags in the kernel will automatically be understood by the
verifier and for the majority of the correctly annotated cases this
will work fine and not break tracing programs.

[0]: https://lore.kernel.org/bpf/CAADnVQJxe1QT5bvcsrZQCLeZ6kei6WEESP5bDXf_5qcB2Bb6_Q@mail.gmail.com

> that.  Other kfuncs I saw such as bpf_xdp_ct_lookup() assumed that the
> parameter passed by the BPF program (which itself was passing on a
> pointer given to it by the main kernel) is valid as well.

Yeah, but the CT API doesn't assume validity of random PTR_TO_BTF_ID,
it requires KF_TRUSTED_ARGS which forces them to have ref_obj_id != 0.

>
> Note that the difference between bpf_task_acquire() and
> bpf_task_kptr_get() is that for bpf_task_kptr_get(), we're passing a
> pointer to a kptr which could be swapped out and invalidated at any
> moment by a bpf_kptr_xchg() elsewhere in the program.  That's why it's
> necessary to enter an RCU read region before calling READ_ONCE(), and
> why we have to do a refcount_inc_not_zero() rather than just a
> refcount_inc() as we do in bpf_task_acquire().
>

Swapping out is not a problem if the object is RCU protected (which
again, is a requirement for kptr_ref if you wish to support kptr_get,
otherwise it's not needed).

The double pointer ugliness is to allow sleepable programs to safely
do the rcu_dereference/READ_ONCE inside the rcu_read_lock that the
kptr_get helper holds. In non-sleepable programs RCU read lock is
already held, so technically we can just load and pass it to
refcount_inc_not_zero and make it work.

All of this stuff will be gone once we have explicit BPF RCU read
sections, the kptr will be tagged __rcu and while it won't work with
all helpers but those that can deal with refcount == 0, it would be
safer to operate on compared to the PTR_UNTRUSTED case (like the
normal load is right now for kptr_ref), and it also makes it easier to
rely on data read from the object.

Martin KaFai Lau Oct. 3, 2022, 7:53 p.m. UTC | #4

On 10/3/22 8:56 AM, Kumar Kartikeya Dwivedi wrote:
>>> Also, could you include a test to make sure sleepable programs cannot
>>> call bpf_task_acquire? It seems to assume RCU read lock is held while
>>> that may not be true. If already not possible, maybe a WARN_ON_ONCE
>>> inside the helper to ensure future cases don't creep in.
>>
>> I don't _think_ it's unsafe for a sleepable program to call
>> bpf_task_acquire(). My understanding is that the struct task_struct *
>> parameter to bpf_task_acquire() is not PTR_UNTRUSTED, so it's safe to
>> dereference directly in the kfunc. The implicit assumption here is that
>> the task was either passed to the BPF program (which is calling
>> bpf_task_acquire()) from the main kernel in something like a trace or
>> struct_ops callback, or it was a referenced kptr that was removed from a
>> map with bpf_kptr_xchg(), and is now owned by the BPF program. Given
>> that the ptr type is not PTR_UNTRUSTED, it seemed correct to assume that
>> the task was valid in bpf_task_acquire() regardless of whether we were
>> in an RCU read region or not, but please let me know if I'm wrong about
> 
> I don't think it's correct. You can just walk arbitrary structures and
> obtain a normal PTR_TO_BTF_ID that looks seemingly ok to the verifier
> but has no lifetime guarantees. It's a separate pre-existing problem
> unrelated to your series [0]. PTR_UNTRUSTED is not set for those cases
> currently.
> 
> So the argument to your bpf_task_acquire may already be freed by then.
> This issue would be exacerbated in sleepable BPF programs, where RCU
> read lock is not held, so some cases of pointer walking where it may
> be safe no longer holds.
> 
> I am planning to clean this up, but I'd still prefer if we don't allow
> this helper in sleepable programs, yet. kptr_get is ok to allow.
> Once you have explicit BPF RCU read sections, sleepable programs can
> take it, do loads, and operate on the RCU pointer safely until they
> invalidate it with the outermost bpf_rcu_read_unlock. It's needed for
> local kptrs as well, not just this. I plan to post this very soon, so
> we should be able to fix it up in the current cycle after landing your
> series.
> 
> __rcu tags in the kernel will automatically be understood by the
> verifier and for the majority of the correctly annotated cases this
> will work fine and not break tracing programs.
> 
> [0]: https://lore.kernel.org/bpf/CAADnVQJxe1QT5bvcsrZQCLeZ6kei6WEESP5bDXf_5qcB2Bb6_Q@mail.gmail.com
> 
>> that.  Other kfuncs I saw such as bpf_xdp_ct_lookup() assumed that the
>> parameter passed by the BPF program (which itself was passing on a
>> pointer given to it by the main kernel) is valid as well.
> 
> Yeah, but the CT API doesn't assume validity of random PTR_TO_BTF_ID,
> it requires KF_TRUSTED_ARGS which forces them to have ref_obj_id != 0.

Other than ref_obj_id != 0, could the PTR_TO_BTF_ID obtained through 
btf_ctx_access be marked as trusted (e.g. the ctx[0] in the selftest here)
and bpf_task_acquire() will require KF_TRUSTED_ARGS? would it be safe in general?

Kumar Kartikeya Dwivedi Oct. 3, 2022, 9:03 p.m. UTC | #5

On Mon, 3 Oct 2022 at 21:53, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 10/3/22 8:56 AM, Kumar Kartikeya Dwivedi wrote:
> >>> Also, could you include a test to make sure sleepable programs cannot
> >>> call bpf_task_acquire? It seems to assume RCU read lock is held while
> >>> that may not be true. If already not possible, maybe a WARN_ON_ONCE
> >>> inside the helper to ensure future cases don't creep in.
> >>
> >> I don't _think_ it's unsafe for a sleepable program to call
> >> bpf_task_acquire(). My understanding is that the struct task_struct *
> >> parameter to bpf_task_acquire() is not PTR_UNTRUSTED, so it's safe to
> >> dereference directly in the kfunc. The implicit assumption here is that
> >> the task was either passed to the BPF program (which is calling
> >> bpf_task_acquire()) from the main kernel in something like a trace or
> >> struct_ops callback, or it was a referenced kptr that was removed from a
> >> map with bpf_kptr_xchg(), and is now owned by the BPF program. Given
> >> that the ptr type is not PTR_UNTRUSTED, it seemed correct to assume that
> >> the task was valid in bpf_task_acquire() regardless of whether we were
> >> in an RCU read region or not, but please let me know if I'm wrong about
> >
> > I don't think it's correct. You can just walk arbitrary structures and
> > obtain a normal PTR_TO_BTF_ID that looks seemingly ok to the verifier
> > but has no lifetime guarantees. It's a separate pre-existing problem
> > unrelated to your series [0]. PTR_UNTRUSTED is not set for those cases
> > currently.
> >
> > So the argument to your bpf_task_acquire may already be freed by then.
> > This issue would be exacerbated in sleepable BPF programs, where RCU
> > read lock is not held, so some cases of pointer walking where it may
> > be safe no longer holds.
> >
> > I am planning to clean this up, but I'd still prefer if we don't allow
> > this helper in sleepable programs, yet. kptr_get is ok to allow.
> > Once you have explicit BPF RCU read sections, sleepable programs can
> > take it, do loads, and operate on the RCU pointer safely until they
> > invalidate it with the outermost bpf_rcu_read_unlock. It's needed for
> > local kptrs as well, not just this. I plan to post this very soon, so
> > we should be able to fix it up in the current cycle after landing your
> > series.
> >
> > __rcu tags in the kernel will automatically be understood by the
> > verifier and for the majority of the correctly annotated cases this
> > will work fine and not break tracing programs.
> >
> > [0]: https://lore.kernel.org/bpf/CAADnVQJxe1QT5bvcsrZQCLeZ6kei6WEESP5bDXf_5qcB2Bb6_Q@mail.gmail.com
> >
> >> that.  Other kfuncs I saw such as bpf_xdp_ct_lookup() assumed that the
> >> parameter passed by the BPF program (which itself was passing on a
> >> pointer given to it by the main kernel) is valid as well.
> >
> > Yeah, but the CT API doesn't assume validity of random PTR_TO_BTF_ID,
> > it requires KF_TRUSTED_ARGS which forces them to have ref_obj_id != 0.
>
> Other than ref_obj_id != 0, could the PTR_TO_BTF_ID obtained through
> btf_ctx_access be marked as trusted (e.g. the ctx[0] in the selftest here)
> and bpf_task_acquire() will require KF_TRUSTED_ARGS? would it be safe in general?
>

Recently eed807f62610 ("bpf: Tweak definition of KF_TRUSTED_ARGS")
relaxed things a bit, now that constraint is only for PTR_TO_BTF_ID
and it allows other types without ref_obj_id > 0.

But you're right, ctx loads should usually be trusted, this is the
current plan (also elaborated a bit in that link [0]), usually that is
true, an exception is free path as you noted in your reply for patch 1
(and especially fexit path where object may not even be alive
anymore). There are some details to work out, but eventually I'd want
to force KF_TRUSTED_ARGS by default for all kfuncs, and we only make
exceptions in some special cases by having some KF_UNSAFE flag or
__unsafe tag for arguments. It's becoming harder to think about all
permutations if we're too permissive by default in terms of acceptable
arguments.

David Vernet Oct. 3, 2022, 9:59 p.m. UTC | #6

On Mon, Oct 03, 2022 at 11:03:01PM +0200, Kumar Kartikeya Dwivedi wrote:
> On Mon, 3 Oct 2022 at 21:53, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 10/3/22 8:56 AM, Kumar Kartikeya Dwivedi wrote:
> > >>> Also, could you include a test to make sure sleepable programs cannot
> > >>> call bpf_task_acquire? It seems to assume RCU read lock is held while
> > >>> that may not be true. If already not possible, maybe a WARN_ON_ONCE
> > >>> inside the helper to ensure future cases don't creep in.
> > >>
> > >> I don't _think_ it's unsafe for a sleepable program to call
> > >> bpf_task_acquire(). My understanding is that the struct task_struct *
> > >> parameter to bpf_task_acquire() is not PTR_UNTRUSTED, so it's safe to
> > >> dereference directly in the kfunc. The implicit assumption here is that
> > >> the task was either passed to the BPF program (which is calling
> > >> bpf_task_acquire()) from the main kernel in something like a trace or
> > >> struct_ops callback, or it was a referenced kptr that was removed from a
> > >> map with bpf_kptr_xchg(), and is now owned by the BPF program. Given
> > >> that the ptr type is not PTR_UNTRUSTED, it seemed correct to assume that
> > >> the task was valid in bpf_task_acquire() regardless of whether we were
> > >> in an RCU read region or not, but please let me know if I'm wrong about
> > >
> > > I don't think it's correct. You can just walk arbitrary structures and
> > > obtain a normal PTR_TO_BTF_ID that looks seemingly ok to the verifier
> > > but has no lifetime guarantees. It's a separate pre-existing problem
> > > unrelated to your series [0]. PTR_UNTRUSTED is not set for those cases
> > > currently.
> > >
> > > So the argument to your bpf_task_acquire may already be freed by then.
> > > This issue would be exacerbated in sleepable BPF programs, where RCU
> > > read lock is not held, so some cases of pointer walking where it may
> > > be safe no longer holds.
> > >
> > > I am planning to clean this up, but I'd still prefer if we don't allow
> > > this helper in sleepable programs, yet. kptr_get is ok to allow.
> > > Once you have explicit BPF RCU read sections, sleepable programs can
> > > take it, do loads, and operate on the RCU pointer safely until they
> > > invalidate it with the outermost bpf_rcu_read_unlock. It's needed for
> > > local kptrs as well, not just this. I plan to post this very soon, so
> > > we should be able to fix it up in the current cycle after landing your
> > > series.
> > >
> > > __rcu tags in the kernel will automatically be understood by the
> > > verifier and for the majority of the correctly annotated cases this
> > > will work fine and not break tracing programs.
> > >
> > > [0]: https://lore.kernel.org/bpf/CAADnVQJxe1QT5bvcsrZQCLeZ6kei6WEESP5bDXf_5qcB2Bb6_Q@mail.gmail.com
> > >
> > >> that.  Other kfuncs I saw such as bpf_xdp_ct_lookup() assumed that the
> > >> parameter passed by the BPF program (which itself was passing on a
> > >> pointer given to it by the main kernel) is valid as well.
> > >
> > > Yeah, but the CT API doesn't assume validity of random PTR_TO_BTF_ID,
> > > it requires KF_TRUSTED_ARGS which forces them to have ref_obj_id != 0.
> >
> > Other than ref_obj_id != 0, could the PTR_TO_BTF_ID obtained through
> > btf_ctx_access be marked as trusted (e.g. the ctx[0] in the selftest here)
> > and bpf_task_acquire() will require KF_TRUSTED_ARGS? would it be safe in general?
> >
> 
> Recently eed807f62610 ("bpf: Tweak definition of KF_TRUSTED_ARGS")
> relaxed things a bit, now that constraint is only for PTR_TO_BTF_ID
> and it allows other types without ref_obj_id > 0.
> 
> But you're right, ctx loads should usually be trusted, this is the
> current plan (also elaborated a bit in that link [0]), usually that is
> true, an exception is free path as you noted in your reply for patch 1
> (and especially fexit path where object may not even be alive
> anymore). There are some details to work out, but eventually I'd want
> to force KF_TRUSTED_ARGS by default for all kfuncs, and we only make
> exceptions in some special cases by having some KF_UNSAFE flag or
> __unsafe tag for arguments. It's becoming harder to think about all
> permutations if we're too permissive by default in terms of acceptable
> arguments.

Thanks for providing additional context, Kumar. So what do we want to do
for this patch set? IMO it doesn't seem useful to restrict
bpf_kfunc_acquire() to only be callable by non-sleepable programs if our
goal is to avoid crashes for nested task structs. We could easily
accidentally crash if e.g. those pointers are NULL, or someone is doing
something weird like stashing some extra flag bits in unused portions of
the pointer which are masked out when it's actually dereferenced
regardless of whether we're in RCU.  Trusting ctx loads sounds like the
right approach, barring some of the challenges you pointed out such as
dealing with fexit paths after free where the object may not be valid
anymore.

In general, it seems like we should maybe decide on what our policy
should be for kfuncs until we can wire up whatever we need to properly
trust ctx.

Kumar Kartikeya Dwivedi Oct. 3, 2022, 10:22 p.m. UTC | #7

On Mon, 3 Oct 2022 at 23:59, David Vernet <void@manifault.com> wrote:
>
> On Mon, Oct 03, 2022 at 11:03:01PM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Mon, 3 Oct 2022 at 21:53, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >
> > > On 10/3/22 8:56 AM, Kumar Kartikeya Dwivedi wrote:
> > > >>> Also, could you include a test to make sure sleepable programs cannot
> > > >>> call bpf_task_acquire? It seems to assume RCU read lock is held while
> > > >>> that may not be true. If already not possible, maybe a WARN_ON_ONCE
> > > >>> inside the helper to ensure future cases don't creep in.
> > > >>
> > > >> I don't _think_ it's unsafe for a sleepable program to call
> > > >> bpf_task_acquire(). My understanding is that the struct task_struct *
> > > >> parameter to bpf_task_acquire() is not PTR_UNTRUSTED, so it's safe to
> > > >> dereference directly in the kfunc. The implicit assumption here is that
> > > >> the task was either passed to the BPF program (which is calling
> > > >> bpf_task_acquire()) from the main kernel in something like a trace or
> > > >> struct_ops callback, or it was a referenced kptr that was removed from a
> > > >> map with bpf_kptr_xchg(), and is now owned by the BPF program. Given
> > > >> that the ptr type is not PTR_UNTRUSTED, it seemed correct to assume that
> > > >> the task was valid in bpf_task_acquire() regardless of whether we were
> > > >> in an RCU read region or not, but please let me know if I'm wrong about
> > > >
> > > > I don't think it's correct. You can just walk arbitrary structures and
> > > > obtain a normal PTR_TO_BTF_ID that looks seemingly ok to the verifier
> > > > but has no lifetime guarantees. It's a separate pre-existing problem
> > > > unrelated to your series [0]. PTR_UNTRUSTED is not set for those cases
> > > > currently.
> > > >
> > > > So the argument to your bpf_task_acquire may already be freed by then.
> > > > This issue would be exacerbated in sleepable BPF programs, where RCU
> > > > read lock is not held, so some cases of pointer walking where it may
> > > > be safe no longer holds.
> > > >
> > > > I am planning to clean this up, but I'd still prefer if we don't allow
> > > > this helper in sleepable programs, yet. kptr_get is ok to allow.
> > > > Once you have explicit BPF RCU read sections, sleepable programs can
> > > > take it, do loads, and operate on the RCU pointer safely until they
> > > > invalidate it with the outermost bpf_rcu_read_unlock. It's needed for
> > > > local kptrs as well, not just this. I plan to post this very soon, so
> > > > we should be able to fix it up in the current cycle after landing your
> > > > series.
> > > >
> > > > __rcu tags in the kernel will automatically be understood by the
> > > > verifier and for the majority of the correctly annotated cases this
> > > > will work fine and not break tracing programs.
> > > >
> > > > [0]: https://lore.kernel.org/bpf/CAADnVQJxe1QT5bvcsrZQCLeZ6kei6WEESP5bDXf_5qcB2Bb6_Q@mail.gmail.com
> > > >
> > > >> that.  Other kfuncs I saw such as bpf_xdp_ct_lookup() assumed that the
> > > >> parameter passed by the BPF program (which itself was passing on a
> > > >> pointer given to it by the main kernel) is valid as well.
> > > >
> > > > Yeah, but the CT API doesn't assume validity of random PTR_TO_BTF_ID,
> > > > it requires KF_TRUSTED_ARGS which forces them to have ref_obj_id != 0.
> > >
> > > Other than ref_obj_id != 0, could the PTR_TO_BTF_ID obtained through
> > > btf_ctx_access be marked as trusted (e.g. the ctx[0] in the selftest here)
> > > and bpf_task_acquire() will require KF_TRUSTED_ARGS? would it be safe in general?
> > >
> >
> > Recently eed807f62610 ("bpf: Tweak definition of KF_TRUSTED_ARGS")
> > relaxed things a bit, now that constraint is only for PTR_TO_BTF_ID
> > and it allows other types without ref_obj_id > 0.
> >
> > But you're right, ctx loads should usually be trusted, this is the
> > current plan (also elaborated a bit in that link [0]), usually that is
> > true, an exception is free path as you noted in your reply for patch 1
> > (and especially fexit path where object may not even be alive
> > anymore). There are some details to work out, but eventually I'd want
> > to force KF_TRUSTED_ARGS by default for all kfuncs, and we only make
> > exceptions in some special cases by having some KF_UNSAFE flag or
> > __unsafe tag for arguments. It's becoming harder to think about all
> > permutations if we're too permissive by default in terms of acceptable
> > arguments.
>
> Thanks for providing additional context, Kumar. So what do we want to do
> for this patch set? IMO it doesn't seem useful to restrict
> bpf_kfunc_acquire() to only be callable by non-sleepable programs if our
> goal is to avoid crashes for nested task structs. We could easily
> accidentally crash if e.g. those pointers are NULL, or someone is doing
> something weird like stashing some extra flag bits in unused portions of
> the pointer which are masked out when it's actually dereferenced
> regardless of whether we're in RCU.  Trusting ctx loads sounds like the
> right approach, barring some of the challenges you pointed out such as
> dealing with fexit paths after free where the object may not be valid
> anymore.
>
> In general, it seems like we should maybe decide on what our policy
> should be for kfuncs until we can wire up whatever we need to properly
> trust ctx.

Well, we could add it now and work towards closing the gaps after
this, especially if bpf_task_acquire is really only useful in
sleepable programs where it works on the tracing args. A lot of other
kfuncs need these fixes as well, so it's a general problem and not
specific to this set. I am not very familiar with your exact use case.
Hopefully when it is fixed this particular case won't really break, if
you only use the tracepoint argument.

It is true that waiting for all the fixes will unnecessarily stall
this, it is not clear how each of the issues will be addressed either.

Later its use can be made conditional in sleepable programs for
trusted and rcu tagged pointers under appropriate RCU read lock. I
will try to prioritize sending it out so that we resolve this soon.

Martin KaFai Lau Oct. 4, 2022, 5:10 a.m. UTC | #8

On 10/3/22 2:03 PM, Kumar Kartikeya Dwivedi wrote:
> On Mon, 3 Oct 2022 at 21:53, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 10/3/22 8:56 AM, Kumar Kartikeya Dwivedi wrote:
>>>>> Also, could you include a test to make sure sleepable programs cannot
>>>>> call bpf_task_acquire? It seems to assume RCU read lock is held while
>>>>> that may not be true. If already not possible, maybe a WARN_ON_ONCE
>>>>> inside the helper to ensure future cases don't creep in.
>>>>
>>>> I don't _think_ it's unsafe for a sleepable program to call
>>>> bpf_task_acquire(). My understanding is that the struct task_struct *
>>>> parameter to bpf_task_acquire() is not PTR_UNTRUSTED, so it's safe to
>>>> dereference directly in the kfunc. The implicit assumption here is that
>>>> the task was either passed to the BPF program (which is calling
>>>> bpf_task_acquire()) from the main kernel in something like a trace or
>>>> struct_ops callback, or it was a referenced kptr that was removed from a
>>>> map with bpf_kptr_xchg(), and is now owned by the BPF program. Given
>>>> that the ptr type is not PTR_UNTRUSTED, it seemed correct to assume that
>>>> the task was valid in bpf_task_acquire() regardless of whether we were
>>>> in an RCU read region or not, but please let me know if I'm wrong about
>>>
>>> I don't think it's correct. You can just walk arbitrary structures and
>>> obtain a normal PTR_TO_BTF_ID that looks seemingly ok to the verifier
>>> but has no lifetime guarantees. It's a separate pre-existing problem
>>> unrelated to your series [0]. PTR_UNTRUSTED is not set for those cases
>>> currently.
>>>
>>> So the argument to your bpf_task_acquire may already be freed by then.
>>> This issue would be exacerbated in sleepable BPF programs, where RCU
>>> read lock is not held, so some cases of pointer walking where it may
>>> be safe no longer holds.
>>>
>>> I am planning to clean this up, but I'd still prefer if we don't allow
>>> this helper in sleepable programs, yet. kptr_get is ok to allow.
>>> Once you have explicit BPF RCU read sections, sleepable programs can
>>> take it, do loads, and operate on the RCU pointer safely until they
>>> invalidate it with the outermost bpf_rcu_read_unlock. It's needed for
>>> local kptrs as well, not just this. I plan to post this very soon, so
>>> we should be able to fix it up in the current cycle after landing your
>>> series.
>>>
>>> __rcu tags in the kernel will automatically be understood by the
>>> verifier and for the majority of the correctly annotated cases this
>>> will work fine and not break tracing programs.
>>>
>>> [0]: https://lore.kernel.org/bpf/CAADnVQJxe1QT5bvcsrZQCLeZ6kei6WEESP5bDXf_5qcB2Bb6_Q@mail.gmail.com
>>>
>>>> that.  Other kfuncs I saw such as bpf_xdp_ct_lookup() assumed that the
>>>> parameter passed by the BPF program (which itself was passing on a
>>>> pointer given to it by the main kernel) is valid as well.
>>>
>>> Yeah, but the CT API doesn't assume validity of random PTR_TO_BTF_ID,
>>> it requires KF_TRUSTED_ARGS which forces them to have ref_obj_id != 0.
>>
>> Other than ref_obj_id != 0, could the PTR_TO_BTF_ID obtained through
>> btf_ctx_access be marked as trusted (e.g. the ctx[0] in the selftest here)
>> and bpf_task_acquire() will require KF_TRUSTED_ARGS? would it be safe in general?
>>
> 
> Recently eed807f62610 ("bpf: Tweak definition of KF_TRUSTED_ARGS")
> relaxed things a bit, now that constraint is only for PTR_TO_BTF_ID
> and it allows other types without ref_obj_id > 0.
> 
> But you're right, ctx loads should usually be trusted, this is the
> current plan (also elaborated a bit in that link [0]), usually that is
> true,

Ah, it seems you have some of the work already.  Do you have it in somewhere 
like github?  Is the 'trusting ctx loads' piece small enough that can be landed 
together with this set first?

> an exception is free path as you noted in your reply for patch 1
> (and especially fexit path where object may not even be alive
> anymore). There are some details to work out, but eventually I'd want
> to force KF_TRUSTED_ARGS by default for all kfuncs, and we only make
> exceptions in some special cases by having some KF_UNSAFE flag or
> __unsafe tag for arguments. It's becoming harder to think about all
> permutations if we're too permissive by default in terms of acceptable
> arguments.

David Vernet Oct. 4, 2022, 3:10 p.m. UTC | #9

On Tue, Oct 04, 2022 at 12:22:08AM +0200, Kumar Kartikeya Dwivedi wrote:
> > Thanks for providing additional context, Kumar. So what do we want to do
> > for this patch set? IMO it doesn't seem useful to restrict
> > bpf_kfunc_acquire() to only be callable by non-sleepable programs if our
> > goal is to avoid crashes for nested task structs. We could easily
> > accidentally crash if e.g. those pointers are NULL, or someone is doing
> > something weird like stashing some extra flag bits in unused portions of
> > the pointer which are masked out when it's actually dereferenced
> > regardless of whether we're in RCU.  Trusting ctx loads sounds like the
> > right approach, barring some of the challenges you pointed out such as
> > dealing with fexit paths after free where the object may not be valid
> > anymore.
> >
> > In general, it seems like we should maybe decide on what our policy
> > should be for kfuncs until we can wire up whatever we need to properly
> > trust ctx.
> 
> Well, we could add it now and work towards closing the gaps after
> this, especially if bpf_task_acquire is really only useful in
> sleepable programs where it works on the tracing args. A lot of other
> kfuncs need these fixes as well, so it's a general problem and not
> specific to this set. I am not very familiar with your exact use case.
> Hopefully when it is fixed this particular case won't really break, if
> you only use the tracepoint argument.

I'm also interested in using this with struct_ops, not just tracing. I
think that struct_ops should be totally fine though, and easier to
reason about than tracing as we just have to make sure that a few
specific callbacks are always passed a valid, referenced task, rather
than e.g. worrying about fexit on __put_task_struct().

I'm fine with adding this now and working towards closing the gaps
later, though I'd like to hear what Martin, Alexei, and the rest of the
BPF maintainers think. I think Martin asked if there was any preliminary
work you'd already done that we could try to tie into this patch set,
and I'm similarly curious.

> It is true that waiting for all the fixes will unnecessarily stall
> this, it is not clear how each of the issues will be addressed either.
> 
> Later its use can be made conditional in sleepable programs for
> trusted and rcu tagged pointers under appropriate RCU read lock. I
> will try to prioritize sending it out so that we resolve this soon.

Much appreciated!

Kumar Kartikeya Dwivedi Oct. 11, 2022, 2:29 a.m. UTC | #10

On Tue, 4 Oct 2022 at 20:40, David Vernet <void@manifault.com> wrote:
>
> On Tue, Oct 04, 2022 at 12:22:08AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > Thanks for providing additional context, Kumar. So what do we want to do
> > > for this patch set? IMO it doesn't seem useful to restrict
> > > bpf_kfunc_acquire() to only be callable by non-sleepable programs if our
> > > goal is to avoid crashes for nested task structs. We could easily
> > > accidentally crash if e.g. those pointers are NULL, or someone is doing
> > > something weird like stashing some extra flag bits in unused portions of
> > > the pointer which are masked out when it's actually dereferenced
> > > regardless of whether we're in RCU.  Trusting ctx loads sounds like the
> > > right approach, barring some of the challenges you pointed out such as
> > > dealing with fexit paths after free where the object may not be valid
> > > anymore.
> > >
> > > In general, it seems like we should maybe decide on what our policy
> > > should be for kfuncs until we can wire up whatever we need to properly
> > > trust ctx.
> >
> > Well, we could add it now and work towards closing the gaps after
> > this, especially if bpf_task_acquire is really only useful in
> > sleepable programs where it works on the tracing args. A lot of other
> > kfuncs need these fixes as well, so it's a general problem and not
> > specific to this set. I am not very familiar with your exact use case.
> > Hopefully when it is fixed this particular case won't really break, if
> > you only use the tracepoint argument.
>
> I'm also interested in using this with struct_ops, not just tracing. I
> think that struct_ops should be totally fine though, and easier to
> reason about than tracing as we just have to make sure that a few
> specific callbacks are always passed a valid, referenced task, rather
> than e.g. worrying about fexit on __put_task_struct().
>
> I'm fine with adding this now and working towards closing the gaps
> later, though I'd like to hear what Martin, Alexei, and the rest of the
> BPF maintainers think. I think Martin asked if there was any preliminary
> work you'd already done that we could try to tie into this patch set,
> and I'm similarly curious.
>

It's mostly a few experimental patches in my local tree, so nowhere
close to completion. Ofcourse doing it properly will be a lot of work,
but I will be happy to help with reviews if you want to focus on
pointers loaded from ctx for now and make that part of this set, while
not permitting any other cases. It should not be very difficult to add
just that.

So you can set KF_TRUSTED_ARGS for your kfunc, then make it work for
PTR_TO_BTF_ID where it either has PTR_TRUSTED, ref_obj_id > 0, or
both. Just that PTR_TRUSTED is lost for the destination reg as soon as
btf_struct_access is used to walk pointers (unlike PTR_UNTRUSTED which
is inherited). Note that you don't _set_ PTR_UNTRUSTED here for the
dst_reg.

This should enable your immediate use case, while also being useful
for future work that builds on it. It should also preserve backwards
compatibility with existing stable helpers. You set PTR_TRUSTED when
you access ctx of tracing or struct_ops etc. All this will be handled
in is_valid_access callback/btf_ctx_access and setting the flag in
bpf_insn_access_aux. Unless I missed some detail/corner case it won't
be a lot of code.

If that turns out to be too restrictive/coarse for your use case, we
can just go with this as is for now.

Does that sound good? Any other comments/opinions?

David Vernet Oct. 11, 2022, 2:40 a.m. UTC | #11

On Tue, Oct 11, 2022 at 07:59:29AM +0530, Kumar Kartikeya Dwivedi wrote:
> On Tue, 4 Oct 2022 at 20:40, David Vernet <void@manifault.com> wrote:
> >
> > On Tue, Oct 04, 2022 at 12:22:08AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > Thanks for providing additional context, Kumar. So what do we want to do
> > > > for this patch set? IMO it doesn't seem useful to restrict
> > > > bpf_kfunc_acquire() to only be callable by non-sleepable programs if our
> > > > goal is to avoid crashes for nested task structs. We could easily
> > > > accidentally crash if e.g. those pointers are NULL, or someone is doing
> > > > something weird like stashing some extra flag bits in unused portions of
> > > > the pointer which are masked out when it's actually dereferenced
> > > > regardless of whether we're in RCU.  Trusting ctx loads sounds like the
> > > > right approach, barring some of the challenges you pointed out such as
> > > > dealing with fexit paths after free where the object may not be valid
> > > > anymore.
> > > >
> > > > In general, it seems like we should maybe decide on what our policy
> > > > should be for kfuncs until we can wire up whatever we need to properly
> > > > trust ctx.
> > >
> > > Well, we could add it now and work towards closing the gaps after
> > > this, especially if bpf_task_acquire is really only useful in
> > > sleepable programs where it works on the tracing args. A lot of other
> > > kfuncs need these fixes as well, so it's a general problem and not
> > > specific to this set. I am not very familiar with your exact use case.
> > > Hopefully when it is fixed this particular case won't really break, if
> > > you only use the tracepoint argument.
> >
> > I'm also interested in using this with struct_ops, not just tracing. I
> > think that struct_ops should be totally fine though, and easier to
> > reason about than tracing as we just have to make sure that a few
> > specific callbacks are always passed a valid, referenced task, rather
> > than e.g. worrying about fexit on __put_task_struct().
> >
> > I'm fine with adding this now and working towards closing the gaps
> > later, though I'd like to hear what Martin, Alexei, and the rest of the
> > BPF maintainers think. I think Martin asked if there was any preliminary
> > work you'd already done that we could try to tie into this patch set,
> > and I'm similarly curious.
> >
> 
> It's mostly a few experimental patches in my local tree, so nowhere
> close to completion. Ofcourse doing it properly will be a lot of work,
> but I will be happy to help with reviews if you want to focus on
> pointers loaded from ctx for now and make that part of this set, while
> not permitting any other cases. It should not be very difficult to add
> just that.

Sounds good, I'll include that in the v3 iteration of the patch set. We
can worry about black-listing certain unsafe tracing programs, and doing
whatever other fancier things we can think of as well later.

> So you can set KF_TRUSTED_ARGS for your kfunc, then make it work for
> PTR_TO_BTF_ID where it either has PTR_TRUSTED, ref_obj_id > 0, or
> both. Just that PTR_TRUSTED is lost for the destination reg as soon as
> btf_struct_access is used to walk pointers (unlike PTR_UNTRUSTED which
> is inherited). Note that you don't _set_ PTR_UNTRUSTED here for the
> dst_reg.

Makes sense, I think I know what needs to change and it should only be a
few small places in the verifier (plus the testing we'll add to validate
it).

> This should enable your immediate use case, while also being useful
> for future work that builds on it. It should also preserve backwards
> compatibility with existing stable helpers. You set PTR_TRUSTED when
> you access ctx of tracing or struct_ops etc. All this will be handled
> in is_valid_access callback/btf_ctx_access and setting the flag in
> bpf_insn_access_aux. Unless I missed some detail/corner case it won't
> be a lot of code.
> 
> If that turns out to be too restrictive/coarse for your use case, we
> can just go with this as is for now.

Alright, works for me. Let me mess around with this and see if it's as
easy as we think it is. If so, v3 will include those changes. If not,
it'll just include a couple of small fixes that Martin pointed out on
the v2 revision.

Thanks for reviewing these patches.

- David

[v2,2/2] bpf/selftests: Add selftests for new task kfuncs

Checks

Commit Message

Comments

Patch