diff mbox series

[V3,bpf-next,1/2] bpf: add bpf_task_get_cgroup kfunc

Message ID 20240319050302.1085006-1-josef@netflix.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [V3,bpf-next,1/2] bpf: add bpf_task_get_cgroup kfunc | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 985 this patch: 986
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 13 of 13 maintainers
netdev/build_clang success Errors and warnings before: 957 this patch: 957
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 1002 this patch: 1003
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 38 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization

Commit Message

Jose Fernandez March 19, 2024, 5:03 a.m. UTC
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.

The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.

Signed-off-by: Jose Fernandez <josef@netflix.com>
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Stanislav Fomichev <sdf@google.com>
---
V2 -> V3: No changes
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID

 kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)


base-commit: c733239f8f530872a1f80d8c45dcafbaff368737

Comments

Alexei Starovoitov March 20, 2024, 6:03 a.m. UTC | #1
On Mon, Mar 18, 2024 at 10:04 PM Jose Fernandez <josef@netflix.com> wrote:
>
> This patch enhances the BPF helpers by adding a kfunc to retrieve the
> cgroup v2 of a task, addressing a previous limitation where only
> bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
> particularly useful for scenarios where obtaining the cgroup ID of a
> task other than the "current" one is necessary, which the existing
> bpf_get_current_cgroup_id helper cannot accommodate. A specific use
> case at Netflix involved the sched_switch tracepoint, where we had to
> get the cgroup IDs of both the prev and next tasks.
>
> The bpf_task_get_cgroup kfunc acquires and returns a reference to a
> task's default cgroup, ensuring thread-safe access by correctly
> implementing RCU read locking and unlocking. It leverages the existing
> cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
>
> Signed-off-by: Jose Fernandez <josef@netflix.com>
> Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> Acked-by: Stanislav Fomichev <sdf@google.com>
> ---
> V2 -> V3: No changes
> V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
>
>  kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a89587859571..bbd19d5eedb6 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
>                 return NULL;
>         return cgrp;
>  }
> +
> +/**
> + * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
> + * @task: The target task
> + *
> + * This function returns the task's default cgroup, primarily
> + * designed for use with cgroup v2. In cgroup v1, the concept of default
> + * cgroup varies by subsystem, and while this function will work with
> + * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
> + * A cgroup returned by this kfunc which is not subsequently stored in a
> + * map, must be released by calling bpf_cgroup_release().
> + *
> + * Return: On success, the cgroup is returned. On failure, NULL is returned.
> + */
> +__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
> +{
> +       struct cgroup *cgrp;
> +
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(task);
> +       if (!cgroup_tryget(cgrp))
> +               cgrp = NULL;
> +       rcu_read_unlock();
> +       return cgrp;
> +}

Tejun,

Does this patch look good to you?
Tejun Heo March 20, 2024, 2:41 p.m. UTC | #2
On Mon, Mar 18, 2024 at 11:03:01PM -0600, Jose Fernandez wrote:
> This patch enhances the BPF helpers by adding a kfunc to retrieve the
> cgroup v2 of a task, addressing a previous limitation where only
> bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
> particularly useful for scenarios where obtaining the cgroup ID of a
> task other than the "current" one is necessary, which the existing
> bpf_get_current_cgroup_id helper cannot accommodate. A specific use
> case at Netflix involved the sched_switch tracepoint, where we had to
> get the cgroup IDs of both the prev and next tasks.
> 
> The bpf_task_get_cgroup kfunc acquires and returns a reference to a
> task's default cgroup, ensuring thread-safe access by correctly
> implementing RCU read locking and unlocking. It leverages the existing
> cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
> 
> Signed-off-by: Jose Fernandez <josef@netflix.com>
> Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> Acked-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Tejun Heo <tj@kernel.org>

but some questions below

> +__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
> +{
> +	struct cgroup *cgrp;
> +
> +	rcu_read_lock();
> +	cgrp = task_dfl_cgroup(task);
> +	if (!cgroup_tryget(cgrp))
> +		cgrp = NULL;
> +	rcu_read_unlock();
> +	return cgrp;
> +}

So, as this is a lot easier in cgroup2, the above can probably be written
directly in BPF (untested and not sure the necessary annotations are in
place, so please take it with a big grain of salt):

	bpf_rcu_read_lock();
	cgrp = task->cgroups->dfl_cgrp;
	cgrp = bpf_cgroup_from_id(cgrp->kn.id);
	bpf_rcu_read_unlock();

If all you need is ID, it's even simpler:

	bpf_rcu_read_lock();
	cgrp_id = task->cgroups->dfl_cgrp->kn.id;
	bpf_rcu_read_unlock();

In the first example, it's not great that we go from task pointer to cgroup
pointer to ID and then back to acquired cgroup pointer. I wonder whether
what we really want is to support something like the following:

	bpf_rcu_read_lock();
	cgrp = bpf_cgroup_tryget(task->cgroups->dfl_cgrp);
	bpf_rcu_read_unlock();

What do you think?

Thanks.
Alexei Starovoitov March 20, 2024, 11:05 p.m. UTC | #3
On Wed, Mar 20, 2024 at 7:41 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Mon, Mar 18, 2024 at 11:03:01PM -0600, Jose Fernandez wrote:
> > This patch enhances the BPF helpers by adding a kfunc to retrieve the
> > cgroup v2 of a task, addressing a previous limitation where only
> > bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
> > particularly useful for scenarios where obtaining the cgroup ID of a
> > task other than the "current" one is necessary, which the existing
> > bpf_get_current_cgroup_id helper cannot accommodate. A specific use
> > case at Netflix involved the sched_switch tracepoint, where we had to
> > get the cgroup IDs of both the prev and next tasks.
> >
> > The bpf_task_get_cgroup kfunc acquires and returns a reference to a
> > task's default cgroup, ensuring thread-safe access by correctly
> > implementing RCU read locking and unlocking. It leverages the existing
> > cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
> >
> > Signed-off-by: Jose Fernandez <josef@netflix.com>
> > Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
> > Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > Acked-by: Stanislav Fomichev <sdf@google.com>
>
> Acked-by: Tejun Heo <tj@kernel.org>
>
> but some questions below
>
> > +__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
> > +{
> > +     struct cgroup *cgrp;
> > +
> > +     rcu_read_lock();
> > +     cgrp = task_dfl_cgroup(task);
> > +     if (!cgroup_tryget(cgrp))
> > +             cgrp = NULL;
> > +     rcu_read_unlock();
> > +     return cgrp;
> > +}
>
> So, as this is a lot easier in cgroup2, the above can probably be written
> directly in BPF (untested and not sure the necessary annotations are in
> place, so please take it with a big grain of salt):
>
>         bpf_rcu_read_lock();
>         cgrp = task->cgroups->dfl_cgrp;
>         cgrp = bpf_cgroup_from_id(cgrp->kn.id);
>         bpf_rcu_read_unlock();
>
> If all you need is ID, it's even simpler:
>
>         bpf_rcu_read_lock();
>         cgrp_id = task->cgroups->dfl_cgrp->kn.id;
>         bpf_rcu_read_unlock();

Good point.
Looks like this kfunc isn't really needed.

We even have the following in one of the selftests:
        /* simulate bpf_get_current_cgroup_id() helper */
        bpf_rcu_read_lock();
        cgroups = task->cgroups;
        if (!cgroups)
                goto unlock;
        cgroup_id = cgroups->dfl_cgrp->kn->id;
unlock:
        bpf_rcu_read_unlock();


> In the first example, it's not great that we go from task pointer to cgroup
> pointer to ID and then back to acquired cgroup pointer. I wonder whether
> what we really want is to support something like the following:
>
>         bpf_rcu_read_lock();
>         cgrp = bpf_cgroup_tryget(task->cgroups->dfl_cgrp);
>         bpf_rcu_read_unlock();

this one should work already.
that's existing bpf_cgroup_acquire().

pw-bot: cr
diff mbox series

Patch

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@  bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
 		return NULL;
 	return cgrp;
 }
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+	struct cgroup *cgrp;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(task);
+	if (!cgroup_tryget(cgrp))
+		cgrp = NULL;
+	rcu_read_unlock();
+	return cgrp;
+}
 #endif /* CONFIG_CGROUPS */
 
 /**
@@ -2573,6 +2598,7 @@  BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
 BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_throw)