| Message ID | 20210809235146.1663522-1-yhs@fb.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to: | BPF |
| Series | bpf: fix a couple of issues with syscall program |
| Context | Check | Description |
|---|---|---|
| netdev/cover_letter | success | Link |
| netdev/fixes_present | success | Link |
| netdev/patch_count | success | Link |
| netdev/tree_selection | success | Clearly marked for bpf |
| netdev/subject_prefix | success | Link |
| netdev/cc_maintainers | fail | 1 blamed authors not CCed: namhyung@kernel.org; 6 maintainers not CCed: netdev@vger.kernel.org john.fastabend@gmail.com namhyung@kernel.org songliubraving@fb.com kpsingh@kernel.org kafai@fb.com |
| netdev/source_inline | success | Was 0 now: 0 |
| netdev/verify_signedoff | success | Link |
| netdev/module_param | success | Was 0 now: 0 |
| netdev/build_32bit | success | Errors and warnings before: 6 this patch: 6 |
| netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
| netdev/verify_fixes | success | Link |
| netdev/checkpatch | fail | ERROR: Avoid using diff content in the commit message - patch(1) might not work |
| netdev/build_allmodconfig_warn | success | Errors and warnings before: 6 this patch: 6 |
| netdev/header_inline | success | Link |
On Mon, Aug 9, 2021 at 4:51 PM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, if the bpf_get_current_cgroup_id() or
> bpf_get_current_ancestor_cgroup_id() helper is called from sleepable
> programs, e.g., sleepable fentry/fmod_ret/fexit/lsm programs, an RCU
> warning may appear. For example, if I add the following hack to the
> test_progs/test_lsm sleepable fentry program test_sys_setdomainname:
>
>   --- a/tools/testing/selftests/bpf/progs/lsm.c
>   +++ b/tools/testing/selftests/bpf/progs/lsm.c
>   @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
>           int buf = 0;
>           long ret;
>
>   +       __u64 cg_id = bpf_get_current_cgroup_id();
>   +       if (cg_id == 1000)
>   +               copy_test++;
>   +
>           ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
>           if (len == -2 && ret == 0 && buf == 1234)
>                   copy_test++;
>
> I will hit the following RCU warning:
>
>   include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>   other info that might help us debug this:
>   rcu_scheduler_active = 2, debug_locks = 1
>   1 lock held by test_progs/260:
>   #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
>   stack backtrace:
>   CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>   Call Trace:
>    dump_stack_lvl+0x56/0x7b
>    bpf_get_current_cgroup_id+0x9c/0xb1
>    bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
>    bpf_trampoline_6442469132_0+0x2d/0x1000
>    __x64_sys_setdomainname+0x5/0x110
>    do_syscall_64+0x3a/0x80
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> I can get a similar warning using the bpf_get_current_ancestor_cgroup_id()
> helper. syzbot reported a similar issue in [1] for syscall programs.
>
> The helpers bpf_get_current_cgroup_id() and
> bpf_get_current_ancestor_cgroup_id() have the following callchain:
>
>   task_dfl_cgroup
>     task_css_set
>       task_css_set_check
>
> and we have
>
>   #define task_css_set_check(task, __c)                           \
>           rcu_dereference_check((task)->cgroups,                  \
>                   lockdep_is_held(&cgroup_mutex) ||               \
>                   lockdep_is_held(&css_set_lock) ||               \
>                   ((task)->flags & PF_EXITING) || (__c))
>
> Since cgroup_mutex/css_set_lock is not held, the task is not exiting,
> and the RCU read lock is not held, a warning will be issued. Note that
> sleepable BPF programs are protected by rcu_read_lock_trace().
>
> The above sleepable BPF programs are already protected by
> migrate_disable(). Adding rcu_read_lock() in these two helpers will
> silence the above warning.
>
> I marked the patch as fixing 95b861a7935b
> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
> which added bpf_get_current_ancestor_cgroup_id() to tracing programs
> in 5.14. I think backporting to 5.14 is probably good enough, as
> sleepable programs are not widely used.
>
> This patch should fix [1] as well, since a syscall program is a
> sleepable program protected with migrate_disable().
>
> [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
>
> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---

LGTM, thanks!

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  kernel/bpf/helpers.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62cf00383910..4567d2841133 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>  #ifdef CONFIG_CGROUPS
>  BPF_CALL_0(bpf_get_current_cgroup_id)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
> +
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
>
>         return cgroup_id(cgrp);
>  }
> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>
>  BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
>         struct cgroup *ancestor;
>
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
> +
>         ancestor = cgroup_ancestor(cgrp, ancestor_level);
>         if (!ancestor)
>                 return 0;
> --
> 2.30.2
>
[ +Paul ]

On 8/10/21 1:51 AM, Yonghong Song wrote:
[...]
> I will hit the following RCU warning:
>
>   include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
[...]
> The above sleepable BPF programs are already protected by
> migrate_disable(). Adding rcu_read_lock() in these two helpers will
> silence the above warning.
[...]
> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>  #ifdef CONFIG_CGROUPS
>  BPF_CALL_0(bpf_get_current_cgroup_id)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
> +
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
>
>         return cgroup_id(cgrp);

I'm a bit confused: if the cgroup object relies on rcu_read_lock() and not
rcu_read_lock_trace() context, then the above is racy, given you access the
memory via cgroup_id() outside of it; same below. If rcu_read_lock_trace()
is enough and the above is really just to silence the 'suspicious
rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
task_css_set_check() should be extended to check for the _trace() flavor
instead [meaning, as a cleaner workaround]. Which one is it?

>  }
> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>
>  BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
>         struct cgroup *ancestor;
>
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
> +
>         ancestor = cgroup_ancestor(cgrp, ancestor_level);
>         if (!ancestor)
>                 return 0;
>

Thanks,
Daniel
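[Editor's note] The cleaner workaround Daniel alludes to — teaching task_css_set_check() to also accept the RCU-tasks-trace reader section that __bpf_prog_enter_sleepable() takes — might look roughly like the sketch below. This is an illustration only, not a proposed or tested patch; it assumes the existing rcu_read_lock_trace_held() helper from <linux/rcupdate_trace.h>, and whether trace-RCU protection is actually sufficient for the css_set lifetime is exactly the open question in this thread.

```c
/* Sketch (untested): extend the lockdep condition in
 * include/linux/cgroup.h so a trace-RCU reader section also
 * satisfies the rcu_dereference_check() on (task)->cgroups.
 */
#include <linux/rcupdate_trace.h>

#define task_css_set_check(task, __c)					\
	rcu_dereference_check((task)->cgroups,				\
		rcu_read_lock_trace_held() ||				\
		lockdep_is_held(&cgroup_mutex) ||			\
		lockdep_is_held(&css_set_lock) ||			\
		((task)->flags & PF_EXITING) || (__c))
```

Note this would only silence the lockdep splat; it does not by itself guarantee that the css_set is kept alive across a trace-RCU grace period, which is why the question is directed at the RCU and cgroup maintainers.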
On Tue, Aug 10, 2021 at 10:08:58AM +0200, Daniel Borkmann wrote:
> [ +Paul ]
>
> On 8/10/21 1:51 AM, Yonghong Song wrote:
> [...]
> > I will hit the following RCU warning:
[...]
> > The above sleepable BPF programs are already protected by
> > migrate_disable(). Adding rcu_read_lock() in these two helpers will
> > silence the above warning.
[...]
> > +       rcu_read_lock();
> > +       cgrp = task_dfl_cgroup(current);
> > +       rcu_read_unlock();
> >
> >         return cgroup_id(cgrp);
>
> I'm a bit confused: if the cgroup object relies on rcu_read_lock() and not
> rcu_read_lock_trace() context, then the above is racy, given you access the
> memory via cgroup_id() outside of it; same below. If rcu_read_lock_trace()
> is enough and the above is really just to silence the 'suspicious
> rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
> task_css_set_check() should be extended to check for the _trace() flavor
> instead [meaning, as a cleaner workaround]. Which one is it?

At first glance, something like this would work from an RCU perspective:

	BPF_CALL_0(bpf_get_current_cgroup_id)
	{
		struct cgroup *cgrp;
		u64 ret;

		rcu_read_lock();
		cgrp = task_dfl_cgroup(current);
		ret = cgroup_id(cgrp);
		rcu_read_unlock();
		return ret;
	}

If I am reading the code correctly, rcu_read_lock() is required to keep
the cgroup from going away, and the rcu_read_lock_trace() is needed in
addition in order to keep the BPF program from going away.

There might well be better ways to do this, but the above obeys the
letter of the law. Or at least the law as I understand it. ;-)

Or does the fact that a cgroup contains a given task prevent that cgroup
from going away? (I -think- that tasks can be removed from cgroups
without the cooperation of those tasks, but then again there is much
about cgroups that I do not know.)

							Thanx, Paul

[...]

> Thanks,
> Daniel
Add Tejun to provide more clarification on the cgroup side.

On 8/10/21 9:32 AM, Paul E. McKenney wrote:
> On Tue, Aug 10, 2021 at 10:08:58AM +0200, Daniel Borkmann wrote:
>> [ +Paul ]
>>
>> On 8/10/21 1:51 AM, Yonghong Song wrote:
>> [...]
[...]
>>> +       rcu_read_lock();
>>> +       cgrp = task_dfl_cgroup(current);
>>> +       rcu_read_unlock();
>>>
>>>         return cgroup_id(cgrp);
>>
>> I'm a bit confused: if the cgroup object relies on rcu_read_lock() and not
>> rcu_read_lock_trace() context, then the above is racy, given you access the
>> memory via cgroup_id() outside of it; same below. If rcu_read_lock_trace()
>> is enough and the above is really just to silence the 'suspicious
>> rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
>> task_css_set_check() should be extended to check for the _trace() flavor
>> instead [meaning, as a cleaner workaround]. Which one is it?

Thanks for pointing this out. Definitely a bug on my side. The
rcu_read_lock() region should include cgroup_id(cgrp) as well.

> At first glance, something like this would work from an RCU perspective:
>
> 	BPF_CALL_0(bpf_get_current_cgroup_id)
> 	{
> 		struct cgroup *cgrp;
> 		u64 ret;
>
> 		rcu_read_lock();
> 		cgrp = task_dfl_cgroup(current);
> 		ret = cgroup_id(cgrp);
> 		rcu_read_unlock();
> 		return ret;
> 	}
>
> If I am reading the code correctly, rcu_read_lock() is required to keep
> the cgroup from going away, and the rcu_read_lock_trace() is needed in
> addition in order to keep the BPF program from going away.
>
> There might well be better ways to do this, but the above obeys the
> letter of the law. Or at least the law as I understand it. ;-)
>
> Or does the fact that a cgroup contains a given task prevent that cgroup
> from going away? (I -think- that tasks can be removed from cgroups
> without the cooperation of those tasks, but then again there is much
> about cgroups that I do not know.)
>
> 							Thanx, Paul
[...]
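[Editor's note] Folding the correction Yonghong agrees to back into both helpers — moving the cgroup_id()/cgroup_ancestor() accesses inside the rcu_read_lock() section, per Paul's snippet — would give roughly the following. This is a sketch of the expected follow-up revision, not the commit as ultimately applied.

```c
/* Sketch (untested): both helpers compute the id while still
 * inside the RCU read-side critical section, so the cgroup
 * cannot be freed between the lookup and the dereference.
 */
BPF_CALL_0(bpf_get_current_cgroup_id)
{
	struct cgroup *cgrp;
	u64 cgrp_id;

	rcu_read_lock();
	cgrp = task_dfl_cgroup(current);
	cgrp_id = cgroup_id(cgrp);
	rcu_read_unlock();

	return cgrp_id;
}

BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
{
	struct cgroup *cgrp;
	struct cgroup *ancestor;
	u64 cgrp_id;

	rcu_read_lock();
	cgrp = task_dfl_cgroup(current);
	ancestor = cgroup_ancestor(cgrp, ancestor_level);
	cgrp_id = ancestor ? cgroup_id(ancestor) : 0;
	rcu_read_unlock();

	return cgrp_id;
}
```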
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 62cf00383910..4567d2841133 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
 #ifdef CONFIG_CGROUPS
 BPF_CALL_0(bpf_get_current_cgroup_id)
 {
-	struct cgroup *cgrp = task_dfl_cgroup(current);
+	struct cgroup *cgrp;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	rcu_read_unlock();
 
 	return cgroup_id(cgrp);
 }
@@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
 
 BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
 {
-	struct cgroup *cgrp = task_dfl_cgroup(current);
+	struct cgroup *cgrp;
 	struct cgroup *ancestor;
 
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	rcu_read_unlock();
+
 	ancestor = cgroup_ancestor(cgrp, ancestor_level);
 	if (!ancestor)
 		return 0;
Currently, if the bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is called from sleepable
programs, e.g., sleepable fentry/fmod_ret/fexit/lsm programs, an RCU
warning may appear. For example, if I add the following hack to the
test_progs/test_lsm sleepable fentry program test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I will hit the following RCU warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by test_progs/260:
  #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
  stack backtrace:
  CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack_lvl+0x56/0x7b
   bpf_get_current_cgroup_id+0x9c/0xb1
   bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
   bpf_trampoline_6442469132_0+0x2d/0x1000
   __x64_sys_setdomainname+0x5/0x110
   do_syscall_64+0x3a/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get a similar warning using the bpf_get_current_ancestor_cgroup_id()
helper. syzbot reported a similar issue in [1] for syscall programs.

The helpers bpf_get_current_cgroup_id() and
bpf_get_current_ancestor_cgroup_id() have the following callchain:

  task_dfl_cgroup
    task_css_set
      task_css_set_check

and we have

  #define task_css_set_check(task, __c)                           \
          rcu_dereference_check((task)->cgroups,                  \
                  lockdep_is_held(&cgroup_mutex) ||               \
                  lockdep_is_held(&css_set_lock) ||               \
                  ((task)->flags & PF_EXITING) || (__c))

Since cgroup_mutex/css_set_lock is not held, the task is not exiting,
and the RCU read lock is not held, a warning will be issued. Note that
sleepable BPF programs are protected by rcu_read_lock_trace().

The above sleepable BPF programs are already protected by
migrate_disable(). Adding rcu_read_lock() in these two helpers will
silence the above warning.

I marked the patch as fixing 95b861a7935b
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting to 5.14 is probably good enough, as
sleepable programs are not widely used.

This patch should fix [1] as well, since a syscall program is a
sleepable program protected with migrate_disable().

[1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/

Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/helpers.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)