Message ID | 20210809060315.1175802-1-yhs@fb.com (mailing list archive)
---|---
State | Superseded
Delegated to: | BPF
Series | bpf: fix a couple of issues with syscall program
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | fail | 1 blamed authors not CCed: namhyung@kernel.org; 8 maintainers not CCed: mingo@redhat.com netdev@vger.kernel.org john.fastabend@gmail.com namhyung@kernel.org rostedt@goodmis.org songliubraving@fb.com kpsingh@kernel.org kafai@fb.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 6 this patch: 6 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | fail | ERROR: Avoid using diff content in the commit message - patch(1) might not work |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 6 this patch: 6 |
netdev/header_inline | success | Link |
On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote:
>
> [...]
>
> To fix the issue, let us make these two helpers unavailable to
> sleepable programs.
>
> [...]
>
> @@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  #endif
>  #ifdef CONFIG_CGROUPS
>  	case BPF_FUNC_get_current_cgroup_id:
> -		return &bpf_get_current_cgroup_id_proto;
> +		return prog->aux->sleepable ?
> +		       NULL : &bpf_get_current_cgroup_id_proto;
>  	case BPF_FUNC_get_current_ancestor_cgroup_id:
> -		return &bpf_get_current_ancestor_cgroup_id_proto;
> +		return prog->aux->sleepable ?
> +		       NULL : &bpf_get_current_ancestor_cgroup_id_proto;

This feels too extreme. I bet these helpers are as useful in sleepable
BPF progs as they are in non-sleepable ones.

Why don't we just implement a variant of get_current_cgroup_id (and
the ancestor variant as well) which takes that cgroup_mutex lock, and
just pick the appropriate implementation. Wouldn't that work?
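For illustration, "pick the appropriate implementation" could look roughly
like the following fragment of bpf_tracing_func_proto(); the
bpf_get_current_cgroup_id_sleepable_proto name is hypothetical and stands
in for whatever lock-taking variant would be added:

#ifdef CONFIG_CGROUPS
	case BPF_FUNC_get_current_cgroup_id:
		/* Hypothetical: instead of disallowing the helper for
		 * sleepable progs, route them to a variant that takes the
		 * protection it needs internally.
		 */
		return prog->aux->sleepable ?
		       &bpf_get_current_cgroup_id_sleepable_proto :
		       &bpf_get_current_cgroup_id_proto;
#endif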
On 8/9/21 10:18 AM, Andrii Nakryiko wrote:
> On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> [...]
>
> This feels too extreme. I bet these helpers are as useful in sleepable
> BPF progs as they are in non-sleepable ones.
>
> Why don't we just implement a variant of get_current_cgroup_id (and
> the ancestor variant as well) which takes that cgroup_mutex lock, and
> just pick the appropriate implementation. Wouldn't that work?

This may not work. For example, for a sleepable fentry program, if the
to-be-traced function is called with cgroup_mutex already held, we will
have a deadlock.

Currently, the affected program types are tracing/fentry.s,
tracing/fexit.s, tracing/fmod_ret.s, lsm.s and syscall. For fmod_ret.s
and lsm.s, which both have some kind of predefined attachment/context,
we might be able to check all potential attachment points and allow
these two helpers when the attachment point is not surrounded by
cgroup_mutex.

For the syscall program type, we should be okay as it is called through
the bpf_prog_test_run interface, but I am not sure why a user would
want a cgroup_id for that.
On Mon, Aug 9, 2021 at 10:41 AM Yonghong Song <yhs@fb.com> wrote:
>
> On 8/9/21 10:18 AM, Andrii Nakryiko wrote:
> > [...]
> >
> > Why don't we just implement a variant of get_current_cgroup_id (and
> > the ancestor variant as well) which takes that cgroup_mutex lock, and
> > just pick the appropriate implementation. Wouldn't that work?
>
> This may not work. For example, for a sleepable fentry program, if the
> to-be-traced function is called with cgroup_mutex already held, we will
> have a deadlock.

We can also do preempt_disable() + rcu_read_lock() inside the helper
itself, no? I mean in the new "sleepable" variant.

> Currently, the affected program types are tracing/fentry.s,
> tracing/fexit.s, tracing/fmod_ret.s, lsm.s and syscall. For fmod_ret.s
> and lsm.s, which both have some kind of predefined attachment/context,
> we might be able to check all potential attachment points and allow
> these two helpers when the attachment point is not surrounded by
> cgroup_mutex.

I don't think it's feasible to know whether any given attached kernel
function can be called with cgroup_mutex taken. Static analysis will be
too complicated and too restrictive. Runtime checks might be too
expensive and/or not generic enough.

But see above: we can do rcu_read_lock() inside the helper while
preventing preemption, and it will behave the same way as if it were
called from a non-sleepable BPF prog.

> For the syscall program type, we should be okay as it is called through
> the bpf_prog_test_run interface, but I am not sure why a user would
> want a cgroup_id for that.
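A rough sketch of what such a "sleepable" variant could look like (the
function name is hypothetical; the body reuses the task_dfl_cgroup() and
cgroup_id() lookup the existing helper is built on):

/* Hypothetical sleepable-safe variant: briefly disable preemption and
 * enter a plain RCU read-side critical section, so task_css_set_check()
 * sees the same protection it gets under non-sleepable BPF programs.
 */
BPF_CALL_0(bpf_get_current_cgroup_id_sleepable)
{
	u64 cgrp_id;

	preempt_disable();
	rcu_read_lock();
	cgrp_id = cgroup_id(task_dfl_cgroup(current));
	rcu_read_unlock();
	preempt_enable();

	return cgrp_id;
}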
On 8/9/21 7:58 PM, Andrii Nakryiko wrote:
> On Mon, Aug 9, 2021 at 10:41 AM Yonghong Song <yhs@fb.com> wrote:
>> [...]
>>
>> This may not work. For example, for a sleepable fentry program, if the
>> to-be-traced function is called with cgroup_mutex already held, we
>> will have a deadlock.
>
> We can also do preempt_disable() + rcu_read_lock() inside the helper
> itself, no? I mean in the new "sleepable" variant.

Yep, we do that for example in c5dbb89fc2ac ("bpf: Expose
bpf_get_socket_cookie to tracing programs") as well (the
sock_gen_cookie() disables preemption).

Thanks,
Daniel
On 8/9/21 1:29 PM, Daniel Borkmann wrote:
> On 8/9/21 7:58 PM, Andrii Nakryiko wrote:
>> [...]
>>
>> We can also do preempt_disable() + rcu_read_lock() inside the helper
>> itself, no? I mean in the new "sleepable" variant.
>
> Yep, we do that for example in c5dbb89fc2ac ("bpf: Expose
> bpf_get_socket_cookie to tracing programs") as well (the
> sock_gen_cookie() disables preemption).

I think rcu_read_lock() is enough. For all the cases I mentioned above,
we already have migrate_disable(); together with rcu_read_lock(), we
should be okay. For non-sleepable programs (tracing, lsm), the helpers
are protected with migrate_disable() and rcu_read_lock().

> Thanks,
> Daniel
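A minimal sketch of that simpler direction, keeping the existing helper
but giving it an explicit RCU read-side critical section (an illustration
only, not necessarily the patch that was ultimately applied; it assumes,
as noted above, that migrate_disable() is already in effect around helper
calls):

BPF_CALL_0(bpf_get_current_cgroup_id)
{
	struct cgroup *cgrp;
	u64 cgrp_id;

	/* Sleepable progs run under rcu_read_lock_trace() rather than
	 * rcu_read_lock(), so take the plain RCU read lock here to satisfy
	 * the task_css_set_check() lockdep condition.
	 */
	rcu_read_lock();
	cgrp = task_dfl_cgroup(current);
	cgrp_id = cgroup_id(cgrp);
	rcu_read_unlock();

	return cgrp_id;
}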
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b4916ef388ad..eaa8a8ffbe46 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #ifdef CONFIG_CGROUPS
 	case BPF_FUNC_get_current_cgroup_id:
-		return &bpf_get_current_cgroup_id_proto;
+		return prog->aux->sleepable ?
+		       NULL : &bpf_get_current_cgroup_id_proto;
 	case BPF_FUNC_get_current_ancestor_cgroup_id:
-		return &bpf_get_current_ancestor_cgroup_id_proto;
+		return prog->aux->sleepable ?
+		       NULL : &bpf_get_current_ancestor_cgroup_id_proto;
 #endif
 	case BPF_FUNC_send_signal:
 		return &bpf_send_signal_proto;
Currently, if the bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is called from a sleepable
program, e.g., a sleepable fentry/fmod_ret/fexit/lsm program, an rcu
warning may appear. For example, if I add the following hack to the
test_progs/test_lsm sleepable program test_sys_setdomainname:

--- a/tools/testing/selftests/bpf/progs/lsm.c
+++ b/tools/testing/selftests/bpf/progs/lsm.c
@@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
 	int buf = 0;
 	long ret;

+	__u64 cg_id = bpf_get_current_cgroup_id();
+	if (cg_id == 1000)
+		copy_test++;
+
 	ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
 	if (len == -2 && ret == 0 && buf == 1234)
 		copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
  stack backtrace:
  CPU: 1 PID: 260 Comm: test_progs Tainted: G O 5.14.0-rc2+ #176
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  Call Trace:
    dump_stack_lvl+0x56/0x7b
    bpf_get_current_cgroup_id+0x9c/0xb1
    bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
    bpf_trampoline_6442469132_0+0x2d/0x1000
    __x64_sys_setdomainname+0x5/0x110
    do_syscall_64+0x3a/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get a similar warning using the
bpf_get_current_ancestor_cgroup_id() helper. syzbot reported a similar
issue in [1] for the syscall program.

Helper bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() has the following callchain:
  task_dfl_cgroup
    task_css_set
      task_css_set_check
and we have
  #define task_css_set_check(task, __c)				\
	rcu_dereference_check((task)->cgroups,			\
		lockdep_is_held(&cgroup_mutex) ||		\
		lockdep_is_held(&css_set_lock) ||		\
		((task)->flags & PF_EXITING) || (__c))
Since cgroup_mutex/css_set_lock is not held, the task is not exiting,
and the rcu read_lock is not held, a warning will be issued. Note that
a sleepable bpf program is protected by rcu_read_lock_trace().

To fix the issue, let us make these two helpers unavailable to
sleepable programs. I marked the patch as fixing 95b861a7935b
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"), which
added bpf_get_current_ancestor_cgroup_id() in 5.14. I think backporting
to 5.14 is probably good enough as sleepable programs are not widely
used.

This patch should fix [1] as well, since the syscall program is a
sleepable program and bpf_get_current_cgroup_id() is not available to
the syscall program any more.

[1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/

Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/trace/bpf_trace.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)