diff mbox series

[bpf,v3,1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers

Message ID 20210809235146.1663522-1-yhs@fb.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series bpf: fix a couple of issues with syscall program | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf
netdev/subject_prefix success Link
netdev/cc_maintainers fail 1 blamed authors not CCed: namhyung@kernel.org; 6 maintainers not CCed: netdev@vger.kernel.org john.fastabend@gmail.com namhyung@kernel.org songliubraving@fb.com kpsingh@kernel.org kafai@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 6 this patch: 6
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch fail ERROR: Avoid using diff content in the commit message - patch(1) might not work
netdev/build_allmodconfig_warn success Errors and warnings before: 6 this patch: 6
netdev/header_inline success Link

Commit Message

Yonghong Song Aug. 9, 2021, 11:51 p.m. UTC
Currently, if bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called with sleepable programs e.g., sleepable
fentry/fmod_ret/fexit/lsm programs, a rcu warning
may appear. For example, if I added the following
hack to test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for syscall program. Helper
bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
has the following callchain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since cgroup_mutex/css_set_lock is not held and the task
is not existing and rcu read_lock is not held, a warning
will be issued. Note that bpf sleepable program is protected by
rcu_read_lock_trace().

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch fixing 95b861a7935b
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting 5.14 is probably good enough as sleepable
progrems are not widely used.

This patch should fix [1] as well since syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/

Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/helpers.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Andrii Nakryiko Aug. 10, 2021, 12:25 a.m. UTC | #1
On Mon, Aug 9, 2021 at 4:51 PM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, if bpf_get_current_cgroup_id() or
> bpf_get_current_ancestor_cgroup_id() helper is
> called with sleepable programs e.g., sleepable
> fentry/fmod_ret/fexit/lsm programs, a rcu warning
> may appear. For example, if I added the following
> hack to test_progs/test_lsm sleepable fentry program
> test_sys_setdomainname:
>
>   --- a/tools/testing/selftests/bpf/progs/lsm.c
>   +++ b/tools/testing/selftests/bpf/progs/lsm.c
>   @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
>           int buf = 0;
>           long ret;
>
>   +       __u64 cg_id = bpf_get_current_cgroup_id();
>   +       if (cg_id == 1000)
>   +               copy_test++;
>   +
>           ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
>           if (len == -2 && ret == 0 && buf == 1234)
>                   copy_test++;
>
> I will hit the following rcu warning:
>
>   include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>   other info that might help us debug this:
>     rcu_scheduler_active = 2, debug_locks = 1
>     1 lock held by test_progs/260:
>       #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
>     stack backtrace:
>     CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>     Call Trace:
>       dump_stack_lvl+0x56/0x7b
>       bpf_get_current_cgroup_id+0x9c/0xb1
>       bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
>       bpf_trampoline_6442469132_0+0x2d/0x1000
>       __x64_sys_setdomainname+0x5/0x110
>       do_syscall_64+0x3a/0x80
>       entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
> syzbot reported a similar issue in [1] for syscall program. Helper
> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
> has the following callchain:
>    task_dfl_cgroup
>      task_css_set
>        task_css_set_check
> and we have
>    #define task_css_set_check(task, __c)                                   \
>            rcu_dereference_check((task)->cgroups,                          \
>                    lockdep_is_held(&cgroup_mutex) ||                       \
>                    lockdep_is_held(&css_set_lock) ||                       \
>                    ((task)->flags & PF_EXITING) || (__c))
> Since cgroup_mutex/css_set_lock is not held and the task
> is not existing and rcu read_lock is not held, a warning
> will be issued. Note that bpf sleepable program is protected by
> rcu_read_lock_trace().
>
> The above sleepable bpf programs are already protected
> by migrate_disable(). Adding rcu_read_lock() in these
> two helpers will silence the above warning.
> I marked the patch fixing 95b861a7935b
> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> which added bpf_get_current_ancestor_cgroup_id() to tracing programs
> in 5.14. I think backporting 5.14 is probably good enough as sleepable
> progrems are not widely used.
>
> This patch should fix [1] as well since syscall program is a sleepable
> program protected with migrate_disable().
>
>  [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
>
> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---

LGTM, thanks!

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  kernel/bpf/helpers.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62cf00383910..4567d2841133 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>  #ifdef CONFIG_CGROUPS
>  BPF_CALL_0(bpf_get_current_cgroup_id)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
> +
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
>
>         return cgroup_id(cgrp);
>  }
> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>
>  BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
>         struct cgroup *ancestor;
>
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
> +
>         ancestor = cgroup_ancestor(cgrp, ancestor_level);
>         if (!ancestor)
>                 return 0;
> --
> 2.30.2
>
Daniel Borkmann Aug. 10, 2021, 8:08 a.m. UTC | #2
[ +Paul ]

On 8/10/21 1:51 AM, Yonghong Song wrote:
[...]
> I will hit the following rcu warning:
> 
>    include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>    other info that might help us debug this:
>      rcu_scheduler_active = 2, debug_locks = 1
>      1 lock held by test_progs/260:
>        #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
>      stack backtrace:
>      CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
>      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>      Call Trace:
>        dump_stack_lvl+0x56/0x7b
>        bpf_get_current_cgroup_id+0x9c/0xb1
>        bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
>        bpf_trampoline_6442469132_0+0x2d/0x1000
>        __x64_sys_setdomainname+0x5/0x110
>        do_syscall_64+0x3a/0x80
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
> syzbot reported a similar issue in [1] for syscall program. Helper
> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
> has the following callchain:
>     task_dfl_cgroup
>       task_css_set
>         task_css_set_check
> and we have
>     #define task_css_set_check(task, __c)                                   \
>             rcu_dereference_check((task)->cgroups,                          \
>                     lockdep_is_held(&cgroup_mutex) ||                       \
>                     lockdep_is_held(&css_set_lock) ||                       \
>                     ((task)->flags & PF_EXITING) || (__c))
> Since cgroup_mutex/css_set_lock is not held and the task
> is not existing and rcu read_lock is not held, a warning
> will be issued. Note that bpf sleepable program is protected by
> rcu_read_lock_trace().
> 
> The above sleepable bpf programs are already protected
> by migrate_disable(). Adding rcu_read_lock() in these
> two helpers will silence the above warning.
> I marked the patch fixing 95b861a7935b
> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> which added bpf_get_current_ancestor_cgroup_id() to tracing programs
> in 5.14. I think backporting 5.14 is probably good enough as sleepable
> progrems are not widely used.
> 
> This patch should fix [1] as well since syscall program is a sleepable
> program protected with migrate_disable().
> 
>   [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
> 
> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>   kernel/bpf/helpers.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62cf00383910..4567d2841133 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> -	struct cgroup *cgrp = task_dfl_cgroup(current);
> +	struct cgroup *cgrp;
> +
> +	rcu_read_lock();
> +	cgrp = task_dfl_cgroup(current);
> +	rcu_read_unlock();
>   
>   	return cgroup_id(cgrp);

I'm a bit confused, if cgroup object relies rcu_read_lock() and not rcu_read_lock_trace()
context, then the above is racy given you access the memory via cgroup_id() outside of it,
same below. If rcu_read_lock_trace() is enough and the above is really just to silence the
'suspicious rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
task_css_set_check() should be extended to check for _trace() flavor instead [meaning, as
a cleaner workaround], which one is it?

>   }
> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>   
>   BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>   {
> -	struct cgroup *cgrp = task_dfl_cgroup(current);
> +	struct cgroup *cgrp;
>   	struct cgroup *ancestor;
>   
> +	rcu_read_lock();
> +	cgrp = task_dfl_cgroup(current);
> +	rcu_read_unlock();
> +
>   	ancestor = cgroup_ancestor(cgrp, ancestor_level);
>   	if (!ancestor)
>   		return 0;
> 

Thanks,
Daniel
Paul E. McKenney Aug. 10, 2021, 4:32 p.m. UTC | #3
On Tue, Aug 10, 2021 at 10:08:58AM +0200, Daniel Borkmann wrote:
> [ +Paul ]
> 
> On 8/10/21 1:51 AM, Yonghong Song wrote:
> [...]
> > I will hit the following rcu warning:
> > 
> >    include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> >    other info that might help us debug this:
> >      rcu_scheduler_active = 2, debug_locks = 1
> >      1 lock held by test_progs/260:
> >        #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
> >      stack backtrace:
> >      CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
> >      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> >      Call Trace:
> >        dump_stack_lvl+0x56/0x7b
> >        bpf_get_current_cgroup_id+0x9c/0xb1
> >        bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
> >        bpf_trampoline_6442469132_0+0x2d/0x1000
> >        __x64_sys_setdomainname+0x5/0x110
> >        do_syscall_64+0x3a/0x80
> >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
> > syzbot reported a similar issue in [1] for syscall program. Helper
> > bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
> > has the following callchain:
> >     task_dfl_cgroup
> >       task_css_set
> >         task_css_set_check
> > and we have
> >     #define task_css_set_check(task, __c)                                   \
> >             rcu_dereference_check((task)->cgroups,                          \
> >                     lockdep_is_held(&cgroup_mutex) ||                       \
> >                     lockdep_is_held(&css_set_lock) ||                       \
> >                     ((task)->flags & PF_EXITING) || (__c))
> > Since cgroup_mutex/css_set_lock is not held and the task
> > is not existing and rcu read_lock is not held, a warning
> > will be issued. Note that bpf sleepable program is protected by
> > rcu_read_lock_trace().
> > 
> > The above sleepable bpf programs are already protected
> > by migrate_disable(). Adding rcu_read_lock() in these
> > two helpers will silence the above warning.
> > I marked the patch fixing 95b861a7935b
> > ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> > which added bpf_get_current_ancestor_cgroup_id() to tracing programs
> > in 5.14. I think backporting 5.14 is probably good enough as sleepable
> > progrems are not widely used.
> > 
> > This patch should fix [1] as well since syscall program is a sleepable
> > program protected with migrate_disable().
> > 
> >   [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
> > 
> > Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
> > Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >   kernel/bpf/helpers.c | 12 ++++++++++--
> >   1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 62cf00383910..4567d2841133 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > -	struct cgroup *cgrp = task_dfl_cgroup(current);
> > +	struct cgroup *cgrp;
> > +
> > +	rcu_read_lock();
> > +	cgrp = task_dfl_cgroup(current);
> > +	rcu_read_unlock();
> >   	return cgroup_id(cgrp);
> 
> I'm a bit confused, if cgroup object relies rcu_read_lock() and not rcu_read_lock_trace()
> context, then the above is racy given you access the memory via cgroup_id() outside of it,
> same below. If rcu_read_lock_trace() is enough and the above is really just to silence the
> 'suspicious rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
> task_css_set_check() should be extended to check for _trace() flavor instead [meaning, as
> a cleaner workaround], which one is it?

At first glance, something like this would work from an RCU perspective:

BPF_CALL_0(bpf_get_current_cgroup_id)
{
	struct cgroup *cgrp = task_dfl_cgroup(current);
	struct cgroup *cgrp;
	u64 ret;

	rcu_read_lock();
	cgrp = task_dfl_cgroup(current);
 	ret = cgroup_id(cgrp);
	rcu_read_unlock();
 	return ret;
}

If I am reading the code correctly, rcu_read_lock() is required to keep
the cgroup from going away, and the rcu_read_lock_trace() is needed in
addition in order to keep the BPF program from going away.

There might well be better ways to do this, but the above obeys the
letter of the law.  Or at least the law as I understand it.  ;-)

Or does the fact that a cgroup contains a given task prevent that cgroup
from going away?  (I -think- that tasks can be removed from cgroups
without the cooperation of those tasks, but then again there is much
about cgroups that I do not know.)

							Thanx, Paul

> >   }
> > @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
> >   BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
> >   {
> > -	struct cgroup *cgrp = task_dfl_cgroup(current);
> > +	struct cgroup *cgrp;
> >   	struct cgroup *ancestor;
> > +	rcu_read_lock();
> > +	cgrp = task_dfl_cgroup(current);
> > +	rcu_read_unlock();
> > +
> >   	ancestor = cgroup_ancestor(cgrp, ancestor_level);
> >   	if (!ancestor)
> >   		return 0;
> > 
> 
> Thanks,
> Daniel
Yonghong Song Aug. 10, 2021, 4:36 p.m. UTC | #4
Add Tejun to provide more clarification on cgroup side.

On 8/10/21 9:32 AM, Paul E. McKenney wrote:
> On Tue, Aug 10, 2021 at 10:08:58AM +0200, Daniel Borkmann wrote:
>> [ +Paul ]
>>
>> On 8/10/21 1:51 AM, Yonghong Song wrote:
>> [...]
>>> I will hit the following rcu warning:
>>>
>>>     include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>>>     other info that might help us debug this:
>>>       rcu_scheduler_active = 2, debug_locks = 1
>>>       1 lock held by test_progs/260:
>>>         #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
>>>       stack backtrace:
>>>       CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
>>>       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>>       Call Trace:
>>>         dump_stack_lvl+0x56/0x7b
>>>         bpf_get_current_cgroup_id+0x9c/0xb1
>>>         bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
>>>         bpf_trampoline_6442469132_0+0x2d/0x1000
>>>         __x64_sys_setdomainname+0x5/0x110
>>>         do_syscall_64+0x3a/0x80
>>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
>>> syzbot reported a similar issue in [1] for syscall program. Helper
>>> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
>>> has the following callchain:
>>>      task_dfl_cgroup
>>>        task_css_set
>>>          task_css_set_check
>>> and we have
>>>      #define task_css_set_check(task, __c)                                   \
>>>              rcu_dereference_check((task)->cgroups,                          \
>>>                      lockdep_is_held(&cgroup_mutex) ||                       \
>>>                      lockdep_is_held(&css_set_lock) ||                       \
>>>                      ((task)->flags & PF_EXITING) || (__c))
>>> Since cgroup_mutex/css_set_lock is not held and the task
>>> is not existing and rcu read_lock is not held, a warning
>>> will be issued. Note that bpf sleepable program is protected by
>>> rcu_read_lock_trace().
>>>
>>> The above sleepable bpf programs are already protected
>>> by migrate_disable(). Adding rcu_read_lock() in these
>>> two helpers will silence the above warning.
>>> I marked the patch fixing 95b861a7935b
>>> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
>>> which added bpf_get_current_ancestor_cgroup_id() to tracing programs
>>> in 5.14. I think backporting 5.14 is probably good enough as sleepable
>>> progrems are not widely used.
>>>
>>> This patch should fix [1] as well since syscall program is a sleepable
>>> program protected with migrate_disable().
>>>
>>>    [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
>>>
>>> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
>>> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>    kernel/bpf/helpers.c | 12 ++++++++++--
>>>    1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index 62cf00383910..4567d2841133 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>>>    #ifdef CONFIG_CGROUPS
>>>    BPF_CALL_0(bpf_get_current_cgroup_id)
>>>    {
>>> -	struct cgroup *cgrp = task_dfl_cgroup(current);
>>> +	struct cgroup *cgrp;
>>> +
>>> +	rcu_read_lock();
>>> +	cgrp = task_dfl_cgroup(current);
>>> +	rcu_read_unlock();
>>>    	return cgroup_id(cgrp);
>>
>> I'm a bit confused, if cgroup object relies rcu_read_lock() and not rcu_read_lock_trace()
>> context, then the above is racy given you access the memory via cgroup_id() outside of it,
>> same below. If rcu_read_lock_trace() is enough and the above is really just to silence the
>> 'suspicious rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
>> task_css_set_check() should be extended to check for _trace() flavor instead [meaning, as
>> a cleaner workaround], which one is it?

Thanks for pointing out. Definitely a bug on my side. rcu_read_lock() 
region should include cgroup_id(cgrp) as well.

> 
> At first glance, something like this would work from an RCU perspective:
> 
> BPF_CALL_0(bpf_get_current_cgroup_id)
> {
> 	struct cgroup *cgrp = task_dfl_cgroup(current);
> 	struct cgroup *cgrp;
> 	u64 ret;
> 
> 	rcu_read_lock();
> 	cgrp = task_dfl_cgroup(current);
>   	ret = cgroup_id(cgrp);
> 	rcu_read_unlock();
>   	return ret;
> }
> 
> If I am reading the code correctly, rcu_read_lock() is required to keep
> the cgroup from going away, and the rcu_read_lock_trace() is needed in
> addition in order to keep the BPF program from going away.
> 
> There might well be better ways to do this, but the above obeys the
> letter of the law.  Or at least the law as I understand it.  ;-)
> 
> Or does the fact that a cgroup contains a given task prevent that cgroup
> from going away?  (I -think- that tasks can be removed from cgroups
> without the cooperation of those tasks, but then again there is much
> about cgroups that I do not know.)
> 
> 							Thanx, Paul
> 
>>>    }
>>> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>>>    BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>>>    {
>>> -	struct cgroup *cgrp = task_dfl_cgroup(current);
>>> +	struct cgroup *cgrp;
>>>    	struct cgroup *ancestor;
>>> +	rcu_read_lock();
>>> +	cgrp = task_dfl_cgroup(current);
>>> +	rcu_read_unlock();
>>> +
>>>    	ancestor = cgroup_ancestor(cgrp, ancestor_level);
>>>    	if (!ancestor)
>>>    		return 0;
>>>
>>
>> Thanks,
>> Daniel
diff mbox series

Patch

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 62cf00383910..4567d2841133 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -353,7 +353,11 @@  const struct bpf_func_proto bpf_jiffies64_proto = {
 #ifdef CONFIG_CGROUPS
 BPF_CALL_0(bpf_get_current_cgroup_id)
 {
-	struct cgroup *cgrp = task_dfl_cgroup(current);
+	struct cgroup *cgrp;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	rcu_read_unlock();
 
 	return cgroup_id(cgrp);
 }
@@ -366,9 +370,13 @@  const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
 
 BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
 {
-	struct cgroup *cgrp = task_dfl_cgroup(current);
+	struct cgroup *cgrp;
 	struct cgroup *ancestor;
 
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	rcu_read_unlock();
+
 	ancestor = cgroup_ancestor(cgrp, ancestor_level);
 	if (!ancestor)
 		return 0;