
bpf: Fix KASAN use-after-free Read in compute_effective_progs

Message ID 20220405170356.43128-1-tadeusz.struk@linaro.org (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf: Fix KASAN use-after-free Read in compute_effective_progs

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 14 this patch: 14
netdev/cc_maintainers fail 1 blamed authors not CCed: roman.gushchin@linux.dev; 1 maintainers not CCed: roman.gushchin@linux.dev
netdev/build_clang success Errors and warnings before: 21 this patch: 21
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 14 this patch: 14
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 50 lines checked
netdev/kdoc success Errors and warnings before: 12 this patch: 12
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 fail Logs for Kernel LATEST on ubuntu-latest + selftests
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Kernel LATEST on z15 + selftests
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Tadeusz Struk April 5, 2022, 5:03 p.m. UTC
Syzbot found a use-after-free bug in compute_effective_progs().
The reproducer creates a number of BPF links and uses fault injection
to make a memory allocation fail while calling bpf_link_detach on them.
Detaching a link triggers its teardown in bpf_link_free(), which calls
__cgroup_bpf_detach() and, in turn, update_effective_progs(). If the
memory allocation in update_effective_progs() fails, its error path
restores the pointer to the bpf_cgroup_link on the cgroup list, but
the link memory is freed just after __cgroup_bpf_detach() returns.
From then on, every subsequent call to update_effective_progs()
dereferences this already freed pointer in prog_list_length() and
triggers a KASAN use-after-free report.
To fix this, don't restore the pointer to the link on the cgroup list
in __cgroup_bpf_detach(); instead proceed with the cleanup and retry
update_effective_progs() afterwards.
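
To illustrate the failure mode, here is a minimal userspace model (all
names are hypothetical stand-ins, not the actual kernel code): the error
path puts the caller's pointer back on a list, the caller frees the
object anyway, and any later walk of the list sees a dangling pointer.

#include <stdio.h>
#include <stdlib.h>

struct pl { void *link; };

static struct pl entry;        /* stands in for the cgroup prog list */
static int fail_alloc = 1;     /* models the fault-injected alloc failure */

static int detach(void *link)
{
	entry.link = NULL;
	if (fail_alloc) {
		entry.link = link;   /* old error path: pointer restored */
		return -1;
	}
	return 0;
}

int main(void)
{
	void *link = malloc(16);

	detach(link);    /* fails, leaves the pointer on the "list" */
	free(link);      /* bpf_link_free() frees the link regardless */
	/* entry.link now dangles; any later dereference of it, e.g. in
	 * prog_list_length(), is the use-after-free KASAN reports */
	printf("dangling entry: %p\n", entry.link);
	return 0;
}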


Cc: "Alexei Starovoitov" <ast@kernel.org>
Cc: "Daniel Borkmann" <daniel@iogearbox.net>
Cc: "Andrii Nakryiko" <andrii@kernel.org>
Cc: "Martin KaFai Lau" <kafai@fb.com>
Cc: "Song Liu" <songliubraving@fb.com>
Cc: "Yonghong Song" <yhs@fb.com>
Cc: "John Fastabend" <john.fastabend@gmail.com>
Cc: "KP Singh" <kpsingh@kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>

Link: https://syzkaller.appspot.com/bug?id=8ebf179a95c2a2670f7cf1ba62429ec044369db4
Fixes: af6eea57437a ("bpf: Implement bpf_link-based cgroup BPF program attachment")
Reported-by: <syzbot+f264bffdfbd5614f3bb2@syzkaller.appspotmail.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
---
 kernel/bpf/cgroup.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

Comments

Tadeusz Struk April 12, 2022, 2:44 p.m. UTC | #1
On 4/5/22 10:03, Tadeusz Struk wrote:
> Syzbot found a Use After Free bug in compute_effective_progs().
> The reproducer creates a number of BPF links, and causes a fault
> injected alloc to fail, while calling bpf_link_detach on them.
> Link detach triggers the link to be freed by bpf_link_free(),
> which calls __cgroup_bpf_detach() and update_effective_progs().
> If the memory allocation in this function fails, the function restores
> the pointer to the bpf_cgroup_link on the cgroup list, but the memory
> gets freed just after it returns. After this, every subsequent call to
> update_effective_progs() causes this already deallocated pointer to be
> dereferenced in prog_list_length(), and triggers KASAN UAF error.
> To fix this don't preserve the pointer to the link on the cgroup list
> in __cgroup_bpf_detach(), but proceed with the cleanup and retry calling
> update_effective_progs() again afterwards.
> 
> 
> Cc: "Alexei Starovoitov" <ast@kernel.org>
> Cc: "Daniel Borkmann" <daniel@iogearbox.net>
> Cc: "Andrii Nakryiko" <andrii@kernel.org>
> Cc: "Martin KaFai Lau" <kafai@fb.com>
> Cc: "Song Liu" <songliubraving@fb.com>
> Cc: "Yonghong Song" <yhs@fb.com>
> Cc: "John Fastabend" <john.fastabend@gmail.com>
> Cc: "KP Singh" <kpsingh@kernel.org>
> Cc: <netdev@vger.kernel.org>
> Cc: <bpf@vger.kernel.org>
> Cc: <stable@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> 
> Link: https://syzkaller.appspot.com/bug?id=8ebf179a95c2a2670f7cf1ba62429ec044369db4
> Fixes: af6eea57437a ("bpf: Implement bpf_link-based cgroup BPF program attachment")
> Reported-by: <syzbot+f264bffdfbd5614f3bb2@syzkaller.appspotmail.com>
> Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
> ---
>   kernel/bpf/cgroup.c | 25 ++++++++++++++-----------
>   1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 128028efda64..b6307337a3c7 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -723,10 +723,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>   	pl->link = NULL;
>   
>   	err = update_effective_progs(cgrp, atype);
> -	if (err)
> -		goto cleanup;
> -
> -	/* now can actually delete it from this cgroup list */
> +	/*
> +	 * Proceed regardless of error. The link and/or prog will be freed
> +	 * just after this function returns so just delete it from this
> +	 * cgroup list and retry calling update_effective_progs again later.
> +	 */
>   	list_del(&pl->node);
>   	kfree(pl);
>   	if (list_empty(progs))
> @@ -735,12 +736,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>   	if (old_prog)
>   		bpf_prog_put(old_prog);
>   	static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> -	return 0;
>   
> -cleanup:
> -	/* restore back prog or link */
> -	pl->prog = old_prog;
> -	pl->link = link;
> +	/* In case of error call update_effective_progs again */
> +	if (err)
> +		err = update_effective_progs(cgrp, atype);
> +
>   	return err;
>   }
>   
> @@ -881,6 +881,7 @@ static void bpf_cgroup_link_release(struct bpf_link *link)
>   	struct bpf_cgroup_link *cg_link =
>   		container_of(link, struct bpf_cgroup_link, link);
>   	struct cgroup *cg;
> +	int err;
>   
>   	/* link might have been auto-detached by dying cgroup already,
>   	 * in that case our work is done here
> @@ -896,8 +897,10 @@ static void bpf_cgroup_link_release(struct bpf_link *link)
>   		return;
>   	}
>   
> -	WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
> -				    cg_link->type));
> +	err = __cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
> +				  cg_link->type);
> +	if (err)
> +		pr_warn("cgroup_bpf_detach() failed, err %d\n", err);
>   
>   	cg = cg_link->cgroup;
>   	cg_link->cgroup = NULL;

Hi,
Any feedback/comments on this one?
Andrii Nakryiko April 13, 2022, 4:34 a.m. UTC | #2
On Tue, Apr 5, 2022 at 10:04 AM Tadeusz Struk <tadeusz.struk@linaro.org> wrote:
>
> Syzbot found a Use After Free bug in compute_effective_progs().
> The reproducer creates a number of BPF links, and causes a fault
> injected alloc to fail, while calling bpf_link_detach on them.
> Link detach triggers the link to be freed by bpf_link_free(),
> which calls __cgroup_bpf_detach() and update_effective_progs().
> If the memory allocation in this function fails, the function restores
> the pointer to the bpf_cgroup_link on the cgroup list, but the memory
> gets freed just after it returns. After this, every subsequent call to
> update_effective_progs() causes this already deallocated pointer to be
> dereferenced in prog_list_length(), and triggers KASAN UAF error.
> To fix this don't preserve the pointer to the link on the cgroup list
> in __cgroup_bpf_detach(), but proceed with the cleanup and retry calling
> update_effective_progs() again afterwards.

I think it's still problematic. The BPF link might have been the last
holder of the bpf_prog's refcnt, so when the link is put, its prog can
stay in the effective_progs array(s) and will cause a use-after-free
anyway.

It would be best to make sure that detach never fails. On detach the
effective prog array can only shrink, so even if
update_effective_progs() fails to allocate memory, we should be able
to iterate and just replace that prog with NULL, as a fallback
strategy.

>
>
> Cc: "Alexei Starovoitov" <ast@kernel.org>
> Cc: "Daniel Borkmann" <daniel@iogearbox.net>
> Cc: "Andrii Nakryiko" <andrii@kernel.org>
> Cc: "Martin KaFai Lau" <kafai@fb.com>
> Cc: "Song Liu" <songliubraving@fb.com>
> Cc: "Yonghong Song" <yhs@fb.com>
> Cc: "John Fastabend" <john.fastabend@gmail.com>
> Cc: "KP Singh" <kpsingh@kernel.org>
> Cc: <netdev@vger.kernel.org>
> Cc: <bpf@vger.kernel.org>
> Cc: <stable@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
>
> Link: https://syzkaller.appspot.com/bug?id=8ebf179a95c2a2670f7cf1ba62429ec044369db4
> Fixes: af6eea57437a ("bpf: Implement bpf_link-based cgroup BPF program attachment")
> Reported-by: <syzbot+f264bffdfbd5614f3bb2@syzkaller.appspotmail.com>
> Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
> ---
>  kernel/bpf/cgroup.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 128028efda64..b6307337a3c7 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -723,10 +723,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>         pl->link = NULL;
>
>         err = update_effective_progs(cgrp, atype);
> -       if (err)
> -               goto cleanup;
> -
> -       /* now can actually delete it from this cgroup list */
> +       /*
> +        * Proceed regardless of error. The link and/or prog will be freed
> +        * just after this function returns so just delete it from this
> +        * cgroup list and retry calling update_effective_progs again later.
> +        */
>         list_del(&pl->node);
>         kfree(pl);
>         if (list_empty(progs))
> @@ -735,12 +736,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>         if (old_prog)
>                 bpf_prog_put(old_prog);
>         static_branch_dec(&cgroup_bpf_enabled_key[atype]);
> -       return 0;
>
> -cleanup:
> -       /* restore back prog or link */
> -       pl->prog = old_prog;
> -       pl->link = link;
> +       /* In case of error call update_effective_progs again */
> +       if (err)
> +               err = update_effective_progs(cgrp, atype);

there is no guarantee that this will now succeed, right? So it's more
like "let's try just in case we are lucky this time"?

> +
>         return err;
>  }
>
> @@ -881,6 +881,7 @@ static void bpf_cgroup_link_release(struct bpf_link *link)
>         struct bpf_cgroup_link *cg_link =
>                 container_of(link, struct bpf_cgroup_link, link);
>         struct cgroup *cg;
> +       int err;
>
>         /* link might have been auto-detached by dying cgroup already,
>          * in that case our work is done here
> @@ -896,8 +897,10 @@ static void bpf_cgroup_link_release(struct bpf_link *link)
>                 return;
>         }
>
> -       WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
> -                                   cg_link->type));
> +       err = __cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
> +                                 cg_link->type);
> +       if (err)
> +               pr_warn("cgroup_bpf_detach() failed, err %d\n", err);
>
>         cg = cg_link->cgroup;
>         cg_link->cgroup = NULL;
> --
> 2.35.1
Tadeusz Struk April 13, 2022, 5:28 p.m. UTC | #3
Hi Andrii,
On 4/12/22 21:34, Andrii Nakryiko wrote:
> On Tue, Apr 5, 2022 at 10:04 AM Tadeusz Struk<tadeusz.struk@linaro.org>  wrote:
>> Syzbot found a Use After Free bug in compute_effective_progs().
>> The reproducer creates a number of BPF links, and causes a fault
>> injected alloc to fail, while calling bpf_link_detach on them.
>> Link detach triggers the link to be freed by bpf_link_free(),
>> which calls __cgroup_bpf_detach() and update_effective_progs().
>> If the memory allocation in this function fails, the function restores
>> the pointer to the bpf_cgroup_link on the cgroup list, but the memory
>> gets freed just after it returns. After this, every subsequent call to
>> update_effective_progs() causes this already deallocated pointer to be
>> dereferenced in prog_list_length(), and triggers KASAN UAF error.
>> To fix this don't preserve the pointer to the link on the cgroup list
>> in __cgroup_bpf_detach(), but proceed with the cleanup and retry calling
>> update_effective_progs() again afterwards.
> I think it's still problematic. BPF link might have been the last one
> that holds bpf_prog's refcnt, so when link is put, its prog can stay
> there in effective_progs array(s) and will cause use-after-free
> anyways.
> 
> It would be best to make sure that detach never fails. On detach
> effective prog array can only shrink, so even if
> update_effective_progs() fails to allocate memory, we should be able
> to iterate and just replace that prog with NULL, as a fallback
> strategy.

It would be ideal if detach could never fail, but that would require some
kind of prealloc, on attach maybe? Another option would be to minimize the
probability of failure by passing gfp_t flags (GFP_NOIO | GFP_NOFS |
__GFP_NOFAIL). Detach can really only fail if the kzalloc in
compute_effective_progs() fails, so maybe doing something like below
would help:

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 128028efda64..5a47740c317b 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -226,7 +226,8 @@ static bool hierarchy_allows_attach(struct cgroup *cgrp,
   */
  static int compute_effective_progs(struct cgroup *cgrp,
  				   enum cgroup_bpf_attach_type atype,
-				   struct bpf_prog_array **array)
+				   struct bpf_prog_array **array,
+				   gfp_t flags)
  {
  	struct bpf_prog_array_item *item;
  	struct bpf_prog_array *progs;
@@ -241,7 +242,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
  		p = cgroup_parent(p);
  	} while (p);

-	progs = bpf_prog_array_alloc(cnt, GFP_KERNEL);
+	progs = bpf_prog_array_alloc(cnt, flags);
  	if (!progs)
  		return -ENOMEM;

@@ -308,7 +309,7 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
  	INIT_LIST_HEAD(&cgrp->bpf.storages);

  	for (i = 0; i < NR; i++)
-		if (compute_effective_progs(cgrp, i, &arrays[i]))
+		if (compute_effective_progs(cgrp, i, &arrays[i], GFP_KERNEL))
  			goto cleanup;

  	for (i = 0; i < NR; i++)
@@ -328,7 +329,8 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
  }

  static int update_effective_progs(struct cgroup *cgrp,
-				  enum cgroup_bpf_attach_type atype)
+				  enum cgroup_bpf_attach_type atype,
+				  gfp_t flags)
  {
  	struct cgroup_subsys_state *css;
  	int err;
@@ -340,7 +342,8 @@ static int update_effective_progs(struct cgroup *cgrp,
  		if (percpu_ref_is_zero(&desc->bpf.refcnt))
  			continue;

-		err = compute_effective_progs(desc, atype, &desc->bpf.inactive);
+		err = compute_effective_progs(desc, atype, &desc->bpf.inactive,
+					      flags);
  		if (err)
  			goto cleanup;
  	}
@@ -499,7 +502,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
  	bpf_cgroup_storages_assign(pl->storage, storage);
  	cgrp->bpf.flags[atype] = saved_flags;

-	err = update_effective_progs(cgrp, atype);
+	err = update_effective_progs(cgrp, atype, GFP_KERNEL);
  	if (err)
  		goto cleanup;

@@ -722,7 +725,7 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
  	pl->prog = NULL;
  	pl->link = NULL;

-	err = update_effective_progs(cgrp, atype);
+	err = update_effective_progs(cgrp, atype, GFP_NOIO | GFP_NOFS | __GFP_NOFAIL);
  	if (err)
  		goto cleanup;

>> -cleanup:
>> -       /* restore back prog or link */
>> -       pl->prog = old_prog;
>> -       pl->link = link;
>> +       /* In case of error call update_effective_progs again */
>> +       if (err)
>> +               err = update_effective_progs(cgrp, atype);
> there is no guarantee that this will now succeed, right? so it's more
> like "let's try just in case we are lucky this time"?

Yes, there is no guarantee, but my thinking was that since we have freed some
memory (see kfree(pl) above), let's retry and see if it works this time.
If that is combined with the above gfp flags, there is a good chance that it
will work. What do you think?
Andrii Nakryiko April 13, 2022, 7:07 p.m. UTC | #4
On Wed, Apr 13, 2022 at 10:28 AM Tadeusz Struk <tadeusz.struk@linaro.org> wrote:
>
> Hi Andrii,
> On 4/12/22 21:34, Andrii Nakryiko wrote:
> > On Tue, Apr 5, 2022 at 10:04 AM Tadeusz Struk<tadeusz.struk@linaro.org>  wrote:
> >> Syzbot found a Use After Free bug in compute_effective_progs().
> >> The reproducer creates a number of BPF links, and causes a fault
> >> injected alloc to fail, while calling bpf_link_detach on them.
> >> Link detach triggers the link to be freed by bpf_link_free(),
> >> which calls __cgroup_bpf_detach() and update_effective_progs().
> >> If the memory allocation in this function fails, the function restores
> >> the pointer to the bpf_cgroup_link on the cgroup list, but the memory
> >> gets freed just after it returns. After this, every subsequent call to
> >> update_effective_progs() causes this already deallocated pointer to be
> >> dereferenced in prog_list_length(), and triggers KASAN UAF error.
> >> To fix this don't preserve the pointer to the link on the cgroup list
> >> in __cgroup_bpf_detach(), but proceed with the cleanup and retry calling
> >> update_effective_progs() again afterwards.
> > I think it's still problematic. BPF link might have been the last one
> > that holds bpf_prog's refcnt, so when link is put, its prog can stay
> > there in effective_progs array(s) and will cause use-after-free
> > anyways.
> >
> > It would be best to make sure that detach never fails. On detach
> > effective prog array can only shrink, so even if
> > update_effective_progs() fails to allocate memory, we should be able
> > to iterate and just replace that prog with NULL, as a fallback
> > strategy.
>
> it would be ideal if detach would never fail, but it would require some kind of
> prealloc, on attach maybe? Another option would be to minimize the probability

We allocate new arrays in update_effective_progs() under the assumption
that we might need to grow the array, because we use
update_effective_progs() for attachment. But for detachment we know
that we definitely don't need to increase the size; we only need to
remove an existing element, thus shrinking the array.

Normally we'd reallocate the array to shrink it (and that's why we use
update_effective_progs() and allocate memory), but for detachment only
we can also have a fallback path that reuses the existing effective
arrays and simply shifts left every element after the one being
removed. We'll leave a NULL at the end, but that's much better than
erroring out. A subsequent attachment or detachment will attempt to
properly size and reallocate everything.

So I think that should be the fix, if you'd be willing to work on it.
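
A userspace sketch of that shrink-in-place fallback (hypothetical
helper, not the kernel API): remove one entry from an existing array
with no allocation, shifting the tail left and leaving a NULL in the
last slot.

#include <stddef.h>
#include <stdio.h>

/* Remove 'victim' from items[0..cnt-1] in place: shift everything
 * after it one slot left and NULL the last slot, so this path cannot
 * fail with -ENOMEM. */
static void remove_in_place(void **items, size_t cnt, void *victim)
{
	size_t i;

	for (i = 0; i < cnt && items[i] != victim; i++)
		;                        /* find the victim, if present */
	for (; i + 1 < cnt; i++)
		items[i] = items[i + 1]; /* shift the tail left */
	if (i < cnt)
		items[i] = NULL;         /* leftover slot at the end */
}

int main(void)
{
	int a, b, c;
	void *arr[] = { &a, &b, &c };

	remove_in_place(arr, 3, &b);
	printf("%p %p %p\n", arr[0], arr[1], arr[2]); /* &a, &c, NULL */
	return 0;
}

In the kernel this would operate on the existing bpf_prog_array items
rather than a plain pointer array, but the shape of the operation is
the same.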


> of failing by sending it gfp_t flags (GFP_NOIO | GFP_NOFS | __GFP_NOFAIL)?
> Detach can really only fail if the kzalloc in compute_effective_progs() fails
> so maybe doing something like bellow would help:
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 128028efda64..5a47740c317b 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -226,7 +226,8 @@ static bool hierarchy_allows_attach(struct cgroup *cgrp,
>    */
>   static int compute_effective_progs(struct cgroup *cgrp,
>                                    enum cgroup_bpf_attach_type atype,
> -                                  struct bpf_prog_array **array)
> +                                  struct bpf_prog_array **array,
> +                                  gfp_t flags)
>   {
>         struct bpf_prog_array_item *item;
>         struct bpf_prog_array *progs;
> @@ -241,7 +242,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
>                 p = cgroup_parent(p);
>         } while (p);
>
> -       progs = bpf_prog_array_alloc(cnt, GFP_KERNEL);
> +       progs = bpf_prog_array_alloc(cnt, flags);
>         if (!progs)
>                 return -ENOMEM;
>
> @@ -308,7 +309,7 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
>         INIT_LIST_HEAD(&cgrp->bpf.storages);
>
>         for (i = 0; i < NR; i++)
> -               if (compute_effective_progs(cgrp, i, &arrays[i]))
> +               if (compute_effective_progs(cgrp, i, &arrays[i], GFP_KERNEL))
>                         goto cleanup;
>
>         for (i = 0; i < NR; i++)
> @@ -328,7 +329,8 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
>   }
>
>   static int update_effective_progs(struct cgroup *cgrp,
> -                                 enum cgroup_bpf_attach_type atype)
> +                                 enum cgroup_bpf_attach_type atype,
> +                                 gfp_t flags)
>   {
>         struct cgroup_subsys_state *css;
>         int err;
> @@ -340,7 +342,8 @@ static int update_effective_progs(struct cgroup *cgrp,
>                 if (percpu_ref_is_zero(&desc->bpf.refcnt))
>                         continue;
>
> -               err = compute_effective_progs(desc, atype, &desc->bpf.inactive);
> +               err = compute_effective_progs(desc, atype, &desc->bpf.inactive,
> +                                             flags);
>                 if (err)
>                         goto cleanup;
>         }
> @@ -499,7 +502,7 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>         bpf_cgroup_storages_assign(pl->storage, storage);
>         cgrp->bpf.flags[atype] = saved_flags;
>
> -       err = update_effective_progs(cgrp, atype);
> +       err = update_effective_progs(cgrp, atype, GFP_KERNEL);
>         if (err)
>                 goto cleanup;
>
> @@ -722,7 +725,7 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>         pl->prog = NULL;
>         pl->link = NULL;
>
> -       err = update_effective_progs(cgrp, atype);
> +       err = update_effective_progs(cgrp, atype, GFP_NOIO | GFP_NOFS | __GFP_NOFAIL);
>         if (err)
>                 goto cleanup;
>
> >> -cleanup:
> >> -       /* restore back prog or link */
> >> -       pl->prog = old_prog;
> >> -       pl->link = link;
> >> +       /* In case of error call update_effective_progs again */
> >> +       if (err)
> >> +               err = update_effective_progs(cgrp, atype);
> > there is no guarantee that this will now succeed, right? so it's more
> > like "let's try just in case we are lucky this time"?
>
> Yes, there is no guarantee, but my thinking was that since we have freed some
> memory (see kfree(pl) above) let's retry and see if it works this time.
> If that is combined with the above gfp flags it is a good chance that it will
> work. What do you think?
>

See above: instead of compensating for the update_effective_progs()
failure, we should make sure it never fails for detachment.


> --
> Thanks,
> Tadeusz
Tadeusz Struk April 13, 2022, 7:27 p.m. UTC | #5
On 4/13/22 12:07, Andrii Nakryiko wrote:
>> it would be ideal if detach would never fail, but it would require some kind of
>> prealloc, on attach maybe? Another option would be to minimize the probability
> We allocate new arrays in update_effective_progs() under assumption
> that we might need to grow the array because we use
> update_effective_progs() for attachment. But for detachment we know
> that we definitely don't need to increase the size, we need to remove
> existing element only, thus shrinking the size.
> 
> Normally we'd reallocate the array to shrink it (and that's why we use
> update_effective_progs() and allocate memory), but we can also have a
> fallback path for detachment only to reuse existing effective arrays
> and just shift all the elements to the right from the element that's
> being removed. We'll leave NULL at the end, but that's much better
> than error out. Subsequent attachment or detachment will attempt to
> properly size and reallocate everything.
> 
> So I think that should be the fix, if you'd be willing to work on it.

That makes it much easier, then. I will change it so that there is no
alloc needed on the detach path. Thanks for the clarification.
Andrii Nakryiko April 13, 2022, 7:49 p.m. UTC | #6
On Wed, Apr 13, 2022 at 12:27 PM Tadeusz Struk <tadeusz.struk@linaro.org> wrote:
>
> On 4/13/22 12:07, Andrii Nakryiko wrote:
> >> it would be ideal if detach would never fail, but it would require some kind of
> >> prealloc, on attach maybe? Another option would be to minimize the probability
> > We allocate new arrays in update_effective_progs() under assumption
> > that we might need to grow the array because we use
> > update_effective_progs() for attachment. But for detachment we know
> > that we definitely don't need to increase the size, we need to remove
> > existing element only, thus shrinking the size.
> >
> > Normally we'd reallocate the array to shrink it (and that's why we use
> > update_effective_progs() and allocate memory), but we can also have a
> > fallback path for detachment only to reuse existing effective arrays
> > and just shift all the elements to the right from the element that's
> > being removed. We'll leave NULL at the end, but that's much better
> > than error out. Subsequent attachment or detachment will attempt to
> > properly size and reallocate everything.
> >
> > So I think that should be the fix, if you'd be willing to work on it.
>
> That makes it much easier then. I will change it so that there is no
> alloc needed on the detach path. Thanks for the clarification.

Keep in mind that we probably want to do the normal alloc-based detach
first anyway, if it works. It will keep the effective arrays minimally
sized. The additional detach-specific logic should be a fallback path
for when the normal way doesn't work.
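
The combined control flow might look roughly like this (stub functions,
purely illustrative; update_by_realloc() and shrink_in_place() are
assumed names, not kernel functions):

#include <errno.h>
#include <stdio.h>

static int update_by_realloc(void)
{
	return -ENOMEM;    /* simulate the allocation failure */
}

static void shrink_in_place(void)
{
	puts("fallback: shrank the existing arrays in place");
}

/* Detach never fails: try the reallocating path first (keeps the
 * arrays minimally sized), fall back to the in-place shrink only
 * when the allocation fails. */
static void detach_update(void)
{
	if (update_by_realloc())
		shrink_in_place();
}

int main(void)
{
	detach_update();
	return 0;
}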

>
> --
> Thanks,
> Tadeusz

Patch

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 128028efda64..b6307337a3c7 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -723,10 +723,11 @@  static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	pl->link = NULL;
 
 	err = update_effective_progs(cgrp, atype);
-	if (err)
-		goto cleanup;
-
-	/* now can actually delete it from this cgroup list */
+	/*
+	 * Proceed regardless of error. The link and/or prog will be freed
+	 * just after this function returns so just delete it from this
+	 * cgroup list and retry calling update_effective_progs again later.
+	 */
 	list_del(&pl->node);
 	kfree(pl);
 	if (list_empty(progs))
@@ -735,12 +736,11 @@  static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (old_prog)
 		bpf_prog_put(old_prog);
 	static_branch_dec(&cgroup_bpf_enabled_key[atype]);
-	return 0;
 
-cleanup:
-	/* restore back prog or link */
-	pl->prog = old_prog;
-	pl->link = link;
+	/* In case of error call update_effective_progs again */
+	if (err)
+		err = update_effective_progs(cgrp, atype);
+
 	return err;
 }
 
@@ -881,6 +881,7 @@  static void bpf_cgroup_link_release(struct bpf_link *link)
 	struct bpf_cgroup_link *cg_link =
 		container_of(link, struct bpf_cgroup_link, link);
 	struct cgroup *cg;
+	int err;
 
 	/* link might have been auto-detached by dying cgroup already,
 	 * in that case our work is done here
@@ -896,8 +897,10 @@  static void bpf_cgroup_link_release(struct bpf_link *link)
 		return;
 	}
 
-	WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
-				    cg_link->type));
+	err = __cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
+				  cg_link->type);
+	if (err)
+		pr_warn("cgroup_bpf_detach() failed, err %d\n", err);
 
 	cg = cg_link->cgroup;
 	cg_link->cgroup = NULL;