Message ID | 20211007121603.1484881-1-quanyang.wang@windriver.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | cgroup: fix memory leak caused by missing cgroup_bpf_offline | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
Hello. On Thu, Oct 07, 2021 at 08:16:03PM +0800, quanyang.wang@windriver.com wrote: > This is because that root_cgrp->bpf.refcnt.data is allocated by the > function percpu_ref_init in cgroup_bpf_inherit which is called by > cgroup_setup_root when mounting, but not freed along with root_cgrp > when umounting. Good catch! > Adding cgroup_bpf_offline which calls percpu_ref_kill to > cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in umount path. That is sensible. > Fixes: 2b0d3d3e4fcfb ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Why this Fixes:? Is the leak absent before the percpu_ref refactoring? I guess the embedded data are free'd together with cgroup. Makes me wonder why struct cgroup_bpf has a separate percpu_ref counter from struct cgroup... > +++ b/kernel/cgroup/cgroup.c > @@ -2147,8 +2147,10 @@ static void cgroup_kill_sb(struct super_block *sb) > * And don't kill the default root. > */ > if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root && > - !percpu_ref_is_dying(&root->cgrp.self.refcnt)) > + !percpu_ref_is_dying(&root->cgrp.self.refcnt)) { > + cgroup_bpf_offline(&root->cgrp); (You made some unnecessary whitespace here breaking indention :-) > percpu_ref_kill(&root->cgrp.self.refcnt); > + } > cgroup_put(&root->cgrp); > kernfs_kill_sb(sb); > }
On Mon, Oct 11, 2021 at 06:21:28PM +0200, Michal Koutny wrote: > Hello. > > On Thu, Oct 07, 2021 at 08:16:03PM +0800, quanyang.wang@windriver.com wrote: > > This is because that root_cgrp->bpf.refcnt.data is allocated by the > > function percpu_ref_init in cgroup_bpf_inherit which is called by > > cgroup_setup_root when mounting, but not freed along with root_cgrp > > when umounting. > > Good catch! +1 > > > Adding cgroup_bpf_offline which calls percpu_ref_kill to > > cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in umount path. > > That is sensible. > > > Fixes: 2b0d3d3e4fcfb ("percpu_ref: reduce memory footprint of percpu_ref in fast path") > > Why this Fixes:? Is the leak absent before the percpu_ref refactoring? I agree, the "fixes" tag looks dubious to me. > I guess the embedded data are free'd together with cgroup. Makes me > wonder why struct cgroup_bpf has a separate percpu_ref counter from > struct cgroup... This is because a cgroup can stay a long time (sometimes effectively forever) in a dying state, so we want to release bpf structures earlier. Thanks!
Hi Michal & Roman, Thank you for your review. On 10/12/21 12:21 AM, Michal Koutný wrote: > Hello. > > On Thu, Oct 07, 2021 at 08:16:03PM +0800, quanyang.wang@windriver.com wrote: >> This is because that root_cgrp->bpf.refcnt.data is allocated by the >> function percpu_ref_init in cgroup_bpf_inherit which is called by >> cgroup_setup_root when mounting, but not freed along with root_cgrp >> when umounting. > > Good catch! > >> Adding cgroup_bpf_offline which calls percpu_ref_kill to >> cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in umount path. > > That is sensible. > >> Fixes: 2b0d3d3e4fcfb ("percpu_ref: reduce memory footprint of percpu_ref in fast path") > > Why this Fixes:? Is the leak absent before the percpu_ref refactoring? Before this commit, percpu_ref is embedded in cgroup, it can be freed along with cgroup, so there is no memory leak. Since this commit, it causes the memory leak. Should I change it to "Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")"? > I guess the embedded data are free'd together with cgroup. Makes me > wonder why struct cgroup_bpf has a separate percpu_ref counter from > struct cgroup... > >> +++ b/kernel/cgroup/cgroup.c >> @@ -2147,8 +2147,10 @@ static void cgroup_kill_sb(struct super_block *sb) >> * And don't kill the default root. >> */ >> if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root && >> - !percpu_ref_is_dying(&root->cgrp.self.refcnt)) >> + !percpu_ref_is_dying(&root->cgrp.self.refcnt)) { >> + cgroup_bpf_offline(&root->cgrp); > > (You made some unnecessary whitespace here breaking indention :-) Thanks for pointing it out. I will send a V2 to fix this. Thanks, Quanyang > >> percpu_ref_kill(&root->cgrp.self.refcnt); >> + } >> cgroup_put(&root->cgrp); >> kernfs_kill_sb(sb); >> }
On Tue, Oct 12, 2021 at 02:22:13PM +0800, Quanyang Wang <quanyang.wang@windriver.com> wrote: > Before this commit, percpu_ref is embedded in cgroup, it can be freed along > with cgroup, so there is no memory leak. Since this commit, it causes the > memory leak. > Should I change it to "Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of > cgroup_bpf from cgroup itself")"? I see. The leak is a product so I'd tag both of them and explain it in the commit message. Thank you, Michal
Roman Gushchin wrote: > On Mon, Oct 11, 2021 at 06:21:28PM +0200, Michal Koutny wrote: > > Hello. > > > > On Thu, Oct 07, 2021 at 08:16:03PM +0800, quanyang.wang@windriver.com wrote: > > > This is because that root_cgrp->bpf.refcnt.data is allocated by the > > > function percpu_ref_init in cgroup_bpf_inherit which is called by > > > cgroup_setup_root when mounting, but not freed along with root_cgrp > > > when umounting. > > > > Good catch! > > +1 > > > > > > Adding cgroup_bpf_offline which calls percpu_ref_kill to > > > cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in umount path. > > > > That is sensible. > > > > > Fixes: 2b0d3d3e4fcfb ("percpu_ref: reduce memory footprint of percpu_ref in fast path") > > > > Why this Fixes:? Is the leak absent before the percpu_ref refactoring? > > I agree, the "fixes" tag looks dubious to me. > > > I guess the embedded data are free'd together with cgroup. Makes me > > wonder why struct cgroup_bpf has a separate percpu_ref counter from > > struct cgroup... > > This is because a cgroup can stay a long time (sometimes effectively forever) > in a dying state, so we want to release bpf structures earlier. > > Thanks! Other than whitespace LGTM. Acked-by: John Fastabend <john.fastabend@gmail.com>
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index c8b811e039cc2..ce636deec5e41 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -2147,8 +2147,10 @@ static void cgroup_kill_sb(struct super_block *sb) * And don't kill the default root. */ if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root && - !percpu_ref_is_dying(&root->cgrp.self.refcnt)) + !percpu_ref_is_dying(&root->cgrp.self.refcnt)) { + cgroup_bpf_offline(&root->cgrp); percpu_ref_kill(&root->cgrp.self.refcnt); + } cgroup_put(&root->cgrp); kernfs_kill_sb(sb); }