Message ID | 20211018075623.26884-1-quanyang.wang@windriver.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | BPF |
Headers | show |
Series | [V2] cgroup: fix memory leak caused by missing cgroup_bpf_offline | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
bpf/vmtest-bpf-next-PR | success | PR summary |
bpf/vmtest-bpf-PR | success | PR summary |
bpf/vmtest-bpf | success | VM_Test |
bpf/vmtest-bpf-next | success | VM_Test |
On Mon, Oct 18, 2021 at 03:56:23PM +0800, quanyang.wang@windriver.com wrote: > From: Quanyang Wang <quanyang.wang@windriver.com> > > When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running > the command as below: > > $mount -t cgroup -o none,name=foo cgroup cgroup/ > $umount cgroup/ > > unreferenced object 0xc3585c40 (size 64): > comm "mount", pid 425, jiffies 4294959825 (age 31.990s) > hex dump (first 32 bytes): > 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... > 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... > backtrace: > [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c > [<1f03679c>] cgroup_setup_root+0x174/0x37c > [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 > [<f85b12fd>] vfs_get_tree+0x24/0x108 > [<f55aec5c>] path_mount+0x384/0x988 > [<e2d5e9cd>] do_mount+0x64/0x9c > [<208c9cfe>] sys_mount+0xfc/0x1f4 > [<06dd06e0>] ret_fast_syscall+0x0/0x48 > [<a8308cb3>] 0xbeb4daa8 > > This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce > memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data > is allocated by the function percpu_ref_init in cgroup_bpf_inherit which > is called by cgroup_setup_root when mounting, but not freed along with > root_cgrp when umounting. Adding cgroup_bpf_offline which calls > percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in > umount path. > > This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime > of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a > cleanup that frees the resources which are allocated by cgroup_bpf_inherit > in cgroup_setup_root. > > And inside cgroup_bpf_offline, cgroup_get() is at the beginning and > cgroup_put is at the end of cgroup_bpf_release which is called by > cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of > cgroup's refcount. > > Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") If I understand correctly, cgroup_bpf_release() won't be called without your patch. So anything allocated in cgroup_bpf_inherit() will be leaked? If that is true, 'Fixes: 2b0d3d3e4fcf' looks misleading, cause people has to backport your patch if 4bfc0bb2c60e is applied. Meantime, this fix isn't needed if 4bfc0bb2c60e isn't merged. Thanks, Ming
Hi Ming, On 10/18/21 5:02 PM, Ming Lei wrote: > On Mon, Oct 18, 2021 at 03:56:23PM +0800, quanyang.wang@windriver.com wrote: >> From: Quanyang Wang <quanyang.wang@windriver.com> >> >> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running >> the command as below: >> >> $mount -t cgroup -o none,name=foo cgroup cgroup/ >> $umount cgroup/ >> >> unreferenced object 0xc3585c40 (size 64): >> comm "mount", pid 425, jiffies 4294959825 (age 31.990s) >> hex dump (first 32 bytes): >> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... >> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... >> backtrace: >> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c >> [<1f03679c>] cgroup_setup_root+0x174/0x37c >> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 >> [<f85b12fd>] vfs_get_tree+0x24/0x108 >> [<f55aec5c>] path_mount+0x384/0x988 >> [<e2d5e9cd>] do_mount+0x64/0x9c >> [<208c9cfe>] sys_mount+0xfc/0x1f4 >> [<06dd06e0>] ret_fast_syscall+0x0/0x48 >> [<a8308cb3>] 0xbeb4daa8 >> >> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce >> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data >> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which >> is called by cgroup_setup_root when mounting, but not freed along with >> root_cgrp when umounting. Adding cgroup_bpf_offline which calls >> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in >> umount path. >> >> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime >> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a >> cleanup that frees the resources which are allocated by cgroup_bpf_inherit >> in cgroup_setup_root. >> >> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and >> cgroup_put is at the end of cgroup_bpf_release which is called by >> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of >> cgroup's refcount. >> >> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") > > If I understand correctly, cgroup_bpf_release() won't be called without > your patch. So anything allocated in cgroup_bpf_inherit() will be > leaked? No, for now cgroup_bpf_release is called to release bpf.refcnt.data of the cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked. For non-root cgroup: cgroup_mkdir -> cgroup_create -->cgroup_bpf_inherit(cgrp_A) cgroup_rmdir ->cgroup_destroy_locked() -->cgroup_bpf_offline(cgrp_A) So for non-root cgroup, there is no memory leak. For root cgroup: cgroup_setup_root ->cgroup_bpf_inherit(root_cgrp) cgroup_kill_sb: -> (Here should be call cgroup_bpf_offline, or else leak occurs) Thanks, Quanyang > > If that is true, 'Fixes: 2b0d3d3e4fcf' looks misleading, cause people has to > backport your patch if 4bfc0bb2c60e is applied. Meantime, this fix isn't > needed if 4bfc0bb2c60e isn't merged. > > > Thanks, > Ming >
On Mon, Oct 18, 2021 at 06:06:28PM +0800, Quanyang Wang wrote: > Hi Ming, > > On 10/18/21 5:02 PM, Ming Lei wrote: > > On Mon, Oct 18, 2021 at 03:56:23PM +0800, quanyang.wang@windriver.com wrote: > > > From: Quanyang Wang <quanyang.wang@windriver.com> > > > > > > When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running > > > the command as below: > > > > > > $mount -t cgroup -o none,name=foo cgroup cgroup/ > > > $umount cgroup/ > > > > > > unreferenced object 0xc3585c40 (size 64): > > > comm "mount", pid 425, jiffies 4294959825 (age 31.990s) > > > hex dump (first 32 bytes): > > > 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... > > > 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... > > > backtrace: > > > [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c > > > [<1f03679c>] cgroup_setup_root+0x174/0x37c > > > [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 > > > [<f85b12fd>] vfs_get_tree+0x24/0x108 > > > [<f55aec5c>] path_mount+0x384/0x988 > > > [<e2d5e9cd>] do_mount+0x64/0x9c > > > [<208c9cfe>] sys_mount+0xfc/0x1f4 > > > [<06dd06e0>] ret_fast_syscall+0x0/0x48 > > > [<a8308cb3>] 0xbeb4daa8 > > > > > > This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce > > > memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data > > > is allocated by the function percpu_ref_init in cgroup_bpf_inherit which > > > is called by cgroup_setup_root when mounting, but not freed along with > > > root_cgrp when umounting. Adding cgroup_bpf_offline which calls > > > percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in > > > umount path. > > > > > > This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime > > > of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a > > > cleanup that frees the resources which are allocated by cgroup_bpf_inherit > > > in cgroup_setup_root. > > > > > > And inside cgroup_bpf_offline, cgroup_get() is at the beginning and > > > cgroup_put is at the end of cgroup_bpf_release which is called by > > > cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of > > > cgroup's refcount. > > > > > > Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") > > > > If I understand correctly, cgroup_bpf_release() won't be called without > > your patch. So anything allocated in cgroup_bpf_inherit() will be > > leaked? > No, for now cgroup_bpf_release is called to release bpf.refcnt.data of the > cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked. You mean that cgroup_bpf_inherit() allocates nothing for root_cgroup? If yes, I agree you can add 'Fixes: 2b0d3d3e4fcf', otherwise please remove it. Thanks, Ming
Hi Ming, On 10/18/21 8:59 PM, Ming Lei wrote: > On Mon, Oct 18, 2021 at 06:06:28PM +0800, Quanyang Wang wrote: >> Hi Ming, >> >> On 10/18/21 5:02 PM, Ming Lei wrote: >>> On Mon, Oct 18, 2021 at 03:56:23PM +0800, quanyang.wang@windriver.com wrote: >>>> From: Quanyang Wang <quanyang.wang@windriver.com> >>>> >>>> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running >>>> the command as below: >>>> >>>> $mount -t cgroup -o none,name=foo cgroup cgroup/ >>>> $umount cgroup/ >>>> >>>> unreferenced object 0xc3585c40 (size 64): >>>> comm "mount", pid 425, jiffies 4294959825 (age 31.990s) >>>> hex dump (first 32 bytes): >>>> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... >>>> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... >>>> backtrace: >>>> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c >>>> [<1f03679c>] cgroup_setup_root+0x174/0x37c >>>> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 >>>> [<f85b12fd>] vfs_get_tree+0x24/0x108 >>>> [<f55aec5c>] path_mount+0x384/0x988 >>>> [<e2d5e9cd>] do_mount+0x64/0x9c >>>> [<208c9cfe>] sys_mount+0xfc/0x1f4 >>>> [<06dd06e0>] ret_fast_syscall+0x0/0x48 >>>> [<a8308cb3>] 0xbeb4daa8 >>>> >>>> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce >>>> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data >>>> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which >>>> is called by cgroup_setup_root when mounting, but not freed along with >>>> root_cgrp when umounting. Adding cgroup_bpf_offline which calls >>>> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in >>>> umount path. >>>> >>>> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime >>>> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a >>>> cleanup that frees the resources which are allocated by cgroup_bpf_inherit >>>> in cgroup_setup_root. >>>> >>>> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and >>>> cgroup_put is at the end of cgroup_bpf_release which is called by >>>> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of >>>> cgroup's refcount. >>>> >>>> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") >>> >>> If I understand correctly, cgroup_bpf_release() won't be called without >>> your patch. So anything allocated in cgroup_bpf_inherit() will be >>> leaked? >> No, for now cgroup_bpf_release is called to release bpf.refcnt.data of the >> cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked. > > You mean that cgroup_bpf_inherit() allocates nothing for root_cgroup? Yes, cgroup_bpf_inherit allocates something for root_cgroup. The earlier commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") introduces an imbalance that call cgroup_bpf_inherit(&root_cgroup) but not call cgroup_bpf_offline(&root_cgroup). But there was no memory leak here. When the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") applies, some data is allocated for root_cgroup and not released with root_cgroup, and memory leak is observed. So I add 2 "Fixes tags" here to indicate that 2 commits introduce two different issues. But it seems that 2 "Fixes tags" is misleading now. So maybe just fix earlier commit 4bfc0bb2c60e which introduces imbalance? Thanks, Quanyang > > If yes, I agree you can add 'Fixes: 2b0d3d3e4fcf', otherwise please > remove it. > > > Thanks, > Ming >
Hi Ming, On 10/18/21 8:59 PM, Ming Lei wrote: > On Mon, Oct 18, 2021 at 06:06:28PM +0800, Quanyang Wang wrote: >> Hi Ming, >> >> On 10/18/21 5:02 PM, Ming Lei wrote: >>> On Mon, Oct 18, 2021 at 03:56:23PM +0800, quanyang.wang@windriver.com wrote: >>>> From: Quanyang Wang <quanyang.wang@windriver.com> >>>> >>>> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running >>>> the command as below: >>>> >>>> $mount -t cgroup -o none,name=foo cgroup cgroup/ >>>> $umount cgroup/ >>>> >>>> unreferenced object 0xc3585c40 (size 64): >>>> comm "mount", pid 425, jiffies 4294959825 (age 31.990s) >>>> hex dump (first 32 bytes): >>>> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... >>>> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... >>>> backtrace: >>>> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c >>>> [<1f03679c>] cgroup_setup_root+0x174/0x37c >>>> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 >>>> [<f85b12fd>] vfs_get_tree+0x24/0x108 >>>> [<f55aec5c>] path_mount+0x384/0x988 >>>> [<e2d5e9cd>] do_mount+0x64/0x9c >>>> [<208c9cfe>] sys_mount+0xfc/0x1f4 >>>> [<06dd06e0>] ret_fast_syscall+0x0/0x48 >>>> [<a8308cb3>] 0xbeb4daa8 >>>> >>>> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce >>>> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data >>>> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which >>>> is called by cgroup_setup_root when mounting, but not freed along with >>>> root_cgrp when umounting. Adding cgroup_bpf_offline which calls >>>> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in >>>> umount path. >>>> >>>> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime >>>> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a >>>> cleanup that frees the resources which are allocated by cgroup_bpf_inherit >>>> in cgroup_setup_root. >>>> >>>> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and >>>> cgroup_put is at the end of cgroup_bpf_release which is called by >>>> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of >>>> cgroup's refcount. >>>> >>>> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") >>> >>> If I understand correctly, cgroup_bpf_release() won't be called without >>> your patch. So anything allocated in cgroup_bpf_inherit() will be >>> leaked? >> No, for now cgroup_bpf_release is called to release bpf.refcnt.data of the >> cgroup which is not root_cgroup. Only root_cgroup's bpf data is leaked. > > You mean that cgroup_bpf_inherit() allocates nothing for root_cgroup? > > If yes, I agree you can add 'Fixes: 2b0d3d3e4fcf', otherwise please > remove it. > > > Thanks, > Ming >
Hi. On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <quanyang.wang@windriver.com> wrote: > So I add 2 "Fixes tags" here to indicate that 2 commits introduce two > different issues. AFAIU, both the changes are needed to cause the leak, a single patch alone won't cause the issue. Is that correct? (Perhaps not as I realize, see below.) But on second thought, the problem is the missing percpu_ref_exit() in the (root) cgroup release path and percpu counter would allocate the percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more visible. Is this correct? I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") alone did nothing wrong. [On a related (but independent) note, there seems to be an optimization opportunity in not dealing with cgroup_bpf at all on the non-default hierarchies.] Regards, Michal
On Tue, Oct 19, 2021 at 07:10:26PM +0200, Michal Koutný wrote: > Hi. > > On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <quanyang.wang@windriver.com> wrote: > > So I add 2 "Fixes tags" here to indicate that 2 commits introduce two > > different issues. > > AFAIU, both the changes are needed to cause the leak, a single patch > alone won't cause the issue. Is that correct? (Perhaps not as I realize, > see below.) > > But on second thought, the problem is the missing percpu_ref_exit() in > the (root) cgroup release path and percpu counter would allocate the > percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more > visible. Is this correct? > > I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of > percpu_ref in fast path") alone did nothing wrong. If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf", I thought cgroup_bpf_release() needs to release more for root cgroup, but looks not true. Thanks, Ming
Hi Michal, On 10/20/21 1:10 AM, Michal Koutný wrote: > Hi. > > On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <quanyang.wang@windriver.com> wrote: >> So I add 2 "Fixes tags" here to indicate that 2 commits introduce two >> different issues. > > AFAIU, both the changes are needed to cause the leak, a single patch > alone won't cause the issue. Is that correct? (Perhaps not as I realize, > see below.) Yes, I back to the earlier commit 4bfc0bb2c60e and no memory leak is observed. > > But on second thought, the problem is the missing percpu_ref_exit() in > the (root) cgroup release path and percpu counter would allocate the > percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more > visible. Is this correct? No, the earlier commit 4bfc0bb2c60e introduces a imbalance and the later commit 2b0d3d3e4fcf introduces a visible leak. Thanks, Quanyang > > I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of > percpu_ref in fast path") alone did nothing wrong. > > [On a related (but independent) note, there seems to be an optimization > opportunity in not dealing with cgroup_bpf at all on the non-default > hierarchies.] > > Regards, > Michal >
Hi Ming, On 10/20/21 10:17 AM, Ming Lei wrote: > On Tue, Oct 19, 2021 at 07:10:26PM +0200, Michal Koutný wrote: >> Hi. >> >> On Tue, Oct 19, 2021 at 06:41:14PM +0800, Quanyang Wang <quanyang.wang@windriver.com> wrote: >>> So I add 2 "Fixes tags" here to indicate that 2 commits introduce two >>> different issues. >> >> AFAIU, both the changes are needed to cause the leak, a single patch >> alone won't cause the issue. Is that correct? (Perhaps not as I realize, >> see below.) >> >> But on second thought, the problem is the missing percpu_ref_exit() in >> the (root) cgroup release path and percpu counter would allocate the >> percpu_count_ptr anyway, so 4bfc0bb2c60e is only making the leak more >> visible. Is this correct? >> >> I agree the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of >> percpu_ref in fast path") alone did nothing wrong. > > If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf", > I thought cgroup_bpf_release() needs to release more for root cgroup, but > looks not true. For now, I can only observe that precpu_ref data is leaked when running ltp testsuite. Thanks, Quanyang > > > Thanks, > Ming >
On Wed, Oct 20, 2021 at 01:22:06PM +0800, Quanyang Wang <quanyang.wang@windriver.com> wrote: > > If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf", > > I thought cgroup_bpf_release() needs to release more for root cgroup, but > > looks not true. > For now, I can only observe that precpu_ref data is leaked when running ltp > testsuite. I assume you refer to ref->data. I considered the ref->percpu_count_ptr allocated with __alloc_percpu_gfp(). Could it be that kmemleak won't detect leaked percpu allocations? (The patch you sent resolves this as well, I'm just curious.) Michal
On Mon, Oct 18, 2021 at 03:56:23PM +0800, quanyang.wang@windriver.com wrote: > From: Quanyang Wang <quanyang.wang@windriver.com> > > When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running > the command as below: > > $mount -t cgroup -o none,name=foo cgroup cgroup/ > $umount cgroup/ > > unreferenced object 0xc3585c40 (size 64): > comm "mount", pid 425, jiffies 4294959825 (age 31.990s) > hex dump (first 32 bytes): > 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... > 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... > backtrace: > [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c > [<1f03679c>] cgroup_setup_root+0x174/0x37c > [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 > [<f85b12fd>] vfs_get_tree+0x24/0x108 > [<f55aec5c>] path_mount+0x384/0x988 > [<e2d5e9cd>] do_mount+0x64/0x9c > [<208c9cfe>] sys_mount+0xfc/0x1f4 > [<06dd06e0>] ret_fast_syscall+0x0/0x48 > [<a8308cb3>] 0xbeb4daa8 > > This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce > memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data > is allocated by the function percpu_ref_init in cgroup_bpf_inherit which > is called by cgroup_setup_root when mounting, but not freed along with > root_cgrp when umounting. Adding cgroup_bpf_offline which calls > percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in > umount path. > > This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime > of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a > cleanup that frees the resources which are allocated by cgroup_bpf_inherit > in cgroup_setup_root. > > And inside cgroup_bpf_offline, cgroup_get() is at the beginning and > cgroup_put is at the end of cgroup_bpf_release which is called by > cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of > cgroup's refcount. > > Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") > Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") > Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com> > --- > V1 ---> V2: > 1. As per Daniel's suggestion, add description to commit msg about the > balance of cgroup's refcount in cgroup_bpf_offline. > 2. As per Michal's suggestion, add tag "Fixes: 4bfc0bb2c60e" and add > description about it. > 3. Fix indentation on the percpu_ref_is_dying line. Acked-by: Roman Gushchin <guro@fb.com> The fix looks correct, two fixes tag are fine too, if only it won't confuse scripts picking up patches for stable backports. In fact, it's a very cold path, which is arguably never hit in the real life. On cgroup v2 it's not an issue. I'm not sure we need a stable backport at all, only if it creates a noise for some automation tests. Quanyang, out of curiosity, how did you find it? Anyway, thanks for catching and fixing it! Roman
Hi Michal, On 10/21/21 1:28 AM, Michal Koutný wrote: > On Wed, Oct 20, 2021 at 01:22:06PM +0800, Quanyang Wang <quanyang.wang@windriver.com> wrote: >>> If only precpu_ref data is leaked, it is fine to add "Fixes: 2b0d3d3e4fcf", >>> I thought cgroup_bpf_release() needs to release more for root cgroup, but >>> looks not true. >> For now, I can only observe that precpu_ref data is leaked when running ltp >> testsuite. > > I assume you refer to ref->data. I considered the ref->percpu_count_ptr > allocated with __alloc_percpu_gfp(). Could it be that kmemleak won't > detect leaked percpu allocations? Yes, kmemleak can't detect percpu allocations. I find some message about this: commit f528f0b8e53d Author: Catalin Marinas <catalin.marinas@arm.com> Date: Mon Sep 26 17:12:53 2011 +0100 kmemleak: Handle percpu memory allocation This patch adds kmemleak callbacks from the percpu allocator, reducing a number of false positives caused by kmemleak not scanning such memory blocks. The percpu chunks are never reported as leaks because of current kmemleak limitations with the __percpu pointer not pointing directly to the actual chunks. Thanks, Quanyang > > (The patch you sent resolves this as well, I'm just curious.) > > Michal >
Hi Roman, On 10/22/21 9:30 AM, Roman Gushchin wrote: > On Mon, Oct 18, 2021 at 03:56:23PM +0800, quanyang.wang@windriver.com wrote: >> From: Quanyang Wang <quanyang.wang@windriver.com> >> >> When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running >> the command as below: >> >> $mount -t cgroup -o none,name=foo cgroup cgroup/ >> $umount cgroup/ >> >> unreferenced object 0xc3585c40 (size 64): >> comm "mount", pid 425, jiffies 4294959825 (age 31.990s) >> hex dump (first 32 bytes): >> 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... >> 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... >> backtrace: >> [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c >> [<1f03679c>] cgroup_setup_root+0x174/0x37c >> [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 >> [<f85b12fd>] vfs_get_tree+0x24/0x108 >> [<f55aec5c>] path_mount+0x384/0x988 >> [<e2d5e9cd>] do_mount+0x64/0x9c >> [<208c9cfe>] sys_mount+0xfc/0x1f4 >> [<06dd06e0>] ret_fast_syscall+0x0/0x48 >> [<a8308cb3>] 0xbeb4daa8 >> >> This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce >> memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data >> is allocated by the function percpu_ref_init in cgroup_bpf_inherit which >> is called by cgroup_setup_root when mounting, but not freed along with >> root_cgrp when umounting. Adding cgroup_bpf_offline which calls >> percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in >> umount path. >> >> This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime >> of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a >> cleanup that frees the resources which are allocated by cgroup_bpf_inherit >> in cgroup_setup_root. >> >> And inside cgroup_bpf_offline, cgroup_get() is at the beginning and >> cgroup_put is at the end of cgroup_bpf_release which is called by >> cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of >> cgroup's refcount. >> >> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") >> Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") >> Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com> >> --- >> V1 ---> V2: >> 1. As per Daniel's suggestion, add description to commit msg about the >> balance of cgroup's refcount in cgroup_bpf_offline. >> 2. As per Michal's suggestion, add tag "Fixes: 4bfc0bb2c60e" and add >> description about it. >> 3. Fix indentation on the percpu_ref_is_dying line. > > Acked-by: Roman Gushchin <guro@fb.com> > > The fix looks correct, two fixes tag are fine too, if only it won't > confuse scripts picking up patches for stable backports. > > In fact, it's a very cold path, which is arguably never hit in the real > life. On cgroup v2 it's not an issue. I'm not sure we need a stable > backport at all, only if it creates a noise for some automation tests. > > Quanyang, out of curiosity, how did you find it? I ran ltp testsuite to find this. ./runltp -f controllers -s cgroup Thanks, Quanyang > > Anyway, thanks for catching and fixing it! > > Roman >
On Fri, Oct 22, 2021 at 4:56 AM Quanyang Wang <quanyang.wang@windriver.com> wrote: > >> Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") > >> Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") > >> Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com> > >> --- > >> V1 ---> V2: > >> 1. As per Daniel's suggestion, add description to commit msg about the > >> balance of cgroup's refcount in cgroup_bpf_offline. > >> 2. As per Michal's suggestion, add tag "Fixes: 4bfc0bb2c60e" and add > >> description about it. > >> 3. Fix indentation on the percpu_ref_is_dying line. > > > > Acked-by: Roman Gushchin <guro@fb.com> > > > > The fix looks correct, two fixes tag are fine too, if only it won't > > confuse scripts picking up patches for stable backports. > > > > In fact, it's a very cold path, which is arguably never hit in the real > > life. On cgroup v2 it's not an issue. I'm not sure we need a stable > > backport at all, only if it creates a noise for some automation tests. > > > > Quanyang, out of curiosity, how did you find it? > I ran ltp testsuite to find this. > > ./runltp -f controllers -s cgroup > > Thanks, > Quanyang > > > > Anyway, thanks for catching and fixing it! Applied to bpf tree. Thanks everyone!
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 570b0c97392a..ea08f01d0111 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -2187,8 +2187,10 @@ static void cgroup_kill_sb(struct super_block *sb) * And don't kill the default root. */ if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root && - !percpu_ref_is_dying(&root->cgrp.self.refcnt)) + !percpu_ref_is_dying(&root->cgrp.self.refcnt)) { + cgroup_bpf_offline(&root->cgrp); percpu_ref_kill(&root->cgrp.self.refcnt); + } cgroup_put(&root->cgrp); kernfs_kill_sb(sb); }