
nfsd: fix race between cache_clean and cache_purge

Message ID 5eed50660eb13326b0fbf537fb58481ea53c1acb.1585043174.git.wuyihao@linux.alibaba.com (mailing list archive)
State New, archived
Series nfsd: fix race between cache_clean and cache_purge

Commit Message

Yihao Wu March 24, 2020, 9:49 a.m. UTC
cache_purge should hold cache_list_lock as cache_clean does. Otherwise a cache
entry can be cache_put() twice, which leads to a use-after-free bug.

To reproduce, run the LTP NFS tests (the failure is rare):  /opt/ltp/runltp -f net.nfs

[14454.137661] ==================================================================
[14454.138863] BUG: KASAN: use-after-free in cache_purge+0xce/0x160 [sunrpc]
[14454.139822] Read of size 4 at addr ffff8883d484d560 by task nfsd/31993
[14454.140746]
[14454.140995] CPU: 1 PID: 31993 Comm: nfsd Kdump: loaded Not tainted 4.19.91-0.229.git.87bac30.al7.x86_64.debug #1
[14454.141002] Call Trace:
[14454.141014]  dump_stack+0xaf/0xfb
[14454.141027]  print_address_description+0x6a/0x2a0
[14454.141037]  kasan_report+0x166/0x2b0
[14454.141057]  ? cache_purge+0xce/0x160 [sunrpc]
[14454.141079]  cache_purge+0xce/0x160 [sunrpc]
[14454.141099]  nfsd_last_thread+0x267/0x270 [nfsd]
[14454.141109]  ? nfsd_last_thread+0x5/0x270 [nfsd]
[14454.141130]  nfsd_destroy+0xcb/0x180 [nfsd]
[14454.141140]  ? nfsd_destroy+0x5/0x180 [nfsd]
[14454.141153]  nfsd+0x1e4/0x2b0 [nfsd]
[14454.141163]  ? nfsd+0x5/0x2b0 [nfsd]
[14454.141173]  kthread+0x114/0x150
[14454.141183]  ? nfsd_destroy+0x180/0x180 [nfsd]
[14454.141187]  ? kthread_park+0xb0/0xb0
[14454.141197]  ret_from_fork+0x3a/0x50
[14454.141224]
[14454.141475] Allocated by task 20918:
[14454.142011]  kmem_cache_alloc_trace+0x9f/0x2e0
[14454.142027]  sunrpc_cache_lookup+0xca/0x2f0 [sunrpc]
[14454.142037]  svc_export_parse+0x1e7/0x930 [nfsd]
[14454.142051]  cache_do_downcall+0x5a/0x80 [sunrpc]
[14454.142064]  cache_downcall+0x78/0x180 [sunrpc]
[14454.142078]  cache_write_procfs+0x57/0x80 [sunrpc]
[14454.142083]  proc_reg_write+0x90/0xd0
[14454.142088]  vfs_write+0xc2/0x1c0
[14454.142092]  ksys_write+0x4d/0xd0
[14454.142098]  do_syscall_64+0x60/0x250
[14454.142103]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[14454.142106]
[14454.142344] Freed by task 19165:
[14454.142804]  kfree+0x114/0x300
[14454.142819]  cache_clean+0x2a4/0x2e0 [sunrpc]
[14454.142833]  cache_flush+0x24/0x60 [sunrpc]
[14454.142845]  write_flush.isra.19+0xbe/0x100 [sunrpc]
[14454.142849]  proc_reg_write+0x90/0xd0
[14454.142853]  vfs_write+0xc2/0x1c0
[14454.142856]  ksys_write+0x4d/0xd0
[14454.142860]  do_syscall_64+0x60/0x250
[14454.142865]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[14454.142867]
[14454.143095] The buggy address belongs to the object at ffff8883d484d540 which belongs to the cache kmalloc-256 of size 256
[14454.144842] The buggy address is located 32 bytes inside of 256-byte region [ffff8883d484d540, ffff8883d484d640)
[14454.146463] The buggy address belongs to the page:
[14454.147155] page:ffffea000f521300 count:1 mapcount:0 mapping:ffff888107c02e00 index:0xffff8883d484da40 compound_mapcount: 0
[14454.148712] flags: 0x17fffc00010200(slab|head)
[14454.149356] raw: 0017fffc00010200 ffffea000f4baf00 0000000200000002 ffff888107c02e00
[14454.150453] raw: ffff8883d484da40 0000000080190001 00000001ffffffff 0000000000000000
[14454.151557] page dumped because: kasan: bad access detected
[14454.152364]
[14454.152606] Memory state around the buggy address:
[14454.153300]  ffff8883d484d400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14454.154319]  ffff8883d484d480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14454.155324] >ffff8883d484d500: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[14454.156334]                                                        ^
[14454.157237]  ffff8883d484d580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14454.158262]  ffff8883d484d600: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[14454.159282] ==================================================================
[14454.160224] Disabling lock debugging due to kernel taint

Fixes: 471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
Cc: stable@vger.kernel.org #v4.11+
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
---
 net/sunrpc/cache.c | 3 +++
 1 file changed, 3 insertions(+)
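
As an aside for readers unfamiliar with this failure class: the commit message's
claim is that an entry can be put twice. A minimal userspace model of that (all
names invented; this is not sunrpc code) shows what the second put does - it
reads, and may free, memory the first put already released. Built with
gcc -fsanitize=address -pthread, it reports a heap-use-after-free much like the
KASAN splat above:

	/* Hypothetical model of a double put on a refcounted entry. */
	#include <pthread.h>
	#include <stdlib.h>

	struct entry {
		int refcount;	/* deliberately unprotected: models the bug */
	};

	static struct entry *shared;

	static void entry_put(struct entry *e)
	{
		/* Non-atomic read-modify-write: the second caller
		 * touches memory the first caller already freed. */
		if (--e->refcount == 0)
			free(e);
	}

	static void *racer(void *arg)
	{
		(void)arg;
		entry_put(shared);	/* both contexts drop "their" reference */
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		shared = calloc(1, sizeof(*shared));
		shared->refcount = 1;	/* only one real reference exists */

		/* Models cache_clean and cache_purge each putting the
		 * same entry without serializing against each other. */
		pthread_create(&a, NULL, racer, NULL);
		pthread_create(&b, NULL, racer, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		return 0;
	}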

Comments

Chuck Lever March 24, 2020, 1:38 p.m. UTC | #1
> On Mar 24, 2020, at 5:49 AM, Yihao Wu <wuyihao@linux.alibaba.com> wrote:
> 
> cache_purge should hold cache_list_lock as cache_clean does. Otherwise a cache
> entry can be cache_put() twice, which leads to a use-after-free bug.
> 
> [ ... reproduction details and KASAN report snipped; quoted in full above ... ]
> 
> Fixes: 471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
> Cc: stable@vger.kernel.org #v4.11+
> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>

Mechanically this looks OK, but I would feel more comfortable
if a domain expert could review this. Neil, Trond, Bruce?


> [ ... patch snipped ... ]

--
Chuck Lever
Chuck Lever March 24, 2020, 2:18 p.m. UTC | #2
> On Mar 24, 2020, at 5:49 AM, Yihao Wu <wuyihao@linux.alibaba.com> wrote:
> 
> cache_purge should hold cache_list_lock as cache_clean does. Otherwise a cache
> entry can be cache_put() twice, which leads to a use-after-free bug.
> 
> [ ... KASAN report snipped ... ]
> 
> Fixes: 471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
> Cc: stable@vger.kernel.org #v4.11+

Yihao, I couldn't get this patch to apply to kernels before v5.0.

I don't think we need both a Fixes tag and a Cc: stable, because
stable maintainers will try to apply this patch to any kernel that
has 471a930, and ignore the failures.

So if I apply your fix, I'm going to drop the Cc: stable tag.


> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
> [ ... patch snipped ... ]
> 

--
Chuck Lever
Yihao Wu March 24, 2020, 2:28 p.m. UTC | #3
On 2020/3/24 10:18 PM, Chuck Lever wrote:
>> Fixes: 471a930ad7d1(SUNRPC: Drop all entries from cache_detail when cache_purge())
>> Cc: stable@vger.kernel.org #v4.11+
> Yihao, I couldn't get this patch to apply to kernels before v5.0.
> 
> I don't think we need both a Fixes tag and a Cc: stable, because
> stable maintainers will try to apply this patch to any kernel that
> has 471a930, and ignore the failures.
> 
> So if I apply your fix, I'm going to drop the Cc: stable tag.
> 
> 

OK. And thanks for reviewing and reminding me this, Chuck.

Yihao Wu
J. Bruce Fields March 24, 2020, 2:33 p.m. UTC | #4
On Tue, Mar 24, 2020 at 09:38:55AM -0400, Chuck Lever wrote:
> Mechanically this looks OK, but I would feel more comfortable
> if a domain expert could review this. Neil, Trond, Bruce?

Looks right to me.

Reviewed-by: J. Bruce Fields <bfields@redhat.com>

--b.
Chuck Lever March 24, 2020, 2:35 p.m. UTC | #5
> On Mar 24, 2020, at 10:33 AM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Tue, Mar 24, 2020 at 09:38:55AM -0400, Chuck Lever wrote:
>> Mechanically this looks OK, but I would feel more comfortable
>> if a domain expert could review this. Neil, Trond, Bruce?
> 
> Looks right to me.
> 
> Reviewed-by: J. Bruce Fields <bfields@redhat.com>

Thanks, Bruce! I've added your R-b.

--
Chuck Lever
Trond Myklebust March 24, 2020, 3:24 p.m. UTC | #6
On Tue, 2020-03-24 at 09:38 -0400, Chuck Lever wrote:
> > On Mar 24, 2020, at 5:49 AM, Yihao Wu <wuyihao@linux.alibaba.com> wrote:
> > 
> > cache_purge should hold cache_list_lock as cache_clean does. Otherwise a
> > cache entry can be cache_put() twice, which leads to a use-after-free bug.
> > 
> > [ ... KASAN report snipped ... ]
> > 
> > Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
> 
> Mechanically this looks OK, but I would feel more comfortable
> if a domain expert could review this. Neil, Trond, Bruce?
> 
> 
> > ---
> > net/sunrpc/cache.c | 3 +++
> > 1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > index bd843a81afa0..3e523eefc47f 100644
> > --- a/net/sunrpc/cache.c
> > +++ b/net/sunrpc/cache.c
> > @@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
> > 	struct hlist_node *tmp = NULL;
> > 	int i = 0;
> > 
> > +	spin_lock(&cache_list_lock);
> > 	spin_lock(&detail->hash_lock);
> > 	if (!detail->entries) {
> > 		spin_unlock(&detail->hash_lock);
> > +		spin_unlock(&cache_list_lock);
> > 		return;
> > 	}
> > 
> > @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
> > 		}
> > 	}
> > 	spin_unlock(&detail->hash_lock);
> > +	spin_unlock(&cache_list_lock);
> > }
> > EXPORT_SYMBOL_GPL(cache_purge);


Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
sunrpc_end_cache_remove_entry()? The latter does call both
cache_revisit_request() and cache_put(), and while they do not
explicitly call anything that holds cache_list_lock, some of those
cd->cache_put callbacks do look as if there is potential for deadlock.
Chuck Lever March 24, 2020, 5:46 p.m. UTC | #7
> On Mar 24, 2020, at 11:24 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Tue, 2020-03-24 at 09:38 -0400, Chuck Lever wrote:
>>> On Mar 24, 2020, at 5:49 AM, Yihao Wu <wuyihao@linux.alibaba.com> wrote:
>>> 
>>> [ ... patch description and KASAN report snipped ... ]
>> 
>> Mechanically this looks OK, but I would feel more comfortable
>> if a domain expert could review this. Neil, Trond, Bruce?
>> 
>> 
>>> [ ... patch snipped ... ]
> 
> 
> Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
> sunrpc_end_cache_remove_entry()? The latter does call both
> cache_revisit_request() and cache_put(), and while they do not
> explicitly call anything that holds cache_list_lock, some of those
> cd->cache_put callbacks do look as if there is potential for deadlock.

I see svc_export_put calling dput, eventually, which might_sleep().


--
Chuck Lever
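
The pattern Trond and Chuck are circling is worth spelling out. When the final
put may sleep - dput() can - the usual alternative to taking more locks around
it is to detach entries while holding the spinlock and release them only after
dropping it. A minimal userspace sketch of that shape (names invented; a
pthread mutex stands in for the spinlock; entry_release() models a blocking
destructor like dput()):

	#include <pthread.h>
	#include <stdlib.h>
	#include <unistd.h>

	struct entry { struct entry *next; };

	static struct entry *cache_head;
	static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

	static void entry_release(struct entry *e)
	{
		usleep(1000);		/* models dput(): may block or sleep */
		free(e);
	}

	static void purge_all(void)
	{
		struct entry *doomed, *e;

		pthread_mutex_lock(&cache_lock);
		doomed = cache_head;	/* steal the whole list under the lock... */
		cache_head = NULL;	/* ...so no other context can find it */
		pthread_mutex_unlock(&cache_lock);

		/* The entries are now private: blocking release is safe. */
		while ((e = doomed) != NULL) {
			doomed = e->next;
			entry_release(e);
		}
	}

	int main(void)
	{
		for (int i = 0; i < 4; i++) {
			struct entry *e = calloc(1, sizeof(*e));

			e->next = cache_head;
			cache_head = e;
		}
		purge_all();
		return 0;
	}

cache_purge cannot simply steal the whole table this way, because each removal
needs per-entry bookkeeping (sunrpc_begin_cache_remove_entry /
sunrpc_end_cache_remove_entry), which is why it drops hash_lock around each
release - and that unlocked window is exactly what the rest of the thread
examines.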
NeilBrown March 24, 2020, 11:07 p.m. UTC | #8
On Tue, Mar 24 2020, Yihao Wu wrote:

> [ ... patch description and KASAN report snipped; quoted in full above ... ]
>
> Fixes: 471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
> Cc: stable@vger.kernel.org #v4.11+
> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
> ---
>  net/sunrpc/cache.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index bd843a81afa0..3e523eefc47f 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
>  	struct hlist_node *tmp = NULL;
>  	int i = 0;
>  
> +	spin_lock(&cache_list_lock);
>  	spin_lock(&detail->hash_lock);
>  	if (!detail->entries) {
>  		spin_unlock(&detail->hash_lock);
> +		spin_unlock(&cache_list_lock);
>  		return;
>  	}
>  
> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
>  		}
>  	}
>  	spin_unlock(&detail->hash_lock);
> +	spin_unlock(&cache_list_lock);
>  }
>  EXPORT_SYMBOL_GPL(cache_purge);
>  
> -- 
> 2.20.1.2432.ga663e714

I wonder if this is the best solution.
This code:

		hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
			sunrpc_begin_cache_remove_entry(ch, detail);
			spin_unlock(&detail->hash_lock);
			sunrpc_end_cache_remove_entry(ch, detail);
			spin_lock(&detail->hash_lock);
		}

Looks wrong.
Dropping a lock while walking a list is only safe if you hold a
reference to the place-holder - 'tmp' in this case - but we don't.
As this is trying to remove everything in the list it would be safer to
do something like

 while (!hlist_empty(head)) {
 	ch = hlist_entry(head->first, struct cache_head, cache_list);
	sunrpc_begin_cache_remove_entry(ch, detail);
	spin_unlock(&detail->hash_lock);
	sunrpc_end_cache_remove_entry(ch, detail);
	spin_lock(&detail->hash_lock);
 }

I'm guessing that would fix the problem in a more focused way.
But I'm not 100% sure because there is no analysis given of the cause.
What line is
  cache_purge+0xce/0x160
./scripts/faddr2line can tell you.
I suspect it is the hlist_for_each_entry_safe() line.

Can you test the change I suggest and see if it helps?

Thanks,
NeilBrown
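
To make Neil's point concrete outside the kernel: the _safe list iterators only
protect against the loop body deleting the current node; no reference is held
on the prefetched place-holder, so once the lock is dropped another context can
free it. A deterministic userspace model (names invented; build with
gcc -fsanitize=address) fails exactly the way the sunrpc loop can:

	#include <stdio.h>
	#include <stdlib.h>

	struct node { int v; struct node *next; };

	static struct node *head;

	static void push(int v)
	{
		struct node *n = malloc(sizeof(*n));

		n->v = v;
		n->next = head;
		head = n;
	}

	/* Stands in for what cache_clean() may do while hash_lock is
	 * dropped: unlink and free an entry - here, the prefetched one. */
	static void concurrent_free(struct node *victim)
	{
		struct node **pp;

		for (pp = &head; *pp; pp = &(*pp)->next) {
			if (*pp == victim) {
				*pp = victim->next;
				free(victim);
				return;
			}
		}
	}

	int main(void)
	{
		struct node *cur, *tmp;

		push(1); push(2); push(3);

		/* The hlist_for_each_entry_safe() shape: prefetch tmp,
		 * then drop the lock before the next iteration uses it. */
		for (cur = head; cur; cur = tmp) {
			tmp = cur->next;
			/* "lock" dropped here; another context frees
			 * the very node we prefetched */
			if (tmp)
				concurrent_free(tmp);
			/* "lock" retaken; tmp now dangles */
			printf("visited %d\n", cur->v);
		}
		/* the iteration after the free dereferences the stale
		 * pointer: ASan reports a heap-use-after-free,
		 * mirroring the KASAN splat */
		return 0;
	}

The while (!hlist_empty(head)) form Neil suggests avoids this entirely: it
never carries a pointer across the unlocked window, so it can only ever pick
up an entry that is still hashed.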
Sasha Levin March 25, 2020, 1:01 a.m. UTC | #9
Hi

[This is an automated email]

The bot has tested the following trees: v5.5.11, v5.4.27, v4.19.112, v4.14.174, v4.9.217, v4.4.217.

v5.5.11: Build OK!
v5.4.27: Build OK!
v4.19.112: Failed to apply! Possible dependencies:
    1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")

v4.14.174: Failed to apply! Possible dependencies:
    1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")

v4.9.217: Failed to apply! Possible dependencies:
    1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
    2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
    471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")

v4.4.217: Failed to apply! Possible dependencies:
    1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
    2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
    471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
    d8d29138b17c ("sunrpc: remove 'inuse' flag from struct cache_detail.")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?
Yihao Wu March 25, 2020, 6:37 a.m. UTC | #10
On 2020/3/25 1:46 AM, Chuck Lever wrote:
>>>> [ ... patch snipped ... ]
>>
>> Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
>> sunrpc_end_cache_remove_entry()? The latter does call both
>> cache_revisit_request() and cache_put(), and while they do not
>> explicitly call anything that holds cache_list_lock, some of those
>> cd->cache_put callbacks do look as if there is potential for deadlock.
> I see svc_export_put calling dput, eventually, which might_sleep().

Wow that's a little strange. If svc_export_put->dput might_sleep, why can we
spin_lock(&detail->hash_lock); in cache_purge in the first place?

And I agree with Trond those cd->cache_put callbacks are dangerous. I will look
into them today.

But if we drop cache_list_lock when we call sunrpc_end_cache_remove_entry,
cache_put is not protected, and this patch won't work anymore, right?

Thanks,
Yihao Wu
Chuck Lever March 25, 2020, 2:13 p.m. UTC | #11
> On Mar 24, 2020, at 9:01 PM, Sasha Levin <sashal@kernel.org> wrote:
> 
> [ ... automated bot report snipped; quoted in full above ... ]
> 
> How should we proceed with this patch?

Please drop it. Thanks!


--
Chuck Lever
Chuck Lever March 25, 2020, 2:24 p.m. UTC | #12
> On Mar 25, 2020, at 2:37 AM, Yihao Wu <wuyihao@linux.alibaba.com> wrote:
> 
> On 2020/3/25 1:46 AM, Chuck Lever wrote:
>>>>> [ ... patch snipped ... ]
>>> 
>>> Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
>>> sunrpc_end_cache_remove_entry()? The latter does call both
>>> cache_revisit_request() and cache_put(), and while they do not
>>> explicitly call anything that holds cache_list_lock, some of those
>>> cd->cache_put callbacks do look as if there is potential for deadlock.
>> I see svc_export_put calling dput, eventually, which might_sleep().
> 
> Wow that's a little strange. If svc_export_put->dput might_sleep, why can we
> spin_lock(&detail->hash_lock); in cache_purge in the first place?
> 
> And I agree with Trond those cd->cache_put callbacks are dangerous. I will look
> into them today.
> 
> But if we drop cache_list_lock when we call sunrpc_end_cache_remove_entry,
> cache_put is not protected, and this patch won't work anymore, right?

IMHO Neil's proposed solution seems pretty safe, and follows a well-understood
pattern.

It would be nice (but not 100% necessary) if the race you found was spelled out
in the patch description.

Thanks!


--
Chuck Lever
Yihao Wu March 25, 2020, 3:14 p.m. UTC | #13
On 2020/3/25 10:24 PM, Chuck Lever wrote:
> 
> 
>> On Mar 25, 2020, at 2:37 AM, Yihao Wu <wuyihao@linux.alibaba.com> wrote:
>>
>> On 2020/3/25 1:46 AM, Chuck Lever wrote:
>>>>>> [ ... patch snipped ... ]
>>>>
>>>> Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
>>>> sunrpc_end_cache_remove_entry()? The latter does call both
>>>> cache_revisit_request() and cache_put(), and while they do not
>>>> explicitly call anything that holds cache_list_lock, some of those
>>>> cd->cache_put callbacks do look as if there is potential for deadlock.
>>> I see svc_export_put calling dput, eventually, which might_sleep().
>>
>> Wow that's a little strange. If svc_export_put->dput might_sleep, why can we
>> spin_lock(&detail->hash_lock); in cache_purge in the first place?
>>
>> And I agree with Trond those cd->cache_put callbacks are dangerous. I will look
>> into them today.
>>
>> But if we drop cache_list_lock when we call sunrpc_end_cache_remove_entry,
>> cache_put is not protected, and this patch won't work anymore, right?
> 
> IMHO Neil's proposed solution seems pretty safe, and follows a well-understood
> pattern.
> 
> It would be nice (but not 100% necessary) if the race you found was spelled out
> in the patch description.
> 
> Thanks!
> 
> 
> --
> Chuck Lever
> 
> 

Yeah. I believe Neil's solution must be better.
But I'm still studying it, so I haven't replied to him yet.
OK. I'll try to make it clearer in the next version of the patch.

Thanks,
Yihao Wu
Yihao Wu April 4, 2020, 1:53 p.m. UTC | #14
On 2020/3/25 7:07 AM, NeilBrown wrote:
>> [ ... patch snipped ... ]
> I wonder if this is the best solution.
> This code:
> 
> 		hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
> 			sunrpc_begin_cache_remove_entry(ch, detail);
> 			spin_unlock(&detail->hash_lock);
> 			sunrpc_end_cache_remove_entry(ch, detail);
> 			spin_lock(&detail->hash_lock);
> 		}
> 
> Looks wrong.
> Dropping a lock while walking a list is only safe if you hold a
> reference to the place-holder - 'tmp' in this case - but we don't.
> As this is trying to remove everything in the list it would be safer to
> do something like
> 
>  while (!hlist_empty(head)) {
>  	ch = hlist_entry(head->first, struct cache_head, cache_list);
> 	sunrpc_begin_cache_remove_entry(ch, detail);
> 	spin_unlock(&detail->hash_lock);
> 	sunrpc_end_cache_remove_entry(ch, detail);
> 	spin_lock(&detail->hash_lock);
>  }
> 
> I'm guessing that would fix the problem in a more focused way.
> But I'm not 100% sure because there is no analysis given of the cause.
> What line is
>   cache_purge+0xce/0x160
> ./scripts/faddr2line can tell you.
> I suspect it is the hlist_for_each_entry_safe() line.
> 
> Can you test the change I suggest and see if it helps?
> 
> Thanks,
> NeilBrown


Sorry for the delay. It took me some time to reproduce the bug reliably so I
could verify the correctness of the fix.

You definitely pointed out the root cause, and the solution is more elegant.
After applying your change, the bug no longer reproduces.

There's no missing-lock race: hash_lock is designed to protect cache_detail at
fine granularity, and it already does that job. And yes, hlist_for_each_entry_safe
is where the bug is. It may walk to a deleted entry (tmp). My v1 solution is a
regression in that light.

So I will modify the patch title in v2 too.

BTW, I checked the faddr2line output; it says cache_purge+0xce/0x160 is cache_put.
That makes sense too, and doesn't go against your theory.

Here's my reproducer script:

	systemctl enable rpcbind
	systemctl enable nfs
	systemctl start rpcbind
	systemctl start nfs
	mkdir /tmp/x /tmp/y
	
	# Create some collision in hash table
	for i in `seq 256`; do
		mkdir /tmp/x/$i
		mkdir /tmp/y/$i
		exportfs localhost:/tmp/x/$i
	done
	for i in `seq 256`; do
		mount localhost:/tmp/x/$i /tmp/y/$i
	done
	
	END=$(cat /proc/self/net/rpc/nfsd.export/flush)
	NOW=$(date +%s)
	sleep $((END - NOW))

	# Trigger cache_purge
	systemctl stop nfs &
	usleep 20000
	# Trigger cache_clean
	echo > /proc/self/net/rpc/nfsd.export/flush

To speed up reproduction, I also added mdelay(500) between acquiring and
releasing hash_lock in cache_purge.

Thank you so much!
Yihao Wu

Patch

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index bd843a81afa0..3e523eefc47f 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -524,9 +524,11 @@  void cache_purge(struct cache_detail *detail)
 	struct hlist_node *tmp = NULL;
 	int i = 0;
 
+	spin_lock(&cache_list_lock);
 	spin_lock(&detail->hash_lock);
 	if (!detail->entries) {
 		spin_unlock(&detail->hash_lock);
+		spin_unlock(&cache_list_lock);
 		return;
 	}
 
@@ -541,6 +543,7 @@  void cache_purge(struct cache_detail *detail)
 		}
 	}
 	spin_unlock(&detail->hash_lock);
+	spin_unlock(&cache_list_lock);
 }
 EXPORT_SYMBOL_GPL(cache_purge);
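
For reference, folding Neil's loop into the function shown above gives roughly
the following - a sketch assembled from the code quoted in this thread, not
necessarily the exact v2 that was merged; the detail->hash_size and
detail->hash_table field names are assumed from the kernel sources of that era:

	void cache_purge(struct cache_detail *detail)
	{
		struct cache_head *ch = NULL;
		struct hlist_head *head = NULL;
		int i = 0;

		spin_lock(&detail->hash_lock);
		if (!detail->entries) {
			spin_unlock(&detail->hash_lock);
			return;
		}

		for (i = 0; i < detail->hash_size; i++) {
			head = &detail->hash_table[i];
			while (!hlist_empty(head)) {
				/* restart from the first entry each time,
				 * so no pointer is carried across the
				 * window where hash_lock is dropped */
				ch = hlist_entry(head->first,
						 struct cache_head, cache_list);
				sunrpc_begin_cache_remove_entry(ch, detail);
				spin_unlock(&detail->hash_lock);
				sunrpc_end_cache_remove_entry(ch, detail);
				spin_lock(&detail->hash_lock);
			}
		}
		spin_unlock(&detail->hash_lock);
	}
	EXPORT_SYMBOL_GPL(cache_purge);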