ext2, possible circular locking dependency detected

Message ID 4946.1582339996@jrobl (mailing list archive)
State New, archived

Commit Message

J. R. Okajima Feb. 22, 2020, 2:53 a.m. UTC
Hello ext2 maintainers,

During my local fs stress test, I encountered this.
Is it a false positive?
If not, I've made a small patch to stop reclaim from recursing into the FS
from ext2_xattr_set().  Please consider taking it.

I did consider whether this should be done in the VFS layer instead.
I mean, every i_op->xxx() call in VFS could be surrounded by
memalloc_nofs_{save,restore}(), by a macro or something.  But I am
afraid it may introduce unnecessary overhead, especially when the FS code
doesn't allocate memory.  So it is better to do it in the real FS
operations.


J. R. Okajima

----------------------------------------
WARNING: possible circular locking dependency detected
5.6.0-rc2aufsD+ #165 Tainted: G        W
------------------------------------------------------
kswapd0/94 is trying to acquire lock:
ffff91f670bd7610 (sb_internal#2){.+.+}, at: ext2_evict_inode+0x7e/0x130

but task is already holding lock:
ffffffff8ca901e0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (fs_reclaim){+.+.}:
       fs_reclaim_acquire.part.98+0x29/0x30
       __kmalloc+0x44/0x320
       ext2_xattr_set+0xe7/0x880
       __vfs_setxattr+0x66/0x80
       __vfs_setxattr_noperm+0x67/0x1a0
       vfs_setxattr+0x81/0xa0
       setxattr+0x13b/0x1c0
       path_setxattr+0xbe/0xe0
       __x64_sys_setxattr+0x27/0x30
       do_syscall_64+0x54/0x1f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #1 (&ei->xattr_sem#2){++++}:
       down_write+0x3d/0x70
       ext2_xattr_delete_inode+0x26/0x200
       ext2_evict_inode+0xc2/0x130
       evict+0xd0/0x1a0
       vfs_rmdir+0x15c/0x180
       do_rmdir+0x1c6/0x220
       do_syscall_64+0x54/0x1f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (sb_internal#2){.+.+}:
       __lock_acquire+0xd30/0x1540
       lock_acquire+0x90/0x170
       __sb_start_write+0x135/0x220
       ext2_evict_inode+0x7e/0x130
       evict+0xd0/0x1a0
       __dentry_kill+0xdc/0x180
       shrink_dentry_list+0xdd/0x200
       prune_dcache_sb+0x52/0x70
       super_cache_scan+0xf3/0x1a0
       do_shrink_slab+0x143/0x3a0
       shrink_slab+0x22c/0x2c0
       shrink_node+0x16c/0x670
       balance_pgdat+0x2cc/0x530
       kswapd+0xad/0x470
       kthread+0x11d/0x140
       ret_from_fork+0x24/0x50

other info that might help us debug this:

Chain exists of:
  sb_internal#2 --> &ei->xattr_sem#2 --> fs_reclaim

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&ei->xattr_sem#2);
                               lock(fs_reclaim);
  lock(sb_internal#2);

 *** DEADLOCK ***

3 locks held by kswapd0/94:
 #0: ffffffff8ca901e0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
 #1: ffffffff8ca81bc8 (shrinker_rwsem){++++}, at: shrink_slab+0x135/0x2c0
 #2: ffff91f670bd70d8 (&type->s_umount_key#45){++++}, at: trylock_super+0x16/0x50

stack backtrace:
CPU: 4 PID: 94 Comm: kswapd0 Tainted: G        W         5.6.0-rc2aufsD+ #165
Hardware name: System manufacturer System Product Name/ROG STRIX H370-I GAMING, BIOS 2418 06/04/2019
Call Trace:
 dump_stack+0x71/0xa0
 check_noncircular+0x172/0x190
 __lock_acquire+0xd30/0x1540
 lock_acquire+0x90/0x170
 ? ext2_evict_inode+0x7e/0x130
 __sb_start_write+0x135/0x220
 ? ext2_evict_inode+0x7e/0x130
 ? shrink_dentry_list+0x24/0x200
 ext2_evict_inode+0x7e/0x130
 evict+0xd0/0x1a0
 __dentry_kill+0xdc/0x180
 shrink_dentry_list+0xdd/0x200
 prune_dcache_sb+0x52/0x70
 super_cache_scan+0xf3/0x1a0
 do_shrink_slab+0x143/0x3a0
 shrink_slab+0x22c/0x2c0
 shrink_node+0x16c/0x670
 balance_pgdat+0x2cc/0x530
 kswapd+0xad/0x470
 ? finish_wait+0x80/0x80
 ? balance_pgdat+0x530/0x530
 kthread+0x11d/0x140
 ? kthread_park+0x80/0x80
 ret_from_fork+0x24/0x50
----------------------------------------
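
For readers unfamiliar with lockdep reports, the chain above can be read as a
small directed graph: an edge A -> B means "B was taken while A was held".
The following is only a minimal userspace sketch of that idea, not lockdep's
actual algorithm. The enum names and the helpers record_dependency(),
would_deadlock() and replay_report() are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the ones in the report. */
enum lock_class { SB_INTERNAL, XATTR_SEM, FS_RECLAIM, NR_CLASSES };

/* dep[a][b] is true when lock b has been taken while lock a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

static void record_dependency(enum lock_class held, enum lock_class taken)
{
	assert(held != taken);
	dep[held][taken] = true;
}

/* Depth-first search: can we get from `from` to `to` along recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (dep[from][next] && !visited[next] &&
		    reachable((enum lock_class)next, to, visited))
			return true;
	}
	return false;
}

/*
 * Taking `taken` while holding `held` adds the edge held -> taken.
 * That closes a cycle if `held` is already reachable from `taken`.
 */
static bool would_deadlock(enum lock_class held, enum lock_class taken)
{
	bool visited[NR_CLASSES] = { false };

	return reachable(taken, held, visited);
}

/* Replay the three entries of the report, oldest dependency first. */
static bool replay_report(void)
{
	/* #1: ext2_evict_inode holds sb_internal, then takes xattr_sem */
	record_dependency(SB_INTERNAL, XATTR_SEM);
	/* #2: ext2_xattr_set holds xattr_sem, kmalloc enters fs_reclaim */
	record_dependency(XATTR_SEM, FS_RECLAIM);
	/* #0: kswapd holds fs_reclaim and wants sb_internal */
	return would_deadlock(FS_RECLAIM, SB_INTERNAL);
}
```

In this model replay_report() returns true, which corresponds to the
"possible circular locking dependency" line. Note that a cycle in the graph
only says the ordering is inconsistent; whether the inodes involved can
really meet at run time is a separate question.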

Comments

Jan Kara Feb. 24, 2020, 9:08 a.m. UTC | #1
Hello!

On Sat 22-02-20 11:53:16, J. R. Okajima wrote:
> Hello ext2 maintainers,
> 
> During my local fs stress test, I encountered this.
> Is it a false positive?
> If not, I've made a small patch to stop reclaim from recursing into the FS
> from ext2_xattr_set().  Please consider taking it.
> 
> I did consider whether this should be done in the VFS layer instead.
> I mean, every i_op->xxx() call in VFS could be surrounded by
> memalloc_nofs_{save,restore}(), by a macro or something.  But I am
> afraid it may introduce unnecessary overhead, especially when the FS code
> doesn't allocate memory.  So it is better to do it in the real FS
> operations.

Thanks for debugging this and for the patch. One comment below:

...

> @@ -532,7 +534,9 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
>  
>  			unlock_buffer(bh);
>  			ea_bdebug(bh, "cloning");
> +			nofs_flag = memalloc_nofs_save();
>  			header = kmemdup(HDR(bh), bh->b_size, GFP_KERNEL);
> +			memalloc_nofs_restore(nofs_flag);
>  			error = -ENOMEM;
>  			if (header == NULL)
>  				goto cleanup;
> @@ -545,7 +549,9 @@ ext2_xattr_set(struct inode *inode, int name_index, const char *name,
>  		}
>  	} else {
>  		/* Allocate a buffer where we construct the new block. */
> +		nofs_flag = memalloc_nofs_save();
>  		header = kzalloc(sb->s_blocksize, GFP_KERNEL);
> +		memalloc_nofs_restore(nofs_flag);
>  		error = -ENOMEM;
>  		if (header == NULL)
>  			goto cleanup;

This is not how memalloc_nofs_save() should be used (you could simply
pass GFP_NOFS instead of GFP_KERNEL rather than wrapping the
allocation inside memalloc_nofs_save/restore()). The
memalloc_nofs_save/restore() API was created so that you can change the
allocation context at the place which mandates the new context, i.e., in
this case when acquiring / dropping xattr_sem. That way you don't have to
propagate the context information down through function calls, and the code
is also future-proof: if you add new allocations, they will use the correct
allocation context.

								Honza
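
To illustrate the scoping argument, here is a minimal userspace sketch. It is
not kernel code: every name prefixed with model_ or MODEL_ is a hypothetical
stand-in, and the flag values are made up. It is only loosely modeled on the
idea that save() sets a per-task marker and returns the old state, restore()
puts the old state back, and the allocator strips the "may enter the FS" bit
while the marker is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for allocation flags; values are arbitrary. */
#define MODEL_GFP_FS      0x1u  /* allocation may recurse into the FS */
#define MODEL_GFP_KERNEL  (MODEL_GFP_FS | 0x2u)
#define MODEL_PF_NOFS     0x1u  /* per-task "no FS recursion" marker */

static unsigned int task_flags; /* models the flags of a single task */

/* Enter a NOFS scope; return the previous state so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_NOFS;

	task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope, restoring whatever the outer scope had set. */
static void model_nofs_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_NOFS) == 0);
	task_flags = (task_flags & ~MODEL_PF_NOFS) | old;
}

/* Models the allocator masking the caller's flags by task context. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* A helper deep in the call chain that knows nothing about the lock. */
static bool helper_alloc_may_enter_fs(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL) & MODEL_GFP_FS;
}

/* The scope is tied to the lock, not to any single allocation. */
static bool locked_section_may_enter_fs(void)
{
	unsigned int nofs_flag;
	bool result;

	/* taking xattr_sem would go here */
	nofs_flag = model_nofs_save();
	result = helper_alloc_may_enter_fs(); /* still asks for "GFP_KERNEL" */
	model_nofs_restore(nofs_flag);
	/* dropping xattr_sem would go here */
	return result;
}
```

In this model the helper keeps requesting MODEL_GFP_KERNEL, yet inside the
locked section the FS bit is masked off, and any allocation added to the
helper later is covered as well. Outside the section the same helper may
enter the FS again.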
J. R. Okajima Feb. 24, 2020, 10:02 a.m. UTC | #2
Jan Kara:
> This is not how memalloc_nofs_save() should be used (you could simply
> pass GFP_NOFS instead of GFP_KERNEL rather than wrapping the
> allocation inside memalloc_nofs_save/restore()). The
> memalloc_nofs_save/restore() API was created so that you can change the
> allocation context at the place which mandates the new context, i.e., in
> this case when acquiring / dropping xattr_sem. That way you don't have to
> propagate the context information down through function calls, and the code
> is also future-proof: if you add new allocations, they will use the correct
> allocation context.

Thanks for the explanation of memalloc_nofs_save/restore().
Honestly speaking, I didn't know these APIs and I have always used the
GFP_NOFS flag. While investigating this lockdep warning, I read the
comments in gfp.h.

 * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
 * Please try to avoid using this flag directly and instead use
 * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't
 * recurse into the FS layer with a short explanation why. All allocation
 * requests will inherit GFP_NOFS implicitly.

Actually, grepping the whole kernel source tree showed me several
"one-liner" sequences like ...nofs_save(); kmalloc(); ...nofs_restore.
But after re-reading the comments and your mail, I understand these
APIs are meant for a much wider region than such a one-liner.

I don't think it is a good idea for me to send you another patch that uses
GFP_NOFS instead.  You can fix it easily and you know much more than I do
about this matter, and I will be satisfied once this problem is fixed by you.


J. R. Okajima
Jan Kara Feb. 24, 2020, 1:02 p.m. UTC | #3
On Mon 24-02-20 19:02:16, J. R. Okajima wrote:
> Jan Kara:
> > This is not how memalloc_nofs_save() should be used (you could simply
> > pass GFP_NOFS instead of GFP_KERNEL rather than wrapping the
> > allocation inside memalloc_nofs_save/restore()). The
> > memalloc_nofs_save/restore() API was created so that you can change the
> > allocation context at the place which mandates the new context, i.e., in
> > this case when acquiring / dropping xattr_sem. That way you don't have to
> > propagate the context information down through function calls, and the code
> > is also future-proof: if you add new allocations, they will use the correct
> > allocation context.
> 
> Thanks for the explanation of memalloc_nofs_save/restore().
> Honestly speaking, I didn't know these APIs and I have always used the
> GFP_NOFS flag. While investigating this lockdep warning, I read the
> comments in gfp.h.
> 
>  * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
>  * Please try to avoid using this flag directly and instead use
>  * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't
>  * recurse into the FS layer with a short explanation why. All allocation
>  * requests will inherit GFP_NOFS implicitly.
> 
> Actually, grepping the whole kernel source tree showed me several
> "one-liner" sequences like ...nofs_save(); kmalloc(); ...nofs_restore.
> But after re-reading the comments and your mail, I understand these
> APIs are meant for a much wider region than such a one-liner.
> 
> I don't think it is a good idea for me to send you another patch that uses
> GFP_NOFS instead.  You can fix it easily and you know much more than I do
> about this matter, and I will be satisfied once this problem is fixed by you.

OK, in the end I've decided to go with a different solution because I
realized the warning is a false positive. The patch has passed an
fstests run, but I'd be grateful if you could verify that you can no longer
trigger the lockdep warning. Thanks!

								Honza

PS: I've posted the patch separately to the list.
J. R. Okajima Feb. 24, 2020, 3:11 p.m. UTC | #4
Jan Kara:
> OK, in the end I've decided to go with a different solution because I
> realized the warning is a false positive. The patch has passed an
> fstests run, but I'd be grateful if you could verify that you can no longer
> trigger the lockdep warning. Thanks!

I will.
But it may take a very long time, a month or two I am afraid.
If you don't receive any mail about this matter in the next few months, then
it means everything is fine.

Thank you
J. R. Okajima
Patch

diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
index 0456bc990b5e..85463fddbc17 100644
--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -61,6 +61,7 @@ 
 #include <linux/quotaops.h>
 #include <linux/rwsem.h>
 #include <linux/security.h>
+#include <linux/sched/mm.h>
 #include "ext2.h"
 #include "xattr.h"
 #include "acl.h"
@@ -413,6 +414,7 @@  ext2_xattr_set(struct inode *inode, int name_index, const char *name,
 	size_t name_len, free, min_offs = sb->s_blocksize;
 	int not_found = 1, error;
 	char *end;
+	unsigned int nofs_flag;
 	
 	/*
 	 * header -- Points either into bh, or to a temporarily
@@ -532,7 +534,9 @@  ext2_xattr_set(struct inode *inode, int name_index, const char *name,
 
 			unlock_buffer(bh);
 			ea_bdebug(bh, "cloning");
+			nofs_flag = memalloc_nofs_save();
 			header = kmemdup(HDR(bh), bh->b_size, GFP_KERNEL);
+			memalloc_nofs_restore(nofs_flag);
 			error = -ENOMEM;
 			if (header == NULL)
 				goto cleanup;
@@ -545,7 +549,9 @@  ext2_xattr_set(struct inode *inode, int name_index, const char *name,
 		}
 	} else {
 		/* Allocate a buffer where we construct the new block. */
+		nofs_flag = memalloc_nofs_save();
 		header = kzalloc(sb->s_blocksize, GFP_KERNEL);
+		memalloc_nofs_restore(nofs_flag);
 		error = -ENOMEM;
 		if (header == NULL)
 			goto cleanup;