diff mbox series

[v4] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim

Message ID 20200618171941.9475-1-longman@redhat.com (mailing list archive)
State Superseded
Headers show
Series [v4] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim | expand

Commit Message

Waiman Long June 18, 2020, 5:19 p.m. UTC
Depending on the workloads, the following circular locking dependency
warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
lock) may show up:

======================================================
WARNING: possible circular locking dependency detected
5.0.0-rc1+ #60 Tainted: G        W
------------------------------------------------------
fsfreeze/4346 is trying to acquire lock:
0000000026f1d784 (fs_reclaim){+.+.}, at:
fs_reclaim_acquire.part.19+0x5/0x30

but task is already holding lock:
0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650

which lock already depends on the new lock.
  :
 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sb_internal);
                               lock(fs_reclaim);
                               lock(sb_internal);
  lock(fs_reclaim);

 *** DEADLOCK ***

4 locks held by fsfreeze/4346:
 #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
 #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
 #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
 #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650

stack backtrace:
Call Trace:
 dump_stack+0xe0/0x19a
 print_circular_bug.isra.10.cold.34+0x2f4/0x435
 check_prev_add.constprop.19+0xca1/0x15f0
 validate_chain.isra.14+0x11af/0x3b50
 __lock_acquire+0x728/0x1200
 lock_acquire+0x269/0x5a0
 fs_reclaim_acquire.part.19+0x29/0x30
 fs_reclaim_acquire+0x19/0x20
 kmem_cache_alloc+0x3e/0x3f0
 kmem_zone_alloc+0x79/0x150
 xfs_trans_alloc+0xfa/0x9d0
 xfs_sync_sb+0x86/0x170
 xfs_log_sbcount+0x10f/0x140
 xfs_quiesce_attr+0x134/0x270
 xfs_fs_freeze+0x4a/0x70
 freeze_super+0x1af/0x290
 do_vfs_ioctl+0xedc/0x16c0
 ksys_ioctl+0x41/0x80
 __x64_sys_ioctl+0x73/0xa9
 do_syscall_64+0x18f/0xd23
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is a false positive as all the dirty pages are flushed out before
the filesystem can be frozen.

One way to avoid this splat is to add GFP_NOFS to the affected allocation
calls. This is what PF_MEMALLOC_NOFS per-process flag is for. This does
reduce the potential source of memory where reclaim can be done. This
shouldn't matter unless the system is really running out of memory.
In that particular case, the filesystem freeze operation may fail while
it was succeeding previously.

Without this patch, the command sequence below will show that the lock
dependency chain sb_internal -> fs_reclaim exists.

 # fsfreeze -f /home
 # fsfreeze --unfreeze /home
 # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal

After applying the patch, such sb_internal -> fs_reclaim lock dependency
chain can no longer be found. Because of that, the locking dependency
warning will not be shown.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/xfs/xfs_super.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Comments

Dave Chinner June 18, 2020, 10:58 p.m. UTC | #1
On Thu, Jun 18, 2020 at 01:19:41PM -0400, Waiman Long wrote:
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 379cbff438bc..1b94b9bfa4d7 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -913,11 +913,33 @@ xfs_fs_freeze(
>  	struct super_block	*sb)
>  {
>  	struct xfs_mount	*mp = XFS_M(sb);
> +	unsigned long		pflags;
> +	int			ret;
>  
> +	/*
> +	 * A fs_reclaim pseudo lock is added to check for potential deadlock
> +	 * condition with fs reclaim. The following lockdep splat was hit
> +	 * occasionally. This is actually a false positive as the allocation
> +	 * is being done only after the frozen filesystem is no longer dirty.
> +	 * One way to avoid this splat is to add GFP_NOFS to the affected
> +	 * allocation calls. This is what PF_MEMALLOC_NOFS is for.
> +	 *
> +	 *       CPU0                    CPU1
> +	 *       ----                    ----
> +	 *  lock(sb_internal);
> +	 *                               lock(fs_reclaim);
> +	 *                               lock(sb_internal);
> +	 *  lock(fs_reclaim);
> +	 *
> +	 *  *** DEADLOCK ***
> +	 */

The lockdep splat is detailed in the commit message - it most
definitely does not need to be repeated in full here because:

	a) it doesn't explain why the splat occurring is, and
	b) we most definitely don't care about how the lockdep check
	   that triggered it is implemented.

IOWs, the comment here needs to explain how the freeze state held at
this point requires that we avoid reclaim recursion back into the
filesystem, regardless of how lockdep detects it or whether the
lockdep splats are a false positive or not...

e.g.

/*
 * The superblock is now in the frozen state, which means we cannot
 * allow memory allocation to recurse into reclaim on this
 * filesystem as this may require running operations that the
 * current freeze state prevents. This should not occur if
 * everything is working correctly and sometimes lockdep may report
 * false positives in this path. However, to be safe and to avoid
 * unnecessary false positives in test/CI environments, put the
 * entire final freeze processing path under GFP_NOFS allocation
 * contexts to prevent reclaim recursion from occurring anywhere in
 * the path.
 */

Cheers,

Dave.

> +	current_set_flags_nested(&pflags, PF_MEMALLOC_NOFS);
>  	xfs_stop_block_reaping(mp);
>  	xfs_save_resvblks(mp);
>  	xfs_quiesce_attr(mp);
> -	return xfs_sync_sb(mp, true);
> +	ret = xfs_sync_sb(mp, true);
> +	current_restore_flags_nested(&pflags, PF_MEMALLOC_NOFS);
> +	return ret;
>  }
>  
>  STATIC int
> -- 
> 2.18.1
> 
>
Dave Chinner June 18, 2020, 11:04 p.m. UTC | #2
On Fri, Jun 19, 2020 at 08:58:10AM +1000, Dave Chinner wrote:
> On Thu, Jun 18, 2020 at 01:19:41PM -0400, Waiman Long wrote:
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 379cbff438bc..1b94b9bfa4d7 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -913,11 +913,33 @@ xfs_fs_freeze(
> >  	struct super_block	*sb)
> >  {
> >  	struct xfs_mount	*mp = XFS_M(sb);
> > +	unsigned long		pflags;
> > +	int			ret;
> >  
> > +	/*
> > +	 * A fs_reclaim pseudo lock is added to check for potential deadlock
> > +	 * condition with fs reclaim. The following lockdep splat was hit
> > +	 * occasionally. This is actually a false positive as the allocation
> > +	 * is being done only after the frozen filesystem is no longer dirty.
> > +	 * One way to avoid this splat is to add GFP_NOFS to the affected
> > +	 * allocation calls. This is what PF_MEMALLOC_NOFS is for.
> > +	 *
> > +	 *       CPU0                    CPU1
> > +	 *       ----                    ----
> > +	 *  lock(sb_internal);
> > +	 *                               lock(fs_reclaim);
> > +	 *                               lock(sb_internal);
> > +	 *  lock(fs_reclaim);
> > +	 *
> > +	 *  *** DEADLOCK ***
> > +	 */
> 
> The lockdep splat is detailed in the commit message - it most
> definitely does not need to be repeated in full here because:
> 
> 	a) it doesn't explain why the splat occurring is, and
> 	b) we most definitely don't care about how the lockdep check
> 	   that triggered it is implemented.

I should have added this:

	c) a lot of people don't understand what lockdep reports
	   are telling them is a problem.

I get a lot of questions like "I saw this lockdep thing, but I can't
work out what it actually means, so can you have a look at it
Dave?". Hence I think directly quoting something people tend not to
understand to explain the problem they didn't understand isn't the
best approach to improving understanding of the problem...

Cheers,

Dave.
Waiman Long June 22, 2020, 5:56 p.m. UTC | #3
On 6/18/20 7:04 PM, Dave Chinner wrote:
> On Fri, Jun 19, 2020 at 08:58:10AM +1000, Dave Chinner wrote:
>> On Thu, Jun 18, 2020 at 01:19:41PM -0400, Waiman Long wrote:
>>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>>> index 379cbff438bc..1b94b9bfa4d7 100644
>>> --- a/fs/xfs/xfs_super.c
>>> +++ b/fs/xfs/xfs_super.c
>>> @@ -913,11 +913,33 @@ xfs_fs_freeze(
>>>   	struct super_block	*sb)
>>>   {
>>>   	struct xfs_mount	*mp = XFS_M(sb);
>>> +	unsigned long		pflags;
>>> +	int			ret;
>>>   
>>> +	/*
>>> +	 * A fs_reclaim pseudo lock is added to check for potential deadlock
>>> +	 * condition with fs reclaim. The following lockdep splat was hit
>>> +	 * occasionally. This is actually a false positive as the allocation
>>> +	 * is being done only after the frozen filesystem is no longer dirty.
>>> +	 * One way to avoid this splat is to add GFP_NOFS to the affected
>>> +	 * allocation calls. This is what PF_MEMALLOC_NOFS is for.
>>> +	 *
>>> +	 *       CPU0                    CPU1
>>> +	 *       ----                    ----
>>> +	 *  lock(sb_internal);
>>> +	 *                               lock(fs_reclaim);
>>> +	 *                               lock(sb_internal);
>>> +	 *  lock(fs_reclaim);
>>> +	 *
>>> +	 *  *** DEADLOCK ***
>>> +	 */
>> The lockdep splat is detailed in the commit message - it most
>> definitely does not need to be repeated in full here because:
>>
>> 	a) it doesn't explain why the splat occurring is, and
>> 	b) we most definitely don't care about how the lockdep check
>> 	   that triggered it is implemented.
> I should have added this:
>
> 	c) a lot of people don't understand what lockdep reports
> 	   are telling them is a problem.
>
> I get a lot of questions like "I saw this lockdep thing, but I can't
> work out what it actually means, so can you have a look at it
> Dave?". Hence I think directly quoting something people tend not to
> understand to explain the problem they didn't understand isn't the
> best approach to improving understanding of the problem...

OK, how about simplifying the comment to as follows:

        /*
          * Disable fs reclaim in memory allocation for fs freeze to avoid
          * causing a possible circular locking dependency lockdep splat
          * involving fs reclaim.
          */

Does that look good enough for you?

Cheers,
Longman
diff mbox series

Patch

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 379cbff438bc..1b94b9bfa4d7 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -913,11 +913,33 @@  xfs_fs_freeze(
 	struct super_block	*sb)
 {
 	struct xfs_mount	*mp = XFS_M(sb);
+	unsigned long		pflags;
+	int			ret;
 
+	/*
+	 * A fs_reclaim pseudo lock is added to check for potential deadlock
+	 * condition with fs reclaim. The following lockdep splat was hit
+	 * occasionally. This is actually a false positive as the allocation
+	 * is being done only after the frozen filesystem is no longer dirty.
+	 * One way to avoid this splat is to add GFP_NOFS to the affected
+	 * allocation calls. This is what PF_MEMALLOC_NOFS is for.
+	 *
+	 *       CPU0                    CPU1
+	 *       ----                    ----
+	 *  lock(sb_internal);
+	 *                               lock(fs_reclaim);
+	 *                               lock(sb_internal);
+	 *  lock(fs_reclaim);
+	 *
+	 *  *** DEADLOCK ***
+	 */
+	current_set_flags_nested(&pflags, PF_MEMALLOC_NOFS);
 	xfs_stop_block_reaping(mp);
 	xfs_save_resvblks(mp);
 	xfs_quiesce_attr(mp);
-	return xfs_sync_sb(mp, true);
+	ret = xfs_sync_sb(mp, true);
+	current_restore_flags_nested(&pflags, PF_MEMALLOC_NOFS);
+	return ret;
 }
 
 STATIC int