diff mbox series

[v4,1/2] xfs: avoid double restore PF_MEMALLOC_NOFS if transaction reservation fails

Message ID 20200801154632.866356-2-laoar.shao@gmail.com (mailing list archive)
State Deferred, archived
Headers show
Series void xfs transaction reservation recursion | expand

Commit Message

Yafang Shao Aug. 1, 2020, 3:46 p.m. UTC
From: Yafang Shao <shaoyafang@didiglobal.com>

In xfs_trans_alloc(), if xfs_trans_reserve() fails, it will call
xfs_trans_cancel(), in which it will restore the flag PF_MEMALLOC_NOFS.
However this flags has been restored in xfs_trans_reserve(). Although
this behavior doesn't introduce any obvious issue, we'd better improve it.

Signed-off-by: Yafang Shao <shaoyafang@didiglobal.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
---
 fs/xfs/xfs_trans.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Dave Chinner Aug. 4, 2020, 11:20 p.m. UTC | #1
On Sat, Aug 01, 2020 at 11:46:31AM -0400, Yafang Shao wrote:
> From: Yafang Shao <shaoyafang@didiglobal.com>
> 
> In xfs_trans_alloc(), if xfs_trans_reserve() fails, it will call
> xfs_trans_cancel(), in which it will restore the flag PF_MEMALLOC_NOFS.
> However this flags has been restored in xfs_trans_reserve(). Although
> this behavior doesn't introduce any obvious issue, we'd better improve it.
.....

> 
> Signed-off-by: Yafang Shao <shaoyafang@didiglobal.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> ---
>  fs/xfs/xfs_trans.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 3c94e5ff4316..9ff41970d0c7 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -162,10 +162,9 @@ xfs_trans_reserve(
>  	 */
>  	if (blocks > 0) {
>  		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> -		if (error != 0) {
> -			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +		if (error != 0)
>  			return -ENOSPC;
> -		}
> +
>  		tp->t_blk_res += blocks;
>  	}
>  
> @@ -240,8 +239,6 @@ xfs_trans_reserve(
>  		tp->t_blk_res = 0;
>  	}
>  
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> -
>  	return error;
>  }

So you fix it here....

>  
> @@ -972,6 +969,7 @@ xfs_trans_roll(
>  	struct xfs_trans	**tpp)
>  {
>  	struct xfs_trans	*trans = *tpp;
> +	struct xfs_trans        *tp;
>  	struct xfs_trans_res	tres;
>  	int			error;
>  
> @@ -1005,5 +1003,10 @@ xfs_trans_roll(
>  	 * the prior and the next transactions.
>  	 */
>  	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> -	return xfs_trans_reserve(*tpp, &tres, 0, 0);
> +	tp = *tpp;
> +	error = xfs_trans_reserve(tp, &tres, 0, 0);
> +	if (error)
> +		current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +
> +	return error;
>  }

.... but then introduce the exact same issue here. i.e. when
xfs_trans_roll() fails, the caller will call xfs_trans_cancel() on
the returned transaction....

Realistically, I think this is kinda missing the overall intent of
rolling transactions. The issue here is that when we commit a
transaction, we unconditionally clear the PF_MEMALLOC_NOFS flag via
__xfs_trans_commit() just before freeing the transaction despite the
fact the long running rolling transaction has not yet completed.

This means that when we roll a transactions, we are bouncing the
NOFS state unnecessarily like so:

t0		t1		t2
alloc
 reserve
  NOFS
roll
		alloc
commit
clear NOFS
		reserve
		  NOFS
		roll
				alloc
		commit
		clear NOFS
				reserve
				  NOFS
				.....

right until the last commit of the sequence. IOWs, we are repeatedly
setting and clearing NOFS even though we actually only need to set
it at the t0 and clear it on the final commit or cancel.

Hence I think that __xfs_trans_commit() should probably only clear
NOFS on successful commit if @regrant is false (i.e. not called from
xfs_trans_roll()) and the setting of NOFS should be removed from
xfs_trans_reserve() and moved up into the initial xfs_trans_alloc()
before xfs_trans_reserve() is called.

This way the calls to either xfs_trans_commit() or
xfs_trans_cancel() will clear the NOFS state as they indicate that
we are exiting transaction context completely....

Also, please convert these to memalloc_nofs_save()/restore() calls
as that is the way we are supposed to mark these regions now.

Cheers,

Dave.
Matthew Wilcox (Oracle) Aug. 4, 2020, 11:50 p.m. UTC | #2
On Wed, Aug 05, 2020 at 09:20:05AM +1000, Dave Chinner wrote:
> Also, please convert these to memalloc_nofs_save()/restore() calls
> as that is the way we are supposed to mark these regions now.

I have a patch for that!


From 7830a7dc66f349e16ef34bd408b9962f4c4bcdba Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Fri, 27 Mar 2020 13:01:57 -0400
Subject: [PATCH 3/6] xfs: Convert to memalloc_nofs_save
To: linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org,
    dm-devel@redhat.com,
    Mikulas Patocka <mpatocka@redhat.com>,
    Jens Axboe <axboe@kernel.dk>,
    NeilBrown <neilb@suse.de>

Instead of using custom macros to set/restore PF_MEMALLOC_NOFS, use
memalloc_nofs_save() like the rest of the kernel.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/xfs/kmem.c      |  2 +-
 fs/xfs/xfs_aops.c  |  4 ++--
 fs/xfs/xfs_buf.c   |  2 +-
 fs/xfs/xfs_linux.h |  6 ------
 fs/xfs/xfs_trans.c | 14 +++++++-------
 fs/xfs/xfs_trans.h |  2 +-
 6 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index f1366475c389..c2d237159bfc 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -35,7 +35,7 @@ kmem_alloc(size_t size, xfs_km_flags_t flags)
  * __vmalloc() will allocate data pages and auxiliary structures (e.g.
  * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context here. Hence
  * we need to tell memory reclaim that we are in such a context via
- * PF_MEMALLOC_NOFS to prevent memory reclaim re-entering the filesystem here
+ * memalloc_nofs to prevent memory reclaim re-entering the filesystem here
  * and potentially deadlocking.
  */
 static void *
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index b35611882ff9..e3a4806e519d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -62,7 +62,7 @@ xfs_setfilesize_trans_alloc(
 	 * We hand off the transaction to the completion thread now, so
 	 * clear the flag here.
 	 */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	memalloc_nofs_restore(tp->t_memalloc);
 	return 0;
 }
 
@@ -125,7 +125,7 @@ xfs_setfilesize_ioend(
 	 * thus we need to mark ourselves as being in a transaction manually.
 	 * Similarly for freeze protection.
 	 */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	tp->t_memalloc = memalloc_nofs_save();
 	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
 
 	/* we abort the update if there was an IO error */
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 20b748f7e186..b2c3d01c690b 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -470,7 +470,7 @@ _xfs_buf_map_pages(
 		 * vm_map_ram() will allocate auxiliary structures (e.g.
 		 * pagetables) with GFP_KERNEL, yet we are likely to be under
 		 * GFP_NOFS context here. Hence we need to tell memory reclaim
-		 * that we are in such a context via PF_MEMALLOC_NOFS to prevent
+		 * that we are in such a context via memalloc_nofs to prevent
 		 * memory reclaim re-entering the filesystem here and
 		 * potentially deadlocking.
 		 */
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 9f70d2f68e05..e1daf242a53b 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -104,12 +104,6 @@ typedef __u32			xfs_nlink_t;
 #define current_cpu()		(raw_smp_processor_id())
 #define current_pid()		(current->pid)
 #define current_test_flags(f)	(current->flags & (f))
-#define current_set_flags_nested(sp, f)		\
-		(*(sp) = current->flags, current->flags |= (f))
-#define current_clear_flags_nested(sp, f)	\
-		(*(sp) = current->flags, current->flags &= ~(f))
-#define current_restore_flags_nested(sp, f)	\
-		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
 
 #define NBBY		8		/* number of bits per byte */
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 3c94e5ff4316..4ef1a0ff0a11 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -118,7 +118,7 @@ xfs_trans_dup(
 
 	ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
 	tp->t_rtx_res = tp->t_rtx_res_used;
-	ntp->t_pflags = tp->t_pflags;
+	ntp->t_memalloc = tp->t_memalloc;
 
 	/* move deferred ops over to the new tp */
 	xfs_defer_move(ntp, tp);
@@ -153,7 +153,7 @@ xfs_trans_reserve(
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 
 	/* Mark this thread as being in a transaction */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	tp->t_memalloc = memalloc_nofs_save();
 
 	/*
 	 * Attempt to reserve the needed disk blocks by decrementing
@@ -163,7 +163,7 @@ xfs_trans_reserve(
 	if (blocks > 0) {
 		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
 		if (error != 0) {
-			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+			memalloc_nofs_restore(tp->t_memalloc);
 			return -ENOSPC;
 		}
 		tp->t_blk_res += blocks;
@@ -240,7 +240,7 @@ xfs_trans_reserve(
 		tp->t_blk_res = 0;
 	}
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	memalloc_nofs_restore(tp->t_memalloc);
 
 	return error;
 }
@@ -861,7 +861,7 @@ __xfs_trans_commit(
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	memalloc_nofs_restore(tp->t_memalloc);
 	xfs_trans_free(tp);
 
 	/*
@@ -893,7 +893,7 @@ __xfs_trans_commit(
 			xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
 		tp->t_ticket = NULL;
 	}
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	memalloc_nofs_restore(tp->t_memalloc);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -954,7 +954,7 @@ xfs_trans_cancel(
 	}
 
 	/* mark this thread as no longer being in a transaction */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	memalloc_nofs_restore(tp->t_memalloc);
 
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 8308bf6d7e40..7aa2d5ff9245 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -118,6 +118,7 @@ typedef struct xfs_trans {
 	unsigned int		t_rtx_res;	/* # of rt extents resvd */
 	unsigned int		t_rtx_res_used;	/* # of resvd rt extents used */
 	unsigned int		t_flags;	/* misc flags */
+	unsigned int		t_memalloc;	/* saved memalloc state */
 	xfs_fsblock_t		t_firstblock;	/* first block allocated */
 	struct xlog_ticket	*t_ticket;	/* log mgr ticket */
 	struct xfs_mount	*t_mountp;	/* ptr to fs mount struct */
@@ -144,7 +145,6 @@ typedef struct xfs_trans {
 	struct list_head	t_items;	/* log item descriptors */
 	struct list_head	t_busy;		/* list of busy extents */
 	struct list_head	t_dfops;	/* deferred operations */
-	unsigned long		t_pflags;	/* saved process flags state */
 } xfs_trans_t;
 
 /*
Dave Chinner Aug. 5, 2020, 1:28 a.m. UTC | #3
On Wed, Aug 05, 2020 at 12:50:38AM +0100, Matthew Wilcox wrote:
> On Wed, Aug 05, 2020 at 09:20:05AM +1000, Dave Chinner wrote:
> > Also, please convert these to memalloc_nofs_save()/restore() calls
> > as that is the way we are supposed to mark these regions now.
> 
> I have a patch for that!

Did you compile test it? :)

> ---
>  fs/xfs/kmem.c      |  2 +-
>  fs/xfs/xfs_aops.c  |  4 ++--
>  fs/xfs/xfs_buf.c   |  2 +-
>  fs/xfs/xfs_linux.h |  6 ------
>  fs/xfs/xfs_trans.c | 14 +++++++-------
>  fs/xfs/xfs_trans.h |  2 +-
>  6 files changed, 12 insertions(+), 18 deletions(-)

.....

> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index 9f70d2f68e05..e1daf242a53b 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -104,12 +104,6 @@ typedef __u32			xfs_nlink_t;
>  #define current_cpu()		(raw_smp_processor_id())
>  #define current_pid()		(current->pid)
>  #define current_test_flags(f)	(current->flags & (f))
> -#define current_set_flags_nested(sp, f)		\
> -		(*(sp) = current->flags, current->flags |= (f))
> -#define current_clear_flags_nested(sp, f)	\
> -		(*(sp) = current->flags, current->flags &= ~(f))
> -#define current_restore_flags_nested(sp, f)	\
> -		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))

current_set_flags_nested() and current_restore_flags_nested()
are used in xfs_btree_split_worker() in fs/xfs/libxfs/xfs_btree.c
and that's not a file you modified in this patch...

Cheers,

Dave.
Yafang Shao Aug. 7, 2020, 4:05 a.m. UTC | #4
On Wed, Aug 5, 2020 at 9:28 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Aug 05, 2020 at 12:50:38AM +0100, Matthew Wilcox wrote:
> > On Wed, Aug 05, 2020 at 09:20:05AM +1000, Dave Chinner wrote:
> > > Also, please convert these to memalloc_nofs_save()/restore() calls
> > > as that is the way we are supposed to mark these regions now.
> >
> > I have a patch for that!
>
> Did you compile test it? :)
>
> > ---
> >  fs/xfs/kmem.c      |  2 +-
> >  fs/xfs/xfs_aops.c  |  4 ++--
> >  fs/xfs/xfs_buf.c   |  2 +-
> >  fs/xfs/xfs_linux.h |  6 ------
> >  fs/xfs/xfs_trans.c | 14 +++++++-------
> >  fs/xfs/xfs_trans.h |  2 +-
> >  6 files changed, 12 insertions(+), 18 deletions(-)
>
> .....
>
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index 9f70d2f68e05..e1daf242a53b 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -104,12 +104,6 @@ typedef __u32                    xfs_nlink_t;
> >  #define current_cpu()                (raw_smp_processor_id())
> >  #define current_pid()                (current->pid)
> >  #define current_test_flags(f)        (current->flags & (f))
> > -#define current_set_flags_nested(sp, f)              \
> > -             (*(sp) = current->flags, current->flags |= (f))
> > -#define current_clear_flags_nested(sp, f)    \
> > -             (*(sp) = current->flags, current->flags &= ~(f))
> > -#define current_restore_flags_nested(sp, f)  \
> > -             (current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
>
> current_set_flags_nested() and current_restore_flags_nested()
> are used in xfs_btree_split_worker() in fs/xfs/libxfs/xfs_btree.c
> and that's not a file you modified in this patch...
>

It is in Willy's another patch "mm: Add become_kswapd and
restore_kswapd"[1], which can be committed independently from
"Overhaul memalloc_no*" series. I will carry it in the next version.

[1] https://lore.kernel.org/linux-mm/20200625113122.7540-3-willy@infradead.org/#t
[2] https://lore.kernel.org/linux-mm/20200625113122.7540-1-willy@infradead.org/
diff mbox series

Patch

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 3c94e5ff4316..9ff41970d0c7 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -162,10 +162,9 @@  xfs_trans_reserve(
 	 */
 	if (blocks > 0) {
 		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
-		if (error != 0) {
-			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+		if (error != 0)
 			return -ENOSPC;
-		}
+
 		tp->t_blk_res += blocks;
 	}
 
@@ -240,8 +239,6 @@  xfs_trans_reserve(
 		tp->t_blk_res = 0;
 	}
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
-
 	return error;
 }
 
@@ -972,6 +969,7 @@  xfs_trans_roll(
 	struct xfs_trans	**tpp)
 {
 	struct xfs_trans	*trans = *tpp;
+	struct xfs_trans        *tp;
 	struct xfs_trans_res	tres;
 	int			error;
 
@@ -1005,5 +1003,10 @@  xfs_trans_roll(
 	 * the prior and the next transactions.
 	 */
 	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
-	return xfs_trans_reserve(*tpp, &tres, 0, 0);
+	tp = *tpp;
+	error = xfs_trans_reserve(tp, &tres, 0, 0);
+	if (error)
+		current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+
+	return error;
 }