Message ID | 20200801154632.866356-2-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
Series | void xfs transaction reservation recursion | expand |
On Sat, Aug 01, 2020 at 11:46:31AM -0400, Yafang Shao wrote: > From: Yafang Shao <shaoyafang@didiglobal.com> > > In xfs_trans_alloc(), if xfs_trans_reserve() fails, it will call > xfs_trans_cancel(), in which it will restore the flag PF_MEMALLOC_NOFS. > However this flags has been restored in xfs_trans_reserve(). Although > this behavior doesn't introduce any obvious issue, we'd better improve it. ..... > > Signed-off-by: Yafang Shao <shaoyafang@didiglobal.com> > Cc: Dave Chinner <david@fromorbit.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Darrick J. Wong <darrick.wong@oracle.com> > Cc: Matthew Wilcox <willy@infradead.org> > --- > fs/xfs/xfs_trans.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > index 3c94e5ff4316..9ff41970d0c7 100644 > --- a/fs/xfs/xfs_trans.c > +++ b/fs/xfs/xfs_trans.c > @@ -162,10 +162,9 @@ xfs_trans_reserve( > */ > if (blocks > 0) { > error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); > - if (error != 0) { > - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); > + if (error != 0) > return -ENOSPC; > - } > + > tp->t_blk_res += blocks; > } > > @@ -240,8 +239,6 @@ xfs_trans_reserve( > tp->t_blk_res = 0; > } > > - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); > - > return error; > } So you fix it here.... > > @@ -972,6 +969,7 @@ xfs_trans_roll( > struct xfs_trans **tpp) > { > struct xfs_trans *trans = *tpp; > + struct xfs_trans *tp; > struct xfs_trans_res tres; > int error; > > @@ -1005,5 +1003,10 @@ xfs_trans_roll( > * the prior and the next transactions. > */ > tres.tr_logflags = XFS_TRANS_PERM_LOG_RES; > - return xfs_trans_reserve(*tpp, &tres, 0, 0); > + tp = *tpp; > + error = xfs_trans_reserve(tp, &tres, 0, 0); > + if (error) > + current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); > + > + return error; > } .... but then introduce the exact same issue here. i.e. when xfs_trans_roll() fails, the caller will call xfs_trans_cancel() on the returned transaction.... Realistically, I think this is kinda missing the overall intent of rolling transactions. The issue here is that when we commit a transaction, we unconditionally clear the PF_MEMALLOC_NOFS flag via __xfs_trans_commit() just before freeing the transaction despite the fact the long running rolling transaction has not yet completed. This means that when we roll a transactions, we are bouncing the NOFS state unnecessarily like so: t0 t1 t2 alloc reserve NOFS roll alloc commit clear NOFS reserve NOFS roll alloc commit clear NOFS reserve NOFS ..... right until the last commit of the sequence. IOWs, we are repeatedly setting and clearing NOFS even though we actually only need to set it at the t0 and clear it on the final commit or cancel. Hence I think that __xfs_trans_commit() should probably only clear NOFS on successful commit if @regrant is false (i.e. not called from xfs_trans_roll()) and the setting of NOFS should be removed from xfs_trans_reserve() and moved up into the initial xfs_trans_alloc() before xfs_trans_reserve() is called. This way the calls to either xfs_trans_commit() or xfs_trans_cancel() will clear the NOFS state as they indicate that we are exiting transaction context completely.... Also, please convert these to memalloc_nofs_save()/restore() calls as that is the way we are supposed to mark these regions now. Cheers, Dave.
On Wed, Aug 05, 2020 at 09:20:05AM +1000, Dave Chinner wrote: > Also, please convert these to memalloc_nofs_save()/restore() calls > as that is the way we are supposed to mark these regions now. I have a patch for that! From 7830a7dc66f349e16ef34bd408b9962f4c4bcdba Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Date: Fri, 27 Mar 2020 13:01:57 -0400 Subject: [PATCH 3/6] xfs: Convert to memalloc_nofs_save To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org, dm-devel@redhat.com, Mikulas Patocka <mpatocka@redhat.com>, Jens Axboe <axboe@kernel.dk>, NeilBrown <neilb@suse.de> Instead of using custom macros to set/restore PF_MEMALLOC_NOFS, use memalloc_nofs_save() like the rest of the kernel. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- fs/xfs/kmem.c | 2 +- fs/xfs/xfs_aops.c | 4 ++-- fs/xfs/xfs_buf.c | 2 +- fs/xfs/xfs_linux.h | 6 ------ fs/xfs/xfs_trans.c | 14 +++++++------- fs/xfs/xfs_trans.h | 2 +- 6 files changed, 12 insertions(+), 18 deletions(-) diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c index f1366475c389..c2d237159bfc 100644 --- a/fs/xfs/kmem.c +++ b/fs/xfs/kmem.c @@ -35,7 +35,7 @@ kmem_alloc(size_t size, xfs_km_flags_t flags) * __vmalloc() will allocate data pages and auxiliary structures (e.g. * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context here. Hence * we need to tell memory reclaim that we are in such a context via - * PF_MEMALLOC_NOFS to prevent memory reclaim re-entering the filesystem here + * memalloc_nofs to prevent memory reclaim re-entering the filesystem here * and potentially deadlocking. */ static void * diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index b35611882ff9..e3a4806e519d 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -62,7 +62,7 @@ xfs_setfilesize_trans_alloc( * We hand off the transaction to the completion thread now, so * clear the flag here. */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + memalloc_nofs_restore(tp->t_memalloc); return 0; } @@ -125,7 +125,7 @@ xfs_setfilesize_ioend( * thus we need to mark ourselves as being in a transaction manually. * Similarly for freeze protection. */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + tp->t_memalloc = memalloc_nofs_save(); __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS); /* we abort the update if there was an IO error */ diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 20b748f7e186..b2c3d01c690b 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -470,7 +470,7 @@ _xfs_buf_map_pages( * vm_map_ram() will allocate auxiliary structures (e.g. * pagetables) with GFP_KERNEL, yet we are likely to be under * GFP_NOFS context here. Hence we need to tell memory reclaim - * that we are in such a context via PF_MEMALLOC_NOFS to prevent + * that we are in such a context via memalloc_nofs to prevent * memory reclaim re-entering the filesystem here and * potentially deadlocking. */ diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 9f70d2f68e05..e1daf242a53b 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -104,12 +104,6 @@ typedef __u32 xfs_nlink_t; #define current_cpu() (raw_smp_processor_id()) #define current_pid() (current->pid) #define current_test_flags(f) (current->flags & (f)) -#define current_set_flags_nested(sp, f) \ - (*(sp) = current->flags, current->flags |= (f)) -#define current_clear_flags_nested(sp, f) \ - (*(sp) = current->flags, current->flags &= ~(f)) -#define current_restore_flags_nested(sp, f) \ - (current->flags = ((current->flags & ~(f)) | (*(sp) & (f)))) #define NBBY 8 /* number of bits per byte */ diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 3c94e5ff4316..4ef1a0ff0a11 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -118,7 +118,7 @@ xfs_trans_dup( ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used; tp->t_rtx_res = tp->t_rtx_res_used; - ntp->t_pflags = tp->t_pflags; + ntp->t_memalloc = tp->t_memalloc; /* move deferred ops over to the new tp */ xfs_defer_move(ntp, tp); @@ -153,7 +153,7 @@ xfs_trans_reserve( bool rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0; /* Mark this thread as being in a transaction */ - current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + tp->t_memalloc = memalloc_nofs_save(); /* * Attempt to reserve the needed disk blocks by decrementing @@ -163,7 +163,7 @@ xfs_trans_reserve( if (blocks > 0) { error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); if (error != 0) { - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + memalloc_nofs_restore(tp->t_memalloc); return -ENOSPC; } tp->t_blk_res += blocks; @@ -240,7 +240,7 @@ xfs_trans_reserve( tp->t_blk_res = 0; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + memalloc_nofs_restore(tp->t_memalloc); return error; } @@ -861,7 +861,7 @@ __xfs_trans_commit( xfs_log_commit_cil(mp, tp, &commit_lsn, regrant); - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + memalloc_nofs_restore(tp->t_memalloc); xfs_trans_free(tp); /* @@ -893,7 +893,7 @@ __xfs_trans_commit( xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket); tp->t_ticket = NULL; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + memalloc_nofs_restore(tp->t_memalloc); xfs_trans_free_items(tp, !!error); xfs_trans_free(tp); @@ -954,7 +954,7 @@ xfs_trans_cancel( } /* mark this thread as no longer being in a transaction */ - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + memalloc_nofs_restore(tp->t_memalloc); xfs_trans_free_items(tp, dirty); xfs_trans_free(tp); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 8308bf6d7e40..7aa2d5ff9245 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -118,6 +118,7 @@ typedef struct xfs_trans { unsigned int t_rtx_res; /* # of rt extents resvd */ unsigned int t_rtx_res_used; /* # of resvd rt extents used */ unsigned int t_flags; /* misc flags */ + unsigned int t_memalloc; /* saved memalloc state */ xfs_fsblock_t t_firstblock; /* first block allocated */ struct xlog_ticket *t_ticket; /* log mgr ticket */ struct xfs_mount *t_mountp; /* ptr to fs mount struct */ @@ -144,7 +145,6 @@ typedef struct xfs_trans { struct list_head t_items; /* log item descriptors */ struct list_head t_busy; /* list of busy extents */ struct list_head t_dfops; /* deferred operations */ - unsigned long t_pflags; /* saved process flags state */ } xfs_trans_t; /*
On Wed, Aug 05, 2020 at 12:50:38AM +0100, Matthew Wilcox wrote: > On Wed, Aug 05, 2020 at 09:20:05AM +1000, Dave Chinner wrote: > > Also, please convert these to memalloc_nofs_save()/restore() calls > > as that is the way we are supposed to mark these regions now. > > I have a patch for that! Did you compile test it? :) > --- > fs/xfs/kmem.c | 2 +- > fs/xfs/xfs_aops.c | 4 ++-- > fs/xfs/xfs_buf.c | 2 +- > fs/xfs/xfs_linux.h | 6 ------ > fs/xfs/xfs_trans.c | 14 +++++++------- > fs/xfs/xfs_trans.h | 2 +- > 6 files changed, 12 insertions(+), 18 deletions(-) ..... > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h > index 9f70d2f68e05..e1daf242a53b 100644 > --- a/fs/xfs/xfs_linux.h > +++ b/fs/xfs/xfs_linux.h > @@ -104,12 +104,6 @@ typedef __u32 xfs_nlink_t; > #define current_cpu() (raw_smp_processor_id()) > #define current_pid() (current->pid) > #define current_test_flags(f) (current->flags & (f)) > -#define current_set_flags_nested(sp, f) \ > - (*(sp) = current->flags, current->flags |= (f)) > -#define current_clear_flags_nested(sp, f) \ > - (*(sp) = current->flags, current->flags &= ~(f)) > -#define current_restore_flags_nested(sp, f) \ > - (current->flags = ((current->flags & ~(f)) | (*(sp) & (f)))) current_set_flags_nested() and current_restore_flags_nested() are used in xfs_btree_split_worker() in fs/xfs/libxfs/xfs_btree.c and that's not a file you modified in this patch... Cheers, Dave.
On Wed, Aug 5, 2020 at 9:28 AM Dave Chinner <david@fromorbit.com> wrote: > > On Wed, Aug 05, 2020 at 12:50:38AM +0100, Matthew Wilcox wrote: > > On Wed, Aug 05, 2020 at 09:20:05AM +1000, Dave Chinner wrote: > > > Also, please convert these to memalloc_nofs_save()/restore() calls > > > as that is the way we are supposed to mark these regions now. > > > > I have a patch for that! > > Did you compile test it? :) > > > --- > > fs/xfs/kmem.c | 2 +- > > fs/xfs/xfs_aops.c | 4 ++-- > > fs/xfs/xfs_buf.c | 2 +- > > fs/xfs/xfs_linux.h | 6 ------ > > fs/xfs/xfs_trans.c | 14 +++++++------- > > fs/xfs/xfs_trans.h | 2 +- > > 6 files changed, 12 insertions(+), 18 deletions(-) > > ..... > > > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h > > index 9f70d2f68e05..e1daf242a53b 100644 > > --- a/fs/xfs/xfs_linux.h > > +++ b/fs/xfs/xfs_linux.h > > @@ -104,12 +104,6 @@ typedef __u32 xfs_nlink_t; > > #define current_cpu() (raw_smp_processor_id()) > > #define current_pid() (current->pid) > > #define current_test_flags(f) (current->flags & (f)) > > -#define current_set_flags_nested(sp, f) \ > > - (*(sp) = current->flags, current->flags |= (f)) > > -#define current_clear_flags_nested(sp, f) \ > > - (*(sp) = current->flags, current->flags &= ~(f)) > > -#define current_restore_flags_nested(sp, f) \ > > - (current->flags = ((current->flags & ~(f)) | (*(sp) & (f)))) > > current_set_flags_nested() and current_restore_flags_nested() > are used in xfs_btree_split_worker() in fs/xfs/libxfs/xfs_btree.c > and that's not a file you modified in this patch... > It is in Willy's another patch "mm: Add become_kswapd and restore_kswapd"[1], which can be committed independently from "Overhaul memalloc_no*" series. I will carry it in the next version. [1] https://lore.kernel.org/linux-mm/20200625113122.7540-3-willy@infradead.org/#t [2] https://lore.kernel.org/linux-mm/20200625113122.7540-1-willy@infradead.org/
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 3c94e5ff4316..9ff41970d0c7 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -162,10 +162,9 @@ xfs_trans_reserve( */ if (blocks > 0) { error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd); - if (error != 0) { - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + if (error != 0) return -ENOSPC; - } + tp->t_blk_res += blocks; } @@ -240,8 +239,6 @@ xfs_trans_reserve( tp->t_blk_res = 0; } - current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); - return error; } @@ -972,6 +969,7 @@ xfs_trans_roll( struct xfs_trans **tpp) { struct xfs_trans *trans = *tpp; + struct xfs_trans *tp; struct xfs_trans_res tres; int error; @@ -1005,5 +1003,10 @@ xfs_trans_roll( * the prior and the next transactions. */ tres.tr_logflags = XFS_TRANS_PERM_LOG_RES; - return xfs_trans_reserve(*tpp, &tres, 0, 0); + tp = *tpp; + error = xfs_trans_reserve(tp, &tres, 0, 0); + if (error) + current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS); + + return error; }