diff mbox

transaction reservations for deleting of shared extents

Message ID 20170413035250.GT8502@birch.djwong.org (mailing list archive)
State Superseded
Headers show

Commit Message

Darrick J. Wong April 13, 2017, 3:52 a.m. UTC
On Wed, Apr 12, 2017 at 04:06:12PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 12, 2017 at 03:52:31PM +0200, Christoph Hellwig wrote:
> > I think the problem is that t_log_res just contains the log reservation
> > when the transaction was created.  But each item processed by
> > xfs_defer_finish uses up some of it, but in some cases these might
> > be different operations and not just more refcount updates, e.g. for
> > xfs_itruncate_extents which I see the issues with we mix EFI/EFD
> > items with refcount updates.
> 
> Hmmm... I suppose you could end up with a heavy load of deferred updates
> stemming from the removal of a single extent:
> 
> 1) Start with one huge extent mapped into a file.
> 2) Reflink every other block into another file.
> 3) Delete the first file.  This results in:
>    a) Unmap the huge extent.
>    b) Schedule removal of the rmap, if applicable.
>    c) Schedule a refcount decrease for the huge extent.
>    d) Perform the deferred rmap removal.  If we push blocks off the
>       AGFL as part of removing rmapbt blocks, queue an EFI.
>    e) Perform the deferred refcount decrease:
>       For each (singly-)shared block, set the refcount=1 by deleting the
>       refcount record.  Every ~150 deletions we free a refcount block
>       and queue an EFI.  (If rmap, queue a deferred rmap update too.)
>    f) Perform the deferred rmap removals.  If we push blocks off the
>       AGFL as part of removing rmapbt blocks, queue an EFI.
>    g) Free each shared block by queueing an EFI.
>    h) For each EFI, free the extent.
> 
> So I think the problem you're seeing here is that just prior to (3g) we
> have the most deferred items (EFIs, specifically) attached to this
> transaction at any point in the whole operation.  There can be so many
> EFIs that we use up the log reservation and blow the ASSERT.
> 
> One way to fix this is to unmap a smaller range in (1) so that we don't
> blow up at (3g).  Unfortunately, it is hard to guess at (1) just how
> many EFIs we might end up queueing, but I think reducing the amount of
> file mapping we free in a given step might be the only sane solution.
> One could calculate the number of blocks we can free, given the
> remaining transaction reservation and assuming the worst case number of
> EFIs that could get filed to unmap those blocks, and only __bunmapi that
> many blocks, thereby forcing the caller to come back with a fresh
> defer_ops for another try.

Hmm... can you try out the attached RFC(RAP) xfstests testcase to see if
it can reproduce the problem you're seeing, and then apply the (equally
RFCRAP) patch to see if it fixes the problem?

--D

> 
> > I still don't have a good idea how to fix this, though.  One idea
> > would be to prevent mixing different items, but I think being able
> > to mix them was one of your goals with the defer infrastructure rewrite.
> 
> Yes, we have to be able to perform several different types of updates
> in one defer_ops so that we can execute CoW remappings atomically.
> 
> --D
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Hellwig April 13, 2017, 10:51 a.m. UTC | #1
On Wed, Apr 12, 2017 at 08:52:50PM -0700, Darrick J. Wong wrote:
> Hmm... can you try out the attached RFC(RAP) xfstests testcase to see if
> it can reproduce the problem you're seeing,

That one gives me a softlockup in xlog_grant_head_wait, so I think we're
having even more problems in this area.  Let me try with a larger log
size and deeg into the issues..
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From: Darrick J. Wong <darrick.wong@oracle.com>
Subject: [PATCH] xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent

In a pathological scenario where we are trying to bunmapi a single
extent in which every other block is shared, it's possible that trying
to unmap the entire large extent in a single transaction can generate so
many EFIs that we overflow the transaction reservation.

Therefore, use a heuristic to guess at the number of blocks we can
safely unmap from a reflink file's data fork in an single transaction.
This should prevent problems such as the log head slamming into the tail
and ASSERTs that trigger because we've exceeded the transaction
reservation.

Signed-off-by: Darrick J. Wong <djwong@djwong.org>
---
 fs/xfs/libxfs/xfs_bmap.c |   42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 179b483..87da58d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5417,6 +5417,28 @@  xfs_bmap_del_extent(
 }
 
 /*
+ * Guesstimate how many blocks we can unmap without running the risk of
+ * blowing out the transaction with EFIs.
+ */
+static xfs_fileoff_t
+xfs_bunmapi_can_unmap(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	int			whichfork,
+	xfs_fileoff_t		len)
+{
+	unsigned int		limit;
+
+	if (!xfs_is_reflink_inode(ip) || whichfork != XFS_DATA_FORK)
+		return len;
+
+	limit = (tp->t_log_res * 3 / 4) / 32;
+	if (len <= limit)
+		return len;
+	return limit;
+}
+
+/*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
  * that value.  If not all extents in the block range can be removed then
@@ -5451,6 +5473,7 @@  __xfs_bunmapi(
 	int			whichfork;	/* data or attribute fork */
 	xfs_fsblock_t		sum;
 	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
+	xfs_fileoff_t		can_unmap;
 
 	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
 
@@ -5472,6 +5495,12 @@  __xfs_bunmapi(
 	ASSERT(len > 0);
 	ASSERT(nexts >= 0);
 
+	/*
+	 * If this is potentially reflinked data blocks, don't blow out
+	 * the reservation.
+	 */
+	can_unmap = xfs_bunmapi_can_unmap(tp, ip, whichfork, len);
+
 	if (!(ifp->if_flags & XFS_IFEXTENTS) &&
 	    (error = xfs_iread_extents(tp, ip, whichfork)))
 		return error;
@@ -5516,7 +5545,7 @@  __xfs_bunmapi(
 
 	extno = 0;
 	while (bno != (xfs_fileoff_t)-1 && bno >= start && lastx >= 0 &&
-	       (nexts == 0 || extno < nexts)) {
+	       (nexts == 0 || extno < nexts) && can_unmap > 0) {
 		/*
 		 * Is the found extent after a hole in which bno lives?
 		 * Just back up to the previous extent, if so.
@@ -5548,6 +5577,16 @@  __xfs_bunmapi(
 		}
 		if (del.br_startoff + del.br_blockcount > bno + 1)
 			del.br_blockcount = bno + 1 - del.br_startoff;
+
+		/* How much can we safely unmap? */
+		if (can_unmap < del.br_blockcount) {
+printk(KERN_ERR "%s: ino %lld pblk %d:%llu lblk %llu len %llu can_unmap %llu got=%llu:%llu:%llu\n", __func__, ip->i_ino, wasdel, del.br_startblock, del.br_startoff, del.br_blockcount, can_unmap, got.br_startblock, got.br_startoff, got.br_blockcount);
+			del.br_startoff += del.br_blockcount - can_unmap;
+			if (!wasdel)
+				del.br_startblock += del.br_blockcount - can_unmap;
+			del.br_blockcount = can_unmap;
+		}
+
 		sum = del.br_startblock + del.br_blockcount;
 		if (isrt &&
 		    (mod = do_mod(sum, mp->m_sb.sb_rextsize))) {
@@ -5724,6 +5763,7 @@  __xfs_bunmapi(
 		if (!isrt && wasdel)
 			xfs_mod_fdblocks(mp, (int64_t)del.br_blockcount, false);
 
+		can_unmap -= del.br_blockcount;
 		bno = del.br_startoff - 1;
 nodelete:
 		/*