diff mbox series

[2/2] xfs: estimate post-merge refcounts correctly

Message ID 166975929675.3768925.10238207487640742011.stgit@magnolia (mailing list archive)
State Accepted
Headers show
Series xfs: fix broken MAXREFCOUNT handling | expand

Commit Message

Darrick J. Wong Nov. 29, 2022, 10:01 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
metadata corruptions after being run.  Specifically, xfs_repair noticed
single-block refcount records that could be combined but had not been.

The root cause of this is improper MAXREFCOUNT edge case handling in
xfs_refcount_merge_extents.  When we're trying to find candidates for a
refcount btree record merge, we compute the refcount attribute of the
merged record, but we fail to account for the fact that once a record
hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence
the computed refcount is wrong, and we fail to merge the extents.

Fix this by adjusting the merge predicates to compute the adjusted
refcount correctly.

Fixes: 3172725814f9 ("xfs: adjust refcount of an extent of blocks in refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

Comments

Dave Chinner Nov. 29, 2022, 10:37 p.m. UTC | #1
On Tue, Nov 29, 2022 at 02:01:36PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
> metadata corruptions after being run.  Specifically, xfs_repair noticed
> single-block refcount records that could be combined but had not been.
> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> refcount btree record merge, we compute the refcount attribute of the
> merged record, but we fail to account for the fact that once a record
> hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence
> the computed refcount is wrong, and we fail to merge the extents.
> 
> Fix this by adjusting the merge predicates to compute the adjusted
> refcount correctly.
> 
> Fixes: 3172725814f9 ("xfs: adjust refcount of an extent of blocks in refcount btree")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_refcount.c |   25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)

Looks good to me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Xiao Yang Nov. 30, 2022, 9:32 a.m. UTC | #2
On 2022/11/30 6:01, Darrick J. Wong wrote:
> From: Darrick J. Wong<djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
> metadata corruptions after being run.  Specifically, xfs_repair noticed
> single-block refcount records that could be combined but had not been.
Hi Darrick,

I am investigating the issue as well. Thanks a lot for your quick fix.
I have confirmed that xfs/179 with the fix patch can works well in DAX mode.
Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>

> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> refcount btree record merge, we compute the refcount attribute of the
> merged record, but we fail to account for the fact that once a record
> hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence

One question:
Is it reansonable/expected to pin rc_refcount forever when a record hits 
rc_refcount == MAXREFCOUNT?

> the computed refcount is wrong, and we fail to merge the extents.
> 
> Fix this by adjusting the merge predicates to compute the adjusted
> refcount correctly.

Best Regards,
Xiao Yang
Darrick J. Wong Nov. 30, 2022, 6:49 p.m. UTC | #3
On Wed, Nov 30, 2022 at 05:32:38PM +0800, Yang, Xiao/杨 晓 wrote:
> On 2022/11/30 6:01, Darrick J. Wong wrote:
> > From: Darrick J. Wong<djwong@kernel.org>
> > 
> > Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
> > metadata corruptions after being run.  Specifically, xfs_repair noticed
> > single-block refcount records that could be combined but had not been.
> Hi Darrick,
> 
> I am investigating the issue as well. Thanks a lot for your quick fix.
> I have confirmed that xfs/179 with the fix patch can works well in DAX mode.
> Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>
> 
> > 
> > The root cause of this is improper MAXREFCOUNT edge case handling in
> > xfs_refcount_merge_extents.  When we're trying to find candidates for a
> > refcount btree record merge, we compute the refcount attribute of the
> > merged record, but we fail to account for the fact that once a record
> > hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence
> 
> One question:
> Is it reansonable/expected to pin rc_refcount forever when a record hits
> rc_refcount == MAXREFCOUNT?

In the ideal world we'd have a way for refcount_adjust to return early
if it hit a MAXREFCOUNT record, and stop the reflink operation right
there and then.

*Unfortunately* due to the way defer ops work, by the time we're walking
through the refcount btree, we've already committed to adding the entire
mapping.  There's no good way to back out once we start, since
transactions do not log undo items, only redo items.  Augmenting the log
to support undo items is a very big ask.

We could try to work around that by walking the refcount records
*before* committing log items, but that second walk will increase the
overhead for a fairly difficult to hit corner case.  We'd also need to
hold the AGF buffer lock from the start of the chain all the way until
we start the deferred refcount processing.  Or we'd need to add a
separate AG lock.

Even if we did all that, FICLONE* doesn't have any means to communicate
that a short reflink occurred.  copy_file_range does, but then it has
other weird warts, and cannot reflink more than 2^32 bytes in one go.

The simplest solution is to pin the extent if its refcount hits
MAXREFCOUNT, so that's what I went with. :/

--D

> > the computed refcount is wrong, and we fail to merge the extents.
> > 
> > Fix this by adjusting the merge predicates to compute the adjusted
> > refcount correctly.
> 
> Best Regards,
> Xiao Yang
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 8c68994d09f3..0d68a9230386 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -820,6 +820,17 @@  xfs_refc_valid(
 	return rc->rc_startblock != NULLAGBLOCK;
 }
 
+static inline xfs_nlink_t
+xfs_refc_merge_refcount(
+	const struct xfs_refcount_irec	*irec,
+	enum xfs_refc_adjust_op		adjust)
+{
+	/* Once a record hits MAXREFCOUNT, it is pinned there forever */
+	if (irec->rc_refcount == MAXREFCOUNT)
+		return MAXREFCOUNT;
+	return irec->rc_refcount + adjust;
+}
+
 static inline bool
 xfs_refc_want_merge_center(
 	const struct xfs_refcount_irec	*left,
@@ -831,6 +842,7 @@  xfs_refc_want_merge_center(
 	unsigned long long		*ulenp)
 {
 	unsigned long long		ulen = left->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * To merge with a center record, both shoulder records must be
@@ -846,9 +858,10 @@  xfs_refc_want_merge_center(
 		return false;
 
 	/* The shoulder record refcounts must match the new refcount. */
-	if (left->rc_refcount != cleft->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cleft, adjust);
+	if (left->rc_refcount != new_refcount)
 		return false;
-	if (right->rc_refcount != cleft->rc_refcount + adjust)
+	if (right->rc_refcount != new_refcount)
 		return false;
 
 	/*
@@ -870,6 +883,7 @@  xfs_refc_want_merge_left(
 	enum xfs_refc_adjust_op		adjust)
 {
 	unsigned long long		ulen = left->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * For a left merge, the left shoulder record must be adjacent to the
@@ -880,7 +894,8 @@  xfs_refc_want_merge_left(
 		return false;
 
 	/* Left shoulder record refcount must match the new refcount. */
-	if (left->rc_refcount != cleft->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cleft, adjust);
+	if (left->rc_refcount != new_refcount)
 		return false;
 
 	/*
@@ -901,6 +916,7 @@  xfs_refc_want_merge_right(
 	enum xfs_refc_adjust_op		adjust)
 {
 	unsigned long long		ulen = right->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * For a right merge, the right shoulder record must be adjacent to the
@@ -911,7 +927,8 @@  xfs_refc_want_merge_right(
 		return false;
 
 	/* Right shoulder record refcount must match the new refcount. */
-	if (right->rc_refcount != cright->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cright, adjust);
+	if (right->rc_refcount != new_refcount)
 		return false;
 
 	/*