[RFC] xfs/179: modify test to trigger refcount update bugs

Message ID: Y4aCb+y2ej1TBE/R@magnolia
State: Superseded

Commit Message

Darrick J. Wong Nov. 29, 2022, 10:06 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Upon enabling fsdax + reflink for XFS, this test began to report
refcount metadata corruptions after being run.  Specifically, xfs_repair
noticed single-block refcount records that could be combined but had not
been.

The root cause of this is improper MAXREFCOUNT edge case handling in
xfs_refcount_merge_extents.  When we're trying to find candidates for a
record merge, we compute the refcount of the merged record, but without
accounting for the fact that once a record hits rc_refcount ==
MAXREFCOUNT, it is pinned that way forever.

Adjust this test to use a sub-filesize write for one of the COW writes,
because this is how we force the extent merge code to run.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/179 |   28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)
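
The edge case described in the commit message can be sketched in shell
arithmetic.  This is only an illustration, not the kernel code: the function
names naive_merge_refcount and pinned_merge_refcount are hypothetical, and
MAXREFCOUNT is assumed here to be the largest unsigned 32-bit value.  The
point is that an estimate which ignores pinning predicts a post-update
refcount that differs from a pinned neighbour, so the records would not look
like merge candidates even though they end up with the same refcount.

```shell
#!/bin/bash
# Illustration only; these are hypothetical helpers, not kernel symbols.
# Assumption: MAXREFCOUNT is the largest unsigned 32-bit value.
MAXREFCOUNT=4294967295

# Naive estimate: always apply the adjustment (+1 or -1).
naive_merge_refcount() {
	local refcount=$1 adjust=$2
	echo $((refcount + adjust))
}

# Pinned estimate: a record already at MAXREFCOUNT stays there forever.
pinned_merge_refcount() {
	local refcount=$1 adjust=$2
	if [ "$refcount" -eq "$MAXREFCOUNT" ]; then
		echo "$MAXREFCOUNT"
	else
		echo $((refcount + adjust))
	fi
}

# Estimate the refcount of a record at MAXREFCOUNT after a decrement.
echo "naive:  $(naive_merge_refcount "$MAXREFCOUNT" -1)"
echo "pinned: $(pinned_merge_refcount "$MAXREFCOUNT" -1)"
```

With a neighbouring record also at MAXREFCOUNT, only the pinned estimate
compares equal to the neighbour, which is the condition a merge check would
need to see.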

Comments

Dave Chinner Nov. 29, 2022, 10:42 p.m. UTC | #1
On Tue, Nov 29, 2022 at 02:06:39PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, this test began to report
> refcount metadata corruptions after being run.  Specifically, xfs_repair
> noticed single-block refcount records that could be combined but had not
> been.
> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> record merge, we compute the refcount of the merged record, but without
> accounting for the fact that once a record hits rc_refcount ==
> MAXREFCOUNT, it is pinned that way forever.
> 
> Adjust this test to use a sub-filesize write for one of the COW writes,
> because this is how we force the extent merge code to run.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Seems like a reasonable modification to the test....

> ---
>  tests/xfs/179 |   28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/xfs/179 b/tests/xfs/179
> index ec0cb7e5b4..214558f694 100755
> --- a/tests/xfs/179
> +++ b/tests/xfs/179
> @@ -21,17 +21,28 @@ _require_scratch_nocheck
>  _require_cp_reflink
>  _require_test_program "punch-alternating"
>  
> +_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"

Though I really don't like these annotations because when the test
fails in future as I'm developing new code it's going to tell me I
need a fix I already have in the kernel. This is just extra noise
that I have to filter out of the results output. IMO a comment for
this information or a line in the commit message is fine - it
just doesn't belong in the test output....

Other than that:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Darrick J. Wong Nov. 30, 2022, 12:19 a.m. UTC | #2
On Wed, Nov 30, 2022 at 09:42:27AM +1100, Dave Chinner wrote:
> On Tue, Nov 29, 2022 at 02:06:39PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Upon enabling fsdax + reflink for XFS, this test began to report
> > refcount metadata corruptions after being run.  Specifically, xfs_repair
> > noticed single-block refcount records that could be combined but had not
> > been.
> > 
> > The root cause of this is improper MAXREFCOUNT edge case handling in
> > xfs_refcount_merge_extents.  When we're trying to find candidates for a
> > record merge, we compute the refcount of the merged record, but without
> > accounting for the fact that once a record hits rc_refcount ==
> > MAXREFCOUNT, it is pinned that way forever.
> > 
> > Adjust this test to use a sub-filesize write for one of the COW writes,
> > because this is how we force the extent merge code to run.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Seems like a reasonable modification to the test....
> 
> > ---
> >  tests/xfs/179 |   28 +++++++++++++++++++++++++---
> >  1 file changed, 25 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tests/xfs/179 b/tests/xfs/179
> > index ec0cb7e5b4..214558f694 100755
> > --- a/tests/xfs/179
> > +++ b/tests/xfs/179
> > @@ -21,17 +21,28 @@ _require_scratch_nocheck
> >  _require_cp_reflink
> >  _require_test_program "punch-alternating"
> >  
> > +_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"
> 
> Though I really don't like these annotations because when the test
> fails in future as I'm developing new code it's going to tell me I
> need a fix I already have in the kernel. This is just extra noise
> that I have to filter out of the results output. IMO a comment for
> this information or a line in the commit message is fine - it
> just doesn't belong in the test output....

I'll turn that into a comment, since this originally was a functional
test, not a regression test.

> Other than that:
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Ok thanks!

--D

> 
> -- 
> Dave Chinner
> david@fromorbit.com

Xiao Yang Nov. 30, 2022, 10:07 a.m. UTC | #3
On 2022/11/30 6:06, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, this test began to report
> refcount metadata corruptions after being run.  Specifically, xfs_repair
> noticed single-block refcount records that could be combined but had not
> been.
> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> record merge, we compute the refcount of the merged record, but without
> accounting for the fact that once a record hits rc_refcount ==
> MAXREFCOUNT, it is pinned that way forever.
> 
> Adjust this test to use a sub-filesize write for one of the COW writes,
> because this is how we force the extent merge code to run.
Hi Darrick,

Cool, this reliably reproduces the same issue in non-DAX mode.
Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>
Tested-by: Xiao Yang <yangx.jy@fujitsu.com>

Best Regards,
Xiao Yang

Patch

diff --git a/tests/xfs/179 b/tests/xfs/179
index ec0cb7e5b4..214558f694 100755
--- a/tests/xfs/179
+++ b/tests/xfs/179
@@ -21,17 +21,28 @@  _require_scratch_nocheck
 _require_cp_reflink
 _require_test_program "punch-alternating"
 
+_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"
+
 echo "Format and mount"
 _scratch_mkfs -d agcount=1 > $seqres.full 2>&1
 _scratch_mount >> $seqres.full 2>&1
 
+# This test modifies the refcount btree on the data device, so we must force
+# rtinherit off so that the test files are created there.
+_xfs_force_bdev data $SCRATCH_MNT
+
 testdir=$SCRATCH_MNT/test-$seq
 mkdir $testdir
 
+# Set the file size to 10x the block size to guarantee that the COW writes will
+# touch multiple blocks and exercise the refcount extent merging code.  This is
+# necessary to catch a bug in the refcount extent merging code that handles
+# MAXREFCOUNT edge cases.
 blksz=65536
+filesz=$((blksz * 10))
 
 echo "Create original files"
-_pwrite_byte 0x61 0 $blksz $testdir/file1 >> $seqres.full
+_pwrite_byte 0x61 0 $filesz $testdir/file1 >> $seqres.full
 _cp_reflink $testdir/file1 $testdir/file2 >> $seqres.full
 
 echo "Change reference count"
@@ -56,9 +67,20 @@  _scratch_xfs_db -c 'agf 0' -c 'addr refcntroot' -c 'p recs[1]' >> $seqres.full
 _scratch_mount >> $seqres.full
 
 echo "CoW a couple files"
-_pwrite_byte 0x62 0 $blksz $testdir/file3 >> $seqres.full
-_pwrite_byte 0x62 0 $blksz $testdir/file5 >> $seqres.full
+_pwrite_byte 0x62 0 $filesz $testdir/file3 >> $seqres.full
+_pwrite_byte 0x62 0 $filesz $testdir/file5 >> $seqres.full
+
+# For the last COW test, write single blocks at the start, middle, and end of
+# the shared file to exercise a refcount btree update that targets a single
+# block of the multiblock refcount record that we just modified.
+#
+# This trips a bug where XFS didn't correctly identify refcount record merge
+# candidates when any of the records are pinned at MAXREFCOUNT.  The bug was
+# originally discovered by enabling fsdax + reflink, but the bug can be
+# triggered by any COW that doesn't target the entire extent.
 _pwrite_byte 0x62 0 $blksz $testdir/file7 >> $seqres.full
+_pwrite_byte 0x62 $((blksz * 4)) $blksz $testdir/file7 >> $seqres.full
+_pwrite_byte 0x62 $((filesz - blksz)) $blksz $testdir/file7 >> $seqres.full
 
 echo "Check scratch fs"
 _scratch_unmount
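
For reference, the three single-block writes to file7 in the patch land on
the first, fifth, and last blocks of the ten-block file.  A minimal
standalone sketch of that offset arithmetic follows; it reuses the blksz and
filesz values from the patch, while the variable names start_off, middle_off,
and end_off are introduced here only for illustration.

```shell
#!/bin/bash
# Same sizes as in the patch above.
blksz=65536
filesz=$((blksz * 10))

# Offsets of the three single-block COW writes (illustrative names).
start_off=0
middle_off=$((blksz * 4))
end_off=$((filesz - blksz))

# Report which block of the file each write touches.
for off in $start_off $middle_off $end_off; do
	echo "offset $off -> block $((off / blksz)) of $((filesz / blksz))"
done
```

Each write covers one block inside the larger shared extent, so each
refcount update targets only part of the multiblock record.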