Message ID | Y4aCb+y2ej1TBE/R@magnolia (mailing list archive) |
---|---|
State | Superseded |
Series | [RFC] xfs/179: modify test to trigger refcount update bugs |
On Tue, Nov 29, 2022 at 02:06:39PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Upon enabling fsdax + reflink for XFS, this test began to report
> refcount metadata corruptions after being run. Specifically, xfs_repair
> noticed single-block refcount records that could be combined but had not
> been.
>
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents. When we're trying to find candidates for a
> record merge, we compute the refcount of the merged record, but without
> accounting for the fact that once a record hits rc_refcount ==
> MAXREFCOUNT, it is pinned that way forever.
>
> Adjust this test to use a sub-filesize write for one of the COW writes,
> because this is how we force the extent merge code to run.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Seems like a reasonable modification to the test....

> ---
>  tests/xfs/179 | 28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/tests/xfs/179 b/tests/xfs/179
> index ec0cb7e5b4..214558f694 100755
> --- a/tests/xfs/179
> +++ b/tests/xfs/179
> @@ -21,17 +21,28 @@ _require_scratch_nocheck
>  _require_cp_reflink
>  _require_test_program "punch-alternating"
>
> +_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"

Though I really don't like these annotations, because when the test
fails in future as I'm developing new code it's going to tell me I
need a fix I already have in the kernel. This is just extra noise
that I have to filter out of the results output. IMO a comment for
this information or a line in the commit message is fine - it
just doesn't belong in the test output....

Other than that:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

On Wed, Nov 30, 2022 at 09:42:27AM +1100, Dave Chinner wrote:
> On Tue, Nov 29, 2022 at 02:06:39PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Upon enabling fsdax + reflink for XFS, this test began to report
> > refcount metadata corruptions after being run. Specifically, xfs_repair
> > noticed single-block refcount records that could be combined but had not
> > been.
> >
> > The root cause of this is improper MAXREFCOUNT edge case handling in
> > xfs_refcount_merge_extents. When we're trying to find candidates for a
> > record merge, we compute the refcount of the merged record, but without
> > accounting for the fact that once a record hits rc_refcount ==
> > MAXREFCOUNT, it is pinned that way forever.
> >
> > Adjust this test to use a sub-filesize write for one of the COW writes,
> > because this is how we force the extent merge code to run.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>
> Seems like a reasonable modification to the test....
>
> > ---
> >  tests/xfs/179 | 28 +++++++++++++++++++++++++---
> >  1 file changed, 25 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/xfs/179 b/tests/xfs/179
> > index ec0cb7e5b4..214558f694 100755
> > --- a/tests/xfs/179
> > +++ b/tests/xfs/179
> > @@ -21,17 +21,28 @@ _require_scratch_nocheck
> >  _require_cp_reflink
> >  _require_test_program "punch-alternating"
> >
> > +_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"
>
> Though I really don't like these annotations, because when the test
> fails in future as I'm developing new code it's going to tell me I
> need a fix I already have in the kernel. This is just extra noise
> that I have to filter out of the results output. IMO a comment for
> this information or a line in the commit message is fine - it
> just doesn't belong in the test output....

I'll turn that into a comment, since this originally was a functional
test, not a regression test.

> Other than that:
>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Ok thanks!

--D

>
> --
> Dave Chinner
> david@fromorbit.com

On 2022/11/30 6:06, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Upon enabling fsdax + reflink for XFS, this test began to report
> refcount metadata corruptions after being run. Specifically, xfs_repair
> noticed single-block refcount records that could be combined but had not
> been.
>
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents. When we're trying to find candidates for a
> record merge, we compute the refcount of the merged record, but without
> accounting for the fact that once a record hits rc_refcount ==
> MAXREFCOUNT, it is pinned that way forever.
>
> Adjust this test to use a sub-filesize write for one of the COW writes,
> because this is how we force the extent merge code to run.

Hi Darrick,

Cool, this reliably reproduces the same issue in non-DAX mode.

Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>
Tested-by: Xiao Yang <yangx.jy@fujitsu.com>

Best Regards,
Xiao Yang

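The pinning rule described in the commit message can be illustrated with a small shell sketch. This is not the kernel code and not part of the patch: the function names naive_refcount and pinned_refcount are hypothetical, and the MAXREFCOUNT value used here (2^32 - 1) is an assumption made for the illustration. It only shows why an estimate that ignores pinning might fail to match a neighbouring pinned record, which is one plausible reading of how mergeable records end up unmerged.

```shell
# Assumed value, for illustration only; check the kernel headers for the
# real definition of MAXREFCOUNT.
MAXREFCOUNT=$(( (1 << 32) - 1 ))

# Hypothetical naive estimate: apply the adjustment unconditionally.
naive_refcount() {
    echo $(( $1 + $2 ))
}

# Hypothetical pinned-aware estimate: a record that has reached MAXREFCOUNT
# stays there regardless of the adjustment.
pinned_refcount() {
    if [ "$1" -eq "$MAXREFCOUNT" ]; then
        echo "$MAXREFCOUNT"
    else
        echo $(( $1 + $2 ))
    fi
}

# A pinned record is decremented (adjust = -1) next to a pinned neighbour.
# Only the pinned-aware estimate equals the neighbour's refcount, so only it
# would flag the two records as merge candidates.
neighbour=$MAXREFCOUNT
echo "naive:  $(naive_refcount "$MAXREFCOUNT" -1) (neighbour $neighbour)"
echo "pinned: $(pinned_refcount "$MAXREFCOUNT" -1) (neighbour $neighbour)"
```

Under these assumptions the naive estimate is one less than the neighbour's value and the comparison fails, while the pinned-aware estimate matches it.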
diff --git a/tests/xfs/179 b/tests/xfs/179
index ec0cb7e5b4..214558f694 100755
--- a/tests/xfs/179
+++ b/tests/xfs/179
@@ -21,17 +21,28 @@ _require_scratch_nocheck
 _require_cp_reflink
 _require_test_program "punch-alternating"
 
+_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"
+
 echo "Format and mount"
 _scratch_mkfs -d agcount=1 > $seqres.full 2>&1
 _scratch_mount >> $seqres.full 2>&1
 
+# This test modifies the refcount btree on the data device, so we must force
+# rtinherit off so that the test files are created there.
+_xfs_force_bdev data $SCRATCH_MNT
+
 testdir=$SCRATCH_MNT/test-$seq
 mkdir $testdir
 
+# Set the file size to 10x the block size to guarantee that the COW writes will
+# touch multiple blocks and exercise the refcount extent merging code.  This is
+# necessary to catch a bug in the refcount extent merging code that handles
+# MAXREFCOUNT edge cases.
 blksz=65536
+filesz=$((blksz * 10))
 
 echo "Create original files"
-_pwrite_byte 0x61 0 $blksz $testdir/file1 >> $seqres.full
+_pwrite_byte 0x61 0 $filesz $testdir/file1 >> $seqres.full
 _cp_reflink $testdir/file1 $testdir/file2 >> $seqres.full
 
 echo "Change reference count"
@@ -56,9 +67,20 @@ _scratch_xfs_db -c 'agf 0' -c 'addr refcntroot' -c 'p recs[1]' >> $seqres.full
 _scratch_mount >> $seqres.full
 
 echo "CoW a couple files"
-_pwrite_byte 0x62 0 $blksz $testdir/file3 >> $seqres.full
-_pwrite_byte 0x62 0 $blksz $testdir/file5 >> $seqres.full
+_pwrite_byte 0x62 0 $filesz $testdir/file3 >> $seqres.full
+_pwrite_byte 0x62 0 $filesz $testdir/file5 >> $seqres.full
+
+# For the last COW test, write single blocks at the start, middle, and end of
+# the shared file to exercise a refcount btree update that targets a single
+# block of the multiblock refcount record that we just modified.
+#
+# This trips a bug where XFS didn't correctly identify refcount record merge
+# candidates when any of the records are pinned at MAXREFCOUNT.  The bug was
+# originally discovered by enabling fsdax + reflink, but the bug can be
+# triggered by any COW that doesn't target the entire extent.
 _pwrite_byte 0x62 0 $blksz $testdir/file7 >> $seqres.full
+_pwrite_byte 0x62 $((blksz * 4)) $blksz $testdir/file7 >> $seqres.full
+_pwrite_byte 0x62 $((filesz - blksz)) $blksz $testdir/file7 >> $seqres.full
 
 echo "Check scratch fs"
 _scratch_unmount
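
To make the arithmetic in the last hunk concrete, the following standalone snippet prints where the three single-block COW writes land. It reuses the blksz and filesz values from the patch but calls none of the fstests helpers; the loop and the output format are illustrative assumptions, not part of the test.

```shell
blksz=65536
filesz=$((blksz * 10))

# Offsets of the three single-block writes issued against file7:
# the start, the middle, and the end of the 10-block file.
for off in 0 $((blksz * 4)) $((filesz - blksz)); do
    echo "offset=$off length=$blksz block=$((off / blksz))"
done
# offset=0 length=65536 block=0
# offset=262144 length=65536 block=4
# offset=589824 length=65536 block=9
```

Each write covers exactly one of the ten blocks, so every refcount update targets a single block inside the larger shared extent rather than the whole record.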