Message ID | 20240910043127.3480554-1-hch@lst.de |
---|---|
State | New |
Series | xfs: test log recovery for extent frees right after growfs |
On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> Reproduce a bug where log recovery fails when an unfinished extent free
> intent is in the same log as the growfs transaction that added the AG.

Which bug? If it's a regression test, can we have a _fixed_by_kernel_commit
to mark the known issue?

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  tests/xfs/1323     | 61 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1323.out | 14 +++++++++++
>  2 files changed, 75 insertions(+)
>  create mode 100755 tests/xfs/1323
>  create mode 100644 tests/xfs/1323.out
>
> diff --git a/tests/xfs/1323 b/tests/xfs/1323
> new file mode 100755
> index 000000000..a436510b0
> --- /dev/null
> +++ b/tests/xfs/1323
> @@ -0,0 +1,61 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024, Christoph Hellwig
> +#
> +# FS QA Test No. 1323
> +#
> +# Test that recovering an extfree item residing on a freshly grown AG works.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick growfs
> +
> +. ./common/filter
> +. ./common/inject
> +
_require_scratch
> +_require_xfs_io_error_injection "free_extent"
> +
> +_xfs_force_bdev data $SCRATCH_MNT

Don't you need to do this after the _scratch_mount below?

> +
> +_cleanup()
> +{
> +	cd /
> +	_scratch_unmount > /dev/null 2>&1

SCRATCH_DEV will be unmounted at the end of each test, so this might
not be needed. If so, this whole _cleanup is not necessary.

> +	rm -rf $tmp.*
> +}
> +
> +echo "Format filesystem"
> +_scratch_mkfs_sized $((128 * 1024 * 1024)) >> $seqres.full
> +_scratch_mount >> $seqres.full
> +
> +echo "Fill file system"
> +dd if=/dev/zero of=$SCRATCH_MNT/filler1 bs=64k oflag=direct &>/dev/null
> +sync
> +dd if=/dev/zero of=$SCRATCH_MNT/filler2 bs=64k oflag=direct &>/dev/null
> +sync

There's a helper named _fill_fs() in common/populate. I'm not sure
whether your steps above are necessary or could be replaced with it;
just checking with you.

> +
> +echo "Grow file system"
> +$XFS_GROWFS_PROG $SCRATCH_MNT >>$seqres.full

_require_command "$XFS_GROWFS_PROG" xfs_growfs

> +
> +echo "Create test files"
> +dd if=/dev/zero of=$SCRATCH_MNT/test1 bs=8M count=4 oflag=direct | \
> +	_filter_dd
> +dd if=/dev/zero of=$SCRATCH_MNT/test2 bs=8M count=4 oflag=direct | \
> +	_filter_dd
> +
> +echo "Inject error"
> +_scratch_inject_error "free_extent"
> +
> +echo "Remove test file"
> +rm $SCRATCH_MNT/test2

Is -f needed?

Thanks,
Zorro
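Zorro's remark about _cleanup depends on how fstests runs cleanup. As we understand common/preamble, _begin_fstest installs an exit trap that calls _cleanup and then exits with $status. The default _cleanup only changes to `/` and removes `$tmp.*`, so a custom one is only worth having if it does something extra. The standalone sketch below imitates that trap pattern outside fstests. `demo_test`, `sketch_cleanup` and the temporary-directory layout are hypothetical names, not fstests code.

```shell
#!/bin/bash
# Standalone imitation of the fstests exit-trap cleanup pattern.
# demo_test and sketch_cleanup are hypothetical names, not fstests helpers.

demo_test() (
	status=1                          # assume failure until the end is reached
	workdir=$(mktemp -d)
	tmp=$workdir/seq                  # hypothetical per-test temp-file prefix

	sketch_cleanup() {
		cd /
		rm -rf "$workdir"             # removes $tmp.* together with their directory
	}
	# Runs on every exit path of this subshell, like the trap _begin_fstest sets up.
	trap 'sketch_cleanup; exit $status' EXIT

	touch "$tmp.full" "$tmp.out"
	echo "$workdir"                   # report where the temp files lived

	status=0                          # success, all done
	exit
)

dir=$(demo_test)
rc=$?
echo "rc=$rc, left behind: $(ls -d "$dir" 2>/dev/null || echo nothing)"
```

Because the trap fires however the test exits, per-test temporary files are removed without the test body doing anything. This is why an override that only repeats the default steps adds nothing.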
On Tue, Sep 10, 2024 at 04:57:48PM +0800, Zorro Lang wrote:
> On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> > Reproduce a bug where log recovery fails when an unfinished extent free
> > intent is in the same log as the growfs transaction that added the AG.
>
> Which bug? If it's a regression test, can we have a _fixed_by_kernel_commit
> to mark the known issue?

I just sent the kernel patches for it. It's been there basically forever
as far as I can tell.
On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> Reproduce a bug where log recovery fails when an unfinished extent free
> intent is in the same log as the growfs transaction that added the AG.
>

No real issue with the test, but I wonder if we could do something more
generic. Various XFS shutdown and log recovery issues went undetected
for a while until we started adding more of the generic stress tests
currently categorized in the recoveryloop group.

So for example, I'm wondering if you took something like generic/388 or
475 and modified it to start with a smallish fs, grew it in 1GB or
whatever increments on each loop iteration, and then ran the same
generic stress/timeout/shutdown/recovery sequence, would that eventually
reproduce the issue you've fixed? I don't think reproducibility would
need to be 100% for the test to be useful, fwiw.

Note that I'm assuming we don't have something like that already. I see
growfs and shutdown tests in tests/xfs/group.list, but nothing in both
groups and I haven't looked through the individual tests. Just a
thought.

Brian
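Brian's suggestion amounts to a loop that interleaves a growfs with the existing stress/shutdown/recovery cycle. The hope is that a crash shortly after a grow leaves the growfs transaction and unfinished extent-free intents in the same log. A skeleton of that control flow follows. Every step function is a hypothetical stub that only logs what a real test would do. The 512 MiB starting size and four iterations are arbitrary assumptions; only the 1 GiB step comes from the email.

```shell
#!/bin/bash
# Skeleton of a grow-then-crash recovery loop; all step functions are
# hypothetical stubs that log the action instead of performing it.

start_mb=512      # assumed "smallish" initial filesystem size
step_mb=1024      # grow by 1 GiB per iteration, as suggested
iterations=4      # arbitrary

grow_fs()     { echo "grow to ${1}M"; }   # would grow the fs (e.g. via xfs_growfs)
run_stress()  { echo "stress"; }          # would start a background stress workload
shutdown_fs() { echo "shutdown"; }        # would force a shutdown mid-stress
recover_fs()  { echo "recover"; }         # would remount so log recovery runs

size_mb=$start_mb
for ((i = 1; i <= iterations; i++)); do
	size_mb=$((size_mb + step_mb))
	grow_fs "$size_mb"
	run_stress
	shutdown_fs
	recover_fs
done
echo "final size: ${size_mb}M"
```

Growing before each stress phase means every shutdown happens while a recently added AG is available to the allocator. Christoph's reply explains why that alone may not be enough to exercise the new AG.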
On Tue, Sep 10, 2024 at 10:19:50AM -0400, Brian Foster wrote:
> No real issue with the test, but I wonder if we could do something more
> generic. Various XFS shutdown and log recovery issues went undetected
> for a while until we started adding more of the generic stress tests
> currently categorized in the recoveryloop group.
>
> So for example, I'm wondering if you took something like generic/388 or
> 475 and modified it to start with a smallish fs, grew it in 1GB or
> whatever increments on each loop iteration, and then ran the same
> generic stress/timeout/shutdown/recovery sequence, would that eventually
> reproduce the issue you've fixed? I don't think reproducibility would
> need to be 100% for the test to be useful, fwiw.
>
> Note that I'm assuming we don't have something like that already. I see
> growfs and shutdown tests in tests/xfs/group.list, but nothing in both
> groups and I haven't looked through the individual tests. Just a
> thought.

It turns out reproducing this bug was surprisingly complicated.
After a growfs, we can now dip into reserves that made the test1
file start filling up the existing AGs first for a while, and thus
the error injection would hit on that and never even reach a new
AG.

So while I agree with your sentiment and like the high-level idea, I
suspect it will need a fair amount of work to actually be useful.
Right now I'm too busy with various projects to look into it,
unfortunately.
On Tue, Sep 10, 2024 at 05:10:53PM +0200, Christoph Hellwig wrote:
> It turns out reproducing this bug was surprisingly complicated.
> After a growfs, we can now dip into reserves that made the test1
> file start filling up the existing AGs first for a while, and thus
> the error injection would hit on that and never even reach a new
> AG.
>
> So while I agree with your sentiment and like the high-level idea, I
> suspect it will need a fair amount of work to actually be useful.
> Right now I'm too busy with various projects to look into it,
> unfortunately.

Fair enough, maybe I'll play with it a bit when I have some more time.

Brian
diff --git a/tests/xfs/1323 b/tests/xfs/1323
new file mode 100755
index 000000000..a436510b0
--- /dev/null
+++ b/tests/xfs/1323
@@ -0,0 +1,61 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024, Christoph Hellwig
+#
+# FS QA Test No. 1323
+#
+# Test that recovering an extfree item residing on a freshly grown AG works.
+#
+. ./common/preamble
+_begin_fstest auto quick growfs
+
+. ./common/filter
+. ./common/inject
+
+_require_xfs_io_error_injection "free_extent"
+
+_xfs_force_bdev data $SCRATCH_MNT
+
+_cleanup()
+{
+	cd /
+	_scratch_unmount > /dev/null 2>&1
+	rm -rf $tmp.*
+}
+
+echo "Format filesystem"
+_scratch_mkfs_sized $((128 * 1024 * 1024)) >> $seqres.full
+_scratch_mount >> $seqres.full
+
+echo "Fill file system"
+dd if=/dev/zero of=$SCRATCH_MNT/filler1 bs=64k oflag=direct &>/dev/null
+sync
+dd if=/dev/zero of=$SCRATCH_MNT/filler2 bs=64k oflag=direct &>/dev/null
+sync
+
+echo "Grow file system"
+$XFS_GROWFS_PROG $SCRATCH_MNT >>$seqres.full
+
+echo "Create test files"
+dd if=/dev/zero of=$SCRATCH_MNT/test1 bs=8M count=4 oflag=direct | \
+	_filter_dd
+dd if=/dev/zero of=$SCRATCH_MNT/test2 bs=8M count=4 oflag=direct | \
+	_filter_dd
+
+echo "Inject error"
+_scratch_inject_error "free_extent"
+
+echo "Remove test file"
+rm $SCRATCH_MNT/test2
+
+echo "FS should be shut down, touch will fail"
+touch $SCRATCH_MNT/test1 2>&1 | _filter_scratch
+
+echo "Remount to replay log"
+_scratch_remount_dump_log >> $seqres.full
+
+echo "Done"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1323.out b/tests/xfs/1323.out
new file mode 100644
index 000000000..1740f9a1f
--- /dev/null
+++ b/tests/xfs/1323.out
@@ -0,0 +1,14 @@
+QA output created by 1323
+Format filesystem
+Fill file system
+Grow file system
+Create test files
+4+0 records in
+4+0 records out
+4+0 records in
+4+0 records out
+Inject error
+Remove test file
+FS should be shut down, touch will fail
+Remount to replay log
+Done
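The golden output in 1323.out keeps only dd's record counts. That implies the filter drops the byte-count/throughput line, which varies from run to run. The sketch below shows that effect with a hypothetical `filter_dd_sketch` built on grep; it is not the real `_filter_dd` from common/filter. GNU dd writes these statistics to standard error, so the sketch merges stderr into the pipe with `2>&1`. It also spells out the sizes the test works with.

```shell
#!/bin/bash
# Hypothetical dd filter; NOT the fstests _filter_dd, just a sketch of
# keeping only deterministic lines for a golden-output comparison.
filter_dd_sketch() {
	# "N+M records in/out" is stable; the "bytes ... copied" line is not.
	grep -E '^[0-9]+\+[0-9]+ records (in|out)$'
}

# Sizes used by the test: a 128 MiB filesystem and 4 x 8 MiB test files.
fs_bytes=$((128 * 1024 * 1024))      # 134217728
file_bytes=$((4 * 8 * 1024 * 1024))  # 33554432, a quarter of the initial fs size

# GNU dd prints its statistics on stderr, so merge it into the pipe.
out=$(dd if=/dev/zero of=/dev/null bs=8M count=4 2>&1 | filter_dd_sketch)
printf '%s\n' "$out"
```

Each test file is a sizable fraction of the original filesystem. This is consistent with Christoph's observation in the thread that where test1 lands decides whether the injected free-extent failure hits an old AG or the newly grown one.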
Reproduce a bug where log recovery fails when an unfinished extent free
intent is in the same log as the growfs transaction that added the AG.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 tests/xfs/1323     | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1323.out | 14 +++++++++++
 2 files changed, 75 insertions(+)
 create mode 100755 tests/xfs/1323
 create mode 100644 tests/xfs/1323.out