xfs: test log recovery for extent frees right after growfs

Message ID 20240910043127.3480554-1-hch@lst.de (mailing list archive)
State New

Commit Message

Christoph Hellwig Sept. 10, 2024, 4:31 a.m. UTC
Reproduce a bug where log recovery fails when an unfinished extent free
intent is in the same log as the growfs transaction that added the AG.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 tests/xfs/1323     | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1323.out | 14 +++++++++++
 2 files changed, 75 insertions(+)
 create mode 100755 tests/xfs/1323
 create mode 100644 tests/xfs/1323.out

Comments

Zorro Lang Sept. 10, 2024, 8:57 a.m. UTC | #1
On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> Reproduce a bug where log recovery fails when an unfinished extent free
> intent is in the same log as the growfs transaction that added the AG.

Which bug? If it's a regression test, can we have a _fixed_by_kernel_commit
to mark the known issue?

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  tests/xfs/1323     | 61 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1323.out | 14 +++++++++++
>  2 files changed, 75 insertions(+)
>  create mode 100755 tests/xfs/1323
>  create mode 100644 tests/xfs/1323.out
> 
> diff --git a/tests/xfs/1323 b/tests/xfs/1323
> new file mode 100755
> index 000000000..a436510b0
> --- /dev/null
> +++ b/tests/xfs/1323
> @@ -0,0 +1,61 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024, Christoph Hellwig
> +#
> +# FS QA Test No. 1323
> +#
> +# Test that recovering an extfree item residing on a freshly grown AG works.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick growfs
> +
> +. ./common/filter
> +. ./common/inject
> +

_require_scratch

> +_require_xfs_io_error_injection "free_extent"
> +
> +_xfs_force_bdev data $SCRATCH_MNT

Doesn't this need to happen after the _scratch_mount below?

> +
> +_cleanup()
> +{
> +	cd /
> +	_scratch_unmount > /dev/null 2>&1

SCRATCH_DEV will be unmounted at the end of each test, so this might not be needed.
If so, this whole _cleanup is not necessary.

> +	rm -rf $tmp.*
> +}
> +
> +echo "Format filesystem"
> +_scratch_mkfs_sized $((128 * 1024 * 1024)) >> $seqres.full
> +_scratch_mount >> $seqres.full
> +
> +echo "Fill file system"
> +dd if=/dev/zero of=$SCRATCH_MNT/filler1 bs=64k oflag=direct &>/dev/null
> +sync
> +dd if=/dev/zero of=$SCRATCH_MNT/filler2 bs=64k oflag=direct &>/dev/null
> +sync

There's a helper named _fill_fs() in common/populate. I'm not sure
whether your steps above are necessary or could be replaced by it; just
checking with you.

> +
> +echo "Grow file system"
> +$XFS_GROWFS_PROG $SCRATCH_MNT >>$seqres.full

_require_command "$XFS_GROWFS_PROG" xfs_growfs

> +
> +echo "Create test files"
> +dd if=/dev/zero of=$SCRATCH_MNT/test1 bs=8M count=4 oflag=direct | \
> +	 _filter_dd
> +dd if=/dev/zero of=$SCRATCH_MNT/test2 bs=8M count=4 oflag=direct | \
> +	 _filter_dd
> +
> +echo "Inject error"
> +_scratch_inject_error "free_extent"
> +
> +echo "Remove test file"
> +rm $SCRATCH_MNT/test2

Is -f needed?

Thanks,
Zorro

> +
> +echo "FS should be shut down, touch will fail"
> +touch $SCRATCH_MNT/test1 2>&1 | _filter_scratch
> +
> +echo "Remount to replay log"
> +_scratch_remount_dump_log >> $seqres.full
> +
> +echo "Done"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/1323.out b/tests/xfs/1323.out
> new file mode 100644
> index 000000000..1740f9a1f
> --- /dev/null
> +++ b/tests/xfs/1323.out
> @@ -0,0 +1,14 @@
> +QA output created by 1323
> +Format filesystem
> +Fill file system
> +Grow file system
> +Create test files
> +4+0 records in
> +4+0 records out
> +4+0 records in
> +4+0 records out
> +Inject error
> +Remove test file
> +FS should be shut down, touch will fail
> +Remount to replay log
> +Done
> -- 
> 2.45.2
> 
>
Christoph Hellwig Sept. 10, 2024, 11:34 a.m. UTC | #2
On Tue, Sep 10, 2024 at 04:57:48PM +0800, Zorro Lang wrote:
> On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> > Reproduce a bug where log recovery fails when an unfinished extent free
> > intent is in the same log as the growfs transaction that added the AG.
> 
> Which bug? If it's a regression test, can we have a _fixed_by_kernel_commit
> to mark the known issue?

I just sent the kernel patches for it.  It's been there basically
forever as far as I can tell.
Brian Foster Sept. 10, 2024, 2:19 p.m. UTC | #3
On Tue, Sep 10, 2024 at 07:31:17AM +0300, Christoph Hellwig wrote:
> Reproduce a bug where log recovery fails when an unfinished extent free
> intent is in the same log as the growfs transaction that added the AG.
> 

No real issue with the test, but I wonder if we could do something more
generic. Various XFS shutdown and log recovery issues went undetected
for a while until we started adding more of the generic stress tests
currently categorized in the recoveryloop group.

So for example, I'm wondering if you took something like generic/388 or
475 and modified it to start with a smallish fs, grew it in 1GB or
whatever increments on each loop iteration, and then ran the same
generic stress/timeout/shutdown/recovery sequence, would that eventually
reproduce the issue you've fixed? I don't think reproducibility would
need to be 100% for the test to be useful, fwiw.

Note that I'm assuming we don't have something like that already. I see
growfs and shutdown tests in tests/xfs/group.list, but nothing in both
groups and I haven't looked through the individual tests. Just a
thought.

Brian
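
One way to picture the loop suggested above is to write out the command
sequence it would run. The following is a dry-run sketch only: it prints the
steps instead of executing them, so it needs no scratch device. The helper
names (_scratch_mkfs_sized, _scratch_mount, _scratch_shutdown,
_scratch_cycle_mount) are fstests helpers assumed to be available in a real
test. The function name plan_grow_recovery_loop, the 256 MiB starting size,
the 4096-byte block size and the iteration count are illustrative assumptions
and do not come from the posted patch.

```shell
#!/bin/bash
# Dry-run sketch: print the steps of a grow/stress/shutdown/recover loop.
# Nothing here touches a device; all sizes are illustrative assumptions.

plan_grow_recovery_loop() {
	local start_mb=$1 incr_mb=$2 iterations=$3 blksz=${4:-4096}
	local size_mb=$start_mb i blocks

	# Start from a smallish filesystem.
	echo "_scratch_mkfs_sized $((start_mb * 1024 * 1024))"
	echo "_scratch_mount"

	for ((i = 1; i <= iterations; i++)); do
		size_mb=$((size_mb + incr_mb))
		# xfs_growfs -D takes the new data section size in fs blocks.
		blocks=$((size_mb * 1024 * 1024 / blksz))
		echo "\$XFS_GROWFS_PROG -D $blocks \$SCRATCH_MNT"
		echo "<run a stress workload in the background for a while>"
		echo "_scratch_shutdown"
		echo "<stop the stress workload>"
		# Cycling the mount forces log recovery of whatever was in flight.
		echo "_scratch_cycle_mount"
	done
}

# Example: start at 256 MiB and grow by 1 GiB on each of two iterations.
plan_grow_recovery_loop 256 1024 2
```

Whether such a loop would ever place an unfinished extent free in a freshly
added AG depends on where the allocator puts new data, so this is a starting
point for experimentation rather than a known reproducer.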

Christoph Hellwig Sept. 10, 2024, 3:10 p.m. UTC | #4
On Tue, Sep 10, 2024 at 10:19:50AM -0400, Brian Foster wrote:
> No real issue with the test, but I wonder if we could do something more
> generic. Various XFS shutdown and log recovery issues went undetected
> for a while until we started adding more of the generic stress tests
> currently categorized in the recoveryloop group.
> 
> So for example, I'm wondering if you took something like generic/388 or
> 475 and modified it to start with a smallish fs, grew it in 1GB or
> whatever increments on each loop iteration, and then ran the same
> generic stress/timeout/shutdown/recovery sequence, would that eventually
> reproduce the issue you've fixed? I don't think reproducibility would
> need to be 100% for the test to be useful, fwiw.
> 
> Note that I'm assuming we don't have something like that already. I see
> growfs and shutdown tests in tests/xfs/group.list, but nothing in both
> groups and I haven't looked through the individual tests. Just a
> thought.

It turns out reproducing this bug was surprisingly complicated.
After a growfs we can now dip into reserves that made the test1
file start filling up the existing AGs first for a while, and thus
the error injection would hit on that and never even reach a new
AG.

So while I agree with your sentiment and like the high-level idea, I
suspect it will need a fair amount of work to actually be useful.
Right now I'm too busy with various projects to look into it
unfortunately.
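
Since the difficulty described above is whether the test file actually lands
in a newly added AG, it can help to check that directly while developing such
a test. The sketch below is one possible way to do so, assuming the usual
column layout of "xfs_bmap -v" output (extent number, file offset range,
block range, AG number, ...). The function name extent_ags is hypothetical
and is not an fstests helper.

```shell
#!/bin/bash
# Sketch: list the distinct AG numbers that a file's extents live in.
# Reads "xfs_bmap -v" output on stdin. Lines whose third field is not a
# block range (the file name, the column header, holes) are skipped.

extent_ags() {
	awk '$3 ~ /^[0-9]+\.\.[0-9]+$/ { print $4 }' | sort -un
}

# Possible use inside a test, after creating the file:
#   xfs_bmap -v "$SCRATCH_MNT/test2" | extent_ags
```

If the printed AG numbers are all below the AG count reported before the
grow, the file never reached a new AG and the error injection would not
exercise the recovery path in question.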
Brian Foster Sept. 10, 2024, 4:13 p.m. UTC | #5
On Tue, Sep 10, 2024 at 05:10:53PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 10, 2024 at 10:19:50AM -0400, Brian Foster wrote:
> > No real issue with the test, but I wonder if we could do something more
> > generic. Various XFS shutdown and log recovery issues went undetected
> > for a while until we started adding more of the generic stress tests
> > currently categorized in the recoveryloop group.
> > 
> > So for example, I'm wondering if you took something like generic/388 or
> > 475 and modified it to start with a smallish fs, grew it in 1GB or
> > whatever increments on each loop iteration, and then ran the same
> > generic stress/timeout/shutdown/recovery sequence, would that eventually
> > reproduce the issue you've fixed? I don't think reproducibility would
> > need to be 100% for the test to be useful, fwiw.
> > 
> > Note that I'm assuming we don't have something like that already. I see
> > growfs and shutdown tests in tests/xfs/group.list, but nothing in both
> > groups and I haven't looked through the individual tests. Just a
> > thought.
> 
> It turns out reproducing this bug was surprisingly complicated.
> After a growfs we can now dip into reserves that made the test1
> file start filling up the existing AGs first for a while, and thus
> the error injection would hit on that and never even reach a new
> AG.
> 
> So while I agree with your sentiment and like the high-level idea, I
> suspect it will need a fair amount of work to actually be useful.
> Right now I'm too busy with various projects to look into it
> unfortunately.
> 

Fair enough, maybe I'll play with it a bit when I have some more time.

Brian

Patch

diff --git a/tests/xfs/1323 b/tests/xfs/1323
new file mode 100755
index 000000000..a436510b0
--- /dev/null
+++ b/tests/xfs/1323
@@ -0,0 +1,61 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024, Christoph Hellwig
+#
+# FS QA Test No. 1323
+#
+# Test that recovering an extfree item residing on a freshly grown AG works.
+#
+. ./common/preamble
+_begin_fstest auto quick growfs
+
+. ./common/filter
+. ./common/inject
+
+_require_xfs_io_error_injection "free_extent"
+
+_xfs_force_bdev data $SCRATCH_MNT
+
+_cleanup()
+{
+	cd /
+	_scratch_unmount > /dev/null 2>&1
+	rm -rf $tmp.*
+}
+
+echo "Format filesystem"
+_scratch_mkfs_sized $((128 * 1024 * 1024)) >> $seqres.full
+_scratch_mount >> $seqres.full
+
+echo "Fill file system"
+dd if=/dev/zero of=$SCRATCH_MNT/filler1 bs=64k oflag=direct &>/dev/null
+sync
+dd if=/dev/zero of=$SCRATCH_MNT/filler2 bs=64k oflag=direct &>/dev/null
+sync
+
+echo "Grow file system"
+$XFS_GROWFS_PROG $SCRATCH_MNT >>$seqres.full
+
+echo "Create test files"
+dd if=/dev/zero of=$SCRATCH_MNT/test1 bs=8M count=4 oflag=direct | \
+	 _filter_dd
+dd if=/dev/zero of=$SCRATCH_MNT/test2 bs=8M count=4 oflag=direct | \
+	 _filter_dd
+
+echo "Inject error"
+_scratch_inject_error "free_extent"
+
+echo "Remove test file"
+rm $SCRATCH_MNT/test2
+
+echo "FS should be shut down, touch will fail"
+touch $SCRATCH_MNT/test1 2>&1 | _filter_scratch
+
+echo "Remount to replay log"
+_scratch_remount_dump_log >> $seqres.full
+
+echo "Done"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1323.out b/tests/xfs/1323.out
new file mode 100644
index 000000000..1740f9a1f
--- /dev/null
+++ b/tests/xfs/1323.out
@@ -0,0 +1,14 @@ 
+QA output created by 1323
+Format filesystem
+Fill file system
+Grow file system
+Create test files
+4+0 records in
+4+0 records out
+4+0 records in
+4+0 records out
+Inject error
+Remove test file
+FS should be shut down, touch will fail
+Remount to replay log
+Done
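
For reference, the sizes used by the test above work out as follows. This is
plain shell arithmetic on the constants in the patch (the mkfs size and the
dd bs/count values); the variable names are illustrative only.

```shell
#!/bin/bash
# Sizes taken from the patch: _scratch_mkfs_sized argument and dd bs=8M count=4.
fs_bytes=$((128 * 1024 * 1024))
file_bytes=$((8 * 1024 * 1024 * 4))

echo "initial filesystem: $fs_bytes bytes ($((fs_bytes >> 20)) MiB)"
echo "each test file:     $file_bytes bytes ($((file_bytes >> 20)) MiB)"
echo "both test files:    $(((2 * file_bytes) >> 20)) MiB"
```

The two test files together amount to half the size of the initial
filesystem, which the filler files have already consumed, so most of that
data has to be allocated from space added by the grow.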