generic: add test for missing btrfs csums in log when doing async on subpage vol

Message ID 20241015153957.2099812-1-maharmstone@fb.com (mailing list archive)
State New
Series generic: add test for missing btrfs csums in log when doing async on subpage vol

Commit Message

Mark Harmstone Oct. 15, 2024, 3:39 p.m. UTC
Adds a test for a bug we encountered on Linux 6.4 on aarch64, where a
race could mean that csums weren't getting written to the log tree,
leading to corruption when it was replayed.

The patches to detect this log tree corruption are in btrfs-progs 6.11.

Signed-off-by: Mark Harmstone <maharmstone@fb.com>
---
This is a genericized version of the test I originally proposed as
btrfs/333.

 tests/generic/757     | 71 +++++++++++++++++++++++++++++++++++++++++++
 tests/generic/757.out |  2 ++
 2 files changed, 73 insertions(+)
 create mode 100755 tests/generic/757
 create mode 100644 tests/generic/757.out

Comments

Filipe Manana Oct. 16, 2024, 11:09 a.m. UTC | #1
On Tue, Oct 15, 2024 at 4:42 PM Mark Harmstone <maharmstone@fb.com> wrote:
>
> Adds a test for a bug we encountered on Linux 6.4 on aarch64, where a
> race could mean that csums weren't getting written to the log tree,
> leading to corruption when it was replayed.
>
> The patches to detect this log tree corruption are in btrfs-progs 6.11.

This shouldn't be needed, right?
Because after log replay the csums are missing, and 'btrfs check'
detects (IIRC) missing csums for extents referred to by file extent items
in a subvolume tree. If it doesn't, then it should be improved.

>
> Signed-off-by: Mark Harmstone <maharmstone@fb.com>
> ---
> This is a genericized version of the test I originally proposed as
> btrfs/333.
>
>  tests/generic/757     | 71 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/757.out |  2 ++
>  2 files changed, 73 insertions(+)
>  create mode 100755 tests/generic/757
>  create mode 100644 tests/generic/757.out
>
> diff --git a/tests/generic/757 b/tests/generic/757
> new file mode 100755
> index 00000000..6ad3d01e
> --- /dev/null
> +++ b/tests/generic/757
> @@ -0,0 +1,71 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# FS QA Test 757
> +#
> +# Test async dio with fsync to test a btrfs bug where a race meant that csums
> +# weren't getting written to the log tree, causing corruptions on remount.
> +# This can be seen on subpage FSes on Linux 6.4.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick metadata log recoveryloop
> +
> +_fixed_by_kernel_commit e917ff56c8e7 \
> +       "btrfs: determine synchronous writers from bio or writeback control"

For generic tests what we do is:

[ $FSTYP == "btrfs" ] && _fixed_by_kernel_commit .....

As long as the failure has not been observed and fixed on other filesystems.
In case one day a regression happens in another fs, we just add a
corresponding line using the same logic.

Otherwise, if the test one day fails on another fs and fstests
suggests that that commit is missing, it would be odd.
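
For illustration, the guarded form described above might look like the
following for this patch. This is a minimal sketch, not the final committed
test: the stub _fixed_by_kernel_commit and the wrapper function
declare_known_fixes are hypothetical stand-ins so the snippet runs outside
fstests, and the FSTYP default is an assumption. In the real test the helper
comes from the fstests common code and the guarded line sits directly in the
test body.

```shell
# Hypothetical stub so this sketch runs on its own; in fstests the real
# helper is provided by the common code sourced via ./common/preamble.
_fixed_by_kernel_commit() {
	echo "known fix: $1 ($2)"
}

# Assumption: default to btrfs when the harness has not set FSTYP.
FSTYP=${FSTYP:-btrfs}

# Hypothetical wrapper, used here only to make the guard easy to exercise.
declare_known_fixes() {
	# Only point at the btrfs commit when the test runs on btrfs.
	# ("=" is the POSIX spelling of the "==" comparison shown above.)
	[ "$FSTYP" = "btrfs" ] && _fixed_by_kernel_commit e917ff56c8e7 \
		"btrfs: determine synchronous writers from bio or writeback control"
	# Not matching the filesystem is not an error.
	return 0
}

declare_known_fixes
```

If another filesystem later gains a fix for the same failure, a second
guarded line with that filesystem's name and commit would be added next to
the first one.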

Everything else looks good, so with that fixed (maybe Zorro can change
that when picking the patch):

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.


> +
> +fio_config=$tmp.fio
> +
> +. ./common/dmlogwrites
> +
> +_require_scratch
> +_require_log_writes
> +
> +cat >$fio_config <<EOF
> +[global]
> +iodepth=128
> +direct=1
> +ioengine=libaio
> +rw=randwrite
> +runtime=1s
> +[job0]
> +rw=randwrite
> +filename=$SCRATCH_MNT/file
> +size=1g
> +fdatasync=1
> +EOF
> +
> +_require_fio $fio_config
> +
> +cat $fio_config >> $seqres.full
> +
> +_log_writes_init $SCRATCH_DEV
> +_log_writes_mkfs >> $seqres.full 2>&1
> +_log_writes_mark mkfs
> +
> +_log_writes_mount
> +
> +$FIO_PROG $fio_config > /dev/null 2>&1
> +_log_writes_unmount
> +
> +_log_writes_remove
> +
> +prev=$(_log_writes_mark_to_entry_number mkfs)
> +[ -z "$prev" ] && _fail "failed to locate entry mark 'mkfs'"
> +cur=$(_log_writes_find_next_fua $prev)
> +[ -z "$cur" ] && _fail "failed to locate next FUA write"
> +
> +while [ ! -z "$cur" ]; do
> +       _log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full
> +
> +       _check_scratch_fs
> +
> +       prev=$cur
> +       cur=$(_log_writes_find_next_fua $(($cur + 1)))
> +       [ -z "$cur" ] && break
> +done
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/757.out b/tests/generic/757.out
> new file mode 100644
> index 00000000..dfbc8094
> --- /dev/null
> +++ b/tests/generic/757.out
> @@ -0,0 +1,2 @@
> +QA output created by 757
> +Silence is golden
> --
> 2.44.2
>
>
Patch

diff --git a/tests/generic/757 b/tests/generic/757
new file mode 100755
index 00000000..6ad3d01e
--- /dev/null
+++ b/tests/generic/757
@@ -0,0 +1,71 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# FS QA Test 757
+#
+# Test async dio with fsync to test a btrfs bug where a race meant that csums
+# weren't getting written to the log tree, causing corruptions on remount.
+# This can be seen on subpage FSes on Linux 6.4.
+#
+. ./common/preamble
+_begin_fstest auto quick metadata log recoveryloop
+
+_fixed_by_kernel_commit e917ff56c8e7 \
+	"btrfs: determine synchronous writers from bio or writeback control"
+
+fio_config=$tmp.fio
+
+. ./common/dmlogwrites
+
+_require_scratch
+_require_log_writes
+
+cat >$fio_config <<EOF
+[global]
+iodepth=128
+direct=1
+ioengine=libaio
+rw=randwrite
+runtime=1s
+[job0]
+rw=randwrite
+filename=$SCRATCH_MNT/file
+size=1g
+fdatasync=1
+EOF
+
+_require_fio $fio_config
+
+cat $fio_config >> $seqres.full
+
+_log_writes_init $SCRATCH_DEV
+_log_writes_mkfs >> $seqres.full 2>&1
+_log_writes_mark mkfs
+
+_log_writes_mount
+
+$FIO_PROG $fio_config > /dev/null 2>&1
+_log_writes_unmount
+
+_log_writes_remove
+
+prev=$(_log_writes_mark_to_entry_number mkfs)
+[ -z "$prev" ] && _fail "failed to locate entry mark 'mkfs'"
+cur=$(_log_writes_find_next_fua $prev)
+[ -z "$cur" ] && _fail "failed to locate next FUA write"
+
+while [ ! -z "$cur" ]; do
+	_log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full
+
+	_check_scratch_fs
+
+	prev=$cur
+	cur=$(_log_writes_find_next_fua $(($cur + 1)))
+	[ -z "$cur" ] && break
+done
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/757.out b/tests/generic/757.out
new file mode 100644
index 00000000..dfbc8094
--- /dev/null
+++ b/tests/generic/757.out
@@ -0,0 +1,2 @@ 
+QA output created by 757
+Silence is golden
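
The replay loop in the patch above walks the write log from one FUA write to
the next and checks the filesystem at each point. The control flow can be
sketched in isolation as below. Everything here is a stand-in for
illustration: FUA_ENTRIES and _find_next_fua are hypothetical and only mimic
the assumed behaviour of _log_writes_find_next_fua (return the first FUA
entry at or after the given entry number, or nothing when none is left). No
device is replayed and no filesystem is checked.

```shell
# Hypothetical list of log entry numbers that carry a FUA write.
FUA_ENTRIES="3 7 12"

# Hypothetical stand-in: print the first FUA entry >= $1, or nothing.
_find_next_fua() {
	for e in $FUA_ENTRIES; do
		if [ "$e" -ge "$1" ]; then
			echo "$e"
			return 0
		fi
	done
}

replayed=""
cur=$(_find_next_fua 0)
while [ -n "$cur" ]; do
	# The real test replays the log up to $cur here and then runs
	# _check_scratch_fs; this sketch only records the entry number.
	replayed="${replayed:+$replayed }$cur"
	# Search from the entry after the current one so the loop advances.
	cur=$(_find_next_fua $((cur + 1)))
done

echo "checked after entries: $replayed"
```

Because the loop condition already tests for an empty result, this form needs
no separate break at the end of the body.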