
[v2] fstests: btrfs: zoned: verify RAID conversion with write pointer mismatch

Message ID d5ae8704427e156eb6dca0b720847e48665a6340.1742302069.git.jth@kernel.org
State New
Series [v2] fstests: btrfs: zoned: verify RAID conversion with write pointer mismatch

Commit Message

Johannes Thumshirn March 18, 2025, 12:49 p.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Recently we had a bug report about a kernel crash that happened when the
user was converting a filesystem to use RAID1 for metadata, but for some
reason the devices' write pointers had gotten out of sync.

Test this scenario by manually injecting de-synchronized write pointer
positions and then converting the metadata to RAID1.

The test case also repairs the broken filesystem and afterwards checks that
both the system and metadata block groups are back to the default 'DUP'
profile.

Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@mail.gmail.com/
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

---
Changes to v1:
- Add test description
- Don't redirect stderr to $seqres.full
- Use xfs_io instead of dd
- Use $SCRATCH_MNT instead of hardcoded mount path
- Check that the first balance command actually fails, as it is supposed to
---
 tests/btrfs/329     | 61 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/329.out |  7 ++++++
 2 files changed, 68 insertions(+)
 create mode 100755 tests/btrfs/329
 create mode 100644 tests/btrfs/329.out
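In this version the last changelog item (making sure the first balance really fails) is enforced only through the golden output, which expects the `ERROR:` line. A minimal sketch of an additional explicit exit-status check follows. `expect_failure` is a hypothetical helper, not part of fstests' `common/` library, and it assumes, as in the test, that the command's stdout should be appended to a log file such as `$seqres.full` while stderr is left alone.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): run a command that is expected
# to fail, appending its stdout to the log file given as the first argument
# (like ">> $seqres.full"). stderr is not redirected. If the command
# unexpectedly succeeds, print a message on stdout; in fstests any stdout
# line missing from the .out file makes the test fail.
expect_failure() {
	local log=$1
	shift
	if "$@" >> "$log"; then
		echo "command unexpectedly succeeded: $*"
	fi
}
```

In tests/btrfs/329 this could read `expect_failure $seqres.full $BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT`. Because stderr is untouched, the error message still reaches the golden output exactly as in the patch.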

Comments

Johannes Thumshirn March 18, 2025, 12:52 p.m. UTC | #1
On 18.03.25 13:49, Johannes Thumshirn wrote:
> +ERROR: error during balancing '/mnt/scratch': Input/output error


Argh, that needs to be filtered as well. Saw it too late, sorry.

> +There may be more info in syslog - try dmesg | tail
> +System, DUP
> +Metadata, DUP
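The comment refers to the hard-coded `'/mnt/scratch'` in the expected `ERROR:` line, which only matches when `SCRATCH_MNT` happens to be `/mnt/scratch`. fstests' `common/filter` provides `_filter_scratch` for this purpose. The sketch below models the idea with a simplified, hypothetical stand-in, `filter_scratch_mnt` (it assumes the mount path contains no `,` or sed regex metacharacters), and a fake command, `fake_balance`, to show the redirection that sends stdout to the log and only stderr through the filter.

```shell
#!/bin/bash
# Simplified, hypothetical stand-in for fstests' _filter_scratch: replace
# the configured mount point with the literal token SCRATCH_MNT so the
# golden output does not depend on the local setup. Assumes SCRATCH_MNT
# contains no ',' or sed regex metacharacters.
filter_scratch_mnt() {
	sed -e "s,$SCRATCH_MNT,SCRATCH_MNT,g"
}

# Hypothetical stand-in for the failing balance: progress goes to stdout,
# the error message goes to stderr, and the exit status is non-zero.
fake_balance() {
	echo "Starting conversion"
	echo "ERROR: error during balancing '$1': Input/output error" >&2
	return 1
}

# Redirection order matters: "2>&1" first points stderr at the pipe, then
# ">>" moves only stdout to the log file given as $1.
run_filtered() {
	fake_balance "$SCRATCH_MNT" 2>&1 >> "$1" | filter_scratch_mnt
}
```

Applied to the test, the balance line would presumably become `$BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT 2>&1 >> $seqres.full | _filter_scratch`, with the .out line changed to `ERROR: error during balancing 'SCRATCH_MNT': Input/output error`.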

Patch

diff --git a/tests/btrfs/329 b/tests/btrfs/329
new file mode 100755
index 000000000000..0cc75bc8156d
--- /dev/null
+++ b/tests/btrfs/329
@@ -0,0 +1,61 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Western Digital Corporation.  All Rights Reserved.
+#
+# FS QA Test 329
+#
+# Regression test for a kernel crash when converting a zoned BTRFS from
+# metadata DUP to RAID1 and one of the devices has a non-zero write pointer
+# position in the target zone.
+#
+. ./common/preamble
+_begin_fstest zone quick volume
+
+. ./common/filter
+
+_fixed_by_kernel_commit XXXXXXXXXXXX \
+	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
+
+_require_scratch_dev_pool 2
+declare -a devs="( $SCRATCH_DEV_POOL )"
+_require_zoned_device ${devs[0]}
+_require_zoned_device ${devs[1]}
+_require_command "$BLKZONE_PROG" blkzone
+
+_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_scratch_mount
+
+# Write some data to the FS to dirty it
+$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
+
+# Add device two to the FS
+$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
+
+# Move write pointers of all empty zones by 4k to simulate write pointer
+# mismatch.
+zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
+	sed 's/,//')
+for zone in $zones;
+do
+	# Ignore the output here: a) we don't know how many zones get dirtied,
+	# and b) if we exceed the maximum number of active zones, xfs_io will
+	# print errors. We care about neither.
+	$XFS_IO_PROG -fdc "pwrite $(($zone << 9)) 4096" ${devs[1]} > /dev/null 2>&1
+done
+
+# expected to fail
+$BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT >> $seqres.full
+
+_scratch_unmount
+
+$MOUNT_PROG -t btrfs -odegraded ${devs[0]} $SCRATCH_MNT
+
+$BTRFS_UTIL_PROG device remove --force missing $SCRATCH_MNT >> $seqres.full
+$BTRFS_UTIL_PROG balance start --full-balance $SCRATCH_MNT >> $seqres.full
+
+# Check that both System and Metadata are back to the DUP profile
+$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT |\
+	grep -o -e "System, DUP" -e "Metadata, DUP"
+
+status=0
+exit
diff --git a/tests/btrfs/329.out b/tests/btrfs/329.out
new file mode 100644
index 000000000000..b52b7d90d253
--- /dev/null
+++ b/tests/btrfs/329.out
@@ -0,0 +1,7 @@ 
+QA output created by 329
+wrote 134217728/134217728 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR: error during balancing '/mnt/scratch': Input/output error
+There may be more info in syslog - try dmesg | tail
+System, DUP
+Metadata, DUP
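The zone loop in the test converts the zone starts printed by `blkzone report` into byte offsets for `xfs_io`'s `pwrite`. The sketch below isolates that step as a hypothetical helper, `empty_zone_byte_offsets`. It assumes util-linux's report format, where each zone's start is printed as a hexadecimal 512-byte sector number and empty zones are tagged `(em)`; unlike the patch's looser `/em/`, it matches the literal `(em)` tag.

```shell
#!/bin/bash
# Hypothetical helper mirroring the zone loop in tests/btrfs/329: read
# `blkzone report` output on stdin and print the byte offset of every empty
# ("(em)") zone. Zone starts are hex 512-byte sector numbers, so shifting
# left by 9 (multiplying by 512) yields the byte offset passed to pwrite.
empty_zone_byte_offsets() {
	awk '/\(em\)/ { print $2 }' | sed 's/,//' |
	while read -r start; do
		# Bash arithmetic accepts the 0x prefix directly.
		echo $(( start << 9 ))
	done
}
```

For example, a zone reported at `start: 0x000080000` begins at sector 524288, that is at byte 268435456 (256 MiB), which is the value the test computes with `$(($zone << 9))`.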