Message ID | 05f928908a7949fb1787680176840b5ab23fde0b.1742303818.git.jth@kernel.org (mailing list archive) |
---|---|
State | New |
Series | [v3] fstests: btrfs: zoned: verify RAID conversion with write pointer mismatch |
On Tue, Mar 18, 2025 at 1:17 PM Johannes Thumshirn <jth@kernel.org> wrote:
>
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Recently we had a bug report about a kernel crash that happened when the
> user was converting a filesystem to use RAID1 for metadata, but for some
> reason the device's write pointers got out of sync.
>
> Test this scenario by manually injecting de-synchronized write pointer
> positions and then running conversion to a metadata RAID1 filesystem.
>
> In the testcase also repair the broken filesystem and check if both system
> and metadata block groups are back to the default 'DUP' profile
> afterwards.
>
> Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@mail.gmail.com/
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good now, thanks.

>
> ---
> Changes to v2:
> - Filter SCRATCH_MNT in golden output
> Changes to v1:
> - Add test description
> - Don't redirect stderr to $seqres.full
> - Use xfs_io instead of dd
> - Use $SCRATCH_MNT instead of hardcoded mount path
> - Check that 1st balance command actually fails as it's supposed to
> ---
>  tests/btrfs/329     | 62 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/329.out |  7 +++++
>  2 files changed, 69 insertions(+)
>  create mode 100755 tests/btrfs/329
>  create mode 100644 tests/btrfs/329.out
>
> diff --git a/tests/btrfs/329 b/tests/btrfs/329
> new file mode 100755
> index 000000000000..5496866ac325
> --- /dev/null
> +++ b/tests/btrfs/329
> @@ -0,0 +1,62 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2025 Western Digital Corporation. All Rights Reserved.
> +#
> +# FS QA Test 329
> +#
> +# Regression test for a kernel crash when converting a zoned BTRFS from
> +# metadata DUP to RAID1 and one of the devices has a non 0 write pointer
> +# position in the target zone.
> +#
> +. ./common/preamble
> +_begin_fstest zone quick volume
> +
> +. ./common/filter
> +
> +_fixed_by_kernel_commit XXXXXXXXXXXX \
> +	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
> +
> +_require_scratch_dev_pool 2
> +declare -a devs="( $SCRATCH_DEV_POOL )"
> +_require_zoned_device ${devs[0]}
> +_require_zoned_device ${devs[1]}
> +_require_command "$BLKZONE_PROG" blkzone
> +
> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> +_scratch_mount
> +
> +# Write some data to the FS to dirty it
> +$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
> +
> +# Add device two to the FS
> +$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
> +
> +# Move write pointers of all empty zones by 4k to simulate write pointer
> +# mismatch.
> +zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
> +	sed 's/,//')
> +for zone in $zones;
> +do
> +	# We have to ignore the output here, as a) we don't know the number of
> +	# zones that have dirtied and b) if we run over the maximal number of
> +	# active zones, xfs_io will output errors, both we don't care.
> +	$XFS_IO_PROG -fdc "pwrite $(($zone << 9)) 4096" ${devs[1]} > /dev/null 2>&1
> +done
> +
> +# expected to fail
> +$BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT 2>&1 |\
> +	_filter_scratch
> +
> +_scratch_unmount
> +
> +$MOUNT_PROG -t btrfs -odegraded ${devs[0]} $SCRATCH_MNT
> +
> +$BTRFS_UTIL_PROG device remove --force missing $SCRATCH_MNT >> $seqres.full
> +$BTRFS_UTIL_PROG balance start --full-balance $SCRATCH_MNT >> $seqres.full
> +
> +# Check that both System and Metadata are back to the DUP profile
> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT |\
> +	grep -o -e "System, DUP" -e "Metadata, DUP"
> +
> +status=0
> +exit
> diff --git a/tests/btrfs/329.out b/tests/btrfs/329.out
> new file mode 100644
> index 000000000000..e47a2a6ff04b
> --- /dev/null
> +++ b/tests/btrfs/329.out
> @@ -0,0 +1,7 @@
> +QA output created by 329
> +wrote 134217728/134217728 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +ERROR: error during balancing 'SCRATCH_MNT': Input/output error
> +There may be more info in syslog - try dmesg | tail
> +System, DUP
> +Metadata, DUP
> --
> 2.43.0
>
>
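The changelog entry "Filter SCRATCH_MNT in golden output" is what allows 329.out to contain the literal string SCRATCH_MNT instead of a machine-specific path. A minimal sketch of the idea follows, assuming a simplified stand-in for the fstests `_filter_scratch` helper. The function name `filter_scratch_sketch`, the example mount path and the sed expression are illustrative only and are not the real implementation.

```shell
# Example mount point; in fstests this comes from the local configuration.
SCRATCH_MNT=/mnt/scratch

# Simplified, hypothetical stand-in for _filter_scratch: replace the
# machine-specific mount path with the literal token SCRATCH_MNT.
filter_scratch_sketch() {
	sed -e "s#$SCRATCH_MNT#SCRATCH_MNT#g"
}

# Error text as it appears (already filtered) in 329.out.
msg="ERROR: error during balancing '$SCRATCH_MNT': Input/output error"
filtered=$(echo "$msg" | filter_scratch_sketch)
echo "$filtered"
```

With the path replaced, the same golden output can match regardless of where the scratch filesystem is mounted on the test machine.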
On Tue Mar 18, 2025 at 10:17 PM JST, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Recently we had a bug report about a kernel crash that happened when the
> user was converting a filesystem to use RAID1 for metadata, but for some
> reason the device's write pointers got out of sync.
>
> Test this scenario by manually injecting de-synchronized write pointer
> positions and then running conversion to a metadata RAID1 filesystem.
>
> In the testcase also repair the broken filesystem and check if both system
> and metadata block groups are back to the default 'DUP' profile
> afterwards.
>
> Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@mail.gmail.com/
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> ---
> Changes to v2:
> - Filter SCRATCH_MNT in golden output
> Changes to v1:
> - Add test description
> - Don't redirect stderr to $seqres.full
> - Use xfs_io instead of dd
> - Use $SCRATCH_MNT instead of hardcoded mount path
> - Check that 1st balance command actually fails as it's supposed to
> ---
>  tests/btrfs/329     | 62 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/329.out |  7 +++++
>  2 files changed, 69 insertions(+)
>  create mode 100755 tests/btrfs/329
>  create mode 100644 tests/btrfs/329.out
>
> diff --git a/tests/btrfs/329 b/tests/btrfs/329
> new file mode 100755
> index 000000000000..5496866ac325
> --- /dev/null
> +++ b/tests/btrfs/329
> @@ -0,0 +1,62 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2025 Western Digital Corporation. All Rights Reserved.
> +#
> +# FS QA Test 329
> +#
> +# Regression test for a kernel crash when converting a zoned BTRFS from
> +# metadata DUP to RAID1 and one of the devices has a non 0 write pointer
> +# position in the target zone.
> +#
> +. ./common/preamble
> +_begin_fstest zone quick volume
> +
> +. ./common/filter
> +
> +_fixed_by_kernel_commit XXXXXXXXXXXX \
> +	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
> +
> +_require_scratch_dev_pool 2
> +declare -a devs="( $SCRATCH_DEV_POOL )"
> +_require_zoned_device ${devs[0]}
> +_require_zoned_device ${devs[1]}
> +_require_command "$BLKZONE_PROG" blkzone
> +
> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> +_scratch_mount
> +
> +# Write some data to the FS to dirty it
> +$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
> +
> +# Add device two to the FS
> +$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
> +
> +# Move write pointers of all empty zones by 4k to simulate write pointer
> +# mismatch.
> +zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
> +	sed 's/,//')

Can we limit the number of zones to work with, in case we run this test
on a huge device? I guess 2*(128M/4M)=64 would be enough.

> +for zone in $zones;
> +do
> +	# We have to ignore the output here, as a) we don't know the number of
> +	# zones that have dirtied and b) if we run over the maximal number of
> +	# active zones, xfs_io will output errors, both we don't care.
> +	$XFS_IO_PROG -fdc "pwrite $(($zone << 9)) 4096" ${devs[1]} > /dev/null 2>&1
> +done
> +
> +# expected to fail
> +$BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT 2>&1 |\
> +	_filter_scratch
> +
> +_scratch_unmount
> +
> +$MOUNT_PROG -t btrfs -odegraded ${devs[0]} $SCRATCH_MNT
> +
> +$BTRFS_UTIL_PROG device remove --force missing $SCRATCH_MNT >> $seqres.full
> +$BTRFS_UTIL_PROG balance start --full-balance $SCRATCH_MNT >> $seqres.full
> +
> +# Check that both System and Metadata are back to the DUP profile
> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT |\
> +	grep -o -e "System, DUP" -e "Metadata, DUP"
> +
> +status=0
> +exit
> diff --git a/tests/btrfs/329.out b/tests/btrfs/329.out
> new file mode 100644
> index 000000000000..e47a2a6ff04b
> --- /dev/null
> +++ b/tests/btrfs/329.out
> @@ -0,0 +1,7 @@
> +QA output created by 329
> +wrote 134217728/134217728 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +ERROR: error during balancing 'SCRATCH_MNT': Input/output error
> +There may be more info in syslog - try dmesg | tail
> +System, DUP
> +Metadata, DUP
On 19.03.25 02:18, Naohiro Aota wrote:
> On Tue Mar 18, 2025 at 10:17 PM JST, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> Recently we had a bug report about a kernel crash that happened when the
>> user was converting a filesystem to use RAID1 for metadata, but for some
>> reason the device's write pointers got out of sync.
>>
>> Test this scenario by manually injecting de-synchronized write pointer
>> positions and then running conversion to a metadata RAID1 filesystem.
>>
>> In the testcase also repair the broken filesystem and check if both system
>> and metadata block groups are back to the default 'DUP' profile
>> afterwards.
>>
>> Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@mail.gmail.com/
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> ---
>> Changes to v2:
>> - Filter SCRATCH_MNT in golden output
>> Changes to v1:
>> - Add test description
>> - Don't redirect stderr to $seqres.full
>> - Use xfs_io instead of dd
>> - Use $SCRATCH_MNT instead of hardcoded mount path
>> - Check that 1st balance command actually fails as it's supposed to
>> ---
>>  tests/btrfs/329     | 62 +++++++++++++++++++++++++++++++++++++++++++++
>>  tests/btrfs/329.out |  7 +++++
>>  2 files changed, 69 insertions(+)
>>  create mode 100755 tests/btrfs/329
>>  create mode 100644 tests/btrfs/329.out
>>
>> diff --git a/tests/btrfs/329 b/tests/btrfs/329
>> new file mode 100755
>> index 000000000000..5496866ac325
>> --- /dev/null
>> +++ b/tests/btrfs/329
>> @@ -0,0 +1,62 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2025 Western Digital Corporation. All Rights Reserved.
>> +#
>> +# FS QA Test 329
>> +#
>> +# Regression test for a kernel crash when converting a zoned BTRFS from
>> +# metadata DUP to RAID1 and one of the devices has a non 0 write pointer
>> +# position in the target zone.
>> +#
>> +. ./common/preamble
>> +_begin_fstest zone quick volume
>> +
>> +. ./common/filter
>> +
>> +_fixed_by_kernel_commit XXXXXXXXXXXX \
>> +	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
>> +
>> +_require_scratch_dev_pool 2
>> +declare -a devs="( $SCRATCH_DEV_POOL )"
>> +_require_zoned_device ${devs[0]}
>> +_require_zoned_device ${devs[1]}
>> +_require_command "$BLKZONE_PROG" blkzone
>> +
>> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
>> +_scratch_mount
>> +
>> +# Write some data to the FS to dirty it
>> +$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
>> +
>> +# Add device two to the FS
>> +$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
>> +
>> +# Move write pointers of all empty zones by 4k to simulate write pointer
>> +# mismatch.
>> +zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
>> +	sed 's/,//')
>
> Can we limit the number of zones to work with, in case we run this test
> on a huge device? I guess 2*(128M/4M)=64 would be enough.
>

I.e. something like the following:

diff --git a/tests/btrfs/329 b/tests/btrfs/329
index 5496866ac325..24d34852db1f 100755
--- a/tests/btrfs/329
+++ b/tests/btrfs/329
@@ -33,8 +33,14 @@ $BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
 
 # Move write pointers of all empty zones by 4k to simulate write pointer
 # mismatch.
+
+nzones=$($BLKZONE_PROG report ${devs[1]} | wc -l)
+if [ $nzones -gt 64 ]; then
+	nzones=64
+fi
+
 zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
-	sed 's/,//')
+	sed 's/,//' | head -n $nzones)
 for zone in $zones;
 do

Yup this still triggers the bug on an unpatched kernel in my case and the
fix also fixes it.

So yes I'll update the testcase (I guess Filipe's R-b remains with this change).
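The zone selection and the proposed cap can be tried without a zoned device. The sketch below is only an illustration: the report lines are a hypothetical sample of `blkzone report` output (the real format depends on the device and the util-linux version), and `sample_report` and `empty_zone_offsets` are made-up helper names. It reuses the awk and sed pipeline from the test, bounds the list with `head -n`, and converts each zone start from 512-byte sectors to a byte offset with `<< 9`.

```shell
# Hypothetical sample of `blkzone report` output: one full zone followed by
# two empty ones. Real output depends on the device and util-linux version.
sample_report() {
cat <<'EOF'
  start: 0x000000000, len 0x080000, cap 0x080000, wptr 0x080000 reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
  start: 0x000080000, len 0x080000, cap 0x080000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
  start: 0x000100000, len 0x080000, cap 0x080000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
EOF
}

# Read a report on stdin, keep at most $1 empty zones and print the byte
# offset of each zone start (the report lists 512-byte sectors).
empty_zone_offsets() {
	max=$1
	awk '/em/ { print $2 }' | sed 's/,//' | head -n "$max" |
	while read -r zone; do
		echo $(($zone << 9))
	done
}

# Same cap as in the proposed change: never touch more than 64 zones.
sample_report | empty_zone_offsets 64
```

In this sketch `head -n` alone bounds the list, so the separate `wc -l` count from the proposed diff is not needed here. Whether 64 zones is always enough to hit the mismatch is the reviewers' estimate, not something the sketch verifies.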
On Wed, Mar 19, 2025 at 11:05 AM Johannes Thumshirn <Johannes.Thumshirn@wdc.com> wrote:
>
> On 19.03.25 02:18, Naohiro Aota wrote:
> > On Tue Mar 18, 2025 at 10:17 PM JST, Johannes Thumshirn wrote:
> >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >>
> >> Recently we had a bug report about a kernel crash that happened when the
> >> user was converting a filesystem to use RAID1 for metadata, but for some
> >> reason the device's write pointers got out of sync.
> >>
> >> Test this scenario by manually injecting de-synchronized write pointer
> >> positions and then running conversion to a metadata RAID1 filesystem.
> >>
> >> In the testcase also repair the broken filesystem and check if both system
> >> and metadata block groups are back to the default 'DUP' profile
> >> afterwards.
> >>
> >> Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@mail.gmail.com/
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >>
> >> ---
> >> Changes to v2:
> >> - Filter SCRATCH_MNT in golden output
> >> Changes to v1:
> >> - Add test description
> >> - Don't redirect stderr to $seqres.full
> >> - Use xfs_io instead of dd
> >> - Use $SCRATCH_MNT instead of hardcoded mount path
> >> - Check that 1st balance command actually fails as it's supposed to
> >> ---
> >>  tests/btrfs/329     | 62 +++++++++++++++++++++++++++++++++++++++++++++
> >>  tests/btrfs/329.out |  7 +++++
> >>  2 files changed, 69 insertions(+)
> >>  create mode 100755 tests/btrfs/329
> >>  create mode 100644 tests/btrfs/329.out
> >>
> >> diff --git a/tests/btrfs/329 b/tests/btrfs/329
> >> new file mode 100755
> >> index 000000000000..5496866ac325
> >> --- /dev/null
> >> +++ b/tests/btrfs/329
> >> @@ -0,0 +1,62 @@
> >> +#! /bin/bash
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +# Copyright (c) 2025 Western Digital Corporation. All Rights Reserved.
> >> +#
> >> +# FS QA Test 329
> >> +#
> >> +# Regression test for a kernel crash when converting a zoned BTRFS from
> >> +# metadata DUP to RAID1 and one of the devices has a non 0 write pointer
> >> +# position in the target zone.
> >> +#
> >> +. ./common/preamble
> >> +_begin_fstest zone quick volume
> >> +
> >> +. ./common/filter
> >> +
> >> +_fixed_by_kernel_commit XXXXXXXXXXXX \
> >> +	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
> >> +
> >> +_require_scratch_dev_pool 2
> >> +declare -a devs="( $SCRATCH_DEV_POOL )"
> >> +_require_zoned_device ${devs[0]}
> >> +_require_zoned_device ${devs[1]}
> >> +_require_command "$BLKZONE_PROG" blkzone
> >> +
> >> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> >> +_scratch_mount
> >> +
> >> +# Write some data to the FS to dirty it
> >> +$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
> >> +
> >> +# Add device two to the FS
> >> +$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
> >> +
> >> +# Move write pointers of all empty zones by 4k to simulate write pointer
> >> +# mismatch.
> >> +zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
> >> +	sed 's/,//')
> >
> > Can we limit the number of zones to work with, in case we run this test
> > on a huge device? I guess 2*(128M/4M)=64 would be enough.
> >
>
> I.e. something like the following:
>
> diff --git a/tests/btrfs/329 b/tests/btrfs/329
> index 5496866ac325..24d34852db1f 100755
> --- a/tests/btrfs/329
> +++ b/tests/btrfs/329
> @@ -33,8 +33,14 @@ $BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
>
> # Move write pointers of all empty zones by 4k to simulate write pointer
> # mismatch.
> +
> +nzones=$($BLKZONE_PROG report ${devs[1]} | wc -l)
> +if [ $nzones -gt 64 ]; then
> +	nzones=64
> +fi
> +
> zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
> -	sed 's/,//')
> +	sed 's/,//' | head -n $nzones)
> for zone in $zones;
> do
>
> Yup this still triggers the bug on an unpatched kernel in my case and the
> fix also fixes it.
>
> So yes I'll update the testcase (I guess Filipe's R-b remains with this change).

Yes.

>
diff --git a/tests/btrfs/329 b/tests/btrfs/329
new file mode 100755
index 000000000000..5496866ac325
--- /dev/null
+++ b/tests/btrfs/329
@@ -0,0 +1,62 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Western Digital Corporation. All Rights Reserved.
+#
+# FS QA Test 329
+#
+# Regression test for a kernel crash when converting a zoned BTRFS from
+# metadata DUP to RAID1 and one of the devices has a non 0 write pointer
+# position in the target zone.
+#
+. ./common/preamble
+_begin_fstest zone quick volume
+
+. ./common/filter
+
+_fixed_by_kernel_commit XXXXXXXXXXXX \
+	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
+
+_require_scratch_dev_pool 2
+declare -a devs="( $SCRATCH_DEV_POOL )"
+_require_zoned_device ${devs[0]}
+_require_zoned_device ${devs[1]}
+_require_command "$BLKZONE_PROG" blkzone
+
+_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_scratch_mount
+
+# Write some data to the FS to dirty it
+$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
+
+# Add device two to the FS
+$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
+
+# Move write pointers of all empty zones by 4k to simulate write pointer
+# mismatch.
+zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
+	sed 's/,//')
+for zone in $zones;
+do
+	# We have to ignore the output here, as a) we don't know the number of
+	# zones that have dirtied and b) if we run over the maximal number of
+	# active zones, xfs_io will output errors, both we don't care.
+	$XFS_IO_PROG -fdc "pwrite $(($zone << 9)) 4096" ${devs[1]} > /dev/null 2>&1
+done
+
+# expected to fail
+$BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT 2>&1 |\
+	_filter_scratch
+
+_scratch_unmount
+
+$MOUNT_PROG -t btrfs -odegraded ${devs[0]} $SCRATCH_MNT
+
+$BTRFS_UTIL_PROG device remove --force missing $SCRATCH_MNT >> $seqres.full
+$BTRFS_UTIL_PROG balance start --full-balance $SCRATCH_MNT >> $seqres.full
+
+# Check that both System and Metadata are back to the DUP profile
+$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT |\
+	grep -o -e "System, DUP" -e "Metadata, DUP"
+
+status=0
+exit
diff --git a/tests/btrfs/329.out b/tests/btrfs/329.out
new file mode 100644
index 000000000000..e47a2a6ff04b
--- /dev/null
+++ b/tests/btrfs/329.out
@@ -0,0 +1,7 @@
+QA output created by 329
+wrote 134217728/134217728 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR: error during balancing 'SCRATCH_MNT': Input/output error
+There may be more info in syslog - try dmesg | tail
+System, DUP
+Metadata, DUP
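
The final check in the test relies on `grep -o` printing only the matched profile strings, which is why the last two lines of 329.out read "System, DUP" and "Metadata, DUP". A small sketch of that step, assuming hypothetical `btrfs filesystem df` output (the sizes and the `sample_df` helper are made up for illustration):

```shell
# Hypothetical `btrfs filesystem df` output after the repair; sizes are made up.
sample_df() {
cat <<'EOF'
Data, single: total=1.00GiB, used=128.00MiB
System, DUP: total=256.00MiB, used=16.00KiB
Metadata, DUP: total=256.00MiB, used=176.00KiB
GlobalReserve, single: total=5.50MiB, used=0.00B
EOF
}

# Same grep as in the test: print only the matching profile strings, so sizes
# that vary between devices never reach the golden output.
profiles=$(sample_df | grep -o -e "System, DUP" -e "Metadata, DUP")
echo "$profiles"
```

If either block group type were still reported with another profile, its line would be missing from the output and the comparison against 329.out would fail.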