
fstests: add test case to make sure btrfs can handle one corrupted device

Message ID 20220726062948.56315-1-wqu@suse.com (mailing list archive)
State New, archived
Series fstests: add test case to make sure btrfs can handle one corrupted device

Commit Message

Qu Wenruo July 26, 2022, 6:29 a.m. UTC
The new test case will verify that btrfs can handle one corrupted device
without affecting the consistency of the filesystem.

Unlike a missing device, a corrupted device can return garbage to the fs,
so btrfs has to utilize its data/metadata checksums to verify which
data is correct.

The test case will:

- Create a small fs
  Mostly to speed up the test

- Fill the fs with a regular file

- Use fsstress to create some contents

- Save the fssum for later verification

- Corrupt one device with garbage but keep the primary superblock
  untouched

- Run fssum verification

- Run scrub to fix the fs

- Run scrub again to make sure the fs is fine

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/261.out |  2 +
 2 files changed, 96 insertions(+)
 create mode 100755 tests/btrfs/261
 create mode 100644 tests/btrfs/261.out

Comments

Boris Burkov July 26, 2022, 11:09 p.m. UTC | #1
On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
> The new test case will verify that btrfs can handle one corrupted device
> without affecting the consistency of the filesystem.
> 
> Unlike a missing device, one corrupted device can return garbage to the fs,
> thus btrfs has to utilize its data/metadata checksum to verify which
> data is correct.

> 
> The test case will:
> 
> - Create a small fs
>   Mostly to speedup the test
> 
> - Fill the fs with a regular file
> 
> - Use fsstress to create some contents
> 
> - Save the fssum for later verification
> 
> - Corrupt one device with garbage but keep the primary superblock
>   untouched
> 
> - Run fssum verification
> 
> - Run scrub to fix the fs
> 
> - Run scrub again to make sure the fs is fine
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Works for me, and looks like a nice test to complement btrfs/027.
Reviewed-by: Boris Burkov <boris@bur.io>

> ---
>  tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/261.out |  2 +
>  2 files changed, 96 insertions(+)
>  create mode 100755 tests/btrfs/261
>  create mode 100644 tests/btrfs/261.out
> 
> diff --git a/tests/btrfs/261 b/tests/btrfs/261
> new file mode 100755
> index 00000000..15218e28
> --- /dev/null
> +++ b/tests/btrfs/261
> @@ -0,0 +1,94 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 261
> +#
> +# Make sure btrfs raid profiles can handling one corrupted device
> +# without affecting the consistency of the fs.
> +#
> +. ./common/preamble
> +_begin_fstest raid
> +
> +. ./common/filter
> +. ./common/populate
> +
> +_supported_fs btrfs
> +_require_scratch_dev_pool 4
> +_require_fssum
> +
> +prepare_fs()
> +{
> +	local profile=$1
> +
> +	# We don't want too large fs which can take too long to populate
> +	# And the extra redirection of stderr is to avoid the RAID56 warning
> +	# message to polluate the golden output
> +	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
> +	if [ $? -ne 0 ]; then
> +		echo "mkfs $mkfs_opts failed"
> +		return
> +	fi
> +
> +	# Disable compression, as compressed read repair is known to have problems
> +	_scratch_mount -o compress=no
> +
> +	# Fill some part of the fs first
> +	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
> +
> +	# Then use fsstress to generate some extra contents.
> +	# Disable setattr related operations, as it may set NODATACOW which will
> +	# not allow us to use btrfs checksum to verify the content.
> +	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
> +	sync
> +
> +	# Save the fssum of this fs
> +	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
> +	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
> +	_scratch_unmount
> +}
> +
> +workload()
> +{
> +	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
> +	local profile=$1
> +	local num_devs=$2
> +
> +	_scratch_dev_pool_get $num_devs
> +	echo "=== Testing profile $profile ===" >> $seqres.full
> +	rm -f -- $tmp.saved_fssum
> +	prepare_fs $profile
> +
> +	# Corrupt the target device, only keep the superblock.
> +	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
> +
> +	_scratch_mount
> +
> +	# All content should be fine
> +	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
> +
> +	# Scrub to fix the fs, this is known to report various correctable
> +	# errors
> +	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
> +
> +	# Make sure above scrub fixed the fs
> +	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
> +	if [ $? -ne 0 ]; then
> +		echo "scrub failed to fix the fs for profile $profile"
> +	fi
> +	_scratch_unmount
> +	_scratch_dev_pool_put
> +}
> +
> +workload raid1 2
> +workload raid1c3 3
> +workload raid1c4 4
> +workload raid10 4
> +workload raid5 3
> +workload raid6 4

Speaking of 027, 

Can you implement this with _btrfs_profile_configs?

Further, you could imagine writing a more generic test that does:
for raid in raids:
        create-fs raid
        screw-up disk(s)
        check-condition

and make 027 and this new one (and others?) special cases of that.
I think this might fall under YAGNI.. Food for thought :)
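
For illustration, such a generic driver could be shaped roughly like the
sketch below; prepare_fs is the helper from this patch, while
corrupt_devices and check_fs_condition are placeholder hooks each test
would supply, not existing fstests helpers:

	# Hypothetical generic loop: each test plugs in its own hooks.
	generic_raid_corruption_test()
	{
		local profile=$1
		local num_devs=$2

		_scratch_dev_pool_get $num_devs
		prepare_fs $profile	# mkfs + populate + save fssum
		corrupt_devices		# test-specific: damage one or more devices
		_scratch_mount
		check_fs_condition	# test-specific: fssum / scrub / replace checks
		_scratch_unmount
		_scratch_dev_pool_put
	}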

> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
> new file mode 100644
> index 00000000..679ddc0f
> --- /dev/null
> +++ b/tests/btrfs/261.out
> @@ -0,0 +1,2 @@
> +QA output created by 261
> +Silence is golden
> -- 
> 2.36.1
>
Qu Wenruo July 27, 2022, 1:39 a.m. UTC | #2
On 2022/7/27 07:09, Boris Burkov wrote:
> On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
>> The new test case will verify that btrfs can handle one corrupted device
>> without affecting the consistency of the filesystem.
>>
>> Unlike a missing device, one corrupted device can return garbage to the fs,
>> thus btrfs has to utilize its data/metadata checksum to verify which
>> data is correct.
>
>>
>> The test case will:
>>
>> - Create a small fs
>>    Mostly to speedup the test
>>
>> - Fill the fs with a regular file
>>
>> - Use fsstress to create some contents
>>
>> - Save the fssum for later verification
>>
>> - Corrupt one device with garbage but keep the primary superblock
>>    untouched
>>
>> - Run fssum verification
>>
>> - Run scrub to fix the fs
>>
>> - Run scrub again to make sure the fs is fine
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Works for me, and looks like a nice test to complement btrfs/027.
> Reviewed-by: Boris Burkov <boris@bur.io>
>
>> ---
>>   tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/btrfs/261.out |  2 +
>>   2 files changed, 96 insertions(+)
>>   create mode 100755 tests/btrfs/261
>>   create mode 100644 tests/btrfs/261.out
>>
>> diff --git a/tests/btrfs/261 b/tests/btrfs/261
>> new file mode 100755
>> index 00000000..15218e28
>> --- /dev/null
>> +++ b/tests/btrfs/261
>> @@ -0,0 +1,94 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 261
>> +#
>> +# Make sure btrfs raid profiles can handling one corrupted device
>> +# without affecting the consistency of the fs.
>> +#
>> +. ./common/preamble
>> +_begin_fstest raid
>> +
>> +. ./common/filter
>> +. ./common/populate
>> +
>> +_supported_fs btrfs
>> +_require_scratch_dev_pool 4
>> +_require_fssum
>> +
>> +prepare_fs()
>> +{
>> +	local profile=$1
>> +
>> +	# We don't want too large fs which can take too long to populate
>> +	# And the extra redirection of stderr is to avoid the RAID56 warning
>> +	# message to polluate the golden output
>> +	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
>> +	if [ $? -ne 0 ]; then
>> +		echo "mkfs $mkfs_opts failed"
>> +		return
>> +	fi
>> +
>> +	# Disable compression, as compressed read repair is known to have problems
>> +	_scratch_mount -o compress=no
>> +
>> +	# Fill some part of the fs first
>> +	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
>> +
>> +	# Then use fsstress to generate some extra contents.
>> +	# Disable setattr related operations, as it may set NODATACOW which will
>> +	# not allow us to use btrfs checksum to verify the content.
>> +	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
>> +	sync
>> +
>> +	# Save the fssum of this fs
>> +	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
>> +	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
>> +	_scratch_unmount
>> +}
>> +
>> +workload()
>> +{
>> +	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
>> +	local profile=$1
>> +	local num_devs=$2
>> +
>> +	_scratch_dev_pool_get $num_devs
>> +	echo "=== Testing profile $profile ===" >> $seqres.full
>> +	rm -f -- $tmp.saved_fssum
>> +	prepare_fs $profile
>> +
>> +	# Corrupt the target device, only keep the superblock.
>> +	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
>> +
>> +	_scratch_mount
>> +
>> +	# All content should be fine
>> +	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
>> +
>> +	# Scrub to fix the fs, this is known to report various correctable
>> +	# errors
>> +	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
>> +
>> +	# Make sure above scrub fixed the fs
>> +	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
>> +	if [ $? -ne 0 ]; then
>> +		echo "scrub failed to fix the fs for profile $profile"
>> +	fi
>> +	_scratch_unmount
>> +	_scratch_dev_pool_put
>> +}
>> +
>> +workload raid1 2
>> +workload raid1c3 3
>> +workload raid1c4 4
>> +workload raid10 4
>> +workload raid5 3
>> +workload raid6 4
>
> Speaking of 027,
>
> Can you implement this with _btrfs_profile_configs?

In fact _btrfs_profile_configs() is what I went with initially, but the
following factors prevented me from using it:

- No way to specify the number of devices
   That's not a big deal though.

- No way to skip unsupported profiles
   Like DUP/SINGLE, which cannot meet the duplication requirement.


And since _btrfs_profile_configs() directly provides the mkfs options,
it's not easy to just pick what we need.

Personally speaking, it would be much easier if we made
_btrfs_profile_configs() just return a profile, not the full mkfs options.
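
As a rough illustration of that idea (the helper name and its return format
here are hypothetical, not something common/btrfs provides today), the
caller could then combine the bare profile names with its own constraints,
such as the device count each profile needs in this test:

	# Hypothetical: return plain profile names instead of full mkfs options.
	_btrfs_get_raid_profiles()
	{
		echo "raid1 raid1c3 raid1c4 raid10 raid5 raid6"
	}

	# Caller maps each profile to the device count this test wants to use.
	declare -A devs_for=(
		[raid1]=2 [raid1c3]=3 [raid1c4]=4 [raid10]=4 [raid5]=3 [raid6]=4
	)
	for profile in $(_btrfs_get_raid_profiles); do
		workload "$profile" "${devs_for[$profile]}"
	done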

>
> Further, you could imagine writing a more generic test that does:
> for raid in raids:
>          create-fs raid
>          screw-up disk(s)
>          check-condition

I have seen some helpers in `common/populate`, but unfortunately they only
support XFS and EXT4.

It may be a good idea to enhance that file though.

Thanks,
Qu

>
> and make 027 and this new one (and others?) special cases of that.
> I think this might fall under YAGNI.. Food for thought :)
>
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
>> new file mode 100644
>> index 00000000..679ddc0f
>> --- /dev/null
>> +++ b/tests/btrfs/261.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 261
>> +Silence is golden
>> --
>> 2.36.1
>>
Boris Burkov July 27, 2022, 2:49 a.m. UTC | #3
On Wed, Jul 27, 2022 at 09:39:46AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/7/27 07:09, Boris Burkov wrote:
> > On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
> > > The new test case will verify that btrfs can handle one corrupted device
> > > without affecting the consistency of the filesystem.
> > > 
> > > Unlike a missing device, one corrupted device can return garbage to the fs,
> > > thus btrfs has to utilize its data/metadata checksum to verify which
> > > data is correct.
> > 
> > > 
> > > The test case will:
> > > 
> > > - Create a small fs
> > >    Mostly to speedup the test
> > > 
> > > - Fill the fs with a regular file
> > > 
> > > - Use fsstress to create some contents
> > > 
> > > - Save the fssum for later verification
> > > 
> > > - Corrupt one device with garbage but keep the primary superblock
> > >    untouched
> > > 
> > > - Run fssum verification
> > > 
> > > - Run scrub to fix the fs
> > > 
> > > - Run scrub again to make sure the fs is fine
> > > 
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > 
> > Works for me, and looks like a nice test to complement btrfs/027.
> > Reviewed-by: Boris Burkov <boris@bur.io>
> > 
> > > ---
> > >   tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > >   tests/btrfs/261.out |  2 +
> > >   2 files changed, 96 insertions(+)
> > >   create mode 100755 tests/btrfs/261
> > >   create mode 100644 tests/btrfs/261.out
> > > 
> > > diff --git a/tests/btrfs/261 b/tests/btrfs/261
> > > new file mode 100755
> > > index 00000000..15218e28
> > > --- /dev/null
> > > +++ b/tests/btrfs/261
> > > @@ -0,0 +1,94 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FS QA Test 261
> > > +#
> > > +# Make sure btrfs raid profiles can handling one corrupted device
> > > +# without affecting the consistency of the fs.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest raid
> > > +
> > > +. ./common/filter
> > > +. ./common/populate
> > > +
> > > +_supported_fs btrfs
> > > +_require_scratch_dev_pool 4
> > > +_require_fssum
> > > +
> > > +prepare_fs()
> > > +{
> > > +	local profile=$1
> > > +
> > > +	# We don't want too large fs which can take too long to populate
> > > +	# And the extra redirection of stderr is to avoid the RAID56 warning
> > > +	# message to polluate the golden output
> > > +	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
> > > +	if [ $? -ne 0 ]; then
> > > +		echo "mkfs $mkfs_opts failed"
> > > +		return
> > > +	fi
> > > +
> > > +	# Disable compression, as compressed read repair is known to have problems
> > > +	_scratch_mount -o compress=no
> > > +
> > > +	# Fill some part of the fs first
> > > +	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
> > > +
> > > +	# Then use fsstress to generate some extra contents.
> > > +	# Disable setattr related operations, as it may set NODATACOW which will
> > > +	# not allow us to use btrfs checksum to verify the content.
> > > +	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
> > > +	sync
> > > +
> > > +	# Save the fssum of this fs
> > > +	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
> > > +	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
> > > +	_scratch_unmount
> > > +}
> > > +
> > > +workload()
> > > +{
> > > +	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
> > > +	local profile=$1
> > > +	local num_devs=$2
> > > +
> > > +	_scratch_dev_pool_get $num_devs
> > > +	echo "=== Testing profile $profile ===" >> $seqres.full
> > > +	rm -f -- $tmp.saved_fssum
> > > +	prepare_fs $profile
> > > +
> > > +	# Corrupt the target device, only keep the superblock.
> > > +	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
> > > +
> > > +	_scratch_mount
> > > +
> > > +	# All content should be fine
> > > +	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
> > > +
> > > +	# Scrub to fix the fs, this is known to report various correctable
> > > +	# errors
> > > +	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
> > > +
> > > +	# Make sure above scrub fixed the fs
> > > +	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
> > > +	if [ $? -ne 0 ]; then
> > > +		echo "scrub failed to fix the fs for profile $profile"
> > > +	fi
> > > +	_scratch_unmount
> > > +	_scratch_dev_pool_put
> > > +}
> > > +
> > > +workload raid1 2
> > > +workload raid1c3 3
> > > +workload raid1c4 4
> > > +workload raid10 4
> > > +workload raid5 3
> > > +workload raid6 4
> > 
> > Speaking of 027,
> > 
> > Can you implement this with _btrfs_profile_configs?
> 
> In fact _btrfs_profile_configs() is what I went initially, but the
> following factors prevent it from being used:
> 
> - No way to specify the number of devices
>   That's not a big deal though.
> 
> - No way to skip unsupported profiles
>   Like DUP/SINGLE can not meet the duplication requirement.

It takes an argument, and if you pass "replace-missing" I think it does
what you want. You could also add another string for it to accept, with
different semantics.

> 
> 
> And since _btrfs_profiles_configs() directly provides the mkfs options,
> it's not easy to just pick what we need.
> 
> Personally speaking, it would be much easier if we make
> _btrfs_profiles_configs() to just return a profile, not full mkfs options.

Giving the caller more flexibility with their call to mkfs does seem
like a good idea to me.

> 
> > 
> > Further, you could imagine writing a more generic test that does:
> > for raid in raids:
> >          create-fs raid
> >          screw-up disk(s)
> >          check-condition
> 
> I have seen some helper in `common/populate`, but unfortunately it only
> supports XFS and EXT4.
> 
> It may be a good idea to enhance that file though.

Cool, never noticed that before. Would be interesting to make it work
with btrfs.

> 
> Thanks,
> Qu
> 
> > 
> > and make 027 and this new one (and others?) special cases of that.
> > I think this might fall under YAGNI.. Food for thought :)
> > 
> > > +
> > > +echo "Silence is golden"
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
> > > new file mode 100644
> > > index 00000000..679ddc0f
> > > --- /dev/null
> > > +++ b/tests/btrfs/261.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 261
> > > +Silence is golden
> > > --
> > > 2.36.1
> > >
Zorro Lang July 27, 2022, 4:57 a.m. UTC | #4
On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
> The new test case will verify that btrfs can handle one corrupted device
> without affecting the consistency of the filesystem.
> 
> Unlike a missing device, one corrupted device can return garbage to the fs,
> thus btrfs has to utilize its data/metadata checksum to verify which
> data is correct.
> 
> The test case will:
> 
> - Create a small fs
>   Mostly to speedup the test
> 
> - Fill the fs with a regular file
> 
> - Use fsstress to create some contents
> 
> - Save the fssum for later verification
> 
> - Corrupt one device with garbage but keep the primary superblock
>   untouched
> 
> - Run fssum verification
> 
> - Run scrub to fix the fs
> 
> - Run scrub again to make sure the fs is fine
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/261.out |  2 +
>  2 files changed, 96 insertions(+)
>  create mode 100755 tests/btrfs/261
>  create mode 100644 tests/btrfs/261.out
> 
> diff --git a/tests/btrfs/261 b/tests/btrfs/261
> new file mode 100755
> index 00000000..15218e28
> --- /dev/null
> +++ b/tests/btrfs/261
> @@ -0,0 +1,94 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 261
> +#
> +# Make sure btrfs raid profiles can handling one corrupted device
> +# without affecting the consistency of the fs.
> +#
> +. ./common/preamble
> +_begin_fstest raid

Have you thought about adding it to the auto group, or more groups?

> +
> +. ./common/filter
> +. ./common/populate
> +
> +_supported_fs btrfs
> +_require_scratch_dev_pool 4
> +_require_fssum
> +
> +prepare_fs()
> +{
> +	local profile=$1
> +
> +	# We don't want too large fs which can take too long to populate
> +	# And the extra redirection of stderr is to avoid the RAID56 warning
> +	# message to polluate the golden output
> +	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
> +	if [ $? -ne 0 ]; then
> +		echo "mkfs $mkfs_opts failed"
> +		return
> +	fi

If mkfs fails, the rest of workload() below will keep running; I think that's
not what you want, right?

> +
> +	# Disable compression, as compressed read repair is known to have problems
> +	_scratch_mount -o compress=no
> +
> +	# Fill some part of the fs first
> +	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
> +
> +	# Then use fsstress to generate some extra contents.
> +	# Disable setattr related operations, as it may set NODATACOW which will
> +	# not allow us to use btrfs checksum to verify the content.
> +	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
> +	sync
> +
> +	# Save the fssum of this fs
> +	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
> +	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
> +	_scratch_unmount
> +}
> +
> +workload()
> +{
> +	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')

In common/config, we always set SCRATCH_DEV to the first of $SCRATCH_DEV_POOL,
as below:

        # a btrfs tester will set only SCRATCH_DEV_POOL, we will put first of its dev
        # to SCRATCH_DEV and rest to SCRATCH_DEV_POOL to maintain the backward compatibility
        if [ ! -z "$SCRATCH_DEV_POOL" ]; then
                if [ ! -z "$SCRATCH_DEV" ]; then
                        echo "common/config: Error: \$SCRATCH_DEV ($SCRATCH_DEV) should be unset when \$SCRATCH_DEV_POOL ($SCRATCH_DEV_POOL) is set"
                        exit 1
                fi
                SCRATCH_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`

Does that help?

> +	local profile=$1
> +	local num_devs=$2
> +
> +	_scratch_dev_pool_get $num_devs
> +	echo "=== Testing profile $profile ===" >> $seqres.full
> +	rm -f -- $tmp.saved_fssum
> +	prepare_fs $profile
> +
> +	# Corrupt the target device, only keep the superblock.
> +	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1

Do you need "_require_scratch_nocheck" if you corrupt the fs on purpose?

But I think it won't check SCRATCH_DEV currently, because you don't even use
_require_scratch, and _require_scratch_dev_pool might not trigger a fsck at
the end of the test.

Hmm... I'd prefer having a "_require_scratch_nocheck", and you can use $SCRATCH_DEV
to replace your "$target".

> +
> +	_scratch_mount
> +
> +	# All content should be fine
> +	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
> +
> +	# Scrub to fix the fs, this is known to report various correctable
> +	# errors
> +	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
> +
> +	# Make sure above scrub fixed the fs
> +	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
> +	if [ $? -ne 0 ]; then
> +		echo "scrub failed to fix the fs for profile $profile"
> +	fi
> +	_scratch_unmount
> +	_scratch_dev_pool_put
> +}
> +
> +workload raid1 2
> +workload raid1c3 3
> +workload raid1c4 4
> +workload raid10 4
> +workload raid5 3
> +workload raid6 4
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
> new file mode 100644
> index 00000000..679ddc0f
> --- /dev/null
> +++ b/tests/btrfs/261.out
> @@ -0,0 +1,2 @@
> +QA output created by 261
> +Silence is golden
> -- 
> 2.36.1
>
Qu Wenruo July 27, 2022, 5:03 a.m. UTC | #5
On 2022/7/27 12:57, Zorro Lang wrote:
> On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
>> The new test case will verify that btrfs can handle one corrupted device
>> without affecting the consistency of the filesystem.
>>
>> Unlike a missing device, one corrupted device can return garbage to the fs,
>> thus btrfs has to utilize its data/metadata checksum to verify which
>> data is correct.
>>
>> The test case will:
>>
>> - Create a small fs
>>    Mostly to speedup the test
>>
>> - Fill the fs with a regular file
>>
>> - Use fsstress to create some contents
>>
>> - Save the fssum for later verification
>>
>> - Corrupt one device with garbage but keep the primary superblock
>>    untouched
>>
>> - Run fssum verification
>>
>> - Run scrub to fix the fs
>>
>> - Run scrub again to make sure the fs is fine
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/btrfs/261.out |  2 +
>>   2 files changed, 96 insertions(+)
>>   create mode 100755 tests/btrfs/261
>>   create mode 100644 tests/btrfs/261.out
>>
>> diff --git a/tests/btrfs/261 b/tests/btrfs/261
>> new file mode 100755
>> index 00000000..15218e28
>> --- /dev/null
>> +++ b/tests/btrfs/261
>> @@ -0,0 +1,94 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 261
>> +#
>> +# Make sure btrfs raid profiles can handling one corrupted device
>> +# without affecting the consistency of the fs.
>> +#
>> +. ./common/preamble
>> +_begin_fstest raid
>
> Do you think about add it into auto group, or more groups?

Oh, my bad, I was planning to add it to the auto group later but forgot.

>
>> +
>> +. ./common/filter
>> +. ./common/populate
>> +
>> +_supported_fs btrfs
>> +_require_scratch_dev_pool 4
>> +_require_fssum
>> +
>> +prepare_fs()
>> +{
>> +	local profile=$1
>> +
>> +	# We don't want too large fs which can take too long to populate
>> +	# And the extra redirection of stderr is to avoid the RAID56 warning
>> +	# message to polluate the golden output
>> +	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
>> +	if [ $? -ne 0 ]; then
>> +		echo "mkfs $mkfs_opts failed"
>> +		return
>> +	fi
>
> If mkfs fails, below workload() will keep running, I think it's not what you
> want, right?

Right, it should call _fail() instead.
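
For example, the error path could become something like this minimal sketch
(assuming _fail from common/rc, which aborts the test immediately):

	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
	if [ $? -ne 0 ]; then
		_fail "mkfs -m $profile -d $profile failed"
	fi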

>
>> +
>> +	# Disable compression, as compressed read repair is known to have problems
>> +	_scratch_mount -o compress=no
>> +
>> +	# Fill some part of the fs first
>> +	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
>> +
>> +	# Then use fsstress to generate some extra contents.
>> +	# Disable setattr related operations, as it may set NODATACOW which will
>> +	# not allow us to use btrfs checksum to verify the content.
>> +	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
>> +	sync
>> +
>> +	# Save the fssum of this fs
>> +	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
>> +	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
>> +	_scratch_unmount
>> +}
>> +
>> +workload()
>> +{
>> +	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
>
> In common/config, we always set SCRATCH_DEV to the first of $SCRATCH_DEV_POOL,
> as below:
>
>          # a btrfs tester will set only SCRATCH_DEV_POOL, we will put first of its dev
>          # to SCRATCH_DEV and rest to SCRATCH_DEV_POOL to maintain the backward compatibility
>          if [ ! -z "$SCRATCH_DEV_POOL" ]; then
>                  if [ ! -z "$SCRATCH_DEV" ]; then
>                          echo "common/config: Error: \$SCRATCH_DEV ($SCRATCH_DEV) should be unset when \$SCRATCH_DEV_POOL ($SCRATCH_DEV_POOL) is set"
>                          exit 1
>                  fi
>                  SCRATCH_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`
>
> is that help?

Thanks, then I can use SCRATCH_DEV directly.
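
A minimal sketch of that simplification, relying on SCRATCH_DEV being the
first device of SCRATCH_DEV_POOL as quoted above:

	# Corrupt the first pool device, keeping only the primary superblock.
	$XFS_IO_PROG -c "pwrite 1M 1023M" $SCRATCH_DEV > /dev/null 2>&1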

>
>> +	local profile=$1
>> +	local num_devs=$2
>> +
>> +	_scratch_dev_pool_get $num_devs
>> +	echo "=== Testing profile $profile ===" >> $seqres.full
>> +	rm -f -- $tmp.saved_fssum
>> +	prepare_fs $profile
>> +
>> +	# Corrupt the target device, only keep the superblock.
>> +	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
>
> Do you need "_require_scratch_nocheck", if you corrupt the fs purposely?

I don't think we need that.

The point here is, we will immediately mount the fs and do a RW scrub,
which should make all devices good.

Thus if the fsck after the test case finds something wrong, it means the RW
scrub didn't properly repair the fs.

Thanks for the review,
Qu

>
> But I think it won't check the SCRATCH_DEV currently, due to you even don't use
> _require_scratch, and the _require_scratch_dev_pool might not trigger a fsck at
> the end of the testing.
>
> Hmm... I prefer having a "_require_scratch_nocheck", and you can use $SCRATCH_DEV
> to replace your "$target".
>
>> +
>> +	_scratch_mount
>> +
>> +	# All content should be fine
>> +	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
>> +
>> +	# Scrub to fix the fs, this is known to report various correctable
>> +	# errors
>> +	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
>> +
>> +	# Make sure above scrub fixed the fs
>> +	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
>> +	if [ $? -ne 0 ]; then
>> +		echo "scrub failed to fix the fs for profile $profile"
>> +	fi
>> +	_scratch_unmount
>> +	_scratch_dev_pool_put
>> +}
>> +
>> +workload raid1 2
>> +workload raid1c3 3
>> +workload raid1c4 4
>> +workload raid10 4
>> +workload raid5 3
>> +workload raid6 4
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
>> new file mode 100644
>> index 00000000..679ddc0f
>> --- /dev/null
>> +++ b/tests/btrfs/261.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 261
>> +Silence is golden
>> --
>> 2.36.1
>>
>

Patch

diff --git a/tests/btrfs/261 b/tests/btrfs/261
new file mode 100755
index 00000000..15218e28
--- /dev/null
+++ b/tests/btrfs/261
@@ -0,0 +1,94 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 261
+#
+# Make sure btrfs raid profiles can handle one corrupted device
+# without affecting the consistency of the fs.
+#
+. ./common/preamble
+_begin_fstest raid
+
+. ./common/filter
+. ./common/populate
+
+_supported_fs btrfs
+_require_scratch_dev_pool 4
+_require_fssum
+
+prepare_fs()
+{
+	local profile=$1
+
+	# We don't want a fs so large that it takes too long to populate.
+	# The extra redirection of stderr is to avoid the RAID56 warning
+	# message polluting the golden output.
+	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
+	if [ $? -ne 0 ]; then
+		echo "mkfs -m $profile -d $profile failed"
+		return
+	fi
+
+	# Disable compression, as compressed read repair is known to have problems
+	_scratch_mount -o compress=no
+
+	# Fill some part of the fs first
+	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
+
+	# Then use fsstress to generate some extra contents.
+	# Disable setattr related operations, as they may set NODATACOW, which
+	# would prevent us from using btrfs checksums to verify the content.
+	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
+	sync
+
+	# Save the fssum of this fs
+	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
+	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
+	_scratch_unmount
+}
+
+workload()
+{
+	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
+	local profile=$1
+	local num_devs=$2
+
+	_scratch_dev_pool_get $num_devs
+	echo "=== Testing profile $profile ===" >> $seqres.full
+	rm -f -- $tmp.saved_fssum
+	prepare_fs $profile
+
+	# Corrupt the target device, keeping only the superblock.
+	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
+
+	_scratch_mount
+
+	# All content should be fine
+	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
+
+	# Scrub to fix the fs; this is known to report various correctable
+	# errors.
+	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
+
+	# Make sure above scrub fixed the fs
+	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
+	if [ $? -ne 0 ]; then
+		echo "scrub failed to fix the fs for profile $profile"
+	fi
+	_scratch_unmount
+	_scratch_dev_pool_put
+}
+
+workload raid1 2
+workload raid1c3 3
+workload raid1c4 4
+workload raid10 4
+workload raid5 3
+workload raid6 4
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
new file mode 100644
index 00000000..679ddc0f
--- /dev/null
+++ b/tests/btrfs/261.out
@@ -0,0 +1,2 @@ 
+QA output created by 261
+Silence is golden