diff mbox series

fstests: btrfs: add a regression test case to make sure scrub can detect errors

Message ID 20221106235348.9732-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series fstests: btrfs: add a regression test case to make sure scrub can detect errors | expand

Commit Message

Qu Wenruo Nov. 6, 2022, 11:53 p.m. UTC
There is a regression in v6.1-rc kernel, which will prevent btrfs scrub
from detecting corruption (thus no repair either).

The regression is caused by commit 786672e9e1a3 ("btrfs: scrub: use
larger block size for data extent scrub").

The new test case will:

- Create a data extent with 2 sectors
- Corrupt the second sector of above data extent
- Scrub to make sure we detect the corruption

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/btrfs/278     | 66 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/278.out |  2 ++
 2 files changed, 68 insertions(+)
 create mode 100755 tests/btrfs/278
 create mode 100644 tests/btrfs/278.out

Comments

Zorro Lang Nov. 7, 2022, 2:58 a.m. UTC | #1
On Mon, Nov 07, 2022 at 07:53:48AM +0800, Qu Wenruo wrote:
> There is a regression in v6.1-rc kernel, which will prevent btrfs scrub
> from detecting corruption (thus no repair either).
> 
> The regression is caused by commit 786672e9e1a3 ("btrfs: scrub: use
> larger block size for data extent scrub").
> 
> The new test case will:
> 
> - Create a data extent with 2 sectors
> - Corrupt the second sector of above data extent
> - Scrub to make sure we detect the corruption
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  tests/btrfs/278     | 66 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/278.out |  2 ++

Hi,

btrfs/278 has been taken, please rebase on the latest for-next branch, or
use a big enough number.

>  2 files changed, 68 insertions(+)
>  create mode 100755 tests/btrfs/278
>  create mode 100644 tests/btrfs/278.out
> 
> diff --git a/tests/btrfs/278 b/tests/btrfs/278
> new file mode 100755
> index 00000000..ebbf207a
> --- /dev/null
> +++ b/tests/btrfs/278
> @@ -0,0 +1,66 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 278
> +#
> +# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
> +# larger block size for data extent scrub"), which makes btrfs scrub unable
> +# to detect corruption if it's not the first sector of an data extent.

So 786672e9e1a3 is the commit which brought in this regression issue? Is there
a fix patch or commit by reviewed?

> +#
> +. ./common/preamble
> +_begin_fstest auto quick scrub
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/btrfs

The common/btrfs is imported automatically, so don't need to import it at here.
And I think this case doesn't use any filter, if so, the common/filter isn't
needed either.

> +
> +# real QA test starts here
> +
> +# Modify as appropriate.

This comment can be removed.

> +_supported_fs btrfs
> +
> +# Need to use 4K as sector size
> +_require_btrfs_support_sectorsize 4096
> +_require_scratch
> +
> +_scratch_mkfs >> $seqres.full

Just check with you, do you need "-s 4k" so make sure sector size is 4k (even
if 4k is supported)?

Thanks,
Zorro

> +_scratch_mount
> +
> +# Create a data extent with 2 sectors
> +$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
> +sync
> +
> +first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
> +echo "logical of the first sector: $first_logical" >> $seqres.full
> +
> +second_logical=$(( $first_logical + 4096 ))
> +echo "logical of the second sector: $second_logical" >> $seqres.full
> +
> +second_physical=$(_btrfs_get_physical $second_logical 1)
> +echo "physical of the second sector: $second_physical" >> $seqres.full
> +
> +second_dev=$(_btrfs_get_device_path $second_logical 1)
> +echo "device of the second sector: $second_dev" >> $seqres.full
> +
> +_scratch_unmount
> +
> +# Corrupt the second sector of the data extent.
> +$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
> +_scratch_mount
> +
> +# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
> +# it will output an error message.
> +$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
> +cat $tmp.output >> $seqres.full
> +_scratch_unmount
> +
> +if ! grep -q "csum=1" $tmp.output; then
> +	echo "Scrub failed to detect corruption"
> +fi
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
> new file mode 100644
> index 00000000..b4c4a95d
> --- /dev/null
> +++ b/tests/btrfs/278.out
> @@ -0,0 +1,2 @@
> +QA output created by 278
> +Silence is golden
> -- 
> 2.38.0
>
Zorro Lang Nov. 7, 2022, 3:25 a.m. UTC | #2
On Mon, Nov 07, 2022 at 07:53:48AM +0800, Qu Wenruo wrote:
> There is a regression in v6.1-rc kernel, which will prevent btrfs scrub
> from detecting corruption (thus no repair either).
> 
> The regression is caused by commit 786672e9e1a3 ("btrfs: scrub: use
> larger block size for data extent scrub").
> 
> The new test case will:
> 
> - Create a data extent with 2 sectors
> - Corrupt the second sector of above data extent
> - Scrub to make sure we detect the corruption
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  tests/btrfs/278     | 66 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/278.out |  2 ++
>  2 files changed, 68 insertions(+)
>  create mode 100755 tests/btrfs/278
>  create mode 100644 tests/btrfs/278.out
> 
> diff --git a/tests/btrfs/278 b/tests/btrfs/278
> new file mode 100755
> index 00000000..ebbf207a
> --- /dev/null
> +++ b/tests/btrfs/278
> @@ -0,0 +1,66 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 278
> +#
> +# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
> +# larger block size for data extent scrub"), which makes btrfs scrub unable
> +# to detect corruption if it's not the first sector of an data extent.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick scrub
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/btrfs
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +
> +# Need to use 4K as sector size
> +_require_btrfs_support_sectorsize 4096
> +_require_scratch

Hi Darrick,

I noticed that you created some scrub helpers in common/fuzzy:
  # Do we have an online scrub program?
  _require_scrub() {
        case "${FSTYP}" in
        "xfs")
                test -x "$XFS_SCRUB_PROG" || _notrun "xfs_scrub not found"
                ;;
        *)
                _notrun "No online scrub program for ${FSTYP}."
                ;;
        esac
  }

  # Scrub the scratch filesystem metadata (online)
  _scratch_scrub() {
        case "${FSTYP}" in
        "xfs")
                $XFS_SCRUB_PROG -d -T -v "$@" $SCRATCH_MNT
                ;;
        *)
                _fail "No online scrub program for ${FSTYP}."
                ;;
        esac
  }

and common/xfs:
  _supports_xfs_scrub()

(PS: How about change _require_scrub to _require_scrub_command, and then calls
_supports_xfs_scrub in _require_scrub to check kernel part? Or combine kernel
and userspace checking all into _require_scrub? )

From the code logic, they're only support xfs now. But we can see btrfs support
scrub too. Did you plan to make them to be common helpers, or just for xfs fuzzy
test inside?

Hi btrfs folks, do you think if btrfs need _require_scrub and _scratch_scrub?

Thanks,
Zorro

> +
> +_scratch_mkfs >> $seqres.full
> +_scratch_mount
> +
> +# Create a data extent with 2 sectors
> +$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
> +sync
> +
> +first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
> +echo "logical of the first sector: $first_logical" >> $seqres.full
> +
> +second_logical=$(( $first_logical + 4096 ))
> +echo "logical of the second sector: $second_logical" >> $seqres.full
> +
> +second_physical=$(_btrfs_get_physical $second_logical 1)
> +echo "physical of the second sector: $second_physical" >> $seqres.full
> +
> +second_dev=$(_btrfs_get_device_path $second_logical 1)
> +echo "device of the second sector: $second_dev" >> $seqres.full
> +
> +_scratch_unmount
> +
> +# Corrupt the second sector of the data extent.
> +$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
> +_scratch_mount
> +
> +# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
> +# it will output an error message.
> +$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
> +cat $tmp.output >> $seqres.full
> +_scratch_unmount
> +
> +if ! grep -q "csum=1" $tmp.output; then
> +	echo "Scrub failed to detect corruption"
> +fi
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
> new file mode 100644
> index 00000000..b4c4a95d
> --- /dev/null
> +++ b/tests/btrfs/278.out
> @@ -0,0 +1,2 @@
> +QA output created by 278
> +Silence is golden
> -- 
> 2.38.0
>
Qu Wenruo Nov. 7, 2022, 4:41 a.m. UTC | #3
On 2022/11/7 10:58, Zorro Lang wrote:
> On Mon, Nov 07, 2022 at 07:53:48AM +0800, Qu Wenruo wrote:
>> There is a regression in v6.1-rc kernel, which will prevent btrfs scrub
>> from detecting corruption (thus no repair either).
>>
>> The regression is caused by commit 786672e9e1a3 ("btrfs: scrub: use
>> larger block size for data extent scrub").
>>
>> The new test case will:
>>
>> - Create a data extent with 2 sectors
>> - Corrupt the second sector of above data extent
>> - Scrub to make sure we detect the corruption
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   tests/btrfs/278     | 66 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/btrfs/278.out |  2 ++
> 
> Hi,
> 
> btrfs/278 has been taken, please rebase on the latest for-next branch, or
> use a big enough number.

I'm not sure if some one else has already complained about this, the 
idea of a for-next branch of a test suite is stupid (nor the weekly tags).

Fstests is a test suite, it is only for fs developers, I doubt why we 
need tags/for-next at all.

Remember, before the whole for-next/tags things, developers just 
checkout the master branch and go ./new, no need to bother the 
tags/branches at all.

> 
>>   2 files changed, 68 insertions(+)
>>   create mode 100755 tests/btrfs/278
>>   create mode 100644 tests/btrfs/278.out
>>
>> diff --git a/tests/btrfs/278 b/tests/btrfs/278
>> new file mode 100755
>> index 00000000..ebbf207a
>> --- /dev/null
>> +++ b/tests/btrfs/278
>> @@ -0,0 +1,66 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 278
>> +#
>> +# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
>> +# larger block size for data extent scrub"), which makes btrfs scrub unable
>> +# to detect corruption if it's not the first sector of an data extent.
> 
> So 786672e9e1a3 is the commit which brought in this regression issue? Is there
> a fix patch or commit by reviewed?

The fix (by reverting it) is send to btrfs mailing list, titled "Revert 
\"btrfs: scrub: use larger block size for data extent scrub\"".

> 
>> +#
>> +. ./common/preamble
>> +_begin_fstest auto quick scrub
>> +
>> +# Import common functions.
>> +. ./common/filter
>> +. ./common/btrfs
> 
> The common/btrfs is imported automatically, so don't need to import it at here.
> And I think this case doesn't use any filter, if so, the common/filter isn't
> needed either.
> 
>> +
>> +# real QA test starts here
>> +
>> +# Modify as appropriate.
> 
> This comment can be removed.

If you really believe removing those boilerplate makes much sense, then 
I'd say, the template should be updated to remove those completely.

> 
>> +_supported_fs btrfs
>> +
>> +# Need to use 4K as sector size
>> +_require_btrfs_support_sectorsize 4096
>> +_require_scratch
>> +
>> +_scratch_mkfs >> $seqres.full
> 
> Just check with you, do you need "-s 4k" so make sure sector size is 4k (even
> if 4k is supported)?

I'll add "-s 4k" to make it more explicit for systems with larger page 
sizes.

Thanks,
Qu

> 
> Thanks,
> Zorro
> 
>> +_scratch_mount
>> +
>> +# Create a data extent with 2 sectors
>> +$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
>> +sync
>> +
>> +first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
>> +echo "logical of the first sector: $first_logical" >> $seqres.full
>> +
>> +second_logical=$(( $first_logical + 4096 ))
>> +echo "logical of the second sector: $second_logical" >> $seqres.full
>> +
>> +second_physical=$(_btrfs_get_physical $second_logical 1)
>> +echo "physical of the second sector: $second_physical" >> $seqres.full
>> +
>> +second_dev=$(_btrfs_get_device_path $second_logical 1)
>> +echo "device of the second sector: $second_dev" >> $seqres.full
>> +
>> +_scratch_unmount
>> +
>> +# Corrupt the second sector of the data extent.
>> +$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
>> +_scratch_mount
>> +
>> +# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
>> +# it will output an error message.
>> +$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
>> +cat $tmp.output >> $seqres.full
>> +_scratch_unmount
>> +
>> +if ! grep -q "csum=1" $tmp.output; then
>> +	echo "Scrub failed to detect corruption"
>> +fi
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
>> new file mode 100644
>> index 00000000..b4c4a95d
>> --- /dev/null
>> +++ b/tests/btrfs/278.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 278
>> +Silence is golden
>> -- 
>> 2.38.0
>>
>
Zorro Lang Nov. 7, 2022, 6:13 a.m. UTC | #4
On Mon, Nov 07, 2022 at 12:41:34PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/11/7 10:58, Zorro Lang wrote:
> > On Mon, Nov 07, 2022 at 07:53:48AM +0800, Qu Wenruo wrote:
> > > There is a regression in v6.1-rc kernel, which will prevent btrfs scrub
> > > from detecting corruption (thus no repair either).
> > > 
> > > The regression is caused by commit 786672e9e1a3 ("btrfs: scrub: use
> > > larger block size for data extent scrub").
> > > 
> > > The new test case will:
> > > 
> > > - Create a data extent with 2 sectors
> > > - Corrupt the second sector of above data extent
> > > - Scrub to make sure we detect the corruption
> > > 
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > >   tests/btrfs/278     | 66 +++++++++++++++++++++++++++++++++++++++++++++
> > >   tests/btrfs/278.out |  2 ++
> > 
> > Hi,
> > 
> > btrfs/278 has been taken, please rebase on the latest for-next branch, or
> > use a big enough number.
> 
> I'm not sure if some one else has already complained about this, the idea of
> a for-next branch of a test suite is stupid (nor the weekly tags).
> 
> Fstests is a test suite, it is only for fs developers, I doubt why we need
> tags/for-next at all.

1)
I think fstests isn't only for developers, there're many people, they even
never contributed to fstests or linux, but they still need fstests to do
sanity/regression test daily/weekly for some downstream things.

2)
I don't doubt the tags are useless for most users, but as I know some people
enjoy it. And the tags don't bring in troubles to others don't use it, so why
can't I give it tags?

3)
Although fstests is a test tool, but it still might bring in big/small
regression issues. Someone might submit lots of testing jobs after push some
changes, and happy to sleep or enjoy their weekend (wait the test done).
But when they come back, they find all test jobs are broken due to fstests
regression, they get nothing valid testing results. How despondent are they?

Some testing jobs are just automatical testing, they don't want to deal with
fstests issues manually. So the master branch trys to keep stable. The patches
will be in for-next branch at first, then merge onto master branch if no one
complains these patches.

The for-next is for fstests developers, and other developers who always want
to use lastest update, and don't mind (even enjoy) dealing with fstests issues
(if there's) manually. The master branch is for the ones who just hope to get
a stabler (relative) for their automatical testing frame.

> 
> Remember, before the whole for-next/tags things, developers just checkout
> the master branch and go ./new, no need to bother the tags/branches at all.

Now you can treat for-next branch as master branch you used before. As your
using habits, you can always base on for-next branch, master branch isn't for
developers as you now.

> 
> > 
> > >   2 files changed, 68 insertions(+)
> > >   create mode 100755 tests/btrfs/278
> > >   create mode 100644 tests/btrfs/278.out
> > > 
> > > diff --git a/tests/btrfs/278 b/tests/btrfs/278
> > > new file mode 100755
> > > index 00000000..ebbf207a
> > > --- /dev/null
> > > +++ b/tests/btrfs/278
> > > @@ -0,0 +1,66 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FS QA Test 278
> > > +#
> > > +# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
> > > +# larger block size for data extent scrub"), which makes btrfs scrub unable
> > > +# to detect corruption if it's not the first sector of an data extent.
> > 
> > So 786672e9e1a3 is the commit which brought in this regression issue? Is there
> > a fix patch or commit by reviewed?
> 
> The fix (by reverting it) is send to btrfs mailing list, titled "Revert
> \"btrfs: scrub: use larger block size for data extent scrub\"".

OK, thanks.

> 
> > 
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto quick scrub
> > > +
> > > +# Import common functions.
> > > +. ./common/filter
> > > +. ./common/btrfs
> > 
> > The common/btrfs is imported automatically, so don't need to import it at here.
> > And I think this case doesn't use any filter, if so, the common/filter isn't
> > needed either.
> > 
> > > +
> > > +# real QA test starts here
> > > +
> > > +# Modify as appropriate.
> > 
> > This comment can be removed.
> 
> If you really believe removing those boilerplate makes much sense, then I'd
> say, the template should be updated to remove those completely.

Hmm... there're many comments in template should be removed when we start to
write a new case. Most of them are always removed properly, only this line
always be missed.

I don't doubt the template can be improved. But it's a tiny problem, so I
planned to improve that next time when we have chance to improve the template.
Before that, if a patch is all good, only this single line can be removed, I'll
ignore it. If a patch need to change, I'll point out this line incidentally :)

> 
> > 
> > > +_supported_fs btrfs
> > > +
> > > +# Need to use 4K as sector size
> > > +_require_btrfs_support_sectorsize 4096
> > > +_require_scratch
> > > +
> > > +_scratch_mkfs >> $seqres.full
> > 
> > Just check with you, do you need "-s 4k" so make sure sector size is 4k (even
> > if 4k is supported)?
> 
> I'll add "-s 4k" to make it more explicit for systems with larger page
> sizes.

Thanks,
Zorro

> 
> Thanks,
> Qu
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > +_scratch_mount
> > > +
> > > +# Create a data extent with 2 sectors
> > > +$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
> > > +sync
> > > +
> > > +first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
> > > +echo "logical of the first sector: $first_logical" >> $seqres.full
> > > +
> > > +second_logical=$(( $first_logical + 4096 ))
> > > +echo "logical of the second sector: $second_logical" >> $seqres.full
> > > +
> > > +second_physical=$(_btrfs_get_physical $second_logical 1)
> > > +echo "physical of the second sector: $second_physical" >> $seqres.full
> > > +
> > > +second_dev=$(_btrfs_get_device_path $second_logical 1)
> > > +echo "device of the second sector: $second_dev" >> $seqres.full
> > > +
> > > +_scratch_unmount
> > > +
> > > +# Corrupt the second sector of the data extent.
> > > +$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
> > > +_scratch_mount
> > > +
> > > +# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
> > > +# it will output an error message.
> > > +$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
> > > +cat $tmp.output >> $seqres.full
> > > +_scratch_unmount
> > > +
> > > +if ! grep -q "csum=1" $tmp.output; then
> > > +	echo "Scrub failed to detect corruption"
> > > +fi
> > > +
> > > +echo "Silence is golden"
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
> > > new file mode 100644
> > > index 00000000..b4c4a95d
> > > --- /dev/null
> > > +++ b/tests/btrfs/278.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 278
> > > +Silence is golden
> > > -- 
> > > 2.38.0
> > > 
> > 
>
Qu Wenruo Nov. 7, 2022, 7:05 a.m. UTC | #5
On 2022/11/7 14:13, Zorro Lang wrote:
> On Mon, Nov 07, 2022 at 12:41:34PM +0800, Qu Wenruo wrote:
>>
>>
[...]
>>>
>>> btrfs/278 has been taken, please rebase on the latest for-next branch, or
>>> use a big enough number.
>>
>> I'm not sure if some one else has already complained about this, the idea of
>> a for-next branch of a test suite is stupid (nor the weekly tags).
>>
>> Fstests is a test suite, it is only for fs developers, I doubt why we need
>> tags/for-next at all.
> 
> 1)
> I think fstests isn't only for developers, there're many people, they even
> never contributed to fstests or linux, but they still need fstests to do
> sanity/regression test daily/weekly for some downstream things.
> 
> 2)
> I don't doubt the tags are useless for most users, but as I know some people
> enjoy it. And the tags don't bring in troubles to others don't use it, so why
> can't I give it tags?

Tags are providing a stable anchor for downsteam QAers to rely on 
certain tags.
Which is a bad behavior and should be discouraged.

The best way to discourage this thing is, just no tags, no branches, 
forcing them to go the latest master.

Even for downstream QA, they should use the latest master.
They should have their own systems to detect new tests, and if new tests 
fails, they really need to understand why, and better to submit a patch 
to fix fstests.

> 
> 3)
> Although fstests is a test tool, but it still might bring in big/small
> regression issues. Someone might submit lots of testing jobs after push some
> changes, and happy to sleep or enjoy their weekend (wait the test done).
> But when they come back, they find all test jobs are broken due to fstests
> regression, they get nothing valid testing results. How despondent are they?

This is a bad behavior already, passing tests means nothing.
Only failed tests are good ones, even if it's a false alert.

Such failures drives us to improve the project.

Especially this already means they may miss some high priority regressions.

> 
> Some testing jobs are just automatical testing, they don't want to deal with
> fstests issues manually. So the master branch trys to keep stable. The patches
> will be in for-next branch at first, then merge onto master branch if no one
> complains these patches.

That's a bad argument, convenience should not have a higher priority 
than possible regressions/bugs.

And if QA guys want a stable basis for their downstream tests, they 
should contribute to fstests, not relying on old tags to give them a 
false joy of passing tests.

At least from my limited understanding, more commits to upstream 
projects would always be a plus for QA people.

> 
> The for-next is for fstests developers, and other developers who always want
> to use lastest update, and don't mind (even enjoy) dealing with fstests issues
> (if there's) manually. The master branch is for the ones who just hope to get
> a stabler (relative) for their automatical testing frame.

It's fine to not update fstests hourly, but relying on an old 
branch/tags for months just to make the existing ones to pass, no, this 
is definitely not the way to go.

> 
>>
>> Remember, before the whole for-next/tags things, developers just checkout
>> the master branch and go ./new, no need to bother the tags/branches at all.
> 
> Now you can treat for-next branch as master branch you used before. As your
> using habits, you can always base on for-next branch, master branch isn't for
> developers as you now.

Another point is, if you don't plan to backport tests for older tags, 
just don't keep those tags, it makes no help.
Without tags, for-next doesn't make much sense either.

If some users don't want cutting edges, they always have the choice to 
not update their local fstests that frequently (but hopefully not for 
too long time).

But still, just one master branch is a very strong gesture to discourage 
incorrect QA behavior.

> 
>>
>>>
>>>>    2 files changed, 68 insertions(+)
>>>>    create mode 100755 tests/btrfs/278
>>>>    create mode 100644 tests/btrfs/278.out
>>>>
>>>> diff --git a/tests/btrfs/278 b/tests/btrfs/278
>>>> new file mode 100755
>>>> index 00000000..ebbf207a
>>>> --- /dev/null
>>>> +++ b/tests/btrfs/278
>>>> @@ -0,0 +1,66 @@
>>>> +#! /bin/bash
>>>> +# SPDX-License-Identifier: GPL-2.0
>>>> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
>>>> +#
>>>> +# FS QA Test 278
>>>> +#
>>>> +# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
>>>> +# larger block size for data extent scrub"), which makes btrfs scrub unable
>>>> +# to detect corruption if it's not the first sector of an data extent.
>>>
>>> So 786672e9e1a3 is the commit which brought in this regression issue? Is there
>>> a fix patch or commit by reviewed?
>>
>> The fix (by reverting it) is send to btrfs mailing list, titled "Revert
>> \"btrfs: scrub: use larger block size for data extent scrub\"".
> 
> OK, thanks.
> 
>>
>>>
>>>> +#
>>>> +. ./common/preamble
>>>> +_begin_fstest auto quick scrub
>>>> +
>>>> +# Import common functions.
>>>> +. ./common/filter
>>>> +. ./common/btrfs
>>>
>>> The common/btrfs is imported automatically, so don't need to import it at here.
>>> And I think this case doesn't use any filter, if so, the common/filter isn't
>>> needed either.
>>>
>>>> +
>>>> +# real QA test starts here
>>>> +
>>>> +# Modify as appropriate.
>>>
>>> This comment can be removed.
>>
>> If you really believe removing those boilerplate makes much sense, then I'd
>> say, the template should be updated to remove those completely.
> 
> Hmm... there're many comments in template should be removed when we start to
> write a new case. Most of them are always removed properly, only this line
> always be missed.
> 
> I don't doubt the template can be improved. But it's a tiny problem, so I
> planned to improve that next time when we have chance to improve the template.
> Before that, if a patch is all good, only this single line can be removed, I'll
> ignore it. If a patch need to change, I'll point out this line incidentally :)

Another thing is, such nitpicking is a little too common.

 From my limited experience of previous maintainer Guan, he would either 
just fix it when merging.

Which is mostly inline with the fs maintainers, and makes the whole 
process so smooth.

I really appreciate the maintenance effort and valuable comments (like 
the one for "-s 4k"), but these comments on something from the template?
No, those makes no sense.

Thanks,
Qu

> 
>>
>>>
>>>> +_supported_fs btrfs
>>>> +
>>>> +# Need to use 4K as sector size
>>>> +_require_btrfs_support_sectorsize 4096
>>>> +_require_scratch
>>>> +
>>>> +_scratch_mkfs >> $seqres.full
>>>
>>> Just check with you, do you need "-s 4k" so make sure sector size is 4k (even
>>> if 4k is supported)?
>>
>> I'll add "-s 4k" to make it more explicit for systems with larger page
>> sizes.
> 
> Thanks,
> Zorro
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> Thanks,
>>> Zorro
>>>
>>>> +_scratch_mount
>>>> +
>>>> +# Create a data extent with 2 sectors
>>>> +$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
>>>> +sync
>>>> +
>>>> +first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
>>>> +echo "logical of the first sector: $first_logical" >> $seqres.full
>>>> +
>>>> +second_logical=$(( $first_logical + 4096 ))
>>>> +echo "logical of the second sector: $second_logical" >> $seqres.full
>>>> +
>>>> +second_physical=$(_btrfs_get_physical $second_logical 1)
>>>> +echo "physical of the second sector: $second_physical" >> $seqres.full
>>>> +
>>>> +second_dev=$(_btrfs_get_device_path $second_logical 1)
>>>> +echo "device of the second sector: $second_dev" >> $seqres.full
>>>> +
>>>> +_scratch_unmount
>>>> +
>>>> +# Corrupt the second sector of the data extent.
>>>> +$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
>>>> +_scratch_mount
>>>> +
>>>> +# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
>>>> +# it will output an error message.
>>>> +$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
>>>> +cat $tmp.output >> $seqres.full
>>>> +_scratch_unmount
>>>> +
>>>> +if ! grep -q "csum=1" $tmp.output; then
>>>> +	echo "Scrub failed to detect corruption"
>>>> +fi
>>>> +
>>>> +echo "Silence is golden"
>>>> +
>>>> +# success, all done
>>>> +status=0
>>>> +exit
>>>> diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
>>>> new file mode 100644
>>>> index 00000000..b4c4a95d
>>>> --- /dev/null
>>>> +++ b/tests/btrfs/278.out
>>>> @@ -0,0 +1,2 @@
>>>> +QA output created by 278
>>>> +Silence is golden
>>>> -- 
>>>> 2.38.0
>>>>
>>>
>>
>
Filipe Manana Nov. 7, 2022, 8:42 a.m. UTC | #6
On Mon, Nov 7, 2022 at 3:36 AM Zorro Lang <zlang@redhat.com> wrote:
>
> On Mon, Nov 07, 2022 at 07:53:48AM +0800, Qu Wenruo wrote:
> > There is a regression in v6.1-rc kernel, which will prevent btrfs scrub
> > from detecting corruption (thus no repair either).
> >
> > The regression is caused by commit 786672e9e1a3 ("btrfs: scrub: use
> > larger block size for data extent scrub").
> >
> > The new test case will:
> >
> > - Create a data extent with 2 sectors
> > - Corrupt the second sector of above data extent
> > - Scrub to make sure we detect the corruption
> >
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > ---
> >  tests/btrfs/278     | 66 +++++++++++++++++++++++++++++++++++++++++++++
> >  tests/btrfs/278.out |  2 ++
> >  2 files changed, 68 insertions(+)
> >  create mode 100755 tests/btrfs/278
> >  create mode 100644 tests/btrfs/278.out
> >
> > diff --git a/tests/btrfs/278 b/tests/btrfs/278
> > new file mode 100755
> > index 00000000..ebbf207a
> > --- /dev/null
> > +++ b/tests/btrfs/278
> > @@ -0,0 +1,66 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> > +#
> > +# FS QA Test 278
> > +#
> > +# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
> > +# larger block size for data extent scrub"), which makes btrfs scrub unable
> > +# to detect corruption if it's not the first sector of an data extent.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick scrub
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/btrfs
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +
> > +# Need to use 4K as sector size
> > +_require_btrfs_support_sectorsize 4096
> > +_require_scratch
>
> Hi Darrick,
>
> I noticed that you created some scrub helpers in common/fuzzy:
>   # Do we have an online scrub program?
>   _require_scrub() {
>         case "${FSTYP}" in
>         "xfs")
>                 test -x "$XFS_SCRUB_PROG" || _notrun "xfs_scrub not found"
>                 ;;
>         *)
>                 _notrun "No online scrub program for ${FSTYP}."
>                 ;;
>         esac
>   }
>
>   # Scrub the scratch filesystem metadata (online)
>   _scratch_scrub() {
>         case "${FSTYP}" in
>         "xfs")
>                 $XFS_SCRUB_PROG -d -T -v "$@" $SCRATCH_MNT
>                 ;;
>         *)
>                 _fail "No online scrub program for ${FSTYP}."
>                 ;;
>         esac
>   }
>
> and common/xfs:
>   _supports_xfs_scrub()
>
> (PS: How about change _require_scrub to _require_scrub_command, and then calls
> _supports_xfs_scrub in _require_scrub to check kernel part? Or combine kernel
> and userspace checking all into _require_scrub? )
>
> From the code logic, they're only support xfs now. But we can see btrfs support
> scrub too. Did you plan to make them to be common helpers, or just for xfs fuzzy
> test inside?
>
> Hi btrfs folks, do you think if btrfs need _require_scrub and _scratch_scrub?

I don't think that provides any value for btrfs.

The scrub feature has existed for well over a decade, and I don't
think anyone is running fstests with kernels and btrfs-progs versions
that don't have scrub.
Even SLE11-SP2+ kernels (first SUSE enterprise distros supporting
btrfs) have scrub...

Plus I don't recall ever anyone complaining that fstests failed
because the underlying kernel or btrfs-progs had no support for scrub.


>
> Thanks,
> Zorro
>
> > +
> > +_scratch_mkfs >> $seqres.full
> > +_scratch_mount
> > +
> > +# Create a data extent with 2 sectors
> > +$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
> > +sync
> > +
> > +first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
> > +echo "logical of the first sector: $first_logical" >> $seqres.full
> > +
> > +second_logical=$(( $first_logical + 4096 ))
> > +echo "logical of the second sector: $second_logical" >> $seqres.full
> > +
> > +second_physical=$(_btrfs_get_physical $second_logical 1)
> > +echo "physical of the second sector: $second_physical" >> $seqres.full
> > +
> > +second_dev=$(_btrfs_get_device_path $second_logical 1)
> > +echo "device of the second sector: $second_dev" >> $seqres.full
> > +
> > +_scratch_unmount
> > +
> > +# Corrupt the second sector of the data extent.
> > +$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
> > +_scratch_mount
> > +
> > +# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
> > +# it will output an error message.
> > +$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
> > +cat $tmp.output >> $seqres.full
> > +_scratch_unmount
> > +
> > +if ! grep -q "csum=1" $tmp.output; then
> > +     echo "Scrub failed to detect corruption"
> > +fi
> > +
> > +echo "Silence is golden"
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
> > new file mode 100644
> > index 00000000..b4c4a95d
> > --- /dev/null
> > +++ b/tests/btrfs/278.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 278
> > +Silence is golden
> > --
> > 2.38.0
> >
>
Zorro Lang Nov. 7, 2022, 9:06 a.m. UTC | #7
On Mon, Nov 07, 2022 at 08:42:17AM +0000, Filipe Manana wrote:
> On Mon, Nov 7, 2022 at 3:36 AM Zorro Lang <zlang@redhat.com> wrote:
> >
> > On Mon, Nov 07, 2022 at 07:53:48AM +0800, Qu Wenruo wrote:
> > > There is a regression in v6.1-rc kernel, which will prevent btrfs scrub
> > > from detecting corruption (thus no repair either).
> > >
> > > The regression is caused by commit 786672e9e1a3 ("btrfs: scrub: use
> > > larger block size for data extent scrub").
> > >
> > > The new test case will:
> > >
> > > - Create a data extent with 2 sectors
> > > - Corrupt the second sector of above data extent
> > > - Scrub to make sure we detect the corruption
> > >
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > >  tests/btrfs/278     | 66 +++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/btrfs/278.out |  2 ++
> > >  2 files changed, 68 insertions(+)
> > >  create mode 100755 tests/btrfs/278
> > >  create mode 100644 tests/btrfs/278.out
> > >
> > > diff --git a/tests/btrfs/278 b/tests/btrfs/278
> > > new file mode 100755
> > > index 00000000..ebbf207a
> > > --- /dev/null
> > > +++ b/tests/btrfs/278
> > > @@ -0,0 +1,66 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FS QA Test 278
> > > +#
> > > +# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
> > > +# larger block size for data extent scrub"), which makes btrfs scrub unable
> > > +# to detect corruption if it's not the first sector of an data extent.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto quick scrub
> > > +
> > > +# Import common functions.
> > > +. ./common/filter
> > > +. ./common/btrfs
> > > +
> > > +# real QA test starts here
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs btrfs
> > > +
> > > +# Need to use 4K as sector size
> > > +_require_btrfs_support_sectorsize 4096
> > > +_require_scratch
> >
> > Hi Darrick,
> >
> > I noticed that you created some scrub helpers in common/fuzzy:
> >   # Do we have an online scrub program?
> >   _require_scrub() {
> >         case "${FSTYP}" in
> >         "xfs")
> >                 test -x "$XFS_SCRUB_PROG" || _notrun "xfs_scrub not found"
> >                 ;;
> >         *)
> >                 _notrun "No online scrub program for ${FSTYP}."
> >                 ;;
> >         esac
> >   }
> >
> >   # Scrub the scratch filesystem metadata (online)
> >   _scratch_scrub() {
> >         case "${FSTYP}" in
> >         "xfs")
> >                 $XFS_SCRUB_PROG -d -T -v "$@" $SCRATCH_MNT
> >                 ;;
> >         *)
> >                 _fail "No online scrub program for ${FSTYP}."
> >                 ;;
> >         esac
> >   }
> >
> > and common/xfs:
> >   _supports_xfs_scrub()
> >
> > (PS: How about change _require_scrub to _require_scrub_command, and then calls
> > _supports_xfs_scrub in _require_scrub to check kernel part? Or combine kernel
> > and userspace checking all into _require_scrub? )
> >
> > From the code logic, they're only support xfs now. But we can see btrfs support
> > scrub too. Did you plan to make them to be common helpers, or just for xfs fuzzy
> > test inside?
> >
> > Hi btrfs folks, do you think if btrfs need _require_scrub and _scratch_scrub?
> 
> I don't think that provides any value for btrfs.
> 
> The scrub feature has existed for well over a decade, and I don't
> think anyone is running fstests with kernels and btrfs-progs versions
> that don't have scrub.
> Even SLE11-SP2+ kernels (first SUSE enterprise distros supporting
> btrfs) have scrub...
> 
> Plus I don't recall ever anyone complaining that fstests failed
> because the underlying kernel or btrfs-progs had no support for scrub.

Sure, thanks for your feedback! I'll think/talk about if we should change
that _require_scrub and _scratch_scrub to be more common helpers for fs which
has supported or will support scrub feature. I'll cc btrfs-list if I touch
btrfs cases :)

Thanks,
Zorro

> 
> 
> >
> > Thanks,
> > Zorro
> >
> > > +
> > > +_scratch_mkfs >> $seqres.full
> > > +_scratch_mount
> > > +
> > > +# Create a data extent with 2 sectors
> > > +$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
> > > +sync
> > > +
> > > +first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
> > > +echo "logical of the first sector: $first_logical" >> $seqres.full
> > > +
> > > +second_logical=$(( $first_logical + 4096 ))
> > > +echo "logical of the second sector: $second_logical" >> $seqres.full
> > > +
> > > +second_physical=$(_btrfs_get_physical $second_logical 1)
> > > +echo "physical of the second sector: $second_physical" >> $seqres.full
> > > +
> > > +second_dev=$(_btrfs_get_device_path $second_logical 1)
> > > +echo "device of the second sector: $second_dev" >> $seqres.full
> > > +
> > > +_scratch_unmount
> > > +
> > > +# Corrupt the second sector of the data extent.
> > > +$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
> > > +_scratch_mount
> > > +
> > > +# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
> > > +# it will output an error message.
> > > +$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
> > > +cat $tmp.output >> $seqres.full
> > > +_scratch_unmount
> > > +
> > > +if ! grep -q "csum=1" $tmp.output; then
> > > +     echo "Scrub failed to detect corruption"
> > > +fi
> > > +
> > > +echo "Silence is golden"
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
> > > new file mode 100644
> > > index 00000000..b4c4a95d
> > > --- /dev/null
> > > +++ b/tests/btrfs/278.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 278
> > > +Silence is golden
> > > --
> > > 2.38.0
> > >
> >
>
diff mbox series

Patch

diff --git a/tests/btrfs/278 b/tests/btrfs/278
new file mode 100755
index 00000000..ebbf207a
--- /dev/null
+++ b/tests/btrfs/278
@@ -0,0 +1,66 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 278
+#
+# A regression test for offending commit 786672e9e1a3 ("btrfs: scrub: use
+# larger block size for data extent scrub"), which makes btrfs scrub unable
+# to detect corruption if it's not the first sector of an data extent.
+#
+. ./common/preamble
+_begin_fstest auto quick scrub
+
+# Import common functions.
+. ./common/filter
+. ./common/btrfs
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+
+# Need to use 4K as sector size
+_require_btrfs_support_sectorsize 4096
+_require_scratch
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+# Create a data extent with 2 sectors
+$XFS_IO_PROG -fc "pwrite -S 0xff 0 8k" $SCRATCH_MNT/foobar >> $seqres.full
+sync
+
+first_logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
+echo "logical of the first sector: $first_logical" >> $seqres.full
+
+second_logical=$(( $first_logical + 4096 ))
+echo "logical of the second sector: $second_logical" >> $seqres.full
+
+second_physical=$(_btrfs_get_physical $second_logical 1)
+echo "physical of the second sector: $second_physical" >> $seqres.full
+
+second_dev=$(_btrfs_get_device_path $second_logical 1)
+echo "device of the second sector: $second_dev" >> $seqres.full
+
+_scratch_unmount
+
+# Corrupt the second sector of the data extent.
+$XFS_IO_PROG -c "pwrite -S 0x00 $second_physical 4k" $second_dev >> $seqres.full
+_scratch_mount
+
+# Redirect stderr and stdout, as if btrfs detected the unrepairable corruption,
+# it will output an error message.
+$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT &> $tmp.output
+cat $tmp.output >> $seqres.full
+_scratch_unmount
+
+if ! grep -q "csum=1" $tmp.output; then
+	echo "Scrub failed to detect corruption"
+fi
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/278.out b/tests/btrfs/278.out
new file mode 100644
index 00000000..b4c4a95d
--- /dev/null
+++ b/tests/btrfs/278.out
@@ -0,0 +1,2 @@ 
+QA output created by 278
+Silence is golden