generic: check if one fs can detect damage at/after fs thaw

Message ID	20221019052955.30484-1-wqu@suse.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <fstests-owner@kernel.org> From: Qu Wenruo <wqu@suse.com> To: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org Subject: [PATCH] generic: check if one fs can detect damage at/after fs thaw Date: Wed, 19 Oct 2022 13:29:55 +0800 Message-Id: <20221019052955.30484-1-wqu@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	generic: check if one fs can detect damage at/after fs thaw \| expand generic: check if one fs can detect damage at/after fs thaw

Qu Wenruo Oct. 19, 2022, 5:29 a.m. UTC

[BACKGROUND]
There is bug report from btrfs mailing list that, hiberation can allow
one to modify the frozen filesystem unexpectedly (using another OS).
(https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)

Later btrfs adds the check to make sure the fs is not changed
unexpectedly, to prevent corruption from happening.

[TESTCASE]
Here the new test case will create a basic filesystem, fill it with
something by using fsstress, then sync the fs, and finally freeze the fs.

Then corrupt the whole fs by overwriting the block device with 0xcd
(default seed from xfs_io pwrite command).

Finally we thaw the fs, and try if we can create a new file.

for EXT4, it will detect the corruption at touch time, causing -EUCLEAN.

For Btrfs, it will detect the corruption at thaw time, marking the
fs RO immediately, and later touch will return -EROFS.

For XFS, it will detect the corruption at touch time, return -EUCLEAN.
(Without the cache drop, XFS seems to be very happy using the cache info
to do the work without any error though.)

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/generic/702     | 61 +++++++++++++++++++++++++++++++++++++++++++
 tests/generic/702.out |  2 ++
 2 files changed, 63 insertions(+)
 create mode 100755 tests/generic/702
 create mode 100644 tests/generic/702.out

Darrick J. Wong Oct. 19, 2022, 3:36 p.m. UTC | #1

On Wed, Oct 19, 2022 at 01:29:55PM +0800, Qu Wenruo wrote:
> [BACKGROUND]
> There is bug report from btrfs mailing list that, hiberation can allow

"hibernation".

> one to modify the frozen filesystem unexpectedly (using another OS).
> (https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)
> 
> Later btrfs adds the check to make sure the fs is not changed
> unexpectedly, to prevent corruption from happening.
> 
> [TESTCASE]
> Here the new test case will create a basic filesystem, fill it with
> something by using fsstress, then sync the fs, and finally freeze the fs.
> 
> Then corrupt the whole fs by overwriting the block device with 0xcd
> (default seed from xfs_io pwrite command).
> 
> Finally we thaw the fs, and try if we can create a new file.
> 
> for EXT4, it will detect the corruption at touch time, causing -EUCLEAN.

Heh, yikes.  That's pretty scary for ext4 since it still uses buffer
heads from the block device to read/store metadata and older kernels are
known to have crashing problems if (say) the feature bits in the primary
superblock get changed.

I wonder if this should force errors=remount-ro for ext4 since
errors=continue is dangerous and erorrs=panic will crash the test
machine.

> For Btrfs, it will detect the corruption at thaw time, marking the
> fs RO immediately, and later touch will return -EROFS.

What /does/ btrfs check, specifically?  Reading this makes me wonder if
xfs shouldn't re-read its primary super on thaw to check that nobody ran
us over with a backhoe, though that wouldn't help us in the hibernation
case.  (Or does it?  Is userspace/systemd finally smart enough to freeze
filesystems?)

> For XFS, it will detect the corruption at touch time, return -EUCLEAN.
> (Without the cache drop, XFS seems to be very happy using the cache info
> to do the work without any error though.)

Yep.

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  tests/generic/702     | 61 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/702.out |  2 ++
>  2 files changed, 63 insertions(+)
>  create mode 100755 tests/generic/702
>  create mode 100644 tests/generic/702.out
> 
> diff --git a/tests/generic/702 b/tests/generic/702
> new file mode 100755
> index 00000000..fc3624e1
> --- /dev/null
> +++ b/tests/generic/702
> @@ -0,0 +1,61 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 702
> +#
> +# Test if the filesystem can detect the underlying disk has changed at
> +# thaw time.
> +#
> +. ./common/preamble
> +. ./common/filter
> +_begin_fstest freeze quick
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +_fixed_by_kernel_commit a05d3c915314 \
> +	"btrfs: check superblock to ensure the fs was not modified at thaw time"

Hmmm, it's not very useful for a test failure on (say) xfs spitting
out a message about how this "may" get fixed with a btrfs patch.  How
about:

$FSTYP = btrfs && _fixed_by_kernel_commit a05d3c915314 \
	"btrfs: check superbloc..."

> +
> +# We will corrupt the device completely, thus should not check it after the test.
> +_require_scratch_nocheck
> +_require_freeze
> +
> +# Limit the fs to 512M so we won't waste too much time screwing it up later.
> +_scratch_mkfs_sized $((512 * 1024 * 1024)) >> $seqres.full 2>&1
> +_scratch_mount
> +
> +# Populate the fs with something.
> +$FSSTRESS_PROG -n 500 -d $SCRATCH_MNT >> $seqres.full
> +
> +# Sync to make sure no dirty journal
> +sync
> +
> +# Drop all cache, so later write will need to read from disk, increasing
> +# the chance of detecting the corruption.
> +echo 3 > /proc/sys/vm/drop_caches
> +
> +$XFS_IO_PROG -x -c "freeze" $SCRATCH_MNT
> +
> +# Now screw up the block device
> +$XFS_IO_PROG -f -c "pwrite 0 512M" -c sync $SCRATCH_DEV >> $seqres.full

directio and a larger buffer size to speed this up? e.g.

$XFS_IO_PROG -d -c 'pwrite -b 1m 0 512M' -c sync $SCRATCH_DEV

> +
> +# Thaw the fs, it may or may not report error, we will check it manually later.
> +$XFS_IO_PROG -x -c "thaw" $SCRATCH_MNT

I'm a little surprised you don't check for btrfs returning an error
here...?

> +# If the fs detects something wrong, it should trigger error now.
> +# We don't use the error message as golden output, as btrfs and ext4 use
> +# different error number for different reasons.
> +# (btrfs detects the change immediately at thaw time and mark the fs RO, thus
> +#  touch returns -EROFS, while ext4 detects the change at journal write time,
> +#  returning -EUCLEAN).
> +touch $SCRATCH_MNT/foobar >>$seqres.full 2>&1
> +if [ $? -eq 0 ]; then
> +	echo "Failed to detect corrupted fs"
> +else
> +	echo "Detected corrupted fs (expected)"
> +fi

But otherwise this test looks reasonable so far.

--D

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/702.out b/tests/generic/702.out
> new file mode 100644
> index 00000000..c29311ff
> --- /dev/null
> +++ b/tests/generic/702.out
> @@ -0,0 +1,2 @@
> +QA output created by 702
> +Detected corrupted fs (expected)
> -- 
> 2.38.0
>

Qu Wenruo Oct. 19, 2022, 10:52 p.m. UTC | #2

On 2022/10/19 23:36, Darrick J. Wong wrote:
> On Wed, Oct 19, 2022 at 01:29:55PM +0800, Qu Wenruo wrote:
>> [BACKGROUND]
>> There is bug report from btrfs mailing list that, hiberation can allow
>
> "hibernation".
>
>> one to modify the frozen filesystem unexpectedly (using another OS).
>> (https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)
>>
>> Later btrfs adds the check to make sure the fs is not changed
>> unexpectedly, to prevent corruption from happening.
>>
>> [TESTCASE]
>> Here the new test case will create a basic filesystem, fill it with
>> something by using fsstress, then sync the fs, and finally freeze the fs.
>>
>> Then corrupt the whole fs by overwriting the block device with 0xcd
>> (default seed from xfs_io pwrite command).
>>
>> Finally we thaw the fs, and try if we can create a new file.
>>
>> for EXT4, it will detect the corruption at touch time, causing -EUCLEAN.
>
> Heh, yikes.  That's pretty scary for ext4 since it still uses buffer
> heads from the block device to read/store metadata and older kernels are
> known to have crashing problems if (say) the feature bits in the primary
> superblock get changed.
>
> I wonder if this should force errors=remount-ro for ext4 since
> errors=continue is dangerous and erorrs=panic will crash the test
> machine.
>
>> For Btrfs, it will detect the corruption at thaw time, marking the
>> fs RO immediately, and later touch will return -EROFS.
>
> What /does/ btrfs check, specifically?

- Read sb without using cache

- The same mount time sanity checks on the superblock
   Which already implies an fsid check.

- Extra generation check
   To make sure no one has touched out cake.

>  Reading this makes me wonder if
> xfs shouldn't re-read its primary super on thaw to check that nobody ran
> us over with a backhoe, though that wouldn't help us in the hibernation
> case.  (Or does it?  Is userspace/systemd finally smart enough to freeze
> filesystems?)

I doubt if userspace/systemd is that smart, because the error report is
running not-that-old distro.

Especially for hibernation there is really no way for anyone to know if
our cakes are touched.

>
>> For XFS, it will detect the corruption at touch time, return -EUCLEAN.
>> (Without the cache drop, XFS seems to be very happy using the cache info
>> to do the work without any error though.)
>
> Yep.
>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   tests/generic/702     | 61 +++++++++++++++++++++++++++++++++++++++++++
>>   tests/generic/702.out |  2 ++
>>   2 files changed, 63 insertions(+)
>>   create mode 100755 tests/generic/702
>>   create mode 100644 tests/generic/702.out
>>
>> diff --git a/tests/generic/702 b/tests/generic/702
>> new file mode 100755
>> index 00000000..fc3624e1
>> --- /dev/null
>> +++ b/tests/generic/702
>> @@ -0,0 +1,61 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 702
>> +#
>> +# Test if the filesystem can detect the underlying disk has changed at
>> +# thaw time.
>> +#
>> +. ./common/preamble
>> +. ./common/filter
>> +_begin_fstest freeze quick
>> +
>> +# real QA test starts here
>> +
>> +_supported_fs generic
>> +_fixed_by_kernel_commit a05d3c915314 \
>> +	"btrfs: check superblock to ensure the fs was not modified at thaw time"
>
> Hmmm, it's not very useful for a test failure on (say) xfs spitting
> out a message about how this "may" get fixed with a btrfs patch.  How
> about:
>
> $FSTYP = btrfs && _fixed_by_kernel_commit a05d3c915314 \
> 	"btrfs: check superbloc..."

That sounds pretty good.

>
>> +
>> +# We will corrupt the device completely, thus should not check it after the test.
>> +_require_scratch_nocheck
>> +_require_freeze
>> +
>> +# Limit the fs to 512M so we won't waste too much time screwing it up later.
>> +_scratch_mkfs_sized $((512 * 1024 * 1024)) >> $seqres.full 2>&1
>> +_scratch_mount
>> +
>> +# Populate the fs with something.
>> +$FSSTRESS_PROG -n 500 -d $SCRATCH_MNT >> $seqres.full
>> +
>> +# Sync to make sure no dirty journal
>> +sync
>> +
>> +# Drop all cache, so later write will need to read from disk, increasing
>> +# the chance of detecting the corruption.
>> +echo 3 > /proc/sys/vm/drop_caches
>> +
>> +$XFS_IO_PROG -x -c "freeze" $SCRATCH_MNT
>> +
>> +# Now screw up the block device
>> +$XFS_IO_PROG -f -c "pwrite 0 512M" -c sync $SCRATCH_DEV >> $seqres.full
>
> directio and a larger buffer size to speed this up? e.g.
>
> $XFS_IO_PROG -d -c 'pwrite -b 1m 0 512M' -c sync $SCRATCH_DEV

I guess no need for directio especially we're doing a sync after the write.
Although larger blocksize may only help a little considering by default
it's already buffered write.

>
>> +
>> +# Thaw the fs, it may or may not report error, we will check it manually later.
>> +$XFS_IO_PROG -x -c "thaw" $SCRATCH_MNT
>
> I'm a little surprised you don't check for btrfs returning an error
> here...?

Great you have asked!

This is the special pitfall related to thaw error handling.

If we return an error for .unfreeze_fs hook, the VFS treats it as we
failed to thaw the fs, and will still consider the fs frozen.

Thus for now, btrfs only output error message into dmesg during thaw,
but always return 0 to workaround it.

We may want a better way for .unfreeze_fs hook to distinguish between
"something really went wrong, but please consider it unfreezed" and
"nope, please keep it frozen".

Thanks,
Qu

>
>> +# If the fs detects something wrong, it should trigger error now.
>> +# We don't use the error message as golden output, as btrfs and ext4 use
>> +# different error number for different reasons.
>> +# (btrfs detects the change immediately at thaw time and mark the fs RO, thus
>> +#  touch returns -EROFS, while ext4 detects the change at journal write time,
>> +#  returning -EUCLEAN).
>> +touch $SCRATCH_MNT/foobar >>$seqres.full 2>&1
>> +if [ $? -eq 0 ]; then
>> +	echo "Failed to detect corrupted fs"
>> +else
>> +	echo "Detected corrupted fs (expected)"
>> +fi
>
> But otherwise this test looks reasonable so far.
>
> --D
>
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/generic/702.out b/tests/generic/702.out
>> new file mode 100644
>> index 00000000..c29311ff
>> --- /dev/null
>> +++ b/tests/generic/702.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 702
>> +Detected corrupted fs (expected)
>> --
>> 2.38.0
>>

Darrick J. Wong Oct. 20, 2022, 12:11 a.m. UTC | #3

On Thu, Oct 20, 2022 at 06:52:36AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/10/19 23:36, Darrick J. Wong wrote:
> > On Wed, Oct 19, 2022 at 01:29:55PM +0800, Qu Wenruo wrote:
> > > [BACKGROUND]
> > > There is bug report from btrfs mailing list that, hiberation can allow
> > 
> > "hibernation".
> > 
> > > one to modify the frozen filesystem unexpectedly (using another OS).
> > > (https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)
> > > 
> > > Later btrfs adds the check to make sure the fs is not changed
> > > unexpectedly, to prevent corruption from happening.
> > > 
> > > [TESTCASE]
> > > Here the new test case will create a basic filesystem, fill it with
> > > something by using fsstress, then sync the fs, and finally freeze the fs.
> > > 
> > > Then corrupt the whole fs by overwriting the block device with 0xcd
> > > (default seed from xfs_io pwrite command).
> > > 
> > > Finally we thaw the fs, and try if we can create a new file.
> > > 
> > > for EXT4, it will detect the corruption at touch time, causing -EUCLEAN.
> > 
> > Heh, yikes.  That's pretty scary for ext4 since it still uses buffer
> > heads from the block device to read/store metadata and older kernels are
> > known to have crashing problems if (say) the feature bits in the primary
> > superblock get changed.
> > 
> > I wonder if this should force errors=remount-ro for ext4 since
> > errors=continue is dangerous and erorrs=panic will crash the test
> > machine.
> > 
> > > For Btrfs, it will detect the corruption at thaw time, marking the
> > > fs RO immediately, and later touch will return -EROFS.
> > 
> > What /does/ btrfs check, specifically?
> 
> - Read sb without using cache
> 
> - The same mount time sanity checks on the superblock
>   Which already implies an fsid check.
> 
> - Extra generation check
>   To make sure no one has touched out cake.

Ah, ok, so you compare the ondisk super with the incore version and
complain if they don't match.  Makes sense.

> >  Reading this makes me wonder if
> > xfs shouldn't re-read its primary super on thaw to check that nobody ran
> > us over with a backhoe, though that wouldn't help us in the hibernation
> > case.  (Or does it?  Is userspace/systemd finally smart enough to freeze
> > filesystems?)
> 
> I doubt if userspace/systemd is that smart, because the error report is
> running not-that-old distro.
> 
> Especially for hibernation there is really no way for anyone to know if
> our cakes are touched.

Yeah, short of encrypting the primary super. :)

> > 
> > > For XFS, it will detect the corruption at touch time, return -EUCLEAN.
> > > (Without the cache drop, XFS seems to be very happy using the cache info
> > > to do the work without any error though.)
> > 
> > Yep.
> > 
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > >   tests/generic/702     | 61 +++++++++++++++++++++++++++++++++++++++++++
> > >   tests/generic/702.out |  2 ++
> > >   2 files changed, 63 insertions(+)
> > >   create mode 100755 tests/generic/702
> > >   create mode 100644 tests/generic/702.out
> > > 
> > > diff --git a/tests/generic/702 b/tests/generic/702
> > > new file mode 100755
> > > index 00000000..fc3624e1
> > > --- /dev/null
> > > +++ b/tests/generic/702
> > > @@ -0,0 +1,61 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FS QA Test 702
> > > +#
> > > +# Test if the filesystem can detect the underlying disk has changed at
> > > +# thaw time.
> > > +#
> > > +. ./common/preamble
> > > +. ./common/filter
> > > +_begin_fstest freeze quick
> > > +
> > > +# real QA test starts here
> > > +
> > > +_supported_fs generic
> > > +_fixed_by_kernel_commit a05d3c915314 \
> > > +	"btrfs: check superblock to ensure the fs was not modified at thaw time"
> > 
> > Hmmm, it's not very useful for a test failure on (say) xfs spitting
> > out a message about how this "may" get fixed with a btrfs patch.  How
> > about:
> > 
> > $FSTYP = btrfs && _fixed_by_kernel_commit a05d3c915314 \
> > 	"btrfs: check superbloc..."
> 
> That sounds pretty good.
> 
> > 
> > > +
> > > +# We will corrupt the device completely, thus should not check it after the test.
> > > +_require_scratch_nocheck
> > > +_require_freeze
> > > +
> > > +# Limit the fs to 512M so we won't waste too much time screwing it up later.
> > > +_scratch_mkfs_sized $((512 * 1024 * 1024)) >> $seqres.full 2>&1
> > > +_scratch_mount
> > > +
> > > +# Populate the fs with something.
> > > +$FSSTRESS_PROG -n 500 -d $SCRATCH_MNT >> $seqres.full
> > > +
> > > +# Sync to make sure no dirty journal
> > > +sync
> > > +
> > > +# Drop all cache, so later write will need to read from disk, increasing
> > > +# the chance of detecting the corruption.
> > > +echo 3 > /proc/sys/vm/drop_caches
> > > +
> > > +$XFS_IO_PROG -x -c "freeze" $SCRATCH_MNT
> > > +
> > > +# Now screw up the block device
> > > +$XFS_IO_PROG -f -c "pwrite 0 512M" -c sync $SCRATCH_DEV >> $seqres.full
> > 
> > directio and a larger buffer size to speed this up? e.g.
> > 
> > $XFS_IO_PROG -d -c 'pwrite -b 1m 0 512M' -c sync $SCRATCH_DEV
> 
> I guess no need for directio especially we're doing a sync after the write.
> Although larger blocksize may only help a little considering by default
> it's already buffered write.

<nod>

> > 
> > > +
> > > +# Thaw the fs, it may or may not report error, we will check it manually later.
> > > +$XFS_IO_PROG -x -c "thaw" $SCRATCH_MNT
> > 
> > I'm a little surprised you don't check for btrfs returning an error
> > here...?
> 
> Great you have asked!
> 
> This is the special pitfall related to thaw error handling.
> 
> If we return an error for .unfreeze_fs hook, the VFS treats it as we
> failed to thaw the fs, and will still consider the fs frozen.
> 
> Thus for now, btrfs only output error message into dmesg during thaw,
> but always return 0 to workaround it.
> 
> We may want a better way for .unfreeze_fs hook to distinguish between
> "something really went wrong, but please consider it unfreezed" and
> "nope, please keep it frozen".

Ah, I guess it makes sense that you have to access the fs post-thaw to
find out if it's still alive.

--D

> Thanks,
> Qu
> 
> > 
> > > +# If the fs detects something wrong, it should trigger error now.
> > > +# We don't use the error message as golden output, as btrfs and ext4 use
> > > +# different error number for different reasons.
> > > +# (btrfs detects the change immediately at thaw time and mark the fs RO, thus
> > > +#  touch returns -EROFS, while ext4 detects the change at journal write time,
> > > +#  returning -EUCLEAN).
> > > +touch $SCRATCH_MNT/foobar >>$seqres.full 2>&1
> > > +if [ $? -eq 0 ]; then
> > > +	echo "Failed to detect corrupted fs"
> > > +else
> > > +	echo "Detected corrupted fs (expected)"
> > > +fi
> > 
> > But otherwise this test looks reasonable so far.
> > 
> > --D
> > 
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/702.out b/tests/generic/702.out
> > > new file mode 100644
> > > index 00000000..c29311ff
> > > --- /dev/null
> > > +++ b/tests/generic/702.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 702
> > > +Detected corrupted fs (expected)
> > > --
> > > 2.38.0
> > >

Theodore Ts'o Oct. 20, 2022, 3:02 a.m. UTC | #4

On Wed, Oct 19, 2022 at 08:36:55AM -0700, Darrick J. Wong wrote:
> On Wed, Oct 19, 2022 at 01:29:55PM +0800, Qu Wenruo wrote:
> > [BACKGROUND]
> > There is bug report from btrfs mailing list that, hiberation can allow
> > one to modify the frozen filesystem unexpectedly (using another OS).
> > (https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)
> > 
> > Later btrfs adds the check to make sure the fs is not changed
> > unexpectedly, to prevent corruption from happening.
> > 
> > [TESTCASE]
> > Here the new test case will create a basic filesystem, fill it with
> > something by using fsstress, then sync the fs, and finally freeze the fs.
> > 
> > Then corrupt the whole fs by overwriting the block device with 0xcd
> > (default seed from xfs_io pwrite command).

It seems to me that the test case is testing something very different
from the originally stated concern, and what btrfs is testing.

The original concern is "something else modified the file system",
which btrfs is testing by checking whether the file system generation
number is different from the last recorded transaction id.

The test is "something has trashed the file system by filling the
block device by 0xcd; let's see how quickly the file system notices"
which is quite different from the scenario described in the link and
the commit description a05d3c915314 ("btrfs: check superblock to
ensure the fs was not modified at thaw time").

> What /does/ btrfs check, specifically?  Reading this makes me wonder if
> xfs shouldn't re-read its primary super on thaw to check that nobody ran
> us over with a backhoe, though that wouldn't help us in the hibernation
> case.  (Or does it?  Is userspace/systemd finally smart enough to freeze
> filesystems?)

From looking at the commit described below, it appears to do some
basic superblock sanity checks, and then it checks to see if the last
commited transaction has changed from what has been recorded in the
superblock.

The simple stupid thing I could add in ext4 is to simply make a full
copy of the ext4 superblock, and if *anything* in that 1k set of bytes
has changed between the freeze and the thaw, call ext4_error(), and mark
the file system corrupted.

We've been talking about changing the default for ext4 to remount the
file system read-only, and if we did this then the behavior would be
the same as btrfs.  Or maybe in the specific case of the superblock
has changed between freeze and thaw, we will always remount the file
system read-only.

> > For XFS, it will detect the corruption at touch time, return -EUCLEAN.
> > (Without the cache drop, XFS seems to be very happy using the cache info
> > to do the work without any error though.)
> 
> Yep.

I would suggest not putting this test in generic/NNN, but put it in
shared, and to let each file system opt-in to this test.  There are a
whole bunch of file systems such such as jfs, reiserfs, vfat, exfat,
etc., which could run this test, and depending on the specifics of how
a file system might behave to determine whether the test "passes" or
"fails" seems wrong.

After all, what you're really doing is protecting against a specific
form of "stupid user trick", and other Linux file systems happen to do
something different when you completely trash the file system by
overwriting the block device with 0xcd, callign what some other file
system, whether it be f2fs, exfat, overlayfs, nfs, as a "failure"
doesn't seem right.

Moving it into shared also means you don't have to add extra checks to
make sure the test gets skipped if there is no block device to trash
(for example, if you are testing overlayfs, nfs, tmpfs, etc.)

Cheers,

					- Ted

Qu Wenruo Oct. 20, 2022, 3:20 a.m. UTC | #5

On 2022/10/20 11:02, Theodore Ts'o wrote:
> On Wed, Oct 19, 2022 at 08:36:55AM -0700, Darrick J. Wong wrote:
>> On Wed, Oct 19, 2022 at 01:29:55PM +0800, Qu Wenruo wrote:
>>> [BACKGROUND]
>>> There is bug report from btrfs mailing list that, hiberation can allow
>>> one to modify the frozen filesystem unexpectedly (using another OS).
>>> (https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)
>>>
>>> Later btrfs adds the check to make sure the fs is not changed
>>> unexpectedly, to prevent corruption from happening.
>>>
>>> [TESTCASE]
>>> Here the new test case will create a basic filesystem, fill it with
>>> something by using fsstress, then sync the fs, and finally freeze the fs.
>>>
>>> Then corrupt the whole fs by overwriting the block device with 0xcd
>>> (default seed from xfs_io pwrite command).
>
> It seems to me that the test case is testing something very different
> from the originally stated concern, and what btrfs is testing.
>
> The original concern is "something else modified the file system",
> which btrfs is testing by checking whether the file system generation
> number is different from the last recorded transaction id.
>
> The test is "something has trashed the file system by filling the
> block device by 0xcd; let's see how quickly the file system notices"
> which is quite different from the scenario described in the link and
> the commit description a05d3c915314 ("btrfs: check superblock to
> ensure the fs was not modified at thaw time").

Yes, you're right.

The problem here is I have no good way to just mount the frozen fs and
update it.

One of the btrfs specific problem is, btrfs will reject fs with the same
fsid.
Thus even if we do a binary copy of the original fs, we can not mount it
to modify, at least not for btrfs.

Any good ideas for this? Maybe relying on btrfs-progs to do the write?

>
>> What /does/ btrfs check, specifically?  Reading this makes me wonder if
>> xfs shouldn't re-read its primary super on thaw to check that nobody ran
>> us over with a backhoe, though that wouldn't help us in the hibernation
>> case.  (Or does it?  Is userspace/systemd finally smart enough to freeze
>> filesystems?)
>
>  From looking at the commit described below, it appears to do some
> basic superblock sanity checks, and then it checks to see if the last
> commited transaction has changed from what has been recorded in the
> superblock.
>
> The simple stupid thing I could add in ext4 is to simply make a full
> copy of the ext4 superblock, and if *anything* in that 1k set of bytes
> has changed between the freeze and the thaw, call ext4_error(), and mark
> the file system corrupted.

Yes, for EXT4/XFS it can be a simple memcmp(), but for btrfs we have
multiple devices, and super block for each device has something
different (related to that device).

Thus at least for btrfs, we can not do a full memcmp() level check, but
you get the idea correctly.

>
> We've been talking about changing the default for ext4 to remount the
> file system read-only, and if we did this then the behavior would be
> the same as btrfs.  Or maybe in the specific case of the superblock
> has changed between freeze and thaw, we will always remount the file
> system read-only.
>
>>> For XFS, it will detect the corruption at touch time, return -EUCLEAN.
>>> (Without the cache drop, XFS seems to be very happy using the cache info
>>> to do the work without any error though.)
>>
>> Yep.
>
> I would suggest not putting this test in generic/NNN, but put it in
> shared, and to let each file system opt-in to this test.  There are a
> whole bunch of file systems such such as jfs, reiserfs, vfat, exfat,
> etc., which could run this test, and depending on the specifics of how
> a file system might behave to determine whether the test "passes" or
> "fails" seems wrong.
>
> After all, what you're really doing is protecting against a specific
> form of "stupid user trick", and other Linux file systems happen to do
> something different when you completely trash the file system by
> overwriting the block device with 0xcd, callign what some other file
> system, whether it be f2fs, exfat, overlayfs, nfs, as a "failure"
> doesn't seem right.
>
> Moving it into shared also means you don't have to add extra checks to
> make sure the test gets skipped if there is no block device to trash
> (for example, if you are testing overlayfs, nfs, tmpfs, etc.)

Thanks for the advice, would move it to shared in next update.

Thanks,
Qu

>
> Cheers,
>
> 					- Ted

Zorro Lang Oct. 20, 2022, 3 p.m. UTC | #6

On Wed, Oct 19, 2022 at 01:29:55PM +0800, Qu Wenruo wrote:
> [BACKGROUND]
> There is bug report from btrfs mailing list that, hiberation can allow
> one to modify the frozen filesystem unexpectedly (using another OS).
> (https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)
> 
> Later btrfs adds the check to make sure the fs is not changed
> unexpectedly, to prevent corruption from happening.
> 
> [TESTCASE]
> Here the new test case will create a basic filesystem, fill it with
> something by using fsstress, then sync the fs, and finally freeze the fs.
> 
> Then corrupt the whole fs by overwriting the block device with 0xcd
> (default seed from xfs_io pwrite command).
> 
> Finally we thaw the fs, and try if we can create a new file.
> 
> for EXT4, it will detect the corruption at touch time, causing -EUCLEAN.
> 
> For Btrfs, it will detect the corruption at thaw time, marking the
> fs RO immediately, and later touch will return -EROFS.
> 
> For XFS, it will detect the corruption at touch time, return -EUCLEAN.
> (Without the cache drop, XFS seems to be very happy using the cache info
> to do the work without any error though.)
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  tests/generic/702     | 61 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/702.out |  2 ++
>  2 files changed, 63 insertions(+)
>  create mode 100755 tests/generic/702
>  create mode 100644 tests/generic/702.out
> 
> diff --git a/tests/generic/702 b/tests/generic/702
> new file mode 100755
> index 00000000..fc3624e1
> --- /dev/null
> +++ b/tests/generic/702
> @@ -0,0 +1,61 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 702
> +#
> +# Test if the filesystem can detect the underlying disk has changed at
> +# thaw time.
> +#
> +. ./common/preamble
> +. ./common/filter
> +_begin_fstest freeze quick
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +_fixed_by_kernel_commit a05d3c915314 \
> +	"btrfs: check superblock to ensure the fs was not modified at thaw time"
> +
> +# We will corrupt the device completely, thus should not check it after the test.
> +_require_scratch_nocheck
> +_require_freeze
> +
> +# Limit the fs to 512M so we won't waste too much time screwing it up later.
> +_scratch_mkfs_sized $((512 * 1024 * 1024)) >> $seqres.full 2>&1
> +_scratch_mount
> +
> +# Populate the fs with something.
> +$FSSTRESS_PROG -n 500 -d $SCRATCH_MNT >> $seqres.full
> +
> +# Sync to make sure no dirty journal
> +sync
> +
> +# Drop all cache, so later write will need to read from disk, increasing
> +# the chance of detecting the corruption.
> +echo 3 > /proc/sys/vm/drop_caches
> +
> +$XFS_IO_PROG -x -c "freeze" $SCRATCH_MNT
> +
> +# Now screw up the block device
> +$XFS_IO_PROG -f -c "pwrite 0 512M" -c sync $SCRATCH_DEV >> $seqres.full
> +
> +# Thaw the fs, it may or may not report error, we will check it manually later.
> +$XFS_IO_PROG -x -c "thaw" $SCRATCH_MNT
> +
> +# If the fs detects something wrong, it should trigger error now.
> +# We don't use the error message as golden output, as btrfs and ext4 use
> +# different error number for different reasons.
> +# (btrfs detects the change immediately at thaw time and mark the fs RO, thus
> +#  touch returns -EROFS, while ext4 detects the change at journal write time,
> +#  returning -EUCLEAN).
> +touch $SCRATCH_MNT/foobar >>$seqres.full 2>&1
> +if [ $? -eq 0 ]; then
> +	echo "Failed to detect corrupted fs"
> +else
> +	echo "Detected corrupted fs (expected)"
> +fi

Thanks for all help to review!

That `_require_freeze` will skip exfat and others which not support freeze.

And `_require_block_device $SCRATCH_DEV` helps you to avoid this test run/fail on
overlayfs, nfs, tmpfs, etc. Due to you try to write the $SCRATCH_DEV directly.

And you can use `_supported_fs ^f2fs` to skip this test from some specified fs
if they're not suit for this test.

But I'm wondering if the last test step will fail on every fs soon? Except
you're trying to test how fast a fs can find itself is corrupted. Or how
about give some fs more chance/time to detect errors? Likes do more operations
which enough to trigger errors on most fs?

Thanks,
Zorro

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/702.out b/tests/generic/702.out
> new file mode 100644
> index 00000000..c29311ff
> --- /dev/null
> +++ b/tests/generic/702.out
> @@ -0,0 +1,2 @@
> +QA output created by 702
> +Detected corrupted fs (expected)
> -- 
> 2.38.0
>

Qu Wenruo Oct. 20, 2022, 10:59 p.m. UTC | #7

On 2022/10/20 23:00, Zorro Lang wrote:
> On Wed, Oct 19, 2022 at 01:29:55PM +0800, Qu Wenruo wrote:
>> [BACKGROUND]
>> There is bug report from btrfs mailing list that, hiberation can allow
>> one to modify the frozen filesystem unexpectedly (using another OS).
>> (https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/)
>>
>> Later btrfs adds the check to make sure the fs is not changed
>> unexpectedly, to prevent corruption from happening.
>>
>> [TESTCASE]
>> Here the new test case will create a basic filesystem, fill it with
>> something by using fsstress, then sync the fs, and finally freeze the fs.
>>
>> Then corrupt the whole fs by overwriting the block device with 0xcd
>> (default seed from xfs_io pwrite command).
>>
>> Finally we thaw the fs, and try if we can create a new file.
>>
>> for EXT4, it will detect the corruption at touch time, causing -EUCLEAN.
>>
>> For Btrfs, it will detect the corruption at thaw time, marking the
>> fs RO immediately, and later touch will return -EROFS.
>>
>> For XFS, it will detect the corruption at touch time, return -EUCLEAN.
>> (Without the cache drop, XFS seems to be very happy using the cache info
>> to do the work without any error though.)
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   tests/generic/702     | 61 +++++++++++++++++++++++++++++++++++++++++++
>>   tests/generic/702.out |  2 ++
>>   2 files changed, 63 insertions(+)
>>   create mode 100755 tests/generic/702
>>   create mode 100644 tests/generic/702.out
>>
>> diff --git a/tests/generic/702 b/tests/generic/702
>> new file mode 100755
>> index 00000000..fc3624e1
>> --- /dev/null
>> +++ b/tests/generic/702
>> @@ -0,0 +1,61 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 702
>> +#
>> +# Test if the filesystem can detect the underlying disk has changed at
>> +# thaw time.
>> +#
>> +. ./common/preamble
>> +. ./common/filter
>> +_begin_fstest freeze quick
>> +
>> +# real QA test starts here
>> +
>> +_supported_fs generic
>> +_fixed_by_kernel_commit a05d3c915314 \
>> +	"btrfs: check superblock to ensure the fs was not modified at thaw time"
>> +
>> +# We will corrupt the device completely, thus should not check it after the test.
>> +_require_scratch_nocheck
>> +_require_freeze
>> +
>> +# Limit the fs to 512M so we won't waste too much time screwing it up later.
>> +_scratch_mkfs_sized $((512 * 1024 * 1024)) >> $seqres.full 2>&1
>> +_scratch_mount
>> +
>> +# Populate the fs with something.
>> +$FSSTRESS_PROG -n 500 -d $SCRATCH_MNT >> $seqres.full
>> +
>> +# Sync to make sure no dirty journal
>> +sync
>> +
>> +# Drop all cache, so later write will need to read from disk, increasing
>> +# the chance of detecting the corruption.
>> +echo 3 > /proc/sys/vm/drop_caches
>> +
>> +$XFS_IO_PROG -x -c "freeze" $SCRATCH_MNT
>> +
>> +# Now screw up the block device
>> +$XFS_IO_PROG -f -c "pwrite 0 512M" -c sync $SCRATCH_DEV >> $seqres.full
>> +
>> +# Thaw the fs, it may or may not report error, we will check it manually later.
>> +$XFS_IO_PROG -x -c "thaw" $SCRATCH_MNT
>> +
>> +# If the fs detects something wrong, it should trigger error now.
>> +# We don't use the error message as golden output, as btrfs and ext4 use
>> +# different error number for different reasons.
>> +# (btrfs detects the change immediately at thaw time and mark the fs RO, thus
>> +#  touch returns -EROFS, while ext4 detects the change at journal write time,
>> +#  returning -EUCLEAN).
>> +touch $SCRATCH_MNT/foobar >>$seqres.full 2>&1
>> +if [ $? -eq 0 ]; then
>> +	echo "Failed to detect corrupted fs"
>> +else
>> +	echo "Detected corrupted fs (expected)"
>> +fi
>
> Thanks for all help to review!
>
> That `_require_freeze` will skip exfat and others which not support freeze.
>
> And `_require_block_device $SCRATCH_DEV` helps you to avoid this test run/fail on
> overlayfs, nfs, tmpfs, etc. Due to you try to write the $SCRATCH_DEV directly.
>
> And you can use `_supported_fs ^f2fs` to skip this test from some specified fs
> if they're not suit for this test.
>
> But I'm wondering if the last test step will fail on every fs soon? Except
> you're trying to test how fast a fs can find itself is corrupted. Or how
> about give some fs more chance/time to detect errors? Likes do more operations
> which enough to trigger errors on most fs?

I don't think any timeout/extra work would help, there is already a drop
cache to make sure the fs won't have any cached info.

Or fs like old btrfs or XFS can easily continue without any problem by
using all the cached info.

And without cached info, any write should cause the fs to read its
metadata and detect the problem.

Thanks,
Qu

>
> Thanks,
> Zorro
>
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/generic/702.out b/tests/generic/702.out
>> new file mode 100644
>> index 00000000..c29311ff
>> --- /dev/null
>> +++ b/tests/generic/702.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 702
>> +Detected corrupted fs (expected)
>> --
>> 2.38.0
>>

generic: check if one fs can detect damage at/after fs thaw

Commit Message

Comments

Patch