diff mbox series

[3/3] btrfs: test mount fails on physical device with configured dm volume

Message ID c68878cb99025b8c8465221205d5de9e40777b18.1709806478.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series fstests: test tempfsid with dm flakey device | expand

Commit Message

Anand Jain March 7, 2024, 12:50 p.m. UTC
When a flakey device is configured, we have access to both the physical
device and the DM flaky device. Ensure that when the flakey device is
configured, the physical device mount fails.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 tests/btrfs/318     | 45 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/318.out |  3 +++
 2 files changed, 48 insertions(+)
 create mode 100755 tests/btrfs/318
 create mode 100644 tests/btrfs/318.out

Comments

Zorro Lang March 7, 2024, 2:59 p.m. UTC | #1
On Thu, Mar 07, 2024 at 06:20:24PM +0530, Anand Jain wrote:
> When a flakey device is configured, we have access to both the physical
> device and the DM flaky device. Ensure that when the flakey device is
> configured, the physical device mount fails.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/btrfs/318     | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/318.out |  3 +++
>  2 files changed, 48 insertions(+)
>  create mode 100755 tests/btrfs/318
>  create mode 100644 tests/btrfs/318.out
> 
> diff --git a/tests/btrfs/318 b/tests/btrfs/318
> new file mode 100755
> index 000000000000..015950fbd93c
> --- /dev/null
> +++ b/tests/btrfs/318
> @@ -0,0 +1,45 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 318
> +#
> +# Create multiple device nodes with the same device try mount
> +#
> +. ./common/preamble
> +_begin_fstest auto volume tempfsid

May I ask why it's a "tempfsid" related test?

> +
> +# Override the default cleanup function.
> +_cleanup()
> +{
> +	umount $extra_mnt &> /dev/null
> +	rm -rf $extra_mnt &> /dev/null

The "&> /dev/null" isn't needed, if you use "-f" for rm

> +	_unmount_flakey
> +	_cleanup_flakey
> + 	cd /
> + 	rm -r -f $tmp.*
> +}
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs btrfs
> +_require_scratch
> +_require_dm_target flakey
> +
> +_scratch_mkfs >> $seqres.full
> +_init_flakey
> +
> +_mount_flakey
> +extra_mnt=$TEST_DIR/extra_mnt

_require_test ?

> +rm -rf $extra_mnt
> +mkdir -p $extra_mnt
> +_mount $SCRATCH_DEV $extra_mnt 2>&1 | _filter_testdir_and_scratch

Recommend calling "_filter_error_mount" too

Thanks,
Zorro

> +
> +_flakey_drop_and_remount
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/318.out b/tests/btrfs/318.out
> new file mode 100644
> index 000000000000..5cdbea8c4b2a
> --- /dev/null
> +++ b/tests/btrfs/318.out
> @@ -0,0 +1,3 @@
> +QA output created by 318
> +mount: TEST_DIR/extra_mnt: wrong fs type, bad option, bad superblock on SCRATCH_DEV, missing codepage or helper program, or other error.
> +       dmesg(1) may have more information after failed mount system call.
> -- 
> 2.39.3
> 
>
Filipe Manana March 7, 2024, 4:15 p.m. UTC | #2
On Thu, Mar 7, 2024 at 12:51 PM Anand Jain <anand.jain@oracle.com> wrote:
>
> When a flakey device is configured, we have access to both the physical
> device and the DM flaky device. Ensure that when the flakey device is
> configured, the physical device mount fails.

Does it need to be flakey? Can't it be any other DM type?
The way it's phrased, gives the idea that somehow this is all flakey specific.

Just be clear and mention any DM on top of the device.

>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/btrfs/318     | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/318.out |  3 +++
>  2 files changed, 48 insertions(+)
>  create mode 100755 tests/btrfs/318
>  create mode 100644 tests/btrfs/318.out
>
> diff --git a/tests/btrfs/318 b/tests/btrfs/318
> new file mode 100755
> index 000000000000..015950fbd93c
> --- /dev/null
> +++ b/tests/btrfs/318
> @@ -0,0 +1,45 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 318
> +#
> +# Create multiple device nodes with the same device try mount
> +#
> +. ./common/preamble
> +_begin_fstest auto volume tempfsid

Also 'quick' group.

> +
> +# Override the default cleanup function.
> +_cleanup()
> +{
> +       umount $extra_mnt &> /dev/null
> +       rm -rf $extra_mnt &> /dev/null
> +       _unmount_flakey
> +       _cleanup_flakey
> +       cd /
> +       rm -r -f $tmp.*
> +}
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs btrfs
> +_require_scratch
> +_require_dm_target flakey
> +
> +_scratch_mkfs >> $seqres.full
> +_init_flakey
> +
> +_mount_flakey
> +extra_mnt=$TEST_DIR/extra_mnt
> +rm -rf $extra_mnt
> +mkdir -p $extra_mnt
> +_mount $SCRATCH_DEV $extra_mnt 2>&1 | _filter_testdir_and_scratch
> +
> +_flakey_drop_and_remount

Why? Please add a comment.

Either this is a leftover from copy-paste from other tests, that
exercise fsync and power failure, or the goal is to do an unmount
followed by a mount and check that the mount succeeds.
If it's the latter case, then it's confusing to use
_flakey_drop_and_remount, because that drops writes, unmounts, allows
writes again and then mounts - i.e. we don't want to simulate power
loss by silently dropping writes.
Just call _unmount_flakey and _mount_flakey and add a comment
mentioning why we are doing it.

Finally, a link to the report that motivated this, due to a bug in a
patch that ended not being sent to Linus, would be useful:

https://lore.kernel.org/linux-btrfs/CAL3q7H5wx5rKmSzGWP7mRqaSfAY88g=35N4OBrbJB61rK0mt2w@mail.gmail.com/

I'm also not convinced that we need this test, because the bug could
be reliably reproduced by running all existing tests or just a subset
of the tests as I reported there.

Thanks.

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/318.out b/tests/btrfs/318.out
> new file mode 100644
> index 000000000000..5cdbea8c4b2a
> --- /dev/null
> +++ b/tests/btrfs/318.out
> @@ -0,0 +1,3 @@
> +QA output created by 318
> +mount: TEST_DIR/extra_mnt: wrong fs type, bad option, bad superblock on SCRATCH_DEV, missing codepage or helper program, or other error.
> +       dmesg(1) may have more information after failed mount system call.
> --
> 2.39.3
>
>
Anand Jain March 7, 2024, 4:46 p.m. UTC | #3
> I'm also not convinced that we need this test, because the bug could
> be reliably reproduced by running all existing tests or just a subset
> of the tests as I reported there.

This test case was motivated by btrfs/159; and recent report;
yep, most of the lines here are taken from btrfs/159.
However, I wanted a test case in the tempfsid group which
exercises multi-node-same-physical;

If there are no objections, I am going to add the tempfsid group to
btrfs/159 for now.

Thanks, Anand
Zorro Lang March 7, 2024, 7:20 p.m. UTC | #4
On Thu, Mar 07, 2024 at 06:20:24PM +0530, Anand Jain wrote:
> When a flakey device is configured, we have access to both the physical
> device and the DM flaky device. Ensure that when the flakey device is
> configured, the physical device mount fails.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  tests/btrfs/318     | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/318.out |  3 +++
>  2 files changed, 48 insertions(+)
>  create mode 100755 tests/btrfs/318
>  create mode 100644 tests/btrfs/318.out
> 
> diff --git a/tests/btrfs/318 b/tests/btrfs/318
> new file mode 100755
> index 000000000000..015950fbd93c
> --- /dev/null
> +++ b/tests/btrfs/318
> @@ -0,0 +1,45 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 318
> +#
> +# Create multiple device nodes with the same device try mount
> +#
> +. ./common/preamble
> +_begin_fstest auto volume tempfsid
> +
> +# Override the default cleanup function.
> +_cleanup()
> +{
> +	umount $extra_mnt &> /dev/null
> +	rm -rf $extra_mnt &> /dev/null
> +	_unmount_flakey
> +	_cleanup_flakey
> + 	cd /
> + 	rm -r -f $tmp.*
> +}
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs btrfs

BTW, what cause it have to be a btrfs specific test case? I didn't any
btrfs specific operations below, can you explain the reason?

Thanks,
Zorro

> +_require_scratch
> +_require_dm_target flakey
> +
> +_scratch_mkfs >> $seqres.full
> +_init_flakey
> +
> +_mount_flakey
> +extra_mnt=$TEST_DIR/extra_mnt
> +rm -rf $extra_mnt
> +mkdir -p $extra_mnt
> +_mount $SCRATCH_DEV $extra_mnt 2>&1 | _filter_testdir_and_scratch
> +
> +_flakey_drop_and_remount
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/318.out b/tests/btrfs/318.out
> new file mode 100644
> index 000000000000..5cdbea8c4b2a
> --- /dev/null
> +++ b/tests/btrfs/318.out
> @@ -0,0 +1,3 @@
> +QA output created by 318
> +mount: TEST_DIR/extra_mnt: wrong fs type, bad option, bad superblock on SCRATCH_DEV, missing codepage or helper program, or other error.
> +       dmesg(1) may have more information after failed mount system call.
> -- 
> 2.39.3
> 
>
Filipe Manana March 8, 2024, 12:30 p.m. UTC | #5
On Thu, Mar 7, 2024 at 4:46 PM Anand Jain <anand.jain@oracle.com> wrote:
>
>
> > I'm also not convinced that we need this test, because the bug could
> > be reliably reproduced by running all existing tests or just a subset
> > of the tests as I reported there.
>
> This test case was motivated by btrfs/159; and recent report;
> yep, most of the lines here are taken from btrfs/159.

Ok, so it's motivated by what I reported, triggered by btrfs/159 but
only when other tests are run before it.

> However, I wanted a test case in the tempfsid group which
> exercises multi-node-same-physical;

Sure. It's just that we could trigger the issue simply by running all
existing tests, or just a subset as I reported before [1].
It was fully deterministic and David could reproduce it after the report too.

What surprises me is that no one noticed this before and at some stage
the faulty patch was about to be sent to Linus.

>
> If there are no objections, I am going to add the tempfsid group to
> btrfs/159 for now.

That is confusing, please don't.
btrfs/159 doesn't test tempfsid, it existed for many years before the
tempfsid feature (introduced in the 6.7 kernel),
it just happens that it triggers the issue if other fstests are run before it.

That would also be pointless because running btrfs/159 alone doesn't
trigger the bug, so running "./check -g tempfsid" wouldn't cause 159
to fail.

Between that and a new test case that is somewhat redundant, I'd
rather have the new test case with proper documentation/comments.

Thanks.

[1] https://lore.kernel.org/linux-btrfs/CAL3q7H5wx5rKmSzGWP7mRqaSfAY88g=35N4OBrbJB61rK0mt2w@mail.gmail.com/

>
> Thanks, Anand
Anand Jain March 8, 2024, 1:35 p.m. UTC | #6
On 3/8/24 18:00, Filipe Manana wrote:
> On Thu, Mar 7, 2024 at 4:46 PM Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>>> I'm also not convinced that we need this test, because the bug could
>>> be reliably reproduced by running all existing tests or just a subset
>>> of the tests as I reported there.
>>
>> This test case was motivated by btrfs/159; and recent report;
>> yep, most of the lines here are taken from btrfs/159.
> 
> Ok, so it's motivated by what I reported, triggered by btrfs/159 but
> only when other tests are run before it.
> 
>> However, I wanted a test case in the tempfsid group which
>> exercises multi-node-same-physical;
> 
> Sure. It's just that we could trigger the issue simply by running all
> existing tests, or just a subset as I reported before [1].
> It was fully deterministic and David could reproduce it after the report too.
> 
> What surprises me is that no one noticed this before and at some stage
> the faulty patch was about to be sent to Linus.
> 
>>
>> If there are no objections, I am going to add the tempfsid group to
>> btrfs/159 for now.
> 
> That is confusing, please don't.
> btrfs/159 doesn't test tempfsid, it existed for many years before the
> tempfsid feature (introduced in the 6.7 kernel),
> it just happens that it triggers the issue if other fstests are run before it.
> 
> That would also be pointless because running btrfs/159 alone doesn't
> trigger the bug, so running "./check -g tempfsid" wouldn't cause 159
> to fail.
> 
> Between that and a new test case that is somewhat redundant, I'd
> rather have the new test case with proper documentation/comments.
> 

I have converted this test case into a generic one. Now, XFS, ext4,
and Btrfs (after a patch) match the golden output. I'll be sending
the btrfs kernel patch along with it.

Thanks, Anand


> Thanks.
> 
> [1] https://lore.kernel.org/linux-btrfs/CAL3q7H5wx5rKmSzGWP7mRqaSfAY88g=35N4OBrbJB61rK0mt2w@mail.gmail.com/
> 
>>
>> Thanks, Anand
Anand Jain March 8, 2024, 1:38 p.m. UTC | #7
On 3/8/24 00:50, Zorro Lang wrote:
> On Thu, Mar 07, 2024 at 06:20:24PM +0530, Anand Jain wrote:
>> When a flakey device is configured, we have access to both the physical
>> device and the DM flaky device. Ensure that when the flakey device is
>> configured, the physical device mount fails.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   tests/btrfs/318     | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/btrfs/318.out |  3 +++
>>   2 files changed, 48 insertions(+)
>>   create mode 100755 tests/btrfs/318
>>   create mode 100644 tests/btrfs/318.out
>>
>> diff --git a/tests/btrfs/318 b/tests/btrfs/318
>> new file mode 100755
>> index 000000000000..015950fbd93c
>> --- /dev/null
>> +++ b/tests/btrfs/318
>> @@ -0,0 +1,45 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2024 Oracle.  All Rights Reserved.
>> +#
>> +# FS QA Test 318
>> +#
>> +# Create multiple device nodes with the same device try mount
>> +#
>> +. ./common/preamble
>> +_begin_fstest auto volume tempfsid
>> +
>> +# Override the default cleanup function.
>> +_cleanup()
>> +{
>> +	umount $extra_mnt &> /dev/null
>> +	rm -rf $extra_mnt &> /dev/null
>> +	_unmount_flakey
>> +	_cleanup_flakey
>> + 	cd /
>> + 	rm -r -f $tmp.*
>> +}
>> +
>> +# Import common functions.
>> +. ./common/filter
>> +. ./common/dmflakey
>> +
>> +# real QA test starts here
>> +_supported_fs btrfs
> 
> BTW, what cause it have to be a btrfs specific test case? I didn't any
> btrfs specific operations below, can you explain the reason?
> 

Right. Now I notice it can be made generic. I have converted this
into a generic test case. I'll send it when the kernel patch is
ready. For now, I have to withdraw this patch.

Thanks for the suggestion.

Anand
diff mbox series

Patch

diff --git a/tests/btrfs/318 b/tests/btrfs/318
new file mode 100755
index 000000000000..015950fbd93c
--- /dev/null
+++ b/tests/btrfs/318
@@ -0,0 +1,45 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Oracle.  All Rights Reserved.
+#
+# FS QA Test 318
+#
+# Create multiple device nodes with the same device try mount
+#
+. ./common/preamble
+_begin_fstest auto volume tempfsid
+
+# Override the default cleanup function.
+_cleanup()
+{
+	umount $extra_mnt &> /dev/null
+	rm -rf $extra_mnt &> /dev/null
+	_unmount_flakey
+	_cleanup_flakey
+ 	cd /
+ 	rm -r -f $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs btrfs
+_require_scratch
+_require_dm_target flakey
+
+_scratch_mkfs >> $seqres.full
+_init_flakey
+
+_mount_flakey
+extra_mnt=$TEST_DIR/extra_mnt
+rm -rf $extra_mnt
+mkdir -p $extra_mnt
+_mount $SCRATCH_DEV $extra_mnt 2>&1 | _filter_testdir_and_scratch
+
+_flakey_drop_and_remount
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/318.out b/tests/btrfs/318.out
new file mode 100644
index 000000000000..5cdbea8c4b2a
--- /dev/null
+++ b/tests/btrfs/318.out
@@ -0,0 +1,3 @@ 
+QA output created by 318
+mount: TEST_DIR/extra_mnt: wrong fs type, bad option, bad superblock on SCRATCH_DEV, missing codepage or helper program, or other error.
+       dmesg(1) may have more information after failed mount system call.