diff mbox series

[11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown

Message ID 173706974243.1927324.9105721327110864014.stgit@frogsfrogsfrogs (mailing list archive)
State New
Headers show
Series [01/23] generic/476: fix fsstress process management | expand

Commit Message

Darrick J. Wong Jan. 16, 2025, 11:28 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

xfs/336 does this somewhat sketchy thing where it mdrestores into a
regular file, and then does this to validate the restored metadata:

SCRATCH_DEV=$TEST_DIR/image _scratch_mount

Unfortunately, commit 1a49022fab9b4d causes the following regression:

 --- /tmp/fstests/tests/xfs/336.out      2024-11-12 16:17:36.733447713 -0800
 +++ /var/tmp/fstests/xfs/336.out.bad    2025-01-04 19:10:39.861871114 -0800
 @@ -5,4 +5,5 @@ Create big file
  Explode the rtrmapbt
  Create metadump file
  Restore metadump
 -Check restored fs
 +Usage: _set_fs_sysfs_attr <mounted_device> <attr> <content>
 +(see /var/tmp/fstests/xfs/336.full for details)

This is due to the fact that SCRATCH_DEV is temporarily reassigned to
the regular file.  That path is passed straight through _scratch_mount
to _xfs_prepare_for_eio_shutdown, but that helper _fails because the
"dev" argument isn't actually a path to a block device.

Fix this by detecting non-bdevs and finding (we hope) the loop device
that was created to handle the mount.  While we're at it, have the
helper return the exit code from mount, not _prepare_for_eio_shutdown.

Cc: <fstests@vger.kernel.org> # v2024.12.08
Fixes: 1a49022fab9b4d ("fstests: always use fail-at-unmount semantics for XFS")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 common/rc  |    8 ++++++++
 common/xfs |    6 ++++++
 2 files changed, 14 insertions(+)

Comments

Dave Chinner Jan. 21, 2025, 4:37 a.m. UTC | #1
On Thu, Jan 16, 2025 at 03:28:02PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> xfs/336 does this somewhat sketchy thing where it mdrestores into a
> regular file, and then does this to validate the restored metadata:
> 
> SCRATCH_DEV=$TEST_DIR/image _scratch_mount

That's a canonical example of what is called "stepping on a
landmine".

We validate that the SCRATCH_DEV is a block device at the start of
check and each section it reads and runs (via common/config), and
then make the assumption in all the infrastructure that SCRATCH_DEV
always points to a valid block device.

Now we have one new test that overwrites SCRATCH_DEV temporarily
with a file and so we have to add checks all through the
infrastructure to handle this one whacky test?

> Unfortunately, commit 1a49022fab9b4d causes the following regression:
> 
>  --- /tmp/fstests/tests/xfs/336.out      2024-11-12 16:17:36.733447713 -0800
>  +++ /var/tmp/fstests/xfs/336.out.bad    2025-01-04 19:10:39.861871114 -0800
>  @@ -5,4 +5,5 @@ Create big file
>   Explode the rtrmapbt
>   Create metadump file
>   Restore metadump
>  -Check restored fs
>  +Usage: _set_fs_sysfs_attr <mounted_device> <attr> <content>
>  +(see /var/tmp/fstests/xfs/336.full for details)
> 
> This is due to the fact that SCRATCH_DEV is temporarily reassigned to
> the regular file.  That path is passed straight through _scratch_mount
> to _xfs_prepare_for_eio_shutdown, but that helper _fails because the
> "dev" argument isn't actually a path to a block device.

_scratch_mount assumes that SCRATCH_DEV points to a valid block
device. xfs/336 is the problem here, not the code that assumes
SCRATCH_DEV points to a block device....

Why are these hacks needed? Why can't _xfs_verify_metadumps()
loopdev usage be extended to handle the new rt rmap code that this
test is supposed to be exercising? 

> Fix this by detecting non-bdevs and finding (we hope) the loop device
> that was created to handle the mount. 

What loop device? xfs/336 doesn't use loop devices at all.

Oh, this is assuming that mount will silently do a loopback mount
when passed a file rather than a block device. IOWs, it's relying on
some third party to do the loop device creation and hence allow it
to be mounted.

IOWs, this change is addressing a landmine by adding another
landmine.

I really think that xfs/336 needs to be fixed - one off test hacks
like this, while they may work, only make modifying and maintaining
the fstests infrastructure that much harder....

> While we're at it, have the
> helper return the exit code from mount, not _prepare_for_eio_shutdown.

That should be a separate patch.

-Dave.
diff mbox series

Patch

diff --git a/common/rc b/common/rc
index 885669beeb5e26..4419cfc3188374 100644
--- a/common/rc
+++ b/common/rc
@@ -441,6 +441,7 @@  _try_scratch_mount()
 	[ $mount_ret -ne 0 ] && return $mount_ret
 	_idmapped_mount $SCRATCH_DEV $SCRATCH_MNT
 	_prepare_for_eio_shutdown $SCRATCH_DEV
+	return $mount_ret
 }
 
 # mount scratch device with given options and _fail if mount fails
@@ -658,6 +659,7 @@  _test_mount()
     [ $mount_ret -ne 0 ] && return $mount_ret
     _idmapped_mount $TEST_DEV $TEST_DIR
     _prepare_for_eio_shutdown $TEST_DEV
+    return $mount_ret
 }
 
 _test_unmount()
@@ -4469,6 +4471,12 @@  _destroy_loop_device()
 	losetup -d $dev || _fail "Cannot destroy loop device $dev"
 }
 
+# Find the loop bdev for a given file, if there is one.
+_find_loop_device()
+{
+	losetup --list -n -O NAME -j "$1"
+}
+
 _scale_fsstress_args()
 {
     local args=""
diff --git a/common/xfs b/common/xfs
index 0417a40adba3e2..c68bd6d7c773ac 100644
--- a/common/xfs
+++ b/common/xfs
@@ -1110,6 +1110,12 @@  _xfs_prepare_for_eio_shutdown()
 	local dev="$1"
 	local ctlfile="error/fail_at_unmount"
 
+	# Is this a regular file?  Check if there's a loop device somewhere.
+	# Hopefully that lines up with a mounted filesystem.
+	if [ ! -b "$dev" ]; then
+		dev=$(_find_loop_device "$1" | tail -n 1)
+	fi
+
 	# Once we enable IO errors, it's possible that a writer thread will
 	# trip over EIO, cancel the transaction, and shut down the system.
 	# This is expected behavior, so we need to remove the "Internal error"