[v5,2/2] common/scsi_debug: use the patient module remover

Message ID 20210820010402.2343320-3-mcgrof@kernel.org (mailing list archive)
State New, archived
Series fstests: add patient module remover

Commit Message

Luis Chamberlain Aug. 20, 2021, 1:04 a.m. UTC
If you run tests such as generic/108 in a loop you'll
eventually see a failure, but the failure can be a false
positive: the test was just unable to remove the
scsi_debug module.
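
A minimal sketch of such a loop follows. The helper name
run_until_failure is hypothetical and not part of fstests; it
only assumes that the command it is given exits non-zero on
failure.

```shell
#!/bin/bash
# run_until_failure <max_iterations> <command> [args...]
# Hypothetical helper, not part of fstests. Reruns the command
# until it fails or the iteration limit is reached, then reports
# which iteration failed (if any).
run_until_failure()
{
	local max=$1
	shift
	local i

	for ((i = 1; i <= max; i++)); do
		if ! "$@"; then
			echo "failed on iteration $i"
			return 1
		fi
	done
	echo "no failure in $max iterations"
	return 0
}
```

From the top of an fstests tree this could be invoked as
`run_until_failure 100 ./check generic/108`, assuming ./check
exits non-zero when a test fails.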

We need to give some time for the refcnt to become 0. For
instance for the test generic/108 the refcnt lingers between
2 and 1. It should be 0 when we're done but a bit of time
seems to be required. The chance of us trying to run rmmod
when the refcnt is 2 or 1 is low, about 1/30 times if you
run the test in a loop on linux-next today.

Likewise, even when it's 0 we just need a tiny breather before
we can remove the module (sleep 10 suffices), but this is
only required on older kernels. Otherwise removing the module
will just fail.
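
The waiting described above can be sketched roughly as below.
This is not the _patient_rmmod implementation from
common/module; wait_for_refcnt_zero and the SYSFS_MODULE_DIR
override are hypothetical names used only for illustration. The
sketch polls the module's refcnt file under /sys/module once a
second until it reads 0 or a timeout expires.

```shell
#!/bin/bash
# Hypothetical sketch, not the _patient_rmmod from common/module.
# SYSFS_MODULE_DIR is an assumption added so the sketch can be
# pointed at a fake directory; on a real system it is /sys/module.
SYSFS_MODULE_DIR="${SYSFS_MODULE_DIR:-/sys/module}"

# wait_for_refcnt_zero <module> <timeout_seconds>
# Returns 0 once the refcnt reads 0 or there is no refcnt file
# to wait on, 1 if the timeout expires first.
wait_for_refcnt_zero()
{
	local module=$1
	local timeout=$2
	local refcnt_file="$SYSFS_MODULE_DIR/$module/refcnt"
	local waited=0
	local refcnt

	while true; do
		# No refcnt file: nothing to wait for.
		[ -f "$refcnt_file" ] || return 0
		refcnt=$(cat "$refcnt_file" 2>/dev/null) || refcnt=""
		[ "$refcnt" = "0" ] && return 0
		[ "$waited" -ge "$timeout" ] && return 1
		sleep 1
		waited=$((waited + 1))
	done
}
```

A caller could then do something like
`wait_for_refcnt_zero scsi_debug 50 && rmmod scsi_debug`; the
50 second timeout here is an arbitrary example value.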

Some of these races are documented in korg#212337 [0], and
Doug Gilbert has posted at least one patch attempt to try
to help with this [1]. That patch helps, but it does not
resolve all the issues.

This lets us remove the cheesy try loop. We keep the
udevadm settle call as it can help salvage buggy tests
which forgot to call it.

We also special-case where MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
is set to "forever" and the initial module check finds it's
in use. In that case we just try removing the module once,
since fstests would not be the one leaving modules lingering
around, and waiting forever could mean you won't discover
the issue for a while.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://lkml.kernel.org/r/20210508230745.27923-1-dgilbert@interlog.com
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/scsi_debug | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

Patch

diff --git a/common/scsi_debug b/common/scsi_debug
index e7988469..1e0ca255 100644
--- a/common/scsi_debug
+++ b/common/scsi_debug
@@ -4,11 +4,32 @@ 
 #
 # Functions useful for tests on unique block devices
 
+. common/module
+
 _require_scsi_debug()
 {
-	# make sure we have the module and it's not already used
+	local mod_present=0
+
+	# make sure we have the module
 	modinfo scsi_debug 2>&1 > /dev/null || _notrun "scsi_debug module not found"
-	lsmod | grep -wq scsi_debug && (rmmod scsi_debug || _notrun "scsi_debug module in use")
+
+	lsmod | grep -wq scsi_debug
+	if [[ $? -eq 0 ]]; then
+		mod_present=1
+	fi
+
+	if [[ $mod_present -eq 1 ]]; then
+		# We try to remove the module only once if MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
+		# is set to forever because fstests does not leave modules
+		# lingering around. If you do have a module lingering around
+		# and it's being used, it wasn't us who started it, so you
+		# likely would not want to wait forever for it really.
+		if [[ "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" == "forever" ]]; then
+			rmmod scsi_debug || _notrun "scsi_debug module in use and MODPROBE_PATIENT_RM_TIMEOUT_SECONDS set to forever, removing once failed"
+		else
+			_patient_rmmod scsi_debug || _notrun "scsi_debug module in use"
+		fi
+	fi
 	# make sure it has the features we need
 	# logical/physical sectors plus unmap support all went in together
 	modinfo scsi_debug | grep -wq sector_size || _notrun "scsi_debug too old"
@@ -44,14 +65,6 @@  _get_scsi_debug_dev()
 _put_scsi_debug_dev()
 {
 	lsmod | grep -wq scsi_debug || return
-
-	n=2
-	# use redirection not -q option of modprobe here, because -q of old
-	# modprobe is only quiet when the module is not found, not when the
-	# module is in use.
-	while [ $n -ge 0 ] && ! modprobe -nr scsi_debug >/dev/null 2>&1; do
-		$UDEV_SETTLE_PROG
-		n=$((n-1))
-	done
-	rmmod scsi_debug || _fail "Could not remove scsi_debug module"
+	$UDEV_SETTLE_PROG
+	_patient_rmmod scsi_debug || _fail "Could not remove scsi_debug module"
 }