From patchwork Wed Aug 11 15:45:10 2021
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 12431521
From: Luis Chamberlain
To: fstests@vger.kernel.org
Cc: hare@suse.de, dgilbert@interlog.com, jeyu@kernel.org, lucas.demarchi@intel.com, linux-kernel@vger.kernel.org, Luis Chamberlain
Subject: [PATCH v2 1/3] fstests: use udevadm settle after pvremove
Date: Wed, 11 Aug 2021 08:45:10 -0700
Message-Id: <20210811154512.1813622-2-mcgrof@kernel.org>
In-Reply-To: <20210811154512.1813622-1-mcgrof@kernel.org>
References: <20210811154512.1813622-1-mcgrof@kernel.org>
X-Mailing-List: fstests@vger.kernel.org

As with creation, we also need to run udevadm settle after removing a
pv, otherwise we can trip on races with module removal for the block
devices in use. This shortens the window during which the refcnt of a
block device test module such as scsi_debug stays above 0. If the refcnt
is still greater than 0 when we try to remove the module, the removal
fails and the test reports a false positive. Settling also helps ensure
that the pv is really gone.

These issues were first tracked for scsi_debug [0] and later found to be
generic, regardless of filesystem, whenever pvremove is used [1].

Using udevadm settle *helps*, but it does not address all possible races
with the refcnt, as noted in the generic bug entry [1].

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214015

Signed-off-by: Luis Chamberlain
---
 tests/generic/081 | 5 ++++-
 tests/generic/108 | 1 +
 tests/generic/459 | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/generic/081 b/tests/generic/081
index f795b2c1..8e552074 100755
--- a/tests/generic/081
+++ b/tests/generic/081
@@ -12,6 +12,7 @@ _begin_fstest auto quick
 # Override the default cleanup function.
 _cleanup()
 {
+	local pv_ret
 	cd /
 	rm -f $tmp.*
 
@@ -34,7 +35,9 @@ _cleanup()
 		$UMOUNT_PROG $mnt >> $seqres.full 2>&1
 		$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
 		$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
-		test $? -eq 0 && break
+		pv_ret=$?
+		$UDEV_SETTLE_PROG
+		test $pv_ret -eq 0 && break
 		sleep 2
 	done
 }
diff --git a/tests/generic/108 b/tests/generic/108
index 7dd426c1..b7797e8f 100755
--- a/tests/generic/108
+++ b/tests/generic/108
@@ -21,6 +21,7 @@ _cleanup()
 	$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
 	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
 	$LVM_PROG pvremove -f $SCRATCH_DEV $SCSI_DEBUG_DEV >>$seqres.full 2>&1
+	$UDEV_SETTLE_PROG
 	_put_scsi_debug_dev
 	rm -f $tmp.*
 }
diff --git a/tests/generic/459 b/tests/generic/459
index e5e5e9ab..5b44e245 100755
--- a/tests/generic/459
+++ b/tests/generic/459
@@ -29,6 +29,7 @@ _cleanup()
 	$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
 	$LVM_PROG vgremove -ff $vgname >>$seqres.full 2>&1
 	$LVM_PROG pvremove -ff $SCRATCH_DEV >>$seqres.full 2>&1
+	$UDEV_SETTLE_PROG
 }
 
 # Import common functions.
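The ordering in the generic/081 hunk matters because any command run after pvremove overwrites $?. A minimal sketch of that pattern follows. The names remove_pv, settle, cleanup_pv and FAKE_PV_RET are hypothetical stand-ins invented for illustration so the snippet runs without LVM or udev, and the retry bound of 3 is likewise an assumption; the real test keeps looping with a sleep in between.

```shell
# Hypothetical stand-ins so the sketch runs without LVM or udev; the real
# tests call "$LVM_PROG pvremove" and "$UDEV_SETTLE_PROG" instead.
remove_pv() { return "${FAKE_PV_RET:-0}"; }
settle() { true; }

# Retry removal a bounded number of times (the bound of 3 is an assumption
# made for this sketch) and report how it went.
cleanup_pv() {
	local pv_ret=0 tries=0
	while [ "$tries" -lt 3 ]; do
		tries=$((tries + 1))
		remove_pv
		pv_ret=$?	# save now: the next command overwrites $?
		settle		# always settle, even when removal failed
		test "$pv_ret" -eq 0 && break
	done
	echo "tries=$tries pv_ret=$pv_ret"
}
```

Calling cleanup_pv with FAKE_PV_RET unset models a removal that succeeds on the first try, while FAKE_PV_RET=5 cleanup_pv models one that keeps failing. In both cases settle runs on every iteration and the loop decision is still based on the removal status.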
From patchwork Wed Aug 11 15:45:11 2021
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 12431517
From: Luis Chamberlain
To: fstests@vger.kernel.org
Cc: hare@suse.de, dgilbert@interlog.com, jeyu@kernel.org, lucas.demarchi@intel.com, linux-kernel@vger.kernel.org, Luis Chamberlain
Subject: [PATCH v2 2/3] common/module: add patient module rmmod support
Date: Wed, 11 Aug 2021 08:45:11 -0700
Message-Id: <20210811154512.1813622-3-mcgrof@kernel.org>
In-Reply-To: <20210811154512.1813622-1-mcgrof@kernel.org>
References: <20210811154512.1813622-1-mcgrof@kernel.org>
X-Mailing-List: fstests@vger.kernel.org

When we call rmmod it fails if the module's refcnt is greater than 0.
This is expected. However, with test modules such as scsi_debug,
userspace tests tend to assume that once they are done issuing commands
they can safely remove the module, and that the removal will succeed.

This is not true for a few reasons. First, a module might take a while
to quiesce after it is used, and this varies module by module. For
scsi_debug there is one patch to help with this, but it is not
sufficient to address all the removal issues, it just helps the module
quiesce faster. Second, if something like LVM pvremove is used, as in
generic/108, it may take time before the module's refcnt goes to 0, even
if DM_DEFERRED_REMOVE is *not* used and even if udevadm settle is used.
Even *after* all this the module refcnt is still very fickle. For
example, any blkdev_open() against a block device bumps the module
refcnt, and we have little control over stopping these sporadic
userspace calls after a test. A failed module removal then shows up as a
test failure that is really just a false positive.

This was first observed on scsi_debug [0]. Doug worked on a patch to
help the driver quiesce [1]. Later the issue was determined to be
generic [2].

The only way to properly resolve these issues is with a patient module
remover. The kernel used to support waiting in the delete_module()
system call, but that was later deprecated and moved into kmod as a 10
second userspace sleep. That 10 second sleep is long gone from kmod now.
Since the issue is generic and can only be dealt with properly in
userspace, I've posted patches for a kmod patient module remover [3].

Use the kmod patient module remover when supported, otherwise open code
our own solution inside fstests, which is a simplified version of what
has been proposed for kmod. We default to a timeout of 100 seconds. Each
test can override the timeout by setting the variable
MODPROBE_PATIENT_RM_TIMEOUT_SECONDS, or set it to "forever" if it wishes
for the patience to be infinite.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://lore.kernel.org/linux-scsi/20210508230745.27923-1-dgilbert@interlog.com/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=214015
[3] https://lkml.kernel.org/r/20210810051602.3067384-1-mcgrof@kernel.org

Signed-off-by: Luis Chamberlain
---
 common/config | 31 +++++++++++++++
 common/module | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)

diff --git a/common/config b/common/config
index 005fd50a..9b8a2bc4 100644
--- a/common/config
+++ b/common/config
@@ -252,6 +252,37 @@ if [[ "$UDEV_SETTLE_PROG" == "" || ! -d /proc/net ]]; then
 fi
 export UDEV_SETTLE_PROG
 
+# Set MODPROBE_PATIENT_RM_TIMEOUT_SECONDS to "forever" if you want the patient
+# modprobe removal to run forever trying to remove a module.
+MODPROBE_REMOVE_PATIENT=""
+modprobe --help | grep -q -1 "remove-patiently"
+if [[ $? -ne 0 ]]; then
+	if [[ -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
+		# We will open code our own implementation of a patient module
+		# remover in fstests. Use 100 second default.
+		export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS="100"
+	fi
+else
+	MODPROBE_RM_PATIENT_TIMEOUT_ARGS=""
+	if [[ ! -z "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" ]]; then
+		if [[ "$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS" != "forever" ]]; then
+			MODPROBE_PATIENT_RM_TIMEOUT_MS="$((MODPROBE_PATIENT_RM_TIMEOUT_SECONDS * 1000))"
+			MODPROBE_RM_PATIENT_TIMEOUT_ARGS="-t $MODPROBE_PATIENT_RM_TIMEOUT_MS"
+		fi
+	else
+		# We export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS here for parity
+		# with environments without support for modprobe -p, but we
+		# only really need it exported right now for environments which
+		# don't have support for modprobe -p to implement our own
+		# patient module removal support within fstests.
+		export MODPROBE_PATIENT_RM_TIMEOUT_SECONDS="100"
+		MODPROBE_PATIENT_RM_TIMEOUT_MS="$((MODPROBE_PATIENT_RM_TIMEOUT_SECONDS * 1000))"
+		MODPROBE_RM_PATIENT_TIMEOUT_ARGS="-t $MODPROBE_PATIENT_RM_TIMEOUT_MS"
+	fi
+	MODPROBE_REMOVE_PATIENT="modprobe -p $MODPROBE_RM_PATIENT_TIMEOUT_ARGS"
+fi
+export MODPROBE_REMOVE_PATIENT
+
 export MKFS_XFS_PROG=$(type -P mkfs.xfs)
 export MKFS_EXT4_PROG=$(type -P mkfs.ext4)
 export MKFS_UDF_PROG=$(type -P mkudffs)
diff --git a/common/module b/common/module
index 39e4e793..03953fa1 100644
--- a/common/module
+++ b/common/module
@@ -4,6 +4,8 @@
 #
 # Routines for messing around with loadable kernel modules
 
+source common/config
+
 # Return the module name for this fs.
 _module_for_fs()
 {
@@ -81,3 +83,108 @@ _get_fs_module_param()
 {
 	cat /sys/module/${FSTYP}/parameters/${1} 2>/dev/null
 }
+
+# checks the refcount and returns 0 if we can safely remove the module. rmmod
+# does this check for us, but we can use this to also iterate checking for this
+# refcount before we even try to remove the module. This is useful when using
+# debug test modules which take a while to quiesce.
+_patient_rmmod_check_refcnt()
+{
+	local module=$1
+	local refcnt=0
+
+	if [[ -f /sys/module/$module/refcnt ]]; then
+		refcnt=$(cat /sys/module/$module/refcnt 2>/dev/null)
+		if [[ $? -ne 0 || $refcnt -eq 0 ]]; then
+			return 0
+		fi
+		return 1
+	fi
+	return 0
+}
+
+# Patiently tries to wait to remove a module by ensuring first
+# the refcnt is 0 and then trying to persistently remove the module within
+# the time allowed. The timeout is configurable per test, just set
+# MODPROBE_PATIENT_RM_TIMEOUT_SECONDS prior to including this file.
+# If you want this to try forever just set MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
+# to the special value of "forever". This applies to both cases where kmod
+# supports the patient module remover (modprobe -p) and where it does not.
+#
+# If your version of kmod supports modprobe -p, we use that
+# instead. Otherwise we have to implement a patient module remover
+# ourselves.
+_patient_rmmod()
+{
+	local module=$1
+	local max_tries_max=$MODPROBE_PATIENT_RM_TIMEOUT_SECONDS
+	local max_tries=0
+	local mod_ret=0
+	local refcnt_is_zero=0
+
+	if [[ ! -z $MODPROBE_REMOVE_PATIENT ]]; then
+		$MODPROBE_REMOVE_PATIENT $module
+		mod_ret=$?
+		if [[ $mod_ret -ne 0 ]]; then
+			echo "kmod patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max returned $mod_ret"
+		fi
+		return $mod_ret
+	fi
+
+	max_tries=$max_tries_max
+
+	while [[ "$max_tries" != "0" ]]; do
+		_patient_rmmod_check_refcnt $module
+		if [[ $? -eq 0 ]]; then
+			refcnt_is_zero=1
+			break
+		fi
+		sleep 1
+		if [[ "$max_tries" == "forever" ]]; then
+			continue
+		fi
+		let max_tries=$max_tries-1
+	done
+
+	if [[ $refcnt_is_zero -ne 1 ]]; then
+		echo "custom patient module removal for $module timed out waiting for refcnt to become 0 using timeout of $max_tries_max"
+		return -1
+	fi
+
+	# If we ran out of time but our refcnt check confirms we had
+	# a refcnt of 0, just try to remove the module once.
+	if [[ "$max_tries" == "0" ]]; then
+		modprobe -r $module
+		return $?
+	fi
+
+	# If we have extra time left, use the time left to now try to
+	# persistently remove the module. We do this because although through
+	# the above we found refcnt to be 0, removal can still fail since
+	# userspace can always race to bump the refcnt. An example is any
+	# blkdev_open() calls against a block device. These issues have been
+	# tracked and documented in the following bug reports, which justifies
+	# our need to do this in userspace:
+	# https://bugzilla.kernel.org/show_bug.cgi?id=212337
+	# https://bugzilla.kernel.org/show_bug.cgi?id=214015
+	while [[ $max_tries != 0 ]]; do
+		# The module may already be gone; do not spin on it.
+		[[ -d /sys/module/$module ]] || break
+		modprobe -r $module 2> /dev/null
+		mod_ret=$?
+		if [[ $mod_ret == 0 ]]; then
+			break
+		fi
+		sleep 1
+		if [[ "$max_tries" == "forever" ]]; then
+			continue
+		fi
+		let max_tries=$max_tries-1
+	done
+
+	if [[ $mod_ret -ne 0 ]]; then
+		echo "custom patient module removal for $module timed out trying to remove $module using timeout of $max_tries_max last try returned $mod_ret"
+	fi
+
+	return $mod_ret
+}

From patchwork Wed Aug 11 15:45:12 2021
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 12431523
From: Luis Chamberlain
To: fstests@vger.kernel.org
Cc: hare@suse.de, dgilbert@interlog.com, jeyu@kernel.org, lucas.demarchi@intel.com, linux-kernel@vger.kernel.org, Luis Chamberlain
Subject: [PATCH v2 3/3] common/scsi_debug: use the patient module remover
Date: Wed, 11 Aug 2021 08:45:12 -0700
Message-Id: <20210811154512.1813622-4-mcgrof@kernel.org>
In-Reply-To: <20210811154512.1813622-1-mcgrof@kernel.org>
References: <20210811154512.1813622-1-mcgrof@kernel.org>
X-Mailing-List: fstests@vger.kernel.org

If you run tests such as generic/108 in a loop you'll eventually see a
failure, but the failure can be a false positive: the test was just
unable to remove the scsi_debug module.

We need to give the refcnt some time to become 0. For instance, for the
test generic/108 the refcnt lingers between 2 and 1. It should be 0 when
we're done, but a bit of time seems to be required. The chance of us
trying to run rmmod when the refcnt is 2 or 1 is low, about 1/30 times
if you run the test in a loop on linux-next today.

Likewise, even when it's 0 we just need a tiny breather before we can
remove the module (sleep 10 suffices), but this is only required on
older kernels. Otherwise removing the module will just fail.

Some of these races are documented in korg#212337 [0], and Doug Gilbert
has posted at least one patch to try to help with this [1]. The patch
helps, but it does not resolve all the issues.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
[1] https://lkml.kernel.org/r/20210508230745.27923-1-dgilbert@interlog.com

Signed-off-by: Luis Chamberlain
---
 common/scsi_debug | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/common/scsi_debug b/common/scsi_debug
index e7988469..3c9cd820 100644
--- a/common/scsi_debug
+++ b/common/scsi_debug
@@ -4,11 +4,13 @@
 #
 # Functions useful for tests on unique block devices
 
+. common/module
+
 _require_scsi_debug()
 {
 	# make sure we have the module and it's not already used
 	modinfo scsi_debug 2>&1 > /dev/null || _notrun "scsi_debug module not found"
-	lsmod | grep -wq scsi_debug && (rmmod scsi_debug || _notrun "scsi_debug module in use")
+	lsmod | grep -wq scsi_debug && (_patient_rmmod scsi_debug || _notrun "scsi_debug module in use")
 	# make sure it has the features we need
 	# logical/physical sectors plus unmap support all went in together
 	modinfo scsi_debug | grep -wq sector_size || _notrun "scsi_debug too old"
@@ -53,5 +55,5 @@ _put_scsi_debug_dev()
 		$UDEV_SETTLE_PROG
 		n=$((n-1))
 	done
-	rmmod scsi_debug || _fail "Could not remove scsi_debug module"
+	_patient_rmmod scsi_debug || _fail "Could not remove scsi_debug module"
 }
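The waiting step that the open-coded fallback in _patient_rmmod relies on can be sketched in isolation. This is a simplified illustration under stated assumptions, not the fstests implementation: wait_for_zero_refcnt and its three arguments are hypothetical names, and the file argument stands in for /sys/module/<module>/refcnt so the snippet can be tried without loading a module.

```shell
# Hypothetical helper, not part of fstests: wait until the number stored in
# file $1 reads 0, checking at most $2 times ("forever" never gives up) and
# sleeping $3 seconds (default 1) between checks.
# Returns 0 once the count is 0 or the file is gone, 1 on timeout.
wait_for_zero_refcnt() {
	local file=$1 tries=$2 interval=${3:-1} refcnt
	while [ "$tries" != "0" ]; do
		# A missing file is treated as "module no longer loaded".
		[ -f "$file" ] || return 0
		refcnt=$(cat "$file" 2>/dev/null) || return 0
		[ "$refcnt" = "0" ] && return 0
		sleep "$interval"
		# Only a finite budget is decremented.
		[ "$tries" = "forever" ] || tries=$((tries - 1))
	done
	return 1
}
```

A caller could then write something like `wait_for_zero_refcnt /sys/module/scsi_debug/refcnt 100 && modprobe -r scsi_debug`. As the comments in the series note, a refcnt of 0 is only a hint, since userspace can bump it again before the removal runs, which is why the fallback keeps retrying the removal itself for the remaining time.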