From patchwork Wed Nov 29 10:31:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shin'ichiro Kawasaki
X-Patchwork-Id: 13472597
From: Shin'ichiro Kawasaki
To: linux-block@vger.kernel.org
Cc: Johannes Thumshirn, Shin'ichiro Kawasaki
Subject: [PATCH blktests 1/2] block/011: recover test target devices to online or live status
Date: Wed, 29 Nov 2023 19:31:44 +0900
Message-ID: <20231129103145.655612-2-shinichiro.kawasaki@wdc.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231129103145.655612-1-shinichiro.kawasaki@wdc.com>
References: <20231129103145.655612-1-shinichiro.kawasaki@wdc.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

The test case runs fio while disabling and enabling the PCI device of
the test target block device. This often leaves the device in offline
or dead status. For example, when the block device is an HDD connected
to an HBA, the kernel puts the device into offline mode with this
message:

    sd x:x:x:x Device offlined - not ready after error recovery

This causes the following test cases to fail. To avoid the failure,
remove and rescan the devices to bring them back to online or live
status. This improvement is similar to the commit f8f33218eca7
("block/011: recover test target NVME device capacity").

While at it, improve the code comments added by the commit
f8f33218eca7, and add missing local variable declarations.

Of note is that the added rescan operation triggers a lockdep WARN if
the system has devices which depend on P2SB [1].

[1] https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/

Signed-off-by: Shin'ichiro Kawasaki
---
 tests/block/011 | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/tests/block/011 b/tests/block/011
index a4230f4..2a967d1 100755
--- a/tests/block/011
+++ b/tests/block/011
@@ -38,6 +38,8 @@ device_requires() {
 test_device() {
 	echo "Running ${TEST_NAME}"
 
+	local pdev size rescan=false state i
+
 	pdev="$(_get_pci_dev_from_blkdev)"
 
 	if _test_dev_is_rotational; then
@@ -60,17 +62,36 @@ test_device() {
 
 	echo "Test complete"
 
-	# This test triggers NVME controller resets. When any failure happens
-	# during the resets, the driver marks the NVME block devices with zero
-	# capacity. Then following tests fail with the zero capacity devices. To
-	# get back the correct capacity, remove and rescan the devices.
+	# This test triggers NVME controller resets. When failures happen during
+	# the resets, the driver marks the NVME block devices as zero capacity.
+	# Remove and rescan the devices to regain the correct capacity.
 	if ((!$(<"$TEST_DEV_SYSFS/size"))); then
-		echo "$TEST_DEV has zero capacity" >> "$FULL"
-		if [[ -w $TEST_DEV_SYSFS/device/device/remove ]] &&
+		echo "$TEST_DEV has zero capacity. Rescan it." >> "$FULL"
+		rescan=true
+	fi
+
+	# This test case often makes NVME or HDDs connected to HBAs in offline
+	# or dead mode. Remove and rescan the devices to make them online again.
+	if [[ -r $TEST_DEV_SYSFS/device/state ]]; then
+		state=$(cat "$TEST_DEV_SYSFS/device/state")
+		if [[ $state == offline || $state == dead ]]; then
+			echo "$TEST_DEV is $state. Rescan it." >> "$FULL"
+			rescan=true
+		fi
+	fi
+
+	if [[ $rescan == true ]]; then
+		if [[ -w /sys/bus/pci/devices/$pdev/remove ]] &&
 			   [[ -w /sys/bus/pci/rescan ]]; then
-			echo "Rescan to tegain the correct capacity" >> "$FULL"
-			echo 1 > "$TEST_DEV_SYSFS/device/device/remove"
+			echo 1 > "/sys/bus/pci/devices/$pdev/remove"
 			echo 1 > /sys/bus/pci/rescan
+		else
+			echo "Can not rescan PCI device for recovery"
+			return 1
 		fi
+		for ((i = 0; i < 10; i++)); do
+			[[ -w $TEST_DEV ]] && break
+			sleep 5
+		done
 	fi
 }

From patchwork Wed Nov 29 10:31:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shin'ichiro Kawasaki
X-Patchwork-Id: 13472596
From: Shin'ichiro Kawasaki
To: linux-block@vger.kernel.org
Cc: Johannes Thumshirn, Shin'ichiro Kawasaki
Subject: [PATCH blktests 2/2] block/011: set default timeout to 20 minutes
Date: Wed, 29 Nov 2023 19:31:45 +0900
Message-ID: <20231129103145.655612-3-shinichiro.kawasaki@wdc.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20231129103145.655612-1-shinichiro.kawasaki@wdc.com>
References: <20231129103145.655612-1-shinichiro.kawasaki@wdc.com>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

The test case runs fio while disabling and enabling the PCI device of
the test target block device. Depending on the device type, it takes a
very long time to re-enable the device. In the worst case, it takes 4
hours to complete the test case. To avoid the meaninglessly long test
runtime, set a default timeout limit.

I ran the test case on various devices: real NVME SSD, QEMU NVME
emulation, HDDs with AHCI, HDDs with SAS-HBA. Many of them take less
than 20 minutes to complete and pass the test case. Hence, choose 20
minutes as the timeout duration.

Signed-off-by: Shin'ichiro Kawasaki
---
 tests/block/011 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/block/011 b/tests/block/011
index 2a967d1..78d8d3d 100755
--- a/tests/block/011
+++ b/tests/block/011
@@ -49,6 +49,7 @@ test_device() {
 	fi
 
 	# start fio job
+	: "${TIMEOUT:=1200}"
 	_run_fio_rand_io --filename="$TEST_DEV" --size="$size" \
 		--ignore_error=EIO,ENXIO,ENODEV &
 
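
Note on the added line: `: "${TIMEOUT:=1200}"` assigns 1200 to TIMEOUT only when the variable is unset or empty, presumably so that a TIMEOUT value already supplied by the user still takes precedence. The following is a minimal standalone sketch of that shell idiom, not code from blktests; apply_default_timeout and the two result variables are hypothetical names used only for illustration.

```shell
# Sketch of the ': "${VAR:=default}"' idiom used by the patch.
# apply_default_timeout is a hypothetical helper, not a blktests function.
apply_default_timeout() {
	# ":" is the no-op builtin; it is there only so that the expansion is
	# evaluated. ":=" assigns 1200 when TIMEOUT is unset or empty.
	: "${TIMEOUT:=1200}"
	echo "$TIMEOUT"
}

unset TIMEOUT
# No value supplied by the caller: the default applies.
default_result="$(apply_default_timeout)"
# Value supplied by the caller: it is kept as is.
override_result="$(TIMEOUT=60 apply_default_timeout)"

echo "default=$default_result override=$override_result"
# prints "default=1200 override=60"
```

With this form the 20 minute limit acts as a default rather than a hard limit: a run that already sets TIMEOUT to another value keeps that value.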