[blktests,V3] nvme: add nvme pci timeout testcase

Message ID: 20240123225547.10221-1-kch@nvidia.com (mailing list archive)
State: New, archived

Commit Message

Chaitanya Kulkarni Jan. 23, 2024, 10:55 p.m. UTC
Trigger and test nvme-pci timeout with concurrent fio jobs.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
V3:-

1. Add CAN_BE_ZONED.
2. Add FAULT_INJECTION_DEBUG_FS check in requires.
3. Remove _require_nvme_trtype pci in requires().
4. Remove device_requires().
5. Store fio output in FULL.
6. Remove shellcheck and use grep on the I/O error value to pass/fail the testcase.

---
 tests/nvme/050     | 69 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/nvme/050.out |  2 ++
 2 files changed, 71 insertions(+)
 create mode 100755 tests/nvme/050
 create mode 100644 tests/nvme/050.out

Comments

Chaitanya Kulkarni Jan. 29, 2024, 11:13 a.m. UTC | #1
On 1/23/24 14:55, Chaitanya Kulkarni wrote:
> Trigger and test nvme-pci timeout with concurrent fio jobs.
>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
> V3:-
>
> 1. Add CAN_BE_ZONED.
> 2. Add FAULT_INJECTION_DEBUG_FS check in requires.
> 3. Remove _require_nvme_trtype pci in requires().
> 4. Remove device_requires().
> 5. Store fio output in FULL.
> 6. Remove shellcheck and use grep on the I/O error value to pass/fail the testcase.
>
> ---
>

Are there any objections to this patch?

-ck
Shinichiro Kawasaki Jan. 29, 2024, 11:57 a.m. UTC | #2
On Jan 29, 2024 / 11:13, Chaitanya Kulkarni wrote:
> On 1/23/24 14:55, Chaitanya Kulkarni wrote:
> > Trigger and test nvme-pci timeout with concurrent fio jobs.
> >
> > Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> > ---
> > V3:-
> >
> > 1. Add CAN_BE_ZONED.
> > 2. Add FAULT_INJECTION_DEBUG_FS check in requires.
> > 3. Remove _require_nvme_trtype pci in requires().
> > 4. Remove device_requires().
> > 5. Store fio output in FULL.
> > 6. Remove shellcheck and use grep on the I/O error value to pass/fail the testcase.
> >
> > ---
> >
> 
> Are there any objections to this patch?

Thanks for the V3 patch. It looks almost good, except for one thing: as I
commented on V1 and V2, this test case often leaves the target NVMe device with
zero size. That will make subsequent test cases fail (I mean test cases to be
added in the future).

I suggest removing and rescanning the device to restore it to a good state. The
following change does that. What do you think?

(If we take this change as it is, it will also restore the io_timeout_fail sysfs
attribute, so the code to save and restore io_timeout_fail can be removed.)

diff --git a/tests/nvme/050 b/tests/nvme/050
index cacaba6..cb1c6f5 100755
--- a/tests/nvme/050
+++ b/tests/nvme/050
@@ -41,9 +41,12 @@ restore_fi_settings() {
 test_device() {
 	local nvme_ns
 	local io_fimeout_fail
+	local pdev
 
 	echo "Running ${TEST_NAME}"
 
+	pdev=$(_get_pci_dev_from_blkdev)
+
 	nvme_ns="$(basename "${TEST_DEV}")"
 	io_fimeout_fail="$(cat /sys/block/"${nvme_ns}"/io-timeout-fail)"
 	save_fi_settings
@@ -66,4 +69,11 @@ test_device() {
 	fi
 	restore_fi_settings
 	echo "${io_fimeout_fail}" > /sys/block/"${nvme_ns}"/io-timeout-fail
+
+	# Remove and rescan the NVMe device to ensure that it has come back
+	echo 1 > "/sys/bus/pci/devices/$pdev/remove"
+	echo 1 > /sys/bus/pci/rescan
+	if [[ ! -b $TEST_DEV ]]; then
+		echo "Failed to regain $TEST_DEV"
+	fi
 }
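
The recovery step in the diff above checks for the block device right after the rescan write returns. As a hedged sketch only, and assuming that the device node may take a moment to reappear while the driver finishes probing, the check could poll for a bounded time instead. The helper name wait_for_node and its retry count are hypothetical and are not part of the proposed change or of blktests.

```shell
#!/bin/bash
# Hypothetical helper (not part of blktests): poll until `test OP PATH`
# succeeds, checking up to TRIES times with 0.1 s between attempts.
# Returns 0 as soon as the check passes, 1 if all attempts fail.
wait_for_node() {
	local test_op="$1" path="$2" tries="${3:-100}"

	while [ "$tries" -gt 0 ]; do
		if test "$test_op" "$path"; then
			return 0
		fi
		sleep 0.1
		tries=$((tries - 1))
	done
	return 1
}

# Possible use after the remove/rescan writes from the diff (requires root):
#   echo 1 > "/sys/bus/pci/devices/${pdev}/remove"
#   echo 1 > /sys/bus/pci/rescan
#   wait_for_node -b "${TEST_DEV}" 100 || echo "Failed to regain ${TEST_DEV}"
```

A bounded poll keeps the failure message meaningful: it is printed only if the device is still missing after roughly ten seconds, rather than on a momentary gap.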
Patch

diff --git a/tests/nvme/050 b/tests/nvme/050
new file mode 100755
index 0000000..cacaba6
--- /dev/null
+++ b/tests/nvme/050
@@ -0,0 +1,69 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2024 Chaitanya Kulkarni
+#
+# Test NVMe-PCI timeout with FIO jobs by triggering the nvme_timeout function.
+#
+
+. tests/nvme/rc
+
+DESCRIPTION="test nvme-pci timeout with fio jobs"
+CAN_BE_ZONED=1
+
+sysfs_path="/sys/kernel/debug/fail_io_timeout/"
+# Restrict the test to nvme-pci only
+nvme_trtype=pci
+
+# fault injection config array
+declare -A fi_array
+
+requires() {
+	_have_fio
+	_nvme_requires
+	_have_kernel_option FAIL_IO_TIMEOUT
+	_have_kernel_option FAULT_INJECTION_DEBUG_FS
+}
+
+save_fi_settings() {
+	for fi_attr in probability interval times space verbose
+	do
+		fi_array["${fi_attr}"]=$(cat "${sysfs_path}/${fi_attr}")
+	done
+}
+
+restore_fi_settings() {
+	for fi_attr in probability interval times space verbose
+	do
+		echo "${fi_array["${fi_attr}"]}" > "${sysfs_path}/${fi_attr}"
+	done
+}
+
+test_device() {
+	local nvme_ns
+	local io_fimeout_fail
+
+	echo "Running ${TEST_NAME}"
+
+	nvme_ns="$(basename "${TEST_DEV}")"
+	io_fimeout_fail="$(cat /sys/block/"${nvme_ns}"/io-timeout-fail)"
+	save_fi_settings
+	echo 1 > /sys/block/"${nvme_ns}"/io-timeout-fail
+
+	echo 100 > /sys/kernel/debug/fail_io_timeout/probability
+	echo   1 > /sys/kernel/debug/fail_io_timeout/interval
+	echo  -1 > /sys/kernel/debug/fail_io_timeout/times
+	echo   0 > /sys/kernel/debug/fail_io_timeout/space
+	echo   1 > /sys/kernel/debug/fail_io_timeout/verbose
+
+	fio --bs=4k --rw=randread --norandommap --numjobs="$(nproc)" \
+	    --name=reads --direct=1 --filename="${TEST_DEV}" --group_reporting \
+	    --time_based --runtime=1m >& "$FULL"
+
+	if grep -q "Input/output error" "$FULL"; then
+		echo "Test complete"
+	else
+		echo "Test failed"
+	fi
+	restore_fi_settings
+	echo "${io_fimeout_fail}" > /sys/block/"${nvme_ns}"/io-timeout-fail
+}
diff --git a/tests/nvme/050.out b/tests/nvme/050.out
new file mode 100644
index 0000000..b78b05f
--- /dev/null
+++ b/tests/nvme/050.out
@@ -0,0 +1,2 @@ 
+Running nvme/050
+Test complete
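
The pass/fail decision in test_device() depends on fio reporting "Input/output error" in the output stored in $FULL. As a minimal sketch of that check in isolation, the function below takes a log file path and prints the same two messages the test uses. The name check_timeout_errors, the non-zero return on failure, and the unreadable-file handling are assumptions added for illustration and do not appear in the patch.

```shell
#!/bin/bash
# Hypothetical helper (not in the patch): decide pass/fail from a fio log.
# Prints "Test complete" if the log contains an EIO report, else "Test failed".
check_timeout_errors() {
	local log="$1"

	# Assumption: an unreadable log counts as a failure.
	if [ ! -r "$log" ]; then
		echo "Test failed"
		return 1
	fi

	# Same pattern as the patch: with the injected timeouts, the reads are
	# expected to fail with EIO, which fio reports as "Input/output error".
	if grep -q "Input/output error" "$log"; then
		echo "Test complete"
	else
		echo "Test failed"
		return 1
	fi
}

# Possible use inside test_device(), after fio has written to "$FULL":
#   check_timeout_errors "$FULL"
```

Because blktests compares stdout against 050.out, printing "Test failed" is already enough to fail the test case; the return value is only a convenience for callers that want to branch on the result.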