Message ID | 20240123225547.10221-1-kch@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | [blktests,V3] nvme: add nvme pci timeout testcase |
On 1/23/24 14:55, Chaitanya Kulkarni wrote:
> Trigger and test nvme-pci timeout with concurrent fio jobs.
>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
> V3:-
>
> 1. Add CAN_BE_ZONED.
> 2. Add FAULT_INJECTION_DEBUG_FS check in requires.
> 3. Remove _require_nvme_trtype pci in requires().
> 4. Remove device_requires().
> 5. Store fio output in FULL.
> 6. Revmoe shellcheck and use grep I/O error value to pass/fail testcase.
>
> ---
>

is there any objections on this patch ?

-ck
On Jan 29, 2024 / 11:13, Chaitanya Kulkarni wrote:
> On 1/23/24 14:55, Chaitanya Kulkarni wrote:
> > Trigger and test nvme-pci timeout with concurrent fio jobs.
> >
> > Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> > ---
> > V3:-
> >
> > 1. Add CAN_BE_ZONED.
> > 2. Add FAULT_INJECTION_DEBUG_FS check in requires.
> > 3. Remove _require_nvme_trtype pci in requires().
> > 4. Remove device_requires().
> > 5. Store fio output in FULL.
> > 6. Revmoe shellcheck and use grep I/O error value to pass/fail testcase.
> >
> > ---
> >
>
> is there any objections on this patch ?

Thanks for the V3 patch. It looks almost good, except one thing: as I commented
on V1 and V2, this test case often leaves the target NVME device with zero size.
It will make the following test cases fail (I mean the future test cases to be
added).

I suggest to remove and rescan the device to regain the good status of the
device. The following change will do it. What do you think?

(If we take this change as it is, it will recover the io_timeout_fail sysfs
attribute also, so the code to save and restore io_timeout_fail can be removed.)

diff --git a/tests/nvme/050 b/tests/nvme/050
index cacaba6..cb1c6f5 100755
--- a/tests/nvme/050
+++ b/tests/nvme/050
@@ -41,9 +41,12 @@ restore_fi_settings() {
 test_device() {
 	local nvme_ns
 	local io_fimeout_fail
+	local pdev
 
 	echo "Running ${TEST_NAME}"
 
+	pdev=$(_get_pci_dev_from_blkdev)
+
 	nvme_ns="$(basename "${TEST_DEV}")"
 	io_fimeout_fail="$(cat /sys/block/"${nvme_ns}"/io-timeout-fail)"
 	save_fi_settings
@@ -66,4 +69,11 @@ test_device() {
 	fi
 	restore_fi_settings
 	echo "${io_fimeout_fail}" > /sys/block/"${nvme_ns}"/io-timeout-fail
+
+	# Remove and rescan the NVME device to ensure that it has come back
+	echo 1 > "/sys/bus/pci/devices/$pdev/remove"
+	echo 1 > /sys/bus/pci/rescan
+	if [[ ! -b $TEST_DEV ]]; then
+		echo "Failed to regain $TEST_DEV"
+	fi
 }
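The suggested change checks for $TEST_DEV once, right after the rescan. As a minimal sketch, and assuming the block device node may take a moment to reappear after the rescan (an assumption, not something stated in this thread), the check could poll with a timeout instead. wait_for_path below is a hypothetical helper, not an existing blktests function:

```shell
#!/bin/bash
# Hypothetical helper, not part of blktests: poll until a path satisfies the
# given test(1) operator (for example -b for a block device), or give up once
# the timeout in seconds has passed.
wait_for_path() {
	local op="$1" path="$2" timeout="${3:-10}"
	local deadline=$((SECONDS + timeout))

	while ((SECONDS <= deadline)); do
		# The operator is passed to test(1) as an ordinary argument.
		if test "$op" "$path"; then
			return 0
		fi
		sleep 0.1
	done
	return 1
}
```

In the test case this might be used as `wait_for_path -b "$TEST_DEV" 10 || echo "Failed to regain $TEST_DEV"`, which keeps the failure message from the suggested diff while tolerating a short delay.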
diff --git a/tests/nvme/050 b/tests/nvme/050
new file mode 100755
index 0000000..cacaba6
--- /dev/null
+++ b/tests/nvme/050
@@ -0,0 +1,69 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2024 Chaitanya Kulkarni
+#
+# Test NVMe-PCI timeout with FIO jobs by triggering the nvme_timeout function.
+#
+
+. tests/nvme/rc
+
+DESCRIPTION="test nvme-pci timeout with fio jobs"
+CAN_BE_ZONED=1
+
+sysfs_path="/sys/kernel/debug/fail_io_timeout/"
+#restrict test to nvme-pci only
+nvme_trtype=pci
+
+# fault injection config array
+declare -A fi_array
+
+requires() {
+	_have_fio
+	_nvme_requires
+	_have_kernel_option FAIL_IO_TIMEOUT
+	_have_kernel_option FAULT_INJECTION_DEBUG_FS
+}
+
+save_fi_settings() {
+	for fi_attr in probability interval times space verbose
+	do
+		fi_array["${fi_attr}"]=$(cat "${sysfs_path}/${fi_attr}")
+	done
+}
+
+restore_fi_settings() {
+	for fi_attr in probability interval times space verbose
+	do
+		echo "${fi_array["${fi_attr}"]}" > "${sysfs_path}/${fi_attr}"
+	done
+}
+
+test_device() {
+	local nvme_ns
+	local io_fimeout_fail
+
+	echo "Running ${TEST_NAME}"
+
+	nvme_ns="$(basename "${TEST_DEV}")"
+	io_fimeout_fail="$(cat /sys/block/"${nvme_ns}"/io-timeout-fail)"
+	save_fi_settings
+	echo 1 > /sys/block/"${nvme_ns}"/io-timeout-fail
+
+	echo 100 > /sys/kernel/debug/fail_io_timeout/probability
+	echo 1 > /sys/kernel/debug/fail_io_timeout/interval
+	echo -1 > /sys/kernel/debug/fail_io_timeout/times
+	echo 0 > /sys/kernel/debug/fail_io_timeout/space
+	echo 1 > /sys/kernel/debug/fail_io_timeout/verbose
+
+	fio --bs=4k --rw=randread --norandommap --numjobs="$(nproc)" \
+	    --name=reads --direct=1 --filename="${TEST_DEV}" --group_reporting \
+	    --time_based --runtime=1m >& "$FULL"
+
+	if grep -q "Input/output error" "$FULL"; then
+		echo "Test complete"
+	else
+		echo "Test failed"
+	fi
+	restore_fi_settings
+	echo "${io_fimeout_fail}" > /sys/block/"${nvme_ns}"/io-timeout-fail
+}
diff --git a/tests/nvme/050.out b/tests/nvme/050.out
new file mode 100644
index 0000000..b78b05f
--- /dev/null
+++ b/tests/nvme/050.out
@@ -0,0 +1,2 @@
+Running nvme/050
+Test complete
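The save/restore pattern used for the fault injection attributes can be tried outside debugfs. The sketch below follows the same idea as save_fi_settings and restore_fi_settings, but takes the directory as an argument so it can be pointed at a scratch directory. The names save_attrs, restore_attrs and saved_attrs are hypothetical and do not appear in the patch:

```shell
#!/bin/bash
# Sketch only: save and restore one-value-per-file attributes, in the style
# of the patch's fi_array handling. All names here are illustrative.
declare -A saved_attrs

# save_attrs DIR ATTR...: remember the current content of each DIR/ATTR.
save_attrs() {
	local dir="$1" attr
	shift
	for attr in "$@"; do
		saved_attrs["$attr"]=$(cat "$dir/$attr")
	done
}

# restore_attrs DIR: write every remembered value back to DIR/ATTR.
restore_attrs() {
	local dir="$1" attr
	for attr in "${!saved_attrs[@]}"; do
		echo "${saved_attrs[$attr]}" > "$dir/$attr"
	done
}
```

Iterating over the keys of the array in restore_attrs means the attribute list is written only once, at save time, rather than repeated in both functions as in the patch.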
Trigger and test nvme-pci timeout with concurrent fio jobs.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
V3:-

1. Add CAN_BE_ZONED.
2. Add FAULT_INJECTION_DEBUG_FS check in requires.
3. Remove _require_nvme_trtype pci in requires().
4. Remove device_requires().
5. Store fio output in FULL.
6. Revmoe shellcheck and use grep I/O error value to pass/fail testcase.

---
 tests/nvme/050     | 69 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/nvme/050.out |  2 ++
 2 files changed, 71 insertions(+)
 create mode 100755 tests/nvme/050
 create mode 100644 tests/nvme/050.out