Message ID | 20240822193814.106111-3-mwilck@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] blktests: nvme: skip passthru tests on multipath devices | expand |
On Thu, Aug 22, 2024 at 09:38:14PM GMT, Martin Wilck wrote: > Add a test that repeatedly rescans nvme controllers while doing IO > on an nvme namespace connected to these controllers. The purpose > of the test is to make sure that no I/O errors or data corruption > occurs because of the rescan operations. Could you elaborate why this tests is added? Does it test for a regression, are there any patches for this? Or is it more let's ensure this actually works? The code looks good. Reviewed-by: Daniel Wagner <dwagner@suse.de>
On Fri, 2024-08-23 at 08:45 +0200, Daniel Wagner wrote: > On Thu, Aug 22, 2024 at 09:38:14PM GMT, Martin Wilck wrote: > > Add a test that repeatedly rescans nvme controllers while doing IO > > on an nvme namespace connected to these controllers. The purpose > > of the test is to make sure that no I/O errors or data corruption > > occurs because of the rescan operations. > > Could you elaborate why this tests is added? Does it test for a > regression, are there any patches for this? Or is it more let's > ensure > this actually works? The rationale was to test the kernel patch that I submitted yesterday: https://lore.kernel.org/linux-nvme/9de89e5a-04fc-4684-8514-b86884643a5d@suse.de/T/#t Thanks, Martin
On Fri, Aug 23, 2024 at 08:50:04AM GMT, Martin Wilck wrote: > On Fri, 2024-08-23 at 08:45 +0200, Daniel Wagner wrote: > > On Thu, Aug 22, 2024 at 09:38:14PM GMT, Martin Wilck wrote: > > > Add a test that repeatedly rescans nvme controllers while doing IO > > > on an nvme namespace connected to these controllers. The purpose > > > of the test is to make sure that no I/O errors or data corruption > > > occurs because of the rescan operations. > > > > Could you elaborate why this tests is added? Does it test for a > > regression, are there any patches for this? Or is it more let's > > ensure > > this actually works? > > The rationale was to test the kernel patch that I submitted yesterday: > https://lore.kernel.org/linux-nvme/9de89e5a-04fc-4684-8514-b86884643a5d@suse.de/T/#t I figured this out later too :) Just going chronology through the inbox.
On 8/23/24 01:08, Martin Wilck wrote: > Add a test that repeatedly rescans nvme controllers while doing IO > on an nvme namespace connected to these controllers. The purpose > of the test is to make sure that no I/O errors or data corruption > occurs because of the rescan operations. > > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > tests/nvme/053 | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++ > tests/nvme/rc | 18 ++++++++++++++++ > 2 files changed, 74 insertions(+) > create mode 100755 tests/nvme/053 > > diff --git a/tests/nvme/053 b/tests/nvme/053 > new file mode 100755 > index 0000000..41dc8f2 > --- /dev/null > +++ b/tests/nvme/053 > @@ -0,0 +1,56 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-3.0+ > +# Copyright (C) 2024 Martin Wilck, SUSE LLC > + > +. tests/nvme/rc > + > +DESCRIPTION="test controller rescan under I/O load" > +TIMED=1 > +: "${TIMEOUT:=60}" > + > +rescan_controller() { > + local finish > + > + [[ -f "$1/rescan_controller" ]] || { > + echo "cannot rescan $1" > + return 1 > + } > + > + finish=$(($(date +%s) + TIMEOUT)) > + while [[ $(date +%s) -le $finish ]]; do > + # sleep interval between 0.1 and 5s > + usleep "$(((RANDOM%50 + 1)*100000))" > + echo 1 >"$1/rescan_controller" > + done > +} I think here usleep may not be available by default on all systems. For instance, on fedora/rhel I don't have usleep installed in the defualt configuration and so I have to first install it. So you may want to add "usleep" as per-requisite for this test. Moreover, after I installed usleep on fedora and ran the above test I see this warning: warning: usleep is deprecated, and will be removed in near future! Due to above warning the test fails. So is it possible to replace usleep with sleep? > + > +test_device() { > + local -a ctrls > + local c > + > + echo "Running ${TEST_NAME}" > + ctrls=($(_nvme_get_ctrl_list)) > + > + _run_fio_verify_io --filename="$TEST_DEV" --time_based &> "$FULL" & > + > + for c in "${ctrls[@]}"; do > + rescan_controller "$c" & > + done > + > + while true; do > + wait -n &>/dev/null > + st=$? > + case $st in > + 127) > + break > + ;; > + 0) > + ;; > + *) > + echo "child process exited with $st!" > + ;; > + esac > + done > + > + echo "Test complete" > +} > diff --git a/tests/nvme/rc b/tests/nvme/rc > index e7d2ab1..93b0571 100644 > --- a/tests/nvme/rc > +++ b/tests/nvme/rc > @@ -192,6 +192,24 @@ _test_dev_nvme_nsid() { > cat "${TEST_DEV_SYSFS}/nsid" > } > > +_nvme_get_ctrl_list() { > + local subsys > + local c > + > + subsys=$(readlink "${TEST_DEV_SYSFS}/device/subsystem") > + case $subsys in > + */nvme) > + readlink -f "${TEST_DEV_SYSFS}/device" > + ;; > + */nvme-subsystem) > + for c in "${TEST_DEV_SYSFS}"/device/nvme*; do > + [[ -L "$c" ]] || continue > + [[ -f "$c/dev" ]] && readlink -f "$c" > + done > + ;; > + esac > +} > + I don't know if I am missing anything here but just curious to know for which case $subsys would point to link ending in */nvme? I think that for all cases $subsys shall point to link which ends in */nvme-subsystem, isn't it? I assume here that $TEST_DEV_SYSFS would always resolve to a nvme block device. And the last point: I don't see 053.out file in your patchset. Did you forget to add this file? Thanks, --Nilay
On 8/23/24 15:48, Nilay Shroff wrote: > > > On 8/23/24 01:08, Martin Wilck wrote: >> Add a test that repeatedly rescans nvme controllers while doing IO >> on an nvme namespace connected to these controllers. The purpose >> of the test is to make sure that no I/O errors or data corruption >> occurs because of the rescan operations. >> >> Signed-off-by: Martin Wilck <mwilck@suse.com> >> --- >> tests/nvme/053 | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++ >> tests/nvme/rc | 18 ++++++++++++++++ >> 2 files changed, 74 insertions(+) >> create mode 100755 tests/nvme/053 >> >> diff --git a/tests/nvme/053 b/tests/nvme/053 >> new file mode 100755 >> index 0000000..41dc8f2 >> --- /dev/null >> +++ b/tests/nvme/053 >> @@ -0,0 +1,56 @@ >> +#!/bin/bash >> +# SPDX-License-Identifier: GPL-3.0+ >> +# Copyright (C) 2024 Martin Wilck, SUSE LLC >> + >> +. tests/nvme/rc >> + >> +DESCRIPTION="test controller rescan under I/O load" >> +TIMED=1 >> +: "${TIMEOUT:=60}" >> + >> +rescan_controller() { >> + local finish >> + >> + [[ -f "$1/rescan_controller" ]] || { >> + echo "cannot rescan $1" >> + return 1 >> + } >> + >> + finish=$(($(date +%s) + TIMEOUT)) >> + while [[ $(date +%s) -le $finish ]]; do >> + # sleep interval between 0.1 and 5s >> + usleep "$(((RANDOM%50 + 1)*100000))" >> + echo 1 >"$1/rescan_controller" >> + done >> +} > I think here usleep may not be available by default on all systems. > For instance, on fedora/rhel I don't have usleep installed in the > defualt configuration and so I have to first install it. So you may > want to add "usleep" as per-requisite for this test. Moreover, after > I installed usleep on fedora and ran the above test I see this warning: > > warning: usleep is deprecated, and will be removed in near future! > > Due to above warning the test fails. So is it possible to replace > usleep with sleep? > >> + >> +test_device() { >> + local -a ctrls >> + local c >> + >> + echo "Running ${TEST_NAME}" >> + ctrls=($(_nvme_get_ctrl_list)) >> + >> + _run_fio_verify_io --filename="$TEST_DEV" --time_based &> "$FULL" & >> + >> + for c in "${ctrls[@]}"; do >> + rescan_controller "$c" & >> + done >> + >> + while true; do >> + wait -n &>/dev/null >> + st=$? >> + case $st in >> + 127) >> + break >> + ;; >> + 0) >> + ;; >> + *) >> + echo "child process exited with $st!" >> + ;; >> + esac >> + done >> + >> + echo "Test complete" >> +} >> diff --git a/tests/nvme/rc b/tests/nvme/rc >> index e7d2ab1..93b0571 100644 >> --- a/tests/nvme/rc >> +++ b/tests/nvme/rc >> @@ -192,6 +192,24 @@ _test_dev_nvme_nsid() { >> cat "${TEST_DEV_SYSFS}/nsid" >> } >> >> +_nvme_get_ctrl_list() { >> + local subsys >> + local c >> + >> + subsys=$(readlink "${TEST_DEV_SYSFS}/device/subsystem") >> + case $subsys in >> + */nvme) >> + readlink -f "${TEST_DEV_SYSFS}/device" >> + ;; >> + */nvme-subsystem) >> + for c in "${TEST_DEV_SYSFS}"/device/nvme*; do >> + [[ -L "$c" ]] || continue >> + [[ -f "$c/dev" ]] && readlink -f "$c" >> + done >> + ;; >> + esac >> +} >> + > I don't know if I am missing anything here but just curious to know > for which case $subsys would point to link ending in */nvme? > I think that for all cases $subsys shall point to link which ends > in */nvme-subsystem, isn't it? I assume here that $TEST_DEV_SYSFS would > always resolve to a nvme block device. > I think I got the answer for the above query, when I disabled the nvme multipath in the kernel config, I could see we hit the first case when $subsys would point to the link ending in */nvme, so this is not an issue. > And the last point: I don't see 053.out file in your patchset. Did you forget > to add this file? > > Thanks, > --Nilay > > > > >
On Fri, 2024-08-23 at 15:48 +0530, Nilay Shroff wrote: > > On 8/23/24 01:08, Martin Wilck wrote: > > > > + finish=$(($(date +%s) + TIMEOUT)) > > + while [[ $(date +%s) -le $finish ]]; do > > + # sleep interval between 0.1 and 5s > > + usleep "$(((RANDOM%50 + 1)*100000))" > > + echo 1 >"$1/rescan_controller" > > + done > > +} > I think here usleep may not be available by default on all systems. > For instance, on fedora/rhel I don't have usleep installed in the > defualt configuration and so I have to first install it. So you may > want to add "usleep" as per-requisite for this test. Moreover, after > I installed usleep on fedora and ran the above test I see this > warning: > > warning: usleep is deprecated, and will be removed in near future! > > Due to above warning the test fails. So is it possible to replace > usleep with sleep? The README states that blktests requires GNU coreutils, so yes, that would be feasible - in principle. The problem is that bash can't do floating point math, and I want to be able to sleep for fractions of a second. So I'd need to do something like this: usleep() { sleep "$(awk "BEGIN { print $1 / 1.e6; }" </dev/null)" } But the fork-and-exec to "awk" is slow. millisecond sleep times can't be realized this way. Anyway, I realize that calling "usleep" also carries a lot of overhead, and thus "usleep 1000" doesn't do what one would naïvely expect, either. The only way I can see to make this work as originally intended is to implement is as an awk script. The README says that GNU awk is required, so sleep() with floating point argument is available. Thanks Martin
diff --git a/tests/nvme/053 b/tests/nvme/053 new file mode 100755 index 0000000..41dc8f2 --- /dev/null +++ b/tests/nvme/053 @@ -0,0 +1,56 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-3.0+ +# Copyright (C) 2024 Martin Wilck, SUSE LLC + +. tests/nvme/rc + +DESCRIPTION="test controller rescan under I/O load" +TIMED=1 +: "${TIMEOUT:=60}" + +rescan_controller() { + local finish + + [[ -f "$1/rescan_controller" ]] || { + echo "cannot rescan $1" + return 1 + } + + finish=$(($(date +%s) + TIMEOUT)) + while [[ $(date +%s) -le $finish ]]; do + # sleep interval between 0.1 and 5s + usleep "$(((RANDOM%50 + 1)*100000))" + echo 1 >"$1/rescan_controller" + done +} + +test_device() { + local -a ctrls + local c + + echo "Running ${TEST_NAME}" + ctrls=($(_nvme_get_ctrl_list)) + + _run_fio_verify_io --filename="$TEST_DEV" --time_based &> "$FULL" & + + for c in "${ctrls[@]}"; do + rescan_controller "$c" & + done + + while true; do + wait -n &>/dev/null + st=$? + case $st in + 127) + break + ;; + 0) + ;; + *) + echo "child process exited with $st!" + ;; + esac + done + + echo "Test complete" +} diff --git a/tests/nvme/rc b/tests/nvme/rc index e7d2ab1..93b0571 100644 --- a/tests/nvme/rc +++ b/tests/nvme/rc @@ -192,6 +192,24 @@ _test_dev_nvme_nsid() { cat "${TEST_DEV_SYSFS}/nsid" } +_nvme_get_ctrl_list() { + local subsys + local c + + subsys=$(readlink "${TEST_DEV_SYSFS}/device/subsystem") + case $subsys in + */nvme) + readlink -f "${TEST_DEV_SYSFS}/device" + ;; + */nvme-subsystem) + for c in "${TEST_DEV_SYSFS}"/device/nvme*; do + [[ -L "$c" ]] || continue + [[ -f "$c/dev" ]] && readlink -f "$c" + done + ;; + esac +} + _nvme_calc_rand_io_size() { local img_size_mb local io_size_kb
Add a test that repeatedly rescans nvme controllers while doing IO on an nvme namespace connected to these controllers. The purpose of the test is to make sure that no I/O errors or data corruption occurs because of the rescan operations. Signed-off-by: Martin Wilck <mwilck@suse.com> --- tests/nvme/053 | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++ tests/nvme/rc | 18 ++++++++++++++++ 2 files changed, 74 insertions(+) create mode 100755 tests/nvme/053