Message ID | 20240902105454.1244406-1-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [blktests,v3] nvme/052: wait for namespace removal before recreating namespace | expand |
On 9/2/24 16:24, Shin'ichiro Kawasaki wrote: > The CKI project reported that the test case nvme/052 fails occasionally > with the errors below: > > nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed] > runtime ... 22.209s > --- tests/nvme/052.out 2024-07-30 18:38:29.041716566 -0400 > +++ > +/mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-t\ > ests-production.zip/storage/blktests/nvme/nvme-loop/blktests > +/results/nodev_tr_loop/nvme/052.out.bad 2024-07-30 18:45:35.438067452 -0400 > @@ -1,2 +1,4 @@ > Running nvme/052 > +cat: /sys/block/nvme1n2/uuid: No such file or directory > +cat: /sys/block/nvme1n2/uuid: No such file or directory > Test complete > > The test case repeats creating and removing namespaces. When the test > case removes the namespace by echoing 0 to the sysfs enable file, this > echo write does not wait for the completion of the namespace removal. > Before the removal completes, the test case recreates the namespace. > At this point, the sysfs uuid file for the old namespace still exists. > The test case misunderstands that the the sysfs uuid file would be for > the recreated namespace, and tries to read it. However, the removal > process for the old namespace deletes the sysfs uuid file at this point. > Then the read attempt fails and results in the errors. > > To avoid the failure, wait for the namespace removal before recreating > the namespace. Modify nvmf_wait_for_ns() so that it can wait for > namespace creation and removal. Call nvmf_wait_for_ns() for removal > after _remove_nvmet_ns() call. While at it, reduce the wait time unit > from 1 second to 0.1 second to shorten test time. > > The test case intends to catch the regression fixed by the kernel commit > ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit > from the kernel v6.11-rc4, then confirmed that the test case still can > catch the regression with this change. > > Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/ > Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns") > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- > Changes from v2: > * Avoided conditional stderr redirect of _find_nvme_ns > > Changes from v1: > * Reused nvmf_wait_for_ns() instead of introuducing nvmf_wait_for_ns_removal() > * Added check of nvme_wait_for_ns() return value > > tests/nvme/052 | 25 ++++++++++++++++++------- > 1 file changed, 18 insertions(+), 7 deletions(-) > > diff --git a/tests/nvme/052 b/tests/nvme/052 > index cf6061a..401f043 100755 > --- a/tests/nvme/052 > +++ b/tests/nvme/052 > @@ -20,23 +20,27 @@ set_conditions() { > _set_nvme_trtype "$@" > } > > +# Wait for the namespace with specified uuid to fulfill the specified condtion, > +# "created" or "removed". > nvmf_wait_for_ns() { > local ns > local timeout="5" > local uuid="$1" > + local condition="$2" > > - ns=$(_find_nvme_ns "${uuid}") > + ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL") > > start_time=$(date +%s) > - while [[ -z "$ns" ]]; do > - sleep 1 > + while [[ -z "$ns" && "$condition" == created ]] || > + [[ -n "$ns" && "$condition" == removed ]]; do > + sleep .1 > end_time=$(date +%s) > if (( end_time - start_time > timeout )); then > echo "namespace with uuid \"${uuid}\" not " \ > - "found within ${timeout} seconds" > + "${condition} within ${timeout} seconds" > return 1 > fi > - ns=$(_find_nvme_ns "${uuid}") > + ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL") > done > > return 0 > @@ -63,14 +67,21 @@ test() { > _create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}" > > # wait until async request is processed and ns is created > - nvmf_wait_for_ns "${uuid}" > - if [ $? -eq 1 ]; then > + if ! nvmf_wait_for_ns "${uuid}" created; then > echo "FAIL" > rm "$(_nvme_def_file_path).$i" > break > fi > > _remove_nvmet_ns "${def_subsysnqn}" "${i}" > + > + # wait until async request is processed and ns is removed > + if ! nvmf_wait_for_ns "${uuid}" removed; then > + echo "FAIL" > + rm "$(_nvme_def_file_path).$i" > + break > + fi > + > rm "$(_nvme_def_file_path).$i" > } > done Looks good. Reviewed-by: Nilay Shroff (nilay@linux.ibm.com)
On Sep 02, 2024 / 19:54, Shin'ichiro Kawasaki wrote: > The CKI project reported that the test case nvme/052 fails occasionally > with the errors below: [...] > To avoid the failure, wait for the namespace removal before recreating > the namespace. Modify nvmf_wait_for_ns() so that it can wait for > namespace creation and removal. Call nvmf_wait_for_ns() for removal > after _remove_nvmet_ns() call. While at it, reduce the wait time unit > from 1 second to 0.1 second to shorten test time. > > The test case intends to catch the regression fixed by the kernel commit > ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit > from the kernel v6.11-rc4, then confirmed that the test case still can > catch the regression with this change. > > Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/ > Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns") > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> I applied the patch. Nilay, thanks for your help!
On Fri, Sep 6, 2024 at 4:22 PM Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote: > > On Sep 02, 2024 / 19:54, Shin'ichiro Kawasaki wrote: > > The CKI project reported that the test case nvme/052 fails occasionally > > with the errors below: > [...] > > To avoid the failure, wait for the namespace removal before recreating > > the namespace. Modify nvmf_wait_for_ns() so that it can wait for > > namespace creation and removal. Call nvmf_wait_for_ns() for removal > > after _remove_nvmet_ns() call. While at it, reduce the wait time unit > > from 1 second to 0.1 second to shorten test time. > > > > The test case intends to catch the regression fixed by the kernel commit > > ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit > > from the kernel v6.11-rc4, then confirmed that the test case still can > > catch the regression with this change. > > > > Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/ > > Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns") > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > > I applied the patch. Nilay, thanks for your help! > Thanks for the fix, I've verified the fix on the reproduced server. BTW, after the change, the executing time also reduced to 7s from 22s ===============================294 nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [passed] runtime 7.759s ... 7.673s
On Sep 07, 2024 / 14:37, Yi Zhang wrote: > On Fri, Sep 6, 2024 at 4:22 PM Shinichiro Kawasaki > <shinichiro.kawasaki@wdc.com> wrote: > > > > On Sep 02, 2024 / 19:54, Shin'ichiro Kawasaki wrote: > > > The CKI project reported that the test case nvme/052 fails occasionally > > > with the errors below: > > [...] > > > To avoid the failure, wait for the namespace removal before recreating > > > the namespace. Modify nvmf_wait_for_ns() so that it can wait for > > > namespace creation and removal. Call nvmf_wait_for_ns() for removal > > > after _remove_nvmet_ns() call. While at it, reduce the wait time unit > > > from 1 second to 0.1 second to shorten test time. > > > > > > The test case intends to catch the regression fixed by the kernel commit > > > ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit > > > from the kernel v6.11-rc4, then confirmed that the test case still can > > > catch the regression with this change. > > > > > > Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/ > > > Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns") > > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > > > > I applied the patch. Nilay, thanks for your help! > > > > Thanks for the fix, I've verified the fix on the reproduced server. Thanks for the confirmation :) > BTW, after the change, the executing time also reduced to 7s from 22s This runtime reduction is expected. I suggested a change for it, and it was discussed here: https://lore.kernel.org/linux-block/0750187c-24ad-4073-9ba1-d47b0ee95062@linux.ibm.com/
diff --git a/tests/nvme/052 b/tests/nvme/052 index cf6061a..401f043 100755 --- a/tests/nvme/052 +++ b/tests/nvme/052 @@ -20,23 +20,27 @@ set_conditions() { _set_nvme_trtype "$@" } +# Wait for the namespace with specified uuid to fulfill the specified condtion, +# "created" or "removed". nvmf_wait_for_ns() { local ns local timeout="5" local uuid="$1" + local condition="$2" - ns=$(_find_nvme_ns "${uuid}") + ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL") start_time=$(date +%s) - while [[ -z "$ns" ]]; do - sleep 1 + while [[ -z "$ns" && "$condition" == created ]] || + [[ -n "$ns" && "$condition" == removed ]]; do + sleep .1 end_time=$(date +%s) if (( end_time - start_time > timeout )); then echo "namespace with uuid \"${uuid}\" not " \ - "found within ${timeout} seconds" + "${condition} within ${timeout} seconds" return 1 fi - ns=$(_find_nvme_ns "${uuid}") + ns=$(_find_nvme_ns "${uuid}" 2>> "$FULL") done return 0 @@ -63,14 +67,21 @@ test() { _create_nvmet_ns "${def_subsysnqn}" "${i}" "$(_nvme_def_file_path).$i" "${uuid}" # wait until async request is processed and ns is created - nvmf_wait_for_ns "${uuid}" - if [ $? -eq 1 ]; then + if ! nvmf_wait_for_ns "${uuid}" created; then echo "FAIL" rm "$(_nvme_def_file_path).$i" break fi _remove_nvmet_ns "${def_subsysnqn}" "${i}" + + # wait until async request is processed and ns is removed + if ! nvmf_wait_for_ns "${uuid}" removed; then + echo "FAIL" + rm "$(_nvme_def_file_path).$i" + break + fi + rm "$(_nvme_def_file_path).$i" } done
The CKI project reported that the test case nvme/052 fails occasionally with the errors below: nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed] runtime ... 22.209s --- tests/nvme/052.out 2024-07-30 18:38:29.041716566 -0400 +++ +/mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-t\ ests-production.zip/storage/blktests/nvme/nvme-loop/blktests +/results/nodev_tr_loop/nvme/052.out.bad 2024-07-30 18:45:35.438067452 -0400 @@ -1,2 +1,4 @@ Running nvme/052 +cat: /sys/block/nvme1n2/uuid: No such file or directory +cat: /sys/block/nvme1n2/uuid: No such file or directory Test complete The test case repeats creating and removing namespaces. When the test case removes the namespace by echoing 0 to the sysfs enable file, this echo write does not wait for the completion of the namespace removal. Before the removal completes, the test case recreates the namespace. At this point, the sysfs uuid file for the old namespace still exists. The test case misunderstands that the the sysfs uuid file would be for the recreated namespace, and tries to read it. However, the removal process for the old namespace deletes the sysfs uuid file at this point. Then the read attempt fails and results in the errors. To avoid the failure, wait for the namespace removal before recreating the namespace. Modify nvmf_wait_for_ns() so that it can wait for namespace creation and removal. Call nvmf_wait_for_ns() for removal after _remove_nvmet_ns() call. While at it, reduce the wait time unit from 1 second to 0.1 second to shorten test time. The test case intends to catch the regression fixed by the kernel commit ff0ffe5b7c3c ("nvme: fix namespace removal list"). I reverted the commit from the kernel v6.11-rc4, then confirmed that the test case still can catch the regression with this change. Link: https://lore.kernel.org/linux-block/tczctp5tkr34o3k3f4dlyhuutgp2ycex6gdbjuqx4trn6ewm2i@qbkza3yr5wdd/ Fixes: 077211a0e9ff ("nvme: add test for creating/deleting file-ns") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- Changes from v2: * Avoided conditional stderr redirect of _find_nvme_ns Changes from v1: * Reused nvmf_wait_for_ns() instead of introuducing nvmf_wait_for_ns_removal() * Added check of nvme_wait_for_ns() return value tests/nvme/052 | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)