[blktests] nbd/rc: check nbd connection with nbd-client -check command

Message ID 20240319085015.3901051-1-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived

Commit Message

Shinichiro Kawasaki March 19, 2024, 8:50 a.m. UTC
_wait_for_nbd_connect() checks nbd connections by checking the existence
of a debugfs attribute file. However, even when the file exists, nbd
connections are not fully ready, and the stat command for the nbd device
file in the test case nbd/002 may fail with unexpected I/O errors.

To avoid the failure, check the nbd connections not only by the debugfs
attribute file, but also by "nbd-client -check" command.

Link: https://github.com/osandov/blktests/pull/134
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 tests/nbd/rc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Yi Zhang March 20, 2024, 6:12 a.m. UTC | #1
Hi Shinichiro

Thanks for the fix. With this change, the issue can still be
reproduced. Here is the log:

=======================98
nbd/002 (tests on partition handling for an nbd device)      [failed]
    runtime  1.436s  ...  0.917s
    --- tests/nbd/002.out 2024-03-19 04:51:34.051614893 +0100
    +++ /root/blktests/results/nodev/nbd/002.out.bad 2024-03-20 07:01:28.769392087 +0100
    @@ -1,4 +1,4 @@
     Running nbd/002
     Testing IOCTL path
     Testing the netlink path
    -Test complete
    +Didn't have partition on the netlink path

dmesg:
[  737.405376] run blktests nbd/002 at 2024-03-20 07:01:27
[  738.102997] nbd0: detected capacity change from 0 to 20971520
[  738.122439]  nbd0:
[  738.157483] block nbd0: NBD_DISCONNECT
[  738.157742] block nbd0: Disconnected due to user request.
[  738.158094] block nbd0: shutting down sockets
[  738.206999] nbd0: detected capacity change from 0 to 20971520
[  738.208587]  nbd0: p1
[  738.246641] block nbd0: NBD_DISCONNECT
[  738.246893] block nbd0: Disconnected due to user request.
[  738.247217] block nbd0: shutting down sockets
[  738.313979] nbd0: detected capacity change from 0 to 20971520
[  738.315450]  nbd0: p1
[  738.319949] block nbd0: NBD_DISCONNECT
[  738.320244] block nbd0: Disconnected due to user request.
[  738.320535] block nbd0: shutting down sockets
[  738.321276] blk_print_req_error: 4 callbacks suppressed
[  738.321280] I/O error, dev nbd0, sector 272 op 0x0:(READ) flags 0x80700 phys_seg 30 prio class 0
[  738.322466] I/O error, dev nbd0, sector 272 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  738.322901] buffer_io_error: 4 callbacks suppressed
[  738.322903] Buffer I/O error on dev nbd0, logical block 34, async page read
[  738.326007] I/O error, dev nbd0, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  738.326916] I/O error, dev nbd0, sector 16 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  738.327381] Buffer I/O error on dev nbd0, logical block 2, async page read

On Tue, Mar 19, 2024 at 4:50 PM Shin'ichiro Kawasaki
<shinichiro.kawasaki@wdc.com> wrote:
>
> _wait_for_nbd_connect() checks nbd connections by checking the existence
> of a debugfs attribute file. However, even when the file exists, nbd
> connections are not fully ready, and the stat command for the nbd device
> file in the test case nbd/002 may fail with unexpected I/O errors.
>
> To avoid the failure, check the nbd connections not only by the debugfs
> attribute file, but also by "nbd-client -check" command.
>
> Link: https://github.com/osandov/blktests/pull/134
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
>  tests/nbd/rc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tests/nbd/rc b/tests/nbd/rc
> index 9c1c15b..266befd 100644
> --- a/tests/nbd/rc
> +++ b/tests/nbd/rc
> @@ -43,7 +43,8 @@ _have_nbd_netlink() {
>
>  _wait_for_nbd_connect() {
>         for ((i = 0; i < 3; i++)); do
> -               if [[ -e /sys/kernel/debug/nbd/nbd0/tasks ]]; then
> +               if [[ -e /sys/kernel/debug/nbd/nbd0/tasks ]] && \
> +                          nbd-client -check /dev/nbd0 &> /dev/null; then
>                         return 0
>                 fi
>                 sleep 1
> --
> 2.44.0
>
Shinichiro Kawasaki March 22, 2024, 12:23 p.m. UTC | #2
On Mar 20, 2024 / 14:12, Yi Zhang wrote:
> Hi Shinichiro
> 
> Thanks for the fix. With this change, the issue can still be
> reproduced. Here is the log:
> 
> =======================98
> nbd/002 (tests on partition handling for an nbd device)      [failed]
>     runtime  1.436s  ...  0.917s
>     --- tests/nbd/002.out 2024-03-19 04:51:34.051614893 +0100
>     +++ /root/blktests/results/nodev/nbd/002.out.bad 2024-03-20 07:01:28.769392087 +0100
>     @@ -1,4 +1,4 @@
>      Running nbd/002
>      Testing IOCTL path
>      Testing the netlink path
>     -Test complete
>     +Didn't have partition on the netlink path

Thanks. The patch reduces the failure rate, but it does not fix the bug
completely. Without the patch, the failure happens once in every two runs. With
the patch, the failure happens once in 30 repetitions of the test case. I
will dig in further.
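
One way to estimate a failure rate like the ones quoted above is to repeat
the test case and count the failing runs. The following is a minimal sketch,
not part of blktests: count_failures is a hypothetical helper name, and the
usage line assumes that the current directory is a blktests tree and that
./check exits with a non-zero status when the test fails.

```shell
#!/bin/bash

# Hypothetical helper (not part of blktests): run a command a given number
# of times and print how many of the runs exited with a non-zero status.
count_failures() {
	local runs=$1
	shift
	local i failures=0

	for ((i = 0; i < runs; i++)); do
		# Discard the command's output; only its exit status matters here.
		if ! "$@" > /dev/null 2>&1; then
			failures=$((failures + 1))
		fi
	done
	echo "$failures"
}

# Usage (assumption: ./check returns non-zero when nbd/002 fails):
#   count_failures 30 ./check nbd/002
```

Comparing the printed count with and without the patch applied gives a rough
picture of how much the extra check helps. With an intermittent failure, a
small number of runs yields only a coarse estimate.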

Patch

diff --git a/tests/nbd/rc b/tests/nbd/rc
index 9c1c15b..266befd 100644
--- a/tests/nbd/rc
+++ b/tests/nbd/rc
@@ -43,7 +43,8 @@  _have_nbd_netlink() {
 
 _wait_for_nbd_connect() {
 	for ((i = 0; i < 3; i++)); do
-		if [[ -e /sys/kernel/debug/nbd/nbd0/tasks ]]; then
+		if [[ -e /sys/kernel/debug/nbd/nbd0/tasks ]] && \
+			   nbd-client -check /dev/nbd0 &> /dev/null; then
 			return 0
 		fi
 		sleep 1
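
The polling pattern in the hunk above, which retries a readiness check a fixed
number of times with a sleep in between, can be sketched as a generic helper.
This is an illustration under stated assumptions, not the code in tests/nbd/rc:
_retry_until and _nbd0_ready are hypothetical names, and the attempt count and
delay in the usage line simply mirror the loop in the patch (3 attempts, 1
second apart).

```shell
#!/bin/bash

# Hypothetical helper (not part of blktests): run a check command up to
# "attempts" times, sleeping "delay" seconds after each failed attempt.
# Returns 0 as soon as the check succeeds, or 1 if every attempt fails.
_retry_until() {
	local attempts=$1 delay=$2
	shift 2
	local i

	for ((i = 0; i < attempts; i++)); do
		if "$@"; then
			return 0
		fi
		sleep "$delay"
	done
	return 1
}

# Hypothetical readiness check combining the two conditions from the patch:
# the debugfs attribute file exists and "nbd-client -check" reports success.
_nbd0_ready() {
	[[ -e /sys/kernel/debug/nbd/nbd0/tasks ]] &&
		nbd-client -check /dev/nbd0 &> /dev/null
}

# Usage, mirroring the loop bounds in the patch:
#   _retry_until 3 1 _nbd0_ready
```

Separating the retry loop from the check makes it easier to try further
readiness conditions. As the thread notes, the two conditions above reduce the
failure rate but are not sufficient on their own.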