Message ID | 9377610cbdc3568c172cd7c5d2e9d36da8dd2cf4.1716312272.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blktests: fix how we handle waiting for nbd to connect | expand |
On May 21, 2024 / 13:24, Josef Bacik wrote: > Because NBD has the old style configure the device directly config we > sometimes have spurious failures where the device wasn't quite ready > before the rest of the test continued. > > nbd/002 had been using _wait_for_nbd_connect, however this helper waits > for the recv task to show up, which actually happens slightly before the > size is set and we're actually ready to be read from. This means we > would sometimes fail nbd/002 because the device wasn't quite right. > > Additionally nbd/001 has a similar issue where we weren't waiting for > the device to be ready before going ahead with the test, which would > cause spurious failures. > > Fix this by adjusting _wait_for_nbd_connect to only exit once the size > for the device is being reported properly, meaning that it's ready to be > read from. > > Then add this call to nbd/001 to eliminate the random failures we would > see with this test. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Josef, thank you very much. I've been seeing nbd/001 on my test system and this patch avoids it :) The change looks good to me. I will leave it for several days before applying it, in case anyone has some more comments.
On May 22, 2024 / 07:02, Shinichiro Kawasaki wrote: > On May 21, 2024 / 13:24, Josef Bacik wrote: > > Because NBD has the old style configure the device directly config we > > sometimes have spurious failures where the device wasn't quite ready > > before the rest of the test continued. > > > > nbd/002 had been using _wait_for_nbd_connect, however this helper waits > > for the recv task to show up, which actually happens slightly before the > > size is set and we're actually ready to be read from. This means we > > would sometimes fail nbd/002 because the device wasn't quite right. > > > > Additionally nbd/001 has a similar issue where we weren't waiting for > > the device to be ready before going ahead with the test, which would > > cause spurious failures. > > > > Fix this by adjusting _wait_for_nbd_connect to only exit once the size > > for the device is being reported properly, meaning that it's ready to be > > read from. > > > > Then add this call to nbd/001 to eliminate the random failures we would > > see with this test. > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > Josef, thank you very much. I've been seeing nbd/001 on my test system and this > patch avoids it :) The change looks good to me. I will leave it for several > days before applying it, in case anyone has some more comments. I've applied it. Thanks!
On May 27, 2024 / 07:49, Shinichiro Kawasaki wrote: > On May 22, 2024 / 07:02, Shinichiro Kawasaki wrote: > > On May 21, 2024 / 13:24, Josef Bacik wrote: > > > Because NBD has the old style configure the device directly config we > > > sometimes have spurious failures where the device wasn't quite ready > > > before the rest of the test continued. > > > > > > nbd/002 had been using _wait_for_nbd_connect, however this helper waits > > > for the recv task to show up, which actually happens slightly before the > > > size is set and we're actually ready to be read from. This means we > > > would sometimes fail nbd/002 because the device wasn't quite right. > > > > > > Additionally nbd/001 has a similar issue where we weren't waiting for > > > the device to be ready before going ahead with the test, which would > > > cause spurious failures. > > > > > > Fix this by adjusting _wait_for_nbd_connect to only exit once the size > > > for the device is being reported properly, meaning that it's ready to be > > > read from. > > > > > > Then add this call to nbd/001 to eliminate the random failures we would > > > see with this test. > > > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > > > Josef, thank you very much. I've been seeing nbd/001 on my test system and this > > patch avoids it :) The change looks good to me. I will leave it for several > > days before applying it, in case anyone has some more comments. > > I've applied it. Thanks! To: Yi Zhang Yi, I guess this fix may avoid the nbd/002 failure that CKI reports recently [*]. To apply the fix, could you rebase the blktests for CKI runs? [*] https://datawarehouse.cki-project.org/kcidb/tests/12631448
On Fri, May 31, 2024 at 9:47 AM Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> wrote: > > On May 27, 2024 / 07:49, Shinichiro Kawasaki wrote: > > On May 22, 2024 / 07:02, Shinichiro Kawasaki wrote: > > > On May 21, 2024 / 13:24, Josef Bacik wrote: > > > > Because NBD has the old style configure the device directly config we > > > > sometimes have spurious failures where the device wasn't quite ready > > > > before the rest of the test continued. > > > > > > > > nbd/002 had been using _wait_for_nbd_connect, however this helper waits > > > > for the recv task to show up, which actually happens slightly before the > > > > size is set and we're actually ready to be read from. This means we > > > > would sometimes fail nbd/002 because the device wasn't quite right. > > > > > > > > Additionally nbd/001 has a similar issue where we weren't waiting for > > > > the device to be ready before going ahead with the test, which would > > > > cause spurious failures. > > > > > > > > Fix this by adjusting _wait_for_nbd_connect to only exit once the size > > > > for the device is being reported properly, meaning that it's ready to be > > > > read from. > > > > > > > > Then add this call to nbd/001 to eliminate the random failures we would > > > > see with this test. > > > > > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > > > > > Josef, thank you very much. I've been seeing nbd/001 on my test system and this > > > patch avoids it :) The change looks good to me. I will leave it for several > > > days before applying it, in case anyone has some more comments. > > > > I've applied it. Thanks! > > To: Yi Zhang > > Yi, I guess this fix may avoid the nbd/002 failure that CKI reports recently > [*]. To apply the fix, could you rebase the blktests for CKI runs? Sure, will apply this patch and verify the failure. > > [*] https://datawarehouse.cki-project.org/kcidb/tests/12631448 >
diff --git a/tests/nbd/001 b/tests/nbd/001 index 5fd0d43..0975af0 100755 --- a/tests/nbd/001 +++ b/tests/nbd/001 @@ -18,6 +18,7 @@ test() { echo "Running ${TEST_NAME}" _start_nbd_server nbd-client -L -N export localhost /dev/nbd0 >> "$FULL" 2>&1 + _wait_for_nbd_connect udevadm settle parted -s /dev/nbd0 print 2>> "$FULL" | grep 'Disk /dev/nbd0' diff --git a/tests/nbd/rc b/tests/nbd/rc index 9c1c15b..e96dc61 100644 --- a/tests/nbd/rc +++ b/tests/nbd/rc @@ -43,7 +43,8 @@ _have_nbd_netlink() { _wait_for_nbd_connect() { for ((i = 0; i < 3; i++)); do - if [[ -e /sys/kernel/debug/nbd/nbd0/tasks ]]; then + sz=$(lsblk --raw --noheadings -o SIZE /dev/nbd0) + if [ "$sz" != "0B" ]; then return 0 fi sleep 1
Because NBD has the old style configure the device directly config we sometimes have spurious failures where the device wasn't quite ready before the rest of the test continued. nbd/002 had been using _wait_for_nbd_connect, however this helper waits for the recv task to show up, which actually happens slightly before the size is set and we're actually ready to be read from. This means we would sometimes fail nbd/002 because the device wasn't quite right. Additionally nbd/001 has a similar issue where we weren't waiting for the device to be ready before going ahead with the test, which would cause spurious failures. Fix this by adjusting _wait_for_nbd_connect to only exit once the size for the device is being reported properly, meaning that it's ready to be read from. Then add this call to nbd/001 to eliminate the random failures we would see with this test. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- tests/nbd/001 | 1 + tests/nbd/rc | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-)