Message ID | 20221130024012.2313090-1-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [blktests] block/017: extend IO inflight duration | expand |
On 11/29/22 18:40, Shin'ichiro Kawasaki wrote: > The test case block/017 often fails on slow test systems. When it runs > on QEMU and kernel with LOCKDEP, it fails around 50% by chance with > error message as follows: > > block/017 (do I/O and check the inflight counter) [failed] > runtime 1.715s ... 1.726s > --- tests/block/017.out 2022-11-15 15:30:51.285717678 +0900 > +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/block/017.out.bad 2022-11-25 16:23:50.778747167 +0900 > @@ -6,7 +6,7 @@ > sysfs inflight reads 1 > sysfs inflight writes 1 > sysfs stat 2 > -diskstats 2 > +diskstats 1 > sysfs inflight reads 0 > sysfs inflight writes 0 > ... > > The test case issues one read and one write to a null_blk device, and > checks that inflight counters reports correct numbers of inflight IOs. > To keep IOs inflight during test, it prepares null_blk device with > completion_nsec parameter 0.5 second. However, when test system is slow, > inflight counter check takes long time and the read completes before the > check. Hence the failure. > > To avoid the failure, extend the inflight duration of IOs. Prepare a > null_blk device without completion_nsec parameter and measure time to > check the inflight counters. Prepare null_blk device again specifying > completion_nsec parameter 0.5 seconds plus the measured time of inflight > counter check. > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> overall it looks good to me, I trust that you have tested this on the fast machine also and make sure it doesn't regress, with that said :- Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck
On Nov 30, 2022 / 05:14, Chaitanya Kulkarni wrote: > On 11/29/22 18:40, Shin'ichiro Kawasaki wrote: > > The test case block/017 often fails on slow test systems. When it runs > > on QEMU and kernel with LOCKDEP, it fails around 50% by chance with > > error message as follows: > > > > block/017 (do I/O and check the inflight counter) [failed] > > runtime 1.715s ... 1.726s > > --- tests/block/017.out 2022-11-15 15:30:51.285717678 +0900 > > +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/block/017.out.bad 2022-11-25 16:23:50.778747167 +0900 > > @@ -6,7 +6,7 @@ > > sysfs inflight reads 1 > > sysfs inflight writes 1 > > sysfs stat 2 > > -diskstats 2 > > +diskstats 1 > > sysfs inflight reads 0 > > sysfs inflight writes 0 > > ... > > > > The test case issues one read and one write to a null_blk device, and > > checks that inflight counters reports correct numbers of inflight IOs. > > To keep IOs inflight during test, it prepares null_blk device with > > completion_nsec parameter 0.5 second. However, when test system is slow, > > inflight counter check takes long time and the read completes before the > > check. Hence the failure. > > > > To avoid the failure, extend the inflight duration of IOs. Prepare a > > null_blk device without completion_nsec parameter and measure time to > > check the inflight counters. Prepare null_blk device again specifying > > completion_nsec parameter 0.5 seconds plus the measured time of inflight > > counter check. > > > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > > overall it looks good to me, I trust that you have tested this on the > fast machine also and make sure it doesn't regress, with that said :- > > Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Thanks for the review. I've applied it. Yes, I tested with multiple machines including a bare metal fast machine, and saw no regression.
diff --git a/tests/block/017 b/tests/block/017 index 8596888..59429b0 100755 --- a/tests/block/017 +++ b/tests/block/017 @@ -24,11 +24,38 @@ show_inflight() { awk '$3 == "nullb1" { print "diskstats " $12 }' /proc/diskstats } +# Measure how long it takes for inflight counter check using time command. +# Convert X.YYYs time format into integer in milliseconds. +counter_check_duration_in_millis() +{ + local show_seconds sub_second seconds + + show_seconds=$( { time show_inflight > /dev/null; } 2>&1 ) + show_seconds=${show_seconds/s/} + sub_second=${show_seconds##*.} + sub_second=$((10#${sub_second})) + seconds=${show_seconds%%.*} + echo $(( seconds * 1000 + sub_second )) +} + test() { + local io_in_millis + echo "Running ${TEST_NAME}" - if ! _configure_null_blk nullb1 irqmode=2 completion_nsec=500000000 \ - power=1; then + # Prepare null_blk to measure time to check inflight counters. IOs + # should be inflight long enough to cover counter checks twice. + if ! _configure_null_blk nullb1 irqmode=2 power=1; then + return 1 + fi + + io_in_millis=$(( 500 + $(counter_check_duration_in_millis) * 2 )) + + _exit_null_blk + + # Prepare null_blk again with desired IO inflight duration + if ! _configure_null_blk nullb1 irqmode=2 \ + completion_nsec=$((io_in_millis * 1000 * 1000)) power=1; then return 1 fi
The test case block/017 often fails on slow test systems. When it runs on QEMU and kernel with LOCKDEP, it fails around 50% by chance with error message as follows: block/017 (do I/O and check the inflight counter) [failed] runtime 1.715s ... 1.726s --- tests/block/017.out 2022-11-15 15:30:51.285717678 +0900 +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/block/017.out.bad 2022-11-25 16:23:50.778747167 +0900 @@ -6,7 +6,7 @@ sysfs inflight reads 1 sysfs inflight writes 1 sysfs stat 2 -diskstats 2 +diskstats 1 sysfs inflight reads 0 sysfs inflight writes 0 ... The test case issues one read and one write to a null_blk device, and checks that inflight counters reports correct numbers of inflight IOs. To keep IOs inflight during test, it prepares null_blk device with completion_nsec parameter 0.5 second. However, when test system is slow, inflight counter check takes long time and the read completes before the check. Hence the failure. To avoid the failure, extend the inflight duration of IOs. Prepare a null_blk device without completion_nsec parameter and measure time to check the inflight counters. Prepare null_blk device again specifying completion_nsec parameter 0.5 seconds plus the measured time of inflight counter check. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- tests/block/017 | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-)