diff mbox series

[blktests] block/017: extend IO inflight duration

Message ID 20221130024012.2313090-1-shinichiro.kawasaki@wdc.com (mailing list archive)
State New, archived
Headers show
Series [blktests] block/017: extend IO inflight duration | expand

Commit Message

Shin'ichiro Kawasaki Nov. 30, 2022, 2:40 a.m. UTC
The test case block/017 often fails on slow test systems. When it runs
on QEMU and kernel with LOCKDEP, it fails around 50% by chance with
error message as follows:

block/017 (do I/O and check the inflight counter)            [failed]
    runtime  1.715s  ...  1.726s
    --- tests/block/017.out     2022-11-15 15:30:51.285717678 +0900
    +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/block/017.out.bad   2022-11-25 16:23:50.778747167 +0900
    @@ -6,7 +6,7 @@
     sysfs inflight reads 1
     sysfs inflight writes 1
     sysfs stat 2
    -diskstats 2
    +diskstats 1
     sysfs inflight reads 0
     sysfs inflight writes 0
    ...

The test case issues one read and one write to a null_blk device, and
checks that inflight counters reports correct numbers of inflight IOs.
To keep IOs inflight during test, it prepares null_blk device with
completion_nsec parameter 0.5 second. However, when test system is slow,
inflight counter check takes long time and the read completes before the
check. Hence the failure.

To avoid the failure, extend the inflight duration of IOs. Prepare a
null_blk device without completion_nsec parameter and measure time to
check the inflight counters. Prepare null_blk device again specifying
completion_nsec parameter 0.5 seconds plus the measured time of inflight
counter check.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 tests/block/017 | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

Comments

Chaitanya Kulkarni Nov. 30, 2022, 5:14 a.m. UTC | #1
On 11/29/22 18:40, Shin'ichiro Kawasaki wrote:
> The test case block/017 often fails on slow test systems. When it runs
> on QEMU and kernel with LOCKDEP, it fails around 50% by chance with
> error message as follows:
> 
> block/017 (do I/O and check the inflight counter)            [failed]
>      runtime  1.715s  ...  1.726s
>      --- tests/block/017.out     2022-11-15 15:30:51.285717678 +0900
>      +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/block/017.out.bad   2022-11-25 16:23:50.778747167 +0900
>      @@ -6,7 +6,7 @@
>       sysfs inflight reads 1
>       sysfs inflight writes 1
>       sysfs stat 2
>      -diskstats 2
>      +diskstats 1
>       sysfs inflight reads 0
>       sysfs inflight writes 0
>      ...
> 
> The test case issues one read and one write to a null_blk device, and
> checks that inflight counters reports correct numbers of inflight IOs.
> To keep IOs inflight during test, it prepares null_blk device with
> completion_nsec parameter 0.5 second. However, when test system is slow,
> inflight counter check takes long time and the read completes before the
> check. Hence the failure.
> 
> To avoid the failure, extend the inflight duration of IOs. Prepare a
> null_blk device without completion_nsec parameter and measure time to
> check the inflight counters. Prepare null_blk device again specifying
> completion_nsec parameter 0.5 seconds plus the measured time of inflight
> counter check.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

overall it looks good to me, I trust that you have tested this on the
fast machine also and make sure it doesn't regress, with that said :-

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
Shin'ichiro Kawasaki Dec. 5, 2022, 2:09 a.m. UTC | #2
On Nov 30, 2022 / 05:14, Chaitanya Kulkarni wrote:
> On 11/29/22 18:40, Shin'ichiro Kawasaki wrote:
> > The test case block/017 often fails on slow test systems. When it runs
> > on QEMU and kernel with LOCKDEP, it fails around 50% by chance with
> > error message as follows:
> > 
> > block/017 (do I/O and check the inflight counter)            [failed]
> >      runtime  1.715s  ...  1.726s
> >      --- tests/block/017.out     2022-11-15 15:30:51.285717678 +0900
> >      +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/block/017.out.bad   2022-11-25 16:23:50.778747167 +0900
> >      @@ -6,7 +6,7 @@
> >       sysfs inflight reads 1
> >       sysfs inflight writes 1
> >       sysfs stat 2
> >      -diskstats 2
> >      +diskstats 1
> >       sysfs inflight reads 0
> >       sysfs inflight writes 0
> >      ...
> > 
> > The test case issues one read and one write to a null_blk device, and
> > checks that inflight counters reports correct numbers of inflight IOs.
> > To keep IOs inflight during test, it prepares null_blk device with
> > completion_nsec parameter 0.5 second. However, when test system is slow,
> > inflight counter check takes long time and the read completes before the
> > check. Hence the failure.
> > 
> > To avoid the failure, extend the inflight duration of IOs. Prepare a
> > null_blk device without completion_nsec parameter and measure time to
> > check the inflight counters. Prepare null_blk device again specifying
> > completion_nsec parameter 0.5 seconds plus the measured time of inflight
> > counter check.
> > 
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> 
> overall it looks good to me, I trust that you have tested this on the
> fast machine also and make sure it doesn't regress, with that said :-
> 
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

Thanks for the review. I've applied it. Yes, I tested with multiple machines
including a bare metal fast machine, and saw no regression.
diff mbox series

Patch

diff --git a/tests/block/017 b/tests/block/017
index 8596888..59429b0 100755
--- a/tests/block/017
+++ b/tests/block/017
@@ -24,11 +24,38 @@  show_inflight() {
 	awk '$3 == "nullb1" { print "diskstats " $12 }' /proc/diskstats
 }
 
+# Measure how long it takes for inflight counter check using time command.
+# Convert X.YYYs time format into integer in milliseconds.
+counter_check_duration_in_millis()
+{
+	local show_seconds sub_second seconds
+
+	show_seconds=$( { time show_inflight > /dev/null; } 2>&1 )
+	show_seconds=${show_seconds/s/}
+	sub_second=${show_seconds##*.}
+	sub_second=$((10#${sub_second}))
+	seconds=${show_seconds%%.*}
+	echo $(( seconds * 1000 + sub_second ))
+}
+
 test() {
+	local io_in_millis
+
 	echo "Running ${TEST_NAME}"
 
-	if ! _configure_null_blk nullb1 irqmode=2 completion_nsec=500000000 \
-			power=1; then
+	# Prepare null_blk to measure time to check inflight counters. IOs
+	# should be inflight long enough to cover counter checks twice.
+	if ! _configure_null_blk nullb1 irqmode=2 power=1; then
+		return 1
+	fi
+
+	io_in_millis=$(( 500 + $(counter_check_duration_in_millis) * 2 ))
+
+	_exit_null_blk
+
+	# Prepare null_blk again with desired IO inflight duration
+	if ! _configure_null_blk nullb1 irqmode=2 \
+	     completion_nsec=$((io_in_millis * 1000 * 1000)) power=1; then
 		return 1
 	fi