diff mbox series

[net-next] selftests: net: give up on the cmsg_time accuracy on slow machines

Message ID 20250116020105.931338-1-kuba@kernel.org (mailing list archive)
State New
Headers show
Series [net-next] selftests: net: give up on the cmsg_time accuracy on slow machines | expand

Commit Message

Jakub Kicinski Jan. 16, 2025, 2:01 a.m. UTC
Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative
cmsg_time.sh test") widened the accepted value range 8x but we still
see flakes (at a rate of around 7%).

Return XFAIL for the most timing sensitive test on slow machines.

Before:

  # ./cmsg_time.sh
    Case UDPv4  - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK'
  FAIL - 1/36 cases failed

After:

  # ./cmsg_time.sh
    Case UDPv4  - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL)
    Case UDPv6  - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL)
  OK

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: linux-kselftest@vger.kernel.org
CC: willemdebruijn.kernel@gmail.com
---
 tools/testing/selftests/net/cmsg_time.sh | 35 +++++++++++++++++-------
 1 file changed, 25 insertions(+), 10 deletions(-)

Comments

Przemek Kitszel Jan. 16, 2025, 11:52 a.m. UTC | #1
On 1/16/25 03:01, Jakub Kicinski wrote:
> Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative
> cmsg_time.sh test") widened the accepted value range 8x but we still
> see flakes (at a rate of around 7%).

you have undid the 8x change by this commit (without mentioning that)
[fine for me]

> 
> Return XFAIL for the most timing sensitive test on slow machines.

code change looks fine for me, and does exactly that, so:
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

> 
> Before:
> 
>    # ./cmsg_time.sh
>      Case UDPv4  - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK'
>    FAIL - 1/36 cases failed
> 
> After:
> 
>    # ./cmsg_time.sh
>      Case UDPv4  - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL)
>      Case UDPv6  - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL)
>    OK
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn Jan. 16, 2025, 1 p.m. UTC | #2
Jakub Kicinski wrote:
> Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative
> cmsg_time.sh test") widened the accepted value range 8x but we still
> see flakes (at a rate of around 7%).
> 
> Return XFAIL for the most timing sensitive test on slow machines.
> 
> Before:
> 
>   # ./cmsg_time.sh
>     Case UDPv4  - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK'
>   FAIL - 1/36 cases failed
> 
> After:
> 
>   # ./cmsg_time.sh
>     Case UDPv4  - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL)
>     Case UDPv6  - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL)
>   OK
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>
Petr Machata Jan. 17, 2025, 12:49 p.m. UTC | #3
Jakub Kicinski <kuba@kernel.org> writes:

> Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative
> cmsg_time.sh test") widened the accepted value range 8x but we still
> see flakes (at a rate of around 7%).
>
> Return XFAIL for the most timing sensitive test on slow machines.
>
> Before:
>
>   # ./cmsg_time.sh
>     Case UDPv4  - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK'
>   FAIL - 1/36 cases failed
>
> After:
>
>   # ./cmsg_time.sh
>     Case UDPv4  - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL)
>     Case UDPv6  - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL)
>   OK
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: shuah@kernel.org
> CC: linux-kselftest@vger.kernel.org
> CC: willemdebruijn.kernel@gmail.com
> ---
>  tools/testing/selftests/net/cmsg_time.sh | 35 +++++++++++++++++-------
>  1 file changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/tools/testing/selftests/net/cmsg_time.sh b/tools/testing/selftests/net/cmsg_time.sh
> index 1d7e756644bc..478af0aefa97 100755
> --- a/tools/testing/selftests/net/cmsg_time.sh
> +++ b/tools/testing/selftests/net/cmsg_time.sh
> @@ -34,13 +34,28 @@ BAD=0
>  TOTAL=0
>  
>  check_result() {
> +    local ret=$1
> +    local got=$2
> +    local exp=$3
> +    local case=$4
> +    local xfail=$5
> +    local xf=
> +    local inc=
> +
> +    if [ "$xfail" == "xfail" ]; then
> +	xf="(XFAIL)"
> +	inc=0
> +    else
> +	inc=1
> +    fi
> +
>      ((TOTAL++))
> -    if [ $1 -ne 0 ]; then
> -	echo "  Case $4 returned $1, expected 0"
> -	((BAD++))
> +    if [ $ret -ne 0 ]; then
> +	echo "  Case $case returned $ret, expected 0 $xf"
> +	((BAD+=inc))
>      elif [ "$2" != "$3" ]; then
> -	echo "  Case $4 returned '$2', expected '$3'"
> -	((BAD++))
> +	echo "  Case $case returned '$got', expected '$exp' $xf"
> +	((BAD+=inc))
>      fi
>  }
>  
> @@ -66,14 +81,14 @@ for i in "-4 $TGT4" "-6 $TGT6"; do
>  		 awk '/SND/ { if ($3 > 1000) print "OK"; }')
>  	check_result $? "$ts" "OK" "$prot - TXTIME abs"
>  
> -	[ "$KSFT_MACHINE_SLOW" = yes ] && delay=8000 || delay=1000
> +	[ "$KSFT_MACHINE_SLOW" = yes ] && xfail=xfail
>  
> -	ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d $delay |
> +	ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d 1000 |
>  		 awk '/SND/ {snd=$3}
>  		      /SCHED/ {sch=$3}
> -		      END { if (snd - sch > '$((delay/2))') print "OK";
> -			    else print snd, "-", sch, "<", '$((delay/2))'; }')
> -	check_result $? "$ts" "OK" "$prot - TXTIME rel"
> +		      END { if (snd - sch > 500) print "OK";
> +			    else print snd, "-", sch, "<", 500; }')
> +	check_result $? "$ts" "OK" "$prot - TXTIME rel" $xfail
>      done
>  done

This logging and xfail handling duplicates lib.sh. Would a patch like
below be OK with you? The gist of it is to just use check_err, log_test
and xfail_on_slow from lib.sh to achieve what the test open-codes.

I can send it as a follow-up instead if you prefer.

--- a/tools/testing/selftests/net/cmsg_time.sh
+++ b/tools/testing/selftests/net/cmsg_time.sh
@@ -29,19 +29,21 @@ ip -netns $NS addr add $IP6 dev dummy0
 # Need FQ for TXTIME
 ip netns exec $NS tc qdisc replace dev dummy0 root fq
 
-# Test
-BAD=0
-TOTAL=0
-
 check_result() {
-    ((TOTAL++))
-    if [ $1 -ne 0 ]; then
-	echo "  Case $4 returned $1, expected 0"
-	((BAD++))
-    elif [ "$2" != "$3" ]; then
-	echo "  Case $4 returned '$2', expected '$3'"
-	((BAD++))
-    fi
+    local ret=$1
+    local got=$2
+    local exp=$3
+    local case=$4
+
+    RET=0
+
+    [ $1 -eq 0 ]
+    check_err $? "Case $4 returned $1, expected 0"
+
+    [ "$2" == "$3" ]
+    check_err $? "Case $4 returned '$2', expected '$3'"
+
+    log_test "$4"
 }
 
 for i in "-4 $TGT4" "-6 $TGT6"; do
@@ -73,15 +79,8 @@ for i in "-4 $TGT4" "-6 $TGT6"; do
 		      /SCHED/ {sch=$3}
 		      END { if (snd - sch > '$((delay/2))') print "OK";
 			    else print snd, "-", sch, "<", '$((delay/2))'; }')
-	check_result $? "$ts" "OK" "$prot - TXTIME rel"
+	xfail_on_slow check_result $? "$ts" "OK" "$prot - TXTIME rel"
     done
 done
 
-# Summary
-if [ $BAD -ne 0 ]; then
-    echo "FAIL - $BAD/$TOTAL cases failed"
-    exit 1
-else
-    echo "OK"
-    exit 0
-fi
+exit $EXIT_STATUS

This is much more verbose, but that's how tests tend to be. I could
change it to only log on RET != 0, but like this the custom results
block can go away and the test is overall more median.

bash-5.2# KSFT_MACHINE_SLOW=yes ./cmsg_time.sh   
TEST: UDPv4  - no options                                           [ OK ]
TEST: UDPv4  - ts cnt                                               [ OK ]
TEST: UDPv4  - ts0 SCHED                                            [ OK ]
TEST: UDPv4  - ts0 SND                                              [ OK ]
TEST: UDPv4  - TXTIME abs                                           [ OK ]
TEST: UDPv4  - TXTIME rel                                           [XFAIL]
        Case UDPv4  - TXTIME rel returned '34us - 30us < 4000', expected 'OK'
TEST: ICMPv4  - no options                                          [ OK ]
TEST: ICMPv4  - ts cnt                                              [ OK ]
[... snip ...]
bash-5.2# echo $?
0
Jakub Kicinski Jan. 17, 2025, 2:53 p.m. UTC | #4
On Fri, 17 Jan 2025 13:49:23 +0100 Petr Machata wrote:
> This logging and xfail handling duplicates lib.sh. Would a patch like
> below be OK with you? The gist of it is to just use check_err, log_test
> and xfail_on_slow from lib.sh to achieve what the test open-codes.

Hm, maybe? I'm not going to say no if you send a patch, but IMHO
if we were to change the output I think we should make it ktap.
Petr Machata Jan. 17, 2025, 3:16 p.m. UTC | #5
Jakub Kicinski <kuba@kernel.org> writes:

> On Fri, 17 Jan 2025 13:49:23 +0100 Petr Machata wrote:
>> This logging and xfail handling duplicates lib.sh. Would a patch like
>> below be OK with you? The gist of it is to just use check_err, log_test
>> and xfail_on_slow from lib.sh to achieve what the test open-codes.
>
> Hm, maybe? I'm not going to say no if you send a patch, but IMHO
> if we were to change the output I think we should make it ktap.

Hmm, at some point, we'll want to convert lib.sh to ktap as well.
Hopefully transparently so that all the existing selftests out there
just magically become ktap-compatible.

If you want to preserve the output 1:1 then I feel like that's bending
over backwards too much. We could still maybe reuse xfail_on_slow and
ask for FAIL_TO_XFAIL, but that's marked internal in lib.sh. So I
retract my proposal. Too faffy at that point.
patchwork-bot+netdevbpf@kernel.org Jan. 18, 2025, 3:50 a.m. UTC | #6
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Jan 2025 18:01:05 -0800 you wrote:
> Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative
> cmsg_time.sh test") widened the accepted value range 8x but we still
> see flakes (at a rate of around 7%).
> 
> Return XFAIL for the most timing sensitive test on slow machines.
> 
> Before:
> 
> [...]

Here is the summary with links:
  - [net-next] selftests: net: give up on the cmsg_time accuracy on slow machines
    https://git.kernel.org/netdev/net-next/c/54ea680b759c

You are awesome, thank you!
diff mbox series

Patch

diff --git a/tools/testing/selftests/net/cmsg_time.sh b/tools/testing/selftests/net/cmsg_time.sh
index 1d7e756644bc..478af0aefa97 100755
--- a/tools/testing/selftests/net/cmsg_time.sh
+++ b/tools/testing/selftests/net/cmsg_time.sh
@@ -34,13 +34,28 @@  BAD=0
 TOTAL=0
 
 check_result() {
+    local ret=$1
+    local got=$2
+    local exp=$3
+    local case=$4
+    local xfail=$5
+    local xf=
+    local inc=
+
+    if [ "$xfail" == "xfail" ]; then
+	xf="(XFAIL)"
+	inc=0
+    else
+	inc=1
+    fi
+
     ((TOTAL++))
-    if [ $1 -ne 0 ]; then
-	echo "  Case $4 returned $1, expected 0"
-	((BAD++))
+    if [ $ret -ne 0 ]; then
+	echo "  Case $case returned $ret, expected 0 $xf"
+	((BAD+=inc))
     elif [ "$2" != "$3" ]; then
-	echo "  Case $4 returned '$2', expected '$3'"
-	((BAD++))
+	echo "  Case $case returned '$got', expected '$exp' $xf"
+	((BAD+=inc))
     fi
 }
 
@@ -66,14 +81,14 @@  for i in "-4 $TGT4" "-6 $TGT6"; do
 		 awk '/SND/ { if ($3 > 1000) print "OK"; }')
 	check_result $? "$ts" "OK" "$prot - TXTIME abs"
 
-	[ "$KSFT_MACHINE_SLOW" = yes ] && delay=8000 || delay=1000
+	[ "$KSFT_MACHINE_SLOW" = yes ] && xfail=xfail
 
-	ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d $delay |
+	ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d 1000 |
 		 awk '/SND/ {snd=$3}
 		      /SCHED/ {sch=$3}
-		      END { if (snd - sch > '$((delay/2))') print "OK";
-			    else print snd, "-", sch, "<", '$((delay/2))'; }')
-	check_result $? "$ts" "OK" "$prot - TXTIME rel"
+		      END { if (snd - sch > 500) print "OK";
+			    else print snd, "-", sch, "<", 500; }')
+	check_result $? "$ts" "OK" "$prot - TXTIME rel" $xfail
     done
 done