Message ID | 20250116020105.931338-1-kuba@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [net-next] selftests: net: give up on the cmsg_time accuracy on slow machines | expand |
On 1/16/25 03:01, Jakub Kicinski wrote: > Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative > cmsg_time.sh test") widened the accepted value range 8x but we still > see flakes (at a rate of around 7%). you have undid the 8x change by this commit (without mentioning that) [fine for me] > > Return XFAIL for the most timing sensitive test on slow machines. code change looks fine for me, and does exactly that, so: Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > > Before: > > # ./cmsg_time.sh > Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' > FAIL - 1/36 cases failed > > After: > > # ./cmsg_time.sh > Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) > Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) > OK > > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski wrote: > Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative > cmsg_time.sh test") widened the accepted value range 8x but we still > see flakes (at a rate of around 7%). > > Return XFAIL for the most timing sensitive test on slow machines. > > Before: > > # ./cmsg_time.sh > Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' > FAIL - 1/36 cases failed > > After: > > # ./cmsg_time.sh > Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) > Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) > OK > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com>
Jakub Kicinski <kuba@kernel.org> writes: > Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative > cmsg_time.sh test") widened the accepted value range 8x but we still > see flakes (at a rate of around 7%). > > Return XFAIL for the most timing sensitive test on slow machines. > > Before: > > # ./cmsg_time.sh > Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' > FAIL - 1/36 cases failed > > After: > > # ./cmsg_time.sh > Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) > Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) > OK > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > CC: shuah@kernel.org > CC: linux-kselftest@vger.kernel.org > CC: willemdebruijn.kernel@gmail.com > --- > tools/testing/selftests/net/cmsg_time.sh | 35 +++++++++++++++++------- > 1 file changed, 25 insertions(+), 10 deletions(-) > > diff --git a/tools/testing/selftests/net/cmsg_time.sh b/tools/testing/selftests/net/cmsg_time.sh > index 1d7e756644bc..478af0aefa97 100755 > --- a/tools/testing/selftests/net/cmsg_time.sh > +++ b/tools/testing/selftests/net/cmsg_time.sh > @@ -34,13 +34,28 @@ BAD=0 > TOTAL=0 > > check_result() { > + local ret=$1 > + local got=$2 > + local exp=$3 > + local case=$4 > + local xfail=$5 > + local xf= > + local inc= > + > + if [ "$xfail" == "xfail" ]; then > + xf="(XFAIL)" > + inc=0 > + else > + inc=1 > + fi > + > ((TOTAL++)) > - if [ $1 -ne 0 ]; then > - echo " Case $4 returned $1, expected 0" > - ((BAD++)) > + if [ $ret -ne 0 ]; then > + echo " Case $case returned $ret, expected 0 $xf" > + ((BAD+=inc)) > elif [ "$2" != "$3" ]; then > - echo " Case $4 returned '$2', expected '$3'" > - ((BAD++)) > + echo " Case $case returned '$got', expected '$exp' $xf" > + ((BAD+=inc)) > fi > } > > @@ -66,14 +81,14 @@ for i in "-4 $TGT4" "-6 $TGT6"; do > awk '/SND/ { if ($3 > 1000) print "OK"; }') > check_result $? "$ts" "OK" "$prot - TXTIME abs" > > - [ "$KSFT_MACHINE_SLOW" = yes ] && delay=8000 || delay=1000 > + [ "$KSFT_MACHINE_SLOW" = yes ] && xfail=xfail > > - ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d $delay | > + ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d 1000 | > awk '/SND/ {snd=$3} > /SCHED/ {sch=$3} > - END { if (snd - sch > '$((delay/2))') print "OK"; > - else print snd, "-", sch, "<", '$((delay/2))'; }') > - check_result $? "$ts" "OK" "$prot - TXTIME rel" > + END { if (snd - sch > 500) print "OK"; > + else print snd, "-", sch, "<", 500; }') > + check_result $? "$ts" "OK" "$prot - TXTIME rel" $xfail > done > done This logging and xfail handling duplicates lib.sh. Would a patch like below be OK with you? The gist of it is to just use check_err, log_test and xfail_on_slow from lib.sh to achieve what the test open-codes. I can send it as a follow-up instead if you prefer. --- a/tools/testing/selftests/net/cmsg_time.sh +++ b/tools/testing/selftests/net/cmsg_time.sh @@ -29,19 +29,21 @@ ip -netns $NS addr add $IP6 dev dummy0 # Need FQ for TXTIME ip netns exec $NS tc qdisc replace dev dummy0 root fq -# Test -BAD=0 -TOTAL=0 - check_result() { - ((TOTAL++)) - if [ $1 -ne 0 ]; then - echo " Case $4 returned $1, expected 0" - ((BAD++)) - elif [ "$2" != "$3" ]; then - echo " Case $4 returned '$2', expected '$3'" - ((BAD++)) - fi + local ret=$1 + local got=$2 + local exp=$3 + local case=$4 + + RET=0 + + [ $1 -eq 0 ] + check_err $? "Case $4 returned $1, expected 0" + + [ "$2" == "$3" ] + check_err $? "Case $4 returned '$2', expected '$3'" + + log_test "$4" } for i in "-4 $TGT4" "-6 $TGT6"; do @@ -73,15 +79,8 @@ for i in "-4 $TGT4" "-6 $TGT6"; do /SCHED/ {sch=$3} END { if (snd - sch > '$((delay/2))') print "OK"; else print snd, "-", sch, "<", '$((delay/2))'; }') - check_result $? "$ts" "OK" "$prot - TXTIME rel" + xfail_on_slow check_result $? "$ts" "OK" "$prot - TXTIME rel" done done -# Summary -if [ $BAD -ne 0 ]; then - echo "FAIL - $BAD/$TOTAL cases failed" - exit 1 -else - echo "OK" - exit 0 -fi +exit $EXIT_STATUS This is much more verbose, but that's how tests tend to be. I could change it to only log on RET != 0, but like this the custom results block can go away and the test is overall more median. bash-5.2# KSFT_MACHINE_SLOW=yes ./cmsg_time.sh TEST: UDPv4 - no options [ OK ] TEST: UDPv4 - ts cnt [ OK ] TEST: UDPv4 - ts0 SCHED [ OK ] TEST: UDPv4 - ts0 SND [ OK ] TEST: UDPv4 - TXTIME abs [ OK ] TEST: UDPv4 - TXTIME rel [XFAIL] Case UDPv4 - TXTIME rel returned '34us - 30us < 4000', expected 'OK' TEST: ICMPv4 - no options [ OK ] TEST: ICMPv4 - ts cnt [ OK ] [... snip ...] bash-5.2# echo $? 0
On Fri, 17 Jan 2025 13:49:23 +0100 Petr Machata wrote: > This logging and xfail handling duplicates lib.sh. Would a patch like > below be OK with you? The gist of it is to just use check_err, log_test > and xfail_on_slow from lib.sh to achieve what the test open-codes. Hm, maybe? I'm not going to say no if you send a patch, but IMHO if we were to change the output I think we should make it ktap.
Jakub Kicinski <kuba@kernel.org> writes: > On Fri, 17 Jan 2025 13:49:23 +0100 Petr Machata wrote: >> This logging and xfail handling duplicates lib.sh. Would a patch like >> below be OK with you? The gist of it is to just use check_err, log_test >> and xfail_on_slow from lib.sh to achieve what the test open-codes. > > Hm, maybe? I'm not going to say no if you send a patch, but IMHO > if we were to change the output I think we should make it ktap. Hmm, at some point, we'll want to convert lib.sh to ktap as well. Hopefully transparently so that all the existing selftests out there just magically become ktap-compatible. If you want to preserve the output 1:1 then I feel like that's bending over backwards too much. We could still maybe reuse xfail_on_slow and ask for FAIL_TO_XFAIL, but that's marked internal in lib.sh. So I retract my proposal. Too faffy at that point.
Hello: This patch was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Wed, 15 Jan 2025 18:01:05 -0800 you wrote: > Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative > cmsg_time.sh test") widened the accepted value range 8x but we still > see flakes (at a rate of around 7%). > > Return XFAIL for the most timing sensitive test on slow machines. > > Before: > > [...] Here is the summary with links: - [net-next] selftests: net: give up on the cmsg_time accuracy on slow machines https://git.kernel.org/netdev/net-next/c/54ea680b759c You are awesome, thank you!
diff --git a/tools/testing/selftests/net/cmsg_time.sh b/tools/testing/selftests/net/cmsg_time.sh index 1d7e756644bc..478af0aefa97 100755 --- a/tools/testing/selftests/net/cmsg_time.sh +++ b/tools/testing/selftests/net/cmsg_time.sh @@ -34,13 +34,28 @@ BAD=0 TOTAL=0 check_result() { + local ret=$1 + local got=$2 + local exp=$3 + local case=$4 + local xfail=$5 + local xf= + local inc= + + if [ "$xfail" == "xfail" ]; then + xf="(XFAIL)" + inc=0 + else + inc=1 + fi + ((TOTAL++)) - if [ $1 -ne 0 ]; then - echo " Case $4 returned $1, expected 0" - ((BAD++)) + if [ $ret -ne 0 ]; then + echo " Case $case returned $ret, expected 0 $xf" + ((BAD+=inc)) elif [ "$2" != "$3" ]; then - echo " Case $4 returned '$2', expected '$3'" - ((BAD++)) + echo " Case $case returned '$got', expected '$exp' $xf" + ((BAD+=inc)) fi } @@ -66,14 +81,14 @@ for i in "-4 $TGT4" "-6 $TGT6"; do awk '/SND/ { if ($3 > 1000) print "OK"; }') check_result $? "$ts" "OK" "$prot - TXTIME abs" - [ "$KSFT_MACHINE_SLOW" = yes ] && delay=8000 || delay=1000 + [ "$KSFT_MACHINE_SLOW" = yes ] && xfail=xfail - ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d $delay | + ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d 1000 | awk '/SND/ {snd=$3} /SCHED/ {sch=$3} - END { if (snd - sch > '$((delay/2))') print "OK"; - else print snd, "-", sch, "<", '$((delay/2))'; }') - check_result $? "$ts" "OK" "$prot - TXTIME rel" + END { if (snd - sch > 500) print "OK"; + else print snd, "-", sch, "<", 500; }') + check_result $? "$ts" "OK" "$prot - TXTIME rel" $xfail done done
Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative cmsg_time.sh test") widened the accepted value range 8x but we still see flakes (at a rate of around 7%). Return XFAIL for the most timing sensitive test on slow machines. Before: # ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' FAIL - 1/36 cases failed After: # ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) OK Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- CC: shuah@kernel.org CC: linux-kselftest@vger.kernel.org CC: willemdebruijn.kernel@gmail.com --- tools/testing/selftests/net/cmsg_time.sh | 35 +++++++++++++++++------- 1 file changed, 25 insertions(+), 10 deletions(-)