diff mbox series

[net] selftests/net: big_tcp: longer netperf session on slow machines

Message ID bd55c0d5a90b35f7eeee6d132e950ca338ea1d67.1739895412.git.pablmart@redhat.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net] selftests/net: big_tcp: longer netperf session on slow machines

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers warning 1 maintainers not CCed: linux-kselftest@vger.kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff fail author Signed-off-by missing
netdev/deprecated_api success None detected
netdev/check_selftest success net selftest script(s) already in Makefile
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 46 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-18--21-00 (tests: 891)

Commit Message

Pablo Martin Medrano Feb. 18, 2025, 4:19 p.m. UTC
After debugging the following output for big_tcp.sh on a board:

CLI GSO | GW GRO | GW GSO | SER GRO
on        on       on       on      : [PASS]
on        off      on       off     : [PASS]
off       on       on       on      : [FAIL_on_link1]
on        on       off      on      : [FAIL_on_link1]

Davide Caratti found that the default test duration of 1s is too short
on slow systems to reach the cwnd size necessary for TCP/IP to
generate at least one packet bigger than 65536 (matching the iptables
match on length rule the test evaluates).

This patch marks the aforementioned failing combinations as expected
failures (xfail) when KSFT_MACHINE_SLOW is set. To do so, the test has
been modified to use facilities from net/lib.sh.
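
The xfail behaviour described above can be sketched as follows. This is
a minimal illustration, not the code in net/lib.sh: classify_result is a
hypothetical helper, the exit codes follow the kselftest convention
(0 pass, 1 fail, 2 xfail), and treating KSFT_MACHINE_SLOW=yes as the
slow-machine marker is an assumption.

```shell
#!/bin/bash
# Hypothetical stand-ins for illustration; NOT the helpers from net/lib.sh.
ksft_pass=0
ksft_fail=1
ksft_xfail=2

# Map the exit code of a check to a result label and a kselftest exit code.
# Assumption: a machine is flagged as slow with KSFT_MACHINE_SLOW=yes.
classify_result() {
	local rc=$1

	if [ "$rc" -eq 0 ]; then
		echo "OK $ksft_pass"
	elif [ "${KSFT_MACHINE_SLOW:-no}" = "yes" ]; then
		# Failures are expected on slow machines, so report XFAIL.
		echo "XFAIL $ksft_xfail"
	else
		echo "FAIL $ksft_fail"
	fi
}
```

With this sketch, a failing check is only downgraded when the
environment says the machine is slow; a passing check is reported the
same way everywhere.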

The new output for the test will look like this (example with a forced
XFAIL):

Testing for BIG TCP:
      CLI GSO | GW GRO | GW GSO | SER GRO
TEST: on        on       on       on                    [ OK ]
TEST: on        off      on       off                   [ OK ]
TEST: off       on       on       on                    [XFAIL]
---
 tools/testing/selftests/net/big_tcp.sh | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

Comments

Jakub Kicinski Feb. 21, 2025, 12:54 a.m. UTC | #1
On Tue, 18 Feb 2025 17:19:28 +0100 Pablo Martin Medrano wrote:
> After debugging the following output for big_tcp.sh on a board:
> 
> CLI GSO | GW GRO | GW GSO | SER GRO
> on        on       on       on      : [PASS]
> on        off      on       off     : [PASS]
> off       on       on       on      : [FAIL_on_link1]
> on        on       off      on      : [FAIL_on_link1]
> 
> Davide Caratti found that the default test duration of 1s is too short
> on slow systems to reach the cwnd size necessary for TCP/IP to
> generate at least one packet bigger than 65536 (matching the iptables
> match on length rule the test evaluates).

Why not increase the test duration then?
Paolo Abeni Feb. 21, 2025, 9:14 a.m. UTC | #2
On 2/21/25 1:54 AM, Jakub Kicinski wrote:
> On Tue, 18 Feb 2025 17:19:28 +0100 Pablo Martin Medrano wrote:
>> After debugging the following output for big_tcp.sh on a board:
>>
>> CLI GSO | GW GRO | GW GSO | SER GRO
>> on        on       on       on      : [PASS]
>> on        off      on       off     : [PASS]
>> off       on       on       on      : [FAIL_on_link1]
>> on        on       off      on      : [FAIL_on_link1]
>>
>> Davide Caratti found that the default test duration of 1s is too short
>> on slow systems to reach the cwnd size necessary for TCP/IP to
>> generate at least one packet bigger than 65536 (matching the iptables
>> match on length rule the test evaluates).
> 
> Why not increase the test duration then?

I gave this guidance, as with arbitrarily slow machines we would need a
very long runtime. Similarly to the packetdrill tests, instead of
increasing the allowed time, simply allow xfail on KSFT_MACHINE_SLOW.

Cheers,

Paolo
Pablo Martin Medrano Feb. 21, 2025, 10:14 a.m. UTC | #3
On Fri, 21 Feb 2025, Paolo Abeni wrote:
> On 2/21/25 1:54 AM, Jakub Kicinski wrote:
>> Why not increase the test duration then?
>
> I gave this guidance, as with arbitrarily slow machines we would need a
> very long runtime. Similarly to the packetdrill tests, instead of
> increasing the allowed time, simply allow xfail on KSFT_MACHINE_SLOW.

I have resubmitted a properly versioned and tagged patch (with a
corrected title, since it does not in fact increase the netperf session
duration) at:

https://lore.kernel.org/netdev/23340252eb7bbc1547f5e873be7804adbd7ad092.1739983848.git.pablmart@redhat.com/

In that patch, the Fixes: tag points to the commit, found by Paolo, that
changed the duration from the netperf default (10 seconds) to 1 second.
As he mentions, even with 10 seconds there is no guarantee that the test
will pass on slow systems and/or under load, hence the skip/xfail.
Jakub Kicinski Feb. 21, 2025, 10:44 p.m. UTC | #4
On Fri, 21 Feb 2025 10:14:35 +0100 Paolo Abeni wrote:
> >> Davide Caratti found that the default test duration of 1s is too short
> >> on slow systems to reach the cwnd size necessary for TCP/IP to
> >> generate at least one packet bigger than 65536 (matching the iptables
> >> match on length rule the test evaluates).
> > 
> > Why not increase the test duration then?  
> 
> I gave this guidance, as with arbitrarily slow machines we would need a
> very long runtime. Similarly to the packetdrill tests, instead of
> increasing the allowed time, simply allow xfail on KSFT_MACHINE_SLOW.

Hm. Wouldn't we ideally specify the flow length in bytes? Instead of
giving all machines 1 sec, ask to transfer ${TBD number of bytes} and
on fast machines it will complete in 1 sec, on slower machines take
longer but have a good chance of still growing the windows?
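
A minimal sketch of the byte-based idea above, under stated assumptions:
netperf_length_args is a hypothetical helper that is not part of
big_tcp.sh, and it relies on netperf's documented convention that a
negative -l value on a stream test is a byte count rather than a
duration in seconds. The byte count itself is left to the caller, since
the thread does not settle on a value.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not part of big_tcp.sh.
# Assumption: netperf treats a negative -l value as a number of bytes
# to transfer (stream tests) instead of a duration in seconds.
netperf_length_args() {
	local bytes=$1

	# Accept only a plain positive decimal integer.
	case "$bytes" in
	''|*[!0-9]*)
		echo "invalid byte count: $bytes" >&2
		return 1
		;;
	esac
	if [ "$bytes" -le 0 ]; then
		echo "byte count must be positive" >&2
		return 1
	fi

	echo "-l -$bytes"
}
```

For example, with an arbitrary placeholder of 1000000 bytes, the helper
prints "-l -1000000", which could replace a fixed "-l 1" on the netperf
command line; a slow machine would then simply take longer to finish
the same transfer.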

Patch

diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
index 2db9d15cd45f..dc2ecfd58961 100755
--- a/tools/testing/selftests/net/big_tcp.sh
+++ b/tools/testing/selftests/net/big_tcp.sh
@@ -21,8 +21,7 @@  CLIENT_GW6="2001:db8:1::2"
 MAX_SIZE=128000
 CHK_SIZE=65535
 
-# Kselftest framework requirement - SKIP code is 4.
-ksft_skip=4
+source lib.sh
 
 setup() {
 	ip netns add $CLIENT_NS
@@ -143,21 +142,20 @@  do_test() {
 	start_counter link3 $SERVER_NS
 	do_netperf $CLIENT_NS
 
-	if check_counter link1 $ROUTER_NS; then
-		check_counter link3 $SERVER_NS || ret="FAIL_on_link3"
-	else
-		ret="FAIL_on_link1"
-	fi
+	check_counter link1 $ROUTER_NS
+	check_err $? "fail on link1"
+	check_counter link3 $SERVER_NS
+	check_err $? "fail on link3"
 
 	stop_counter link1 $ROUTER_NS
 	stop_counter link3 $SERVER_NS
-	printf "%-9s %-8s %-8s %-8s: [%s]\n" \
-		$cli_tso $gw_gro $gw_tso $ser_gro $ret
+	log_test "$(printf "%-9s %-8s %-8s %-8s" \
+			$cli_tso $gw_gro $gw_tso $ser_gro)"
 	test $ret = "PASS"
 }
 
 testup() {
-	echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
+	echo "      CLI GSO | GW GRO | GW GSO | SER GRO" && \
 	do_test "on"  "on"  "on"  "on"  && \
 	do_test "on"  "off" "on"  "off" && \
 	do_test "off" "on"  "on"  "on"  && \
@@ -176,7 +174,8 @@  if ! ip link help 2>&1 | grep gso_ipv4_max_size &> /dev/null; then
 fi
 
 trap cleanup EXIT
+xfail_on_slow
 setup && echo "Testing for BIG TCP:" && \
 NF=4 testup && echo "***v4 Tests Done***" && \
 NF=6 testup && echo "***v6 Tests Done***"
-exit $?
+exit $EXIT_STATUS