[net-next] selftests/net: packetdrill: make tcp buf limited timing tests benign

Message ID 20250115232129.845884-1-kuba@kernel.org (mailing list archive)
State New
Series [net-next] selftests/net: packetdrill: make tcp buf limited timing tests benign

Commit Message

Jakub Kicinski Jan. 15, 2025, 11:21 p.m. UTC
The following tests are failing on debug kernels:

  tcp_tcp_info_tcp-info-rwnd-limited.pkt
  tcp_tcp_info_tcp-info-sndbuf-limited.pkt

with reports like:

      assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
  AssertionError: 18000

and:

      assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
  AssertionError: 362000

Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign
debug flakes as xfail") to cover them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: willemb@google.com
CC: matttbe@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/packetdrill/ksft_runner.sh | 1 +
 1 file changed, 1 insertion(+)

Comments

Willem de Bruijn Jan. 16, 2025, 1:05 p.m. UTC | #1
Jakub Kicinski wrote:
> The following tests are failing on debug kernels:
> 
>   tcp_tcp_info_tcp-info-rwnd-limited.pkt
>   tcp_tcp_info_tcp-info-sndbuf-limited.pkt
> 
> with reports like:
> 
>       assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
>   AssertionError: 18000
> 
> and:
> 
>       assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
>   AssertionError: 362000
> 
> Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign
> debug flakes as xfail") to cover them.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>

Thanks.

I see that we'll still have a few flakes on dbg. Perhaps one total
failure a day. From the following.

tcp-close-close-local-close-then-remote-fin-pkt
tcp-ecn-ecn-uses-ect0-pkt
tcp-eor-no-coalesce-retrans-pkt
tcp-slow-start-slow-start-after-win-update-pkt
tcp-sack-sack-route-refresh-ip-tos-pkt
tcp-ts-recent-reset-tsval-pkt
tcp-zerocopy-closed-pkt

We'll take a look after this change whether we can make these
more resilient. But likely also allow-list or even xfail for
everything in dbg.

Jakub Kicinski Jan. 16, 2025, 2:58 p.m. UTC | #2
On Thu, 16 Jan 2025 08:05:57 -0500 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > The following tests are failing on debug kernels:
> > 
> >   tcp_tcp_info_tcp-info-rwnd-limited.pkt
> >   tcp_tcp_info_tcp-info-sndbuf-limited.pkt
> > 
> > with reports like:
> > 
> >       assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
> >   AssertionError: 18000
> > 
> > and:
> > 
> >       assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
> >   AssertionError: 362000
> > 
> > Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign
> > debug flakes as xfail") to cover them.
> > 
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
> 
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> 
> Thanks.
> 
> I see that we'll still have a few flakes on dbg. Perhaps one total
> failure a day. From the following.
> 
> tcp-close-close-local-close-then-remote-fin-pkt
> tcp-ecn-ecn-uses-ect0-pkt
> tcp-eor-no-coalesce-retrans-pkt
> tcp-slow-start-slow-start-after-win-update-pkt

Argh, I missed the two above, I had the ignored cases filtered out 
when I was looking :(

> tcp-sack-sack-route-refresh-ip-tos-pkt
> tcp-ts-recent-reset-tsval-pkt
> tcp-zerocopy-closed-pkt
> 
> We'll take a look after this change whether we can make these
> more resilient. But likely also allow-list or even xfail for
> everything in dbg.

Okay.

patchwork-bot+netdevbpf@kernel.org Jan. 17, 2025, 1:40 a.m. UTC | #3
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Jan 2025 15:21:29 -0800 you wrote:
> The following tests are failing on debug kernels:
> 
>   tcp_tcp_info_tcp-info-rwnd-limited.pkt
>   tcp_tcp_info_tcp-info-sndbuf-limited.pkt
> 
> with reports like:
> 
> [...]

Here is the summary with links:
  - [net-next] selftests/net: packetdrill: make tcp buf limited timing tests benign
    https://git.kernel.org/netdev/net-next/c/3030e3d57ba8

You are awesome, thank you!

Matthieu Baerts Jan. 17, 2025, 11:21 a.m. UTC | #4
Hi Willem, Jakub,

On 16/01/2025 14:05, Willem de Bruijn wrote:
> Jakub Kicinski wrote:
>> The following tests are failing on debug kernels:
>>
>>   tcp_tcp_info_tcp-info-rwnd-limited.pkt
>>   tcp_tcp_info_tcp-info-sndbuf-limited.pkt

(...)

> We'll take a look after this change whether we can make these
> more resilient. But likely also allow-list or even xfail for
> everything in dbg.

On the MPTCP side, I spent quite a bit of time trying to improve the
situation on debug kernels. Sure it feels good and reassuring to have
spent this time understanding the instabilities. Most issues were due to
spurious retransmissions, because Packetdrill was "too slow" to inject
replies: so more like an issue in the tests. But I don't know if having
these tests running in such slow environments helped to find bugs
directly, e.g. catching unexpected packets. Maybe once? But at what cost?

Still it is good to run them on debug kernels to have extra
verifications on the kernel side. As Ido mentioned last summer, perhaps
we can ignore the test results, but keep logging them, and only look at
the kernel warnings?

So yes, I agree with Willem: if that cannot easily be fixed, ignoring
packetdrill err code for everything in debug sounds like the right
direction.

Cheers,
Matt
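
The idea discussed in this reply (keep logging packetdrill results on
debug kernels, but do not let a failure fail the run) could look roughly
like the sketch below. It is only an illustration: report_result is a
hypothetical helper, not a function from ksft_runner.sh or
ktap_helpers.sh, and the output lines are loosely modelled on KTAP.
KSFT_MACHINE_SLOW is the variable the runner already checks.

```shell
#!/bin/bash
# Hypothetical helper (not part of the kernel tree): turn a test's exit
# code into a result line, downgrading failures on slow/debug machines.
report_result() {
	local rc="$1" name="$2"

	if [[ "$rc" -eq 0 ]]; then
		echo "ok - $name"
	elif [[ -n "${KSFT_MACHINE_SLOW:-}" ]]; then
		# Slow machine: keep the exit code in the log, do not fail.
		echo "ok - $name # XFAIL exit code $rc logged, ignored on slow machine"
	else
		echo "not ok - $name # exit code $rc"
	fi
}

# Usage: pass the exit code of the test command and the test name.
report_result 1 "demo.pkt"
```

Kernel warnings would still have to be collected separately, for
example from the kernel log, since this sketch only changes how the
packetdrill exit code is reported.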
Patch

diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index ff989c325eef..e15c43b7359b 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -43,6 +43,7 @@  if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
 		"tcp_timestamping.*.pkt"
 		"tcp_user_timeout_user-timeout-probe.pkt"
 		"tcp_zerocopy_epoll_.*.pkt"
+		"tcp_tcp_info_tcp-info-*-limited.pkt"
 	)
 	readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"
 	[[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
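
A note on the pattern syntax: the xfail_list entries are joined with '|'
into one extended regular expression, so they are regexes, not shell
globs. In a regex, "-*" means "zero or more hyphens", while ".*" means
"any run of characters". The sketch below reproduces the matching in
isolation. It assumes $script holds a flattened test name like the ones
quoted in the commit message; matches_xfail is a hypothetical helper,
not a function in ksft_runner.sh.

```shell
#!/bin/bash
# Hypothetical helper: build the same kind of alternation regex as the
# runner from the given entries and test one script name against it.
matches_xfail() {
	local script="$1"; shift
	local regex
	regex="^($(printf '%s|' "$@"))$"
	[[ "$script" =~ ${regex} ]]
}

name="tcp_tcp_info_tcp-info-rwnd-limited.pkt"

# Glob-style entry: "-*" only matches a run of hyphens.
if matches_xfail "$name" "tcp_tcp_info_tcp-info-*-limited.pkt"; then
	echo "glob-style entry: match"
else
	echo "glob-style entry: no match"
fi

# Regex-style entry: ".*" matches "rwnd" or "sndbuf".
if matches_xfail "$name" "tcp_tcp_info_tcp-info-.*-limited.pkt"; then
	echo "regex-style entry: match"
else
	echo "regex-style entry: no match"
fi
```

Under that assumption, the glob-style entry reports no match and the
regex-style entry reports a match, so an entry meant to cover both the
rwnd and sndbuf tests would need ".*" between "tcp-info-" and
"-limited".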