Message ID | 20250115232129.845884-1-kuba@kernel.org (mailing list archive) |
---|---|
State | New |
Series | [net-next] selftests/net: packetdrill: make tcp buf limited timing tests benign |
Jakub Kicinski wrote:
> The following tests are failing on debug kernels:
>
>   tcp_tcp_info_tcp-info-rwnd-limited.pkt
>   tcp_tcp_info_tcp-info-sndbuf-limited.pkt
>
> with reports like:
>
>     assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
>     AssertionError: 18000
>
> and:
>
>     assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
>     AssertionError: 362000
>
> Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign
> debug flakes as xfail") to cover them.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>

Thanks.

I see that we'll still have a few flakes on dbg. Perhaps one total
failure a day. From the following.

tcp-close-close-local-close-then-remote-fin-pkt
tcp-ecn-ecn-uses-ect0-pkt
tcp-eor-no-coalesce-retrans-pkt
tcp-slow-start-slow-start-after-win-update-pkt
tcp-sack-sack-route-refresh-ip-tos-pkt
tcp-ts-recent-reset-tsval-pkt
tcp-zerocopy-closed-pkt

We'll take a look after this change whether we can make these
more resilient. But likely also allow-list or even xfail for
everything in dbg.
On Thu, 16 Jan 2025 08:05:57 -0500 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > The following tests are failing on debug kernels:
> >
> >   tcp_tcp_info_tcp-info-rwnd-limited.pkt
> >   tcp_tcp_info_tcp-info-sndbuf-limited.pkt
> >
> > with reports like:
> >
> >     assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
> >     AssertionError: 18000
> >
> > and:
> >
> >     assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
> >     AssertionError: 362000
> >
> > Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign
> > debug flakes as xfail") to cover them.
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> Thanks.
>
> I see that we'll still have a few flakes on dbg. Perhaps one total
> failure a day. From the following.
>
> tcp-close-close-local-close-then-remote-fin-pkt
> tcp-ecn-ecn-uses-ect0-pkt
> tcp-eor-no-coalesce-retrans-pkt
> tcp-slow-start-slow-start-after-win-update-pkt

Argh, I missed the two above, I had the ignored cases filtered out
when I was looking :(

> tcp-sack-sack-route-refresh-ip-tos-pkt
> tcp-ts-recent-reset-tsval-pkt
> tcp-zerocopy-closed-pkt
>
> We'll take a look after this change whether we can make these
> more resilient. But likely also allow-list or even xfail for
> everything in dbg.

Okay.
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Jan 2025 15:21:29 -0800 you wrote:
> The following tests are failing on debug kernels:
>
>   tcp_tcp_info_tcp-info-rwnd-limited.pkt
>   tcp_tcp_info_tcp-info-sndbuf-limited.pkt
>
> with reports like:
>
> [...]

Here is the summary with links:
  - [net-next] selftests/net: packetdrill: make tcp buf limited timing tests benign
    https://git.kernel.org/netdev/net-next/c/3030e3d57ba8

You are awesome, thank you!
Hi Willem, Jakub,

On 16/01/2025 14:05, Willem de Bruijn wrote:
> Jakub Kicinski wrote:
>> The following tests are failing on debug kernels:
>>
>>   tcp_tcp_info_tcp-info-rwnd-limited.pkt
>>   tcp_tcp_info_tcp-info-sndbuf-limited.pkt

(...)

> We'll take a look after this change whether we can make these
> more resilient. But likely also allow-list or even xfail for
> everything in dbg.

On MPTCP side, I spent quite a bit of time trying to improve the
situation on debug kernels. Sure it feels good and reassuring to have
spent this time understanding the instabilities. Most issues were due to
spurious retransmissions, because Packetdrill was "too slow" to inject
replies: so more like an issue in the tests. But I don't know if having
these tests running in such slow environments helped to find bugs
directly, e.g. catching unexpected packets. Maybe once? But at what
cost? Still it is good to run them on debug kernels to have extra
verifications on the kernel side.

As Ido mentioned last summer, perhaps we can ignore the test results,
but keep logging them, and only look at the kernel warnings?

So yes, I agree with Willem: if that cannot easily be fixed, ignoring
packetdrill err code for everything in debug sounds like the right
direction.

Cheers,
Matt
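The direction discussed above (keep running and logging the tests on debug kernels, but do not count a packetdrill failure as a test failure) could look roughly like the sketch below. It is an illustration only, not the runner's actual code: `report_fail`, `report_xfail`, `choose_failfunc` and `run_one` are hypothetical stand-ins, and only the `KSFT_MACHINE_SLOW` variable is taken from the patch.

```shell
#!/bin/bash
# Sketch only: hypothetical stand-ins for the real KTAP helpers.
report_fail()  { echo "not ok - $1"; }
report_xfail() { echo "ok - $1 # XFAIL"; }

# choose_failfunc: print the name of the reporter to use when a test fails.
# On slow (debug) machines every failure is downgraded to xfail.
choose_failfunc() {
	if [[ -n "${KSFT_MACHINE_SLOW:-}" ]]; then
		echo report_xfail
	else
		echo report_fail
	fi
}

# run_one NAME CMD...: run CMD and report the result under NAME.
run_one() {
	local name="$1"; shift
	local failfunc
	failfunc="$(choose_failfunc)"
	if "$@"; then
		echo "ok - ${name}"
	else
		"${failfunc}" "${name}"
	fi
}

unset KSFT_MACHINE_SLOW
run_one "example.pkt" false                        # reported as a failure
KSFT_MACHINE_SLOW=yes run_one "example.pkt" false  # reported as xfail
```

The test command still runs and its result is still printed in both cases, so the logs stay available while kernel warnings remain the signal that matters on debug builds.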
diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index ff989c325eef..e15c43b7359b 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -43,6 +43,7 @@ if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
 		"tcp_timestamping.*.pkt"
 		"tcp_user_timeout_user-timeout-probe.pkt"
 		"tcp_zerocopy_epoll_.*.pkt"
+		"tcp_tcp_info_tcp-info-*-limited.pkt"
 	)
 	readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"
 	[[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
The following tests are failing on debug kernels:

  tcp_tcp_info_tcp-info-rwnd-limited.pkt
  tcp_tcp_info_tcp-info-sndbuf-limited.pkt

with reports like:

    assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
    AssertionError: 18000

and:

    assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
    AssertionError: 362000

Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign
debug flakes as xfail") to cover them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: willemb@google.com
CC: matttbe@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/packetdrill/ksft_runner.sh | 1 +
 1 file changed, 1 insertion(+)
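
For readers following the ksft_runner.sh hunk, here is a minimal sketch of how the xfail list becomes a single regular expression and how script names are tested against it. The `xfail_list` entries and the `xfail_regex` construction are copied from the hunk; the `matches` helper, the sample script name `tcp_zerocopy_epoll_edge.pkt`, and `alt_regex` are assumptions added for illustration. The entries are used as extended regular expressions by `[[ ... =~ ... ]]`, not as shell globs, so `-*` means "zero or more hyphens". It may be worth checking that the new entry matches the intended names; the sketch compares it with a `.*` spelling.

```shell
#!/bin/bash
# Sketch only. xfail_list and xfail_regex mirror the lines in the hunk;
# matches(), alt_regex and the first sample name are hypothetical.

xfail_list=(
	"tcp_timestamping.*.pkt"
	"tcp_user_timeout_user-timeout-probe.pkt"
	"tcp_zerocopy_epoll_.*.pkt"
	"tcp_tcp_info_tcp-info-*-limited.pkt"
)
# printf repeats '%s|' for each entry, so the group ends with an empty branch.
xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"

# Assumption: the new entry spelled with '.*' instead of '*'.
alt_regex='^(tcp_tcp_info_tcp-info-.*-limited.pkt)$'

# matches REGEX NAME: succeed if NAME matches REGEX as an extended regex.
matches() {
	[[ "$2" =~ $1 ]]
}

for script in \
	"tcp_zerocopy_epoll_edge.pkt" \
	"tcp_tcp_info_tcp-info-rwnd-limited.pkt" \
	"tcp_tcp_info_tcp-info-sndbuf-limited.pkt"; do
	if matches "$xfail_regex" "$script"; then list=yes; else list=no; fi
	if matches "$alt_regex" "$script"; then alt=yes; else alt=no; fi
	echo "$script: xfail_list=$list dot-star=$alt"
done
```

With bash on a glibc system, the two tcp-info names are expected to match only the `.*` spelling, since nothing in the `-*` form can consume `rwnd` or `sndbuf`.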