Message ID | 20241217185203.297935-1-sohamch.kernel@gmail.com |
---|---|
Series | selftests/net: packetdrill: import multiple tests |
On Tue, 17 Dec 2024 18:51:57 +0000 Soham Chakradeo wrote:
> Import tests for the following features (folder names in brackets):
> ECN (ecn) : RFC 3168
> Close (close) : RFC 9293
> TCP_INFO (tcp_info) : RFC 9293
> Fast recovery (fast_recovery) : RFC 5681
> Timestamping (timestamping) : RFC 1323
> Nagle (nagle) : RFC 896
> Selective Acknowledgments (sack) : RFC 2018
> Recent Timestamp (ts_recent) : RFC 1323
> Send file (sendfile)
> Syscall bad arg (syscall_bad_arg)
> Validate (validate)
> Blocking (blocking)
> Splice (splice)
> End of record (eor)
> Limited transmit (limited_transmit)

Excellent, thanks for adding all these! I will merge the patches momentarily, but I do see a number of flakes on our VMs with debug configs enabled:
https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=packetdrill-dbg

In the 7 runs so far, we got 2 flakes on:

tcp-timestamping-client-only-last-byte-pkt
tcp-fast-recovery-prr-ss-ack-below-snd-una-cubic-pkt
tcp-timestamping-server-pkt

1 flake on:

tcp-timestamping-partial-pkt
tcp-eor-no-coalesce-retrans-pkt

LMK if you can't find the outputs; they should be there within a couple of clicks.

I'll set these cases to be ignored for now, but it would be great if we could find a way to make them less time-sensitive, perhaps?
Hello:

This series was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>:

On Tue, 17 Dec 2024 18:51:57 +0000 you wrote:
> From: Soham Chakradeo <sohamch@google.com>
>
> Import tests for the following features (folder names in brackets):
> [...]

Here is the summary with links:
- [net-next,1/4] selftests/net: packetdrill: import tcp/ecn, tcp/close, tcp/sack, tcp/tcp_info
  (no matching commit)
- [net-next,2/4] selftests/net: packetdrill: import tcp/fast_recovery, tcp/nagle, tcp/timestamping
  https://git.kernel.org/netdev/net-next/c/eab35989cc37
- [net-next,3/4] selftests/net: packetdrill: import tcp/eor, tcp/splice, tcp/ts_recent, tcp/blocking
  https://git.kernel.org/netdev/net-next/c/6f6692053939
- [net-next,4/4] selftests/net: packetdrill: import tcp/user_timeout, tcp/validate, tcp/sendfile, tcp/limited-transmit, tcp/syscall_bad_arg
  (no matching commit)

You are awesome, thank you!
Jakub Kicinski wrote:
> On Tue, 17 Dec 2024 18:51:57 +0000 Soham Chakradeo wrote:
> > Import tests for the following features (folder names in brackets):
> > [...]
>
> Excellent, thanks for adding all these! I will merge the patches
> momentarily but I do see a number of flakes on our VMs with debug
> configs enabled:
> https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=packetdrill-dbg
>
> [...]
>
> I'll set these cases to be ignored for now, but would be great if we
> could find the way for them to be less time sensitive, perhaps?

Yes, let's get a bit more data on how flaky they are and investigate. Hopefully it's just tuning. Worst case, we could (temporarily) back out these few tests to avoid polluting the dashboard.

We did not observe this on our end during repeated debug runs.
On 12/18/24 19:00, Jakub Kicinski wrote:
> On Tue, 17 Dec 2024 18:51:57 +0000 Soham Chakradeo wrote:
>> Import tests for the following features (folder names in brackets):
>> [...]
>
> Excellent, thanks for adding all these! I will merge the patches
> momentarily but I do see a number of flakes on our VMs with debug
> configs enabled:
> https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=packetdrill-dbg
>
> In the 7 runs so far we got 2 flakes on:
>
> tcp-timestamping-client-only-last-byte-pkt

Quickly skimming over this one, it looks like it does not account for the increased default 'tolerance_us'. Kernel packetdrill sets it to 14K by default (instead of 10K, IIRC).

I guess this statement:

// SCM_TSTAMP_SCHED for the last byte should be received almost immediately
// once 10001 is acked at t=20ms.

and the follow-up check should be updated accordingly. In the failures observed so far, the max timestamp is > 35ms.

Cheers,

Paolo
Paolo Abeni wrote:
> On 12/18/24 19:00, Jakub Kicinski wrote:
> > [...]
> > In the 7 runs so far we got 2 flakes on:
> >
> > tcp-timestamping-client-only-last-byte-pkt
>
> Quickly skimming over this one, it looks like it does not account for
> the increased default 'tolerance_us'. Kernel packetdrill set it by
> default to 14K (instead of 10K IIRC).
>
> I guess this statement:
>
> // SCM_TSTAMP_SCHED for the last byte should be received almost immediately
> // once 10001 is acked at t=20ms.
>
> the the follow-up check should be updated accordingly. In the failures
> observed so far the max timestamp is > 35ms.

Thanks Paolo.

All three timestamping flakes are instances where the script expects the timestamp to be taken essentially instantaneously after the send call.

This is not the case, and the delay falls outside even the 14K (14 ms) tolerance: I see delays of 20K. At some point, perhaps, we cannot keep increasing the tolerance.
On Thu, 19 Dec 2024 14:31:44 -0500 Willem de Bruijn wrote:
> All three timestamping flakes are instances where the script expects
> the timestamp to be taken essentially instantaneously after the send
> call.
>
> This is not the case, and the delay is outside even the 14K tolerance.
> I see occurrences of 20K. At some point we cannot keep increasing the
> tolerance, perhaps.

I pinned the other services away and gave the packetdrill tester its own cores. Let's see how much of a difference this makes. The net-next-2024-12-20--03-00 branch will be the first to have this.