Message ID | cover.1673666803.git.lucien.xin@gmail.com (mailing list archive) |
---|---|
Series | net: support ipv4 big tcp |
On 1/13/23 8:31 PM, Xin Long wrote:
> This is similar to the BIG TCP patchset added by Eric for IPv6:
>
>   https://lwn.net/Articles/895398/
>
> Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header
> doesn't have exthdrs(options) for the BIG TCP packets' length. To make
> it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to
> indicate this might be a BIG TCP packet and use skb->len as the real
> IPv4 total length.
>
> This will work safely, as all BIG TCP packets are GSO/GRO packets and
> processed on the same host as they were created; There is no padding
> in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4
> packet total length; Also, before implementing the feature, all those
> places that may get iph tot_len from BIG TCP packets are taken care
> with some new APIs:
>
> Patch 1 adds some APIs for iph tot_len setting and getting, which are
> used in all these places where IPv4 BIG TCP packets may reach in Patch
> 2-7, and Patch 8 implements this feature and Patch 10 adds a selftest
> for it. Patch 9 is a fix in netfilter length_mt6 so that the selftest
> can also cover IPv6 BIG TCP.
>
> Note that the similar change as in Patch 2-7 are also needed for IPv6
> BIG TCP packets, and will be addressed in another patchset.
>
> The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC
> and 1.5K MTU:
>
> No BIG TCP:
> for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> 168 322 337 3776.49
> 143 236 277 4654.67
> 128 258 288 4772.83
> 171 229 278 4645.77
> 175 228 243 4678.93
> 149 239 279 4599.86
> 164 234 268 4606.94
> 155 276 289 4235.82
> 180 255 268 4418.95
> 168 241 249 4417.82
>
> Enable BIG TCP:
> ip link set dev ens1f0np0 gro_max_size 128000 gso_max_size 128000
> for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> 161 241 252 4821.73
> 174 205 217 5098.28
> 167 208 220 5001.43
> 164 228 249 4883.98
> 150 233 249 4914.90
> 180 233 244 4819.66
> 154 208 219 5004.92
> 157 209 247 4999.78
> 160 218 246 4842.31
> 174 206 217 5080.99
>
> Xin Long (10):
>   net: add a couple of helpers for iph tot_len
>   bridge: use skb_ip_totlen in br netfilter
>   openvswitch: use skb_ip_totlen in conntrack
>   net: sched: use skb_ip_totlen and iph_totlen
>   netfilter: use skb_ip_totlen and iph_totlen
>   cipso_ipv4: use iph_set_totlen in skbuff_setattr
>   ipvlan: use skb_ip_totlen in ipvlan_get_L3_hdr
>   net: add support for ipv4 big tcp
>   netfilter: get ipv6 pktlen properly in length_mt6
>   selftests: add a selftest for big tcp
>
>  drivers/net/ipvlan/ipvlan_core.c           |   2 +-
>  include/linux/ip.h                         |  20 +++
>  include/linux/ipv6.h                       |   9 ++
>  include/net/netfilter/nf_tables_ipv4.h     |   4 +-
>  include/net/route.h                        |   3 -
>  net/bridge/br_netfilter_hooks.c            |   2 +-
>  net/bridge/netfilter/nf_conntrack_bridge.c |   4 +-
>  net/core/gro.c                             |   6 +-
>  net/core/sock.c                            |  11 +-
>  net/ipv4/af_inet.c                         |   7 +-
>  net/ipv4/cipso_ipv4.c                      |   2 +-
>  net/ipv4/ip_input.c                        |   2 +-
>  net/ipv4/ip_output.c                       |   2 +-
>  net/netfilter/ipvs/ip_vs_xmit.c            |   2 +-
>  net/netfilter/nf_log_syslog.c              |   2 +-
>  net/netfilter/xt_length.c                  |   5 +-
>  net/openvswitch/conntrack.c                |   2 +-
>  net/sched/act_ct.c                         |   2 +-
>  net/sched/sch_cake.c                       |   2 +-
>  tools/testing/selftests/net/Makefile       |   1 +
>  tools/testing/selftests/net/big_tcp.sh     | 157 +++++++++++++++++++++
>  21 files changed, 212 insertions(+), 35 deletions(-)
>  create mode 100755 tools/testing/selftests/net/big_tcp.sh
>

Thanks for working on this and writing the selftests.

A couple of years ago I was experimenting with a simpler version of this
change (only changed what I needed to run tests). tcpdump (as an example
of packet socket app) was confused about the packet length and reported
truncated packet errors. As I recall that is the only really tricky part
to getting large packets for IPv4.
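
[Editor's illustration, not part of the thread.] The length convention described in the cover letter (store 0 in tot_len when the real length does not fit in 16 bits, and recover the length from the buffer on read) can be sketched in userspace C as below. This is a minimal sketch under stated assumptions, not the code from the patches: every `sketch_` name is hypothetical, the structs are simplified stand-ins for `struct iphdr` and `struct sk_buff`, and byte-order conversion is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel objects discussed in the cover letter. */
#define SKETCH_IP_MAX_LEN 0xFFFFu /* largest value a 16-bit tot_len can hold */

struct sketch_iphdr {
    uint16_t tot_len; /* host byte order here; the real field is big-endian */
};

struct sketch_skb {
    size_t len;            /* total bytes in the buffer (analogue of skb->len) */
    size_t network_offset; /* bytes that precede the IPv4 header */
    int is_gso_tcp;        /* nonzero for a GSO/GRO TCP packet */
    struct sketch_iphdr iph;
};

/* Store the length, or 0 when it does not fit in the 16-bit field. */
static void sketch_set_totlen(struct sketch_iphdr *iph, size_t len)
{
    iph->tot_len = (len <= SKETCH_IP_MAX_LEN) ? (uint16_t)len : 0;
}

/*
 * Read the length back. A zero tot_len on a GSO TCP packet is taken to mean
 * "use the buffer length", i.e. len - network_offset; otherwise the header
 * field is returned unchanged.
 */
static size_t sketch_totlen(const struct sketch_skb *skb)
{
    if (skb->iph.tot_len != 0 || !skb->is_gso_tcp)
        return skb->iph.tot_len;
    return skb->len - skb->network_offset;
}
```

With a 128014-byte buffer and a 14-byte link header, the setter stores 0 and the getter returns 128000. A reader that looks only at the header field sees 0, which is why a packet-socket application can misjudge the length.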
On Sat, Jan 14, 2023 at 4:31 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> This is similar to the BIG TCP patchset added by Eric for IPv6:
>
>   https://lwn.net/Articles/895398/
>
> Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header
> doesn't have exthdrs(options) for the BIG TCP packets' length. To make
> it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to
> indicate this might be a BIG TCP packet and use skb->len as the real
> IPv4 total length.
>
> This will work safely, as all BIG TCP packets are GSO/GRO packets and
> processed on the same host as they were created; There is no padding
> in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4
> packet total length; Also, before implementing the feature, all those
> places that may get iph tot_len from BIG TCP packets are taken care
> with some new APIs:
>
> Patch 1 adds some APIs for iph tot_len setting and getting, which are
> used in all these places where IPv4 BIG TCP packets may reach in Patch
> 2-7, and Patch 8 implements this feature and Patch 10 adds a selftest
> for it. Patch 9 is a fix in netfilter length_mt6 so that the selftest
> can also cover IPv6 BIG TCP.
>
> Note that the similar change as in Patch 2-7 are also needed for IPv6
> BIG TCP packets, and will be addressed in another patchset.
>
> The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC
> and 1.5K MTU:
>

This is broken, sorry.

There are reasons BIG TCP was implemented for IPv6 only, it seems you
missed a lot of them.

Networking needs observability and diagnostic tools.

Until you come back with a proper way for tcpdump to not mess things,
there is no way I can ACK these changes.
On Sun, Jan 15, 2023 at 11:05 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Jan 14, 2023 at 4:31 AM Xin Long <lucien.xin@gmail.com> wrote:
> >
> > This is similar to the BIG TCP patchset added by Eric for IPv6:
> >
> >   https://lwn.net/Articles/895398/
> >
> > Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header
> > doesn't have exthdrs(options) for the BIG TCP packets' length. To make
> > it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to
> > indicate this might be a BIG TCP packet and use skb->len as the real
> > IPv4 total length.
> >
> > This will work safely, as all BIG TCP packets are GSO/GRO packets and
> > processed on the same host as they were created; There is no padding
> > in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4
> > packet total length; Also, before implementing the feature, all those
> > places that may get iph tot_len from BIG TCP packets are taken care
> > with some new APIs:
> >
> > Patch 1 adds some APIs for iph tot_len setting and getting, which are
> > used in all these places where IPv4 BIG TCP packets may reach in Patch
> > 2-7, and Patch 8 implements this feature and Patch 10 adds a selftest
> > for it. Patch 9 is a fix in netfilter length_mt6 so that the selftest
> > can also cover IPv6 BIG TCP.
> >
> > Note that the similar change as in Patch 2-7 are also needed for IPv6
> > BIG TCP packets, and will be addressed in another patchset.
> >
> > The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC
> > and 1.5K MTU:
> >
>
> This is broken, sorry.
>
> There are reasons BIG TCP was implemented for IPv6 only, it seems you
> missed a lot of them.
>
> Networking needs observability and diagnostic tools.
>
> Until you come back with a proper way for tcpdump to not mess things,
> there is no way I can ACK these changes.

For the installed tcpdump, it's just parsing iph->tot_len from the raw pkt,
and I'm not sure how to make it without any change in tcpdump.

But,

  https://github.com/the-tcpdump-group/tcpdump/commit/c8623960f050cb81c12b31107070021f27f14b18

As this is already in tcpdump, we can build tcpdump with "-DGUESS_TSO":

  # ./configure CFLAGS=-DGUESS_TSO

It seems someone had met such problems even without IPv4 BIG TCP, not sure
in Linux or other OS. Now it's time to enable this CFLAG. :-)

Thanks.
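
[Editor's illustration, not part of the thread.] The kind of fallback a capture tool could apply when it meets a zero tot_len can be sketched as follows. This is a hedged sketch, not tcpdump's source: the function name `sketch_dissect_ip_len` and the runtime `guess_tso` flag are hypothetical, with the flag standing in for a compile-time switch such as the `GUESS_TSO` define mentioned above. It assumes `pkt` points at the first byte of the IPv4 header and `caplen` counts the captured bytes from that point.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the IPv4 datagram length a dissector should use, or 0 when the
 * header cannot be trusted.
 *
 * pkt       first byte of the IPv4 header
 * caplen    bytes captured from pkt onward
 * guess_tso nonzero to treat a zero total length as "use the captured length"
 */
static size_t sketch_dissect_ip_len(const uint8_t *pkt, size_t caplen,
                                    int guess_tso)
{
    if (caplen < 20)
        return 0; /* too short to hold a minimal IPv4 header */

    size_t hlen = (size_t)(pkt[0] & 0x0F) * 4;   /* IHL field, in bytes */
    size_t len = ((size_t)pkt[2] << 8) | pkt[3]; /* big-endian total length */

    if (hlen < 20)
        return 0; /* invalid header length */

    if (len < hlen) {
        /* A zero length may be an oversized segment seen on the local host. */
        if (len == 0 && guess_tso)
            return caplen;
        return 0; /* otherwise report a bad length */
    }
    return len;
}
```

With the flag off, a zero total length is rejected as a bad header, which corresponds to the confusion reported earlier in the thread; with it on, the dissector falls back to the captured length.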