Message ID | 20240521134220.12510-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 378979e94e953c2070acb4f0e0c98d29260bd09d |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: remove 64 KByte limit for initial tp->rcv_wnd value | expand |
On Tue, May 21, 2024 at 3:42 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > Recently, we had some servers upgraded to the latest kernel and noticed > the indicator from the user side showed worse results than before. It is > caused by the limitation of tp->rcv_wnd. > > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most > CDN teams would not benefit from this change because they cannot have a > large window to receive a big packet, which will be slowed down especially > in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's > the side effect for the latency/time-sensitive users. > > To avoid future confusion, current change doesn't affect the initial > receive window on the wire in a SYN or SYN+ACK packet which are set within > 65535 bytes according to RFC 7323 also due to the limit in > __tcp_transmit_skb(): > > th->window = htons(min(tp->rcv_wnd, 65535U)); > > In one word, __tcp_transmit_skb() already ensures that constraint is > respected, no matter how large tp->rcv_wnd is. The change doesn't violate > RFC. > > Let me provide one example if with or without the patch: > Before: > client --- SYN: rwindow=65535 ---> server > client <--- SYN+ACK: rwindow=65535 ---- server > client --- ACK: rwindow=65536 ---> server > Note: for the last ACK, the calculation is 512 << 7. > > After: > client --- SYN: rwindow=65535 ---> server > client <--- SYN+ACK: rwindow=65535 ---- server > client --- ACK: rwindow=175232 ---> server > Note: I use the following command to make it work: > ip route change default via [ip] dev eth0 metric 100 initrwnd 120 > For the last ACK, the calculation is 1369 << 7. > > When we apply such a patch, having a large rcv_wnd if the user tweak this > knob can help transfer data more rapidly and save some rtts. > > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- Reviewed-by: Eric Dumazet <edumazet@google.com>
On Wed, May 22, 2024 at 2:52 AM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, May 21, 2024 at 3:42 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > Recently, we had some servers upgraded to the latest kernel and noticed > > the indicator from the user side showed worse results than before. It is > > caused by the limitation of tp->rcv_wnd. > > > > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin > > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most > > CDN teams would not benefit from this change because they cannot have a > > large window to receive a big packet, which will be slowed down especially > > in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's > > the side effect for the latency/time-sensitive users. > > > > To avoid future confusion, current change doesn't affect the initial > > receive window on the wire in a SYN or SYN+ACK packet which are set within > > 65535 bytes according to RFC 7323 also due to the limit in > > __tcp_transmit_skb(): > > > > th->window = htons(min(tp->rcv_wnd, 65535U)); > > > > In one word, __tcp_transmit_skb() already ensures that constraint is > > respected, no matter how large tp->rcv_wnd is. The change doesn't violate > > RFC. > > > > Let me provide one example if with or without the patch: > > Before: > > client --- SYN: rwindow=65535 ---> server > > client <--- SYN+ACK: rwindow=65535 ---- server > > client --- ACK: rwindow=65536 ---> server > > Note: for the last ACK, the calculation is 512 << 7. > > > > After: > > client --- SYN: rwindow=65535 ---> server > > client <--- SYN+ACK: rwindow=65535 ---- server > > client --- ACK: rwindow=175232 ---> server > > Note: I use the following command to make it work: > > ip route change default via [ip] dev eth0 metric 100 initrwnd 120 > > For the last ACK, the calculation is 1369 << 7. > > > > When we apply such a patch, having a large rcv_wnd if the user tweak this > > knob can help transfer data more rapidly and save some rtts. > > > > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Jason, thanks for the patch! neal
Hello: This patch was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>: On Tue, 21 May 2024 21:42:20 +0800 you wrote: > From: Jason Xing <kernelxing@tencent.com> > > Recently, we had some servers upgraded to the latest kernel and noticed > the indicator from the user side showed worse results than before. It is > caused by the limitation of tp->rcv_wnd. > > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most > CDN teams would not benefit from this change because they cannot have a > large window to receive a big packet, which will be slowed down especially > in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's > the side effect for the latency/time-sensitive users. > > [...] Here is the summary with links: - [net] tcp: remove 64 KByte limit for initial tp->rcv_wnd value https://git.kernel.org/netdev/net/c/378979e94e95 You are awesome, thank you!
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 95caf8aaa8be..95618d0e78e4 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss, if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows)) (*rcv_wnd) = min(space, MAX_TCP_WINDOW); else - (*rcv_wnd) = min_t(u32, space, U16_MAX); + (*rcv_wnd) = space; if (init_rcv_wnd) *rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);