Message ID | 20201208162131.313635-1-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: select sane initial rcvq_space.space for big MSS | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 2 this patch: 2 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 15 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 2 this patch: 2 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Tue, Dec 8, 2020 at 11:21 AM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > From: Eric Dumazet <edumazet@google.com> > > Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS. > > This is no longer the case, and Hazem Mohamed Abuelfotoh reported > that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames. > > Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit > of rcvq_space.space computation, while it can select later a smaller > value for tp->rcv_ssthresh and tp->window_clamp. > > ss -temoi on receiver would show : > > skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596 > > This means that TCP can not increase its window in tcp_grow_window(), > and that DRS can never kick. > > Fix this by making sure that rcvq_space.space is not bigger than number of bytes > that can be held in TCP receive queue. > > People unable/unwilling to change their kernel can work around this issue by > selecting a bigger tcp_rmem[1] value as in : > > echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem > > Based on an initial report and patch from Hazem Mohamed Abuelfotoh > https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/ > > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner") > Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Nice catch and fix! Thanks Eric! > --- > net/ipv4/tcp_input.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 389d1b34024854a9bdcbe861d4820d1bfb495e24..ef4bdb038a4bbbd949868a01dc855bba0e90b9ca 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -510,7 +510,6 @@ static void tcp_init_buffer_space(struct sock *sk) > if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK)) > tcp_sndbuf_expand(sk); > > - tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss); > tcp_mstamp_refresh(tp); > tp->rcvq_space.time = tp->tcp_mstamp; > tp->rcvq_space.seq = tp->copied_seq; > @@ -534,6 +533,8 @@ static void tcp_init_buffer_space(struct sock *sk) > > tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp); > tp->snd_cwnd_stamp = tcp_jiffies32; > + tp->rcvq_space.space = min3(tp->rcv_ssthresh, tp->rcv_wnd, > + (u32)TCP_INIT_CWND * tp->advmss); > } > > /* 4. Recalculate window clamp after socket hit its memory bounds. */ > -- > 2.29.2.576.ga3fc446d84-goog >
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 8 Dec 2020 08:21:31 -0800 > From: Eric Dumazet <edumazet@google.com> > > Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS. > > This is no longer the case, and Hazem Mohamed Abuelfotoh reported > that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames. > > Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit > of rcvq_space.space computation, while it can select later a smaller > value for tp->rcv_ssthresh and tp->window_clamp. > > ss -temoi on receiver would show : > > skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596 > > This means that TCP can not increase its window in tcp_grow_window(), > and that DRS can never kick. > > Fix this by making sure that rcvq_space.space is not bigger than number of bytes > that can be held in TCP receive queue. > > People unable/unwilling to change their kernel can work around this issue by > selecting a bigger tcp_rmem[1] value as in : > > echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem > > Based on an initial report and patch from Hazem Mohamed Abuelfotoh > https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/ > > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner") > Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> Applied, thanks Eric.
Hi Eric, I don't see the patch in the stable queue. Can we add it to stable so we can cherry pick it in Amazon Linux kernel? Thank you. Hazem On 09/12/2020, 00:29, "David Miller" <davem@davemloft.net> wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 8 Dec 2020 08:21:31 -0800 > From: Eric Dumazet <edumazet@google.com> > > Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS. > > This is no longer the case, and Hazem Mohamed Abuelfotoh reported > that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames. > > Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit > of rcvq_space.space computation, while it can select later a smaller > value for tp->rcv_ssthresh and tp->window_clamp. > > ss -temoi on receiver would show : > > skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596 > > This means that TCP can not increase its window in tcp_grow_window(), > and that DRS can never kick. > > Fix this by making sure that rcvq_space.space is not bigger than number of bytes > that can be held in TCP receive queue. > > People unable/unwilling to change their kernel can work around this issue by > selecting a bigger tcp_rmem[1] value as in : > > echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem > > Based on an initial report and patch from Hazem Mohamed Abuelfotoh > https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/ > > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner") > Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> Applied, thanks Eric. Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284 Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
On Thu, Dec 10, 2020 at 1:49 PM Mohamed Abuelfotoh, Hazem <abuehaze@amazon.com> wrote: > > Hi Eric, > > I don't see the patch in the stable queue. Can we add it to stable so we can cherry pick it in Amazon Linux kernel? > No need for stable tags , as documented in https://elixir.bootlin.com/linux/v5.9/source/Documentation/networking/netdev-FAQ.rst#L147 > Thank you. > > Hazem > > On 09/12/2020, 00:29, "David Miller" <davem@davemloft.net> wrote: > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 8 Dec 2020 08:21:31 -0800 > > > From: Eric Dumazet <edumazet@google.com> > > > > Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > > small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS. > > > > This is no longer the case, and Hazem Mohamed Abuelfotoh reported > > that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames. > > > > Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit > > of rcvq_space.space computation, while it can select later a smaller > > value for tp->rcv_ssthresh and tp->window_clamp. > > > > ss -temoi on receiver would show : > > > > skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596 > > > > This means that TCP can not increase its window in tcp_grow_window(), > > and that DRS can never kick. > > > > Fix this by making sure that rcvq_space.space is not bigger than number of bytes > > that can be held in TCP receive queue. > > > > People unable/unwilling to change their kernel can work around this issue by > > selecting a bigger tcp_rmem[1] value as in : > > > > echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem > > > > Based on an initial report and patch from Hazem Mohamed Abuelfotoh > > https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/ > > > > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") > > Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner") > > Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Applied, thanks Eric. > > > > > Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284 > > Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705 > >
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 389d1b34024854a9bdcbe861d4820d1bfb495e24..ef4bdb038a4bbbd949868a01dc855bba0e90b9ca 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -510,7 +510,6 @@ static void tcp_init_buffer_space(struct sock *sk) if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK)) tcp_sndbuf_expand(sk); - tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss); tcp_mstamp_refresh(tp); tp->rcvq_space.time = tp->tcp_mstamp; tp->rcvq_space.seq = tp->copied_seq; @@ -534,6 +533,8 @@ static void tcp_init_buffer_space(struct sock *sk) tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp); tp->snd_cwnd_stamp = tcp_jiffies32; + tp->rcvq_space.space = min3(tp->rcv_ssthresh, tp->rcv_wnd, + (u32)TCP_INIT_CWND * tp->advmss); } /* 4. Recalculate window clamp after socket hit its memory bounds. */