From patchwork Tue Dec 8 16:21:31 2020
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 11959001
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S. Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh,
 Neal Cardwell, Yuchung Cheng, Hazem Mohamed Abuelfotoh
Subject: [PATCH net] tcp: select sane initial rcvq_space.space for big MSS
Date: Tue, 8 Dec 2020 08:21:31 -0800
Message-Id: <20201208162131.313635-1-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.29.2.576.ga3fc446d84-goog
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.

This is no longer the case, and Hazem Mohamed Abuelfotoh reported
that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.

Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
of rcvq_space.space computation, while it can select later a smaller
value for tp->rcv_ssthresh and tp->window_clamp.

ss -temoi on receiver would show :

skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596

This means that TCP can not increase its window in tcp_grow_window(),
and that DRS can never kick.

Fix this by making sure that rcvq_space.space is not bigger than number of bytes
that can be held in TCP receive queue.
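
For illustration only (not part of the patch), the effect of the change can
be sketched as a small userspace model. The helper names below are
hypothetical, uint32_t stands in for the kernel's u32, and the sample values
assume advmss = 8960 (MTU 9000 minus 40 bytes of IPv4 and TCP headers) and
rcv_wnd = 62496, which is consistent with the rcv_space shown by ss above.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's TCP_INIT_CWND (10 segments). */
#define TCP_INIT_CWND 10u

/* Hypothetical helper: smaller of two 32-bit values. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Model of the old computation: only rcv_wnd bounds the initial value. */
static uint32_t rcvq_space_before(uint32_t rcv_wnd, uint32_t advmss)
{
	return min_u32(rcv_wnd, TCP_INIT_CWND * advmss);
}

/* Model of the new computation: rcv_ssthresh is an additional upper bound. */
static uint32_t rcvq_space_after(uint32_t rcv_ssthresh, uint32_t rcv_wnd,
				 uint32_t advmss)
{
	return min_u32(rcv_ssthresh,
		       min_u32(rcv_wnd, TCP_INIT_CWND * advmss));
}

/* Assumed sample values, taken from the ss output in this message. */
static void rcvq_space_selfcheck(void)
{
	const uint32_t rcv_wnd = 62496;      /* assumption: equals rcv_space */
	const uint32_t rcv_ssthresh = 56596; /* from the ss output */
	const uint32_t advmss = 8960;        /* assumption: MTU 9000 - 40 */

	/* Old code: initial space ends up above rcv_ssthresh. */
	assert(rcvq_space_before(rcv_wnd, advmss) == 62496);
	/* New code: initial space is capped at rcv_ssthresh. */
	assert(rcvq_space_after(rcv_ssthresh, rcv_wnd, advmss) == 56596);
}
```

Under these assumptions the old formula yields 62496, which is larger than
rcv_ssthresh (56596), while the new one yields 56596. With advmss = 1460 both
formulas give 14600, so regular-MTU receivers are unaffected in this model.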

People unable/unwilling to change their kernel can work around this issue by
selecting a bigger tcp_rmem[1] value as in :

echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem

Based on an initial report and patch from Hazem Mohamed Abuelfotoh
 https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/

Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
Reported-by: Hazem Mohamed Abuelfotoh
Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 389d1b34024854a9bdcbe861d4820d1bfb495e24..ef4bdb038a4bbbd949868a01dc855bba0e90b9ca 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -510,7 +510,6 @@ static void tcp_init_buffer_space(struct sock *sk)
 	if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK))
 		tcp_sndbuf_expand(sk);
 
-	tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss);
 	tcp_mstamp_refresh(tp);
 	tp->rcvq_space.time = tp->tcp_mstamp;
 	tp->rcvq_space.seq = tp->copied_seq;
@@ -534,6 +533,8 @@ static void tcp_init_buffer_space(struct sock *sk)
 
 	tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp);
 	tp->snd_cwnd_stamp = tcp_jiffies32;
+	tp->rcvq_space.space = min3(tp->rcv_ssthresh, tp->rcv_wnd,
+				    (u32)TCP_INIT_CWND * tp->advmss);
 }
 
 /* 4. Recalculate window clamp after socket hit its memory bounds. */