From patchwork Mon Sep 27 18:25:21 2021
Subject: [PATCH net-next 1/3] net: add new socket option SO_RESERVE_MEM
From: Wei Wang
To: "David S. Miller", netdev@vger.kernel.org, Jakub Kicinski
Cc: Shakeel Butt, Eric Dumazet
Date: Mon, 27 Sep 2021 11:25:21 -0700
Message-Id: <20210927182523.2704818-2-weiwan@google.com>
In-Reply-To: <20210927182523.2704818-1-weiwan@google.com>
References: <20210927182523.2704818-1-weiwan@google.com>

This socket option provides a mechanism for users to reserve a certain
amount of memory for a socket to use. When this option is set, the
kernel charges the user-specified amount of memory to memcg, as well as
to sk_forward_alloc. This memory is not reclaimable and stays available
in sk_forward_alloc for this socket, even under memory pressure.

With this socket option set, the networking stack spends fewer cycles
doing forward allocation and reclaim, which should lead to better
system performance, at the cost of a pre-allocated and unreclaimable
chunk of memory.

Note: This socket option is only available when memory cgroups are
enabled, and we require the reserved memory to be charged to the user's
memcg. We hope this prevents misbehaving users from abusing the feature
to reserve a large amount of memory on certain sockets and causing
unfairness to others.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
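A minimal userspace sketch of how the option is exercised (not part of
this patch). SO_RESERVE_MEM may not be in libc headers yet, so it is
defined by hand from the uapi value 73 added below; the 1 MB request is
an arbitrary example. The kernel rounds the request up to whole pages
(SK_MEM_QUANTUM), so getsockopt() can return slightly more than was
asked for: for example, asking for 100000 bytes with 4 KB pages
reserves 25 pages, i.e. 102400 bytes.

#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_RESERVE_MEM
#define SO_RESERVE_MEM 73	/* value from include/uapi/asm-generic/socket.h in this patch */
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int reserve = 1 << 20;	/* ask for 1 MB of reserved forward-alloc memory */
	socklen_t len = sizeof(reserve);

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* Fails with EOPNOTSUPP if the socket has no memcg attached, or
	 * with ENOMEM if the precharge would push TCP into memory pressure.
	 */
	if (setsockopt(fd, SOL_SOCKET, SO_RESERVE_MEM, &reserve, sizeof(reserve)) < 0) {
		perror("setsockopt(SO_RESERVE_MEM)");
		return 1;
	}

	/* Read back the page-rounded amount that was actually reserved. */
	getsockopt(fd, SOL_SOCKET, SO_RESERVE_MEM, &reserve, &len);
	printf("reserved %d bytes\n", reserve);
	return 0;
}

Setting the option again with a smaller value releases the difference,
rounded down to whole pages, back to the protocol memory accounting.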
Miller" , netdev@vger.kernel.org, Jakub Kicinski Cc: Shakeel Butt , Eric Dumazet Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, kernel charges the user specified amount of memory to memcg, as well as sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends less cycles doing forward alloc and reclaim, which should lead to better system performance, with the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure. Note: This socket option is only available when memory cgroup is enabled and we require this reserved memory to be charged to the user's memcg. We hope this could avoid mis-behaving users to abused this feature to reserve a large amount on certain sockets and cause unfairness for others. Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet --- include/net/sock.h | 43 ++++++++++++++++--- include/uapi/asm-generic/socket.h | 2 + net/core/sock.c | 69 +++++++++++++++++++++++++++++++ net/core/stream.c | 2 +- net/ipv4/af_inet.c | 2 +- 5 files changed, 111 insertions(+), 7 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 66a9a90f9558..b0df2d3843fd 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -412,6 +412,7 @@ struct sock { #define sk_rmem_alloc sk_backlog.rmem_alloc int sk_forward_alloc; + u32 sk_reserved_mem; #ifdef CONFIG_NET_RX_BUSY_POLL unsigned int sk_ll_usec; /* ===== mostly read cache line ===== */ @@ -1515,20 +1516,49 @@ sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size) skb_pfmemalloc(skb); } +static inline int sk_unused_reserved_mem(const struct sock *sk) +{ + int unused_mem; + + if (likely(!sk->sk_reserved_mem)) + return 0; + + unused_mem = sk->sk_reserved_mem - sk->sk_wmem_queued - + atomic_read(&sk->sk_rmem_alloc); + + return unused_mem > 0 ? unused_mem : 0; +} + static inline void sk_mem_reclaim(struct sock *sk) { + int reclaimable; + if (!sk_has_account(sk)) return; - if (sk->sk_forward_alloc >= SK_MEM_QUANTUM) - __sk_mem_reclaim(sk, sk->sk_forward_alloc); + + reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk); + + if (reclaimable >= SK_MEM_QUANTUM) + __sk_mem_reclaim(sk, reclaimable); +} + +static inline void sk_mem_reclaim_final(struct sock *sk) +{ + sk->sk_reserved_mem = 0; + sk_mem_reclaim(sk); } static inline void sk_mem_reclaim_partial(struct sock *sk) { + int reclaimable; + if (!sk_has_account(sk)) return; - if (sk->sk_forward_alloc > SK_MEM_QUANTUM) - __sk_mem_reclaim(sk, sk->sk_forward_alloc - 1); + + reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk); + + if (reclaimable > SK_MEM_QUANTUM) + __sk_mem_reclaim(sk, reclaimable - 1); } static inline void sk_mem_charge(struct sock *sk, int size) @@ -1540,9 +1570,12 @@ static inline void sk_mem_charge(struct sock *sk, int size) static inline void sk_mem_uncharge(struct sock *sk, int size) { + int reclaimable; + if (!sk_has_account(sk)) return; sk->sk_forward_alloc += size; + reclaimable = sk->sk_forward_alloc - sk_unused_reserved_mem(sk); /* Avoid a possible overflow. 
 	 * TCP send queues can make this happen, if sk_mem_reclaim()
@@ -1551,7 +1584,7 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 	 * If we reach 2 MBytes, reclaim 1 MBytes right now, there is
 	 * no need to hold that much forward allocation anyway.
 	 */
-	if (unlikely(sk->sk_forward_alloc >= 1 << 21))
+	if (unlikely(reclaimable >= 1 << 21))
 		__sk_mem_reclaim(sk, 1 << 20);
 }
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 1f0a2b4864e4..c77a1313b3b0 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -126,6 +126,8 @@
 
 #define SO_BUF_LOCK		72
 
+#define SO_RESERVE_MEM		73
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index 62627e868e03..a658c0173015 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -947,6 +947,53 @@ void sock_set_mark(struct sock *sk, u32 val)
 }
 EXPORT_SYMBOL(sock_set_mark);
 
+static void sock_release_reserved_memory(struct sock *sk, int bytes)
+{
+	/* Round down bytes to multiple of pages */
+	bytes &= ~(SK_MEM_QUANTUM - 1);
+
+	WARN_ON(bytes > sk->sk_reserved_mem);
+	sk->sk_reserved_mem -= bytes;
+	sk_mem_reclaim(sk);
+}
+
+static int sock_reserve_memory(struct sock *sk, int bytes)
+{
+	long allocated;
+	bool charged;
+	int pages;
+
+	if (!mem_cgroup_sockets_enabled || !sk->sk_memcg)
+		return -EOPNOTSUPP;
+
+	if (!bytes)
+		return 0;
+
+	pages = sk_mem_pages(bytes);
+
+	/* pre-charge to memcg */
+	charged = mem_cgroup_charge_skmem(sk->sk_memcg, pages,
+					  GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+	if (!charged)
+		return -ENOMEM;
+
+	/* pre-charge to forward_alloc */
+	allocated = sk_memory_allocated_add(sk, pages);
+	/* If the system goes into memory pressure with this
+	 * precharge, give up and return error.
+	 */
+	if (allocated > sk_prot_mem_limits(sk, 1)) {
+		sk_memory_allocated_sub(sk, pages);
+		mem_cgroup_uncharge_skmem(sk->sk_memcg, pages);
+		return -ENOMEM;
+	}
+	sk->sk_forward_alloc += pages << SK_MEM_QUANTUM_SHIFT;
+
+	sk->sk_reserved_mem += pages << SK_MEM_QUANTUM_SHIFT;
+
+	return 0;
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -1367,6 +1414,23 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 					  ~SOCK_BUF_LOCK_MASK);
 		break;
 
+	case SO_RESERVE_MEM:
+	{
+		int delta;
+
+		if (val < 0) {
+			ret = -EINVAL;
+			break;
+		}
+
+		delta = val - sk->sk_reserved_mem;
+		if (delta < 0)
+			sock_release_reserved_memory(sk, -delta);
+		else
+			ret = sock_reserve_memory(sk, delta);
+		break;
+	}
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1733,6 +1797,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_userlocks & SOCK_BUF_LOCK_MASK;
 		break;
 
+	case SO_RESERVE_MEM:
+		v.val = sk->sk_reserved_mem;
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
@@ -2045,6 +2113,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_dst_pending_confirm = 0;
 		newsk->sk_wmem_queued	= 0;
 		newsk->sk_forward_alloc = 0;
+		newsk->sk_reserved_mem	= 0;
 		atomic_set(&newsk->sk_drops, 0);
 		newsk->sk_send_head	= NULL;
 		newsk->sk_userlocks = sk->sk_userlocks & ~SOCK_BINDPORT_LOCK;
diff --git a/net/core/stream.c b/net/core/stream.c
index 4f1d4aa5fb38..e09ffd410685 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -202,7 +202,7 @@ void sk_stream_kill_queues(struct sock *sk)
 	WARN_ON(!skb_queue_empty(&sk->sk_write_queue));
 
 	/* Account for returned memory. */
-	sk_mem_reclaim(sk);
+	sk_mem_reclaim_final(sk);
 
 	WARN_ON(sk->sk_wmem_queued);
 	WARN_ON(sk->sk_forward_alloc);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1d816a5fd3eb..a06f6a30b0d4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -139,7 +139,7 @@ void inet_sock_destruct(struct sock *sk)
 	}
 	__skb_queue_purge(&sk->sk_error_queue);
 
-	sk_mem_reclaim(sk);
+	sk_mem_reclaim_final(sk);
 
 	if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
 		pr_err("Attempt to release TCP socket in state %d %p\n",

From patchwork Mon Sep 27 18:25:22 2021
Subject: [PATCH net-next 2/3] tcp: adjust sndbuf according to sk_reserved_mem
From: Wei Wang
To: "David S. Miller", netdev@vger.kernel.org, Jakub Kicinski
Cc: Shakeel Butt, Eric Dumazet
Date: Mon, 27 Sep 2021 11:25:22 -0700
Message-Id: <20210927182523.2704818-3-weiwan@google.com>
In-Reply-To: <20210927182523.2704818-1-weiwan@google.com>
References: <20210927182523.2704818-1-weiwan@google.com>

If the user has set the SO_RESERVE_MEM socket option, then in order to
fully utilize the reserved memory under memory pressure on the tx path,
we modify the logic in sk_stream_moderate_sndbuf() to lower-bound
sk_sndbuf with the available reserved memory, instead of
SOCK_MIN_SNDBUF, and adjust it again as new data is acked.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
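For illustration only (not part of the patch): a small standalone mock
of the moderation rule above, with made-up field values, showing how
the unused reservation rather than SOCK_MIN_SNDBUF becomes the
effective floor for sk_sndbuf under memory pressure. The
SOCK_MIN_SNDBUF value below is a placeholder; the real constant is
derived from TCP_SKB_MIN_TRUESIZE.

#include <stdio.h>

#define SOCK_MIN_SNDBUF 4608	/* placeholder for the real kernel constant */

static unsigned int min_u32(unsigned int a, unsigned int b) { return a < b ? a : b; }
static unsigned int max_u32(unsigned int a, unsigned int b) { return a > b ? a : b; }

/* Mirrors sk_stream_moderate_sndbuf() after this patch. */
static unsigned int moderate_sndbuf(unsigned int sndbuf, unsigned int wmem_queued,
				    unsigned int unused_reserved_mem)
{
	unsigned int val = min_u32(sndbuf, wmem_queued >> 1);

	/* New in this patch: reserved-but-unused memory sets the floor. */
	val = max_u32(val, unused_reserved_mem);
	return max_u32(val, SOCK_MIN_SNDBUF);
}

int main(void)
{
	/* 2 MB sndbuf, 8 KB queued: without a reservation the buffer is
	 * moderated down to SOCK_MIN_SNDBUF; with ~1 MB of unused
	 * reservation it keeps roughly that much send buffer.
	 */
	printf("no reservation: %u\n", moderate_sndbuf(1 << 21, 8192, 0));
	printf("1MB reserved:   %u\n", moderate_sndbuf(1 << 21, 8192, (1 << 20) - 8192));
	return 0;
}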
 include/net/sock.h   |  1 +
 net/ipv4/tcp_input.c | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b0df2d3843fd..e6ad628adcd2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2388,6 +2388,7 @@ static inline void sk_stream_moderate_sndbuf(struct sock *sk)
 		return;
 
 	val = min(sk->sk_sndbuf, sk->sk_wmem_queued >> 1);
+	val = max_t(u32, val, sk_unused_reserved_mem(sk));
 
 	WRITE_ONCE(sk->sk_sndbuf, max_t(u32, val, SOCK_MIN_SNDBUF));
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 141e85e6422b..a7611256f235 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5381,7 +5381,7 @@ static int tcp_prune_queue(struct sock *sk)
 	return -1;
 }
 
-static bool tcp_should_expand_sndbuf(const struct sock *sk)
+static bool tcp_should_expand_sndbuf(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 
@@ -5392,8 +5392,18 @@ static bool tcp_should_expand_sndbuf(const struct sock *sk)
 		return false;
 
 	/* If we are under global TCP memory pressure, do not expand. */
-	if (tcp_under_memory_pressure(sk))
+	if (tcp_under_memory_pressure(sk)) {
+		int unused_mem = sk_unused_reserved_mem(sk);
+
+		/* Adjust sndbuf according to reserved mem. But make sure
+		 * it never goes below SOCK_MIN_SNDBUF.
+		 * See sk_stream_moderate_sndbuf() for more details.
+		 */
+		if (unused_mem > SOCK_MIN_SNDBUF)
+			WRITE_ONCE(sk->sk_sndbuf, unused_mem);
+
 		return false;
+	}
 
 	/* If we are under soft global TCP memory pressure, do not expand. */
 	if (sk_memory_allocated(sk) >= sk_prot_mem_limits(sk, 0))

From patchwork Mon Sep 27 18:25:23 2021
Subject: [PATCH net-next 3/3] tcp: adjust rcv_ssthresh according to sk_reserved_mem
From: Wei Wang
To: "David S. Miller", netdev@vger.kernel.org, Jakub Kicinski
Cc: Shakeel Butt, Eric Dumazet
Date: Mon, 27 Sep 2021 11:25:23 -0700
Message-Id: <20210927182523.2704818-4-weiwan@google.com>
In-Reply-To: <20210927182523.2704818-1-weiwan@google.com>
References: <20210927182523.2704818-1-weiwan@google.com>

When the user has set the SO_RESERVE_MEM socket option, then in order
to utilize the reserved memory while under memory pressure, we adjust
rcv_ssthresh according to the reserved memory available to the socket,
instead of always clamping it to 4 * advmss.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
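Again for illustration only (not part of the patch): a mock of
tcp_adjust_rcv_ssthresh() with made-up numbers, assuming the default
tcp_adv_win_scale of 1, where tcp_win_from_space() returns roughly half
of the given space. Under pressure, rcv_ssthresh is still clamped to
4 * advmss, but a socket with unused reserved memory keeps a window
derived from that reservation instead.

#include <stdio.h>

typedef unsigned int u32;

static u32 min_u32(u32 a, u32 b) { return a < b ? a : b; }
static u32 max_u32(u32 a, u32 b) { return a > b ? a : b; }

/* Rough stand-in for tcp_win_from_space() with tcp_adv_win_scale = 1. */
static u32 win_from_space(u32 space)
{
	return space - (space >> 1);
}

/* Mirrors tcp_adjust_rcv_ssthresh() from this patch. */
static u32 adjust_rcv_ssthresh(u32 rcv_ssthresh, u32 advmss, u32 unused_reserved_mem)
{
	rcv_ssthresh = min_u32(rcv_ssthresh, 4U * advmss);
	if (unused_reserved_mem)
		rcv_ssthresh = max_u32(rcv_ssthresh,
				       win_from_space(unused_reserved_mem));
	return rcv_ssthresh;
}

int main(void)
{
	/* advmss 1460: without a reservation the threshold collapses to
	 * 5840 bytes; with 1 MB of unused reserved memory it stays at ~512 KB.
	 */
	printf("no reservation: %u\n", adjust_rcv_ssthresh(1 << 20, 1460, 0));
	printf("1MB reserved:   %u\n", adjust_rcv_ssthresh(1 << 20, 1460, 1 << 20));
	return 0;
}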
Miller" , netdev@vger.kernel.org, Jakub Kicinski Cc: Shakeel Butt , Eric Dumazet Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org When user sets SO_RESERVE_MEM socket option, in order to utilize the reserved memory when in memory pressure state, we adjust rcv_ssthresh according to the available reserved memory for the socket, instead of using 4 * advmss always. Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet --- include/net/tcp.h | 11 +++++++++++ net/ipv4/tcp_input.c | 12 ++++++++++-- net/ipv4/tcp_output.c | 3 +-- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 3166dc15d7d6..27743a97d6cb 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1418,6 +1418,17 @@ static inline int tcp_full_space(const struct sock *sk) return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf)); } +static inline void tcp_adjust_rcv_ssthresh(struct sock *sk) +{ + int unused_mem = sk_unused_reserved_mem(sk); + struct tcp_sock *tp = tcp_sk(sk); + + tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss); + if (unused_mem) + tp->rcv_ssthresh = max_t(u32, tp->rcv_ssthresh, + tcp_win_from_space(sk, unused_mem)); +} + void tcp_cleanup_rbuf(struct sock *sk, int copied); /* We provision sk_rcvbuf around 200% of sk_rcvlowat. diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a7611256f235..b79a571a752e 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -500,8 +500,11 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb, room = min_t(int, tp->window_clamp, tcp_space(sk)) - tp->rcv_ssthresh; + if (room <= 0) + return; + /* Check #1 */ - if (room > 0 && !tcp_under_memory_pressure(sk)) { + if (!tcp_under_memory_pressure(sk)) { unsigned int truesize = truesize_adjust(adjust, skb); int incr; @@ -518,6 +521,11 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb, tp->rcv_ssthresh += min(room, incr); inet_csk(sk)->icsk_ack.quick |= 1; } + } else { + /* Under pressure: + * Adjust rcv_ssthresh according to reserved mem + */ + tcp_adjust_rcv_ssthresh(sk); } } @@ -5346,7 +5354,7 @@ static int tcp_prune_queue(struct sock *sk) if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) tcp_clamp_window(sk); else if (tcp_under_memory_pressure(sk)) - tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss); + tcp_adjust_rcv_ssthresh(sk); if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf) return 0; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 6d72f3ea48c4..062d6cf13d06 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2969,8 +2969,7 @@ u32 __tcp_select_window(struct sock *sk) icsk->icsk_ack.quick = 0; if (tcp_under_memory_pressure(sk)) - tp->rcv_ssthresh = min(tp->rcv_ssthresh, - 4U * tp->advmss); + tcp_adjust_rcv_ssthresh(sk); /* free_space might become our new window, make sure we don't * increase it due to wscale.