diff mbox series

[net] tcp: Update window clamping condition

Message ID 20240808230640.1384785-1-quic_subashab@quicinc.com (mailing list archive)
State Accepted
Commit a2cbb1603943281a604f5adc48079a148db5cb0d
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: Update window clamping condition | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 29 this patch: 29
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 29 this patch: 29
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 30 this patch: 30
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-10--18-00 (tests: 707)

Commit Message

Subash Abhinov Kasiviswanathan (KS) Aug. 8, 2024, 11:06 p.m. UTC
This patch is based on the discussions between Neal Cardwell and
Eric Dumazet in the link
https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/

It was correctly pointed out that tp->window_clamp would not be
updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
(copied <= tp->rcvq_space.space). While it is expected for most
setups to leave the sysctl enabled, the latter condition may
not end up hitting depending on the TCP receive queue size and
the pattern of arriving data.

The updated check should be hit only on initial MSS update from
TCP_MIN_MSS to measured MSS value and subsequently if there was
an update to a larger value.

Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
---
 net/ipv4/tcp_input.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

Comments

Paolo Abeni Aug. 13, 2024, 9:26 a.m. UTC | #1
On 8/9/24 01:06, Subash Abhinov Kasiviswanathan wrote:
> This patch is based on the discussions between Neal Cardwell and
> Eric Dumazet in the link
> https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/
> 
> It was correctly pointed out that tp->window_clamp would not be
> updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
> (copied <= tp->rcvq_space.space). While it is expected for most
> setups to leave the sysctl enabled, the latter condition may
> not end up hitting depending on the TCP receive queue size and
> the pattern of arriving data.
> 
> The updated check should be hit only on initial MSS update from
> TCP_MIN_MSS to measured MSS value and subsequently if there was
> an update to a larger value.
> 
> Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF")
> Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
> ---
>   net/ipv4/tcp_input.c | 28 ++++++++++++----------------
>   1 file changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index e2b9583ed96a..e37488d3453f 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -238,9 +238,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
>   		 */
>   		if (unlikely(len != icsk->icsk_ack.rcv_mss)) {
>   			u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE;
> +			u8 old_ratio = tcp_sk(sk)->scaling_ratio;
>   
>   			do_div(val, skb->truesize);
>   			tcp_sk(sk)->scaling_ratio = val ? val : 1;
> +
> +			if (old_ratio != tcp_sk(sk)->scaling_ratio)

Should we do this only for sk->sk_userlocks & SOCK_RCVBUF_LOCK ?

I think that explicitly checking for an ratio increase would be safer: 
even if len increased I guess the ratio could decrease in some edge 
scenarios.

Thanks!

Paolo
Neal Cardwell Aug. 13, 2024, 3:23 p.m. UTC | #2
On Tue, Aug 13, 2024 at 5:27 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 8/9/24 01:06, Subash Abhinov Kasiviswanathan wrote:
> > This patch is based on the discussions between Neal Cardwell and
> > Eric Dumazet in the link
> > https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/
> >
> > It was correctly pointed out that tp->window_clamp would not be
> > updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
> > (copied <= tp->rcvq_space.space). While it is expected for most
> > setups to leave the sysctl enabled, the latter condition may
> > not end up hitting depending on the TCP receive queue size and
> > the pattern of arriving data.
> >
> > The updated check should be hit only on initial MSS update from
> > TCP_MIN_MSS to measured MSS value and subsequently if there was
> > an update to a larger value.
> >
> > Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF")
> > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
> > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
> > ---

Acked-by: Neal Cardwell <ncardwell@google.com>

> >   net/ipv4/tcp_input.c | 28 ++++++++++++----------------
> >   1 file changed, 12 insertions(+), 16 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index e2b9583ed96a..e37488d3453f 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -238,9 +238,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> >                */
> >               if (unlikely(len != icsk->icsk_ack.rcv_mss)) {
> >                       u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE;
> > +                     u8 old_ratio = tcp_sk(sk)->scaling_ratio;
> >
> >                       do_div(val, skb->truesize);
> >                       tcp_sk(sk)->scaling_ratio = val ? val : 1;
> > +
> > +                     if (old_ratio != tcp_sk(sk)->scaling_ratio)
>
> Should we do this only for sk->sk_userlocks & SOCK_RCVBUF_LOCK ?

No, I'm pretty sure all TCP sockets need to do this, regardless of
their (or sk->sk_userlocks & SOCK_RCVBUF_LOCK) status, because no
matter whether sk->sk_rcvbuf is autotuned or locked to a fixed value,
every socket ideally should have an up-to-date tp->scaling_ratio value
so that it can accurately translate the sk->sk_rcvbuf into a receive
window.

This is basically what I was arguing here:
  https://lore.kernel.org/netdev/CADVnQynKT7QEhm1WksrNQv3BbYhTd=wWaxueybPBQDPtXbJu-A@mail.gmail.com/

> I think that explicitly checking for an ratio increase would be safer:
> even if len increased I guess the ratio could decrease in some edge
> scenarios.

I agree that the ratio could decrease in some edge scenarios (e.g., if
traffic shifts to arriving via NIC with a less space-efficient buffer
allocation strategy). But, in such scenarios, wouldn't it be better to
have the window clamp to adjust to the correct value?

FWIW, this current version of the patch looks good to me. :-)

Thanks!

neal
patchwork-bot+netdevbpf@kernel.org Aug. 14, 2024, 10 a.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu,  8 Aug 2024 16:06:40 -0700 you wrote:
> This patch is based on the discussions between Neal Cardwell and
> Eric Dumazet in the link
> https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/
> 
> It was correctly pointed out that tp->window_clamp would not be
> updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
> (copied <= tp->rcvq_space.space). While it is expected for most
> setups to leave the sysctl enabled, the latter condition may
> not end up hitting depending on the TCP receive queue size and
> the pattern of arriving data.
> 
> [...]

Here is the summary with links:
  - [net] tcp: Update window clamping condition
    https://git.kernel.org/netdev/net/c/a2cbb1603943

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e2b9583ed96a..e37488d3453f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -238,9 +238,14 @@  static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 		 */
 		if (unlikely(len != icsk->icsk_ack.rcv_mss)) {
 			u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE;
+			u8 old_ratio = tcp_sk(sk)->scaling_ratio;
 
 			do_div(val, skb->truesize);
 			tcp_sk(sk)->scaling_ratio = val ? val : 1;
+
+			if (old_ratio != tcp_sk(sk)->scaling_ratio)
+				WRITE_ONCE(tcp_sk(sk)->window_clamp,
+					   tcp_win_from_space(sk, sk->sk_rcvbuf));
 		}
 		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
 					       tcp_sk(sk)->advmss);
@@ -754,7 +759,8 @@  void tcp_rcv_space_adjust(struct sock *sk)
 	 * <prev RTT . ><current RTT .. ><next RTT .... >
 	 */
 
-	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
+	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
+	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
 		u64 rcvwin, grow;
 		int rcvbuf;
 
@@ -770,22 +776,12 @@  void tcp_rcv_space_adjust(struct sock *sk)
 
 		rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
 			       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
-		if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
-			if (rcvbuf > sk->sk_rcvbuf) {
-				WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
-
-				/* Make the window clamp follow along.  */
-				WRITE_ONCE(tp->window_clamp,
-					   tcp_win_from_space(sk, rcvbuf));
-			}
-		} else {
-			/* Make the window clamp follow along while being bounded
-			 * by SO_RCVBUF.
-			 */
-			int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
+		if (rcvbuf > sk->sk_rcvbuf) {
+			WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
 
-			if (clamp > tp->window_clamp)
-				WRITE_ONCE(tp->window_clamp, clamp);
+			/* Make the window clamp follow along.  */
+			WRITE_ONCE(tp->window_clamp,
+				   tcp_win_from_space(sk, rcvbuf));
 		}
 	}
 	tp->rcvq_space.space = copied;