diff mbox series

[net] tcp: Adjust clamping window for applications specifying SO_RCVBUF

Message ID 20240725215542.894348-1-quic_subashab@quicinc.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: Adjust clamping window for applications specifying SO_RCVBUF | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 273 this patch: 273
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 2 maintainers not CCed: dsahern@kernel.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 281 this patch: 281
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 282 this patch: 282
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-26--15-00 (tests: 707)

Commit Message

Subash Abhinov Kasiviswanathan (KS) July 25, 2024, 9:55 p.m. UTC
tp->scaling_ratio is not updated based on skb->len/skb->truesize once
SO_RCVBUF is set leading to the maximum window scaling to be 25% of
rcvbuf after
commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
and 50% of rcvbuf after
commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
50% tries to emulate the behavior of older kernels using
sysctl_tcp_adv_win_scale with default value.

Systems which were using a different values of sysctl_tcp_adv_win_scale
in older kernels ended up seeing reduced download speeds in certain
cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
While the sysctl scheme is no longer acceptable, the value of 50% is
a bit conservative when the skb->len/skb->truesize ratio is later
determined to be ~0.66.

Applications not specifying SO_RCVBUF update the window scaling and
the receiver buffer every time data is copied to userspace. This
computation is now used for applications setting SO_RCVBUF to update
the maximum window scaling while ensuring that the receive buffer
is within the application specified limit.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
---
 net/ipv4/tcp_input.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

Comments

Eric Dumazet July 26, 2024, 9:40 a.m. UTC | #1
On Thu, Jul 25, 2024 at 11:55 PM Subash Abhinov Kasiviswanathan
<quic_subashab@quicinc.com> wrote:
>
> tp->scaling_ratio is not updated based on skb->len/skb->truesize once
> SO_RCVBUF is set leading to the maximum window scaling to be 25% of
> rcvbuf after
> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> and 50% of rcvbuf after
> commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio").
> 50% tries to emulate the behavior of older kernels using
> sysctl_tcp_adv_win_scale with default value.
>
> Systems which were using a different values of sysctl_tcp_adv_win_scale
> in older kernels ended up seeing reduced download speeds in certain
> cases as covered in https://lists.openwall.net/netdev/2024/05/15/13
> While the sysctl scheme is no longer acceptable, the value of 50% is
> a bit conservative when the skb->len/skb->truesize ratio is later
> determined to be ~0.66.
>
> Applications not specifying SO_RCVBUF update the window scaling and
> the receiver buffer every time data is copied to userspace. This
> computation is now used for applications setting SO_RCVBUF to update
> the maximum window scaling while ensuring that the receive buffer
> is within the application specified limit.
>
> Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
> ---
>  net/ipv4/tcp_input.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 454362e359da..c8fb029a15a4 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk)
>          * <prev RTT . ><current RTT .. ><next RTT .... >
>          */
>
> -       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
> -           !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> +       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
>                 u64 rcvwin, grow;
>                 int rcvbuf;
>
> @@ -771,12 +770,24 @@ void tcp_rcv_space_adjust(struct sock *sk)
>
>                 rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
>                                READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
> -               if (rcvbuf > sk->sk_rcvbuf) {
> -                       WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
> +               if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> +                       if (rcvbuf > sk->sk_rcvbuf) {
> +                               WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
>
> -                       /* Make the window clamp follow along.  */
> -                       WRITE_ONCE(tp->window_clamp,
> -                                  tcp_win_from_space(sk, rcvbuf));
> +                               /* Make the window clamp follow along.  */
> +                               WRITE_ONCE(tp->window_clamp,
> +                                          tcp_win_from_space(sk, rcvbuf));
> +                       }
> +               } else {
> +                       /* Make the window clamp follow along while being bounded
> +                        * by SO_RCVBUF.
> +                        */
> +                       if (rcvbuf <= sk->sk_rcvbuf) {

I do not really understand this part.
I am guessing this test will often be false and your problem won't be fixed.
You do not handle all  sysctl_tcp_adv_win_scale values (positive and negative)

I would instead not use "if (rcvbuf <= sk->sk_rcvbuf) {"

and instead :

else {
      int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));

      if (clamp > tp->window_clamp)
            WRITE_ONCE(tp->window_clamp, clamp);
}
Subash Abhinov Kasiviswanathan (KS) July 26, 2024, 8:43 p.m. UTC | #2
On 7/26/2024 3:40 AM, Eric Dumazet wrote:
> On Thu, Jul 25, 2024 at 11:55 PM Subash Abhinov Kasiviswanathan
> <quic_subashab@quicinc.com> wrote:
>>
>>           */
>>
>> -       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
>> -           !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
>> +       if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
>>                  u64 rcvwin, grow;
>>                  int rcvbuf;
>>
>> @@ -771,12 +770,24 @@ void tcp_rcv_space_adjust(struct sock *sk)
>>
>>                  rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
>>                                 READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
>> -               if (rcvbuf > sk->sk_rcvbuf) {
>> -                       WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
>> +               if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
>> +                       if (rcvbuf > sk->sk_rcvbuf) {
>> +                               WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
>>
>> -                       /* Make the window clamp follow along.  */
>> -                       WRITE_ONCE(tp->window_clamp,
>> -                                  tcp_win_from_space(sk, rcvbuf));
>> +                               /* Make the window clamp follow along.  */
>> +                               WRITE_ONCE(tp->window_clamp,
>> +                                          tcp_win_from_space(sk, rcvbuf));
>> +                       }
>> +               } else {
>> +                       /* Make the window clamp follow along while being bounded
>> +                        * by SO_RCVBUF.
>> +                        */
>> +                       if (rcvbuf <= sk->sk_rcvbuf) {
> 
> I do not really understand this part.
> I am guessing this test will often be false and your problem won't be fixed.
> You do not handle all  sysctl_tcp_adv_win_scale values (positive and negative)
> 
> I would instead not use "if (rcvbuf <= sk->sk_rcvbuf) {"
> 
> and instead :
> 
> else {
>        int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf));
> 
>        if (clamp > tp->window_clamp)
>              WRITE_ONCE(tp->window_clamp, clamp);
> }
Thanks Eric, I've updated this in v2.
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 454362e359da..c8fb029a15a4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -754,8 +754,7 @@  void tcp_rcv_space_adjust(struct sock *sk)
 	 * <prev RTT . ><current RTT .. ><next RTT .... >
 	 */
 
-	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) &&
-	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) {
 		u64 rcvwin, grow;
 		int rcvbuf;
 
@@ -771,12 +770,24 @@  void tcp_rcv_space_adjust(struct sock *sk)
 
 		rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin),
 			       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2]));
-		if (rcvbuf > sk->sk_rcvbuf) {
-			WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
+		if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
+			if (rcvbuf > sk->sk_rcvbuf) {
+				WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
 
-			/* Make the window clamp follow along.  */
-			WRITE_ONCE(tp->window_clamp,
-				   tcp_win_from_space(sk, rcvbuf));
+				/* Make the window clamp follow along.  */
+				WRITE_ONCE(tp->window_clamp,
+					   tcp_win_from_space(sk, rcvbuf));
+			}
+		} else {
+			/* Make the window clamp follow along while being bounded
+			 * by SO_RCVBUF.
+			 */
+			if (rcvbuf <= sk->sk_rcvbuf) {
+				int clamp = tcp_win_from_space(sk, rcvbuf);
+
+				if (clamp > tp->window_clamp)
+					WRITE_ONCE(tp->window_clamp, clamp);
+			}
 		}
 	}
 	tp->rcvq_space.space = copied;