Message ID | 20240808230640.1384785-1-quic_subashab@quicinc.com (mailing list archive) |
---|---|
State | Accepted |
Commit | a2cbb1603943281a604f5adc48079a148db5cb0d |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: Update window clamping condition | expand |
On 8/9/24 01:06, Subash Abhinov Kasiviswanathan wrote: > This patch is based on the discussions between Neal Cardwell and > Eric Dumazet in the link > https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/ > > It was correctly pointed out that tp->window_clamp would not be > updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if > (copied <= tp->rcvq_space.space). While it is expected for most > setups to leave the sysctl enabled, the latter condition may > not end up hitting depending on the TCP receive queue size and > the pattern of arriving data. > > The updated check should be hit only on initial MSS update from > TCP_MIN_MSS to measured MSS value and subsequently if there was > an update to a larger value. > > Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF") > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> > --- > net/ipv4/tcp_input.c | 28 ++++++++++++---------------- > 1 file changed, 12 insertions(+), 16 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index e2b9583ed96a..e37488d3453f 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -238,9 +238,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) > */ > if (unlikely(len != icsk->icsk_ack.rcv_mss)) { > u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE; > + u8 old_ratio = tcp_sk(sk)->scaling_ratio; > > do_div(val, skb->truesize); > tcp_sk(sk)->scaling_ratio = val ? val : 1; > + > + if (old_ratio != tcp_sk(sk)->scaling_ratio) Should we do this only for sk->sk_userlocks & SOCK_RCVBUF_LOCK ? I think that explicitly checking for an ratio increase would be safer: even if len increased I guess the ratio could decrease in some edge scenarios. Thanks! Paolo
On Tue, Aug 13, 2024 at 5:27 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 8/9/24 01:06, Subash Abhinov Kasiviswanathan wrote: > > This patch is based on the discussions between Neal Cardwell and > > Eric Dumazet in the link > > https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/ > > > > It was correctly pointed out that tp->window_clamp would not be > > updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if > > (copied <= tp->rcvq_space.space). While it is expected for most > > setups to leave the sysctl enabled, the latter condition may > > not end up hitting depending on the TCP receive queue size and > > the pattern of arriving data. > > > > The updated check should be hit only on initial MSS update from > > TCP_MIN_MSS to measured MSS value and subsequently if there was > > an update to a larger value. > > > > Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF") > > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> > > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> > > --- Acked-by: Neal Cardwell <ncardwell@google.com> > > net/ipv4/tcp_input.c | 28 ++++++++++++---------------- > > 1 file changed, 12 insertions(+), 16 deletions(-) > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index e2b9583ed96a..e37488d3453f 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -238,9 +238,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) > > */ > > if (unlikely(len != icsk->icsk_ack.rcv_mss)) { > > u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE; > > + u8 old_ratio = tcp_sk(sk)->scaling_ratio; > > > > do_div(val, skb->truesize); > > tcp_sk(sk)->scaling_ratio = val ? val : 1; > > + > > + if (old_ratio != tcp_sk(sk)->scaling_ratio) > > Should we do this only for sk->sk_userlocks & SOCK_RCVBUF_LOCK ? No, I'm pretty sure all TCP sockets need to do this, regardless of their (or sk->sk_userlocks & SOCK_RCVBUF_LOCK) status, because no matter whether sk->sk_rcvbuf is autotuned or locked to a fixed value, every socket ideally should have an up-to-date tp->scaling_ratio value so that it can accurately translate the sk->sk_rcvbuf into a receive window. This is basically what I was arguing here: https://lore.kernel.org/netdev/CADVnQynKT7QEhm1WksrNQv3BbYhTd=wWaxueybPBQDPtXbJu-A@mail.gmail.com/ > I think that explicitly checking for an ratio increase would be safer: > even if len increased I guess the ratio could decrease in some edge > scenarios. I agree that the ratio could decrease in some edge scenarios (e.g., if traffic shifts to arriving via NIC with a less space-efficient buffer allocation strategy). But, in such scenarios, wouldn't it be better to have the window clamp to adjust to the correct value? FWIW, this current version of the patch looks good to me. :-) Thanks! neal
Hello: This patch was applied to netdev/net.git (main) by David S. Miller <davem@davemloft.net>: On Thu, 8 Aug 2024 16:06:40 -0700 you wrote: > This patch is based on the discussions between Neal Cardwell and > Eric Dumazet in the link > https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/ > > It was correctly pointed out that tp->window_clamp would not be > updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if > (copied <= tp->rcvq_space.space). While it is expected for most > setups to leave the sysctl enabled, the latter condition may > not end up hitting depending on the TCP receive queue size and > the pattern of arriving data. > > [...] Here is the summary with links: - [net] tcp: Update window clamping condition https://git.kernel.org/netdev/net/c/a2cbb1603943 You are awesome, thank you!
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index e2b9583ed96a..e37488d3453f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -238,9 +238,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) */ if (unlikely(len != icsk->icsk_ack.rcv_mss)) { u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE; + u8 old_ratio = tcp_sk(sk)->scaling_ratio; do_div(val, skb->truesize); tcp_sk(sk)->scaling_ratio = val ? val : 1; + + if (old_ratio != tcp_sk(sk)->scaling_ratio) + WRITE_ONCE(tcp_sk(sk)->window_clamp, + tcp_win_from_space(sk, sk->sk_rcvbuf)); } icsk->icsk_ack.rcv_mss = min_t(unsigned int, len, tcp_sk(sk)->advmss); @@ -754,7 +759,8 @@ void tcp_rcv_space_adjust(struct sock *sk) * <prev RTT . ><current RTT .. ><next RTT .... > */ - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && + !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { u64 rcvwin, grow; int rcvbuf; @@ -770,22 +776,12 @@ void tcp_rcv_space_adjust(struct sock *sk) rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); - if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { - if (rcvbuf > sk->sk_rcvbuf) { - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); - - /* Make the window clamp follow along. */ - WRITE_ONCE(tp->window_clamp, - tcp_win_from_space(sk, rcvbuf)); - } - } else { - /* Make the window clamp follow along while being bounded - * by SO_RCVBUF. - */ - int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf)); + if (rcvbuf > sk->sk_rcvbuf) { + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); - if (clamp > tp->window_clamp) - WRITE_ONCE(tp->window_clamp, clamp); + /* Make the window clamp follow along. */ + WRITE_ONCE(tp->window_clamp, + tcp_win_from_space(sk, rcvbuf)); } } tp->rcvq_space.space = copied;