Message ID | 1697056244-21888-1-git-send-email-haiyangz@microsoft.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 562b1fdf061bff9394ccd884456ed1173c224fdc |
Delegated to: | Netdev Maintainers |
Series | [net-next,v3] tcp: Set pingpong threshold via sysctl |
On Wed, Oct 11, 2023 at 01:30:44PM -0700, Haiyang Zhang wrote:
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
> commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
> commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
>
> There is no single value that fits all applications.
> Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> optimal performance based on the application needs.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> v3: Updated doc as suggested by Neal Cardwell.
> Updated variable location in struct netns_ipv4 as suggested by Kuniyuki
> Iwashima.
>
> v2: Make it per-namespace setting, and other updates suggested by Neal Cardwell,
> and Kuniyuki Iwashima.

Thanks, this looks clean to me.
It seems to address the review of v2,
and it keeps the knob as a sysctl, as discussed in v2.

Reviewed-by: Simon Horman <horms@kernel.org>
On Wed, Oct 11, 2023 at 10:31 PM Haiyang Zhang <haiyangz@microsoft.com> wrote:
>
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>

...

>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f207712eece1..7d0fe76d56ef 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -170,10 +170,10 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
>  	tp->lsndtime = now;
>
>  	/* If it is a reply for ato after last received
> -	 * packet, enter pingpong mode.
> +	 * packet, increase pingpong count.
>  	 */
>  	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
> -		inet_csk_enter_pingpong_mode(sk);
> +		inet_csk_inc_pingpong_cnt(sk);
>  }
>
>  /* Account for an ACK we sent. */

OK, but I do not think we solved the fundamental problem of using jiffies
for this heuristic, especially for HZ=100 or HZ=250 builds.

Reviewed-by: Eric Dumazet <edumazet@google.com>
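For readers following the jiffies concern above, here is a minimal userspace sketch of why a tick-based timestamp is coarse at low HZ. It is not kernel code: `ms_per_jiffy`, `ms_to_tick`, and `reply_within_ato` are hypothetical helper names, and the model assumes the tick counter is simply elapsed milliseconds divided by the tick length, rounded down. Only the comparison in `reply_within_ato` mirrors the shape of the check quoted from tcp_event_data_sent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Milliseconds covered by one tick for a given HZ (hypothetical helper). */
static inline uint32_t ms_per_jiffy(uint32_t hz)
{
	assert(hz > 0 && hz <= 1000);
	return 1000u / hz;
}

/* Tick counter value at time `ms`, assuming the counter starts at 0
 * and is the elapsed time rounded down to whole ticks.
 */
static inline uint32_t ms_to_tick(uint32_t ms, uint32_t hz)
{
	return ms / ms_per_jiffy(hz);
}

/* Same shape as the quoted check: (u32)(now - lrcvtime) < ato, with all
 * values in ticks. The unsigned subtraction keeps the comparison valid
 * across counter wraparound.
 */
static inline bool reply_within_ato(uint32_t now, uint32_t lrcvtime,
				    uint32_t ato)
{
	return (uint32_t)(now - lrcvtime) < ato;
}
```

In this model, at HZ=100 a request at 10 ms and a reply at 19 ms land in the same tick, so the measured gap is 0 ticks, while a request at 19 ms and a reply at 20 ms measure a gap of 1 tick although only 1 ms elapsed. The heuristic therefore cannot distinguish gaps shorter than one tick (10 ms at HZ=100, 4 ms at HZ=250).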
On Wed, Oct 11, 2023 at 4:31 PM Haiyang Zhang <haiyangz@microsoft.com> wrote:
>
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
> commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
> commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
>
> There is no single value that fits all applications.
> Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> optimal performance based on the application needs.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> v3: Updated doc as suggested by Neal Cardwell.
> Updated variable location in struct netns_ipv4 as suggested by Kuniyuki
> Iwashima.
>
> v2: Make it per-namespace setting, and other updates suggested by Neal Cardwell,
> and Kuniyuki Iwashima.
> ---
>  Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
>  include/net/inet_connection_sock.h     | 16 ++++++++++++----
>  include/net/netns/ipv4.h               |  2 ++
>  net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
>  net/ipv4/tcp_ipv4.c                    |  2 ++
>  net/ipv4/tcp_output.c                  |  4 ++--
>  6 files changed, 39 insertions(+), 6 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks!

neal
From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Wed, 11 Oct 2023 13:30:44 -0700

> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
> commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
> commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
>
> There is no single value that fits all applications.
> Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
> optimal performance based on the application needs.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Thanks!
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 11 Oct 2023 13:30:44 -0700 you wrote:
> TCP pingpong threshold is 1 by default. But some applications, like SQL DB
> may prefer a higher pingpong threshold to activate delayed acks in quick
> ack mode for better performance.
>
> The pingpong threshold and related code were changed to 3 in the year
> 2019 in:
> commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
> And reverted to 1 in the year 2022 in:
> commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")
>
> [...]

Here is the summary with links:
  - [net-next,v3] tcp: Set pingpong threshold via sysctl
    https://git.kernel.org/netdev/net-next/c/562b1fdf061b

You are awesome, thank you!
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index f7dfde3b09a9..e7ec9026e5db 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1183,6 +1183,19 @@ tcp_plb_cong_thresh - INTEGER
 
 	Default: 128
 
+tcp_pingpong_thresh - INTEGER
+	The number of estimated data replies sent for estimated incoming data
+	requests that must happen before TCP considers that a connection is a
+	"ping-pong" (request-response) connection for which delayed
+	acknowledgments can provide benefits.
+
+	This threshold is 1 by default, but some applications may need a higher
+	threshold for optimal performance.
+
+	Possible Values: 1 - 255
+
+	Default: 1
+
 UDP variables
 =============
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index d6d9d1c1985a..086d1193c9ef 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -328,11 +328,10 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
 
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
 
-#define TCP_PINGPONG_THRESH	1
-
 static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
 {
-	inet_csk(sk)->icsk_ack.pingpong = TCP_PINGPONG_THRESH;
+	inet_csk(sk)->icsk_ack.pingpong =
+		READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
 }
 
 static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
@@ -342,7 +341,16 @@ static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
 
 static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
 {
-	return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
+	return inet_csk(sk)->icsk_ack.pingpong >=
+	       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
+}
+
+static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (icsk->icsk_ack.pingpong < U8_MAX)
+		icsk->icsk_ack.pingpong++;
 }
 
 static inline bool inet_csk_has_ulp(const struct sock *sk)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d96d05b08819..73f43f699199 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -133,6 +133,8 @@ struct netns_ipv4 {
 	u8 sysctl_tcp_migrate_req;
 	u8 sysctl_tcp_comp_sack_nr;
 	u8 sysctl_tcp_backlog_ack_defer;
+	u8 sysctl_tcp_pingpong_thresh;
+
 	int sysctl_tcp_reordering;
 	u8 sysctl_tcp_retries1;
 	u8 sysctl_tcp_retries2;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index e7f024d93572..f63a545a7374 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1498,6 +1498,14 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "tcp_pingpong_thresh",
+		.data		= &init_net.ipv4.sysctl_tcp_pingpong_thresh,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 	{ }
 };
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a441740616d7..f603ad9307af 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3288,6 +3288,8 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_syn_linear_timeouts = 4;
 	net->ipv4.sysctl_tcp_shrink_window = 0;
 
+	net->ipv4.sysctl_tcp_pingpong_thresh = 1;
+
 	return 0;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f207712eece1..7d0fe76d56ef 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -170,10 +170,10 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	tp->lsndtime = now;
 
 	/* If it is a reply for ato after last received
-	 * packet, enter pingpong mode.
+	 * packet, increase pingpong count.
 	 */
 	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
-		inet_csk_enter_pingpong_mode(sk);
+		inet_csk_inc_pingpong_cnt(sk);
 }
 
 /* Account for an ACK we sent. */
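The counter logic in the inet_connection_sock.h hunks can be modelled in userspace roughly as follows. This is only a sketch for reasoning about the threshold, not the kernel implementation: `struct pingpong_model` and the `model_*` / `replies_until_pingpong` names are hypothetical, the READ_ONCE() and per-netns lookup are replaced by a plain field, and the reset-to-zero in `model_exit_pingpong_mode` is an assumption, since the body of inet_csk_exit_pingpong_mode() is not shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-socket counter and per-netns sysctl. */
struct pingpong_model {
	uint8_t pingpong;	/* models icsk->icsk_ack.pingpong */
	uint8_t thresh;		/* models sysctl_tcp_pingpong_thresh, 1..255 */
};

/* Saturating increment, as in inet_csk_inc_pingpong_cnt(). */
static inline void model_inc_pingpong_cnt(struct pingpong_model *m)
{
	if (m->pingpong < UINT8_MAX)
		m->pingpong++;
}

/* Pingpong mode is active once the counter reaches the threshold. */
static inline bool model_in_pingpong_mode(const struct pingpong_model *m)
{
	return m->pingpong >= m->thresh;
}

/* Forcing pingpong mode sets the counter to the current threshold. */
static inline void model_enter_pingpong_mode(struct pingpong_model *m)
{
	m->pingpong = m->thresh;
}

/* Assumption: leaving pingpong mode resets the counter to 0. */
static inline void model_exit_pingpong_mode(struct pingpong_model *m)
{
	m->pingpong = 0;
}

/* Count the timely replies needed to reach pingpong mode from 0. */
static inline unsigned int replies_until_pingpong(uint8_t thresh)
{
	struct pingpong_model m = { .pingpong = 0, .thresh = thresh };
	unsigned int n = 0;

	assert(thresh >= 1);
	while (!model_in_pingpong_mode(&m)) {
		model_inc_pingpong_cnt(&m);
		n++;
	}
	return n;
}
```

In this model a threshold of 1 reaches pingpong mode after a single timely reply, a threshold of 3 after three, and the counter saturates at 255 so it cannot wrap back below the threshold.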
TCP pingpong threshold is 1 by default. But some applications, like SQL DB
may prefer a higher pingpong threshold to activate delayed acks in quick
ack mode for better performance.

The pingpong threshold and related code were changed to 3 in the year
2019 in:
commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
And reverted to 1 in the year 2022 in:
commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")

There is no single value that fits all applications.
Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
optimal performance based on the application needs.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
v3: Updated doc as suggested by Neal Cardwell.
Updated variable location in struct netns_ipv4 as suggested by Kuniyuki
Iwashima.

v2: Make it per-namespace setting, and other updates suggested by Neal Cardwell,
and Kuniyuki Iwashima.
---
 Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
 include/net/inet_connection_sock.h     | 16 ++++++++++++----
 include/net/netns/ipv4.h               |  2 ++
 net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
 net/ipv4/tcp_ipv4.c                    |  2 ++
 net/ipv4/tcp_output.c                  |  4 ++--
 6 files changed, 39 insertions(+), 6 deletions(-)
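As a usage illustration for the new tunable, the following userspace sketch reads and validates the current value. It assumes a kernel that carries this patch and the usual mapping of net.ipv4.tcp_pingpong_thresh to a file under /proc/sys; `PINGPONG_SYSCTL_PATH`, `parse_pingpong_thresh`, and `read_pingpong_thresh` are hypothetical names introduced here, and the 1..255 bounds are taken from the documentation hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed procfs location of net.ipv4.tcp_pingpong_thresh. */
#define PINGPONG_SYSCTL_PATH "/proc/sys/net/ipv4/tcp_pingpong_thresh"

/* Parse a decimal value and accept only the documented range 1..255.
 * Returns 0 on success and stores the value in *out, -1 otherwise.
 */
static inline int parse_pingpong_thresh(const char *text, unsigned int *out)
{
	char *end;
	long v;

	assert(text && out);
	errno = 0;
	v = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;		/* overflow or no digits */
	if (*end != '\0' && *end != '\n')
		return -1;		/* trailing garbage */
	if (v < 1 || v > 255)
		return -1;		/* outside documented range */
	*out = (unsigned int)v;
	return 0;
}

/* Read the first line of `path` and parse it as a threshold.
 * Returns -1 if the file is missing (for example on an older kernel).
 */
static inline int read_pingpong_thresh(const char *path, unsigned int *out)
{
	char buf[16];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_pingpong_thresh(buf, out);
	fclose(f);
	return ret;
}
```

A caller would pass PINGPONG_SYSCTL_PATH to read_pingpong_thresh() and treat a -1 return as "tunable not available". Since the setting is per network namespace, the value read is the one for the caller's namespace.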