Message ID | 20240606150307.78648-2-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: show the right value of TCP_MIB_RTOMIN | expand |
On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which > is true if without any method to tune rto min. In 2007, we got a way to > tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN > in /proc/net/snmp still shows the same, namely, 200. > > As RFC 1213 said: > "tcpRtoMin > ... > The minimum value permitted by a TCP implementation for the > retransmission timeout, measured in milliseconds." > > Since the lower bound of rto can be changed, we should accordingly > adjust the output of /proc/net/snmp. > > Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.") > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > include/net/tcp.h | 2 ++ > net/ipv4/metrics.c | 4 ++++ > net/ipv4/proc.c | 3 +++ > 3 files changed, 9 insertions(+) > > diff --git a/include/net/tcp.h b/include/net/tcp.h > index a70fc39090fe..a111a5d151b7 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX); > extern int sysctl_tcp_max_orphans; > extern long sysctl_tcp_mem[3]; > > +extern unsigned int tcp_rtax_rtomin; > + > #define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */ > #define TCP_RACK_STATIC_REO_WND 0x2 /* Use static RACK reo wnd */ > #define TCP_RACK_NO_DUPTHRESH 0x4 /* Do not use DUPACK threshold in RACK */ > diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c > index 8ddac1f595ed..61ca949b8281 100644 > --- a/net/ipv4/metrics.c > +++ b/net/ipv4/metrics.c > @@ -7,6 +7,8 @@ > #include <net/net_namespace.h> > #include <net/tcp.h> > > +unsigned int tcp_rtax_rtomin __read_mostly; > + > static int ip_metrics_convert(struct nlattr *fc_mx, > int fc_mx_len, u32 *metrics, > struct netlink_ext_ack *extack) > @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx, > if (ecn_ca) > metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; > > + tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1]; > + > return 0; > } > > diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c > index 6c4664c681ca..ce387081a3c9 100644 > --- a/net/ipv4/proc.c > +++ b/net/ipv4/proc.c > @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) > /* MaxConn field is signed, RFC 2012 */ > if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN) > seq_printf(seq, " %ld", buff[i]); > + else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN) > + seq_printf(seq, " %lu", > + tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]); > else > seq_printf(seq, " %lu", buff[i]); > } > -- > 2.37.3 > I do not think we can accept this patch. 1) You might have missed that we have multiple network namespaces.. 2) You might have missed that we can have thousands of routes, with different metrics. I would leave /proc/net/snmp as it is, because no value can possibly be right.
On Thu, Jun 6, 2024 at 11:14 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which > > is true if without any method to tune rto min. In 2007, we got a way to > > tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN > > in /proc/net/snmp still shows the same, namely, 200. > > > > As RFC 1213 said: > > "tcpRtoMin > > ... > > The minimum value permitted by a TCP implementation for the > > retransmission timeout, measured in milliseconds." > > > > Since the lower bound of rto can be changed, we should accordingly > > adjust the output of /proc/net/snmp. > > > > Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.") > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > include/net/tcp.h | 2 ++ > > net/ipv4/metrics.c | 4 ++++ > > net/ipv4/proc.c | 3 +++ > > 3 files changed, 9 insertions(+) > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h > > index a70fc39090fe..a111a5d151b7 100644 > > --- a/include/net/tcp.h > > +++ b/include/net/tcp.h > > @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX); > > extern int sysctl_tcp_max_orphans; > > extern long sysctl_tcp_mem[3]; > > > > +extern unsigned int tcp_rtax_rtomin; > > + > > #define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */ > > #define TCP_RACK_STATIC_REO_WND 0x2 /* Use static RACK reo wnd */ > > #define TCP_RACK_NO_DUPTHRESH 0x4 /* Do not use DUPACK threshold in RACK */ > > diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c > > index 8ddac1f595ed..61ca949b8281 100644 > > --- a/net/ipv4/metrics.c > > +++ b/net/ipv4/metrics.c > > @@ -7,6 +7,8 @@ > > #include <net/net_namespace.h> > > #include <net/tcp.h> > > > > +unsigned int tcp_rtax_rtomin __read_mostly; > > + > > static int ip_metrics_convert(struct nlattr *fc_mx, > > int fc_mx_len, u32 *metrics, > > struct netlink_ext_ack *extack) > > @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx, > > if (ecn_ca) > > metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; > > > > + tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1]; > > + > > return 0; > > } > > > > diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c > > index 6c4664c681ca..ce387081a3c9 100644 > > --- a/net/ipv4/proc.c > > +++ b/net/ipv4/proc.c > > @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) > > /* MaxConn field is signed, RFC 2012 */ > > if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN) > > seq_printf(seq, " %ld", buff[i]); > > + else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN) > > + seq_printf(seq, " %lu", > > + tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]); > > else > > seq_printf(seq, " %lu", buff[i]); > > } > > -- > > 2.37.3 > > > > I do not think we can accept this patch. > > 1) You might have missed that we have multiple network namespaces.. Thanks for the review. For this patch, indeed, I think I need to consider namespaces... For the other one, I did. > > 2) You might have missed that we can have thousands of routes, with > different metrics. Oh, that's really complicated... > > I would leave /proc/net/snmp as it is, because no value can possibly be right. It cannot be right if someone tries to set rto min by 'ip route' or 'sysctl -w'. Thanks, Jason
On Thu, Jun 6, 2024 at 5:41 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Jun 6, 2024 at 11:14 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which > > > is true if without any method to tune rto min. In 2007, we got a way to > > > tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN > > > in /proc/net/snmp still shows the same, namely, 200. > > > > > > As RFC 1213 said: > > > "tcpRtoMin > > > ... > > > The minimum value permitted by a TCP implementation for the > > > retransmission timeout, measured in milliseconds." > > > > > > Since the lower bound of rto can be changed, we should accordingly > > > adjust the output of /proc/net/snmp. > > > > > > Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.") > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > include/net/tcp.h | 2 ++ > > > net/ipv4/metrics.c | 4 ++++ > > > net/ipv4/proc.c | 3 +++ > > > 3 files changed, 9 insertions(+) > > > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h > > > index a70fc39090fe..a111a5d151b7 100644 > > > --- a/include/net/tcp.h > > > +++ b/include/net/tcp.h > > > @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX); > > > extern int sysctl_tcp_max_orphans; > > > extern long sysctl_tcp_mem[3]; > > > > > > +extern unsigned int tcp_rtax_rtomin; > > > + > > > #define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */ > > > #define TCP_RACK_STATIC_REO_WND 0x2 /* Use static RACK reo wnd */ > > > #define TCP_RACK_NO_DUPTHRESH 0x4 /* Do not use DUPACK threshold in RACK */ > > > diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c > > > index 8ddac1f595ed..61ca949b8281 100644 > > > --- a/net/ipv4/metrics.c > > > +++ b/net/ipv4/metrics.c > > > @@ -7,6 +7,8 @@ > > > #include <net/net_namespace.h> > > > #include <net/tcp.h> > > > > > > +unsigned int tcp_rtax_rtomin __read_mostly; > > > + > > > static int ip_metrics_convert(struct nlattr *fc_mx, > > > int fc_mx_len, u32 *metrics, > > > struct netlink_ext_ack *extack) > > > @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx, > > > if (ecn_ca) > > > metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; > > > > > > + tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1]; > > > + > > > return 0; > > > } > > > > > > diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c > > > index 6c4664c681ca..ce387081a3c9 100644 > > > --- a/net/ipv4/proc.c > > > +++ b/net/ipv4/proc.c > > > @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) > > > /* MaxConn field is signed, RFC 2012 */ > > > if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN) > > > seq_printf(seq, " %ld", buff[i]); > > > + else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN) > > > + seq_printf(seq, " %lu", > > > + tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]); > > > else > > > seq_printf(seq, " %lu", buff[i]); > > > } > > > -- > > > 2.37.3 > > > > > > > I do not think we can accept this patch. > > > > 1) You might have missed that we have multiple network namespaces.. > > Thanks for the review. > > For this patch, indeed, I think I need to consider namespaces... > For the other one, I did. > > > > > 2) You might have missed that we can have thousands of routes, with > > different metrics. > > Oh, that's really complicated... > > > > > I would leave /proc/net/snmp as it is, because no value can possibly be right. > > It cannot be right if someone tries to set rto min by 'ip route' or 'sysctl -w'. > Or eBPF. There is no way a /proc/net/snmp value can be right. Why would anyone care, since thousands of TCP sockets can all have different values ?
On Thu, Jun 6, 2024 at 11:43 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Jun 6, 2024 at 5:41 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Jun 6, 2024 at 11:14 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Thu, Jun 6, 2024 at 5:03 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > TCP_MIB_RTOMIN implemented in tcp mib definitions is always 200, which > > > > is true if without any method to tune rto min. In 2007, we got a way to > > > > tune it globaly when setting rto_min route option, but TCP_MIB_RTOMIN > > > > in /proc/net/snmp still shows the same, namely, 200. > > > > > > > > As RFC 1213 said: > > > > "tcpRtoMin > > > > ... > > > > The minimum value permitted by a TCP implementation for the > > > > retransmission timeout, measured in milliseconds." > > > > > > > > Since the lower bound of rto can be changed, we should accordingly > > > > adjust the output of /proc/net/snmp. > > > > > > > > Fixes: 05bb1fad1cde ("[TCP]: Allow minimum RTO to be configurable via routing metrics.") > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > --- > > > > include/net/tcp.h | 2 ++ > > > > net/ipv4/metrics.c | 4 ++++ > > > > net/ipv4/proc.c | 3 +++ > > > > 3 files changed, 9 insertions(+) > > > > > > > > diff --git a/include/net/tcp.h b/include/net/tcp.h > > > > index a70fc39090fe..a111a5d151b7 100644 > > > > --- a/include/net/tcp.h > > > > +++ b/include/net/tcp.h > > > > @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX); > > > > extern int sysctl_tcp_max_orphans; > > > > extern long sysctl_tcp_mem[3]; > > > > > > > > +extern unsigned int tcp_rtax_rtomin; > > > > + > > > > #define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */ > > > > #define TCP_RACK_STATIC_REO_WND 0x2 /* Use static RACK reo wnd */ > > > > #define TCP_RACK_NO_DUPTHRESH 0x4 /* Do not use DUPACK threshold in RACK */ > > > > diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c > > > > index 8ddac1f595ed..61ca949b8281 100644 > > > > --- a/net/ipv4/metrics.c > > > > +++ b/net/ipv4/metrics.c > > > > @@ -7,6 +7,8 @@ > > > > #include <net/net_namespace.h> > > > > #include <net/tcp.h> > > > > > > > > +unsigned int tcp_rtax_rtomin __read_mostly; > > > > + > > > > static int ip_metrics_convert(struct nlattr *fc_mx, > > > > int fc_mx_len, u32 *metrics, > > > > struct netlink_ext_ack *extack) > > > > @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx, > > > > if (ecn_ca) > > > > metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; > > > > > > > > + tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1]; > > > > + > > > > return 0; > > > > } > > > > > > > > diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c > > > > index 6c4664c681ca..ce387081a3c9 100644 > > > > --- a/net/ipv4/proc.c > > > > +++ b/net/ipv4/proc.c > > > > @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) > > > > /* MaxConn field is signed, RFC 2012 */ > > > > if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN) > > > > seq_printf(seq, " %ld", buff[i]); > > > > + else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMTIN) > > > > + seq_printf(seq, " %lu", > > > > + tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]); > > > > else > > > > seq_printf(seq, " %lu", buff[i]); > > > > } > > > > -- > > > > 2.37.3 > > > > > > > > > > I do not think we can accept this patch. > > > > > > 1) You might have missed that we have multiple network namespaces.. > > > > Thanks for the review. > > > > For this patch, indeed, I think I need to consider namespaces... > > For the other one, I did. > > > > > > > > 2) You might have missed that we can have thousands of routes, with > > > different metrics. > > > > Oh, that's really complicated... > > > > > > > > I would leave /proc/net/snmp as it is, because no value can possibly be right. > > > > It cannot be right if someone tries to set rto min by 'ip route' or 'sysctl -w'. > > > > Or eBPF. > > There is no way a /proc/net/snmp value can be right. > > Why would anyone care, since thousands of TCP sockets can all have > different values ? I think you're right because there are too many kinds of situations. For kernel developers or knowledgable admins, they are aware of this 'issue' (If I remember correctly, there is one comment in iproute saying this field is obsolete.). For some normal users, they are confused because someone reported this 'issue' to me. I'll drop this series. There's no good way to solve it. Thanks, Eric.
diff --git a/include/net/tcp.h b/include/net/tcp.h index a70fc39090fe..a111a5d151b7 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -260,6 +260,8 @@ static_assert((1 << ATO_BITS) > TCP_DELACK_MAX); extern int sysctl_tcp_max_orphans; extern long sysctl_tcp_mem[3]; +extern unsigned int tcp_rtax_rtomin; + #define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */ #define TCP_RACK_STATIC_REO_WND 0x2 /* Use static RACK reo wnd */ #define TCP_RACK_NO_DUPTHRESH 0x4 /* Do not use DUPACK threshold in RACK */ diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c index 8ddac1f595ed..61ca949b8281 100644 --- a/net/ipv4/metrics.c +++ b/net/ipv4/metrics.c @@ -7,6 +7,8 @@ #include <net/net_namespace.h> #include <net/tcp.h> +unsigned int tcp_rtax_rtomin __read_mostly; + static int ip_metrics_convert(struct nlattr *fc_mx, int fc_mx_len, u32 *metrics, struct netlink_ext_ack *extack) @@ -60,6 +62,8 @@ static int ip_metrics_convert(struct nlattr *fc_mx, if (ecn_ca) metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; + tcp_rtax_rtomin = metrics[RTAX_RTO_MIN - 1]; + return 0; } diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index 6c4664c681ca..ce387081a3c9 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -428,6 +428,9 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void *v) /* MaxConn field is signed, RFC 2012 */ if (snmp4_tcp_list[i].entry == TCP_MIB_MAXCONN) seq_printf(seq, " %ld", buff[i]); + else if (snmp4_tcp_list[i].entry == TCP_MIB_RTOMIN) + seq_printf(seq, " %lu", + tcp_rtax_rtomin ? tcp_rtax_rtomin : buff[i]); else seq_printf(seq, " %lu", buff[i]); }