diff mbox series

[v2,net-next] tcp: introduce a new MIB for CLOSE-WAIT sockets

Message ID 20240529033104.33882-1-kerneljasonxing@gmail.com (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Headers show
Series [v2,net-next] tcp: introduce a new MIB for CLOSE-WAIT sockets | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4976 this patch: 4976
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 2 maintainers not CCed: 0x7f454c46@gmail.com heng.guo@windriver.com
netdev/build_clang success Errors and warnings before: 1009 this patch: 1009
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5245 this patch: 5245
netdev/checkpatch warning WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-29--12-00 (tests: 1040)

Commit Message

Jason Xing May 29, 2024, 3:31 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

CLOSE-WAIT is a relatively special state which "represents waiting for
a connection termination request from the local user" (RFC 793). Some
issues may happen because of unexpected/too many CLOSE-WAIT sockets,
like user application mistakenly handling close() syscall. It's a very
common issue in the real world.

We want to trace this total number of CLOSE-WAIT sockets fastly and
frequently instead of resorting to displaying them altogether by using:

  ss -s state close-wait

or something like this. They need to loop and collect required socket
information in kernel and then get back to the userside for print, which
does harm to the performance especially in heavy load for frequent
sampling.

That's the reason why I chose to introduce this new MIB counter like
CurrEstab does. With this counter implemented, we can record/observe the
normal changes of this counter all the time. It can help us:
1) We are able to be alerted in advance if the counter changes drastically.
2) If some users report some issues happening, we will particularly
pay more attention to it.

Besides, in the group of TCP_MIB_* defined by RFC 1213, TCP_MIB_CURRESTAB
should include both ESTABLISHED and CLOSE-WAIT sockets in theory:

  "tcpCurrEstab OBJECT-TYPE
   ...
   The number of TCP connections for which the current state
   is either ESTABLISHED or CLOSE- WAIT."

However, the thing is we don't do that according to RFC. The reason is
unknown. At least since 2005, we should have counted CLOSE-WAIT sockets
I think, there is a need to finish the work as RFC says. It's definitely
not a bad thing.

After this patch, we can see the counter by running 'cat /proc/net/netstat'
or 'nstat -s | grep CloseWait'

Suggested-by: Yongming Liu <yomiliu@tencent.com>
Suggested-by: Wangzi Yong <curuwang@tencent.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
v2
Link: https://lore.kernel.org/all/20240528021149.6186-1-kerneljasonxing@gmail.com/
1. revise the commit message to let other developers know what the use of
such a new counter.
2. introduce a decrement-counter help so that this new counter can do the
same thing as CurrEstab does. It's also the same as what I implemented locally.

BTW, I just finish implementing the correct snmp based on the RFC. Is it really so
hard to count close-wait sockets? I wondered... Any suggestions are welcome.

Thanks in advance.
---
 include/net/ip.h          | 1 +
 include/uapi/linux/snmp.h | 1 +
 net/ipv4/proc.c           | 1 +
 net/ipv4/tcp.c            | 5 +++++
 4 files changed, 8 insertions(+)

Comments

Eric Dumazet May 29, 2024, 3:42 p.m. UTC | #1
On Wed, May 29, 2024 at 5:31 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> CLOSE-WAIT is a relatively special state which "represents waiting for
> a connection termination request from the local user" (RFC 793). Some
> issues may happen because of unexpected/too many CLOSE-WAIT sockets,
> like user application mistakenly handling close() syscall. It's a very
> common issue in the real world.
>
> We want to trace this total number of CLOSE-WAIT sockets fastly and
> frequently instead of resorting to displaying them altogether by using:
>
>   ss -s state close-wait
>
> or something like this. They need to loop and collect required socket
> information in kernel and then get back to the userside for print, which
> does harm to the performance especially in heavy load for frequent
> sampling.
>
> That's the reason why I chose to introduce this new MIB counter like
> CurrEstab does. With this counter implemented, we can record/observe the
> normal changes of this counter all the time. It can help us:
> 1) We are able to be alerted in advance if the counter changes drastically.
> 2) If some users report some issues happening, we will particularly
> pay more attention to it.
>
> Besides, in the group of TCP_MIB_* defined by RFC 1213, TCP_MIB_CURRESTAB
> should include both ESTABLISHED and CLOSE-WAIT sockets in theory:
>

We (Neal and myself) prefer to fix TCP_MIB_CURRESTAB to include
CLOSE_WAIT sockets.
We do not think it will annoy anyone, please change tcp_set_state() accordingly.

Rationale is that adoption of a new MIB in documentations and various
products will take years.

Also make a similar change for mptcp.

Thank you.
Jason Xing May 29, 2024, 3:52 p.m. UTC | #2
Hello Eric,

On Wed, May 29, 2024 at 11:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, May 29, 2024 at 5:31 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > CLOSE-WAIT is a relatively special state which "represents waiting for
> > a connection termination request from the local user" (RFC 793). Some
> > issues may happen because of unexpected/too many CLOSE-WAIT sockets,
> > like user application mistakenly handling close() syscall. It's a very
> > common issue in the real world.
> >
> > We want to trace this total number of CLOSE-WAIT sockets fastly and
> > frequently instead of resorting to displaying them altogether by using:
> >
> >   ss -s state close-wait
> >
> > or something like this. They need to loop and collect required socket
> > information in kernel and then get back to the userside for print, which
> > does harm to the performance especially in heavy load for frequent
> > sampling.
> >
> > That's the reason why I chose to introduce this new MIB counter like
> > CurrEstab does. With this counter implemented, we can record/observe the
> > normal changes of this counter all the time. It can help us:
> > 1) We are able to be alerted in advance if the counter changes drastically.
> > 2) If some users report some issues happening, we will particularly
> > pay more attention to it.
> >
> > Besides, in the group of TCP_MIB_* defined by RFC 1213, TCP_MIB_CURRESTAB
> > should include both ESTABLISHED and CLOSE-WAIT sockets in theory:
> >
>
> We (Neal and myself) prefer to fix TCP_MIB_CURRESTAB to include
> CLOSE_WAIT sockets.
> We do not think it will annoy anyone, please change tcp_set_state() accordingly.

Thanks for your reply. Honestly, I was worried about what you said.
Now, I'm relieved.

It seems that I should add a Fixes: tag...

>
> Rationale is that adoption of a new MIB in documentations and various
> products will take years.
>
> Also make a similar change for mptcp.

I will check that part tomorrow morning, too.

Thank you :)

Jason
diff mbox series

Patch

diff --git a/include/net/ip.h b/include/net/ip.h
index 6d735e00d3f3..0fe2994796b0 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -298,6 +298,7 @@  void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 #define IP_UPD_PO_STATS(net, field, val) SNMP_UPD_PO_STATS64((net)->mib.ip_statistics, field, val)
 #define __IP_UPD_PO_STATS(net, field, val) __SNMP_UPD_PO_STATS64((net)->mib.ip_statistics, field, val)
 #define NET_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.net_statistics, field)
+#define NET_DEC_STATS(net, field)	SNMP_DEC_STATS((net)->mib.net_statistics, field)
 #define __NET_INC_STATS(net, field)	__SNMP_INC_STATS((net)->mib.net_statistics, field)
 #define NET_ADD_STATS(net, field, adnd)	SNMP_ADD_STATS((net)->mib.net_statistics, field, adnd)
 #define __NET_ADD_STATS(net, field, adnd) __SNMP_ADD_STATS((net)->mib.net_statistics, field, adnd)
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index adf5fd78dd50..c0feefb4d88b 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -302,6 +302,7 @@  enum
 	LINUX_MIB_TCPAOKEYNOTFOUND,		/* TCPAOKeyNotFound */
 	LINUX_MIB_TCPAOGOOD,			/* TCPAOGood */
 	LINUX_MIB_TCPAODROPPEDICMPS,		/* TCPAODroppedIcmps */
+	LINUX_MIB_TCPCLOSEWAIT,			/* TCPCloseWait */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 6c4664c681ca..964897dc6eb8 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -305,6 +305,7 @@  static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPAOKeyNotFound", LINUX_MIB_TCPAOKEYNOTFOUND),
 	SNMP_MIB_ITEM("TCPAOGood", LINUX_MIB_TCPAOGOOD),
 	SNMP_MIB_ITEM("TCPAODroppedIcmps", LINUX_MIB_TCPAODROPPEDICMPS),
+	SNMP_MIB_ITEM("TCPCloseWait", LINUX_MIB_TCPCLOSEWAIT),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 681b54e1f3a6..3908ea7fd14a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2661,6 +2661,11 @@  void tcp_set_state(struct sock *sk, int state)
 			TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
 	}
 
+	if (state == TCP_CLOSE_WAIT)
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
+	if (oldstate == TCP_CLOSE_WAIT)
+		NET_DEC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
+
 	/* Change state AFTER socket is unhashed to avoid closed
 	 * socket sitting in hash tables.
 	 */