Message ID | 20240528021149.6186-1-kerneljasonxing@gmail.com
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next] tcp: introduce a new MIB for CLOSE-WAIT sockets
On Tue, May 28, 2024 at 4:12 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> CLOSE-WAIT is a relatively special state which "represents waiting for
> a connection termination request from the local user" (RFC 793). Some
> issues may happen because of unexpected/too many CLOSE-WAIT sockets,
> like user application mistakenly handling close() syscall.
>
> We want to trace this total number of CLOSE-WAIT sockets fastly and
> frequently instead of resorting to displaying them altogether by using:
>
> netstat -anlp | grep CLOSE_WAIT

This is horribly expensive.

Why asking af_unix and program names ?
You want to count some TCP sockets in a given state, right ?
iproute2 interface (inet_diag) can do the filtering in the kernel,
saving a lot of cycles.

ss -t state close-wait

> or something like this, which does harm to the performance especially in
> heavy load. That's the reason why I chose to introduce this new MIB counter
> like CurrEstab does. It do help us diagnose/find issues in production.
>
> Besides, in the group of TCP_MIB_* defined by RFC 1213, TCP_MIB_CURRESTAB
> should include both ESTABLISHED and CLOSE-WAIT sockets in theory:
>
> "tcpCurrEstab OBJECT-TYPE
> ...
> The number of TCP connections for which the current state
> is either ESTABLISHED or CLOSE-WAIT."
>
> Apparently, at least since 2005, we don't count CLOSE-WAIT sockets. I think
> there is a need to count it separately to avoid polluting the existing
> TCP_MIB_CURRESTAB counter.
>
> After this patch, we can see the counter by running 'cat /proc/net/netstat'
> or 'nstat -s | grep CloseWait'

I find this counter quite not interesting.
After a few days of uptime, let say it is 52904523
What can you make of this value exactly ?
How do you make any correlation ?

> Suggested-by: Yongming Liu <yomiliu@tencent.com>
> Suggested-by: Wangzi Yong <curuwang@tencent.com>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  include/uapi/linux/snmp.h | 1 +
>  net/ipv4/proc.c           | 1 +
>  net/ipv4/tcp.c            | 2 ++
>  3 files changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
> index adf5fd78dd50..c0feefb4d88b 100644
> --- a/include/uapi/linux/snmp.h
> +++ b/include/uapi/linux/snmp.h
> @@ -302,6 +302,7 @@ enum
>          LINUX_MIB_TCPAOKEYNOTFOUND,             /* TCPAOKeyNotFound */
>          LINUX_MIB_TCPAOGOOD,                    /* TCPAOGood */
>          LINUX_MIB_TCPAODROPPEDICMPS,            /* TCPAODroppedIcmps */
> +        LINUX_MIB_TCPCLOSEWAIT,                 /* TCPCloseWait */
>          __LINUX_MIB_MAX
>  };
>
> diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
> index 6c4664c681ca..964897dc6eb8 100644
> --- a/net/ipv4/proc.c
> +++ b/net/ipv4/proc.c
> @@ -305,6 +305,7 @@ static const struct snmp_mib snmp4_net_list[] = {
>          SNMP_MIB_ITEM("TCPAOKeyNotFound", LINUX_MIB_TCPAOKEYNOTFOUND),
>          SNMP_MIB_ITEM("TCPAOGood", LINUX_MIB_TCPAOGOOD),
>          SNMP_MIB_ITEM("TCPAODroppedIcmps", LINUX_MIB_TCPAODROPPEDICMPS),
> +        SNMP_MIB_ITEM("TCPCloseWait", LINUX_MIB_TCPCLOSEWAIT),
>          SNMP_MIB_SENTINEL
>  };
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 681b54e1f3a6..7abaa2660cc8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2659,6 +2659,8 @@ void tcp_set_state(struct sock *sk, int state)
>          default:
>                  if (oldstate == TCP_ESTABLISHED)
>                          TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
> +                if (state == TCP_CLOSE_WAIT)
> +                        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
>          }
>
>          /* Change state AFTER socket is unhashed to avoid closed
> --
> 2.37.3
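For context, the in-kernel filtering behind the suggested "ss -t state close-wait" is the inet_diag/sock_diag netlink interface: the kernel walks its socket tables and only reports sockets whose state matches the requested mask, so userspace never enumerates every socket. A minimal userspace sketch of that query follows; it is an illustration rather than part of the patch, assumes IPv4 only, and omits most error handling.

/*
 * count_close_wait.c: ask the kernel, over the sock_diag netlink
 * interface, to dump only TCP sockets currently in CLOSE-WAIT and
 * count the replies. IPv4 only, minimal error handling.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/inet_diag.h>

/* 8 is CLOSE-WAIT in the kernel's TCP state numbering
 * (include/net/tcp_states.h); spelled out to avoid depending on
 * libc headers exposing the state enum. */
#define CLOSE_WAIT_STATE 8

int main(void)
{
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
        struct {
                struct nlmsghdr nlh;
                struct inet_diag_req_v2 req;
        } msg = {
                .nlh = {
                        .nlmsg_len   = sizeof(msg),
                        .nlmsg_type  = SOCK_DIAG_BY_FAMILY,
                        .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
                },
                .req = {
                        .sdiag_family   = AF_INET,
                        .sdiag_protocol = IPPROTO_TCP,
                        /* dump only sockets whose state is CLOSE-WAIT */
                        .idiag_states   = 1 << CLOSE_WAIT_STATE,
                },
        };
        char buf[32768];
        long count = 0;
        int done = 0;

        if (fd < 0 || send(fd, &msg, sizeof(msg), 0) < 0)
                return 1;

        while (!done) {
                ssize_t len = recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *h = (struct nlmsghdr *)buf;

                if (len <= 0)
                        break;
                for (; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
                        if (h->nlmsg_type == NLMSG_DONE ||
                            h->nlmsg_type == NLMSG_ERROR) {
                                done = 1;
                                break;
                        }
                        if (h->nlmsg_type == SOCK_DIAG_BY_FAMILY)
                                count++;        /* one message per matching socket */
                }
        }
        printf("CLOSE-WAIT sockets: %ld\n", count);
        close(fd);
        return 0;
}

This is essentially what ss does internally; the cost still scales with the number of CLOSE-WAIT sockets reported, which is the point Jason raises next.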
Hello Eric,

On Tue, May 28, 2024 at 1:13 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, May 28, 2024 at 4:12 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > [...]
> >
> > We want to trace this total number of CLOSE-WAIT sockets fastly and
> > frequently instead of resorting to displaying them altogether by using:
> >
> > netstat -anlp | grep CLOSE_WAIT
>
> This is horribly expensive.

Yes.

> Why asking af_unix and program names ?
> You want to count some TCP sockets in a given state, right ?
> iproute2 interface (inet_diag) can do the filtering in the kernel,
> saving a lot of cycles.
>
> ss -t state close-wait

Indeed, it is much better than netstat but not that good/fast enough
if we've already generated a lot of sockets. This command is suitable
for debug use, but not for frequent sampling, say, every 10 seconds.
More than this, RFC 1213 defines CurrEstab which should also include
close-wait sockets, but we don't have this one. I have no intention to
change the CurrEstab in Linux because it has been used for a really
long time. So I chose to introduce a new counter in linux mib
definitions.

> > [...]
> >
> > After this patch, we can see the counter by running 'cat /proc/net/netstat'
> > or 'nstat -s | grep CloseWait'
>
> I find this counter quite not interesting.
> After a few days of uptime, let say it is 52904523
> What can you make of this value exactly ?
> How do you make any correlation ?

There are two ways of implementing this counter:
1) like the counters in 'linux mib definitions', we have to 'diff' the
counter then we can know how many close-wait sockets generated in a
certain period.
2) like what CurrEstab does, then we have to introduce a new helper
(for example, NET_DEC_STATS) to decrement the counter if the state of
the close-wait socket changes in tcp_set_state() (a sketch of such a
helper follows this mail).

After thinking more about your question, the latter is better because
it can easily reflect the current situation, right? What do you think?

Thanks,
Jason
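The NET_DEC_STATS helper mentioned in option 2) does not exist in mainline. A rough sketch of what it could look like, modelled on the existing TCP_DEC_STATS and SNMP_DEC_STATS macros, and of the paired update in tcp_set_state(), might be the following. This is a hypothetical fragment, not part of the submitted patch.

/* Hypothetical helper, modelled on TCP_DEC_STATS in include/net/tcp.h;
 * it is not an existing mainline macro. */
#define NET_DEC_STATS(net, field)  SNMP_DEC_STATS((net)->mib.net_statistics, field)

/* ...and the paired update inside the default: arm of the switch in
 * tcp_set_state(), so the counter tracks sockets currently in CLOSE-WAIT: */
        if (state == TCP_CLOSE_WAIT)
                NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
        if (oldstate == TCP_CLOSE_WAIT)
                NET_DEC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);

With the increment and decrement paired this way, the counter reads as the number of sockets currently sitting in CLOSE-WAIT rather than a monotonically increasing total, which is the behaviour option 2) describes.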
On Tue, May 28, 2024 at 8:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Tue, May 28, 2024 at 1:13 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, May 28, 2024 at 4:12 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > [...]
> > >
> > > netstat -anlp | grep CLOSE_WAIT
> >
> > This is horribly expensive.
>
> Yes.
>
> > Why asking af_unix and program names ?
> > You want to count some TCP sockets in a given state, right ?
> > iproute2 interface (inet_diag) can do the filtering in the kernel,
> > saving a lot of cycles.
> >
> > ss -t state close-wait
>
> Indeed, it is much better than netstat but not that good/fast enough
> if we've already generated a lot of sockets. This command is suitable
> for debug use, but not for frequent sampling, say, every 10 seconds.
> More than this, RFC 1213 defines CurrEstab which should also include
> close-wait sockets, but we don't have this one.

"we don't have this one."

You mean we do not have CurrEstab ?
That might be user space decision to not display it from nstat
command, in useless_number()
(Not sure why. If someone thought it was useless, then CLOSE_WAIT
count is even more useless...)

> I have no intention to
> change the CurrEstab in Linux because it has been used for a really
> long time. So I chose to introduce a new counter in linux mib
> definitions.
>
> > > [...]
> > >
> > > After this patch, we can see the counter by running 'cat /proc/net/netstat'
> > > or 'nstat -s | grep CloseWait'
> >
> > I find this counter quite not interesting.
> > After a few days of uptime, let say it is 52904523
> > What can you make of this value exactly ?
> > How do you make any correlation ?
>
> There are two ways of implementing this counter:
> 1) like the counters in 'linux mib definitions', we have to 'diff' the
> counter then we can know how many close-wait sockets generated in a
> certain period.

And what do you make of this raw information ?

if it is 10000 or 20000 in a 10-second period, what conclusion do you get ?

Receiving FIN packets is a fact of life, I see no reason to worry.

> 2) like what CurrEstab does, then we have to introduce a new helper
> (for example, NET_DEC_STATS) to decrement the counter if the state of
> the close-wait socket changes in tcp_set_state().
>
> After thinking more about your question, the latter is better because
> it can easily reflect the current situation, right? What do you think?

I think you are sending not tested patches.

I suggest you use your patches first, and send tests so that we can all see
the intent, how this is supposed to work in the first place.

That would save us time.

Thank you.
On Tue, May 28, 2024 at 3:36 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, May 28, 2024 at 8:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Tue, May 28, 2024 at 1:13 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > [...]
> > >
> > > ss -t state close-wait
> >
> > Indeed, it is much better than netstat but not that good/fast enough
> > if we've already generated a lot of sockets. This command is suitable
> > for debug use, but not for frequent sampling, say, every 10 seconds.
> > More than this, RFC 1213 defines CurrEstab which should also include
> > close-wait sockets, but we don't have this one.
>
> "we don't have this one."
> You mean we do not have CurrEstab ?
> That might be user space decision to not display it from nstat
> command, in useless_number()
> (Not sure why. If someone thought it was useless, then CLOSE_WAIT
> count is even more useless...)

It has nothing to do with user applications.

Let me give one example: ss -s can show the value of 'estab', which is
derived from the /proc/net/snmp file. The corresponding CurrEstab
implemented in the kernel only counts ESTABLISHED sockets in
tcp_set_state(), not CLOSE-WAIT sockets. Why it does not count
CLOSE-WAIT sockets as the RFC says is unknown.

> > I have no intention to
> > change the CurrEstab in Linux because it has been used for a really
> > long time. So I chose to introduce a new counter in linux mib
> > definitions.
> >
> > > > [...]
> > > >
> > > > After this patch, we can see the counter by running 'cat /proc/net/netstat'
> > > > or 'nstat -s | grep CloseWait'
> > >
> > > I find this counter quite not interesting.
> > > After a few days of uptime, let say it is 52904523
> > > What can you make of this value exactly ?
> > > How do you make any correlation ?
> >
> > There are two ways of implementing this counter:
> > 1) like the counters in 'linux mib definitions', we have to 'diff' the
> > counter then we can know how many close-wait sockets generated in a
> > certain period.
>
> And what do you make of this raw information ?
>
> if it is 10000 or 20000 in a 10-second period, what conclusion do you get ?

Some buggy user applications fail to finish a socket (by calling
close()), which is a classic and common problem in production. When it
happens, many weird things can follow. If we cannot reproduce the issue
easily, we have to rely on monitoring history that records the number
of CLOSE-WAIT sockets over time.

With this counter implemented, we can record/observe its normal
behaviour all the time. It helps us in two ways:
1) We know in advance if the counter changes drastically.
2) If users report an issue, we know to pay particular attention to it.

> Receiving FIN packets is a fact of life, I see no reason to worry.
>
> > 2) like what CurrEstab does, then we have to introduce a new helper
> > (for example, NET_DEC_STATS) to decrement the counter if the state of
> > the close-wait socket changes in tcp_set_state().
> >
> > After thinking more about your question, the latter is better because
> > it can easily reflect the current situation, right? What do you think?
>
> I think you are sending not tested patches.

No. Honestly, what I've done in our private kernel is different from
this patch: I added a new counter in 'tcp mib definitions' (see diff
[1] below). It did not seem appropriate to submit such a patch to the
community, because 'tcp mib definitions' has a long history and has not
been touched for more than 10 years; I could imagine being challenged
on that. So I decided to re-implement it in 'linux mib definitions',
which is still being extended these days.

[1]:
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index adf5fd78dd50..27beab1002ce 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -144,6 +144,7 @@ enum
         TCP_MIB_INERRS,                 /* InErrs */
         TCP_MIB_OUTRSTS,                /* OutRsts */
         TCP_MIB_CSUMERRORS,             /* InCsumErrors */
+        TCP_MIB_CURRCLOSEWAIT,          /* CurrCloseWait */
         __TCP_MIB_MAX
 };

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 6c4664c681ca..5d2a175a6d35 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -157,6 +157,7 @@ static const struct snmp_mib snmp4_tcp_list[] = {
         SNMP_MIB_ITEM("InErrs", TCP_MIB_INERRS),
         SNMP_MIB_ITEM("OutRsts", TCP_MIB_OUTRSTS),
         SNMP_MIB_ITEM("InCsumErrors", TCP_MIB_CSUMERRORS),
+        SNMP_MIB_ITEM("CurrCloseWait", TCP_MIB_CURRCLOSEWAIT),
         SNMP_MIB_SENTINEL
 };

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 681b54e1f3a6..f1bbbd477cda 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2659,6 +2659,10 @@ void tcp_set_state(struct sock *sk, int state)
         default:
                 if (oldstate == TCP_ESTABLISHED)
                         TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
+                if (state == TCP_CLOSE_WAIT)
+                        TCP_INC_STATS(sock_net(sk), TCP_MIB_CURRCLOSEWAIT);
+                if (oldstate == TCP_CLOSE_WAIT)
+                        TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRCLOSEWAIT);
         }

         /* Change state AFTER socket is unhashed to avoid closed
--
2.37.3

> I suggest you use your patches first, and send tests so that we can all see
> the intent, how this is supposed to work in the first place.

This patch is not proposed out of thin air. Normally, we see two kinds
of issue/bug reports:
1) Issues we can easily reproduce, so they can be easily diagnosed.
2) Issues that happened in the past, say, last night, which cannot be
traced easily. We have to rely on a monitor or an agent to know what
happened; a sketch of such a sampling agent follows this message. What
I'm doing here targets the latter case.

A few years ago, I worked at another company and cooperated with some
engineers at Google. We all found it hard to get to the root cause of
this kind of issue (like a spike), so the only thing we could do was
trace/record more useful information to help narrow the issue down.

I'm just saying it's not effortless to deal with all kinds of possible
issues daily. Counting CLOSE-WAIT sockets as the RFC describes is one
of the reasons for this patch, but not the most important one.

Thanks for your patience, review and suggestions :)

Thanks,
Jason
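A sketch of the sampling agent described above, which periodically reads the proposed counter from /proc/net/netstat and reports how much it moved, might look like the following. The TcpExt field name TCPCloseWait is an assumption taken from this patch; the counter does not exist in any released kernel, and the code is illustrative only.

/*
 * closewait_sampler.c: every INTERVAL seconds, read the (proposed)
 * TcpExt TCPCloseWait counter from /proc/net/netstat and print the
 * delta since the previous sample.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define INTERVAL 10

static long read_tcpext(const char *name)
{
        /* /proc/net/netstat comes in pairs of lines: "TcpExt: <names...>"
         * followed by "TcpExt: <values...>"; walk both in lock-step. */
        char hdr[8192], val[8192];
        long result = -1;
        FILE *f = fopen("/proc/net/netstat", "r");

        if (!f)
                return -1;
        while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
                char *hs = NULL, *vs = NULL;
                char *h = strtok_r(hdr, " \n", &hs);
                char *v = strtok_r(val, " \n", &vs);

                while (h && v) {
                        if (strcmp(h, name) == 0) {
                                result = strtol(v, NULL, 10);
                                goto out;
                        }
                        h = strtok_r(NULL, " \n", &hs);
                        v = strtok_r(NULL, " \n", &vs);
                }
        }
out:
        fclose(f);
        return result;
}

int main(void)
{
        long prev = read_tcpext("TCPCloseWait");

        for (;;) {
                long cur;

                sleep(INTERVAL);
                cur = read_tcpext("TCPCloseWait");
                if (prev >= 0 && cur >= 0)
                        printf("CLOSE-WAIT transitions in last %ds: %ld\n",
                               INTERVAL, cur - prev);
                prev = cur;
        }
        return 0;
}

For the gauge-style variant sketched in [1] above, one would report the value itself rather than the delta, since the counter then tracks the number of sockets currently in CLOSE-WAIT.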
On Tue, May 28, 2024 at 4:34 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Tue, May 28, 2024 at 3:36 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > [...]
> >
> > "we don't have this one."
> > You mean we do not have CurrEstab ?
> > That might be user space decision to not display it from nstat
> > command, in useless_number()
> > (Not sure why. If someone thought it was useless, then CLOSE_WAIT
> > count is even more useless...)
>
> It has nothing to do with user applications.
>
> Let me give one example: ss -s can show the value of 'estab', which is
> derived from the /proc/net/snmp file.

Speaking of CurrEstab, newcomers may ask what this counter is actually
good for. Let me share an interesting issue report I once handled: one
day, most CPUs suddenly jumped to around 80% utilization and most
applications were stuck, but the phenomenon disappeared very soon.
After we deployed an agent collecting the snmp counters, we noticed
that one application had mistakenly launched a huge number of
connections concurrently; it was a bug in that application.

If even CurrEstab is that useful, a counter for CLOSE-WAIT sockets is
worth having too.

Thanks,
Jason
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index adf5fd78dd50..c0feefb4d88b 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -302,6 +302,7 @@ enum
         LINUX_MIB_TCPAOKEYNOTFOUND,             /* TCPAOKeyNotFound */
         LINUX_MIB_TCPAOGOOD,                    /* TCPAOGood */
         LINUX_MIB_TCPAODROPPEDICMPS,            /* TCPAODroppedIcmps */
+        LINUX_MIB_TCPCLOSEWAIT,                 /* TCPCloseWait */
         __LINUX_MIB_MAX
 };

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 6c4664c681ca..964897dc6eb8 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -305,6 +305,7 @@ static const struct snmp_mib snmp4_net_list[] = {
         SNMP_MIB_ITEM("TCPAOKeyNotFound", LINUX_MIB_TCPAOKEYNOTFOUND),
         SNMP_MIB_ITEM("TCPAOGood", LINUX_MIB_TCPAOGOOD),
         SNMP_MIB_ITEM("TCPAODroppedIcmps", LINUX_MIB_TCPAODROPPEDICMPS),
+        SNMP_MIB_ITEM("TCPCloseWait", LINUX_MIB_TCPCLOSEWAIT),
         SNMP_MIB_SENTINEL
 };

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 681b54e1f3a6..7abaa2660cc8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2659,6 +2659,8 @@ void tcp_set_state(struct sock *sk, int state)
         default:
                 if (oldstate == TCP_ESTABLISHED)
                         TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
+                if (state == TCP_CLOSE_WAIT)
+                        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
         }

         /* Change state AFTER socket is unhashed to avoid closed