Message ID | 20240528021149.6186-1-kerneljasonxing@gmail.com
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next] tcp: introduce a new MIB for CLOSE-WAIT sockets
On Tue, May 28, 2024 at 4:12 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> CLOSE-WAIT is a relatively special state which "represents waiting for
> a connection termination request from the local user" (RFC 793). Some
> issues may happen because of unexpected/too many CLOSE-WAIT sockets,
> like user application mistakenly handling close() syscall.
>
> We want to trace this total number of CLOSE-WAIT sockets fastly and
> frequently instead of resorting to displaying them altogether by using:
>
> netstat -anlp | grep CLOSE_WAIT

This is horribly expensive.

Why asking af_unix and program names ?
You want to count some TCP sockets in a given state, right ?
iproute2 interface (inet_diag) can do the filtering in the kernel,
saving a lot of cycles.

ss -t state close-wait

> or something like this, which does harm to the performance especially in
> heavy load. That's the reason why I chose to introduce this new MIB counter
> like CurrEstab does. It do help us diagnose/find issues in production.
>
> Besides, in the group of TCP_MIB_* defined by RFC 1213, TCP_MIB_CURRESTAB
> should include both ESTABLISHED and CLOSE-WAIT sockets in theory:
>
> "tcpCurrEstab OBJECT-TYPE
> ...
> The number of TCP connections for which the current state
> is either ESTABLISHED or CLOSE-WAIT."
>
> Apparently, at least since 2005, we don't count CLOSE-WAIT sockets. I think
> there is a need to count it separately to avoid polluting the existing
> TCP_MIB_CURRESTAB counter.
>
> After this patch, we can see the counter by running 'cat /proc/net/netstat'
> or 'nstat -s | grep CloseWait'

I find this counter quite not interesting.
After a few days of uptime, let say it is 52904523
What can you make of this value exactly ?
How do you make any correlation ?

> Suggested-by: Yongming Liu <yomiliu@tencent.com>
> Suggested-by: Wangzi Yong <curuwang@tencent.com>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  include/uapi/linux/snmp.h | 1 +
>  net/ipv4/proc.c           | 1 +
>  net/ipv4/tcp.c            | 2 ++
>  3 files changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
> index adf5fd78dd50..c0feefb4d88b 100644
> --- a/include/uapi/linux/snmp.h
> +++ b/include/uapi/linux/snmp.h
> @@ -302,6 +302,7 @@ enum
>          LINUX_MIB_TCPAOKEYNOTFOUND,             /* TCPAOKeyNotFound */
>          LINUX_MIB_TCPAOGOOD,                    /* TCPAOGood */
>          LINUX_MIB_TCPAODROPPEDICMPS,            /* TCPAODroppedIcmps */
> +        LINUX_MIB_TCPCLOSEWAIT,                 /* TCPCloseWait */
>          __LINUX_MIB_MAX
>  };
>
> diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
> index 6c4664c681ca..964897dc6eb8 100644
> --- a/net/ipv4/proc.c
> +++ b/net/ipv4/proc.c
> @@ -305,6 +305,7 @@ static const struct snmp_mib snmp4_net_list[] = {
>          SNMP_MIB_ITEM("TCPAOKeyNotFound", LINUX_MIB_TCPAOKEYNOTFOUND),
>          SNMP_MIB_ITEM("TCPAOGood", LINUX_MIB_TCPAOGOOD),
>          SNMP_MIB_ITEM("TCPAODroppedIcmps", LINUX_MIB_TCPAODROPPEDICMPS),
> +        SNMP_MIB_ITEM("TCPCloseWait", LINUX_MIB_TCPCLOSEWAIT),
>          SNMP_MIB_SENTINEL
>  };
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 681b54e1f3a6..7abaa2660cc8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2659,6 +2659,8 @@ void tcp_set_state(struct sock *sk, int state)
>          default:
>                  if (oldstate == TCP_ESTABLISHED)
>                          TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
> +                if (state == TCP_CLOSE_WAIT)
> +                        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
>          }
>
>          /* Change state AFTER socket is unhashed to avoid closed
> --
> 2.37.3
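For context, the in-kernel filtering behind the suggested "ss -t state close-wait" is the inet_diag/sock_diag netlink interface: the kernel walks its socket tables and only reports sockets whose state matches the requested mask, so userspace never enumerates every socket. A minimal userspace sketch of that query follows; it is an illustration rather than part of the patch, assumes IPv4 only, and omits most error handling.

/*
 * count_close_wait.c: ask the kernel, over the sock_diag netlink
 * interface, to dump only TCP sockets currently in CLOSE-WAIT and
 * count the replies. IPv4 only, minimal error handling.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/inet_diag.h>

/* 8 is CLOSE-WAIT in the kernel's TCP state numbering
 * (include/net/tcp_states.h); spelled out to avoid depending on
 * libc headers exposing the state enum. */
#define CLOSE_WAIT_STATE 8

int main(void)
{
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
        struct {
                struct nlmsghdr nlh;
                struct inet_diag_req_v2 req;
        } msg = {
                .nlh = {
                        .nlmsg_len   = sizeof(msg),
                        .nlmsg_type  = SOCK_DIAG_BY_FAMILY,
                        .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
                },
                .req = {
                        .sdiag_family   = AF_INET,
                        .sdiag_protocol = IPPROTO_TCP,
                        /* dump only sockets whose state is CLOSE-WAIT */
                        .idiag_states   = 1 << CLOSE_WAIT_STATE,
                },
        };
        char buf[32768];
        long count = 0;
        int done = 0;

        if (fd < 0 || send(fd, &msg, sizeof(msg), 0) < 0)
                return 1;

        while (!done) {
                ssize_t len = recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *h = (struct nlmsghdr *)buf;

                if (len <= 0)
                        break;
                for (; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
                        if (h->nlmsg_type == NLMSG_DONE ||
                            h->nlmsg_type == NLMSG_ERROR) {
                                done = 1;
                                break;
                        }
                        if (h->nlmsg_type == SOCK_DIAG_BY_FAMILY)
                                count++;        /* one message per matching socket */
                }
        }
        printf("CLOSE-WAIT sockets: %ld\n", count);
        close(fd);
        return 0;
}

This is essentially what ss does internally; the cost still scales with the number of CLOSE-WAIT sockets reported, which is the point Jason raises next.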
Hello Eric,

On Tue, May 28, 2024 at 1:13 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, May 28, 2024 at 4:12 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > [...]
> >
> > We want to trace this total number of CLOSE-WAIT sockets fastly and
> > frequently instead of resorting to displaying them altogether by using:
> >
> > netstat -anlp | grep CLOSE_WAIT
>
> This is horribly expensive.

Yes.

> Why asking af_unix and program names ?
> You want to count some TCP sockets in a given state, right ?
> iproute2 interface (inet_diag) can do the filtering in the kernel,
> saving a lot of cycles.
>
> ss -t state close-wait

Indeed, it is much better than netstat but not that good/fast enough
if we've already generated a lot of sockets. This command is suitable
for debug use, but not for frequent sampling, say, every 10 seconds.
More than this, RFC 1213 defines CurrEstab which should also include
close-wait sockets, but we don't have this one. I have no intention to
change the CurrEstab in Linux because it has been used for a really
long time. So I chose to introduce a new counter in linux mib
definitions.

> > [...]
> >
> > After this patch, we can see the counter by running 'cat /proc/net/netstat'
> > or 'nstat -s | grep CloseWait'
>
> I find this counter quite not interesting.
> After a few days of uptime, let say it is 52904523
> What can you make of this value exactly ?
> How do you make any correlation ?

There are two ways of implementing this counter:
1) like the counters in 'linux mib definitions', we have to 'diff' the
counter then we can know how many close-wait sockets generated in a
certain period.
2) like what CurrEstab does, then we have to introduce a new helper
(for example, NET_DEC_STATS) to decrement the counter if the state of
the close-wait socket changes in tcp_set_state() (a sketch of such a
helper follows this mail).

After thinking more about your question, the latter is better because
it can easily reflect the current situation, right? What do you think?

Thanks,
Jason
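The NET_DEC_STATS helper mentioned in option 2) does not exist in mainline. A rough sketch of what it could look like, modelled on the existing TCP_DEC_STATS and SNMP_DEC_STATS macros, and of the paired update in tcp_set_state(), might be the following. This is a hypothetical fragment, not part of the submitted patch.

/* Hypothetical helper, modelled on TCP_DEC_STATS in include/net/tcp.h;
 * it is not an existing mainline macro. */
#define NET_DEC_STATS(net, field)  SNMP_DEC_STATS((net)->mib.net_statistics, field)

/* ...and the paired update inside the default: arm of the switch in
 * tcp_set_state(), so the counter tracks sockets currently in CLOSE-WAIT: */
        if (state == TCP_CLOSE_WAIT)
                NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
        if (oldstate == TCP_CLOSE_WAIT)
                NET_DEC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);

With the increment and decrement paired this way, the counter reads as the number of sockets currently sitting in CLOSE-WAIT rather than a monotonically increasing total, which is the behaviour option 2) describes.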
On Tue, May 28, 2024 at 8:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Tue, May 28, 2024 at 1:13 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, May 28, 2024 at 4:12 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > [...]
> > >
> > > netstat -anlp | grep CLOSE_WAIT
> >
> > This is horribly expensive.
>
> Yes.
>
> > Why asking af_unix and program names ?
> > You want to count some TCP sockets in a given state, right ?
> > iproute2 interface (inet_diag) can do the filtering in the kernel,
> > saving a lot of cycles.
> >
> > ss -t state close-wait
>
> Indeed, it is much better than netstat but not that good/fast enough
> if we've already generated a lot of sockets. This command is suitable
> for debug use, but not for frequent sampling, say, every 10 seconds.
> More than this, RFC 1213 defines CurrEstab which should also include
> close-wait sockets, but we don't have this one.

"we don't have this one."

You mean we do not have CurrEstab ?
That might be user space decision to not display it from nstat
command, in useless_number()
(Not sure why. If someone thought it was useless, then CLOSE_WAIT
count is even more useless...)

> I have no intention to
> change the CurrEstab in Linux because it has been used for a really
> long time. So I chose to introduce a new counter in linux mib
> definitions.
>
> > > [...]
> > >
> > > After this patch, we can see the counter by running 'cat /proc/net/netstat'
> > > or 'nstat -s | grep CloseWait'
> >
> > I find this counter quite not interesting.
> > After a few days of uptime, let say it is 52904523
> > What can you make of this value exactly ?
> > How do you make any correlation ?
>
> There are two ways of implementing this counter:
> 1) like the counters in 'linux mib definitions', we have to 'diff' the
> counter then we can know how many close-wait sockets generated in a
> certain period.

And what do you make of this raw information ?

if it is 10000 or 20000 in a 10-second period, what conclusion do you get ?

Receiving FIN packets is a fact of life, I see no reason to worry.

> 2) like what CurrEstab does, then we have to introduce a new helper
> (for example, NET_DEC_STATS) to decrement the counter if the state of
> the close-wait socket changes in tcp_set_state().
>
> After thinking more about your question, the latter is better because
> it can easily reflect the current situation, right? What do you think?

I think you are sending not tested patches.

I suggest you use your patches first, and send tests so that we can all see
the intent, how this is supposed to work in the first place.

That would save us time.

Thank you.
On Tue, May 28, 2024 at 3:36 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, May 28, 2024 at 8:48 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Tue, May 28, 2024 at 1:13 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > [...]
> > >
> > > ss -t state close-wait
> >
> > Indeed, it is much better than netstat but not that good/fast enough
> > if we've already generated a lot of sockets. This command is suitable
> > for debug use, but not for frequent sampling, say, every 10 seconds.
> > More than this, RFC 1213 defines CurrEstab which should also include
> > close-wait sockets, but we don't have this one.
>
> "we don't have this one."
> You mean we do not have CurrEstab ?
> That might be user space decision to not display it from nstat
> command, in useless_number()
> (Not sure why. If someone thought it was useless, then CLOSE_WAIT
> count is even more useless...)

It has nothing to do with user applications.

Let me give one example: ss -s can show the value of 'estab', which is
derived from the /proc/net/snmp file. The corresponding CurrEstab
implemented in the kernel only counts ESTABLISHED sockets in
tcp_set_state(), not CLOSE-WAIT sockets. Why it does not count
CLOSE-WAIT sockets as the RFC says is unknown.

> > I have no intention to
> > change the CurrEstab in Linux because it has been used for a really
> > long time. So I chose to introduce a new counter in linux mib
> > definitions.
> >
> > > > [...]
> > > >
> > > > After this patch, we can see the counter by running 'cat /proc/net/netstat'
> > > > or 'nstat -s | grep CloseWait'
> > >
> > > I find this counter quite not interesting.
> > > After a few days of uptime, let say it is 52904523
> > > What can you make of this value exactly ?
> > > How do you make any correlation ?
> >
> > There are two ways of implementing this counter:
> > 1) like the counters in 'linux mib definitions', we have to 'diff' the
> > counter then we can know how many close-wait sockets generated in a
> > certain period.
>
> And what do you make of this raw information ?
>
> if it is 10000 or 20000 in a 10-second period, what conclusion do you get ?

Some buggy user applications fail to finish a socket (by calling
close()), which is a classic and common problem in production. When it
happens, many weird things can follow. If we cannot reproduce the issue
easily, we have to rely on monitoring history that records the number
of CLOSE-WAIT sockets over time.

With this counter implemented, we can record/observe its normal
behaviour all the time. It helps us in two ways:
1) We know in advance if the counter changes drastically.
2) If users report an issue, we know to pay particular attention to it.

> Receiving FIN packets is a fact of life, I see no reason to worry.
>
> > 2) like what CurrEstab does, then we have to introduce a new helper
> > (for example, NET_DEC_STATS) to decrement the counter if the state of
> > the close-wait socket changes in tcp_set_state().
> >
> > After thinking more about your question, the latter is better because
> > it can easily reflect the current situation, right? What do you think?
>
> I think you are sending not tested patches.

No. Honestly, what I've done in our private kernel is different from
this patch: I added a new counter in 'tcp mib definitions' (see diff
[1] below). It did not seem appropriate to submit such a patch to the
community, because 'tcp mib definitions' has a long history and has not
been touched for more than 10 years; I could imagine being challenged
on that. So I decided to re-implement it in 'linux mib definitions',
which is still being extended these days.

[1]:
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index adf5fd78dd50..27beab1002ce 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -144,6 +144,7 @@ enum
         TCP_MIB_INERRS,                 /* InErrs */
         TCP_MIB_OUTRSTS,                /* OutRsts */
         TCP_MIB_CSUMERRORS,             /* InCsumErrors */
+        TCP_MIB_CURRCLOSEWAIT,          /* CurrCloseWait */
         __TCP_MIB_MAX
 };

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 6c4664c681ca..5d2a175a6d35 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -157,6 +157,7 @@ static const struct snmp_mib snmp4_tcp_list[] = {
         SNMP_MIB_ITEM("InErrs", TCP_MIB_INERRS),
         SNMP_MIB_ITEM("OutRsts", TCP_MIB_OUTRSTS),
         SNMP_MIB_ITEM("InCsumErrors", TCP_MIB_CSUMERRORS),
+        SNMP_MIB_ITEM("CurrCloseWait", TCP_MIB_CURRCLOSEWAIT),
         SNMP_MIB_SENTINEL
 };

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 681b54e1f3a6..f1bbbd477cda 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2659,6 +2659,10 @@ void tcp_set_state(struct sock *sk, int state)
         default:
                 if (oldstate == TCP_ESTABLISHED)
                         TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
+                if (state == TCP_CLOSE_WAIT)
+                        TCP_INC_STATS(sock_net(sk), TCP_MIB_CURRCLOSEWAIT);
+                if (oldstate == TCP_CLOSE_WAIT)
+                        TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRCLOSEWAIT);
         }

         /* Change state AFTER socket is unhashed to avoid closed
--
2.37.3

> I suggest you use your patches first, and send tests so that we can all see
> the intent, how this is supposed to work in the first place.

This patch is not proposed out of thin air. Normally, we see two kinds
of issue/bug reports:
1) Issues we can easily reproduce, so they can be easily diagnosed.
2) Issues that happened in the past, say, last night, which cannot be
traced easily. We have to rely on a monitor or an agent to know what
happened; a sketch of such a sampling agent follows this message. What
I'm doing here targets the latter case.

A few years ago, I worked at another company and cooperated with some
engineers at Google. We all found it hard to get to the root cause of
this kind of issue (like a spike), so the only thing we could do was
trace/record more useful information to help narrow the issue down.

I'm just saying it's not effortless to deal with all kinds of possible
issues daily. Counting CLOSE-WAIT sockets as the RFC describes is one
of the reasons for this patch, but not the most important one.

Thanks for your patience, review and suggestions :)

Thanks,
Jason
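A sketch of the sampling agent described above, which periodically reads the proposed counter from /proc/net/netstat and reports how much it moved, might look like the following. The TcpExt field name TCPCloseWait is an assumption taken from this patch; the counter does not exist in any released kernel, and the code is illustrative only.

/*
 * closewait_sampler.c: every INTERVAL seconds, read the (proposed)
 * TcpExt TCPCloseWait counter from /proc/net/netstat and print the
 * delta since the previous sample.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define INTERVAL 10

static long read_tcpext(const char *name)
{
        /* /proc/net/netstat comes in pairs of lines: "TcpExt: <names...>"
         * followed by "TcpExt: <values...>"; walk both in lock-step. */
        char hdr[8192], val[8192];
        long result = -1;
        FILE *f = fopen("/proc/net/netstat", "r");

        if (!f)
                return -1;
        while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
                char *hs = NULL, *vs = NULL;
                char *h = strtok_r(hdr, " \n", &hs);
                char *v = strtok_r(val, " \n", &vs);

                while (h && v) {
                        if (strcmp(h, name) == 0) {
                                result = strtol(v, NULL, 10);
                                goto out;
                        }
                        h = strtok_r(NULL, " \n", &hs);
                        v = strtok_r(NULL, " \n", &vs);
                }
        }
out:
        fclose(f);
        return result;
}

int main(void)
{
        long prev = read_tcpext("TCPCloseWait");

        for (;;) {
                long cur;

                sleep(INTERVAL);
                cur = read_tcpext("TCPCloseWait");
                if (prev >= 0 && cur >= 0)
                        printf("CLOSE-WAIT transitions in last %ds: %ld\n",
                               INTERVAL, cur - prev);
                prev = cur;
        }
        return 0;
}

For the gauge-style variant sketched in [1] above, one would report the value itself rather than the delta, since the counter then tracks the number of sockets currently in CLOSE-WAIT.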
On Tue, May 28, 2024 at 4:34 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Tue, May 28, 2024 at 3:36 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > [...]
> >
> > "we don't have this one."
> > You mean we do not have CurrEstab ?
> > That might be user space decision to not display it from nstat
> > command, in useless_number()
> > (Not sure why. If someone thought it was useless, then CLOSE_WAIT
> > count is even more useless...)
>
> It has nothing to do with user applications.
>
> Let me give one example: ss -s can show the value of 'estab', which is
> derived from the /proc/net/snmp file.

Speaking of CurrEstab, newcomers may ask what this counter is actually
good for. Let me share an interesting issue report I once handled: one
day, most CPUs suddenly jumped to around 80% utilization and most
applications were stuck, but the phenomenon disappeared very soon.
After we deployed an agent collecting the snmp counters, we noticed
that one application had mistakenly launched a huge number of
connections concurrently; it was a bug in that application.

If even CurrEstab is that useful, a counter for CLOSE-WAIT sockets is
worth having too.

Thanks,
Jason
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index adf5fd78dd50..c0feefb4d88b 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -302,6 +302,7 @@ enum
         LINUX_MIB_TCPAOKEYNOTFOUND,             /* TCPAOKeyNotFound */
         LINUX_MIB_TCPAOGOOD,                    /* TCPAOGood */
         LINUX_MIB_TCPAODROPPEDICMPS,            /* TCPAODroppedIcmps */
+        LINUX_MIB_TCPCLOSEWAIT,                 /* TCPCloseWait */
         __LINUX_MIB_MAX
 };

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 6c4664c681ca..964897dc6eb8 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -305,6 +305,7 @@ static const struct snmp_mib snmp4_net_list[] = {
         SNMP_MIB_ITEM("TCPAOKeyNotFound", LINUX_MIB_TCPAOKEYNOTFOUND),
         SNMP_MIB_ITEM("TCPAOGood", LINUX_MIB_TCPAOGOOD),
         SNMP_MIB_ITEM("TCPAODroppedIcmps", LINUX_MIB_TCPAODROPPEDICMPS),
+        SNMP_MIB_ITEM("TCPCloseWait", LINUX_MIB_TCPCLOSEWAIT),
         SNMP_MIB_SENTINEL
 };

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 681b54e1f3a6..7abaa2660cc8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2659,6 +2659,8 @@ void tcp_set_state(struct sock *sk, int state)
         default:
                 if (oldstate == TCP_ESTABLISHED)
                         TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
+                if (state == TCP_CLOSE_WAIT)
+                        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPCLOSEWAIT);
         }

         /* Change state AFTER socket is unhashed to avoid closed