Message ID | 20211128060102.6504-1-imagedong@tencent.com (mailing list archive)
---|---
State | Accepted
Commit | aeeecb889165617a841e939117f9a8095d0e7d80
Delegated to | Netdev Maintainers
Series | [v2,net-next] net: snmp: add statistics for tcp small queue check
Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Sun, 28 Nov 2021 14:01:02 +0800 you wrote:
> From: Menglong Dong <imagedong@tencent.com>
>
> Once tcp small queue check failed in tcp_small_queue_check(), the
> throughput of tcp will be limited, and it's hard to distinguish
> whether it is out of tcp congestion control.
>
> Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
>
> [...]

Here is the summary with links:
  - [v2,net-next] net: snmp: add statistics for tcp small queue check
    https://git.kernel.org/netdev/net-next/c/aeeecb889165

You are awesome, thank you!
On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> Once tcp small queue check failed in tcp_small_queue_check(), the
> throughput of tcp will be limited, and it's hard to distinguish
> whether it is out of tcp congestion control.
>
> Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.

Isn't this going to trigger all the time and alarm users because of the
"Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
for TCP to bake full TSQ amount of data and have it paced out onto the
wire? What's your link speed?
On Mon, Nov 29, 2021 at 7:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > Once tcp small queue check failed in tcp_small_queue_check(), the
> > throughput of tcp will be limited, and it's hard to distinguish
> > whether it is out of tcp congestion control.
> >
> > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> for TCP to bake full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?

Yes, I would be curious to have some instructions on how this new SNMP
variable can be used, in a concrete case.

Like, how getting these SNMP values can translate to an action, giving
more throughput?
On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > Once tcp small queue check failed in tcp_small_queue_check(), the
> > throughput of tcp will be limited, and it's hard to distinguish
> > whether it is out of tcp congestion control.
> >
> > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> for TCP to bake full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?

Well, it's a little complex. In my case, there is a guest in kvm, and
virtio_net is used with napi_tx enabled.

With napi_tx enabled, the skb won't be orphaned after it is passed to
virtio_net, until it is released. The point is that the sending interrupt
of virtio_net will be turned off, and the skb can't be released until the
next net_rx interrupt comes. So, wmem_alloc can't decrease in time, and
the bandwidth is limited. When this happens, the bandwidth can decrease
from 500M to 10M.

In fact, this issue of napi_tx is fixed in this commit:
https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/

I added this statistic to monitor the sending failures (maybe better called
sending delays) caused by the qdisc and the net_device. When something
happens, maybe users can raise '/proc/sys/net/ipv4/tcp_pacing_ss_ratio'
to get better bandwidth.

Thanks!
Menglong Dong
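The tuning mentioned at the end of the message above goes through procfs. A minimal userspace sketch of inspecting that knob follows; the description of the ratio is paraphrased from the kernel's ip-sysctl documentation, the program itself is illustrative and not part of the thread, and the right value to write is workload-dependent.

/* Sketch: read net.ipv4.tcp_pacing_ss_ratio via procfs.  Raising it means
 * writing a larger percentage to the same file (root required), equivalent
 * to `sysctl -w net.ipv4.tcp_pacing_ss_ratio=<value>`.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/proc/sys/net/ipv4/tcp_pacing_ss_ratio";
	FILE *f = fopen(path, "r");
	int ratio;

	if (!f || fscanf(f, "%d", &ratio) != 1) {
		perror(path);
		return 1;
	}
	fclose(f);

	/* tcp_pacing_ss_ratio is the percentage applied to the estimated
	 * rate while in slow start; a higher value raises the pacing rate
	 * and, with it, the TSQ limit.
	 */
	printf("net.ipv4.tcp_pacing_ss_ratio = %d%%\n", ratio);
	return 0;
}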
On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > throughput of tcp will be limited, and it's hard to distinguish
> > > whether it is out of tcp congestion control.
> > >
> > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> >
> > Isn't this going to trigger all the time and alarm users because of the
> > "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> > for TCP to bake full TSQ amount of data and have it paced out onto the
> > wire? What's your link speed?
>
> Well, it's a little complex. In my case, there is a guest in kvm, and
> virtio_net is used with napi_tx enabled.
>
> With napi_tx enabled, the skb won't be orphaned after it is passed to
> virtio_net, until it is released. The point is that the sending interrupt
> of virtio_net will be turned off, and the skb can't be released until the
> next net_rx interrupt comes. So, wmem_alloc can't decrease in time, and
> the bandwidth is limited. When this happens, the bandwidth can decrease
> from 500M to 10M.
>
> In fact, this issue of napi_tx is fixed in this commit:
> https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
>
> I added this statistic to monitor the sending failures (maybe better called
> sending delays) caused by the qdisc and the net_device. When something
> happens, maybe users can raise '/proc/sys/net/ipv4/tcp_pacing_ss_ratio'
> to get better bandwidth.

Sounds very second-order and particular to a buggy driver :/
Let's see what Eric says but I vote revert.
On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > > throughput of tcp will be limited, and it's hard to distinguish
> > > > whether it is out of tcp congestion control.
> > > >
> > > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> > >
> > > Isn't this going to trigger all the time and alarm users because of the
> > > "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> > > for TCP to bake full TSQ amount of data and have it paced out onto the
> > > wire? What's your link speed?
> >
> > Well, it's a little complex. In my case, there is a guest in kvm, and
> > virtio_net is used with napi_tx enabled.
> >
> > With napi_tx enabled, the skb won't be orphaned after it is passed to
> > virtio_net, until it is released. The point is that the sending interrupt
> > of virtio_net will be turned off, and the skb can't be released until the
> > next net_rx interrupt comes. So, wmem_alloc can't decrease in time, and
> > the bandwidth is limited. When this happens, the bandwidth can decrease
> > from 500M to 10M.
> >
> > In fact, this issue of napi_tx is fixed in this commit:
> > https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> >
> > I added this statistic to monitor the sending failures (maybe better called
> > sending delays) caused by the qdisc and the net_device. When something
> > happens, maybe users can raise '/proc/sys/net/ipv4/tcp_pacing_ss_ratio'
> > to get better bandwidth.
>
> Sounds very second-order and particular to a buggy driver :/
> Let's see what Eric says but I vote revert.

I did some tests yesterday, using one high speed TCP_STREAM (~90Gbit),
and got plenty of increments when using the pfifo_fast qdisc.
Yet speed was nominal (the bottleneck is the copyout() cost at the receiver).

I got few counter increments when the qdisc is fq, as expected
(because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
limit_output_bytes for paced flows").

So this new SNMP counter is not a proxy for the kind of problems that
a buggy driver would trigger.

I also suggest we revert this patch.

Thanks !
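To make the fq vs. pfifo_fast observation concrete, here is a simplified, userspace-compilable paraphrase of the limit computation in tcp_small_queue_check(). It is a sketch, not the kernel source: the function name, parameters, and example numbers are illustrative only. It shows why, since commit c73e5807e4f6, a paced flow (fq) is not clamped by tcp_limit_output_bytes and therefore trips the check far less often.

/* Simplified model of the TSQ limit.  Real kernel code also allows extra
 * slack when the rtx queue is empty; that detail is omitted here.
 */
#include <stdbool.h>
#include <stdio.h>

static bool tsq_would_defer(unsigned long wmem_alloc, unsigned long truesize,
			    unsigned long pacing_rate, int pacing_shift,
			    bool pacing_enabled,
			    unsigned long limit_output_bytes,
			    unsigned int factor)
{
	unsigned long limit;

	/* Roughly ~1ms of data at the pacing rate (default shift of 10),
	 * but never less than two full skbs.
	 */
	limit = pacing_rate >> pacing_shift;
	if (limit < 2 * truesize)
		limit = 2 * truesize;

	/* Only non-paced flows are clamped by tcp_limit_output_bytes. */
	if (!pacing_enabled && limit > limit_output_bytes)
		limit = limit_output_bytes;

	limit <<= factor;

	/* This is the condition where the patch above bumps the counter. */
	return wmem_alloc > limit;
}

int main(void)
{
	/* Hypothetical numbers: ~10 Gbit/s pacing rate, ~1.2 MB queued
	 * below the socket, 64 KB skb truesize, 1 MB limit_output_bytes.
	 */
	unsigned long rate = 1250000000UL;	/* bytes per second */

	printf("paced (fq) defers:      %d\n",
	       tsq_would_defer(1200000UL, 64UL * 1024, rate, 10, true,
			       1UL << 20, 0));
	printf("non-paced flow defers:  %d\n",
	       tsq_would_defer(1200000UL, 64UL * 1024, rate, 10, false,
			       1UL << 20, 0));
	return 0;
}

With these example numbers the paced flow stays under its rate-derived limit, while the non-paced flow is clamped to 1 MB and would increment the counter.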
On Tue, Nov 30, 2021 at 11:56 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > > > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > > > throughput of tcp will be limited, and it's hard to distinguish
> > > > > whether it is out of tcp congestion control.
> > > > >
> > > > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> > > >
> > > > Isn't this going to trigger all the time and alarm users because of the
> > > > "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> > > > for TCP to bake full TSQ amount of data and have it paced out onto the
> > > > wire? What's your link speed?
> > >
> > > Well, it's a little complex. In my case, there is a guest in kvm, and
> > > virtio_net is used with napi_tx enabled.
> > >
> > > With napi_tx enabled, the skb won't be orphaned after it is passed to
> > > virtio_net, until it is released. The point is that the sending interrupt
> > > of virtio_net will be turned off, and the skb can't be released until the
> > > next net_rx interrupt comes. So, wmem_alloc can't decrease in time, and
> > > the bandwidth is limited. When this happens, the bandwidth can decrease
> > > from 500M to 10M.
> > >
> > > In fact, this issue of napi_tx is fixed in this commit:
> > > https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> > >
> > > I added this statistic to monitor the sending failures (maybe better called
> > > sending delays) caused by the qdisc and the net_device. When something
> > > happens, maybe users can raise '/proc/sys/net/ipv4/tcp_pacing_ss_ratio'
> > > to get better bandwidth.
> >
> > Sounds very second-order and particular to a buggy driver :/
> > Let's see what Eric says but I vote revert.
>
> I did some tests yesterday, using one high speed TCP_STREAM (~90Gbit),
> and got plenty of increments when using the pfifo_fast qdisc.
> Yet speed was nominal (the bottleneck is the copyout() cost at the receiver).
>
> I got few counter increments when the qdisc is fq, as expected
> (because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
> limit_output_bytes for paced flows").
>
> So this new SNMP counter is not a proxy for the kind of problems that
> a buggy driver would trigger.
>
> I also suggest we revert this patch.

Seems this SNMP counter is indeed not suitable... I vote to revert too.
(Seems Jakub already did the revert, thanks~)

In fact, I'm a little curious that this patch was applied directly.
I used to receive emails with the message 'applied'.

>
> Thanks !
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 904909d020e2..e32ec6932e82 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -292,6 +292,7 @@ enum
 	LINUX_MIB_TCPDSACKIGNOREDDUBIOUS,	/* TCPDSACKIgnoredDubious */
 	LINUX_MIB_TCPMIGRATEREQSUCCESS,		/* TCPMigrateReqSuccess */
 	LINUX_MIB_TCPMIGRATEREQFAILURE,		/* TCPMigrateReqFailure */
+	LINUX_MIB_TCPSMALLQUEUEFAILURE,		/* TCPSmallQueueFailure */
 	__LINUX_MIB_MAX
 };
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index f30273afb539..43b7a77cd6b4 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -297,6 +297,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPDSACKIgnoredDubious", LINUX_MIB_TCPDSACKIGNOREDDUBIOUS),
 	SNMP_MIB_ITEM("TCPMigrateReqSuccess", LINUX_MIB_TCPMIGRATEREQSUCCESS),
 	SNMP_MIB_ITEM("TCPMigrateReqFailure", LINUX_MIB_TCPMIGRATEREQFAILURE),
+	SNMP_MIB_ITEM("TCPSmallQueueFailure", LINUX_MIB_TCPSMALLQUEUEFAILURE),
 	SNMP_MIB_SENTINEL
 };
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2e6e5a70168e..835a556a597a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2524,8 +2524,11 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 		 * test again the condition.
 		 */
 		smp_mb__after_atomic();
-		if (refcount_read(&sk->sk_wmem_alloc) > limit)
+		if (refcount_read(&sk->sk_wmem_alloc) > limit) {
+			NET_INC_STATS(sock_net(sk),
+				      LINUX_MIB_TCPSMALLQUEUEFAILURE);
 			return true;
+		}
 	}
 	return false;
 }
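For reference, a small userspace sketch (not part of the patch or the thread) of how the counter added above would be read back on a kernel that carries this patch: the TcpExt group in /proc/net/netstat gains a TCPSmallQueueFailure column, which tools like nstat also expose.

/* Scan /proc/net/netstat for the TcpExt header/value line pair and print
 * the TCPSmallQueueFailure counter, if present on the running kernel.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char hdr[8192], val[8192];
	FILE *f = fopen("/proc/net/netstat", "r");

	if (!f) {
		perror("/proc/net/netstat");
		return 1;
	}

	/* The file alternates header/value line pairs per MIB group. */
	while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
		char *name, *value, *hsave, *vsave;

		if (strncmp(hdr, "TcpExt:", 7) != 0)
			continue;

		name = strtok_r(hdr, " \n", &hsave);
		value = strtok_r(val, " \n", &vsave);
		while (name && value) {
			if (strcmp(name, "TCPSmallQueueFailure") == 0)
				printf("TCPSmallQueueFailure = %s\n", value);
			name = strtok_r(NULL, " \n", &hsave);
			value = strtok_r(NULL, " \n", &vsave);
		}
	}
	fclose(f);
	return 0;
}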