diff mbox series

[v2,net-next] net: snmp: add statistics for tcp small queue check

Message ID 20211128060102.6504-1-imagedong@tencent.com (mailing list archive)
State Accepted
Commit aeeecb889165617a841e939117f9a8095d0e7d80
Delegated to: Netdev Maintainers
Headers show
Series [v2,net-next] net: snmp: add statistics for tcp small queue check | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5088 this patch: 5088
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 889 this patch: 889
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5231 this patch: 5231
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Menglong Dong Nov. 28, 2021, 6:01 a.m. UTC
From: Menglong Dong <imagedong@tencent.com>

Once tcp small queue check failed in tcp_small_queue_check(), the
throughput of tcp will be limited, and it's hard to distinguish
whether it is out of tcp congestion control.

Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
v2:
- use NET_INC_STATS() instead of __NET_INC_STATS()
---
 include/uapi/linux/snmp.h | 1 +
 net/ipv4/proc.c           | 1 +
 net/ipv4/tcp_output.c     | 5 ++++-
 3 files changed, 6 insertions(+), 1 deletion(-)

Comments

patchwork-bot+netdevbpf@kernel.org Nov. 29, 2021, 1:10 p.m. UTC | #1
Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Sun, 28 Nov 2021 14:01:02 +0800 you wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> Once tcp small queue check failed in tcp_small_queue_check(), the
> throughput of tcp will be limited, and it's hard to distinguish
> whether it is out of tcp congestion control.
> 
> Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> 
> [...]

Here is the summary with links:
  - [v2,net-next] net: snmp: add statistics for tcp small queue check
    https://git.kernel.org/netdev/net-next/c/aeeecb889165

You are awesome, thank you!
Jakub Kicinski Nov. 29, 2021, 3:57 p.m. UTC | #2
On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> Once tcp small queue check failed in tcp_small_queue_check(), the
> throughput of tcp will be limited, and it's hard to distinguish
> whether it is out of tcp congestion control.
> 
> Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.

Isn't this going to trigger all the time and alarm users because of the
"Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
for TCP to bake full TSQ amount of data and have it paced out onto the
wire? What's your link speed?
Eric Dumazet Nov. 29, 2021, 4:37 p.m. UTC | #3
On Mon, Nov 29, 2021 at 7:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > Once tcp small queue check failed in tcp_small_queue_check(), the
> > throughput of tcp will be limited, and it's hard to distinguish
> > whether it is out of tcp congestion control.
> >
> > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> for TCP to bake full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?

Yes, I would be curious to have some instructions on how this new SNMP
variable can be used,
in a concrete case.

Like, how getting these SNMP values can translate to an action, giving
more throughput ?
Menglong Dong Nov. 30, 2021, 2:36 p.m. UTC | #4
On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > Once tcp small queue check failed in tcp_small_queue_check(), the
> > throughput of tcp will be limited, and it's hard to distinguish
> > whether it is out of tcp congestion control.
> >
> > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> for TCP to bake full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?

Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
is used with napi_tx enabled.

With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
until it is released. The point is that the sending interrupt of
virtio_net will be
turned off and the skb can't be released until the next net_rx interrupt comes.
So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
this happens, the bandwidth can decrease from 500M to 10M.

In fact, this issue of uapi_tx is fixed in this commit:
https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/

I added this statistics to monitor the sending failure (may be called
sending delay)
caused by qdisc and net_device. When something happen, maybe users can
raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.

Thanks!
Menglong Dong
Jakub Kicinski Nov. 30, 2021, 3:23 p.m. UTC | #5
On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:  
> > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > throughput of tcp will be limited, and it's hard to distinguish
> > > whether it is out of tcp congestion control.
> > >
> > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.  
> >
> > Isn't this going to trigger all the time and alarm users because of the
> > "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> > for TCP to bake full TSQ amount of data and have it paced out onto the
> > wire? What's your link speed?  
> 
> Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
> is used with napi_tx enabled.
> 
> With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
> until it is released. The point is that the sending interrupt of
> virtio_net will be
> turned off and the skb can't be released until the next net_rx interrupt comes.
> So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
> this happens, the bandwidth can decrease from 500M to 10M.
> 
> In fact, this issue of uapi_tx is fixed in this commit:
> https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> 
> I added this statistics to monitor the sending failure (may be called
> sending delay)
> caused by qdisc and net_device. When something happen, maybe users can
> raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.

Sounds very second-order and particular to a buggy driver :/
Let's see what Eric says but I vote revert.
Eric Dumazet Nov. 30, 2021, 3:55 p.m. UTC | #6
On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > > throughput of tcp will be limited, and it's hard to distinguish
> > > > whether it is out of tcp congestion control.
> > > >
> > > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> > >
> > > Isn't this going to trigger all the time and alarm users because of the
> > > "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> > > for TCP to bake full TSQ amount of data and have it paced out onto the
> > > wire? What's your link speed?
> >
> > Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
> > is used with napi_tx enabled.
> >
> > With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
> > until it is released. The point is that the sending interrupt of
> > virtio_net will be
> > turned off and the skb can't be released until the next net_rx interrupt comes.
> > So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
> > this happens, the bandwidth can decrease from 500M to 10M.
> >
> > In fact, this issue of uapi_tx is fixed in this commit:
> > https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> >
> > I added this statistics to monitor the sending failure (may be called
> > sending delay)
> > caused by qdisc and net_device. When something happen, maybe users can
> > raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.
>
> Sounds very second-order and particular to a buggy driver :/
> Let's see what Eric says but I vote revert.

I did some tests yesterday, using one high speed TCP_STREAM (~90Gbit),
and got plenty of increments when using pfifo_fast qdisc.
Yet seed was nominal (bottleneck is the copyout() cost at receiver)

I got few counter increments when qdisc is fq as expected
(because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
limit_output_bytes for paced flows")


So this new SNMP counter is not a proxy for the kind of problems that a buggy
driver would trigger.

I also suggest we revert this patch.

Thanks !
Menglong Dong Dec. 1, 2021, 2:05 a.m. UTC | #7
On Tue, Nov 30, 2021 at 11:56 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > > > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > > > throughput of tcp will be limited, and it's hard to distinguish
> > > > > whether it is out of tcp congestion control.
> > > > >
> > > > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> > > >
> > > > Isn't this going to trigger all the time and alarm users because of the
> > > > "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> > > > for TCP to bake full TSQ amount of data and have it paced out onto the
> > > > wire? What's your link speed?
> > >
> > > Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
> > > is used with napi_tx enabled.
> > >
> > > With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
> > > until it is released. The point is that the sending interrupt of
> > > virtio_net will be
> > > turned off and the skb can't be released until the next net_rx interrupt comes.
> > > So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
> > > this happens, the bandwidth can decrease from 500M to 10M.
> > >
> > > In fact, this issue of uapi_tx is fixed in this commit:
> > > https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> > >
> > > I added this statistics to monitor the sending failure (may be called
> > > sending delay)
> > > caused by qdisc and net_device. When something happen, maybe users can
> > > raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.
> >
> > Sounds very second-order and particular to a buggy driver :/
> > Let's see what Eric says but I vote revert.
>
> I did some tests yesterday, using one high speed TCP_STREAM (~90Gbit),
> and got plenty of increments when using pfifo_fast qdisc.
> Yet seed was nominal (bottleneck is the copyout() cost at receiver)
>
> I got few counter increments when qdisc is fq as expected
> (because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
> limit_output_bytes for paced flows")
>
>
> So this new SNMP counter is not a proxy for the kind of problems that a buggy
> driver would trigger.
>
> I also suggest we revert this patch.

Seems this SNMP is indeed not suitable....I vote to revert too.

(Seems Jakub already do the revert, thanks~)

In fact, I'm a little curious that this patch is applied directly. I
used to receive
emails with the message 'applied'.

>
> Thanks !
diff mbox series

Patch

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 904909d020e2..e32ec6932e82 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -292,6 +292,7 @@  enum
 	LINUX_MIB_TCPDSACKIGNOREDDUBIOUS,	/* TCPDSACKIgnoredDubious */
 	LINUX_MIB_TCPMIGRATEREQSUCCESS,		/* TCPMigrateReqSuccess */
 	LINUX_MIB_TCPMIGRATEREQFAILURE,		/* TCPMigrateReqFailure */
+	LINUX_MIB_TCPSMALLQUEUEFAILURE,		/* TCPSmallQueueFailure */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index f30273afb539..43b7a77cd6b4 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -297,6 +297,7 @@  static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPDSACKIgnoredDubious", LINUX_MIB_TCPDSACKIGNOREDDUBIOUS),
 	SNMP_MIB_ITEM("TCPMigrateReqSuccess", LINUX_MIB_TCPMIGRATEREQSUCCESS),
 	SNMP_MIB_ITEM("TCPMigrateReqFailure", LINUX_MIB_TCPMIGRATEREQFAILURE),
+	SNMP_MIB_ITEM("TCPSmallQueueFailure", LINUX_MIB_TCPSMALLQUEUEFAILURE),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2e6e5a70168e..835a556a597a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2524,8 +2524,11 @@  static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 		 * test again the condition.
 		 */
 		smp_mb__after_atomic();
-		if (refcount_read(&sk->sk_wmem_alloc) > limit)
+		if (refcount_read(&sk->sk_wmem_alloc) > limit) {
+			NET_INC_STATS(sock_net(sk),
+				      LINUX_MIB_TCPSMALLQUEUEFAILURE);
 			return true;
+		}
 	}
 	return false;
 }