Message ID | 1422926330.21689.138.camel@edumazet-glaptop2.roam.corp.google.com (mailing list archive) |
---|---|
State | Not Applicable |
On 3 February 2015 at 02:18, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2015-02-02 at 10:52 -0800, Eric Dumazet wrote:
>
>> It seems to break ACK clocking badly (linux stack has a somewhat buggy
>> tcp_tso_should_defer(), which relies on ACK being received smoothly, as
>> no timer is setup to split the TSO packet.)
>
> Following patch might help the TSO split defer logic.
>
> It would avoid setting the TSO defer 'pseudo timer' twice, if/when TCP
> Small Queue logic prevented the xmit at the expiration of first 'timer'.
>
> This patch clears the tso_deferred variable only if we could really
> send something.
>
> Please try it, thanks !
[..patch..]

I've done a second round of tests. I've added the A-MSDU count
parameter I've mentioned in my other email into the mix.

net - net/master (includes stretch ack patches)
net-tso - net/master + your TSO defer patch
net-gro - net/master + my ath10k GRO patch
net-gro-tso - net/master + duh

Here's the best of amsdu count 1 and 3:

; for (i in */output.txt) { echo $i; for (j in (1 3)) { cat $i | awk 'x && /Mbits/ {y=$0}; x && y && !/Mbits/ {print y; x=0; y=""}; /set amsdu cnt to '$j'/{x=1}' | awk '{ if (x < $(NF-1)) {x=$(NF-1)} } END{print "A-MSDU limit='$j', " x " Mbits/sec"}' } }
net-gro-tso/output.txt
A-MSDU limit=1, 436 Mbits/sec
A-MSDU limit=3, 284 Mbits/sec
net-gro/output.txt
A-MSDU limit=1, 444 Mbits/sec
A-MSDU limit=3, 283 Mbits/sec
net-tso/output.txt
A-MSDU limit=1, 376 Mbits/sec
A-MSDU limit=3, 251 Mbits/sec
net/output.txt
A-MSDU limit=1, 387 Mbits/sec
A-MSDU limit=3, 260 Mbits/sec

IOW:
- stretch acks / TSO defer don't seem to help much (when compared to
throughput results from yesterday)
- GRO helps
- disabling A-MSDU on sender helps
- net/master+GRO still doesn't reach the performance from before the
regression (~600mbps w/ GRO)

You can grab logs and dumps here: http://www.filedropper.com/test2tar

Michał
-- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2015-02-03 at 12:50 +0100, Michal Kazior wrote:
> On 3 February 2015 at 02:18, Eric Dumazet <eric.dumazet@gmail.com> wrote:
[...]
> IOW:
> - stretch acks / TSO defer don't seem to help much (when compared to
> throughput results from yesterday)
> - GRO helps
> - disabling A-MSDU on sender helps
> - net/master+GRO still doesn't reach the performance from before the
> regression (~600mbps w/ GRO)
>
> You can grab logs and dumps here: http://www.filedropper.com/test2tar
>

Thanks for these traces.

There is absolutely a problem at the sender, as we can see a big 2ms
delay between reception of ACK and send of following packets.
TCP stack should generate them immediately.
Are you using some kind of netem qdisc ?

These 2ms delays, in a flow with a 5ms RTT are terrible.

06:54:57.408391 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294899240, win 11268, options [nop,nop,TS val 1053302 ecr 1052250], length 0
06:54:57.408418 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294910824, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
06:54:57.408431 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294936888, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
06:54:57.408453 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294962952, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
06:54:57.408474 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 0, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
<this 2ms delay is not generated by TCP stack.>
06:54:57.410243 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 82536:83984, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
06:54:57.410271 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 83984:85432, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
06:54:57.410289 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 85432:86880, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
06:54:57.410310 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 86880:88328, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
06:54:57.410326 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 88328:89776, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
06:54:57.410339 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 89776:91224, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
06:54:57.410353 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 91224:92672, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
06:54:57.410370 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 92672:94120, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
...
06:54:57.411178 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 28960, win 11268, options [nop,nop,TS val 1053306 ecr 1052253], length 0
06:54:57.411190 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 52128, win 11268, options [nop,nop,TS val 1053306 ecr 1052254], length 0
06:54:57.411220 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 78192, win 11268, options [nop,nop,TS val 1053306 ecr 1052254], length 0
06:54:57.411243 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 82536, win 11268, options [nop,nop,TS val 1053306 ecr 1052254], length 0
< same 2ms unexplained gap here >
06:54:57.412912 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 165072:166520, ack 1, win 457, options [nop,nop,TS val 1052259 ecr 1053306], length 1448
06:54:57.412935 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 166520:167968, ack 1, win 457, options [nop,nop,TS val 1052259 ecr 1053306], length 1448
06:54:57.412948 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 167968:169416, ack 1, win 457, options [nop,nop,TS val 1052259 ecr 1053306], length 1448
...
06:54:57.413650 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 244712:246160, ack 1, win 457, options [nop,nop,TS val 1052260 ecr 1053306], length 1448
06:54:57.413662 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 246160:247608, ack 1, win 457, options [nop,nop,TS val 1052260 ecr 1053306], length 1448
06:54:57.413712 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 112944, win 11268, options [nop,nop,TS val 1053308 ecr 1052256], length 0
06:54:57.413730 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 130320, win 11268, options [nop,nop,TS val 1053308 ecr 1052257], length 0
06:54:57.413754 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 160728, win 11268, options [nop,nop,TS val 1053309 ecr 1052257], length 0
06:54:57.413779 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 165072, win 11268, options [nop,nop,TS val 1053309 ecr 1052257], length 0
< same 2ms delay >
06:54:57.415682 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 247608:249056, ack 1, win 457, options [nop,nop,TS val 1052262 ecr 1053309], length 1448
06:54:57.415709 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 249056:250504, ack 1, win 457, options [nop,nop,TS val 1052262 ecr 1053309], length 1448

Are packets TX completed after a timer or something ?

Some very heavy stuff might run from tasklet (or other softirq triggered) event.

BTW, traces tend to show that you 'receive' multiple ACKs in the same burst,
it's not clear if they are delayed at one side or the other.

GRO should delay only GRO candidates. ACK packets are not GRO candidates.

Have you tried to disable GSO on sender ?

(Or maybe wifi drivers should start to use skb->xmit_more as a signal to end aggregation)
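The gaps Eric marks can be measured directly from the capture. The helper below is a small illustration, not a tool used in the thread; it assumes tcpdump's default `HH:MM:SS.uuuuuu` timestamp format (exactly six fractional digits) and that both timestamps fall on the same day. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a tcpdump "HH:MM:SS.uuuuuu" timestamp to microseconds since
 * midnight. Returns -1 if the string does not match that format. */
static long long ts_to_usec(const char *ts)
{
    int h, m, s;
    long frac;

    if (sscanf(ts, "%d:%d:%d.%ld", &h, &m, &s, &frac) != 4)
        return -1;
    /* Assumes exactly six fractional digits, as tcpdump prints by default. */
    return ((long long)h * 3600 + (long long)m * 60 + s) * 1000000LL + frac;
}

/* Microseconds elapsed between two timestamps from the same capture day. */
static long long gap_usec(const char *earlier, const char *later)
{
    long long a = ts_to_usec(earlier);
    long long b = ts_to_usec(later);

    assert(a >= 0 && b >= 0);
    return b - a;
}
```

Applied to the three marked gaps (.408474 to .410243, .411243 to .412912 and .413779 to .415682), it gives 1769, 1669 and 1903 microseconds, i.e. the roughly 2ms stalls described above.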
On Tue, 2015-02-03 at 06:27 -0800, Eric Dumazet wrote:

> Are packets TX completed after a timer or something ?
>
> Some very heavy stuff might run from tasklet (or other softirq triggered) event.
>

Right, commit 6c5151a9ffa9f796f2d707617cecb6b6b241dff8 ("ath10k: batch htt tx/rx
completions") is very suspicious.

Please revert it.

BTW, ath10k_htt_txrx_compl_task() runs from softirq context, so the _bh()
suffixes are not really needed.

It seems a lot of batching happens in wifi drivers, not necessarily at
the right places.
On 3 February 2015 at 15:27, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2015-02-03 at 12:50 +0100, Michal Kazior wrote:
[...]
>> IOW:
>> - stretch acks / TSO defer don't seem to help much (when compared to
>> throughput results from yesterday)
>> - GRO helps
>> - disabling A-MSDU on sender helps
>> - net/master+GRO still doesn't reach the performance from before the
>> regression (~600mbps w/ GRO)
>>
>> You can grab logs and dumps here: http://www.filedropper.com/test2tar
>>
>
> Thanks for these traces.
>
> There is absolutely a problem at the sender, as we can see a big 2ms
> delay between reception of ACK and send of following packets.
> TCP stack should generate them immediately.
> Are you using some kind of netem qdisc ?

Both systems have identical setup:

; tc qdisc
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc mq 0: dev wlan1 root
qdisc pfifo_fast 0: dev wlan1 parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev wlan1 parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev wlan1 parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev wlan1 parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

> These 2ms delays, in a flow with a 5ms RTT are terrible.
>
> 06:54:57.408391 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294899240, win 11268, options [nop,nop,TS val 1053302 ecr 1052250], length 0
> 06:54:57.408418 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294910824, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
> 06:54:57.408431 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294936888, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
> 06:54:57.408453 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 4294962952, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
> 06:54:57.408474 IP 192.168.1.2.5001 > 192.168.1.3.51645: Flags [.], ack 0, win 11268, options [nop,nop,TS val 1053303 ecr 1052251], length 0
> <this 2ms delay is not generated by TCP stack.>
> 06:54:57.410243 IP 192.168.1.3.51645 > 192.168.1.2.5001: Flags [.], seq 82536:83984, ack 1, win 457, options [nop,nop,TS val 1052256 ecr 1053303], length 1448
[...]
>
> Are packets TX completed after a timer or something ?

As far as ath10k is concerned - no timers here. Not sure about firmware
itself though.

> Some very heavy stuff might run from tasklet (or other softirq triggered) event.
>
> BTW, traces tend to show that you 'receive' multiple ACKs in the same burst,
> it's not clear if they are delayed at one side or the other.
>
> GRO should delay only GRO candidates. ACK packets are not GRO candidates.
>
> Have you tried to disable GSO on sender ?

I assume I do that via ethtool?
This is my current setup on both systems:

; ethtool -k wlan1
Features for wlan1:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on [fixed]
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: off
	tx-scatter-gather: off [fixed]
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [fixed]
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

; ethtool -K wlan1 generic-segmentation-offload off
ethtool: bad command line argument(s)
For more information run ethtool -h

> (Or maybe wifi drivers should start to use skb->xmit_more as a signal to end aggregation)

This could work if your firmware/device supports this kind of thing.
To my understanding ath10k firmware doesn't.

Michał
On Wed, 2015-02-04 at 12:35 +0100, Michal Kazior wrote:

> > (Or maybe wifi drivers should start to use skb->xmit_more as a signal to end aggregation)
>
> This could work if your firmware/device supports this kind of thing.
> To my understanding ath10k firmware doesn't.

This is a pure software signal. You do not need firmware support.

Idea is the following :

Your driver gets a train of messages, coming from upper layers (TCP, IP,
qdisc)

It can know that a packet is not the last one, by looking at
skb->xmit_more.

Basically, aggregation logic could use this signal as a very clear
indicator you got the end of a train -> force the xmit right now.

To disable gso you would have to use :

ethtool -K wlan1 gso off
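Eric's suggestion can be sketched as a driver-side aggregation buffer that is flushed as soon as the stack clears `xmit_more` (end of the train), or when the buffer is full. This is a self-contained userspace model, not kernel code: `struct hyp_skb` stands in for `struct sk_buff` with only the two fields used here, and `AGG_MAX`, `struct hyp_agg`, `hyp_xmit()` and `hyp_agg_flush()` are hypothetical. Michał notes in the next message that ath10k firmware does its own aggregation, so for that driver a "flush" could at most mean handing the queued frames to firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AGG_MAX 8  /* hypothetical: maximum frames per aggregate */

/* Stand-in for the struct sk_buff fields this sketch needs. */
struct hyp_skb {
    unsigned int len;
    bool xmit_more;   /* set by the stack when another packet follows */
};

struct hyp_agg {
    struct hyp_skb *frames[AGG_MAX];
    size_t count;           /* frames waiting in the buffer */
    unsigned int flushes;   /* aggregates handed to the (imaginary) hardware */
};

/* Model of handing the pending aggregate to hardware: count it and
 * empty the buffer. */
static void hyp_agg_flush(struct hyp_agg *agg)
{
    if (agg->count == 0)
        return;
    agg->flushes++;
    agg->count = 0;
}

/* Driver xmit hook: queue the frame, then flush when the stack says no
 * further packet is coming right now, or when the aggregate is full. */
static void hyp_xmit(struct hyp_agg *agg, struct hyp_skb *skb)
{
    assert(agg->count < AGG_MAX);
    agg->frames[agg->count++] = skb;
    if (!skb->xmit_more || agg->count == AGG_MAX)
        hyp_agg_flush(agg);
}
```

The design choice is that the driver never waits on a timer to close an aggregate: the last packet of a train (`xmit_more` false) closes it immediately.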
On 4 February 2015 at 12:57, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-02-04 at 12:35 +0100, Michal Kazior wrote:
>
>> > (Or maybe wifi drivers should start to use skb->xmit_more as a signal to end aggregation)
>>
>> This could work if your firmware/device supports this kind of thing.
>> To my understanding ath10k firmware doesn't.
>
> This is a pure software signal. You do not need firmware support.
>
> Idea is the following :
>
> Your driver gets a train of messages, coming from upper layers (TCP, IP,
> qdisc)
>
> It can know that a packet is not the last one, by looking at
> skb->xmit_more.
>
> Basically, aggregation logic could use this signal as a very clear
> indicator you got the end of a train -> force the xmit right now.

There's no way to tell ath10k firmware: "xmit right now". The firmware
does all tx aggregation logic by itself. Host driver just submits a
frame and hopes it'll get out soon.

It's not even the tx ring you'd expect. Each frame has a host-assigned id
which firmware then uses in tx completion.

> To disable gso you would have to use :
>
> ethtool -K wlan1 gso off

Oh, thanks! This works. However I can't turn it on:

; ethtool -K wlan1 gso on
Could not change any device features

..so I guess it makes no sense to re-run tests because:

; ethtool -k wlan1 | grep generic
tx-checksum-ip-generic: on [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on

And this seems to never change.

Michał
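Michał's description of the host/firmware interface (no conventional tx ring; each frame carries a host-assigned id that firmware reports back in the tx completion) implies a host-side table keyed by id, which lets a completion be matched to its frame regardless of order. A minimal sketch, assuming a fixed pool of ids and a linear search for a free slot; `HYP_MAX_MSDU_IDS`, `struct hyp_txq` and both functions are hypothetical, and this is not ath10k's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define HYP_MAX_MSDU_IDS 16  /* hypothetical pool size */

/* Host-side table: slot `id` holds the frame submitted with that id,
 * or NULL when the id is free. */
struct hyp_txq {
    void *pending[HYP_MAX_MSDU_IDS];
};

/* Allocate a free id for a frame before handing it to firmware.
 * Returns -1 if every id is in flight (the caller would stop the queue). */
static int hyp_alloc_id(struct hyp_txq *q, void *frame)
{
    assert(frame != NULL);  /* NULL marks a free slot */
    for (int id = 0; id < HYP_MAX_MSDU_IDS; id++) {
        if (q->pending[id] == NULL) {
            q->pending[id] = frame;
            return id;
        }
    }
    return -1;
}

/* Firmware reports a completion by id: look up the frame, free the id. */
static void *hyp_tx_complete(struct hyp_txq *q, int id)
{
    assert(id >= 0 && id < HYP_MAX_MSDU_IDS);
    void *frame = q->pending[id];
    q->pending[id] = NULL;
    return frame;
}
```

Unlike a ring with head/tail indices, this table does not assume completions come back in submission order; the cost is a search for a free id on each submit.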
On Wed, 2015-02-04 at 13:22 +0100, Michal Kazior wrote:
> On 4 February 2015 at 12:57, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > To disable gso you would have to use :
> >
> > ethtool -K wlan1 gso off
>
> Oh, thanks! This works. However I can't turn it on:
>
> ; ethtool -K wlan1 gso on
> Could not change any device features
>
> ..so I guess it makes no sense to re-run tests because:
>
> ; ethtool -k wlan1 | grep generic
> tx-checksum-ip-generic: on [fixed]
> generic-segmentation-offload: off [requested on]
> generic-receive-offload: on
>
> And this seems to never change.

GSO requires SG (Scatter Gather)

Are you sure this hardware has no SG support ?
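The `off [requested on]` state in the ethtool output reflects the difference between the features a user asked for and the ones that remain active once dependencies are resolved. A sketch of that dependency, using hypothetical flag names rather than the kernel's `NETIF_F_*` bits; it models only the GSO-needs-SG rule Eric mentions, which would explain why `ethtool -K wlan1 gso on` cannot take effect while scatter-gather is fixed off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the kernel's NETIF_F_* flags. */
#define HYP_F_SG  (1u << 0)  /* scatter-gather */
#define HYP_F_GSO (1u << 1)  /* generic segmentation offload */
#define HYP_F_GRO (1u << 2)  /* generic receive offload */

/* Turn the requested feature set into the active one.
 * Only one dependency is modelled: GSO is dropped when SG is missing. */
static uint32_t hyp_fix_features(uint32_t requested)
{
    uint32_t active = requested;

    /* Only the three modelled bits are expected here. */
    assert((requested & ~(HYP_F_SG | HYP_F_GSO | HYP_F_GRO)) == 0);
    if ((active & HYP_F_GSO) && !(active & HYP_F_SG))
        active &= ~HYP_F_GSO;
    return active;
}
```

With SG unavailable (as on wlan1), requesting GSO leaves it in the "requested" set but never in the active one, whereas GRO, which has no such dependency here, stays on.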
On 4 February 2015 at 13:38, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-02-04 at 13:22 +0100, Michal Kazior wrote:
[...]
> GSO requires SG (Scatter Gather)
>
> Are you sure this hardware has no SG support ?

The hardware itself seems to be capable. The firmware is a problem
though. I'm also not sure if mac80211 can handle this as is. No 802.11
driver seems to support SG except wil6210 which uses cfg80211 and
netdevs directly.

Michał
On Wed, 2015-02-04 at 13:53 +0100, Michal Kazior wrote:

> The hardware itself seems to be capable. The firmware is a problem
> though. I'm also not sure if mac80211 can handle this as is. No 802.11
> driver seems to support SG except wil6210 which uses cfg80211 and
> netdevs directly.

mac80211 cannot deal with this right now. This would make a good topic
for the workshop since there's interest elsewhere in this as well. It's
probably not terribly hard to do as far as mac80211 is concerned.

How much offload do you really have though? Sometimes people just want
to build A-MSDUs.

johannes
OK guys

Using a mlx4 testbed I can reproduce the problem by pushing coalescing
settings and disabling SG (thus disabling GSO)

ethtool -K eth0 sg off
Actual changes:
scatter-gather: off
	tx-scatter-gather: off
generic-segmentation-offload: off [requested on]

ethtool -C eth0 tx-usecs 1024 tx-frames 64

Meaning that NIC waits one ms before sending the TX IRQ,
and can accumulate 64 frames before forcing the interrupt.

We probably have a bug in cwnd expansion logic :

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.246.7.152 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=230 rttvar=30 snd_ssthresh=41 cwnd=59 reordering=3 total_retrans=1 ca_state=0 pacing_rate=5943.1 Mbits
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00       530.39   0.40     0.32     2.965   2.398

-> final cwnd=59 which is not enough to avoid the 1ms delay between each
burst.

So sender sends ~60 packets, then has to wait 1ms (to get NIC TX IRQ)
before sending the following burst.

I am CCing Neal, he probably can help to root cause the problem.

Thanks
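Eric's burst-then-wait explanation gives a back-of-the-envelope ceiling: if the sender gets out at most one window of cwnd segments per coalescing period, throughput cannot exceed `cwnd * mss * 8 / period`. The sketch assumes a 1448-byte MSS (1500-byte pmtu minus 20-byte IPv4 and TCP headers and 12 bytes of TCP timestamp option) and treats `tx-usecs 1024` as the period; both are assumptions layered on the thread's numbers, and the function name is hypothetical.

```c
#include <assert.h>

/* Upper bound on throughput (Mbit/s) when at most `pkts` segments of
 * `mss` payload bytes leave per `period_usec` microseconds.
 * One bit per microsecond equals one Mbit/s. */
static double burst_limited_mbps(unsigned pkts, unsigned mss, double period_usec)
{
    assert(period_usec > 0.0);
    return (double)pkts * mss * 8.0 / period_usec;
}
```

With cwnd=59 this gives about 667 Mbit/s (59 * 1448 * 8 / 1024), a ceiling that the measured 530.39 Mbit/s stays below.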
On Wed, 2015-02-04 at 05:16 -0800, Eric Dumazet wrote:
> OK guys
>
> Using a mlx4 testbed I can reproduce the problem by pushing coalescing
> settings and disabling SG (thus disabling GSO)
[...]
> We probably have a bug in cwnd expansion logic :
[...]
> -> final cwnd=59 which is not enough to avoid the 1ms delay between each
> burst.
>
> So sender sends ~60 packets, then has to wait 1ms (to get NIC TX IRQ)
> before sending the following burst.
>
> I am CCing Neal, he probably can help to root cause the problem.

Arg, this was with net-next, i.e. not including our recent stretch ack
fixes.

Using David Miller's 'net' tree, cwnd seems OK.
Speed is low because 64 queued frames exceed tcp_limit_output_bytes:

lpaa23:~# cat /proc/sys/net/ipv4/tcp_limit_output_bytes
131072
lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.246.7.152 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=166 rttvar=16 snd_ssthresh=26 cwnd=59 reordering=3 total_retrans=0 ca_state=0 pacing_rate=8203.52 Mbits
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00       569.96   0.52     0.38     3.588   2.625

lpaa23:~# echo 262144 >/proc/sys/net/ipv4/tcp_limit_output_bytes
lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.246.7.152 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=98 rttvar=18 snd_ssthresh=312 cwnd=313 reordering=3 total_retrans=23 ca_state=0
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      8518.40   2.60     1.57     1.200   0.727
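Why coalescing 64 frames can throttle the flow: TCP Small Queues limits the bytes a socket may have queued below TCP (qdisc plus device, until TX completion frees them), with `tcp_limit_output_bytes` as the upper bound. The model below assumes each not-yet-completed frame is charged a fixed `truesize` and that the socket stops queueing once the charged bytes exceed the limit. The 2304-byte charge used in the example is a hypothetical value; the thread does not state the real per-frame accounting, and the function name is made up.

```c
#include <assert.h>

/* Number of frames a socket can have queued below TCP (not yet
 * TX-completed) before a TSQ-style check "charged bytes > limit" stops it.
 * The check runs before each new frame, so the last frame admitted may
 * push the total past the limit. `truesize` is a fixed per-frame charge. */
static unsigned tsq_max_inflight(unsigned limit_bytes, unsigned truesize)
{
    unsigned long long charged = 0;
    unsigned frames = 0;

    assert(truesize > 0);
    while (charged <= limit_bytes) {  /* not throttled yet: admit one more */
        charged += truesize;
        frames++;
    }
    return frames;
}
```

With that hypothetical 2304-byte charge, the default 131072 admits 57 frames in flight, fewer than the 64 the NIC accumulates before raising its TX IRQ early, so in this model each burst waits for the 1024 us timer; at 262144, 114 frames fit and the frame-count trigger can fire, in line with Eric's second run.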
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 65caf8b95e17..e735f38557db 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1821,7 +1821,6 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 	return true;
 
 send_now:
-	tp->tso_deferred = 0;
 	return false;
 }
 
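A toy model of the TSO-defer 'pseudo timer' from Eric's patch description (quoted at the top of Michał's first message), showing how clearing `tso_deferred` at `send_now` can arm the timer twice when TCP Small Queues then blocks the actual transmit. It is not kernel code: ticks stand in for jiffies, the deferral rule is reduced to "release after more than one tick", and `hyp_on_sent()` is a hypothetical hook standing for "we could really send something". Where the patched kernel actually clears the variable is not visible in the hunk above, so that part is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* deferred_since == 0 means no deferral is pending; otherwise it holds the
 * tick at which deferral started. Ticks start at 1 so 0 can mean "unarmed". */
struct hyp_tp {
    unsigned long deferred_since;
    bool clear_at_send_now;  /* true: pre-patch behaviour */
};

/* Returns true to defer the TSO skb at tick `now`, false to send now. */
static bool hyp_should_defer(struct hyp_tp *tp, unsigned long now)
{
    assert(now >= 1);
    if (tp->deferred_since && now - tp->deferred_since > 1) {
        /* Pseudo timer expired. Pre-patch, the state is cleared here even
         * though TSQ may still block the transmit that follows. */
        if (tp->clear_at_send_now)
            tp->deferred_since = 0;
        return false;
    }
    if (!tp->deferred_since)
        tp->deferred_since = now;  /* arm the pseudo timer */
    return true;
}

/* Hypothetical hook: a packet really left, so the deferral is over. */
static void hyp_on_sent(struct hyp_tp *tp)
{
    tp->deferred_since = 0;
}
```

In this model, if the first "send now" at tick 3 is blocked and retried at tick 4, the pre-patch variant re-arms the timer and defers again, while the patched variant remembers the expired deferral and sends at once.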