Message ID: 20160811161540.9613-1-grygorii.strashko@ti.com (mailing list archive)
State: New, archived
On Thu, 11 Aug 2016 19:15:40 +0300
Grygorii Strashko <grygorii.strashko@ti.com> wrote:

> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
> where these IRQs are forced threaded:
>
> rx-irq
> |- schedule threaded rx-irq handler
> ...
> |- threaded rx-irq handler -> cpsw_rx_interrupt()
>    |- napi_schedule()
>       |- __raise_softirq_irqoff()
>          |- wakeup_proper_softirq()
> ...
> napi
>
> after:
>
> rx-irq
> |- cpsw_rx_interrupt()
>    |- napi_schedule()
> |- irq_exit()
>    |- invoke_softirq()
>       |- wakeup_softirqd()
> ...
> napi
>
> And, as a result, get benefits from the following improvements (tested
> on am57xx-evm):
>
> 1) The "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>    seen any more. These warnings can currently be seen once iperf is started:
>    # iperf -c $IPERFHOST -w 128K -d -t 60
>
> 2) latency reduction when cyclictest is run in parallel with network load,
>    where net_perf.sh is:
>    iperf -c $IPERFHOST -w 8K -d -t 60
>    iperf -c $IPERFHOST -w 16K -d -t 60
>    iperf -c $IPERFHOST -w 32K -d -t 60
>    iperf -c $IPERFHOST -w 64K -d -t 60
>    iperf -c $IPERFHOST -w 128K -d -t 60
>
>    before:
>    T: 0 ( 1326) P:98 I:1000 C: 240000 Min: 8 Act: 13 Avg: 18 Max: 70
>    T: 1 ( 1327) P:98 I:1500 C: 159981 Min: 9 Act: 15 Avg: 16 Max: 43
>    after:
>    T: 0 ( 1331) P:98 I:1000 C: 240000 Min: 8 Act: 15 Avg: 14 Max: 51
>    T: 1 ( 1332) P:98 I:1500 C: 159953 Min: 8 Act: 16 Avg: 15 Max: 33
>
> 3) network performance increase
>
>    win, K   Mbits/s
>             before   after     %
>    8K       354      350.3     0.0
>    16K      412      551      33.7
>    32K      423      659.5    55.9
>    64K      436      728.3    67.0
>    128K     537      845      57.4
>
> This change does not affect non-RT behavior.

This looks fine to me, but it should go into the development branch,
which is currently 4.6-rt. And I can then pull it from there.

-- Steve

> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
> Hi All,
>
> I'd appreciate any feedback or tested-by.
> In case of positive feedback I'll resend it for upstream.
> drivers/net/ethernet/ti/cpsw.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 7b59283..fa4bb81 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -769,7 +769,7 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
>  		priv->tx_irq_disabled = true;
>  	}
>
> -	napi_schedule(&priv->napi_tx);
> +	napi_schedule_irqoff(&priv->napi_tx);
>  	return IRQ_HANDLED;
>  }
>
> @@ -785,7 +785,7 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
>  		priv->rx_irq_disabled = true;
>  	}
>
> -	napi_schedule(&priv->napi_rx);
> +	napi_schedule_irqoff(&priv->napi_rx);
>  	return IRQ_HANDLED;
>  }
>
> @@ -2827,7 +2827,7 @@ static int cpsw_probe(struct platform_device *pdev)
>
>  	priv->irqs_table[0] = irq;
>  	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
> -			       0, dev_name(&pdev->dev), priv);
> +			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
>  	if (ret < 0) {
>  		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
>  		goto clean_ale_ret;
> @@ -2842,7 +2842,7 @@ static int cpsw_probe(struct platform_device *pdev)
>
>  	priv->irqs_table[1] = irq;
>  	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
> -			       0, dev_name(&pdev->dev), priv);
> +			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
>  	if (ret < 0) {
>  		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
>  		goto clean_ale_ret;

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
> where these IRQs are forced threaded:
>
> rx-irq
> |- schedule threaded rx-irq handler
> ...
> |- threaded rx-irq handler -> cpsw_rx_interrupt()
>    |- napi_schedule()
>       |- __raise_softirq_irqoff()
>          |- wakeup_proper_softirq()
> ...
> napi

This should not be the default path. The default should be napi running
in the context of the threaded rx-irq handler once the handler is done.
The wakeup_proper_softirq() part is only done if napi thinks that the
callback function runs for too long. So in *that* case you continue
NAPI in the softirq-thread which runs at SCHED_OTHER.

> after:
>
> rx-irq
> |- cpsw_rx_interrupt()
>    |- napi_schedule()
> |- irq_exit()
>    |- invoke_softirq()
>       |- wakeup_softirqd()
> ...
> napi

Since you schedule the softirq from an IRQ-off region / without a
process context, you force the softirq to run in the thread at
SCHED_OTHER priority.

> And, as a result, get benefits from the following improvements (tested
> on am57xx-evm):
>
> 1) The "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>    seen any more. These warnings can currently be seen once iperf is started:
>    # iperf -c $IPERFHOST -w 128K -d -t 60

Do you also see "sched: RT throttling activated"? Because I don't see
otherwise why this should pop up.

> 2) latency reduction when cyclictest is run in parallel with network load,
>    where net_perf.sh is:
>    iperf -c $IPERFHOST -w 8K -d -t 60
>    iperf -c $IPERFHOST -w 16K -d -t 60
>    iperf -c $IPERFHOST -w 32K -d -t 60
>    iperf -c $IPERFHOST -w 64K -d -t 60
>    iperf -c $IPERFHOST -w 128K -d -t 60
>
>    before:
>    T: 0 ( 1326) P:98 I:1000 C: 240000 Min: 8 Act: 13 Avg: 18 Max: 70
>    T: 1 ( 1327) P:98 I:1500 C: 159981 Min: 9 Act: 15 Avg: 16 Max: 43
>    after:
>    T: 0 ( 1331) P:98 I:1000 C: 240000 Min: 8 Act: 15 Avg: 14 Max: 51
>    T: 1 ( 1332) P:98 I:1500 C: 159953 Min: 8 Act: 16 Avg: 15 Max: 33

Use -d 0 to have I: set to the same value.
What does -i 250 say?
And without network load, are we at the "after" values?

What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
in net/core/dev.c and you chrt the priority of your network interrupt
handlers to SCHED_OTHER priority?

> 3) network performance increase
>
>    win, K   Mbits/s
>             before   after     %
>    8K       354      350.3     0.0
>    16K      412      551      33.7
>    32K      423      659.5    55.9
>    64K      436      728.3    67.0
>    128K     537      845      57.4

How close are the after numbers to !RT?

Sebastian
Hi Sebastian,

On 08/12/2016 11:32 AM, Sebastian Andrzej Siewior wrote:
> On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
>> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
>> where these IRQs are forced threaded:
>>
>> rx-irq
>> |- schedule threaded rx-irq handler
>> ...
>> |- threaded rx-irq handler -> cpsw_rx_interrupt()
>>    |- napi_schedule()
>>       |- __raise_softirq_irqoff()
>>          |- wakeup_proper_softirq()
>> ...
>> napi
>
> This should not be the default path. The default should be napi running
> in the context of the threaded rx-irq handler once the handler is done.
> The wakeup_proper_softirq() part is only done if napi thinks that the
> callback function runs for too long. So in *that* case you continue
> NAPI in the softirq-thread which runs at SCHED_OTHER.

Thanks for the comment. You're right:
  irq_thread() -> irq_forced_thread_fn() -> local_bh_enable()
but wouldn't there still be two wake_up_process() calls here either way,
plus preempt_check_resched_rt() in napi_schedule()?

>> after:
>>
>> rx-irq
>> |- cpsw_rx_interrupt()
>>    |- napi_schedule()
>> |- irq_exit()
>>    |- invoke_softirq()
>>       |- wakeup_softirqd()
>> ...
>> napi
>
> Since you schedule the softirq from an IRQ-off region / without a
> process context, you force the softirq to run in the thread at
> SCHED_OTHER priority.

Thanks a lot for the comments.

>> And, as a result, get benefits from the following improvements (tested
>> on am57xx-evm):
>>
>> 1) The "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>    seen any more. These warnings can currently be seen once iperf is started:
>>    # iperf -c $IPERFHOST -w 128K -d -t 60
>
> Do you also see "sched: RT throttling activated"? Because I don't see
> otherwise why this should pop up.

I've reverted my patch and did the requested experiments (some
additional info below).
I do not see "sched: RT throttling activated" :(

root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q -D4m
[1] 1301
# /dev/cpu_dma_latency set to 0us
Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
[  3] local 172.22.39.21 port 47500 connected with 172.22.39.17 port 5001
[  5] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59955
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  1.31 GBytes   188 Mbits/sec
[  5]  0.0-60.0 sec   868 MBytes   121 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47502 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59957
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.92 GBytes   275 Mbits/sec
[  4]  0.0-60.0 sec  1.14 GBytes   163 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47504 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59965
[ 2285.928516] NOHZ: local_softirq_pending 80
[ 2285.928549] NOHZ: local_softirq_pending 80
[ 2285.928594] NOHZ: local_softirq_pending 80
[ 2285.928689] NOHZ: local_softirq_pending 80
[ 2286.348572] NOHZ: local_softirq_pending 80
[ 2290.358623] NOHZ: local_softirq_pending 80
[ 2291.588624] NOHZ: local_softirq_pending 80
[ 2291.698579] NOHZ: local_softirq_pending 80
[ 2292.608594] NOHZ: local_softirq_pending 80
[ 2293.178627] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.43 GBytes   205 Mbits/sec
[  4]  0.0-60.0 sec  1.58 GBytes   226 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47506 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59980
T: 0 ( 1309) P:98 I:1000 C: 240000 Min: 8 Act: 13 Avg: 18 Max: 60
T: 1 ( 1310) P:98 I:1500 C: 159968 Min: 8 Act: 14 Avg: 16 Max: 58
root@am57xx-evm:~#
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.84 GBytes   264 Mbits/sec
[  4]  0.0-60.0 sec  1.47 GBytes   210 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 256 KByte (WARNING: requested 128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47508 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59982
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec   974 MBytes   136 Mbits/sec
[  4]  0.0-60.0 sec  2.84 GBytes   406 Mbits/sec
[1]+  Done                    ./net_perf.sh

>> 2) latency reduction when cyclictest is run in parallel with network load,
>>    where net_perf.sh is:
>>    iperf -c $IPERFHOST -w 8K -d -t 60
>>    iperf -c $IPERFHOST -w 16K -d -t 60
>>    iperf -c $IPERFHOST -w 32K -d -t 60
>>    iperf -c $IPERFHOST -w 64K -d -t 60
>>    iperf -c $IPERFHOST -w 128K -d -t 60
>>
>>    before:
>>    T: 0 ( 1326) P:98 I:1000 C: 240000 Min: 8 Act: 13 Avg: 18 Max: 70
>>    T: 1 ( 1327) P:98 I:1500 C: 159981 Min: 9 Act: 15 Avg: 16 Max: 43
>>    after:
>>    T: 0 ( 1331) P:98 I:1000 C: 240000 Min: 8 Act: 15 Avg: 14 Max: 51
>>    T: 1 ( 1332) P:98 I:1500 C: 159953 Min: 8 Act: 16 Avg: 15 Max: 33
>
> Use -d 0 to have I: set to the same value.
> What does -i 250 say?
> And without network load, are we at the "after" values?
===============================================
before, no net load:

cyclictest -m -Sp98 -q -D4m
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1270) P:98 I:1000 C: 240000 Min: 8 Act: 10 Avg: 9 Max: 38
T: 1 ( 1271) P:98 I:1500 C: 159986 Min: 8 Act: 10 Avg: 9 Max: 23

cyclictest -m -Sp98 -q -D4m -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1280) P:98 I:1000 C: 240000 Min: 8 Act: 10 Avg: 9 Max: 27
T: 1 ( 1281) P:98 I:1000 C: 239982 Min: 8 Act: 13 Avg: 9 Max: 26

cyclictest -m -Sp98 -q -D4m -i250
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1284) P:98 I:250 C: 960000 Min: 8 Act: 10 Avg: 8 Max: 24
T: 1 ( 1285) P:98 I:750 C: 319977 Min: 8 Act: 11 Avg: 9 Max: 21

cyclictest -m -Sp98 -q -D4m -i250 -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1288) P:98 I:250 C: 960000 Min: 8 Act: 9 Avg: 8 Max: 33
T: 1 ( 1289) P:98 I:250 C: 959929 Min: 7 Act: 11 Avg: 9 Max: 26

===============================================
before, with net load:

cyclictest -m -Sp98 -q -D4m
T: 0 ( 1309) P:98 I:1000 C: 240000 Min: 8 Act: 13 Avg: 18 Max: 60
T: 1 ( 1310) P:98 I:1500 C: 159968 Min: 8 Act: 14 Avg: 16 Max: 58

cyclictest -m -Sp98 -q -D4m -d0
T: 0 ( 1339) P:98 I:1000 C: 240000 Min: 8 Act: 28 Avg: 18 Max: 60
T: 1 ( 1340) P:98 I:1000 C: 239969 Min: 8 Act: 25 Avg: 16 Max: 43

cyclictest -m -Sp98 -q -D4m -i250
T: 0 ( 1369) P:98 I:250 C: 960000 Min: 8 Act: 13 Avg: 17 Max: 64
T: 1 ( 1370) P:98 I:750 C: 319941 Min: 8 Act: 18 Avg: 16 Max: 56

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1400) P:98 I:250 C: 960000 Min: 8 Act: 25 Avg: 18 Max: 83
T: 1 ( 1401) P:98 I:250 C: 959801 Min: 7 Act: 27 Avg: 17 Max: 48

===============================================
after, no net load:

cyclictest -m -Sp98 -q -D4m
T: 0 ( 1287) P:98 I:1000 C: 240000 Min: 8 Act: 10 Avg: 9 Max: 30
T: 1 ( 1288) P:98 I:1500 C: 159987 Min: 8 Act: 13 Avg: 10 Max: 32

cyclictest -m -Sp98 -q -D4m -d0
T: 0 ( 1295) P:98 I:1000 C: 240000 Min: 8 Act: 8 Avg: 9 Max: 39
T: 1 ( 1296) P:98 I:1000 C: 239981 Min: 7 Act: 9 Avg: 8 Max: 29

cyclictest -m -Sp98 -q -D4m -i250
T: 0 ( 1298) P:98 I:250 C: 960000 Min: 7 Act: 8 Avg: 9 Max: 42
T: 1 ( 1299) P:98 I:750 C: 319970 Min: 8 Act: 9 Avg: 8 Max: 29

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1301) P:98 I:250 C: 960000 Min: 7 Act: 9 Avg: 8 Max: 22
T: 1 ( 1302) P:98 I:250 C: 959914 Min: 7 Act: 11 Avg: 8 Max: 28

===============================================
after, with net load:

cyclictest -m -Sp98 -q -D4m
T: 0 ( 1315) P:98 I:1000 C: 240000 Min: 8 Act: 11 Avg: 14 Max: 40
T: 1 ( 1316) P:98 I:1500 C: 159962 Min: 8 Act: 19 Avg: 15 Max: 38

cyclictest -m -Sp98 -q -D4m -d0
T: 0 ( 1365) P:98 I:1000 C: 240000 Min: 8 Act: 14 Avg: 15 Max: 51
T: 1 ( 1366) P:98 I:1000 C: 239943 Min: 8 Act: 18 Avg: 15 Max: 36

cyclictest -m -Sp98 -q -D4m -i250
T: 0 ( 1309) P:98 I:250 C: 960000 Min: 8 Act: 15 Avg: 13 Max: 42
T: 1 ( 1310) P:98 I:750 C: 319986 Min: 8 Act: 18 Avg: 15 Max: 36

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1358) P:98 I:250 C: 960000 Min: 8 Act: 11 Avg: 14 Max: 42
T: 1 ( 1359) P:98 I:250 C: 959743 Min: 7 Act: 18 Avg: 15 Max: 36

> What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> in net/core/dev.c and you chrt the priority of your network interrupt
> handlers to SCHED_OTHER priority?

===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
with net load:

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1325) P:98 I:1000 C: 240000 Min: 8 Act: 22 Avg: 17 Max: 51
T: 1 ( 1326) P:98 I:1500 C: 159981 Min: 8 Act: 15 Avg: 15 Max: 39

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1307) P:98 I:250 C: 960000 Min: 7 Act: 13 Avg: 16 Max: 50
T: 1 ( 1308) P:98 I:250 C: 959819 Min: 8 Act: 12 Avg: 14 Max: 37

and net performance is better:

root@am57xx-evm:~# ps -A | grep 4848
   82 ?        00:00:00 irq/354-4848400
   83 ?
00:00:00 irq/355-4848400
root@am57xx-evm:~# chrt -o -p 0 82
root@am57xx-evm:~# chrt -o -p 0 83
root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q -D4m -i250 -d0
[1] 1298
# /dev/cpu_dma_latency set to 0us
Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 12 16:20:42 EEST 2016 armv7l GNU/Linux
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
[  3] local 172.22.39.21 port 58672 connected with 172.22.39.17 port 5001
[  5] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60672
[   78.718828] NOHZ: local_softirq_pending 80
[   78.908849] NOHZ: local_softirq_pending 80
[   78.908927] NOHZ: local_softirq_pending 80
[   78.918827] NOHZ: local_softirq_pending 80
[   79.058826] NOHZ: local_softirq_pending 80
[   81.488918] NOHZ: local_softirq_pending 80
[   81.489162] NOHZ: local_softirq_pending 80
[   81.489398] NOHZ: local_softirq_pending 80
[   81.489646] NOHZ: local_softirq_pending 80
[   81.489895] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  1.30 GBytes   187 Mbits/sec
[  5]  0.0-60.0 sec   637 MBytes  89.0 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58674 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60680
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.78 GBytes   255 Mbits/sec
[  4]  0.0-60.0 sec  1.47 GBytes   210 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58676 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60693
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.62 GBytes   232 Mbits/sec
[  4]  0.0-60.0 sec  2.07 GBytes   296 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58678 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60702
T: 0 ( 1307) P:98 I:250 C: 960000 Min: 7 Act: 13 Avg: 16 Max: 50
T: 1 ( 1308) P:98 I:250 C: 959819 Min: 8 Act: 12 Avg: 14 Max: 37
root@am57xx-evm:~#
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.77 GBytes   254 Mbits/sec
[  4]  0.0-60.0 sec  1.77 GBytes   253 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 256 KByte (WARNING: requested 128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58680 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60711
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  2.18 GBytes   312 Mbits/sec
[  4]  0.0-60.0 sec  2.53 GBytes   362 Mbits/sec

>> 3) network performance increase
>>
>>    win, K   Mbits/s
>>             before   after     %
>>    8K       354      350.3     0.0
>>    16K      412      551      33.7
>>    32K      423      659.5    55.9
>>    64K      436      728.3    67.0
>>    128K     537      845      57.4
>
> How close are the after numbers to !RT?

http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Kernel_Performance_Guide#AM57XX_TCP_Performance

Min 20% drop even with this patch :(
On 08/12/2016 11:32 AM, Sebastian Andrzej Siewior wrote:
> On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
>> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
>> where these IRQs are forced threaded:
>>
>> rx-irq
>> |- schedule threaded rx-irq handler
>> ...
>> |- threaded rx-irq handler -> cpsw_rx_interrupt()
>>    |- napi_schedule()
>>       |- __raise_softirq_irqoff()
>>          |- wakeup_proper_softirq()
>> ...
>> napi
>
> This should not be the default path. The default should be napi running
> in the context of the threaded rx-irq handler once the handler is done.
> The wakeup_proper_softirq() part is only done if napi thinks that the
> callback function runs for too long. So in *that* case you continue
> NAPI in the softirq-thread which runs at SCHED_OTHER.
>
>> after:
>>
>> rx-irq
>> |- cpsw_rx_interrupt()
>>    |- napi_schedule()
>> |- irq_exit()
>>    |- invoke_softirq()
>>       |- wakeup_softirqd()
>> ...
>> napi
>
> Since you schedule the softirq from an IRQ-off region / without a
> process context, you force the softirq to run in the thread at
> SCHED_OTHER priority.
>
>> And, as a result, get benefits from the following improvements (tested
>> on am57xx-evm):
>>
>> 1) The "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>    seen any more. These warnings can currently be seen once iperf is started:
>>    # iperf -c $IPERFHOST -w 128K -d -t 60
>
> Do you also see "sched: RT throttling activated"? Because I don't see
> otherwise why this should pop up.
> I've collected trace before first occurrence of "NOHZ: local_softirq_pending 80" irq/354-4848400-85 [000] 90.639460: irq_handler_entry: irq=19 name=arch_timer iperf-1284 [001] 90.639474: softirq_raise: vec=1 [action=TIMER] iperf-1284 [001] 90.639486: irq_handler_exit: irq=19 ret=handled irq/354-4848400-85 [000] 90.639488: softirq_raise: vec=7 [action=SCHED] iperf-1284 [001] 90.639490: sched_waking: comm=ksoftirqd/1 pid=21 prio=120 target_cpu=001 irq/354-4848400-85 [000] 90.639492: softirq_raise: vec=1 [action=TIMER] iperf-1284 [001] 90.639499: sched_wakeup: ksoftirqd/1:21 [120] success=1 CPU:001 iperf-1284 [001] 90.639504: sched_waking: comm=ktimersoftd/1 pid=20 prio=98 target_cpu=001 irq/354-4848400-85 [000] 90.639505: irq_handler_exit: irq=19 ret=handled iperf-1284 [001] 90.639512: sched_wakeup: ktimersoftd/1:20 [98] success=1 CPU:001 iperf-1284 [001] 90.639526: sched_stat_runtime: comm=iperf pid=1284 runtime=50752 [ns] vruntime=2105322850 [ns] iperf-1284 [001] 90.639537: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49] irq/355-4848400-86 [001] 90.639545: softirq_raise: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.639556: softirq_entry: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.639589: softirq_exit: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.639614: sched_switch: irq/355-4848400:86 [49] S ==> ktimersoftd/1:20 [98] ktimersoftd/1-20 [001] 90.639625: softirq_entry: vec=1 [action=TIMER] ktimersoftd/1-20 [001] 90.639637: sched_waking: comm=rcu_preempt pid=8 prio=98 target_cpu=001 ktimersoftd/1-20 [001] 90.639646: sched_wakeup: rcu_preempt:8 [98] success=1 CPU:001 ktimersoftd/1-20 [001] 90.639663: softirq_exit: vec=1 [action=TIMER] ktimersoftd/1-20 [001] 90.639679: sched_switch: ktimersoftd/1:20 [98] S ==> rcu_preempt:8 [98] rcu_preempt-8 [001] 90.639722: sched_switch: rcu_preempt:8 [98] S ==> ksoftirqd/1:21 [120] ksoftirqd/1-21 [001] 90.639740: sched_stat_runtime: comm=ksoftirqd/1 pid=21 runtime=25539 [ns] vruntime=29960463828 [ns] 
ksoftirqd/1-21 [001] 90.639750: sched_switch: ksoftirqd/1:21 [120] S ==> iperf:1284 [120] irq/354-4848400-85 [000] 90.639878: irq_handler_entry: irq=355 name=48484000.ethernet irq/354-4848400-85 [000] 90.639880: irq_handler_exit: irq=355 ret=handled irq/354-4848400-85 [000] 90.639884: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001 irq/354-4848400-85 [000] 90.639896: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001 iperf-1284 [001] 90.639903: sched_stat_runtime: comm=iperf pid=1284 runtime=150466 [ns] vruntime=2105473316 [ns] iperf-1284 [001] 90.639913: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49] irq/355-4848400-86 [001] 90.639921: softirq_raise: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.639932: softirq_entry: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.639958: sched_waking: comm=irq/354-4848400 pid=85 prio=49 target_cpu=000 irq/355-4848400-86 [001] 90.639963: sched_wakeup: irq/354-4848400:85 [49] success=1 CPU:000 irq/355-4848400-86 [001] 90.639984: softirq_exit: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640012: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120] irq/354-4848400-85 [000] 90.640109: sched_waking: comm=iperf pid=1284 prio=120 target_cpu=001 irq/354-4848400-85 [000] 90.640115: sched_wakeup: iperf:1284 [120] success=1 CPU:001 irq/354-4848400-85 [000] 90.640152: irq_handler_entry: irq=355 name=48484000.ethernet irq/354-4848400-85 [000] 90.640154: irq_handler_exit: irq=355 ret=handled irq/354-4848400-85 [000] 90.640158: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001 irq/354-4848400-85 [000] 90.640170: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001 iperf-1284 [001] 90.640177: sched_stat_runtime: comm=iperf pid=1284 runtime=163480 [ns] vruntime=2105636796 [ns] iperf-1284 [001] 90.640187: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49] irq/355-4848400-86 [001] 90.640195: softirq_raise: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640206: 
softirq_entry: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640239: softirq_exit: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640266: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120] irq/354-4848400-85 [000] 90.640376: irq_handler_entry: irq=355 name=48484000.ethernet irq/354-4848400-85 [000] 90.640378: irq_handler_exit: irq=355 ret=handled irq/354-4848400-85 [000] 90.640382: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001 irq/354-4848400-85 [000] 90.640394: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001 iperf-1284 [001] 90.640401: sched_stat_runtime: comm=iperf pid=1284 runtime=133387 [ns] vruntime=2105770183 [ns] iperf-1284 [001] 90.640412: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49] irq/355-4848400-86 [001] 90.640419: softirq_raise: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640430: softirq_entry: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640463: softirq_exit: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640489: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120] irq/354-4848400-85 [000] 90.640560: softirq_exit: vec=3 [action=NET_RX] irq/354-4848400-85 [000] 90.640568: softirq_raise: vec=3 [action=NET_RX] irq/354-4848400-85 [000] 90.640579: softirq_entry: vec=3 [action=NET_RX] irq/354-4848400-85 [000] 90.640606: irq_handler_entry: irq=355 name=48484000.ethernet irq/354-4848400-85 [000] 90.640608: irq_handler_exit: irq=355 ret=handled irq/354-4848400-85 [000] 90.640611: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001 irq/354-4848400-85 [000] 90.640623: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001 iperf-1284 [001] 90.640630: sched_stat_runtime: comm=iperf pid=1284 runtime=139080 [ns] vruntime=2105909263 [ns] iperf-1284 [001] 90.640641: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49] irq/355-4848400-86 [001] 90.640648: softirq_raise: vec=3 [action=NET_RX] irq/355-4848400-86 [001] 90.640659: softirq_entry: vec=3 [action=NET_RX] 
irq/355-4848400-86 [001] 90.640692: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.640720: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.640806: softirq_exit: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.640820: irq_handler_entry: irq=354 name=48484000.ethernet
irq/354-4848400-85 [000] 90.640823: irq_handler_exit: irq=354 ret=handled
irq/354-4848400-85 [000] 90.640833: irq_handler_entry: irq=355 name=48484000.ethernet
irq/354-4848400-85 [000] 90.640835: irq_handler_exit: irq=355 ret=handled
irq/354-4848400-85 [000] 90.640838: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
irq/354-4848400-85 [000] 90.640849: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
iperf-1284 [001] 90.640857: sched_stat_runtime: comm=iperf pid=1284 runtime=135501 [ns] vruntime=2106044764 [ns]
irq/354-4848400-85 [000] 90.640861: softirq_raise: vec=3 [action=NET_RX]
iperf-1284 [001] 90.640867: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49]
irq/354-4848400-85 [000] 90.640872: irq_handler_entry: irq=354 name=48484000.ethernet
irq/354-4848400-85 [000] 90.640874: irq_handler_exit: irq=354 ret=handled
irq/355-4848400-86 [001] 90.640875: softirq_raise: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.640885: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.640886: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.640919: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.640945: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.641052: irq_handler_entry: irq=355 name=48484000.ethernet
irq/354-4848400-85 [000] 90.641054: irq_handler_exit: irq=355 ret=handled
irq/354-4848400-85 [000] 90.641058: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
irq/354-4848400-85 [000] 90.641070: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
iperf-1284 [001] 90.641077: sched_stat_runtime: comm=iperf pid=1284 runtime=129483 [ns] vruntime=2106174247 [ns]
iperf-1284 [001] 90.641087: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49]
irq/355-4848400-86 [001] 90.641095: softirq_raise: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641106: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641132: sched_waking: comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
irq/355-4848400-86 [001] 90.641138: sched_wakeup: irq/354-4848400:85 [49] success=1 CPU:000
irq/355-4848400-86 [001] 90.641159: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641185: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.641312: irq_handler_entry: irq=355 name=48484000.ethernet
irq/354-4848400-85 [000] 90.641314: irq_handler_exit: irq=355 ret=handled
irq/354-4848400-85 [000] 90.641317: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
irq/354-4848400-85 [000] 90.641329: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
iperf-1284 [001] 90.641336: sched_stat_runtime: comm=iperf pid=1284 runtime=149003 [ns] vruntime=2106323250 [ns]
iperf-1284 [001] 90.641347: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49]
irq/355-4848400-86 [001] 90.641355: softirq_raise: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641366: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641398: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641425: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.641526: softirq_exit: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.641526: irq_handler_entry: irq=355 name=48484000.ethernet
irq/354-4848400-85 [000] 90.641526: irq_handler_exit: irq=355 ret=handled
irq/354-4848400-85 [000] 90.641526: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
irq/354-4848400-85 [000] 90.641526: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
iperf-1284 [001] 90.641557: sched_stat_runtime: comm=iperf pid=1284 runtime=130946 [ns] vruntime=2106454196 [ns]
irq/354-4848400-85 [000] 90.641565: softirq_raise: vec=3 [action=NET_RX]
iperf-1284 [001] 90.641567: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49]
irq/355-4848400-86 [001] 90.641575: softirq_raise: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.641576: irq_handler_entry: irq=354 name=48484000.ethernet
irq/354-4848400-85 [000] 90.641578: irq_handler_exit: irq=354 ret=handled
irq/355-4848400-86 [001] 90.641586: softirq_entry: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.641589: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641618: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641645: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.641760: irq_handler_entry: irq=355 name=48484000.ethernet
irq/354-4848400-85 [000] 90.641762: irq_handler_exit: irq=355 ret=handled
irq/354-4848400-85 [000] 90.641765: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
irq/354-4848400-85 [000] 90.641777: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
iperf-1284 [001] 90.641784: sched_stat_runtime: comm=iperf pid=1284 runtime=137941 [ns] vruntime=2106592137 [ns]
iperf-1284 [001] 90.641795: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49]
irq/355-4848400-86 [001] 90.641802: softirq_raise: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641814: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641845: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.641871: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.641916: softirq_exit: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.641924: softirq_raise: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.641935: softirq_entry: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.641964: sched_waking: comm=iperf pid=1284 prio=120 target_cpu=001
irq/354-4848400-85 [000] 90.641969: sched_wakeup: iperf:1284 [120] success=1 CPU:001
irq/354-4848400-85 [000] 90.642006: irq_handler_entry: irq=355 name=48484000.ethernet
irq/354-4848400-85 [000] 90.642008: irq_handler_exit: irq=355 ret=handled
irq/354-4848400-85 [000] 90.642011: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
irq/354-4848400-85 [000] 90.642024: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
iperf-1284 [001] 90.642032: sched_stat_runtime: comm=iperf pid=1284 runtime=156486 [ns] vruntime=2106748623 [ns]
iperf-1284 [001] 90.642042: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49]
irq/355-4848400-86 [001] 90.642049: softirq_raise: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.642060: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.642093: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.642120: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.642233: irq_handler_entry: irq=355 name=48484000.ethernet
irq/354-4848400-85 [000] 90.642235: irq_handler_exit: irq=355 ret=handled
irq/354-4848400-85 [000] 90.642238: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
irq/354-4848400-85 [000] 90.642250: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
iperf-1284 [001] 90.642257: sched_stat_runtime: comm=iperf pid=1284 runtime=135990 [ns] vruntime=2106884613 [ns]
iperf-1284 [001] 90.642268: sched_switch: iperf:1284 [120] R ==> irq/355-4848400:86 [49]
irq/355-4848400-86 [001] 90.642275: softirq_raise: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.642286: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.642319: softirq_exit: vec=3 [action=NET_RX]
irq/355-4848400-86 [001] 90.642345: sched_switch: irq/355-4848400:86 [49] S ==> iperf:1284 [120]
irq/354-4848400-85 [000] 90.642393: softirq_exit: vec=3 [action=NET_RX]
irq/354-4848400-85 [000] 90.642419: sched_switch: irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]
rcuc/0-11 [000] 90.642430: irq_handler_entry: irq=354 name=48484000.ethernet
rcuc/0-11 [000] 90.642432: irq_handler_exit: irq=354 ret=handled
rcuc/0-11 [000] 90.642435: sched_waking: comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
rcuc/0-11 [000] 90.642442: sched_migrate_task: comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
rcuc/0-11 [000] 90.642453: sched_wakeup: irq/354-4848400:85 [49] success=1 CPU:001
iperf-1284 [001] 90.642462: sched_stat_runtime: comm=iperf pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
rcuc/0-11 [000] 90.642464: irq_handler_entry: irq=355 name=48484000.ethernet
rcuc/0-11 [000] 90.642466: irq_handler_exit: irq=355 ret=handled
rcuc/0-11 [000] 90.642469: sched_waking: comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
iperf-1284 [001] 90.642473: sched_switch: iperf:1284 [120] R ==> irq/354-4848400:85 [49]
irq/354-4848400-85 [001] 90.642481: softirq_raise: vec=3 [action=NET_RX]
rcuc/0-11 [000] 90.642483: sched_wakeup: irq/355-4848400:86 [49] success=1 CPU:001
irq/354-4848400-85 [001] 90.642493: softirq_entry: vec=3 [action=NET_RX]
rcuc/0-11 [000] 90.642497: sched_migrate_task: comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0
rcuc/0-11 [000] 90.642515: irq_handler_entry: irq=354 name=48484000.ethernet
rcuc/0-11 [000] 90.642516: irq_handler_exit: irq=354 ret=handled
rcuc/0-11 [000] 90.642533: sched_switch: rcuc/0:11 [98] R ==> irq/355-4848400:86 [49]
irq/355-4848400-86 [000] 90.642541: softirq_raise: vec=3 [action=NET_RX]
irq/355-4848400-86 [000] 90.642551: softirq_entry: vec=3 [action=NET_RX]
irq/355-4848400-86 [000] 90.642562: sched_pi_setprio: comm=iperf pid=1284 oldprio=120 newprio=49
irq/355-4848400-86 [000] 90.642593: sched_switch: irq/355-4848400:86 [49] D ==> rcuc/0:11 [98]
rcuc/0-11 [000] 90.642621: sched_switch: rcuc/0:11 [98] S ==> ktimersoftd/0:4 [98]
ktimersoftd/0-4 [000] 90.642631: softirq_entry: vec=1 [action=TIMER]
ktimersoftd/0-4 [000] 90.642647: softirq_exit: vec=1 [action=TIMER]
ktimersoftd/0-4 [000] 90.642665: sched_switch: ktimersoftd/0:4 [98] S ==> ksoftirqd/0:3 [120]
ksoftirqd/0-3 [000] 90.642692: sched_stat_runtime: comm=ksoftirqd/0 pid=3 runtime=34486 [ns] vruntime=33313887612 [ns]
ksoftirqd/0-3 [000] 90.642703: sched_switch: ksoftirqd/0:3 [120] D ==> trace-cmd:1278 [120]
trace-cmd-1278 [000] 90.642742: sched_waking: comm=trace-cmd pid=1279 prio=120 target_cpu=001
trace-cmd-1278 [000] 90.642755: sched_wakeup: trace-cmd:1279 [120] success=1 CPU:001
irq/354-4848400-85 [001] 90.643012: softirq_exit: vec=3 [action=NET_RX]
trace-cmd-1278 [000] 90.643020: sched_stat_runtime: comm=trace-cmd pid=1278 runtime=328098 [ns] vruntime=554552351 [ns]
irq/354-4848400-85 [001] 90.643021: softirq_raise: vec=3 [action=NET_RX]
irq/354-4848400-85 [001] 90.643031: softirq_entry: vec=3 [action=NET_RX]
trace-cmd-1278 [000] 90.643036: sched_switch: trace-cmd:1278 [120] D ==> iperf:1282 [120]
iperf-1282 [000] 90.643055: sched_stat_runtime: comm=iperf pid=1282 runtime=34811 [ns] vruntime=560242796 [ns]
iperf-1282 [000] 90.643068: sched_switch: iperf:1282 [120] D ==> swapper/0:0 [120]
<idle>-0 [000] 90.643079: bputs: 0xc003e7d4s: 58f67c094e488
<idle>-0 [000] 90.643090: bprint: softirq_check_pending_idle: NOHZ: 80
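[Editorial note: the trace above follows the usual trace-cmd text layout, `<task>-<pid> [<cpu>] <timestamp>: <event>: <details>`. As an illustrative sketch (not a tool used in this thread), such lines can be parsed to measure the hardirq-to-NET_RX-softirq gap that the patch is trying to shrink; the function names below are the sketch's own inventions.]

```python
import re

# One event per line in trace-cmd's text format:
#   <task>-<pid> [<cpu>] <timestamp>: <event>: <details>
LINE_RE = re.compile(
    r"^(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d{3})\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<details>.*)$"
)

def parse_trace(lines):
    """Yield one dict per trace event, with the timestamp as a float."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ev = m.groupdict()
            ev["ts"] = float(ev["ts"])
            yield ev

def hardirq_to_softirq_latency(events, irq):
    """Yield the delta (in seconds) between irq_handler_entry for `irq`
    and the next NET_RX softirq_entry -- a rough proxy for the extra
    scheduling overhead discussed in this thread."""
    entry_ts = None
    for ev in events:
        if ev["event"] == "irq_handler_entry" and f"irq={irq}" in ev["details"]:
            entry_ts = ev["ts"]
        elif (ev["event"] == "softirq_entry" and "NET_RX" in ev["details"]
              and entry_ts is not None):
            yield ev["ts"] - entry_ts
            entry_ts = None
```

For example, feeding it the 90.640820 irq_handler_entry and the 90.640885 softirq_entry lines from the trace above yields a 65 us gap.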
On 2016-08-12 18:58:21 [+0300], Grygorii Strashko wrote:
> Hi Sebastian,
Hi Grygorii,

> Thanks for the comment. You're right:
> irq_thread()->irq_forced_thread_fn()->local_bh_enable()
>
> but wouldn't there be two wake_up_process() calls anyway,
> plus preempt_check_resched_rt() in napi_schedule()?

Usually you prefer BH handling in the IRQ-thread because it runs at
higher priority and is not interrupted by a SCHED_OTHER process. And you
can assign it a higher priority if it should be preferred over another
interrupt. However, if the processing of the interrupt is taking too
much time (like that ping flood, a lot of network traffic) then we push
it to the softirq thread. If you do this now unconditionally in the
SCHED_OTHER softirq thread then you take away all the `good' things we
had (like processing important packets at higher priority as long as
nobody floods us). Plus you share this thread with everything else that
runs in there.

> >> And, as a result, get benefits from the following improvements (tested
> >> on am57xx-evm):
> >>
> >> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
> >> seen any more. Now these warnings can be seen once iperf is started.
> >> # iperf -c $IPERFHOST -w 128K -d -t 60
> >
> > Do you also see "sched: RT throttling activated"? Because I don't see
> > otherwise why this should pop up.
>
> I've reverted my patch and did the requested experiments (some
> additional info below).
>
> I do not see "sched: RT throttling activated" :(

That is okay. However, if you aim for throughput you might want to switch
away from NO_HZ (and deactivate the software watchdog which runs at
prio 99 if enabled).

> root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q -D4m
> [1] 1301
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux
> …
> [1]+ Done ./net_perf.sh

I can't parse this. But that local_softirq_pending() warning might
contribute to lower numbers.

> =============================================== before, no net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1288) P:98 I:250 C: 960000 Min: 8 Act: 9 Avg: 8 Max: 33
> T: 1 ( 1289) P:98 I:250 C: 959929 Min: 7 Act: 11 Avg: 9 Max: 26
>
> =============================================== after, no net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1301) P:98 I:250 C: 960000 Min: 7 Act: 9 Avg: 8 Max: 22
> T: 1 ( 1302) P:98 I:250 C: 959914 Min: 7 Act: 11 Avg: 8 Max: 28

I think those two should be equal more or less since the change should
have no impact on "no net load", or do I miss something?

> =============================================== before, with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1400) P:98 I:250 C: 960000 Min: 8 Act: 25 Avg: 18 Max: 83
> T: 1 ( 1401) P:98 I:250 C: 959801 Min: 7 Act: 27 Avg: 17 Max: 48
>
>
> =============================================== after, with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1358) P:98 I:250 C: 960000 Min: 8 Act: 11 Avg: 14 Max: 42
> T: 1 ( 1359) P:98 I:250 C: 959743 Min: 7 Act: 18 Avg: 15 Max: 36

So the max value dropped by ~50% with your patch. Interesting. What I
remember from testing is that once you had, say, one hour of hackbench
running, then after that the extra network traffic didn't contribute
much (if at all) to the max value.
That said, it is hard to believe that one extra context switch
contributes about 40us to the max value on CPU0.

> > What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> > in net/core/dev.c and chrt the priority of your network interrupt
> > handlers to SCHED_OTHER priority?
>
> ===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
>
> with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1325) P:98 I:1000 C: 240000 Min: 8 Act: 22 Avg: 17 Max: 51
> T: 1 ( 1326) P:98 I:1500 C: 159981 Min: 8 Act: 15 Avg: 15 Max: 39
>
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1307) P:98 I:250 C: 960000 Min: 7 Act: 13 Avg: 16 Max: 50
> T: 1 ( 1308) P:98 I:250 C: 959819 Min: 8 Act: 12 Avg: 14 Max: 37
>
> and net performance is better:
> root@am57xx-evm:~# ps -A | grep 4848
>    82 ?        00:00:00 irq/354-4848400
>    83 ?        00:00:00 irq/355-4848400
> root@am57xx-evm:~# chrt -o -p 0 82
> root@am57xx-evm:~# chrt -o -p 0 83
> ./net_perf.sh & cyclictest -m -Sp98 -q -D4m -i250 -d0
> [1] 1298
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 12 16:20:42 EEST 2016 armv7l GNU/Linux

So that looks nice, doesn't it?

Sebastian
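[Editorial note: as a quick sanity check on the "max value dropped by ~50%" remark above, the drop can be recomputed from the with-net-load cyclictest runs quoted in this email (Max 83/48 us before the patch, 42/36 us after).]

```python
# Max latencies (us) from the cyclictest runs quoted above,
# with network load, before vs. after the IRQF_NO_THREAD patch.
max_before = {"T0": 83, "T1": 48}
max_after  = {"T0": 42, "T1": 36}

for t in sorted(max_before):
    drop = (max_before[t] - max_after[t]) / max_before[t] * 100
    print(f"{t}: {max_before[t]} -> {max_after[t]} us ({drop:.1f}% lower)")
```

This gives 49.4% on CPU0 (hence "~50%") and 25.0% on CPU1.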
On 09/08/2016 05:28 PM, Sebastian Andrzej Siewior wrote:
> On 2016-08-12 18:58:21 [+0300], Grygorii Strashko wrote:
>> Hi Sebastian,
> Hi Grygorii,
>
>> Thanks for the comment. You're right:
>> irq_thread()->irq_forced_thread_fn()->local_bh_enable()
>>
>> but wouldn't there be two wake_up_process() calls anyway,
>> plus preempt_check_resched_rt() in napi_schedule()?
>
> Usually you prefer BH handling in the IRQ-thread because it runs at
> higher priority and is not interrupted by a SCHED_OTHER process. And you
> can assign it a higher priority if it should be preferred over another
> interrupt. However, if the processing of the interrupt is taking too
> much time (like that ping flood, a lot of network traffic) then we push
> it to the softirq thread. If you do this now unconditionally in the
> SCHED_OTHER softirq thread then you take away all the `good' things we
> had (like processing important packets at higher priority as long as
> nobody floods us). Plus you share this thread with everything else that
> runs in there.

That I understand, but the effect of this patch on network throughput is
pretty amazing :)

>
>>>> And, as a result, get benefits from the following improvements (tested
>>>> on am57xx-evm):
>>>>
>>>> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>>> seen any more. Now these warnings can be seen once iperf is started.
>>>> # iperf -c $IPERFHOST -w 128K -d -t 60
>>>
>>> Do you also see "sched: RT throttling activated"? Because I don't see
>>> otherwise why this should pop up.
>>
>> I've reverted my patch and did the requested experiments (some
>> additional info below).
>>
>> I do not see "sched: RT throttling activated" :(
>
> That is okay. However, if you aim for throughput you might want to switch
> away from NO_HZ (and deactivate the software watchdog which runs at
> prio 99 if enabled).
>
>> root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q -D4m
>> [1] 1301
>> # /dev/cpu_dma_latency set to 0us
>> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux
>> …
>> [1]+ Done ./net_perf.sh
>
> I can't parse this. But that local_softirq_pending() warning might
> contribute to lower numbers.
>
>> =============================================== before, no net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> # /dev/cpu_dma_latency set to 0us
>> T: 0 ( 1288) P:98 I:250 C: 960000 Min: 8 Act: 9 Avg: 8 Max: 33
>> T: 1 ( 1289) P:98 I:250 C: 959929 Min: 7 Act: 11 Avg: 9 Max: 26
>
>> =============================================== after, no net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1301) P:98 I:250 C: 960000 Min: 7 Act: 9 Avg: 8 Max: 22
>> T: 1 ( 1302) P:98 I:250 C: 959914 Min: 7 Act: 11 Avg: 8 Max: 28
>
> I think those two should be equal more or less since the change should
> have no impact on "no net load" or do I miss something?

Correct, I see no difference in this case, as per the above.

>
>> =============================================== before, with net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1400) P:98 I:250 C: 960000 Min: 8 Act: 25 Avg: 18 Max: 83
>> T: 1 ( 1401) P:98 I:250 C: 959801 Min: 7 Act: 27 Avg: 17 Max: 48
>>
>>
>> =============================================== after, with net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1358) P:98 I:250 C: 960000 Min: 8 Act: 11 Avg: 14 Max: 42
>> T: 1 ( 1359) P:98 I:250 C: 959743 Min: 7 Act: 18 Avg: 15 Max: 36
>
> So the max value dropped by ~50% with your patch. Interesting. What I
> remember from testing is that once you had, say, one hour of hackbench
> running then after that, the extra network traffic didn't contribute
> much (if at all) to the max value.
> That said it is hard to believe that one extra context switch
> contributes about 40us to the max value on CPU0.

Yup, but short-time testing gives very stable results. This patch is
going to be tested more intensively shortly.

>
>>> What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
>>> in net/core/dev.c and chrt the priority of your network interrupt
>>> handlers to SCHED_OTHER priority?
>>
>> ===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
>>
>> with net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1325) P:98 I:1000 C: 240000 Min: 8 Act: 22 Avg: 17 Max: 51
>> T: 1 ( 1326) P:98 I:1500 C: 159981 Min: 8 Act: 15 Avg: 15 Max: 39
>>
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1307) P:98 I:250 C: 960000 Min: 7 Act: 13 Avg: 16 Max: 50
>> T: 1 ( 1308) P:98 I:250 C: 959819 Min: 8 Act: 12 Avg: 14 Max: 37
>>
>
> So that looks nice, doesn't it?

Yeah, this is an improvement in general. But the fact that such a
significant net performance drop is observed out of the box (without any
tuning) and on an idle system triggers a lot of questions ;(

I worry that the observed original behavior may depend on the use of
NAPI polling for both RX and TX in the CPSW driver: CPSW requests two
IRQs, RX and TX, and both handlers just do napi_schedule() [NET_RX].
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 7b59283..fa4bb81 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -769,7 +769,7 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
 		priv->tx_irq_disabled = true;
 	}
 
-	napi_schedule(&priv->napi_tx);
+	napi_schedule_irqoff(&priv->napi_tx);
 	return IRQ_HANDLED;
 }
 
@@ -785,7 +785,7 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
 		priv->rx_irq_disabled = true;
 	}
 
-	napi_schedule(&priv->napi_rx);
+	napi_schedule_irqoff(&priv->napi_rx);
 	return IRQ_HANDLED;
 }
 
@@ -2827,7 +2827,7 @@ static int cpsw_probe(struct platform_device *pdev)
 	priv->irqs_table[0] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
-			       0, dev_name(&pdev->dev), priv);
+			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
 		goto clean_ale_ret;
@@ -2842,7 +2842,7 @@ static int cpsw_probe(struct platform_device *pdev)
 	priv->irqs_table[1] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
-			       0, dev_name(&pdev->dev), priv);
+			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
 		goto clean_ale_ret;
Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT,
where these IRQs are forced threaded:

 rx-irq
 |- schedule threaded rx-irq handler
 ...
 |- threaded rx-irq handler -> cpsw_rx_interrupt()
 |- napi_schedule()
 |- __raise_softirq_irqoff()
 |- wakeup_proper_softirq()
 ...
 napi

after:

 rx-irq
 |- cpsw_rx_interrupt()
 |- napi_schedule()
 |- irq_exit()
 |- invoke_softirq()
 |- wakeup_softirqd()
 ...
 napi

And, as a result, get benefits from the following improvements (tested
on am57xx-evm):

1) The "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
   seen any more. Currently these warnings can be seen once iperf is
   started:
   # iperf -c $IPERFHOST -w 128K -d -t 60

2) Latency reduction when cyclictest is run in parallel with network
   load, where net_perf.sh is:
   iperf -c $IPERFHOST -w 8K -d -t 60
   iperf -c $IPERFHOST -w 16K -d -t 60
   iperf -c $IPERFHOST -w 32K -d -t 60
   iperf -c $IPERFHOST -w 64K -d -t 60
   iperf -c $IPERFHOST -w 128K -d -t 60

   before:
   T: 0 ( 1326) P:98 I:1000 C: 240000 Min: 8 Act: 13 Avg: 18 Max: 70
   T: 1 ( 1327) P:98 I:1500 C: 159981 Min: 9 Act: 15 Avg: 16 Max: 43
   after:
   T: 0 ( 1331) P:98 I:1000 C: 240000 Min: 8 Act: 15 Avg: 14 Max: 51
   T: 1 ( 1332) P:98 I:1500 C: 159953 Min: 8 Act: 16 Avg: 15 Max: 33

3) Network performance increase:

   win, K   Mbits/s
            before   after    %
   8K       354      350.3    0.0
   16K      412      551      33.7
   32K      423      659.5    55.9
   64K      436      728.3    67.0
   128K     537      845      57.4

This change does not affect non-RT.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
Hi All,

I'd appreciate any feedback or tested-by. In case of positive feedback
I'll resend it for upstream.

 drivers/net/ethernet/ti/cpsw.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
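[Editorial note: the percentage column of the throughput table above can be recomputed directly from the before/after columns. Doing so confirms the 16K-128K rows and shows the 8K row is actually a tiny regression of about -1%, which the table rounds to 0.0.]

```python
# Recompute the percentage gains from the throughput table in the
# commit message above (values in Mbits/s).
results = {  # TCP window size -> (before, after)
    "8K": (354, 350.3),
    "16K": (412, 551),
    "32K": (423, 659.5),
    "64K": (436, 728.3),
    "128K": (537, 845),
}

for win, (before, after) in results.items():
    gain = (after - before) / before * 100
    print(f"{win:>5}: {before} -> {after} Mbits/s ({gain:+.1f}%)")
```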