
[4.4-RT,RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD

Message ID 20160811161540.9613-1-grygorii.strashko@ti.com
State New, archived

Commit Message

Grygorii Strashko Aug. 11, 2016, 4:15 p.m. UTC
Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT,
where these IRQs are force threaded:
 rx-irq
  |- schedule threaded rx-irq handler
...
  |- threaded rx-irq handler -> cpsw_rx_interrupt()
     |- napi_schedule()
	|- __raise_softirq_irqoff()
	   |- wakeup_proper_softirq()
...
  napi

after:
 rx-irq
  |- cpsw_rx_interrupt()
     |- napi_schedule()
  |- irq_exit()
     |- invoke_softirq()
	   |- wakeup_softirqd()
...
  napi
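
In short (distilled from the diff at the end of this mail, RX side shown,
TX is symmetric, unchanged handler code elided), the handlers stay in
hard-IRQ context and use the irq-off NAPI scheduling variant:

static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
{
	struct cpsw_priv *priv = dev_id;

	/* ... existing ack/mask of the RX interrupt is unchanged ... */

	/* we are in hard-IRQ context, interrupts are already off,
	 * so the _irqoff variant can be used directly */
	napi_schedule_irqoff(&priv->napi_rx);
	return IRQ_HANDLED;
}

/* in cpsw_probe(): keep the handler in hard-IRQ context even when
 * forced interrupt threading (threadirqs / PREEMPT_RT) is in effect */
ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
		       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);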

As a result, we get the following improvements (tested on am57xx-evm):

1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
   seen any more. Now these warnings can be seen once iperf is started.
   # iperf -c $IPERFHOST -w 128K  -d -t 60

2) latency reduction when cyclictest is run in parallel with network load
   generated by net_perf.sh, where net_perf.sh is:
   iperf -c $IPERFHOST -w 8K    -d -t 60
   iperf -c $IPERFHOST -w 16K   -d -t 60
   iperf -c $IPERFHOST -w 32K   -d -t 60
   iperf -c $IPERFHOST -w 64K   -d -t 60
   iperf -c $IPERFHOST -w 128K  -d -t 60

before:
T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
after:
T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33

3) network performance increase

win, K	before, Mbits/s	after, Mbits/s	%
8K	354	350.3	0.0
16K	412	551	33.7
32K	423	659.5	55.9
64K	436	728.3	67.0
128K	537	845	57.4

This change does not affect non-RT.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
Hi All,

I would appreciate any feedback or Tested-by.
In case of positive feedback I'll resend it for upstream.

 drivers/net/ethernet/ti/cpsw.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Steven Rostedt Aug. 11, 2016, 4:36 p.m. UTC | #1
On Thu, 11 Aug 2016 19:15:40 +0300
Grygorii Strashko <grygorii.strashko@ti.com> wrote:

> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
> where this IRQs are forced threaded:
>  rx-irq
>   |- schedule threaded rx-irq handler
> ...
>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>      |- napi_schedule()
> 	|- __raise_softirq_irqoff()
> 	   |- wakeup_proper_softirq()
> ...
>   napi
> 
> after:
>  rx-irq
>   |- cpsw_rx_interrupt()
>      |- napi_schedule()
>   |- irq_exit()
>      |- invoke_softirq()
> 	   |- wakeup_softirqd()
> ...
>   napi
> 
> And, as result, get benefits from the following improvements (tested
> on am57xx-evm):
> 
> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>    seen any more. Now these warnings can be seen once iperf is started.
>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> 2) latency reduction when cyclictest is run in parallel with network load
>  where net_perf.sh is:
>    iperf -c $IPERFHOST -w 8K    -d -t 60
>    iperf -c $IPERFHOST -w 16K   -d -t 60
>    iperf -c $IPERFHOST -w 32K   -d -t 60
>    iperf -c $IPERFHOST -w 64K   -d -t 60
>    iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> before:
> T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
> after:
> T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33
> 
> 3) network performance increase
> 
> win, K	Mbits/s
> 	before	after	%
> 8K	354	350.3	0.0
> 16K	412	551	33.7
> 32K	423	659.5	55.9
> 64K	436	728.3	67.0
> 128K	537	845	57.4
> 
> This change does not affect on non-RT.

This looks fine to me, but it should go into the development branch,
which is currently 4.6-rt. And I can then pull it from there.

-- Steve


> 
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
> Hi All,
> 
> I'll be appreciated on any feedback or tested-by.
> In case of positive feedback I'll resend it for upstream.
> 
>  drivers/net/ethernet/ti/cpsw.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 7b59283..fa4bb81 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -769,7 +769,7 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
>  		priv->tx_irq_disabled = true;
>  	}
>  
> -	napi_schedule(&priv->napi_tx);
> +	napi_schedule_irqoff(&priv->napi_tx);
>  	return IRQ_HANDLED;
>  }
>  
> @@ -785,7 +785,7 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
>  		priv->rx_irq_disabled = true;
>  	}
>  
> -	napi_schedule(&priv->napi_rx);
> +	napi_schedule_irqoff(&priv->napi_rx);
>  	return IRQ_HANDLED;
>  }
>  
> @@ -2827,7 +2827,7 @@ static int cpsw_probe(struct platform_device *pdev)
>  
>  	priv->irqs_table[0] = irq;
>  	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
> -			       0, dev_name(&pdev->dev), priv);
> +			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
>  	if (ret < 0) {
>  		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
>  		goto clean_ale_ret;
> @@ -2842,7 +2842,7 @@ static int cpsw_probe(struct platform_device *pdev)
>  
>  	priv->irqs_table[1] = irq;
>  	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
> -			       0, dev_name(&pdev->dev), priv);
> +			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
>  	if (ret < 0) {
>  		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
>  		goto clean_ale_ret;

Sebastian Andrzej Siewior Aug. 12, 2016, 8:32 a.m. UTC | #2
On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
> where this IRQs are forced threaded:
>  rx-irq
>   |- schedule threaded rx-irq handler
> ...
>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>      |- napi_schedule()
> 	|- __raise_softirq_irqoff()
> 	   |- wakeup_proper_softirq()
> ...
>   napi

This should not be the default path. The default should be napi running
in the context of the threaded rx-irq handler once the handler is done.
The wakeup_proper_softirq() part is only done if napi thinks that the
callback function runs for too long. So in *that* case you continue
NAPI in the softirq-thread which runs at SCHED_OTHER.
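
For reference, the forced-threading wrapper (kernel/irq/manage.c, shape
roughly as in v4.4) looks like this; the final local_bh_enable() is where
NAPI normally gets to run, i.e. in the irq thread and at its priority:

static irqreturn_t
irq_forced_thread_fn(struct irq_desc *desc, struct irqaction *action)
{
	irqreturn_t ret;

	local_bh_disable();	/* softirqs raised by the handler are deferred */
	ret = action->thread_fn(action->irq, action->dev_id);
	irq_finalize_oneshot(desc, action);
	local_bh_enable();	/* pending NET_RX runs here, in the irq thread */
	return ret;
}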

> after:
>  rx-irq
>   |- cpsw_rx_interrupt()
>      |- napi_schedule()
>   |- irq_exit()
>      |- invoke_softirq()
> 	   |- wakeup_softirqd()
> ...
>   napi

Since you schedule the softirq from an IRQ-off region / without a
process context you force the softirq to run in the thread at
SCHED_OTHER priority.
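
Roughly, with the upstream v4.4 shapes (the -RT softirq rework changes the
details, but the endpoint is the same wakeup_softirqd()):

/* net/core/dev.c */
static inline void ____napi_schedule(struct softnet_data *sd,
				     struct napi_struct *napi)
{
	list_add_tail(&napi->poll_list, &sd->poll_list);
	__raise_softirq_irqoff(NET_RX_SOFTIRQ);	/* only sets the pending bit */
}

/* kernel/softirq.c, on the way out of the hard interrupt */
void irq_exit(void)
{
	...
	if (!in_interrupt() && local_softirq_pending())
		invoke_softirq();	/* no process context to borrow here, so
					 * this ends in wakeup_softirqd() and the
					 * poll runs in ksoftirqd at SCHED_OTHER */
	...
}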

> And, as result, get benefits from the following improvements (tested
> on am57xx-evm):
> 
> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>    seen any more. Now these warnings can be seen once iperf is started.
>    # iperf -c $IPERFHOST -w 128K  -d -t 60

Do you also see "sched: RT throttling activated"? Because I don't see
otherwise why this should pop up.

> 2) latency reduction when cyclictest is run in parallel with network load
>  where net_perf.sh is:
>    iperf -c $IPERFHOST -w 8K    -d -t 60
>    iperf -c $IPERFHOST -w 16K   -d -t 60
>    iperf -c $IPERFHOST -w 32K   -d -t 60
>    iperf -c $IPERFHOST -w 64K   -d -t 60
>    iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> before:
> T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
> after:
> T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33

Please use -d 0 to have I: set to the same value for both threads.
What does -i 250 say?
And without network load, are we where the "after" values were?

What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
in net/core/dev.c and chrt your network interrupt handlers to
SCHED_OTHER priority?

> 3) network performance increase
> 
> win, K	Mbits/s
> 	before	after	%
> 8K	354	350.3	0.0
> 16K	412	551	33.7
> 32K	423	659.5	55.9
> 64K	436	728.3	67.0
> 128K	537	845	57.4

How close are the after numbers to !RT?

Sebastian
Grygorii Strashko Aug. 12, 2016, 3:58 p.m. UTC | #3
Hi Sebastian,

On 08/12/2016 11:32 AM, Sebastian Andrzej Siewior wrote:
> On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
>> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
>> where this IRQs are forced threaded:
>>  rx-irq
>>   |- schedule threaded rx-irq handler
>> ...
>>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>>      |- napi_schedule()
>> 	|- __raise_softirq_irqoff()
>> 	   |- wakeup_proper_softirq()
>> ...
>>   napi
> 
> This should not be the default path. The default should be napi running
> in the context of the threaded rx-irq handler once the handler is done.
> The wakeup_proper_softirq() part is only done if napi thinks that the
> callback functions runs for too long. So in *that* case you continue
> NAPI in the softirq-thread which runs at SCHED_OTHER.

Thanks for the comment. You're right:
irq_thread()->irq_forced_thread_fn()->local_bh_enable()

but wouldn't there still be two wake_up_process() calls anyway, plus
preempt_check_resched_rt() in napi_schedule()?

> 
>> after:
>>  rx-irq
>>   |- cpsw_rx_interrupt()
>>      |- napi_schedule()
>>   |- irq_exit()
>>      |- invoke_softirq()
>> 	   |- wakeup_softirqd()
>> ...
>>   napi
> 
> Since you schedule the softirq from an IRQ-off region / without a
> process context you force the softirq to run in the thread at
> SCHED_OTHER priority.

Thanks a lot for the comments.
> 
>> And, as result, get benefits from the following improvements (tested
>> on am57xx-evm):
>>
>> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>    seen any more. Now these warnings can be seen once iperf is started.
>>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> Do you also see "sched: RT throttling activated"? Because I don't see
> otherwise why this should pop up.

I've reverted my patch and did the requested experiments (some additional info below).

I do not see "sched: RT throttling activated" :(

root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m
[1] 1301
# /dev/cpu_dma_latency set to 0us
Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux


------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
[  3] local 172.22.39.21 port 47500 connected with 172.22.39.17 port 5001
[  5] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59955
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  1.31 GBytes   188 Mbits/sec
[  5]  0.0-60.0 sec   868 MBytes   121 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47502 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59957
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.92 GBytes   275 Mbits/sec
[  4]  0.0-60.0 sec  1.14 GBytes   163 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47504 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59965
[ 2285.928516] NOHZ: local_softirq_pending 80
[ 2285.928549] NOHZ: local_softirq_pending 80
[ 2285.928594] NOHZ: local_softirq_pending 80
[ 2285.928689] NOHZ: local_softirq_pending 80
[ 2286.348572] NOHZ: local_softirq_pending 80
[ 2290.358623] NOHZ: local_softirq_pending 80
[ 2291.588624] NOHZ: local_softirq_pending 80
[ 2291.698579] NOHZ: local_softirq_pending 80
[ 2292.608594] NOHZ: local_softirq_pending 80
[ 2293.178627] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.43 GBytes   205 Mbits/sec
[  4]  0.0-60.0 sec  1.58 GBytes   226 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47506 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59980
T: 0 ( 1309) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      60
T: 1 ( 1310) P:98 I:1500 C: 159968 Min:      8 Act:   14 Avg:   16 Max:      58
root@am57xx-evm:~# [ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.84 GBytes   264 Mbits/sec
[  4]  0.0-60.0 sec  1.47 GBytes   210 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47508 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59982
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec   974 MBytes   136 Mbits/sec
[  4]  0.0-60.0 sec  2.84 GBytes   406 Mbits/sec

[1]+  Done                    ./net_perf.sh



> 
>> 2) latency reduction when cyclictest is run in parallel with network load
>>  where net_perf.sh is:
>>    iperf -c $IPERFHOST -w 8K    -d -t 60
>>    iperf -c $IPERFHOST -w 16K   -d -t 60
>>    iperf -c $IPERFHOST -w 32K   -d -t 60
>>    iperf -c $IPERFHOST -w 64K   -d -t 60
>>    iperf -c $IPERFHOST -w 128K  -d -t 60
>>
>> before:
>> T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
>> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
>> after:
>> T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
>> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33
> 
> -d 0 to have I: set to the same value.
> What does -i 250 say?
> And without network load we are where we were at "after" values?

=============================================== before, no net load:
cyclictest -m -Sp98 -q  -D4m
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1270) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      38
T: 1 ( 1271) P:98 I:1500 C: 159986 Min:      8 Act:   10 Avg:    9 Max:      23

cyclictest -m -Sp98 -q  -D4m -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1280) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      27
T: 1 ( 1281) P:98 I:1000 C: 239982 Min:      8 Act:   13 Avg:    9 Max:      26

cyclictest -m -Sp98 -q  -D4m -i250
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1284) P:98 I:250 C: 960000 Min:      8 Act:   10 Avg:    8 Max:      24
T: 1 ( 1285) P:98 I:750 C: 319977 Min:      8 Act:   11 Avg:    9 Max:      21

cyclictest -m -Sp98 -q  -D4m -i250 -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1288) P:98 I:250 C: 960000 Min:      8 Act:    9 Avg:    8 Max:      33
T: 1 ( 1289) P:98 I:250 C: 959929 Min:      7 Act:   11 Avg:    9 Max:      26

=============================================== before, with net load:
cyclictest -m -Sp98 -q  -D4m
T: 0 ( 1309) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      60
T: 1 ( 1310) P:98 I:1500 C: 159968 Min:      8 Act:   14 Avg:   16 Max:      58

cyclictest -m -Sp98 -q -D4m -d0
T: 0 ( 1339) P:98 I:1000 C: 240000 Min:      8 Act:   28 Avg:   18 Max:      60
T: 1 ( 1340) P:98 I:1000 C: 239969 Min:      8 Act:   25 Avg:   16 Max:      43

cyclictest -m -Sp98 -q -D4m -i250
T: 0 ( 1369) P:98 I:250 C: 960000 Min:      8 Act:   13 Avg:   17 Max:      64
T: 1 ( 1370) P:98 I:750 C: 319941 Min:      8 Act:   18 Avg:   16 Max:      56

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1400) P:98 I:250 C: 960000 Min:      8 Act:   25 Avg:   18 Max:      83
T: 1 ( 1401) P:98 I:250 C: 959801 Min:      7 Act:   27 Avg:   17 Max:      48

=============================================== after, no net load:

cyclictest -m -Sp98 -q  -D4m 
T: 0 ( 1287) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      30
T: 1 ( 1288) P:98 I:1500 C: 159987 Min:      8 Act:   13 Avg:   10 Max:      32

cyclictest -m -Sp98 -q -D4m -d0  
T: 0 ( 1295) P:98 I:1000 C: 240000 Min:      8 Act:    8 Avg:    9 Max:      39
T: 1 ( 1296) P:98 I:1000 C: 239981 Min:      7 Act:    9 Avg:    8 Max:      29

cyclictest -m -Sp98 -q  -D4m -i250
T: 0 ( 1298) P:98 I:250 C: 960000 Min:      7 Act:    8 Avg:    9 Max:      42
T: 1 ( 1299) P:98 I:750 C: 319970 Min:      8 Act:    9 Avg:    8 Max:      29

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1301) P:98 I:250 C: 960000 Min:      7 Act:    9 Avg:    8 Max:      22
T: 1 ( 1302) P:98 I:250 C: 959914 Min:      7 Act:   11 Avg:    8 Max:      28

=============================================== after, with net load:
cyclictest -m -Sp98 -q  -D4m 
T: 0 ( 1315) P:98 I:1000 C: 240000 Min:      8 Act:   11 Avg:   14 Max:      40
T: 1 ( 1316) P:98 I:1500 C: 159962 Min:      8 Act:   19 Avg:   15 Max:      38


cyclictest -m -Sp98 -q -D4m -d0  
T: 0 ( 1365) P:98 I:1000 C: 240000 Min:      8 Act:   14 Avg:   15 Max:      51
T: 1 ( 1366) P:98 I:1000 C: 239943 Min:      8 Act:   18 Avg:   15 Max:      36

cyclictest -m -Sp98 -q  -D4m -i250
T: 0 ( 1309) P:98 I:250 C: 960000 Min:      8 Act:   15 Avg:   13 Max:      42
T: 1 ( 1310) P:98 I:750 C: 319986 Min:      8 Act:   18 Avg:   15 Max:      36

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1358) P:98 I:250 C: 960000 Min:      8 Act:   11 Avg:   14 Max:      42
T: 1 ( 1359) P:98 I:250 C: 959743 Min:      7 Act:   18 Avg:   15 Max:      36

> 
> What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> in net/core/dev.c and chrt the priority of you network interrupt
> handlers to SCHED_OTHER priority?

===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER

with net load:
cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1325) P:98 I:1000 C: 240000 Min:      8 Act:   22 Avg:   17 Max:      51
T: 1 ( 1326) P:98 I:1500 C: 159981 Min:      8 Act:   15 Avg:   15 Max:      39

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37

and net performance is better:
root@am57xx-evm:~# ps -A | grep 4848
   82 ?        00:00:00 irq/354-4848400
   83 ?        00:00:00 irq/355-4848400
root@am57xx-evm:~# chrt -o -p 0 82
root@am57xx-evm:~# chrt -o -p 0 83
./net_perf.sh & cyclictest -m -Sp98 -q  -D4m -i250 -d0
[1] 1298
# /dev/cpu_dma_latency set to 0us
Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 12 16:20:42 EEST 2016 armv7l GNU/Linux


------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
[  3] local 172.22.39.21 port 58672 connected with 172.22.39.17 port 5001
[  5] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60672
[   78.718828] NOHZ: local_softirq_pending 80
[   78.908849] NOHZ: local_softirq_pending 80
[   78.908927] NOHZ: local_softirq_pending 80
[   78.918827] NOHZ: local_softirq_pending 80
[   79.058826] NOHZ: local_softirq_pending 80
[   81.488918] NOHZ: local_softirq_pending 80
[   81.489162] NOHZ: local_softirq_pending 80
[   81.489398] NOHZ: local_softirq_pending 80
[   81.489646] NOHZ: local_softirq_pending 80
[   81.489895] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  1.30 GBytes   187 Mbits/sec
[  5]  0.0-60.0 sec   637 MBytes  89.0 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58674 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60680
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.78 GBytes   255 Mbits/sec
[  4]  0.0-60.0 sec  1.47 GBytes   210 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58676 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60693
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.62 GBytes   232 Mbits/sec
[  4]  0.0-60.0 sec  2.07 GBytes   296 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58678 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60702
T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37
root@am57xx-evm:~# [ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.77 GBytes   254 Mbits/sec
[  4]  0.0-60.0 sec  1.77 GBytes   253 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58680 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60711
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  2.18 GBytes   312 Mbits/sec
[  4]  0.0-60.0 sec  2.53 GBytes   362 Mbits/sec


> 
>> 3) network performance increase
>>
>> win, K	Mbits/s
>> 	before	after	%
>> 8K	354	350.3	0.0
>> 16K	412	551	33.7 
>> 32K	423	659.5	55.9 
>> 64K	436	728.3	67.0 
>> 128K	537	845	57.4 
> 
> How close are the after numbers to !RT?

http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Kernel_Performance_Guide#AM57XX_TCP_Performance

At least a 20% drop even with this patch :(
Grygorii Strashko Aug. 19, 2016, 2:29 p.m. UTC | #4
On 08/12/2016 11:32 AM, Sebastian Andrzej Siewior wrote:
> On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
>> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
>> where this IRQs are forced threaded:
>>  rx-irq
>>   |- schedule threaded rx-irq handler
>> ...
>>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>>      |- napi_schedule()
>> 	|- __raise_softirq_irqoff()
>> 	   |- wakeup_proper_softirq()
>> ...
>>   napi
> 
> This should not be the default path. The default should be napi running
> in the context of the threaded rx-irq handler once the handler is done.
> The wakeup_proper_softirq() part is only done if napi thinks that the
> callback functions runs for too long. So in *that* case you continue
> NAPI in the softirq-thread which runs at SCHED_OTHER.
> 
>> after:
>>  rx-irq
>>   |- cpsw_rx_interrupt()
>>      |- napi_schedule()
>>   |- irq_exit()
>>      |- invoke_softirq()
>> 	   |- wakeup_softirqd()
>> ...
>>   napi
> 
> Since you schedule the softirq from an IRQ-off region / without a
> process context you force the softirq to run in the thread at
> SCHED_OTHER priority.
> 
>> And, as result, get benefits from the following improvements (tested
>> on am57xx-evm):
>>
>> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>    seen any more. Now these warnings can be seen once iperf is started.
>>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> Do you also see "sched: RT throttling activated"? Because I don't see
> otherwise why this should pop up.
> 

I've collected a trace before the first occurrence of "NOHZ: local_softirq_pending 80":

 irq/354-4848400-85    [000]    90.639460: irq_handler_entry:    irq=19 name=arch_timer
           iperf-1284  [001]    90.639474: softirq_raise:        vec=1 [action=TIMER]
           iperf-1284  [001]    90.639486: irq_handler_exit:     irq=19 ret=handled
 irq/354-4848400-85    [000]    90.639488: softirq_raise:        vec=7 [action=SCHED]
           iperf-1284  [001]    90.639490: sched_waking:         comm=ksoftirqd/1 pid=21 prio=120 target_cpu=001
 irq/354-4848400-85    [000]    90.639492: softirq_raise:        vec=1 [action=TIMER]
           iperf-1284  [001]    90.639499: sched_wakeup:         ksoftirqd/1:21 [120] success=1 CPU:001
           iperf-1284  [001]    90.639504: sched_waking:         comm=ktimersoftd/1 pid=20 prio=98 target_cpu=001
 irq/354-4848400-85    [000]    90.639505: irq_handler_exit:     irq=19 ret=handled
           iperf-1284  [001]    90.639512: sched_wakeup:         ktimersoftd/1:20 [98] success=1 CPU:001
           iperf-1284  [001]    90.639526: sched_stat_runtime:   comm=iperf pid=1284 runtime=50752 [ns] vruntime=2105322850 [ns]
           iperf-1284  [001]    90.639537: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.639545: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639556: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639589: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639614: sched_switch:         irq/355-4848400:86 [49] S ==> ktimersoftd/1:20 [98]
   ktimersoftd/1-20    [001]    90.639625: softirq_entry:        vec=1 [action=TIMER]
   ktimersoftd/1-20    [001]    90.639637: sched_waking:         comm=rcu_preempt pid=8 prio=98 target_cpu=001
   ktimersoftd/1-20    [001]    90.639646: sched_wakeup:         rcu_preempt:8 [98] success=1 CPU:001
   ktimersoftd/1-20    [001]    90.639663: softirq_exit:         vec=1 [action=TIMER]
   ktimersoftd/1-20    [001]    90.639679: sched_switch:         ktimersoftd/1:20 [98] S ==> rcu_preempt:8 [98]
     rcu_preempt-8     [001]    90.639722: sched_switch:         rcu_preempt:8 [98] S ==> ksoftirqd/1:21 [120]
     ksoftirqd/1-21    [001]    90.639740: sched_stat_runtime:   comm=ksoftirqd/1 pid=21 runtime=25539 [ns] vruntime=29960463828 [ns]
     ksoftirqd/1-21    [001]    90.639750: sched_switch:         ksoftirqd/1:21 [120] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.639878: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.639880: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.639884: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.639896: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.639903: sched_stat_runtime:   comm=iperf pid=1284 runtime=150466 [ns] vruntime=2105473316 [ns]
           iperf-1284  [001]    90.639913: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.639921: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639932: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639958: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
 irq/355-4848400-86    [001]    90.639963: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:000
 irq/355-4848400-86    [001]    90.639984: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640012: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640109: sched_waking:         comm=iperf pid=1284 prio=120 target_cpu=001
 irq/354-4848400-85    [000]    90.640115: sched_wakeup:         iperf:1284 [120] success=1 CPU:001
 irq/354-4848400-85    [000]    90.640152: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640154: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640158: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640170: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640177: sched_stat_runtime:   comm=iperf pid=1284 runtime=163480 [ns] vruntime=2105636796 [ns]
           iperf-1284  [001]    90.640187: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.640195: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640206: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640239: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640266: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640376: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640378: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640382: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640394: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640401: sched_stat_runtime:   comm=iperf pid=1284 runtime=133387 [ns] vruntime=2105770183 [ns]
           iperf-1284  [001]    90.640412: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.640419: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640430: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640463: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640489: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640560: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640568: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640579: softirq_entry:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640606: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640608: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640611: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640623: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640630: sched_stat_runtime:   comm=iperf pid=1284 runtime=139080 [ns] vruntime=2105909263 [ns]
           iperf-1284  [001]    90.640641: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.640648: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640659: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640692: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640720: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640806: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640820: irq_handler_entry:    irq=354 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640823: irq_handler_exit:     irq=354 ret=handled
 irq/354-4848400-85    [000]    90.640833: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640835: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640838: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640849: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640857: sched_stat_runtime:   comm=iperf pid=1284 runtime=135501 [ns] vruntime=2106044764 [ns]
 irq/354-4848400-85    [000]    90.640861: softirq_raise:        vec=3 [action=NET_RX]
           iperf-1284  [001]    90.640867: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/354-4848400-85    [000]    90.640872: irq_handler_entry:    irq=354 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640874: irq_handler_exit:     irq=354 ret=handled
 irq/355-4848400-86    [001]    90.640875: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640885: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640886: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640919: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640945: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641052: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641054: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641058: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641070: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641077: sched_stat_runtime:   comm=iperf pid=1284 runtime=129483 [ns] vruntime=2106174247 [ns]
           iperf-1284  [001]    90.641087: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641095: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641106: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641132: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
 irq/355-4848400-86    [001]    90.641138: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:000
 irq/355-4848400-86    [001]    90.641159: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641185: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641312: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641314: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641317: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641329: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641336: sched_stat_runtime:   comm=iperf pid=1284 runtime=149003 [ns] vruntime=2106323250 [ns]
           iperf-1284  [001]    90.641347: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641355: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641366: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641398: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641425: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641526: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641526: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641526: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641526: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641526: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641557: sched_stat_runtime:   comm=iperf pid=1284 runtime=130946 [ns] vruntime=2106454196 [ns]
 irq/354-4848400-85    [000]    90.641565: softirq_raise:        vec=3 [action=NET_RX]
           iperf-1284  [001]    90.641567: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641575: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641576: irq_handler_entry:    irq=354 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641578: irq_handler_exit:     irq=354 ret=handled
 irq/355-4848400-86    [001]    90.641586: softirq_entry:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641589: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641618: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641645: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641760: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641762: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641765: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641777: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641784: sched_stat_runtime:   comm=iperf pid=1284 runtime=137941 [ns] vruntime=2106592137 [ns]
           iperf-1284  [001]    90.641795: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641802: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641814: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641845: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641871: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641916: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641924: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641935: softirq_entry:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641964: sched_waking:         comm=iperf pid=1284 prio=120 target_cpu=001
 irq/354-4848400-85    [000]    90.641969: sched_wakeup:         iperf:1284 [120] success=1 CPU:001
 irq/354-4848400-85    [000]    90.642006: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.642008: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.642011: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.642024: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.642032: sched_stat_runtime:   comm=iperf pid=1284 runtime=156486 [ns] vruntime=2106748623 [ns]
           iperf-1284  [001]    90.642042: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.642049: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642060: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642093: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642120: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.642233: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.642235: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.642238: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.642250: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.642257: sched_stat_runtime:   comm=iperf pid=1284 runtime=135990 [ns] vruntime=2106884613 [ns]
           iperf-1284  [001]    90.642268: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.642275: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642286: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642319: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642345: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.642393: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.642419: sched_switch:         irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]
          rcuc/0-11    [000]    90.642430: irq_handler_entry:    irq=354 name=48484000.ethernet
          rcuc/0-11    [000]    90.642432: irq_handler_exit:     irq=354 ret=handled
          rcuc/0-11    [000]    90.642435: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
          rcuc/0-11    [000]    90.642442: sched_migrate_task:   comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
          rcuc/0-11    [000]    90.642453: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:001
           iperf-1284  [001]    90.642462: sched_stat_runtime:   comm=iperf pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
          rcuc/0-11    [000]    90.642464: irq_handler_entry:    irq=355 name=48484000.ethernet
          rcuc/0-11    [000]    90.642466: irq_handler_exit:     irq=355 ret=handled
          rcuc/0-11    [000]    90.642469: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
           iperf-1284  [001]    90.642473: sched_switch:         iperf:1284 [120] R ==> irq/354-4848400:85 [49]
 irq/354-4848400-85    [001]    90.642481: softirq_raise:        vec=3 [action=NET_RX]
          rcuc/0-11    [000]    90.642483: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
 irq/354-4848400-85    [001]    90.642493: softirq_entry:        vec=3 [action=NET_RX]
          rcuc/0-11    [000]    90.642497: sched_migrate_task:   comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0
          rcuc/0-11    [000]    90.642515: irq_handler_entry:    irq=354 name=48484000.ethernet
          rcuc/0-11    [000]    90.642516: irq_handler_exit:     irq=354 ret=handled
          rcuc/0-11    [000]    90.642533: sched_switch:         rcuc/0:11 [98] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [000]    90.642541: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [000]    90.642551: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [000]    90.642562: sched_pi_setprio:     comm=iperf pid=1284 oldprio=120 newprio=49
 irq/355-4848400-86    [000]    90.642593: sched_switch:         irq/355-4848400:86 [49] D ==> rcuc/0:11 [98]
          rcuc/0-11    [000]    90.642621: sched_switch:         rcuc/0:11 [98] S ==> ktimersoftd/0:4 [98]
   ktimersoftd/0-4     [000]    90.642631: softirq_entry:        vec=1 [action=TIMER]
   ktimersoftd/0-4     [000]    90.642647: softirq_exit:         vec=1 [action=TIMER]
   ktimersoftd/0-4     [000]    90.642665: sched_switch:         ktimersoftd/0:4 [98] S ==> ksoftirqd/0:3 [120]
     ksoftirqd/0-3     [000]    90.642692: sched_stat_runtime:   comm=ksoftirqd/0 pid=3 runtime=34486 [ns] vruntime=33313887612 [ns]
     ksoftirqd/0-3     [000]    90.642703: sched_switch:         ksoftirqd/0:3 [120] D ==> trace-cmd:1278 [120]
       trace-cmd-1278  [000]    90.642742: sched_waking:         comm=trace-cmd pid=1279 prio=120 target_cpu=001
       trace-cmd-1278  [000]    90.642755: sched_wakeup:         trace-cmd:1279 [120] success=1 CPU:001
 irq/354-4848400-85    [001]    90.643012: softirq_exit:         vec=3 [action=NET_RX]
       trace-cmd-1278  [000]    90.643020: sched_stat_runtime:   comm=trace-cmd pid=1278 runtime=328098 [ns] vruntime=554552351 [ns]
 irq/354-4848400-85    [001]    90.643021: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [001]    90.643031: softirq_entry:        vec=3 [action=NET_RX]
       trace-cmd-1278  [000]    90.643036: sched_switch:         trace-cmd:1278 [120] D ==> iperf:1282 [120]
           iperf-1282  [000]    90.643055: sched_stat_runtime:   comm=iperf pid=1282 runtime=34811 [ns] vruntime=560242796 [ns]
           iperf-1282  [000]    90.643068: sched_switch:         iperf:1282 [120] D ==> swapper/0:0 [120]
          <idle>-0     [000]    90.643079: bputs:                0xc003e7d4s: 58f67c094e488
          <idle>-0     [000]    90.643090: bprint:               softirq_check_pending_idle: NOHZ: 80
Sebastian Andrzej Siewior Sept. 8, 2016, 2:28 p.m. UTC | #5
On 2016-08-12 18:58:21 [+0300], Grygorii Strashko wrote:
> Hi Sebastian,
Hi Grygorii,

> Thankds for comment. You're right:
> irq_thread()->irq_forced_thread_fn()->local_bh_enable()
> 
> but wouldn't here two wake_up_process() calls any way,
> plus preempt_check_resched_rt() in napi_schedule().

Usually you prefer BH handling in the IRQ-thread because it runs at
higher priority and is not interrupted by a SCHED_OTHER process. And you
can assign it a higher priority if it should be preferred over another
interrupt. However, if the processing of the interrupt is taking too
much time (like that ping flood, a lot of network traffic) then we push
it to the softirq thread. If you do this now unconditionally in the
SCHED_OTHER softirq thread then you take away all the `good' things we
had (like processing important packets at higher priority as long as
nobody floods us). Plus you share this thread with everything else that
runs in there.
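
For the record, that push happens at the tail of net_rx_action() in the -rt
tree; paraphrased, so take the exact shape with a grain of salt:

	/* net/core/dev.c, net_rx_action(), -rt: whatever is still on the
	 * poll list after the budget / 2-jiffies limit is not re-run in
	 * the current (irq thread) context but handed to ksoftirqd */
	if (!list_empty(&sd->poll_list))
		__raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);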

> >> And, as result, get benefits from the following improvements (tested
> >> on am57xx-evm):
> >>
> >> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
> >>    seen any more. Now these warnings can be seen once iperf is started.
> >>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> > 
> > Do you also see "sched: RT throttling activated"? Because I don't see
> > otherwise why this should pop up.
> 
> I've reverted my patch an did requested experiments (some additional info below).
> 
> I do not see "sched: RT throttling activated" :(

That is okay. However, if you aim for throughput you might want to switch
away from NO_HZ (and deactivate the software watchdog, which runs at
prio 99 if enabled).

> root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m
> [1] 1301
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux
> > [1]+  Done                    ./net_perf.sh

I can't parse this. But that local_softirq_pending() warning might
contribute to lower numbers.

> =============================================== before, no net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1288) P:98 I:250 C: 960000 Min:      8 Act:    9 Avg:    8 Max:      33
> T: 1 ( 1289) P:98 I:250 C: 959929 Min:      7 Act:   11 Avg:    9 Max:      26

> =============================================== after, no net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1301) P:98 I:250 C: 960000 Min:      7 Act:    9 Avg:    8 Max:      22
> T: 1 ( 1302) P:98 I:250 C: 959914 Min:      7 Act:   11 Avg:    8 Max:      28

I think those two should be equal more or less since the change should
have no impact on "no net load" or do I miss something?

> =============================================== before, with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1400) P:98 I:250 C: 960000 Min:      8 Act:   25 Avg:   18 Max:      83
> T: 1 ( 1401) P:98 I:250 C: 959801 Min:      7 Act:   27 Avg:   17 Max:      48
> 
> 
> =============================================== after, with net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1358) P:98 I:250 C: 960000 Min:      8 Act:   11 Avg:   14 Max:      42
> T: 1 ( 1359) P:98 I:250 C: 959743 Min:      7 Act:   18 Avg:   15 Max:      36

So the max value dropped by ~50% with your patch. Interesting. What I
remember from testing is that once you had, say, one hour of hackbench
running then after that, the extra network traffic didn't contribute
much (if at all) to the max value.
That said it is hard to believe that one extra context switch
contributes about 40us to the max value on CPU0.

> > What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> > in net/core/dev.c and chrt the priority of you network interrupt
> > handlers to SCHED_OTHER priority?
> 
> ===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
> 
> with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1325) P:98 I:1000 C: 240000 Min:      8 Act:   22 Avg:   17 Max:      51
> T: 1 ( 1326) P:98 I:1500 C: 159981 Min:      8 Act:   15 Avg:   15 Max:      39
> 
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
> T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37
> 
> and net parformance is better:
> root@am57xx-evm:~# ps -A | grep 4848
>    82 ?        00:00:00 irq/354-4848400
>    83 ?        00:00:00 irq/355-4848400
> root@am57xx-evm:~# chrt -o -p 0 82
> root@am57xx-evm:~# chrt -o -p 0 83
> ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m -i250 -d0
> [1] 1298
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 12 16:20:42 EEST 2016 armv7l GNU/Linux

So that looks nice, doesn't it?

Sebastian
Grygorii Strashko Sept. 8, 2016, 4:24 p.m. UTC | #6
On 09/08/2016 05:28 PM, Sebastian Andrzej Siewior wrote:
> On 2016-08-12 18:58:21 [+0300], Grygorii Strashko wrote:
>> Hi Sebastian,
> Hi Grygorii,
> 
>> Thankds for comment. You're right:
>> irq_thread()->irq_forced_thread_fn()->local_bh_enable()
>>
>> but wouldn't here two wake_up_process() calls any way,
>> plus preempt_check_resched_rt() in napi_schedule().
> 
> Usually you prefer BH handling in the IRQ-thread because it runs at
> higher priority and is not interrupted by a SCHED_OTHER process. And you
> can assign it a higher priority if it should be preferred over an other
> interrupt. However, if the processing of the interrupt is taking too
> much time (like that ping flood, a lot of network traffic) then we push
> it to the softirq thread. If you do this now unconditionally in the
> SCHED_OTHER softirq thread then you take away all the `good' things we
> had (like processing important packets at higher priority as long as
> nobody floods us). Plus you share this thread with everything else that
> runs in there.

That I understand, but the effect of this patch on network throughput
is pretty amazing :)

> 
>>>> And, as result, get benefits from the following improvements (tested
>>>> on am57xx-evm):
>>>>
>>>> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>>>    seen any more. Now these warnings can be seen once iperf is started.
>>>>    # iperf -c $IPERFHOST -w 128K  -d -t 60
>>>
>>> Do you also see "sched: RT throttling activated"? Because I don't see
>>> otherwise why this should pop up.
>>
>> I've reverted my patch an did requested experiments (some additional info below).
>>
>> I do not see "sched: RT throttling activated" :(
> 
> That is okay. However if aim for throughput you might want to switch
> away from NO_HZ (and deactivate the software watchdog wich runs at
> prio 99 if enabled).
> 
>> root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m
>> [1] 1301
>> # /dev/cpu_dma_latency set to 0us
>> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux
>>
> …
>> [1]+  Done                    ./net_perf.sh
> 
> I can't parse this. But that local_softirq_pending() warning might
> contribute to lower numbers.
> 
>> =============================================== before, no net load:
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> # /dev/cpu_dma_latency set to 0us
>> T: 0 ( 1288) P:98 I:250 C: 960000 Min:      8 Act:    9 Avg:    8 Max:      33
>> T: 1 ( 1289) P:98 I:250 C: 959929 Min:      7 Act:   11 Avg:    9 Max:      26
> 
>> =============================================== after, no net load:
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> T: 0 ( 1301) P:98 I:250 C: 960000 Min:      7 Act:    9 Avg:    8 Max:      22
>> T: 1 ( 1302) P:98 I:250 C: 959914 Min:      7 Act:   11 Avg:    8 Max:      28
> 
> I think those two should be equal more or less since the change should
> have no impact on "no net load" or do I miss something?

Correct, I see no difference in this case, as per above.

> 
>> =============================================== before, with net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1400) P:98 I:250 C: 960000 Min:      8 Act:   25 Avg:   18 Max:      83
>> T: 1 ( 1401) P:98 I:250 C: 959801 Min:      7 Act:   27 Avg:   17 Max:      48
>>
>>
>> =============================================== after, with net load:
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> T: 0 ( 1358) P:98 I:250 C: 960000 Min:      8 Act:   11 Avg:   14 Max:      42
>> T: 1 ( 1359) P:98 I:250 C: 959743 Min:      7 Act:   18 Avg:   15 Max:      36
> 
> So the max value dropped by ~50% with your patch. Interesting. What I
> remember from testing is that once you had, say, one hour of hackbench
> running then after that, the extra network traffic didn't contribute
> much (if at all) to the max value.
> That said it is hard to believe that one extra context switch
> contributes about 40us to the max value on CPU0.

Yup, but short-time testing provides very stable results.
This patch is going to be tested more intensively shortly.

> 
>>> What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
>>> in net/core/dev.c and chrt the priority of you network interrupt
>>> handlers to SCHED_OTHER priority?
>>
>> ===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
>>
>> with net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1325) P:98 I:1000 C: 240000 Min:      8 Act:   22 Avg:   17 Max:      51
>> T: 1 ( 1326) P:98 I:1500 C: 159981 Min:      8 Act:   15 Avg:   15 Max:      39
>>
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
>> T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37
>>

> 
> So that looks nice, doesn't it?

Yeah, this is an improvement in general.
But the fact that such a significant net performance drop is observed out of
the box (without any tuning) and on an idle system triggers a lot of
questions ;(

I worry whether the observed original behavior could depend on the use of
NAPI polling for both RX/TX in the CPSW driver. CPSW requests two IRQs,
RX and TX, and both handlers just do napi_schedule() [NET_RX].
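
For context, a sketch of how both instances are wired up (registration
calls paraphrased from the 4.4 driver, so names are approximate):

/* cpsw_probe(): one NAPI context per IRQ */
netif_napi_add(ndev, &priv->napi_rx, cpsw_rx_poll, CPSW_POLL_WEIGHT);
netif_napi_add(ndev, &priv->napi_tx, cpsw_tx_poll, CPSW_POLL_WEIGHT);

/* cpsw_rx_interrupt() -> napi_schedule(&priv->napi_rx)
 * cpsw_tx_interrupt() -> napi_schedule(&priv->napi_tx)
 * Both land on the same per-cpu softnet_data poll_list and are polled
 * from NET_RX_SOFTIRQ (net_rx_action); NET_TX is not involved. */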

Patch

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 7b59283..fa4bb81 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -769,7 +769,7 @@  static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
 		priv->tx_irq_disabled = true;
 	}
 
-	napi_schedule(&priv->napi_tx);
+	napi_schedule_irqoff(&priv->napi_tx);
 	return IRQ_HANDLED;
 }
 
@@ -785,7 +785,7 @@  static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
 		priv->rx_irq_disabled = true;
 	}
 
-	napi_schedule(&priv->napi_rx);
+	napi_schedule_irqoff(&priv->napi_rx);
 	return IRQ_HANDLED;
 }
 
@@ -2827,7 +2827,7 @@  static int cpsw_probe(struct platform_device *pdev)
 
 	priv->irqs_table[0] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
-			       0, dev_name(&pdev->dev), priv);
+			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
 		goto clean_ale_ret;
@@ -2842,7 +2842,7 @@  static int cpsw_probe(struct platform_device *pdev)
 
 	priv->irqs_table[1] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
-			       0, dev_name(&pdev->dev), priv);
+			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
 		goto clean_ale_ret;