diff mbox series

[net,3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

Message ID 20230704082916.2135501-4-wei.fang@nxp.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net: fec: fix some issues of ndo_xdp_xmit() | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang fail Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 27 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Wei Fang July 4, 2023, 8:29 a.m. UTC
From: Wei Fang <wei.fang@nxp.com>

When the XDP feature is enabled and with heavy XDP frames to be
transmitted, there is a considerable probability that available
tx BDs are insufficient. This will lead to some XDP frames to be
discarded and the "NOT enough BD for SG!" error log will appear
in the console (as shown below).

[  160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[  160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[  160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[  160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[  160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!

In the case of heavy XDP traffic, sometimes the speed of recycling
tx BDs may be slower than the speed of sending XDP frames. There
may be several specific reasons, such as the interrupt is not
responsed in time, the efficiency of the NAPI callback function is
too low due to all the queues (tx queues and rx queues) share the
same NAPI, and so on.

After trying various methods, I think that increase the size of tx
BD ring is simple and effective. Maybe the best resolution is that
allocate NAPI for each queue to improve the efficiency of the NAPI
callback, but this change is a bit big and I didn't try this method.
Perheps this method will be implemented in a future patch.

In addtion, this patch also updates the tx_stop_threshold and the
tx_wake_threshold of the tx ring. In previous logic, the value of
tx_stop_threshold is 217, however, the value of tx_wake_threshold
is 147, it does not make sense that tx_wake_threshold is less than
tx_stop_threshold. Besides, both XDP path and 'slow path' share the
tx BD rings. So if tx_stop_threshold is 217, in the case of heavy
XDP traffic, the slow path is easily to be stopped, this will have
a serious impact on the slow path.

Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
 drivers/net/ethernet/freescale/fec.h      | 2 +-
 drivers/net/ethernet/freescale/fec_main.c | 7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)

Comments

Andrew Lunn July 5, 2023, 12:25 a.m. UTC | #1
> After trying various methods, I think that increase the size of tx
> BD ring is simple and effective. Maybe the best resolution is that
> allocate NAPI for each queue to improve the efficiency of the NAPI
> callback, but this change is a bit big and I didn't try this method.
> Perheps this method will be implemented in a future patch.

How does this affect platforms like Vybrid with its fast Ethernet?
Does the burst latency go up?

> In addtion, this patch also updates the tx_stop_threshold and the
> tx_wake_threshold of the tx ring. In previous logic, the value of
> tx_stop_threshold is 217, however, the value of tx_wake_threshold
> is 147, it does not make sense that tx_wake_threshold is less than
> tx_stop_threshold.

What do these actually mean? I could imagine that as the ring fills
you don't want to stop until it is 217/512 full. There is then some
hysteresis, such that it has to drop below 147/512 before more can be
added? 

> Besides, both XDP path and 'slow path' share the
> tx BD rings. So if tx_stop_threshold is 217, in the case of heavy
> XDP traffic, the slow path is easily to be stopped, this will have
> a serious impact on the slow path.

Please post your iperf results for various platforms, so we can see
the effects of this. We generally don't accept tuning patches without
benchmarks which prove the improvements, and also show there is no
regression. And given the wide variety of SoCs using the FEC, i expect
testing on a number of SoCs, but Fast and 1G.

	Andrew
Wei Fang July 5, 2023, 6:20 a.m. UTC | #2
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 2023年7月5日 8:25
> To: Wei Fang <wei.fang@nxp.com>
> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; ast@kernel.org; daniel@iogearbox.net;
> hawk@kernel.org; john.fastabend@gmail.com; Shenwei Wang
> <shenwei.wang@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>;
> netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>;
> linux-kernel@vger.kernel.org; bpf@vger.kernel.org
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
> 
> > After trying various methods, I think that increase the size of tx BD
> > ring is simple and effective. Maybe the best resolution is that
> > allocate NAPI for each queue to improve the efficiency of the NAPI
> > callback, but this change is a bit big and I didn't try this method.
> > Perheps this method will be implemented in a future patch.
> 
> How does this affect platforms like Vybrid with its fast Ethernet?
Sorry, I don't have the Vybrid platform, but I think I don't think it has much
impact, at most it just takes up some more memory.

> Does the burst latency go up?
No, for fec, when a packet is attached to the BDs, the software will immediately
trigger the hardware to send the packet. In addition, I think it may improve the
latency, because the size of the tx ring becomes larger, and more packets can be
attached to the BD ring for burst traffic.

> 
> > In addtion, this patch also updates the tx_stop_threshold and the
> > tx_wake_threshold of the tx ring. In previous logic, the value of
> > tx_stop_threshold is 217, however, the value of tx_wake_threshold is
> > 147, it does not make sense that tx_wake_threshold is less than
> > tx_stop_threshold.
> 
> What do these actually mean? I could imagine that as the ring fills you don't
> want to stop until it is 217/512 full. There is then some hysteresis, such that it
> has to drop below 147/512 before more can be added?
> 
You must have misunderstood, let me explain more clearly, the queue will be
stopped when the available BDs are less than tx_stop_threshold (217 BDs). And
the queue will be waked when the available BDs are greater than tx_wake_threshold
(147 BDs). So in most cases, the available BDs are greater than tx_wake_threshold
when the queue is stopped, the only effect is to delay packet sending.
In my opinion, tx_wake_threshold should be greater than tx_stop_threshold, we
should stop queue when the available BDs are not enough for a skb to be attached.
And wake the queue when the available BDs are sufficient for a skb.

> > Besides, both XDP path and 'slow path' share the tx BD rings. So if
> > tx_stop_threshold is 217, in the case of heavy XDP traffic, the slow
> > path is easily to be stopped, this will have a serious impact on the
> > slow path.
> 
> Please post your iperf results for various platforms, so we can see the effects of
> this. We generally don't accept tuning patches without benchmarks which
> prove the improvements, and also show there is no regression. And given the
> wide variety of SoCs using the FEC, i expect testing on a number of SoCs, but
> Fast and 1G.
> 
Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms, i.MX6UL and
8ULP only support Fast ethernet. Others support 1G.
1.1 i.MX6UL with tx ring size 512
root@imx6ul7d:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 47148 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  10.1 MBytes  84.9 Mbits/sec    0    103 KBytes
[  5]   1.00-2.00   sec  9.88 MBytes  82.6 Mbits/sec    0    103 KBytes
[  5]   2.00-3.00   sec  9.82 MBytes  82.7 Mbits/sec    0    103 KBytes
[  5]   3.00-4.00   sec  9.82 MBytes  82.4 Mbits/sec    0    103 KBytes
[  5]   4.00-5.00   sec  9.88 MBytes  82.9 Mbits/sec    0    103 KBytes
[  5]   5.00-6.00   sec  9.94 MBytes  83.4 Mbits/sec    0    103 KBytes
[  5]   6.00-7.00   sec  10.1 MBytes  84.3 Mbits/sec    0    103 KBytes
[  5]   7.00-8.00   sec  9.82 MBytes  82.4 Mbits/sec    0    103 KBytes
[  5]   8.00-9.00   sec  9.82 MBytes  82.4 Mbits/sec    0    103 KBytes
[  5]   9.00-10.00  sec  9.88 MBytes  82.9 Mbits/sec    0    103 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  99.0 MBytes  83.1 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  98.8 MBytes  82.8 Mbits/sec                  receiver

1.2 i.MX6UL with tx ring size 1024
root@imx6ul7d:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 55670 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  10.2 MBytes  85.4 Mbits/sec    0    236 KBytes
[  5]   1.00-2.00   sec  10.1 MBytes  84.6 Mbits/sec    0    236 KBytes
[  5]   2.00-3.00   sec  10.2 MBytes  85.5 Mbits/sec    0    249 KBytes
[  5]   3.00-4.00   sec  10.1 MBytes  85.1 Mbits/sec    0    249 KBytes
[  5]   4.00-5.00   sec  10.1 MBytes  84.7 Mbits/sec    0    249 KBytes
[  5]   5.00-6.00   sec  10.0 MBytes  84.1 Mbits/sec    0    249 KBytes
[  5]   6.00-7.00   sec  10.1 MBytes  85.1 Mbits/sec    0    249 KBytes
[  5]   7.00-8.00   sec  10.1 MBytes  84.9 Mbits/sec    0    249 KBytes
[  5]   8.00-9.00   sec  10.3 MBytes  85.9 Mbits/sec    0    249 KBytes
[  5]   9.00-10.00  sec  10.2 MBytes  85.6 Mbits/sec    0    249 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   101 MBytes  85.1 Mbits/sec    0             sender
[  5]   0.00-10.01  sec   101 MBytes  84.5 Mbits/sec                  receiver

2.1 i.MX8ULP with tx ring size 512
root@imx8ulpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 54342 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec    0   99.0 KBytes
[  5]   1.00-2.00   sec  9.94 MBytes  83.4 Mbits/sec    0   99.0 KBytes
[  5]   2.00-3.00   sec  10.2 MBytes  85.5 Mbits/sec    0   99.0 KBytes
[  5]   3.00-4.00   sec  9.94 MBytes  83.4 Mbits/sec    0   99.0 KBytes
[  5]   4.00-5.00   sec  10.2 MBytes  85.5 Mbits/sec    0   99.0 KBytes
[  5]   5.00-6.00   sec  9.94 MBytes  83.4 Mbits/sec    0   99.0 KBytes
[  5]   6.00-7.00   sec  9.69 MBytes  81.3 Mbits/sec    0   99.0 KBytes
[  5]   7.00-8.00   sec  9.94 MBytes  83.4 Mbits/sec    0   99.0 KBytes
[  5]   8.00-9.00   sec  9.69 MBytes  81.3 Mbits/sec    0   99.0 KBytes
[  5]   9.00-10.00  sec  10.2 MBytes  85.5 Mbits/sec    0   99.0 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   100 MBytes  84.3 Mbits/sec    0             sender
[  5]   0.00-9.90   sec   100 MBytes  84.7 Mbits/sec                  receiver

2.1 i.MX8ULP with tx ring size 1024
root@imx8ulpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 44770 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  10.7 MBytes  90.1 Mbits/sec    0   93.3 KBytes
[  5]   1.00-2.00   sec  9.94 MBytes  83.4 Mbits/sec    0   93.3 KBytes
[  5]   2.00-3.00   sec  10.2 MBytes  85.5 Mbits/sec    0   93.3 KBytes
[  5]   3.00-4.00   sec  10.1 MBytes  85.0 Mbits/sec    0   93.3 KBytes
[  5]   4.00-5.00   sec  9.94 MBytes  83.4 Mbits/sec    0    100 KBytes
[  5]   5.00-6.00   sec  10.2 MBytes  85.5 Mbits/sec    0    100 KBytes
[  5]   6.00-7.00   sec  9.69 MBytes  81.3 Mbits/sec    0    100 KBytes
[  5]   7.00-8.00   sec  9.94 MBytes  83.4 Mbits/sec    0    100 KBytes
[  5]   8.00-9.00   sec  10.2 MBytes  85.5 Mbits/sec    0    100 KBytes
[  5]   9.00-10.00  sec  9.69 MBytes  81.3 Mbits/sec    0    100 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   101 MBytes  84.4 Mbits/sec    0             sender
[  5]   0.00-9.92   sec   100 MBytes  84.8 Mbits/sec                  receiver

3.1 i.MX8MM with tx ring size 512
root@imx8mmevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 55734 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   111 MBytes   934 Mbits/sec    0    577 KBytes
[  5]   1.00-2.00   sec   112 MBytes   937 Mbits/sec    0    577 KBytes
[  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec    0    609 KBytes
[  5]   3.00-4.00   sec   113 MBytes   945 Mbits/sec    0    638 KBytes
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec    0    638 KBytes
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec    0    638 KBytes
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    0    638 KBytes
[  5]   7.00-8.00   sec   112 MBytes   943 Mbits/sec    0    638 KBytes
[  5]   8.00-9.00   sec   112 MBytes   943 Mbits/sec    0    638 KBytes
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec    0    638 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   938 Mbits/sec                  receiver

3.2 i.MX8MM with tx ring size 1024
root@imx8mmevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 53350 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   952 Mbits/sec    0    585 KBytes
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0    585 KBytes
[  5]   2.00-3.00   sec   113 MBytes   947 Mbits/sec    0    585 KBytes
[  5]   3.00-4.00   sec   112 MBytes   940 Mbits/sec    0    648 KBytes
[  5]   4.00-5.00   sec   112 MBytes   944 Mbits/sec    0    648 KBytes
[  5]   5.00-6.00   sec   112 MBytes   944 Mbits/sec    0    648 KBytes
[  5]   6.00-7.00   sec   111 MBytes   933 Mbits/sec    0    648 KBytes
[  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    648 KBytes
[  5]   8.00-9.00   sec   112 MBytes   944 Mbits/sec    0    648 KBytes
[  5]   9.00-10.00  sec   112 MBytes   944 Mbits/sec    0    648 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

4.1 i.MX8MP with tx ring size 512
root@imx8mpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 51892 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   959 Mbits/sec    0    594 KBytes
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0    626 KBytes
[  5]   2.00-3.00   sec   113 MBytes   946 Mbits/sec    0    626 KBytes
[  5]   3.00-4.00   sec   112 MBytes   937 Mbits/sec    0    626 KBytes
[  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec    0    626 KBytes
[  5]   5.00-6.00   sec   112 MBytes   940 Mbits/sec    0    626 KBytes
[  5]   6.00-7.00   sec   113 MBytes   946 Mbits/sec    0    626 KBytes
[  5]   7.00-8.00   sec   112 MBytes   939 Mbits/sec    0    626 KBytes
[  5]   8.00-9.00   sec   111 MBytes   935 Mbits/sec    0    626 KBytes
[  5]   9.00-10.00  sec   112 MBytes   943 Mbits/sec    0    626 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

4.2 i.MX8MP with tx ring size 1024
root@imx8mpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 37922 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   113 MBytes   951 Mbits/sec    0    608 KBytes
[  5]   1.00-2.00   sec   112 MBytes   937 Mbits/sec    0    608 KBytes
[  5]   2.00-3.00   sec   113 MBytes   947 Mbits/sec    0    608 KBytes
[  5]   3.00-4.00   sec   111 MBytes   934 Mbits/sec    0    608 KBytes
[  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec    0    608 KBytes
[  5]   5.00-6.00   sec   112 MBytes   939 Mbits/sec    0    608 KBytes
[  5]   6.00-7.00   sec   113 MBytes   949 Mbits/sec    0    608 KBytes
[  5]   7.00-8.00   sec   112 MBytes   942 Mbits/sec    0    608 KBytes
[  5]   8.00-9.00   sec   112 MBytes   936 Mbits/sec    0    608 KBytes
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec    0    608 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   939 Mbits/sec                  receiver

5.1 i.MX93 with tx ring size 512
root@imx93evk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 44216 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   115 MBytes   965 Mbits/sec    0    656 KBytes
[  5]   1.00-2.00   sec   111 MBytes   934 Mbits/sec    0    656 KBytes
[  5]   2.00-3.00   sec   112 MBytes   944 Mbits/sec    0    656 KBytes
[  5]   3.00-4.00   sec   112 MBytes   944 Mbits/sec    0    656 KBytes
[  5]   4.00-5.00   sec   112 MBytes   944 Mbits/sec    0    656 KBytes
[  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec    0    656 KBytes
[  5]   6.00-7.00   sec   112 MBytes   944 Mbits/sec    0    656 KBytes
[  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    656 KBytes
[  5]   8.00-9.00   sec   112 MBytes   944 Mbits/sec    0    656 KBytes
[  5]   9.00-10.00  sec   112 MBytes   944 Mbits/sec    0    656 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   944 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

5.2 i.MX93 with tx ring size 1024
root@imx93evk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[  5] local 192.168.3.9 port 51058 connected to 192.168.3.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   959 Mbits/sec    0    588 KBytes
[  5]   1.00-2.00   sec   112 MBytes   935 Mbits/sec    0    649 KBytes
[  5]   2.00-3.00   sec   112 MBytes   944 Mbits/sec    0    649 KBytes
[  5]   3.00-4.00   sec   112 MBytes   944 Mbits/sec    0    649 KBytes
[  5]   4.00-5.00   sec   112 MBytes   944 Mbits/sec    0    649 KBytes
[  5]   5.00-6.00   sec   112 MBytes   944 Mbits/sec    0    649 KBytes
[  5]   6.00-7.00   sec   111 MBytes   933 Mbits/sec    0    649 KBytes
[  5]   7.00-8.00   sec   112 MBytes   944 Mbits/sec    0    649 KBytes
[  5]   8.00-9.00   sec   112 MBytes   944 Mbits/sec    0    649 KBytes
[  5]   9.00-10.00  sec   112 MBytes   944 Mbits/sec    0    649 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   940 Mbits/sec                  receiver
Jakub Kicinski July 5, 2023, 6:11 p.m. UTC | #3
On Wed, 5 Jul 2023 06:20:26 +0000 Wei Fang wrote:
> > > In addtion, this patch also updates the tx_stop_threshold and the
> > > tx_wake_threshold of the tx ring. In previous logic, the value of
> > > tx_stop_threshold is 217, however, the value of tx_wake_threshold is
> > > 147, it does not make sense that tx_wake_threshold is less than
> > > tx_stop_threshold.  
> > 
> > What do these actually mean? I could imagine that as the ring fills you don't
> > want to stop until it is 217/512 full. There is then some hysteresis, such that it
> > has to drop below 147/512 before more can be added?
> >   
> You must have misunderstood, let me explain more clearly, the queue will be
> stopped when the available BDs are less than tx_stop_threshold (217 BDs). And
> the queue will be waked when the available BDs are greater than tx_wake_threshold
> (147 BDs). So in most cases, the available BDs are greater than tx_wake_threshold
> when the queue is stopped, the only effect is to delay packet sending.
> In my opinion, tx_wake_threshold should be greater than tx_stop_threshold, we
> should stop queue when the available BDs are not enough for a skb to be attached.
> And wake the queue when the available BDs are sufficient for a skb.

But you shouldn't restart the queue for a single packet either.
Restarting for a single packet wastes CPU cycles as there will 
be much more stop / start operations. Two large packets seem 
like the absolute minimum reasonable wake threshold.

Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either,
as you won't be able to accept a full TSO frame.

Please split the change, the netdev_err_once() should be one patch
and then the change to wake thresh a separate one.
Wei Fang July 6, 2023, 1:44 a.m. UTC | #4
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: 2023年7月6日 2:11
> To: Wei Fang <wei.fang@nxp.com>
> Cc: Andrew Lunn <andrew@lunn.ch>; davem@davemloft.net;
> edumazet@google.com; pabeni@redhat.com; ast@kernel.org;
> daniel@iogearbox.net; hawk@kernel.org; john.fastabend@gmail.com;
> Shenwei Wang <shenwei.wang@nxp.com>; Clark Wang
> <xiaoning.wang@nxp.com>; netdev@vger.kernel.org; dl-linux-imx
> <linux-imx@nxp.com>; linux-kernel@vger.kernel.org; bpf@vger.kernel.org
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
> 
> On Wed, 5 Jul 2023 06:20:26 +0000 Wei Fang wrote:
> > > > In addtion, this patch also updates the tx_stop_threshold and the
> > > > tx_wake_threshold of the tx ring. In previous logic, the value of
> > > > tx_stop_threshold is 217, however, the value of tx_wake_threshold
> > > > is 147, it does not make sense that tx_wake_threshold is less than
> > > > tx_stop_threshold.
> > >
> > > What do these actually mean? I could imagine that as the ring fills
> > > you don't want to stop until it is 217/512 full. There is then some
> > > hysteresis, such that it has to drop below 147/512 before more can be
> added?
> > >
> > You must have misunderstood, let me explain more clearly, the queue
> > will be stopped when the available BDs are less than tx_stop_threshold
> > (217 BDs). And the queue will be waked when the available BDs are
> > greater than tx_wake_threshold
> > (147 BDs). So in most cases, the available BDs are greater than
> > tx_wake_threshold when the queue is stopped, the only effect is to delay
> packet sending.
> > In my opinion, tx_wake_threshold should be greater than
> > tx_stop_threshold, we should stop queue when the available BDs are not
> enough for a skb to be attached.
> > And wake the queue when the available BDs are sufficient for a skb.
> 
> But you shouldn't restart the queue for a single packet either.
> Restarting for a single packet wastes CPU cycles as there will be much more
> stop / start operations. Two large packets seem like the absolute minimum
> reasonable wake threshold.
> 
> Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either, as
> you won't be able to accept a full TSO frame.
> 
Maybe I should keep the tx_stop_threshold unchanged, so that the queue is
to be stopped if the available BDs is not enough for a full TSO frame to be attached.
And then just change tx_wake_threshold to tx_stop_threshold + 1, which I think it's
more reasonable.

> Please split the change, the netdev_err_once() should be one patch and then
> the change to wake thresh a separate one.
Okay, thanks!
Jakub Kicinski July 6, 2023, 3:11 a.m. UTC | #5
On Thu, 6 Jul 2023 01:44:49 +0000 Wei Fang wrote:
> > But you shouldn't restart the queue for a single packet either.
> > Restarting for a single packet wastes CPU cycles as there will be much more
> > stop / start operations. Two large packets seem like the absolute minimum
> > reasonable wake threshold.
> > 
> > Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either, as
> > you won't be able to accept a full TSO frame.
> >   
> Maybe I should keep the tx_stop_threshold unchanged, so that the queue is
> to be stopped if the available BDs is not enough for a full TSO frame to be attached.
> And then just change tx_wake_threshold to tx_stop_threshold + 1, which I think it's
> more reasonable.

How about at least tx_stop_threshold + 2 * MAX_SKB_FRAGS ?
If a queue of hundreds of entries is overflowing, we should be able 
to apply a hysteresis of a few tens of entries. Do you see a
difference in drops? The packets from the stack should preferably stay
in the qdiscs instead of the driver queue, where AQM and scheduling can
be applied.
Wei Fang July 6, 2023, 6:14 a.m. UTC | #6
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: 2023年7月6日 11:12
> To: Wei Fang <wei.fang@nxp.com>
> Cc: Andrew Lunn <andrew@lunn.ch>; davem@davemloft.net;
> edumazet@google.com; pabeni@redhat.com; ast@kernel.org;
> daniel@iogearbox.net; hawk@kernel.org; john.fastabend@gmail.com;
> Shenwei Wang <shenwei.wang@nxp.com>; Clark Wang
> <xiaoning.wang@nxp.com>; netdev@vger.kernel.org; dl-linux-imx
> <linux-imx@nxp.com>; linux-kernel@vger.kernel.org; bpf@vger.kernel.org
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
> 
> On Thu, 6 Jul 2023 01:44:49 +0000 Wei Fang wrote:
> > > But you shouldn't restart the queue for a single packet either.
> > > Restarting for a single packet wastes CPU cycles as there will be
> > > much more stop / start operations. Two large packets seem like the
> > > absolute minimum reasonable wake threshold.
> > >
> > > Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right
> > > either, as you won't be able to accept a full TSO frame.
> > >
> > Maybe I should keep the tx_stop_threshold unchanged, so that the queue
> > is to be stopped if the available BDs is not enough for a full TSO frame to be
> attached.
> > And then just change tx_wake_threshold to tx_stop_threshold + 1, which
> > I think it's more reasonable.
> 
> How about at least tx_stop_threshold + 2 * MAX_SKB_FRAGS ?
It's okay. The iperf performance is well as before.

> If a queue of hundreds of entries is overflowing, we should be able to apply a
> hysteresis of a few tens of entries. Do you see a difference in drops? The
I didn't see there was any packet loss.

> packets from the stack should preferably stay in the qdiscs instead of the driver
> queue, where AQM and scheduling can be applied.
Andrew Lunn July 8, 2023, 7:03 p.m. UTC | #7
> > How does this affect platforms like Vybrid with its fast Ethernet?
> Sorry, I don't have the Vybrid platform, but I think I don't think it has much
> impact, at most it just takes up some more memory.

It has been 6 months since the page pool patches were posted and i
asked about benchmark results for other platforms like Vybrid. Is it
so hard to get hold or reference platforms? Fugang Duan used to have a
test farm of all sorts of boards and reported to me the regressions i
introduced with MDIO changes and PM changes. As somebody who seems to
be an NXP FEC Maintainer i would expect you to have access to a range
of hardware. Especially since XDP and eBPF is a bit of a niche for
embedded processes which NXP produce. You want to be sure your changes
don't regress the main use cases which i guess are plain networking.

> > Does the burst latency go up?
> No, for fec, when a packet is attached to the BDs, the software will immediately
> trigger the hardware to send the packet. In addition, I think it may improve the
> latency, because the size of the tx ring becomes larger, and more packets can be
> attached to the BD ring for burst traffic.

And a bigger burst means more latency. Read about Buffer
bloat.

While you have iperf running saturating the link, try a ping as
well. How does ping latency change with more TX buffers?

Ideally you want enough transmit buffers to keep the link full, but
not more. If the driver is using BQL, the network stack will help with
this.

> Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms, i.MX6UL and
> 8ULP only support Fast ethernet. Others support 1G.

Thanks for the benchmark numbers. Please get into the habit of
including them. We like to see justification for any sort of
performance tweaks.

	Andrew
Wei Fang July 10, 2023, 6:16 a.m. UTC | #8
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 2023年7月9日 3:04
> To: Wei Fang <wei.fang@nxp.com>
> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; ast@kernel.org; daniel@iogearbox.net;
> hawk@kernel.org; john.fastabend@gmail.com; Shenwei Wang
> <shenwei.wang@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>;
> netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>;
> linux-kernel@vger.kernel.org; bpf@vger.kernel.org
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
> 
> > > How does this affect platforms like Vybrid with its fast Ethernet?
> > Sorry, I don't have the Vybrid platform, but I think I don't think it
> > has much impact, at most it just takes up some more memory.
> 
> It has been 6 months since the page pool patches were posted and i asked
> about benchmark results for other platforms like Vybrid. Is it so hard to get
> hold or reference platforms? Fugang Duan used to have a test farm of all sorts
> of boards and reported to me the regressions i introduced with MDIO changes
> and PM changes. As somebody who seems to be an NXP FEC Maintainer i
> would expect you to have access to a range of hardware. Especially since XDP
> and eBPF is a bit of a niche for embedded processes which NXP produce. You
> want to be sure your changes don't regress the main use cases which i guess
> are plain networking.
> 
Sorry, the Vybrid platform is not currently maintained by us, and Vybrid is also not
included in our usual Yocto SDK RC version. Even I find a Vybrid board, I think it
probably cann't run with the latest kernel image, because the latest kernel image
does not match with the old Yocto SDK, and new Yocto SDK does not support Vybrid
platform. I also asked my colleague in test team who is in charge of ethernet testing,
she hadn't even heard of Vybrid platform.

> > > Does the burst latency go up?
> > No, for fec, when a packet is attached to the BDs, the software will
> > immediately trigger the hardware to send the packet. In addition, I
> > think it may improve the latency, because the size of the tx ring
> > becomes larger, and more packets can be attached to the BD ring for burst
> traffic.
> 
> And a bigger burst means more latency. Read about Buffer bloat.
> 
Perhaps, but not necessarily. For example, in some cases that some burst packets
maybe stay in Qdisc instead of hardware queue if ring size is small.

> While you have iperf running saturating the link, try a ping as well. How does
> ping latency change with more TX buffers?
> 
Per your suggestions, I tested on i.MX8ULP, i.MX8MM and i.MX93 platforms. The
results do not appear to be very different.
i.MX93 with ring size 1024:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.114 -c 16
PING 10.193.102.114 (10.193.102.114) 56(84) bytes of data.
64 bytes from 10.193.102.114: icmp_seq=1 ttl=63 time=3.78 ms
64 bytes from 10.193.102.114: icmp_seq=2 ttl=63 time=2.16 ms
64 bytes from 10.193.102.114: icmp_seq=3 ttl=63 time=3.31 ms
64 bytes from 10.193.102.114: icmp_seq=4 ttl=63 time=2.11 ms
64 bytes from 10.193.102.114: icmp_seq=5 ttl=63 time=3.43 ms
64 bytes from 10.193.102.114: icmp_seq=6 ttl=63 time=3.20 ms
64 bytes from 10.193.102.114: icmp_seq=7 ttl=63 time=3.20 ms
64 bytes from 10.193.102.114: icmp_seq=8 ttl=63 time=3.75 ms
64 bytes from 10.193.102.114: icmp_seq=9 ttl=63 time=3.21 ms
64 bytes from 10.193.102.114: icmp_seq=10 ttl=63 time=3.76 ms
64 bytes from 10.193.102.114: icmp_seq=11 ttl=63 time=2.16 ms
64 bytes from 10.193.102.114: icmp_seq=12 ttl=63 time=2.67 ms
64 bytes from 10.193.102.114: icmp_seq=13 ttl=63 time=3.59 ms
64 bytes from 10.193.102.114: icmp_seq=14 ttl=63 time=2.55 ms
64 bytes from 10.193.102.114: icmp_seq=15 ttl=63 time=2.54 ms
64 bytes from 10.193.102.114: icmp_seq=16 ttl=63 time=3.88 ms

--- 10.193.102.114 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15027ms
rtt min/avg/max/mdev = 2.112/3.082/3.875/0.606 ms

i.MX93 with ring size 512:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.114 -c 16
PING 10.193.102.114 (10.193.102.114) 56(84) bytes of data.
64 bytes from 10.193.102.114: icmp_seq=1 ttl=63 time=2.74 ms
64 bytes from 10.193.102.114: icmp_seq=2 ttl=63 time=3.32 ms
64 bytes from 10.193.102.114: icmp_seq=3 ttl=63 time=2.72 ms
64 bytes from 10.193.102.114: icmp_seq=4 ttl=63 time=3.36 ms
64 bytes from 10.193.102.114: icmp_seq=5 ttl=63 time=3.41 ms
64 bytes from 10.193.102.114: icmp_seq=6 ttl=63 time=2.67 ms
64 bytes from 10.193.102.114: icmp_seq=7 ttl=63 time=2.77 ms
64 bytes from 10.193.102.114: icmp_seq=8 ttl=63 time=3.38 ms
64 bytes from 10.193.102.114: icmp_seq=9 ttl=63 time=2.54 ms
64 bytes from 10.193.102.114: icmp_seq=10 ttl=63 time=3.36 ms
64 bytes from 10.193.102.114: icmp_seq=11 ttl=63 time=3.44 ms
64 bytes from 10.193.102.114: icmp_seq=12 ttl=63 time=2.80 ms
64 bytes from 10.193.102.114: icmp_seq=13 ttl=63 time=2.86 ms
64 bytes from 10.193.102.114: icmp_seq=14 ttl=63 time=3.90 ms
64 bytes from 10.193.102.114: icmp_seq=15 ttl=63 time=2.50 ms
64 bytes from 10.193.102.114: icmp_seq=16 ttl=63 time=2.89 ms

--- 10.193.102.114 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15028ms
rtt min/avg/max/mdev = 2.496/3.040/3.898/0.394 ms

i.MX8MM with ring size 1024:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.126 -c 16
PING 10.193.102.126 (10.193.102.126) 56(84) bytes of data.
64 bytes from 10.193.102.126: icmp_seq=1 ttl=127 time=1.34 ms
64 bytes from 10.193.102.126: icmp_seq=2 ttl=127 time=2.07 ms
64 bytes from 10.193.102.126: icmp_seq=3 ttl=127 time=2.40 ms
64 bytes from 10.193.102.126: icmp_seq=4 ttl=127 time=1.48 ms
64 bytes from 10.193.102.126: icmp_seq=5 ttl=127 time=1.69 ms
64 bytes from 10.193.102.126: icmp_seq=6 ttl=127 time=1.54 ms
64 bytes from 10.193.102.126: icmp_seq=7 ttl=127 time=2.30 ms
64 bytes from 10.193.102.126: icmp_seq=8 ttl=127 time=1.94 ms
64 bytes from 10.193.102.126: icmp_seq=9 ttl=127 time=4.25 ms
64 bytes from 10.193.102.126: icmp_seq=10 ttl=127 time=1.75 ms
64 bytes from 10.193.102.126: icmp_seq=11 ttl=127 time=1.25 ms
64 bytes from 10.193.102.126: icmp_seq=12 ttl=127 time=2.04 ms
64 bytes from 10.193.102.126: icmp_seq=13 ttl=127 time=2.31 ms
64 bytes from 10.193.102.126: icmp_seq=14 ttl=127 time=2.18 ms
64 bytes from 10.193.102.126: icmp_seq=15 ttl=127 time=2.25 ms
64 bytes from 10.193.102.126: icmp_seq=16 ttl=127 time=1.37 ms

--- 10.193.102.126 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15026ms
rtt min/avg/max/mdev = 1.248/2.011/4.250/0.686 ms

i.MX8MM with ring size 512:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.126 -c 16
PING 10.193.102.126 (10.193.102.126) 56(84) bytes of data.
64 bytes from 10.193.102.126: icmp_seq=1 ttl=63 time=4.82 ms
64 bytes from 10.193.102.126: icmp_seq=2 ttl=63 time=4.67 ms
64 bytes from 10.193.102.126: icmp_seq=3 ttl=63 time=3.74 ms
64 bytes from 10.193.102.126: icmp_seq=4 ttl=63 time=3.87 ms
64 bytes from 10.193.102.126: icmp_seq=5 ttl=63 time=3.30 ms
64 bytes from 10.193.102.126: icmp_seq=6 ttl=63 time=3.79 ms
64 bytes from 10.193.102.126: icmp_seq=7 ttl=127 time=2.12 ms
64 bytes from 10.193.102.126: icmp_seq=8 ttl=127 time=1.99 ms
64 bytes from 10.193.102.126: icmp_seq=9 ttl=127 time=2.15 ms
64 bytes from 10.193.102.126: icmp_seq=10 ttl=127 time=1.82 ms
64 bytes from 10.193.102.126: icmp_seq=11 ttl=127 time=1.92 ms
64 bytes from 10.193.102.126: icmp_seq=12 ttl=127 time=1.23 ms
64 bytes from 10.193.102.126: icmp_seq=13 ttl=127 time=2.00 ms
64 bytes from 10.193.102.126: icmp_seq=14 ttl=127 time=1.66 ms
64 bytes from 10.193.102.126: icmp_seq=15 ttl=127 time=1.75 ms
64 bytes from 10.193.102.126: icmp_seq=16 ttl=127 time=2.24 ms

--- 10.193.102.126 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15026ms
rtt min/avg/max/mdev = 1.226/2.691/4.820/1.111 ms

i.MX8ULP with ring size 1024:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.216 -c 16
PING 10.193.102.216 (10.193.102.216) 56(84) bytes of data.
64 bytes from 10.193.102.216: icmp_seq=1 ttl=63 time=3.40 ms
64 bytes from 10.193.102.216: icmp_seq=2 ttl=63 time=5.46 ms
64 bytes from 10.193.102.216: icmp_seq=3 ttl=63 time=5.55 ms
64 bytes from 10.193.102.216: icmp_seq=4 ttl=63 time=5.97 ms
64 bytes from 10.193.102.216: icmp_seq=5 ttl=63 time=6.26 ms
64 bytes from 10.193.102.216: icmp_seq=6 ttl=63 time=0.963 ms
64 bytes from 10.193.102.216: icmp_seq=7 ttl=63 time=4.10 ms
64 bytes from 10.193.102.216: icmp_seq=8 ttl=63 time=4.55 ms
64 bytes from 10.193.102.216: icmp_seq=9 ttl=63 time=1.24 ms
64 bytes from 10.193.102.216: icmp_seq=10 ttl=63 time=6.96 ms
64 bytes from 10.193.102.216: icmp_seq=11 ttl=63 time=3.27 ms
64 bytes from 10.193.102.216: icmp_seq=12 ttl=63 time=6.57 ms
64 bytes from 10.193.102.216: icmp_seq=13 ttl=63 time=2.99 ms
64 bytes from 10.193.102.216: icmp_seq=14 ttl=63 time=1.70 ms
64 bytes from 10.193.102.216: icmp_seq=15 ttl=63 time=1.79 ms
64 bytes from 10.193.102.216: icmp_seq=16 ttl=63 time=1.42 ms

--- 10.193.102.216 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15026ms
rtt min/avg/max/mdev = 0.963/3.886/6.955/2.009 ms

i.MX8ULP with ring size 512:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.216 -c 16
PING 10.193.102.216 (10.193.102.216) 56(84) bytes of data.
64 bytes from 10.193.102.216: icmp_seq=1 ttl=63 time=5.70 ms
64 bytes from 10.193.102.216: icmp_seq=2 ttl=63 time=5.89 ms
64 bytes from 10.193.102.216: icmp_seq=3 ttl=63 time=3.37 ms
64 bytes from 10.193.102.216: icmp_seq=4 ttl=63 time=5.07 ms
64 bytes from 10.193.102.216: icmp_seq=5 ttl=63 time=1.47 ms
64 bytes from 10.193.102.216: icmp_seq=6 ttl=63 time=3.45 ms
64 bytes from 10.193.102.216: icmp_seq=7 ttl=63 time=1.35 ms
64 bytes from 10.193.102.216: icmp_seq=8 ttl=63 time=6.62 ms
64 bytes from 10.193.102.216: icmp_seq=9 ttl=63 time=1.41 ms
64 bytes from 10.193.102.216: icmp_seq=10 ttl=63 time=6.43 ms
64 bytes from 10.193.102.216: icmp_seq=11 ttl=63 time=1.41 ms
64 bytes from 10.193.102.216: icmp_seq=12 ttl=63 time=6.75 ms
64 bytes from 10.193.102.216: icmp_seq=13 ttl=63 time=4.76 ms
64 bytes from 10.193.102.216: icmp_seq=14 ttl=63 time=3.85 ms
64 bytes from 10.193.102.216: icmp_seq=15 ttl=63 time=3.50 ms
64 bytes from 10.193.102.216: icmp_seq=16 ttl=63 time=1.37 ms

--- 10.193.102.216 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15027ms
rtt min/avg/max/mdev = 1.349/3.900/6.749/1.985 ms

> Ideally you want enough transmit buffers to keep the link full, but not more. If
> the driver is using BQL, the network stack will help with this.
> 
> > Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms, i.MX6UL
> > and 8ULP only support Fast ethernet. Others support 1G.
> 
> Thanks for the benchmark numbers. Please get into the habit of including
> them. We like to see justification for any sort of performance tweaks.
> 
> 	Andrew
Andrew Lunn July 12, 2023, 1:16 a.m. UTC | #9
> Sorry, the Vybrid platform is not currently maintained by us, and Vybrid is also not
> included in our usual Yocto SDK RC version.

Your Yocto SDK version does not matter. We are talking about mainline
here... You are maintaining the mainline driver, and submitting
patches to extend the mainline driver.

> Even I find a Vybrid board, I think it
> probably cann't run with the latest kernel image, because the latest kernel image
> does not match with the old Yocto SDK, and new Yocto SDK does not support Vybrid
> platform. I also asked my colleague in test team who is in charge of ethernet testing,
> she hadn't even heard of Vybrid platform.

Well 6.5-rc1 does run on Vybrid, i have a number of boards with
it. You will also find that many inflight entertainment systems have a
box under each row of seats in a aeroplane with a Vybrid controlling a
Marvell switch. Take a look at arch/arm/boot/dts/vf610-zii-* You will
also find a number of DTS files for imx5i, imx6, imx7 used for ZII
products which are back of the seat displays, and make heavy use of
networking with the FEC.

So i have in interest in the FEC for all these platforms.

> > And a bigger burst means more latency. Read about Buffer bloat.
> > 
> Perhaps, but not necessarily. For example, in some cases that some burst packets
> maybe stay in Qdisc instead of hardware queue if ring size is small.

Which is what you want, so high priority packets can jump to the head
of the queue.

> > While you have iperf running saturating the link, try a ping as well. How does
> > ping latency change with more TX buffers?
> > 
> Per your suggestions, I tested on i.MX8ULP, i.MX8MM and i.MX93 platforms. The
> results do not appear to be very different.

Thanks for the benchmark numbers. They look O.K.

       Andrew
diff mbox series

Patch

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 8c0226d061fe..63a053dea819 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -355,7 +355,7 @@  struct bufdesc_ex {
 #define RX_RING_SIZE		(FEC_ENET_RX_FRPPG * FEC_ENET_RX_PAGES)
 #define FEC_ENET_TX_FRSIZE	2048
 #define FEC_ENET_TX_FRPPG	(PAGE_SIZE / FEC_ENET_TX_FRSIZE)
-#define TX_RING_SIZE		512	/* Must be power of two */
+#define TX_RING_SIZE		1024	/* Must be power of two */
 #define TX_RING_MOD_MASK	511	/*   for this to work */
 
 #define BD_ENET_RX_INT		0x00800000
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 940d3afe1d24..6c255c0d6f41 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3348,9 +3348,8 @@  static int fec_enet_alloc_queue(struct net_device *ndev)
 		txq->bd.ring_size = TX_RING_SIZE;
 		fep->total_tx_ring_size += fep->tx_queue[i]->bd.ring_size;
 
-		txq->tx_stop_threshold = FEC_MAX_SKB_DESCS;
-		txq->tx_wake_threshold =
-			(txq->bd.ring_size - txq->tx_stop_threshold) / 2;
+		txq->tx_stop_threshold = MAX_SKB_FRAGS;
+		txq->tx_wake_threshold = MAX_SKB_FRAGS + 1;
 
 		txq->tso_hdrs = dma_alloc_coherent(&fep->pdev->dev,
 					txq->bd.ring_size * TSO_HEADER_SIZE,
@@ -3837,7 +3836,7 @@  static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
 
 	entries_free = fec_enet_get_free_txdesc_num(txq);
 	if (entries_free < MAX_SKB_FRAGS + 1) {
-		netdev_err(fep->netdev, "NOT enough BD for SG!\n");
+		netdev_err_once(fep->netdev, "NOT enough BD for SG!\n");
 		return -EBUSY;
 	}