
[net] net: microchip: lan743x: bidirectional throughput improvement

Message ID 20230927111623.9966-1-vishvambarpanth.s@microchip.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net] net: microchip: lan743x: bidirectional throughput improvement

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1340 this patch: 1340
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 1363 this patch: 1363
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1363 this patch: 1363
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vishvambar Panth S Sept. 27, 2023, 11:16 a.m. UTC
The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
device supports placing the descriptors in memory back to back or
reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
configurable hardware setting. Currently DSPACE is unnecessarily set to
match the host's L1 cache line size, resulting in space reserved
between descriptors on most platforms and causing suboptimal behavior
(a single PCIe memory transaction per descriptor). By changing the
setting to DSPACE=16, many descriptors can be packed into a single
PCIe memory transaction, resulting in a massive performance improvement
in bidirectional tests without any negative effects.
Tested and verified improvements on an x64 PC and several ARM platforms
(typical data below).
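
To illustrate the packing arithmetic, a minimal standalone sketch (the
ring size and the 64-byte transaction size are assumptions for the
example, not values taken from the driver):

#include <stdio.h>

/* Each descriptor carries 4 dwords x 4 bytes = 16 bytes of payload. */
#define DESC_BYTES	16u

int main(void)
{
	unsigned int ring = 64;	/* hypothetical ring size */
	unsigned int tlp = 64;	/* assumed PCIe transaction size */

	/* DSPACE matching a typical 64-byte L1 line: one descriptor per
	 * transaction, and 48 of every 64 bytes are padding. */
	printf("DSPACE=64: ring %u bytes, %u desc/transaction\n",
	       ring * 64, tlp / 64);

	/* DSPACE=16: descriptors sit back to back, four per transaction,
	 * and the ring occupies a quarter of the memory. */
	printf("DSPACE=16: ring %u bytes, %u desc/transaction\n",
	       ring * DESC_BYTES, tlp / DESC_BYTES);
	return 0;
}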

Test setup 1: x64 PC with LAN7430 ---> x64 PC

iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec   170 MBytes   143 Mbits/sec  sender
[  5][TX-C]   0.00-10.04  sec   169 MBytes   141 Mbits/sec  receiver
[  7][RX-C]   0.00-10.00  sec  1.02 GBytes   876 Mbits/sec  sender
[  7][RX-C]   0.00-10.04  sec  1.02 GBytes   870 Mbits/sec  receiver

iperf3 UDP bidirectional with DSPACE set to 16 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec  sender
[  5][TX-C]   0.00-10.04  sec  1.11 GBytes   951 Mbits/sec  receiver
[  7][RX-C]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  sender
[  7][RX-C]   0.00-10.04  sec  1.10 GBytes   942 Mbits/sec  receiver

Test setup 2: RK3399 with LAN7430 ---> x64 PC

RK3399 Spec:
The SOM-RK3399 is an ARM module designed and developed by FriendlyElec.
Cores: 64-bit Dual Core Cortex-A72 + Quad Core Cortex-A53
Frequency: Cortex-A72 (up to 2.0 GHz), Cortex-A53 (up to 1.5 GHz)
PCIe: PCIe x4, compatible with PCIe 2.1, Dual operation mode

iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec   534 MBytes   448 Mbits/sec  sender
[  5][TX-C]   0.00-10.05  sec   534 MBytes   446 Mbits/sec  receiver
[  7][RX-C]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  sender
[  7][RX-C]   0.00-10.05  sec  1.11 GBytes   946 Mbits/sec  receiver

iperf3 UDP bidirectional with DSPACE set to 16 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec   966 MBytes   810 Mbits/sec   sender
[  5][TX-C]   0.00-10.04  sec   965 MBytes   806 Mbits/sec   receiver
[  7][RX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec   sender
[  7][RX-C]   0.00-10.04  sec  1.07 GBytes   919 Mbits/sec   receiver

Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
---
 drivers/net/ethernet/microchip/lan743x_main.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jacob Keller Sept. 29, 2023, 5:35 p.m. UTC | #1
On 9/27/2023 4:16 AM, Vishvambar Panth S wrote:
> The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
> device supports placing the descriptors in memory back to back or
> reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
> configurable hardware setting. Currently DSPACE is unnecessarily set to
> match the host's L1 cache line size, resulting in space reserved in
> between descriptors in most platforms and causing a suboptimal behavior
> (single PCIe Mem transaction per descriptor). By changing the setting
> to DSPACE=16 many descriptors can be packed in a single PCIe Mem
> transaction resulting in a massive performance improvement in
> bidirectional tests without any negative effects.
> Tested and verified improvements on x64 PC and several ARM platforms
> (typical data below)
> 

I assume here the choice of 16 is to get the 16 bytes from 4 dwords?

> Test setup 1: x64 PC with LAN7430 ---> x64 PC
> 
> iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec   170 MBytes   143 Mbits/sec  sender
> [  5][TX-C]   0.00-10.04  sec   169 MBytes   141 Mbits/sec  receiver
> [  7][RX-C]   0.00-10.00  sec  1.02 GBytes   876 Mbits/sec  sender
> [  7][RX-C]   0.00-10.04  sec  1.02 GBytes   870 Mbits/sec  receiver
> 
> iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec  sender
> [  5][TX-C]   0.00-10.04  sec  1.11 GBytes   951 Mbits/sec  receiver
> [  7][RX-C]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  sender
> [  7][RX-C]   0.00-10.04  sec  1.10 GBytes   942 Mbits/sec  receiver
> 

Going from barely transmitting to hitting the line rate *and* improving
both Tx and Rx slightly is fantastic. Huge win just from avoiding all
that unnecessarily wasted PCIe bandwidth.

> Test setup 2 : RK3399 with LAN7430 ---> x64 PC
> 
> RK3399 Spec:
> The SOM-RK3399 is ARM module designed and developed by FriendlyElec.
> Cores: 64-bit Dual Core Cortex-A72 + Quad Core Cortex-A53
> Frequency: Cortex-A72(up to 2.0GHz), Cortex-A53(up to 1.5GHz)
> PCIe: PCIe x4, compatible with PCIe 2.1, Dual operation mode
> 
> iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec   534 MBytes   448 Mbits/sec  sender
> [  5][TX-C]   0.00-10.05  sec   534 MBytes   446 Mbits/sec  receiver
> [  7][RX-C]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  sender
> [  7][RX-C]   0.00-10.05  sec  1.11 GBytes   946 Mbits/sec  receiver
> 
> iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec   966 MBytes   810 Mbits/sec   sender
> [  5][TX-C]   0.00-10.04  sec   965 MBytes   806 Mbits/sec   receiver
> [  7][RX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec   sender
> [  7][RX-C]   0.00-10.04  sec  1.07 GBytes   919 Mbits/sec   receiver
> 
> Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
> Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
> ---

Always fun to see a massive win like this! Even just the 2x improvement
on the RK3399 is huge, let alone the 6x improvement on the x86 platform.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

>  drivers/net/ethernet/microchip/lan743x_main.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h
> index 52609fc13ad9..6dac6fef7d24 100644
> --- a/drivers/net/ethernet/microchip/lan743x_main.h
> +++ b/drivers/net/ethernet/microchip/lan743x_main.h
> @@ -1067,7 +1067,7 @@ struct lan743x_adapter {
>  #define DMA_DESCRIPTOR_SPACING_32       (32)
>  #define DMA_DESCRIPTOR_SPACING_64       (64)
>  #define DMA_DESCRIPTOR_SPACING_128      (128)
> -#define DEFAULT_DMA_DESCRIPTOR_SPACING  (L1_CACHE_BYTES)
> +#define DEFAULT_DMA_DESCRIPTOR_SPACING  (DMA_DESCRIPTOR_SPACING_16)
>  
>  #define DMAC_CHANNEL_STATE_SET(start_bit, stop_bit) \
>  	(((start_bit) ? 2 : 0) | ((stop_bit) ? 1 : 0))
Jakub Kicinski Oct. 4, 2023, 7:20 p.m. UTC | #2
On Wed, 27 Sep 2023 16:46:23 +0530 Vishvambar Panth S wrote:
> The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
> device supports placing the descriptors in memory back to back or
> reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
> configurable hardware setting. Currently DSPACE is unnecessarily set to
> match the host's L1 cache line size, resulting in space reserved in
> between descriptors in most platforms and causing a suboptimal behavior
> (single PCIe Mem transaction per descriptor). By changing the setting
> to DSPACE=16 many descriptors can be packed in a single PCIe Mem
> transaction resulting in a massive performance improvement in
> bidirectional tests without any negative effects.
> Tested and verified improvements on x64 PC and several ARM platforms
> (typical data below)

Nobody complained for 5 years, and it's not a regression.
Let's not treat this as a fix; please repost without the Fixes tag for
net-next.
Florian Fainelli Oct. 4, 2023, 8:02 p.m. UTC | #3
On 10/4/23 12:20, Jakub Kicinski wrote:
> On Wed, 27 Sep 2023 16:46:23 +0530 Vishvambar Panth S wrote:
>> The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
>> device supports placing the descriptors in memory back to back or
>> reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
>> configurable hardware setting. Currently DSPACE is unnecessarily set to
>> match the host's L1 cache line size, resulting in space reserved in
>> between descriptors in most platforms and causing a suboptimal behavior
>> (single PCIe Mem transaction per descriptor). By changing the setting
>> to DSPACE=16 many descriptors can be packed in a single PCIe Mem
>> transaction resulting in a massive performance improvement in
>> bidirectional tests without any negative effects.
>> Tested and verified improvements on x64 PC and several ARM platforms
>> (typical data below)
> 
> Nobody complained for 5 years, and it's not a regression.
> Let's not treat this as a fix, please repost without the Fixes tag for
> net-next.

As a driver maintainer, you may want to provide some guarantees to your 
end users/customers that from stable version X.Y.Z the performance 
issues have been fixed. Performance improvements are definitely 
borderline in terms of being considered bug fixes, though.
Jakub Kicinski Oct. 4, 2023, 8:09 p.m. UTC | #4
On Wed, 4 Oct 2023 13:02:17 -0700 Florian Fainelli wrote:
> > Nobody complained for 5 years, and it's not a regression.
> > Let's not treat this as a fix, please repost without the Fixes tag for
> > net-next.  
> 
> As a driver maintainer, you may want to provide some guarantees to your 
> end users/customers that from stable version X.Y.Z the performance 
> issues have been fixed. Performance improvements are definitively border 
> line in terms of being considered as bug fixes though.

I understand that, but too often people just "feel like a device which
advertises X Mbps / Gbps should reach line rate" while no end user
cares.

Luckily stable rules are pretty clear about this (search for
"performance"): 
https://docs.kernel.org/process/stable-kernel-rules.html

As posted it doesn't fulfill the requirements.
Vishvambar Panth S Oct. 5, 2023, 5:17 a.m. UTC | #5
On Fri, 2023-09-29 at 10:35 -0700, Jacob Keller wrote:
> On 9/27/2023 4:16 AM, Vishvambar Panth S wrote:
> > The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but
> > the
> > device supports placing the descriptors in memory back to back or
> > reserving space in between them using its DMA_DESCRIPTOR_SPACE
> > (DSPACE)
> > configurable hardware setting. Currently DSPACE is unnecessarily
> > set to
> > match the host's L1 cache line size, resulting in space reserved in
> > between descriptors in most platforms and causing a suboptimal
> > behavior
> > (single PCIe Mem transaction per descriptor). By changing the
> > setting
> > to DSPACE=16 many descriptors can be packed in a single PCIe Mem
> > transaction resulting in a massive performance improvement in
> > bidirectional tests without any negative effects.
> > Tested and verified improvements on x64 PC and several ARM
> > platforms
> > (typical data below)
> > 
> 
> I assume here the choice of 16 is to get the 16 bytes from 4 dwords?

Thanks for the comments. Yes, it is set to 16 bytes to match the size
of the descriptors, which are 4 dwords long.
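
For completeness, the arithmetic as a standalone compile-time check
(illustrative, not driver code):

#include <stdint.h>

/* 4 dwords x 4 bytes/dword = 16 bytes, i.e. DMA_DESCRIPTOR_SPACING_16. */
_Static_assert(4 * sizeof(uint32_t) == 16, "descriptor is 16 bytes");
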
> 
> > Test setup 1: x64 PC with LAN7430 ---> x64 PC
> > 
> > iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec   170 MBytes   143 Mbits/sec  sender
> > [  5][TX-C]   0.00-10.04  sec   169 MBytes   141 Mbits/sec 
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.02 GBytes   876 Mbits/sec  sender
> > [  7][RX-C]   0.00-10.04  sec  1.02 GBytes   870 Mbits/sec 
> > receiver
> > 
> > iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec  sender
> > [  5][TX-C]   0.00-10.04  sec  1.11 GBytes   951 Mbits/sec 
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  sender
> > [  7][RX-C]   0.00-10.04  sec  1.10 GBytes   942 Mbits/sec 
> > receiver
> > 
> 
> Going from barely transmitting to hitting the line rate *and*
> improving
> both Tx and Rx slightly is fantastic. Huge win just avoiding all that
> unnecessary wasted PCIe bandwidth.
> 
> > Test setup 2 : RK3399 with LAN7430 ---> x64 PC
> > 
> > RK3399 Spec:
> > The SOM-RK3399 is ARM module designed and developed by
> > FriendlyElec.
> > Cores: 64-bit Dual Core Cortex-A72 + Quad Core Cortex-A53
> > Frequency: Cortex-A72(up to 2.0GHz), Cortex-A53(up to 1.5GHz)
> > PCIe: PCIe x4, compatible with PCIe 2.1, Dual operation mode
> > 
> > iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec   534 MBytes   448 Mbits/sec  sender
> > [  5][TX-C]   0.00-10.05  sec   534 MBytes   446 Mbits/sec 
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  sender
> > [  7][RX-C]   0.00-10.05  sec  1.11 GBytes   946 Mbits/sec 
> > receiver
> > 
> > iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec   966 MBytes   810 Mbits/sec   sender
> > [  5][TX-C]   0.00-10.04  sec   965 MBytes   806 Mbits/sec  
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec   sender
> > [  7][RX-C]   0.00-10.04  sec  1.07 GBytes   919 Mbits/sec  
> > receiver
> > 
> > Fixes: 23f0703c125b ("lan743x: Add main source files for new
> > lan743x driver")
> > Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
> > ---
> 
> Always fun to see a massive win like this! Even just the 2x
> improvement
> on the RK3399 is huge, let alone 6x improvement on the x86 platform.
> 
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> >  drivers/net/ethernet/microchip/lan743x_main.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/microchip/lan743x_main.h
> > b/drivers/net/ethernet/microchip/lan743x_main.h
> > index 52609fc13ad9..6dac6fef7d24 100644
> > --- a/drivers/net/ethernet/microchip/lan743x_main.h
> > +++ b/drivers/net/ethernet/microchip/lan743x_main.h
> > @@ -1067,7 +1067,7 @@ struct lan743x_adapter {
> >  #define DMA_DESCRIPTOR_SPACING_32       (32)
> >  #define DMA_DESCRIPTOR_SPACING_64       (64)
> >  #define DMA_DESCRIPTOR_SPACING_128      (128)
> > -#define DEFAULT_DMA_DESCRIPTOR_SPACING  (L1_CACHE_BYTES)
> > +#define DEFAULT_DMA_DESCRIPTOR_SPACING 
> > (DMA_DESCRIPTOR_SPACING_16)
> > 
> >  #define DMAC_CHANNEL_STATE_SET(start_bit, stop_bit) \
> >       (((start_bit) ? 2 : 0) | ((stop_bit) ? 1 : 0))
Vishvambar Panth S Nov. 1, 2023, 7:20 a.m. UTC | #6
On Wed, 2023-10-04 at 13:09 -0700, Jakub Kicinski wrote:
> On Wed, 4 Oct 2023 13:02:17 -0700 Florian Fainelli wrote:
> > > Nobody complained for 5 years, and it's not a regression.
> > > Let's not treat this as a fix, please repost without the Fixes
> > > tag for
> > > net-next.
> > 
> > As a driver maintainer, you may want to provide some guarantees to
> > your
> > end users/customers that from stable version X.Y.Z the performance
> > issues have been fixed. Performance improvements are definitively
> > border
> > line in terms of being considered as bug fixes though.
> 
> I understand that, but too often people just "feel like a device
> which
> advertises X Mbps / Gbps should reach line rate" while no end user
> cares.
> 
> Luckily stable rules are pretty clear about this (search for
> "performance"):
> https://docs.kernel.org/process/stable-kernel-rules.html
> 
> As posted it doesn't fulfill the requirements 
Vishvambar Panth S Nov. 9, 2023, 10:53 a.m. UTC | #7
On Wed, 2023-11-01 at 12:52 +0530, Vishvambar Panth S wrote:
> On Wed, 2023-10-04 at 13:09 -0700, Jakub Kicinski wrote:
> > On Wed, 4 Oct 2023 13:02:17 -0700 Florian Fainelli wrote:
> > > > Nobody complained for 5 years, and it's not a regression.
> > > > Let's not treat this as a fix, please repost without the Fixes
> > > > tag for
> > > > net-next.
> > > 
> > > As a driver maintainer, you may want to provide some guarantees
> > > to
> > > your
> > > end users/customers that from stable version X.Y.Z the
> > > performance
> > > issues have been fixed. Performance improvements are definitively
> > > border
> > > line in terms of being considered as bug fixes though.
> > 
> > I understand that, but too often people just "feel like a device
> > which
> > advertises X Mbps / Gbps should reach line rate" while no end user
> > cares.
> > 
> > Luckily stable rules are pretty clear about this (search for
> > "performance"):
> > https://docs.kernel.org/process/stable-kernel-rules.html
> > 
> > As posted it doesn't fulfill the requirements 
It has come to my attention that some people may not have received my
whole reply dated Nov 1st (as per
https://patchwork.kernel.org/project/netdevbpf/patch/20230927111623.9966-1-vishvambarpanth.s@microchip.com/#25577895
), possibly due to a non-ASCII character at the cut-off point.
Therefore, I am resending the part that was cut short below.
 
Jakub, would it be possible for you to apply the patch to the net
branch given the additional justification now posted below?
 
-----
 
Thanks for your feedback. I apologize for the delayed response.

The data presented in the patch description was aimed at convincing
reviewers of the visible impact of the performance boosts on both x64
and ARM platforms. However, the main motivation behind the patch was
not merely a "good-to-have" improvement but a solution to the
throughput issues reported by multiple customers on several platforms.
We received many customer requests through our ticketing system
urging us to address the performance issues on multiple kernel
versions, including LTS. While it's acknowledged that stable branch
rules typically do not consider performance fixes that are not
documented in a public Bugzilla, this performance enhancement is
essential to many of our customers and their end users, and we believe
it should therefore be considered for the stable branch on the basis
of its visible user impact.
A few issues reported by our customers are mentioned below; even
though these issues have existed for a long time, the data presented
below was collected from customers within the last 3 months.

Customer-A, using lan743x with a HiSilicon Kirin 990 processor on a
5.10 kernel, reported a mere ~300 Mbps in Rx UDP. The fix
significantly improved the performance to ~900 Mbps Rx on their
platform.

Customer-B, using lan743x with v5.10, had an issue with Tx UDP being
only 157 Mbps on their platform. Including the fix boosted the
performance to ~600 Mbps in Tx UDP.

Customer-C, using lan743x with an ADAS Ref Design on v5.10, reported
UDP Tx/Rx of 126/723 Mbps, and the fix improved the performance to
828/956 Mbps.

Customer-D, using lan743x with a Qcom 6490 on v5.4, wanted
improvements for their platform from a UDP Rx of 200 Mbps. The fix,
along with a few other changes, helped us bring Rx performance to
800 Mbps on the customer's platform.

This is a kind request to consider accepting this patch into the net
branch, as it has a significant positive impact on users and does not
have any adverse effects.

Thanks,
Vishvambar Panth S
Jakub Kicinski Nov. 9, 2023, 11:04 p.m. UTC | #8
On Thu, 9 Nov 2023 10:53:26 +0000 VishvambarPanth.S@microchip.com wrote:
> Thanks for your feedback. I apologize for the delayed response.
> 
> The data presented in the patch description was aimed to convince a
> reviewer with the visible impact of the performance boosts in both x64
> and ARM platforms. However, the main motivation behind the patch was
> not merely a "good-to-have" improvement but a solution to the
> throughput issues reported by multiple customers in several platforms.
> We received lots of customer requests through our ticket site system
> urging us to address the performance issues on multiple kernel versions
> including LTS. While it's acknowledged that stable branch rules
> typically do not consider performance fixes that are not documented in
> public Bugzilla, this performance enhancement is essential to many of
> our customers and their end users and we believe should therefore be
> considered for stable branch on the basis of it’s visible user impact.
> Few issues reported by our customers are mentioned below, even though
> these issues have existed for a long time, the data presented below is
> collected from the customer within last 3 months.
>  
> Customer-A using lan743x with Hisilicon- Kirin 990 processor in 5.10
> kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
> improved the performance to ~900Mbps Rx  in their platform.
>  
> Customer-B using lan743x with v5.10 has an issue with Tx UDP being only
> 157Mbps in their platform. Including the fix in the patch boosts the
> performance to ~600Mbps in Tx UDP.
>  
> Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
> Tx/Rx to be 126/723 Mbps and the fix improved the performance to
> 828/956 Mbps.
>  
> Customer-D using lan743x with Qcom 6490 with v5.4 wanted improvements
> for their platform from UDP Rx 200Mbps. The fix along with few other
> changes helped us to bring Rx perf to 800Mbps in customer’s platform
> 
> This is a kind request for considering the acceptance of this patch
> into the net branch, as it has a significant positive impact on users
> and does not have any adverse effects.

Thanks a lot for the details. Unfortunately, after further
consideration, I can't accept this patch as a fix with a clear
conscience. The code has been this way for a long time; performance
improvements should end up in new kernels, and people who want to
benefit from faster kernels should not be sticking to old LTS
releases.

So please repost for net-next next week, when it's open again.
Vishvambar Panth S Nov. 16, 2023, 5:49 a.m. UTC | #9
On Thu, 2023-11-09 at 15:04 -0800, Jakub Kicinski wrote:
> On Thu, 9 Nov 2023 10:53:26 +0000 VishvambarPanth.S@microchip.com
> wrote:
> > Thanks for your feedback. I apologize for the delayed response.
> > 
> > The data presented in the patch description was aimed to convince a
> > reviewer with the visible impact of the performance boosts in both
> > x64
> > and ARM platforms. However, the main motivation behind the patch
> > was
> > not merely a "good-to-have" improvement but a solution to the
> > throughput issues reported by multiple customers in several
> > platforms.
> > We received lots of customer requests through our ticket site
> > system
> > urging us to address the performance issues on multiple kernel
> > versions
> > including LTS. While it's acknowledged that stable branch rules
> > typically do not consider performance fixes that are not documented
> > in
> > public Bugzilla, this performance enhancement is essential to many
> > of
> > our customers and their end users and we believe should therefore
> > be
> > considered for stable branch on the basis of it’s visible user
> > impact.
> > Few issues reported by our customers are mentioned below, even
> > though
> > these issues have existed for a long time, the data presented below
> > is
> > collected from the customer within last 3 months.
> > 
> > Customer-A using lan743x with Hisilicon- Kirin 990 processor in
> > 5.10
> > kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
> > improved the performance to ~900Mbps Rx  in their platform.
> > 
> > Customer-B using lan743x with v5.10 has an issue with Tx UDP being
> > only
> > 157Mbps in their platform. Including the fix in the patch boosts
> > the
> > performance to ~600Mbps in Tx UDP.
> > 
> > Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
> > Tx/Rx to be 126/723 Mbps and the fix improved the performance to
> > 828/956 Mbps.
> > 
> > Customer-D using lan743x with Qcom 6490 with v5.4 wanted
> > improvements
> > for their platform from UDP Rx 200Mbps. The fix along with few
> > other
> > changes helped us to bring Rx perf to 800Mbps in customer’s
> > platform
> > 
> > This is a kind request for considering the acceptance of this patch
> > into the net branch, as it has a significant positive impact on
> > users
> > and does not have any adverse effects.
> 
> Thanks a lot for the details. Unfortunately after further
> consideration
> I can't accept this patch as a fix with clear conscience. The code
> has
> been this way for a long time, performance improvements should end up
> in new kernels and people who want to benefit from faster kernels
> should
> not be sticking to old LTS releases.
> 
> So please repost for net-next next week, when it's open again.

Hi Jakub,
Thanks for your inputs. I have submitted this patch to the net-next
branch.

Patch

diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h
index 52609fc13ad9..6dac6fef7d24 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.h
+++ b/drivers/net/ethernet/microchip/lan743x_main.h
@@ -1067,7 +1067,7 @@  struct lan743x_adapter {
 #define DMA_DESCRIPTOR_SPACING_32       (32)
 #define DMA_DESCRIPTOR_SPACING_64       (64)
 #define DMA_DESCRIPTOR_SPACING_128      (128)
-#define DEFAULT_DMA_DESCRIPTOR_SPACING  (L1_CACHE_BYTES)
+#define DEFAULT_DMA_DESCRIPTOR_SPACING  (DMA_DESCRIPTOR_SPACING_16)
 
 #define DMAC_CHANNEL_STATE_SET(start_bit, stop_bit) \
 	(((start_bit) ? 2 : 0) | ((stop_bit) ? 1 : 0))
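
For context, a minimal sketch of how a descriptor-spacing constant like
this typically feeds ring sizing and per-descriptor DMA addressing
(struct ring, ring_bytes() and desc_dma() are hypothetical names for
illustration, not the driver's actual code):

#include <stdint.h>

#define DEFAULT_DMA_DESCRIPTOR_SPACING	16u	/* as set by this patch */

/* Hypothetical ring bookkeeping, for illustration only. */
struct ring {
	uint64_t base_dma;	/* DMA address of descriptor 0 */
	unsigned int size;	/* number of descriptors in the ring */
};

/* Host memory footprint of the ring: size times the spacing. */
static inline unsigned int ring_bytes(const struct ring *r)
{
	return r->size * DEFAULT_DMA_DESCRIPTOR_SPACING;
}

/* DMA address of descriptor i: base plus i times the spacing.  With a
 * 16-byte spacing, consecutive descriptors are adjacent, so the device
 * can fetch several of them in a single PCIe memory transaction. */
static inline uint64_t desc_dma(const struct ring *r, unsigned int i)
{
	return r->base_dma + (uint64_t)i * DEFAULT_DMA_DESCRIPTOR_SPACING;
}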