
[net-next,v3,0/5] lan743x speed boost

Message ID 20210216010806.31948-1-TheSven73@gmail.com (mailing list archive)

Sven Van Asbroeck Feb. 16, 2021, 1:08 a.m. UTC
From: Sven Van Asbroeck <thesven73@gmail.com>

Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 9ec5eea5b6ac

v2 -> v3:
- Bryan Whitehead:
    + add Bryan's reviewed-by tag to patch 1/5.
    + Only use FRAME_LENGTH if the LS bit is checked.
        If set use the smaller of FRAME_LENGTH or buffer length.
        If clear use buffer length.
    + Correct typo in cover letter history (swap "packet" <-> "buffer").
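
For illustration, the FRAME_LENGTH rule above can be sketched in plain
userspace C. This is a minimal sketch, not the driver's code: the function
name and its parameters are hypothetical, and only the selection logic
(LS set: smaller of the two lengths; LS clear: buffer length) comes from
the changelog entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from lan743x_main.c): returns how many
 * bytes of one rx ring buffer should be treated as received data.
 */
static size_t rx_buffer_used_length(bool last_segment, size_t frame_length,
                                    size_t buffer_length)
{
    assert(buffer_length > 0);

    /* LS clear: FRAME_LENGTH is not consulted, use the buffer length. */
    if (!last_segment)
        return buffer_length;

    /* LS set: use the smaller of FRAME_LENGTH and the buffer length. */
    return frame_length < buffer_length ? frame_length : buffer_length;
}
```

With a 500-byte buffer, a buffer with LS clear yields 500, and a buffer
with LS set and a FRAME_LENGTH of 120 yields 120.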

v1 -> v2:
- Andrew Lunn:
    + always keep to Reverse Christmas Tree.
    + "changing the cache operations to operate on the received length" should
      go in its own, separate patch, so it can be easily backed out if
      "interesting things" should happen with it.

- Bryan Whitehead:
    + multi-buffer patch concept "looks good".
      As a result, I will squash the intermediate "dma buffer only" patch which
      demonstrated the speed boost using an inflexible solution
      (w/o multi-buffers).
    + Rename lan743x_rx_process_packet() to lan743x_rx_process_buffer()
    + Remove unused RX_PROCESS_RESULT_PACKET_DROPPED
    + Rename RX_PROCESS_RESULT_PACKET_RECEIVED to
      RX_PROCESS_RESULT_BUFFER_RECEIVED
    + Fold "unmap from dma" into lan743x_rx_init_ring_element() to prevent
      use-after-dma-unmap issue
    + ensure that skb allocation issues do not result in the driver sending
      incomplete packets to the OS. E.g. a three-buffer packet, with the
      middle buffer missing

- Willem de Bruijn: skb_hwtstamps(skb) always returns a non-null value if the
  skb parameter points to a valid skb.
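
The "no incomplete packets" requirement in the v1 -> v2 list above (e.g. a
three-buffer packet with the middle buffer missing) can be pictured with
the following userspace sketch. It is an assumption-laden illustration, not
the driver's implementation: struct rx_assembly, enum rx_result and
rx_add_buffer() are hypothetical names, and a lost buffer is modelled by a
plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reassembly state for one multi-buffer packet. */
struct rx_assembly {
    bool in_progress; /* a first segment has been seen */
    bool poisoned;    /* at least one buffer of this packet was lost */
    size_t length;    /* bytes accumulated so far */
};

enum rx_result { RX_PENDING, RX_DELIVER, RX_DROP };

/*
 * Feed one rx buffer into the assembly. 'buffer_ok' is false when the
 * buffer could not be kept (e.g. a simulated skb allocation failure).
 * The packet is delivered only if every one of its buffers was kept.
 */
static enum rx_result rx_add_buffer(struct rx_assembly *a, bool first,
                                    bool last, bool buffer_ok, size_t len)
{
    assert(a != NULL);

    if (first) {
        a->in_progress = true;
        a->poisoned = false;
        a->length = 0;
    }
    if (!a->in_progress)
        return RX_DROP; /* buffer arrived without a first segment */

    if (buffer_ok)
        a->length += len;
    else
        a->poisoned = true;

    if (!last)
        return RX_PENDING;

    a->in_progress = false;
    return a->poisoned ? RX_DROP : RX_DELIVER;
}
```

Feeding three 500-byte buffers with the middle one marked as lost ends in
RX_DROP on the last buffer; feeding three good buffers ends in RX_DELIVER
with an accumulated length of 1500.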

Summary of my tests below.
Suggestions for better tests are very welcome.

Tests with debug logging enabled (add #define DEBUG).

1. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Ping to chip, verify correct packet size is sent to OS.
Ping large packets to chip (ping -s 1400), verify correct
 packet size is sent to OS.
Ping using packets around the buffer size, verify number of
 buffers is changing, verify correct packet size is sent
 to OS:
 $ ping -s 472
 $ ping -s 473
 $ ping -s 992
 $ ping -s 993
Verify that each packet is followed by extension processing.

2. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run iperf3 -s on chip, verify that packets come in 3 buffers
 at a time.
Verify that packet size is equal to mtu.
Verify that each packet is followed by extension processing.

3. Set mtu to 2000 on chip and host.
Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
Run iperf3 -s on chip, verify that packets come in 4 buffers
 at a time.
Verify that packet size is equal to mtu.
Verify that each packet is followed by extension processing.
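
The buffer counts expected in tests 1 to 3 (1500 bytes in 3 buffers, 2000
bytes in 4 buffers of 500) follow from a rounded-up division. A minimal
sketch, with a hypothetical function name; it deliberately ignores the
Ethernet header, FCS and any padding the driver adds, so the ping payload
sizes at which the buffer count changes will not be exact multiples of
500.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of rx buffers of 'buffer_len' bytes needed
 * to hold 'packet_len' bytes, i.e. ceil(packet_len / buffer_len).
 */
static size_t rx_buffers_needed(size_t packet_len, size_t buffer_len)
{
    assert(buffer_len > 0);
    return (packet_len + buffer_len - 1) / buffer_len;
}
```

rx_buffers_needed(1500, 500) gives 3 and rx_buffers_needed(2000, 500)
gives 4, matching the counts the tests look for in the debug log.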

Tests with debug logging DISabled (remove #define DEBUG).

4. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run iperf3 -s on chip, note sustained rx speed.
Set mtu to 2000, so mtu takes 4 buffers.
Run iperf3 -s on chip, note sustained rx speed.
Verify no packets are dropped in both cases.
Verify speeds are roughly comparable.

Tests with DEBUG_KMEMLEAK on:
$ mount -t debugfs nodev /sys/kernel/debug/
$ echo scan > /sys/kernel/debug/kmemleak

5. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run the following tests concurrently for at least one hour:
 - iperf3 -s on chip
 - ping -> chip

Monitor reported memory leaks.

6. Set mtu to 2000.
Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
Run the following tests concurrently for at least one hour:
 - iperf3 -s on chip
 - ping -> chip

Monitor reported memory leaks.

7. Simulate low memory in lan743x_rx_allocate_skb(): fail once every
 100 allocations.
Repeat (5) and (6).
Monitor reported memory leaks.

8. Simulate low memory in lan743x_rx_allocate_skb(): fail 10
 allocations in a row out of every 100.
Repeat (5) and (6).
Monitor reported memory leaks.

9. Simulate low memory in lan743x_rx_trim_skb(): fail 1 allocation
 out of every 100.
Repeat (5) and (6).
Monitor reported memory leaks.
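
The failure patterns used in tests 7 to 9 (one failure per 100 calls, or
10 failures in a row per 100 calls) can be produced by a simple counter.
The sketch below is an assumption about how such a test hook could look,
not the contents of the TEST ONLY patches; struct fail_injector and
should_fail() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical injector: fail the first 'burst' of every 'period' calls. */
struct fail_injector {
    unsigned int period; /* e.g. 100 */
    unsigned int burst;  /* e.g. 1 or 10 */
    unsigned int count;  /* position inside the current period */
};

/* Returns true when the caller should simulate an allocation failure. */
static bool should_fail(struct fail_injector *fi)
{
    bool fail;

    assert(fi->period > 0 && fi->burst <= fi->period);

    fail = fi->count < fi->burst;
    fi->count = (fi->count + 1) % fi->period;
    return fail;
}
```

An allocation wrapper would call should_fail() first and return NULL when
it reports true. With period 100 and burst 10, exactly 100 out of 1000
consecutive calls fail, in runs of 10.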

Tests with debug logging enabled (add #define DEBUG).

10. Set the chip mtu to 1500, generate lots of network traffic.
Stop all network traffic.
Set the chip and remote mtus to 8000.
Ping remote -> chip: $ ping <chip ip> -s 7000
Verify that the first few received packets are multi-buffer.
Verify no pings are dropped.

Tests with DEBUG_KMEMLEAK on:
$ mount -t debugfs nodev /sys/kernel/debug/
$ echo scan > /sys/kernel/debug/kmemleak

11. Start with chip mtu at 1500, host mtu at 8000.
Run concurrently:
 - iperf3 -s on chip
 - ping -> chip

Cycle the chip mtu between 1500 and 8000 every 10 seconds.

Scan kmemleak periodically to watch for memory leaks.

Verify that the mtu changeover happens smoothly, i.e.
the iperf3 test does not report periods where speed
drops and recovers suddenly.

Note: iperf3 occasionally reports dropped packets on
changeover. This behaviour also occurs with the original
driver, so it is not a regression. It is possibly related to
the chip's mac rx being disabled when the mtu is changed.

To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Sven Van Asbroeck (5):
  lan743x: boost performance on cpu archs w/o dma cache snooping
  lan743x: sync only the received area of an rx ring buffer
  TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes
  TEST ONLY: lan743x: skb_alloc failure test
  TEST ONLY: lan743x: skb_trim failure test

 drivers/net/ethernet/microchip/lan743x_main.c | 352 +++++++++---------
 drivers/net/ethernet/microchip/lan743x_main.h |   5 +-
 2 files changed, 174 insertions(+), 183 deletions(-)

Comments

Bryan.Whitehead@microchip.com Feb. 17, 2021, 9:43 p.m. UTC | #1
> From: Sven Van Asbroeck <thesven73@gmail.com>
> 
> Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 9ec5eea5b6ac
> 
> v2 -> v3:
> - Bryan Whitehead:
>     + add Bryan's reviewed-by tag to patch 1/5.
>     + Only use FRAME_LENGTH if the LS bit is checked.
>         If set use the smaller of FRAME_LENGTH or buffer length.
>         If clear use buffer length.
>     + Correct typo in cover letter history (swap "packet" <-> "buffer").
> 
> v1 -> v2:
> - Andrew Lunn:
>     + always keep to Reverse Christmas Tree.
>     + "changing the cache operations to operate on the received length" should
>       go in its own, separate patch, so it can be easily backed out if
>       "interesting things" should happen with it.
> 
> - Bryan Whitehead:
>     + multi-buffer patch concept "looks good".
>       As a result, I will squash the intermediate "dma buffer only" patch which
>       demonstrated the speed boost using an inflexible solution
>       (w/o multi-buffers).
>     + Rename lan743x_rx_process_packet() to lan743x_rx_process_buffer()
>     + Remove unused RX_PROCESS_RESULT_PACKET_DROPPED
>     + Rename RX_PROCESS_RESULT_PACKET_RECEIVED to
>       RX_PROCESS_RESULT_BUFFER_RECEIVED
>     + Fold "unmap from dma" into lan743x_rx_init_ring_element() to prevent
>       use-after-dma-unmap issue
>     + ensure that skb allocation issues do not result in the driver sending
>       incomplete packets to the OS. E.g. a three-buffer packet, with the
>       middle buffer missing
> 
> - Willem De Bruyn: skb_hwtstamps(skb) always returns a non-null value, if the
>   skb parameter points to a valid skb.
> 
...
> Sven Van Asbroeck (5):
>   lan743x: boost performance on cpu archs w/o dma cache snooping
>   lan743x: sync only the received area of an rx ring buffer
>   TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes
>   TEST ONLY: lan743x: skb_alloc failure test
>   TEST ONLY: lan743x: skb_trim failure test
> 
>  drivers/net/ethernet/microchip/lan743x_main.c | 352 +++++++++---------
>  drivers/net/ethernet/microchip/lan743x_main.h |   5 +-
>  2 files changed, 174 insertions(+), 183 deletions(-)
> 
> --
> 2.17.1

Hi Sven,

Just to let you know, my colleague tested the patches 1 and 2 on x86 PC and we are satisfied with the result.
We confirmed some performance improvements.
We also confirmed PTP is working.

Thanks for your work on this.

Tested-by: UNGLinuxDriver@microchip.com

Sven Van Asbroeck Feb. 17, 2021, 10:04 p.m. UTC | #2
Hi Jakub and Bryan,

On Wed, Feb 17, 2021 at 4:43 PM <Bryan.Whitehead@microchip.com> wrote:
>
> Just to let you know, my colleague tested the patches 1 and 2 on x86 PC and we are satisfied with the result.
> We confirmed some performance improvements.
> We also confirmed PTP is working.
>
> Thanks for your work on this.
>
> Tested-by: UNGLinuxDriver@microchip.com
>

Bryan, that is great news. My pleasure, thank you for your guidance
and considerable expertise.

Jakub, is there anything else you'd like to see from us, before you
are satisfied that patches 1/5 and 2/5 can be merged into your tree?

David Miller Feb. 17, 2021, 10:16 p.m. UTC | #3
From: Sven Van Asbroeck <thesven73@gmail.com>
Date: Wed, 17 Feb 2021 17:04:05 -0500

> Hi Jakub and Bryan,
> 
> Jakub, is there anything else you'd like to see from us, before you
> are satisfied that patches 1/5 and 2/5 can be merged into your tree?

They are already merged into net-next