[net-next,0/5] mlxsw: Improve events processing performance

Message ID: cover.1714134205.git.petrm@nvidia.com

Message

Petr Machata April 26, 2024, 12:42 p.m. UTC
Amit Cohen writes:

Spectrum ASICs support only a single interrupt, which means that all
events are handled by one IRQ (interrupt request) handler.

Currently, we schedule a tasklet to handle events in the EQ, and then also
use tasklets for the CQ, SDQ and RDQ. A tasklet runs in softIRQ (software
IRQ) context, on the same CPU that scheduled it. This means that today a
single CPU handles all the packets (both network packets and EMADs) from
the hardware.

The existing implementation is not efficient and can be improved.

Measuring the latency of EMADs in the driver (excluding the time spent in
FW) shows that latency increases by a factor of 28 (x28) when network
traffic is handled by the driver.

Measuring CPU throughput shows that the CPU can handle ~35% fewer packets
of a specific flow when corrupted packets are also handled by the driver.
In some cases the numbers are even worse; we measured a decrease of ~44%
in packet rate.

This can be improved if network packets and EMADs are handled in parallel
by several CPUs, and furthermore, if different types of traffic are
handled in parallel. We can achieve this using NAPI.

This set converts the driver to process completions from hardware via NAPI.
The idea is to add a NAPI instance per CQ (which is mapped 1:1 to an
SDQ/RDQ), which means that each DQ can be handled separately. We have a DQ
for EMADs and DQs for each trap group (like LLDP, BGP, L3 drops, etc.).
See more details in the commit messages.

An additional improvement done as part of this set is related to doorbell
rings. The idea is to handle small chunks of Rx packets (which is also
recommended when using NAPI) and ring the doorbells once per chunk. This
reduces accesses to the hardware, which are expensive (time-wise) and
might take time because of memory barriers.

With this set we can see better performance.
To summarize:

EMADs latency:
+------------------+---------------------------+-------------------------+
|                  | Before this set           | Now                     |
|------------------|---------------------------|-------------------------|
| Increased factor | x28                       | x1.5                    |
+------------------+---------------------------+-------------------------+
Note that some measurements even show better latency when traffic is
handled by the driver.

Throughput:
+-------------+----------------------------+-----------------------------+
|             | Before this set            | Now                         |
|-------------|----------------------------|-----------------------------|
| Reduced     | 35%                        | 6%                          |
| packet rate |                            |                             |
+-------------+----------------------------+-----------------------------+

Additional improvements are planned - using the page pool for buffer
allocations and avoiding a cache miss per SKB by using napi_build_skb().

Patch set overview:
Patches #1-#2 improve access to the hardware by reducing doorbell rings
Patches #3-#4 are preparations for NAPI usage
Patch #5 converts the driver to use NAPI

Amit Cohen (5):
  mlxsw: pci: Handle up to 64 Rx completions in tasklet
  mlxsw: pci: Ring RDQ and CQ doorbells once per several completions
  mlxsw: pci: Initialize dummy net devices for NAPI
  mlxsw: pci: Reorganize 'mlxsw_pci_queue' structure
  mlxsw: pci: Use NAPI for event processing

 drivers/net/ethernet/mellanox/mlxsw/pci.c | 204 ++++++++++++++++------
 1 file changed, 150 insertions(+), 54 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org April 29, 2024, 9:50 a.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri, 26 Apr 2024 14:42:21 +0200 you wrote:
> Amit Cohen writes:
> 
> Spectrum ASICs only support a single interrupt, it means that all the
> events are handled by one IRQ (interrupt request) handler.
> 
> Currently, we schedule a tasklet to handle events in EQ, then we also use
> tasklet for CQ, SDQ and RDQ. Tasklet runs in softIRQ (software IRQ)
> context, and will be run on the same CPU which scheduled it. It means that
> today we have one CPU which handles all the packets (both network packets
> and EMADs) from hardware.
> 
> [...]

Here is the summary with links:
  - [net-next,1/5] mlxsw: pci: Handle up to 64 Rx completions in tasklet
    https://git.kernel.org/netdev/net-next/c/e28d8aba4381
  - [net-next,2/5] mlxsw: pci: Ring RDQ and CQ doorbells once per several completions
    https://git.kernel.org/netdev/net-next/c/6b3d015cdb2a
  - [net-next,3/5] mlxsw: pci: Initialize dummy net devices for NAPI
    https://git.kernel.org/netdev/net-next/c/5d01ed2e9708
  - [net-next,4/5] mlxsw: pci: Reorganize 'mlxsw_pci_queue' structure
    https://git.kernel.org/netdev/net-next/c/c0d9267873bc
  - [net-next,5/5] mlxsw: pci: Use NAPI for event processing
    (no matching commit)

You are awesome, thank you!