[v2,net-next,0/8] ionic: rework fix for doorbell miss

Message ID 20240619003257.6138-1-shannon.nelson@amd.com (mailing list archive)

Message

Nelson, Shannon June 19, 2024, 12:32 a.m. UTC
A latency test in a scaled out setting (many VMs with many queues)
has uncovered an issue with our missed doorbell fix from
commit b69585bfcece ("ionic: missed doorbell workaround").

As a refresher, the Elba ASIC has an issue where once in a blue
moon it might miss/drop a queue doorbell notification from
the driver.  This can result in Tx timeouts and potential Rx
buffer misses.

The basic problem with the original solution is that we're
delaying things with a timer for every single queue and
periodically using mod_timer() to reset the alarm, and
mod_timer() becomes more and more expensive as there are
more and more VFs and queues, each with their own timer.
A ping-pong latency test tends to exacerbate the effect such
that every napi is doing a mod_timer() in every cycle.

An alternative has been worked out to replace this: periodic
workqueue items, run outside the napi cycle, request a
napi_schedule for the queues, driven by a single
delayed-workqueue per device rather than a timer for every
queue.  Also, now that newer firmware is actually reporting its
ASIC type, we can restrict the workaround to the appropriate chip.
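
To illustrate the shape of the new approach, here is a minimal
userspace sketch, not the driver code.  It assumes one periodic
check per device that walks the queues and requests a poll only for
queues whose doorbell deadline has passed.  All names (sim_dev,
sim_queue, sim_doorbell_check) and the tick-based deadline are
hypothetical; in the driver the check runs from a delayed work item
and the poll request is a napi_schedule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_NQUEUES 4	/* hypothetical queue count for the sketch */

struct sim_queue {
	uint64_t last_doorbell;		/* tick of the most recent doorbell ring */
	uint64_t deadline;		/* ticks allowed before a re-check is due */
	bool work_pending;		/* posted work the device may have missed */
	unsigned int napi_requests;	/* polls requested by the periodic check */
};

struct sim_dev {
	bool doorbell_wa;		/* workaround enabled for this ASIC type */
	struct sim_queue q[SIM_NQUEUES];
};

/*
 * One pass of the per-device periodic check.  Instead of re-arming a
 * timer per queue on every napi cycle, a single pass inspects each
 * queue and requests a poll only where the deadline has expired.
 * Returns the number of queues for which a poll was requested.
 */
static unsigned int sim_doorbell_check(struct sim_dev *dev, uint64_t now)
{
	unsigned int kicked = 0;

	assert(dev != NULL);

	/* Devices that do not need the workaround skip the scan entirely. */
	if (!dev->doorbell_wa)
		return 0;

	for (size_t i = 0; i < SIM_NQUEUES; i++) {
		struct sim_queue *q = &dev->q[i];

		if (!q->work_pending)
			continue;
		if (now - q->last_doorbell < q->deadline)
			continue;

		/* Stand-in for requesting a napi poll on this queue. */
		q->napi_requests++;
		q->last_doorbell = now;
		kicked++;
	}

	return kicked;
}
```

In this sketch the hot path never touches a timer; the only
periodic cost is one scan per device, and only overdue queues with
pending work are asked to poll.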

The testing scenario used 128 VFs in UP state with 16 queues per
VF, and latency tests were run using TCP_RR with adaptive
interrupt coalescing enabled, running on 1 VF.  Before these
changes we would see 99th percentile latencies in the range of up
to 900us, with some max outliers as high as 4ms.

With these fixes the 99th percentile latencies are typically well
under 50us with the occasional max under 500us.

v2:
  - 3/8: add commentary for why have a private work queue (Jakub)
  - 4/8: no open-code of napi_schedule() (Jakub)
  - 4/8: watch for deadlock with cancel_delayed_work_sync() (Jakub)
  - 7/8: better ionic_lif field order after reducing rx_copybreak size (David)
  - 7/8: include some pahole diff info (Andrew)
  - 8/8: use bool not bitflag for doorbell_wa (David)

v1:
https://lore.kernel.org/netdev/20240610230706.34883-1-shannon.nelson@amd.com/

Brett Creeley (3):
  ionic: Keep interrupt affinity up to date
  ionic: Use an u16 for rx_copybreak
  ionic: Only run the doorbell workaround for certain asic_type

Shannon Nelson (5):
  ionic: remove missed doorbell per-queue timer
  ionic: add private workqueue per-device
  ionic: add work item for missed-doorbell check
  ionic: add per-queue napi_schedule for doorbell check
  ionic: check for queue deadline in doorbell_napi_work

 drivers/net/ethernet/pensando/ionic/ionic.h   |   7 +
 .../ethernet/pensando/ionic/ionic_bus_pci.c   |   3 +
 .../net/ethernet/pensando/ionic/ionic_dev.c   | 129 +++++++++++++++-
 .../net/ethernet/pensando/ionic/ionic_dev.h   |   8 +-
 .../ethernet/pensando/ionic/ionic_ethtool.c   |  11 +-
 .../net/ethernet/pensando/ionic/ionic_lif.c   | 144 ++++++++++++------
 .../net/ethernet/pensando/ionic/ionic_lif.h   |  12 +-
 .../net/ethernet/pensando/ionic/ionic_main.c  |   2 +-
 .../net/ethernet/pensando/ionic/ionic_txrx.c  |  24 ++-
 9 files changed, 264 insertions(+), 76 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org June 20, 2024, 1:40 a.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 18 Jun 2024 17:32:49 -0700 you wrote:
> A latency test in a scaled out setting (many VMs with many queues)
> has uncovered an issue with our missed doorbell fix from
> commit b69585bfcece ("ionic: missed doorbell workaround")
> 
> As a refresher, the Elba ASIC has an issue where once in a blue
> moon it might miss/drop a queue doorbell notification from
> the driver.  This can result in Tx timeouts and potential Rx
> buffer misses.
> 
> [...]

Here is the summary with links:
  - [v2,net-next,1/8] ionic: remove missed doorbell per-queue timer
    https://git.kernel.org/netdev/net-next/c/4aaa49a282ad
  - [v2,net-next,2/8] ionic: Keep interrupt affinity up to date
    https://git.kernel.org/netdev/net-next/c/d458d4b4fd43
  - [v2,net-next,3/8] ionic: add private workqueue per-device
    https://git.kernel.org/netdev/net-next/c/9e25450da700
  - [v2,net-next,4/8] ionic: add work item for missed-doorbell check
    https://git.kernel.org/netdev/net-next/c/4ded136c78f8
  - [v2,net-next,5/8] ionic: add per-queue napi_schedule for doorbell check
    https://git.kernel.org/netdev/net-next/c/d7f9bc685918
  - [v2,net-next,6/8] ionic: check for queue deadline in doorbell_napi_work
    https://git.kernel.org/netdev/net-next/c/55a3982ec721
  - [v2,net-next,7/8] ionic: Use an u16 for rx_copybreak
    https://git.kernel.org/netdev/net-next/c/f703d56c0305
  - [v2,net-next,8/8] ionic: Only run the doorbell workaround for certain asic_type
    https://git.kernel.org/netdev/net-next/c/da0262c2c931

You are awesome, thank you!