[RFC,net-next,v4,0/2] mlx5: Add netdev-genl queue stats

Message ID 20240604004629.299699-1-jdamato@fastly.com (mailing list archive)

Message

Joe Damato June 4, 2024, 12:46 a.m. UTC
Greetings:

Welcome to rfc v4.

Significant rewrite from v3 and hopefully getting closer to correctly
exporting per queue stats from mlx5. Please see changelog below for
detailed changes, especially regarding PTP stats.

Note that my NIC does not seem to support PTP and I couldn't get the
mlnx-tools mlnx_qos script to work, so I was only able to test the
following cases:

- device up at boot
- adjusting queue counts
- device down (e.g. ip link set dev eth4 down)

Please see the commit message of patch 2/2 for more details on output
and test cases.

v3 thread: https://lore.kernel.org/lkml/20240601113913.GA696607@kernel.org/T/

Thanks,
Joe

rfcv3 -> rfcv4:
 - Patch 1/2 now creates a mapping (priv->txq2sq_stats) which maps txq
   indices to sq_stats structures so stats can be accessed directly.
   This mapping is kept up to date alongside txq2sq.

 - Patch 2/2:
   - All mutex_lock/unlock calls on state_lock have been dropped.
   - mlx5e_get_queue_stats_rx now uses ASSERT_RTNL() and has a special
     case for PTP. If PTP was ever opened, is currently opened, and the
     channel index matches, stats for PTP RX are output.
   - mlx5e_get_queue_stats_tx rewritten to use priv->txq2sq_stats. No
     corner cases are needed here because any txq idx (passed in as i)
     will have an up to date mapping in priv->txq2sq_stats.
   - mlx5e_get_base_stats:
     - in the RX case:
       - iterates from [params.num_channels, stats_nch) collecting
         stats.
       - if PTP was ever opened but is currently closed, add the PTP
         stats.
     - in the TX case:
       - handle 2 cases:
         - the channel is available, so sum only the unavailable TCs
           [mlx5e_get_dcb_num_tc, max_opened_tc).
         - the channel is unavailable, so sum all TCs [0, max_opened_tc).
       - if PTP was ever opened but is currently closed, add the PTP
         sq stats.
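
To make the TX base stats rules above concrete, here is a minimal
standalone sketch of the summation. It is not the driver code: the
struct layouts, MAX_TC, the field names, and base_tx_stats() are all
hypothetical, num_tc stands in for the value of mlx5e_get_dcb_num_tc,
and keeping one PTP sq per TC is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TC 8 /* hypothetical bound, for this sketch only */

struct sq_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct chan_stats {
	struct sq_stats sq[MAX_TC]; /* one sq per TC */
};

struct tx_state {
	const struct chan_stats *ch; /* stats_nch entries */
	size_t stats_nch;     /* channels that have ever had stats */
	size_t num_channels;  /* channels currently available */
	size_t num_tc;        /* stand-in for mlx5e_get_dcb_num_tc */
	size_t max_opened_tc;
	bool ptp_ever_opened;
	bool ptp_open_now;
	struct sq_stats ptp_sq[MAX_TC]; /* assumed: one PTP sq per TC */
};

/* Sum everything that is NOT reported through a live per-queue entry. */
static struct sq_stats base_tx_stats(const struct tx_state *s)
{
	struct sq_stats base = { 0, 0 };

	assert(s->num_channels <= s->stats_nch);
	assert(s->num_tc <= s->max_opened_tc && s->max_opened_tc <= MAX_TC);

	for (size_t i = 0; i < s->stats_nch; i++) {
		/* Available channel: skip its live TCs [0, num_tc).
		 * Unavailable channel: take every TC [0, max_opened_tc).
		 */
		size_t tc = (i < s->num_channels) ? s->num_tc : 0;

		for (; tc < s->max_opened_tc; tc++) {
			base.packets += s->ch[i].sq[tc].packets;
			base.bytes += s->ch[i].sq[tc].bytes;
		}
	}

	/* PTP was opened at some point but is closed now: fold it in. */
	if (s->ptp_ever_opened && !s->ptp_open_now) {
		for (size_t tc = 0; tc < s->max_opened_tc; tc++) {
			base.packets += s->ptp_sq[tc].packets;
			base.bytes += s->ptp_sq[tc].bytes;
		}
	}

	return base;
}
```

The point of the split is that each counter ends up in exactly one
place: either in a per-queue entry for a live queue or in the base
stats, never in both.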

v2 -> rfcv3:
 - Added patch 1/2 which creates some helpers for computing the txq_ix
   and ch_ix/tc_ix.

 - Patch 2/2 modified in several ways:
   - Fixed variable declarations in mlx5e_get_queue_stats_rx to be at
     the start of the function.
   - mlx5e_get_queue_stats_tx rewritten to access sq stats directly by
     using the helpers added in the previous patch.
   - mlx5e_get_base_stats modified in several ways:
     - Took the state_lock when accessing priv->channels.
     - For the base RX stats, code was simplified to call
       mlx5e_get_queue_stats_rx instead of repeating the same code.
     - For the base TX stats, I attempted to implement what I think
       Tariq suggested in the previous thread:
         - for available channels, only unavailable TC stats are summed
         - for unavailable channels, all stats for TCs up to
           max_opened_tc are summed.
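
For readers unfamiliar with the txq_ix and ch_ix/tc_ix helpers
mentioned above, the following is a rough sketch of what such index
helpers can look like. It assumes a layout in which all txqs for one TC
are contiguous, so txq_ix = ch_ix + tc_ix * num_channels. That layout,
the struct, and the function names are assumptions made for
illustration, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct ch_tc {
	size_t ch_ix;
	size_t tc_ix;
};

/* Assumed layout: txqs for TC 0 come first, then TC 1, and so on. */
static size_t txq_ix_of(size_t ch_ix, size_t tc_ix, size_t num_channels)
{
	assert(ch_ix < num_channels);
	return ch_ix + tc_ix * num_channels;
}

/* Inverse of txq_ix_of() under the same assumed layout. */
static struct ch_tc ch_tc_of(size_t txq_ix, size_t num_channels)
{
	assert(num_channels > 0);

	struct ch_tc r = {
		.ch_ix = txq_ix % num_channels,
		.tc_ix = txq_ix / num_channels,
	};
	return r;
}
```

With 4 channels, channel 1 on TC 2 maps to txq 9 under this layout, and
ch_tc_of(9, 4) maps it back to (ch_ix 1, tc_ix 2).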

v1 -> v2:
 - Essentially a full rewrite after comments from Jakub, Tariq, and
   Zhu.

Joe Damato (2):
  net/mlx5e: Add txq to sq stats mapping
  net/mlx5e: Add per queue netdev-genl stats

 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +
 .../net/ethernet/mellanox/mlx5/core/en/qos.c  |  13 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 149 +++++++++++++++++-
 3 files changed, 161 insertions(+), 3 deletions(-)

Comments

Jakub Kicinski June 6, 2024, 4:23 p.m. UTC | #1
On Tue,  4 Jun 2024 00:46:24 +0000 Joe Damato wrote:
> Significant rewrite from v3 and hopefully getting closer to correctly
> exporting per queue stats from mlx5. Please see changelog below for
> detailed changes, especially regarding PTP stats.
> 
> Note that my NIC does not seem to support PTP and I couldn't get the
> mlnx-tools mlnx_qos script to work, so I was only able to test the
> following cases:
> 
> - device up at boot
> - adjusting queue counts
> - device down (e.g. ip link set dev eth4 down)
> 
> Please see the commit message of patch 2/2 for more details on output
> and test cases.

NVIDIA, please review this.

It's less than 200 lines of code, and every time Joe posts a new
version he has to wait a week to get feedback.