diff mbox series

[net-next,v2,1/1] net/mlx5e: Add per queue netdev-genl stats

Message ID 20240510041705.96453-2-jdamato@fastly.com (mailing list archive)
State Deferred
Delegated to: Netdev Maintainers
Headers show
Series mlx5: Add netdev-genl queue stats | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 925 this patch: 925
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 936 this patch: 936
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 936 this patch: 936
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-14--00-00 (tests: 1022)

Commit Message

Joe Damato May 10, 2024, 4:17 a.m. UTC
Add functions to support the netdev-genl per queue stats API.

./cli.py --spec netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope": "queue"}'

...snip

 {'ifindex': 7,
  'queue-id': 62,
  'queue-type': 'rx',
  'rx-alloc-fail': 0,
  'rx-bytes': 105965251,
  'rx-packets': 179790},
 {'ifindex': 7,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 9402665,
  'tx-packets': 17551},

...snip

Also tested with the script tools/testing/selftests/drivers/net/stats.py
in several scenarios to ensure stats tallying was correct:

- on boot (default queue counts)
- adjusting queue count up or down (ethtool -L eth0 combined ...)
- adding mqprio TCs

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 144 ++++++++++++++++++
 1 file changed, 144 insertions(+)

Comments

Jakub Kicinski May 13, 2024, 2:58 p.m. UTC | #1
On Fri, 10 May 2024 04:17:04 +0000 Joe Damato wrote:
> Add functions to support the netdev-genl per queue stats API.
> 
> ./cli.py --spec netlink/specs/netdev.yaml \
> --dump qstats-get --json '{"scope": "queue"}'
> 
> ...snip
> 
>  {'ifindex': 7,
>   'queue-id': 62,
>   'queue-type': 'rx',
>   'rx-alloc-fail': 0,
>   'rx-bytes': 105965251,
>   'rx-packets': 179790},
>  {'ifindex': 7,
>   'queue-id': 0,
>   'queue-type': 'tx',
>   'tx-bytes': 9402665,
>   'tx-packets': 17551},
> 
> ...snip
> 
> Also tested with the script tools/testing/selftests/drivers/net/stats.py
> in several scenarios to ensure stats tallying was correct:
> 
> - on boot (default queue counts)
> - adjusting queue count up or down (ethtool -L eth0 combined ...)
> - adding mqprio TCs
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>

Tariq, could you take a look? Is it good enough to make 6.10? 
Would be great to have it..
Joe Damato May 13, 2024, 5:33 p.m. UTC | #2
On Mon, May 13, 2024 at 07:58:27AM -0700, Jakub Kicinski wrote:
> On Fri, 10 May 2024 04:17:04 +0000 Joe Damato wrote:
> > Add functions to support the netdev-genl per queue stats API.
> > 
> > ./cli.py --spec netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue"}'
> > 
> > ...snip
> > 
> >  {'ifindex': 7,
> >   'queue-id': 62,
> >   'queue-type': 'rx',
> >   'rx-alloc-fail': 0,
> >   'rx-bytes': 105965251,
> >   'rx-packets': 179790},
> >  {'ifindex': 7,
> >   'queue-id': 0,
> >   'queue-type': 'tx',
> >   'tx-bytes': 9402665,
> >   'tx-packets': 17551},
> > 
> > ...snip
> > 
> > Also tested with the script tools/testing/selftests/drivers/net/stats.py
> > in several scenarios to ensure stats tallying was correct:
> > 
> > - on boot (default queue counts)
> > - adjusting queue count up or down (ethtool -L eth0 combined ...)
> > - adding mqprio TCs
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> 
> Tariq, could you take a look? Is it good enough to make 6.10? 
> Would be great to have it..

Thanks Jakub.

FYI: I've also sent a v5 of the mlx4 patches which is only a very minor
change from the v4 as suggested by Tariq (see the changelog in that cover
letter).

I am not trying to "rush" either in, to to speak, but if they both made it
to 6.10 it would be great to have the same support on both drivers in the
same kernel release :)
Joe Damato May 13, 2024, 6:46 p.m. UTC | #3
On Mon, May 13, 2024 at 10:33:28AM -0700, Joe Damato wrote:
> On Mon, May 13, 2024 at 07:58:27AM -0700, Jakub Kicinski wrote:
> > On Fri, 10 May 2024 04:17:04 +0000 Joe Damato wrote:
> > > Add functions to support the netdev-genl per queue stats API.
> > > 
> > > ./cli.py --spec netlink/specs/netdev.yaml \
> > > --dump qstats-get --json '{"scope": "queue"}'
> > > 
> > > ...snip
> > > 
> > >  {'ifindex': 7,
> > >   'queue-id': 62,
> > >   'queue-type': 'rx',
> > >   'rx-alloc-fail': 0,
> > >   'rx-bytes': 105965251,
> > >   'rx-packets': 179790},
> > >  {'ifindex': 7,
> > >   'queue-id': 0,
> > >   'queue-type': 'tx',
> > >   'tx-bytes': 9402665,
> > >   'tx-packets': 17551},
> > > 
> > > ...snip
> > > 
> > > Also tested with the script tools/testing/selftests/drivers/net/stats.py
> > > in several scenarios to ensure stats tallying was correct:
> > > 
> > > - on boot (default queue counts)
> > > - adjusting queue count up or down (ethtool -L eth0 combined ...)
> > > - adding mqprio TCs
> > > 
> > > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > 
> > Tariq, could you take a look? Is it good enough to make 6.10? 
> > Would be great to have it..
> 
> Thanks Jakub.
> 
> FYI: I've also sent a v5 of the mlx4 patches which is only a very minor
> change from the v4 as suggested by Tariq (see the changelog in that cover
> letter).
> 
> I am not trying to "rush" either in, to to speak, but if they both made it
> to 6.10 it would be great to have the same support on both drivers in the
> same kernel release :)

Err, sorry, just going through emails now and saw that net-next was closed
just before I sent the v5.

My apologies for missing that announcement.

Do I need to re-send after net-next re-opens or will it automatically be in
the queue for net-next?
Jakub Kicinski May 13, 2024, 6:55 p.m. UTC | #4
On Mon, 13 May 2024 11:46:57 -0700 Joe Damato wrote:
> > FYI: I've also sent a v5 of the mlx4 patches which is only a very minor
> > change from the v4 as suggested by Tariq (see the changelog in that cover
> > letter).
> > 
> > I am not trying to "rush" either in, to to speak, but if they both made it
> > to 6.10 it would be great to have the same support on both drivers in the
> > same kernel release :)  
> 
> Err, sorry, just going through emails now and saw that net-next was closed
> just before I sent the v5.
> 
> My apologies for missing that announcement.
> 
> Do I need to re-send after net-next re-opens or will it automatically be in
> the queue for net-next?

Right, unless it somehow magically gets into our 6.10 PR - you'll most
likely have to make a fresh posting after the merge window :)
Tariq Toukan May 14, 2024, 6:44 p.m. UTC | #5
On 10/05/2024 7:17, Joe Damato wrote:
> Add functions to support the netdev-genl per queue stats API.
> 
> ./cli.py --spec netlink/specs/netdev.yaml \
> --dump qstats-get --json '{"scope": "queue"}'
> 
> ...snip
> 
>   {'ifindex': 7,
>    'queue-id': 62,
>    'queue-type': 'rx',
>    'rx-alloc-fail': 0,
>    'rx-bytes': 105965251,
>    'rx-packets': 179790},
>   {'ifindex': 7,
>    'queue-id': 0,
>    'queue-type': 'tx',
>    'tx-bytes': 9402665,
>    'tx-packets': 17551},
> 
> ...snip
> 
> Also tested with the script tools/testing/selftests/drivers/net/stats.py
> in several scenarios to ensure stats tallying was correct:
> 
> - on boot (default queue counts)
> - adjusting queue count up or down (ethtool -L eth0 combined ...)
> - adding mqprio TCs
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>   .../net/ethernet/mellanox/mlx5/core/en_main.c | 144 ++++++++++++++++++
>   1 file changed, 144 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index ffe8919494d5..4a675d8b31b5 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -39,6 +39,7 @@
>   #include <linux/debugfs.h>
>   #include <linux/if_bridge.h>
>   #include <linux/filter.h>
> +#include <net/netdev_queues.h>
>   #include <net/page_pool/types.h>
>   #include <net/pkt_sched.h>
>   #include <net/xdp_sock_drv.h>
> @@ -5282,6 +5283,148 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
>   	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
>   }
>   
> +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> +				     struct netdev_queue_stats_rx *stats)
> +{
> +	struct mlx5e_priv *priv = netdev_priv(dev);
> +
> +	if (mlx5e_is_uplink_rep(priv))
> +		return;
> +
> +	struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> +	struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> +	struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> +

Don't we allow variable declaration only at the beginning of a block?
Is this style accepted in the networking subsystem?

> +	stats->packets = rq_stats->packets + xskrq_stats->packets;
> +	stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
> +	stats->alloc_fail = rq_stats->buff_alloc_err +
> +			    xskrq_stats->buff_alloc_err;
> +}
> +
> +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> +				     struct netdev_queue_stats_tx *stats)
> +{
> +	struct mlx5e_priv *priv = netdev_priv(dev);
> +	struct net_device *netdev = priv->netdev;
> +	struct mlx5e_txqsq *sq;
> +	int j;
> +
> +	if (mlx5e_is_uplink_rep(priv))
> +		return;
> +
> +	for (j = 0; j < netdev->num_tx_queues; j++) {
> +		sq = priv->txq2sq[j];

No sq instance in case interface is down.
This should be a simple arithmetic calculation.
Need to expose the proper functions for this calculation, and use it 
here and in the sq create flows.

Here it seems that you need a very involved user, so he passes the 
correct index i of the SQ that he's interested in..

> +		if (sq->ch_ix == i) {

So you're looking for the first SQ on channel i?
But there might be multiple SQs on channel i...
Also, this SQ might be already included in the base stats.
In addition, this i might be too large for a channel index 
(num_tx_queues can be 8 * num_channels)

The logic here (of mapping from i in num_tx_queues to SQ stats) needs 
careful definition.

> +			stats->packets = sq->stats->packets;
> +			stats->bytes = sq->stats->bytes;
> +			return;
> +		}
> +	}
> +}
> +
> +static void mlx5e_get_base_stats(struct net_device *dev,
> +				 struct netdev_queue_stats_rx *rx,
> +				 struct netdev_queue_stats_tx *tx)
> +{
> +	struct mlx5e_priv *priv = netdev_priv(dev);
> +	int i, j;
> +
> +	if (!mlx5e_is_uplink_rep(priv)) {
> +		rx->packets = 0;
> +		rx->bytes = 0;
> +		rx->alloc_fail = 0;
> +
> +		/* compute stats for deactivated RX queues
> +		 *
> +		 * if priv->channels.num == 0 the device is down, so compute
> +		 * stats for every queue.
> +		 *
> +		 * otherwise, compute only the queues which have been deactivated.
> +		 */
> +		if (priv->channels.num == 0)
> +			i = 0;
> +		else
> +			i = priv->channels.params.num_channels;
> +
> +		for (; i < priv->stats_nch; i++) {
> +			struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> +			struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> +			struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> +
> +			rx->packets += rq_stats->packets + xskrq_stats->packets;
> +			rx->bytes += rq_stats->bytes + xskrq_stats->bytes;
> +			rx->alloc_fail += rq_stats->buff_alloc_err +
> +					  xskrq_stats->buff_alloc_err;

Isn't this equivalent to mlx5e_get_queue_stats_rx(i) ?

> +		}
> +
> +		if (priv->rx_ptp_opened) {
> +			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
> +
> +			rx->packets += rq_stats->packets;
> +			rx->bytes += rq_stats->bytes;
> +		}
> +	}
> +
> +	tx->packets = 0;
> +	tx->bytes = 0;
> +
> +	/* three TX cases to handle:
> +	 *
> +	 * case 1: priv->channels.num == 0, get the stats for every TC
> +	 *         on every queue.
> +	 *
> +	 * case 2: priv->channel.num > 0, so get the stats for every TC on
> +	 *         every deactivated queue.
> +	 *
> +	 * case 3: the number of TCs has changed, so get the stats for the
> +	 *         inactive TCs on active TX queues (handled in the second loop
> +	 *         below).
> +	 */
> +	if (priv->channels.num == 0)
> +		i = 0;
> +	else
> +		i = priv->channels.params.num_channels;
> +

All reads/writes to priv->channels must be under the priv->state_lock.

> +	for (; i < priv->stats_nch; i++) {
> +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> +
> +		for (j = 0; j < priv->max_opened_tc; j++) {
> +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> +
> +			tx->packets += sq_stats->packets;
> +			tx->bytes += sq_stats->bytes;
> +		}
> +	}
> +
> +	/* Handle case 3 described above. */
> +	for (i = 0; i < priv->channels.params.num_channels; i++) {
> +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> +		u8 dcb_num_tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> +
> +		for (j = dcb_num_tc; j < priv->max_opened_tc; j++) {
> +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> +
> +			tx->packets += sq_stats->packets;
> +			tx->bytes += sq_stats->bytes;
> +		}
> +	}
> +
> +	if (priv->tx_ptp_opened) {
> +		for (j = 0; j < priv->max_opened_tc; j++) {
> +			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[j];
> +
> +			tx->packets    += sq_stats->packets;
> +			tx->bytes      += sq_stats->bytes;
> +		}
> +	}
> +}
> +
> +static const struct netdev_stat_ops mlx5e_stat_ops = {
> +	.get_queue_stats_rx     = mlx5e_get_queue_stats_rx,
> +	.get_queue_stats_tx     = mlx5e_get_queue_stats_tx,
> +	.get_base_stats         = mlx5e_get_base_stats,
> +};
> +
>   static void mlx5e_build_nic_netdev(struct net_device *netdev)
>   {
>   	struct mlx5e_priv *priv = netdev_priv(netdev);
> @@ -5299,6 +5442,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>   
>   	netdev->watchdog_timeo    = 15 * HZ;
>   
> +	netdev->stat_ops          = &mlx5e_stat_ops;
>   	netdev->ethtool_ops	  = &mlx5e_ethtool_ops;
>   
>   	netdev->vlan_features    |= NETIF_F_SG;
Joe Damato May 14, 2024, 7:17 p.m. UTC | #6
On Tue, May 14, 2024 at 09:44:37PM +0300, Tariq Toukan wrote:
> 
> 
> On 10/05/2024 7:17, Joe Damato wrote:
> > Add functions to support the netdev-genl per queue stats API.
> > 
> > ./cli.py --spec netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue"}'
> > 
> > ...snip
> > 
> >   {'ifindex': 7,
> >    'queue-id': 62,
> >    'queue-type': 'rx',
> >    'rx-alloc-fail': 0,
> >    'rx-bytes': 105965251,
> >    'rx-packets': 179790},
> >   {'ifindex': 7,
> >    'queue-id': 0,
> >    'queue-type': 'tx',
> >    'tx-bytes': 9402665,
> >    'tx-packets': 17551},
> > 
> > ...snip
> > 
> > Also tested with the script tools/testing/selftests/drivers/net/stats.py
> > in several scenarios to ensure stats tallying was correct:
> > 
> > - on boot (default queue counts)
> > - adjusting queue count up or down (ethtool -L eth0 combined ...)
> > - adding mqprio TCs
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >   .../net/ethernet/mellanox/mlx5/core/en_main.c | 144 ++++++++++++++++++
> >   1 file changed, 144 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index ffe8919494d5..4a675d8b31b5 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -39,6 +39,7 @@
> >   #include <linux/debugfs.h>
> >   #include <linux/if_bridge.h>
> >   #include <linux/filter.h>
> > +#include <net/netdev_queues.h>
> >   #include <net/page_pool/types.h>
> >   #include <net/pkt_sched.h>
> >   #include <net/xdp_sock_drv.h>
> > @@ -5282,6 +5283,148 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
> >   	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
> >   }
> > +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> > +				     struct netdev_queue_stats_rx *stats)
> > +{
> > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > +
> > +	if (mlx5e_is_uplink_rep(priv))
> > +		return;
> > +
> > +	struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +	struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> > +	struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> > +
> 
> Don't we allow variable declaration only at the beginning of a block?
> Is this style accepted in the networking subsystem?

Thanks for the careful review.

I don't know; checkpatch --strict didn't complain, but I can move
all the variable declarations up in the next revision, if you'd
like.

> > +	stats->packets = rq_stats->packets + xskrq_stats->packets;
> > +	stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
> > +	stats->alloc_fail = rq_stats->buff_alloc_err +
> > +			    xskrq_stats->buff_alloc_err;
> > +}
> > +
> > +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> > +				     struct netdev_queue_stats_tx *stats)
> > +{
> > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > +	struct net_device *netdev = priv->netdev;
> > +	struct mlx5e_txqsq *sq;
> > +	int j;
> > +
> > +	if (mlx5e_is_uplink_rep(priv))
> > +		return;
> > +
> > +	for (j = 0; j < netdev->num_tx_queues; j++) {
> > +		sq = priv->txq2sq[j];
> 
> No sq instance in case interface is down.

Ah, I see. OK.

> This should be a simple arithmetic calculation.
> Need to expose the proper functions for this calculation, and use it here
> and in the sq create flows.

OK. I was trying to avoid adding more code, but I did see where the
math is done for this elsewhere in the driver and it probably makes
sense to factor that out into its own function to be used in
multiple places.

> Here it seems that you need a very involved user, so he passes the correct
> index i of the SQ that he's interested in..
> 
> > +		if (sq->ch_ix == i) {
> 
> So you're looking for the first SQ on channel i?
> But there might be multiple SQs on channel i...
> Also, this SQ might be already included in the base stats.
> In addition, this i might be too large for a channel index (num_tx_queues
> can be 8 * num_channels)
> 
> The logic here (of mapping from i in num_tx_queues to SQ stats) needs
> careful definition.

OK, I'll go back through and try to sort this out again re-reading
the sq code a few more times.

Thanks again for the careful review, I do really appreciate your
time and effort.

> > +			stats->packets = sq->stats->packets;
> > +			stats->bytes = sq->stats->bytes;
> > +			return;
> > +		}
> > +	}
> > +}
> > +
> > +static void mlx5e_get_base_stats(struct net_device *dev,
> > +				 struct netdev_queue_stats_rx *rx,
> > +				 struct netdev_queue_stats_tx *tx)
> > +{
> > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > +	int i, j;
> > +
> > +	if (!mlx5e_is_uplink_rep(priv)) {
> > +		rx->packets = 0;
> > +		rx->bytes = 0;
> > +		rx->alloc_fail = 0;
> > +
> > +		/* compute stats for deactivated RX queues
> > +		 *
> > +		 * if priv->channels.num == 0 the device is down, so compute
> > +		 * stats for every queue.
> > +		 *
> > +		 * otherwise, compute only the queues which have been deactivated.
> > +		 */
> > +		if (priv->channels.num == 0)
> > +			i = 0;
> > +		else
> > +			i = priv->channels.params.num_channels;
> > +
> > +		for (; i < priv->stats_nch; i++) {
> > +			struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +			struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> > +			struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> > +
> > +			rx->packets += rq_stats->packets + xskrq_stats->packets;
> > +			rx->bytes += rq_stats->bytes + xskrq_stats->bytes;
> > +			rx->alloc_fail += rq_stats->buff_alloc_err +
> > +					  xskrq_stats->buff_alloc_err;
> 
> Isn't this equivalent to mlx5e_get_queue_stats_rx(i) ?

It is, yes. If you'd prefer that I call mlx5e_get_queue_stats_rx, I
can do that.

> > +		}
> > +
> > +		if (priv->rx_ptp_opened) {
> > +			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
> > +
> > +			rx->packets += rq_stats->packets;
> > +			rx->bytes += rq_stats->bytes;
> > +		}
> > +	}
> > +
> > +	tx->packets = 0;
> > +	tx->bytes = 0;
> > +
> > +	/* three TX cases to handle:
> > +	 *
> > +	 * case 1: priv->channels.num == 0, get the stats for every TC
> > +	 *         on every queue.
> > +	 *
> > +	 * case 2: priv->channel.num > 0, so get the stats for every TC on
> > +	 *         every deactivated queue.
> > +	 *
> > +	 * case 3: the number of TCs has changed, so get the stats for the
> > +	 *         inactive TCs on active TX queues (handled in the second loop
> > +	 *         below).
> > +	 */
> > +	if (priv->channels.num == 0)
> > +		i = 0;
> > +	else
> > +		i = priv->channels.params.num_channels;
> > +
> 
> All reads/writes to priv->channels must be under the priv->state_lock.

Ah, yes. You are right and I forgot about this part, yet again and
made the same error with RX above.

I think in the next revision I'll hold the state_lock for the entire
function, right?

I presume that dropping the lock right after reading
priv->channels.params.num_channels is incorrect, because the
channels could disappear while iterating through them.

> > +	for (; i < priv->stats_nch; i++) {
> > +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +
> > +		for (j = 0; j < priv->max_opened_tc; j++) {
> > +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> > +
> > +			tx->packets += sq_stats->packets;
> > +			tx->bytes += sq_stats->bytes;
> > +		}
> > +	}
> > +
> > +	/* Handle case 3 described above. */
> > +	for (i = 0; i < priv->channels.params.num_channels; i++) {
> > +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +		u8 dcb_num_tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> > +
> > +		for (j = dcb_num_tc; j < priv->max_opened_tc; j++) {
> > +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> > +
> > +			tx->packets += sq_stats->packets;
> > +			tx->bytes += sq_stats->bytes;
> > +		}
> > +	}
> > +
> > +	if (priv->tx_ptp_opened) {
> > +		for (j = 0; j < priv->max_opened_tc; j++) {
> > +			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[j];
> > +
> > +			tx->packets    += sq_stats->packets;
> > +			tx->bytes      += sq_stats->bytes;
> > +		}
> > +	}
> > +}
> > +
> > +static const struct netdev_stat_ops mlx5e_stat_ops = {
> > +	.get_queue_stats_rx     = mlx5e_get_queue_stats_rx,
> > +	.get_queue_stats_tx     = mlx5e_get_queue_stats_tx,
> > +	.get_base_stats         = mlx5e_get_base_stats,
> > +};
> > +
> >   static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >   {
> >   	struct mlx5e_priv *priv = netdev_priv(netdev);
> > @@ -5299,6 +5442,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >   	netdev->watchdog_timeo    = 15 * HZ;
> > +	netdev->stat_ops          = &mlx5e_stat_ops;
> >   	netdev->ethtool_ops	  = &mlx5e_ethtool_ops;
> >   	netdev->vlan_features    |= NETIF_F_SG;
Joe Damato May 15, 2024, 7:19 a.m. UTC | #7
On Tue, May 14, 2024 at 09:44:37PM +0300, Tariq Toukan wrote:
> 
> 
> On 10/05/2024 7:17, Joe Damato wrote:
> > Add functions to support the netdev-genl per queue stats API.
> > 
> > ./cli.py --spec netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue"}'
> > 
> > ...snip
> > 
> >   {'ifindex': 7,
> >    'queue-id': 62,
> >    'queue-type': 'rx',
> >    'rx-alloc-fail': 0,
> >    'rx-bytes': 105965251,
> >    'rx-packets': 179790},
> >   {'ifindex': 7,
> >    'queue-id': 0,
> >    'queue-type': 'tx',
> >    'tx-bytes': 9402665,
> >    'tx-packets': 17551},
> > 
> > ...snip
> > 
> > Also tested with the script tools/testing/selftests/drivers/net/stats.py
> > in several scenarios to ensure stats tallying was correct:
> > 
> > - on boot (default queue counts)
> > - adjusting queue count up or down (ethtool -L eth0 combined ...)
> > - adding mqprio TCs
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >   .../net/ethernet/mellanox/mlx5/core/en_main.c | 144 ++++++++++++++++++
> >   1 file changed, 144 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index ffe8919494d5..4a675d8b31b5 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -39,6 +39,7 @@
> >   #include <linux/debugfs.h>
> >   #include <linux/if_bridge.h>
> >   #include <linux/filter.h>
> > +#include <net/netdev_queues.h>
> >   #include <net/page_pool/types.h>
> >   #include <net/pkt_sched.h>
> >   #include <net/xdp_sock_drv.h>
> > @@ -5282,6 +5283,148 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
> >   	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
> >   }
> > +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> > +				     struct netdev_queue_stats_rx *stats)
> > +{
> > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > +
> > +	if (mlx5e_is_uplink_rep(priv))
> > +		return;
> > +
> > +	struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +	struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> > +	struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> > +
> 
> Don't we allow variable declaration only at the beginning of a block?
> Is this style accepted in the networking subsystem?
> 
> > +	stats->packets = rq_stats->packets + xskrq_stats->packets;
> > +	stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
> > +	stats->alloc_fail = rq_stats->buff_alloc_err +
> > +			    xskrq_stats->buff_alloc_err;
> > +}
> > +
> > +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> > +				     struct netdev_queue_stats_tx *stats)
> > +{
> > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > +	struct net_device *netdev = priv->netdev;
> > +	struct mlx5e_txqsq *sq;
> > +	int j;
> > +
> > +	if (mlx5e_is_uplink_rep(priv))
> > +		return;
> > +
> > +	for (j = 0; j < netdev->num_tx_queues; j++) {
> > +		sq = priv->txq2sq[j];
> 
> No sq instance in case interface is down.

This seems easily fixable by checking:

  priv->channels.num > 0

> This should be a simple arithmetic calculation.

I'm not sure why I can't use txq2sq? Please see below for my
explanation about why I think txq2sq might be all I need.

> Need to expose the proper functions for this calculation, and use it here
> and in the sq create flows.

I re-read the code several times and my apologies, but I am probably
still missing something.

I don't think a calculation function is necessary (see below), but
if one is really needed, I'd probably add something like:

  static inline int tc_to_txq_ix(struct mlx5e_channel *c,
                                 struct mlx5e_params *params,
                                 int tc)
  {
         return c->ix + tc * params->num_channels;
  }

And call it from mlx5e_open_sqs.

But, I don't understand why any calculation is needed in
mlx5e_get_queue_stats_tx. Please see below for explanation.

> 
> Here it seems that you need a very involved user, so he passes the correct
> index i of the SQ that he's interested in..
> 
> > +		if (sq->ch_ix == i) {
> 
> So you're looking for the first SQ on channel i?
> But there might be multiple SQs on channel i...
> Also, this SQ might be already included in the base stats.
> In addition, this i might be too large for a channel index (num_tx_queues
> can be 8 * num_channels)
>
> The logic here (of mapping from i in num_tx_queues to SQ stats) needs
> careful definition.

I read your comments a few times and read the mlx5 source and I am
probably still missing something obvious here; my apologies.

In net/core/netdev-genl.c, calls to the driver's get_queue_stats_tx
appear to pass [0, netdev->real_num_tx_queues) as i.

I think this means i is a txq_ix in mlx5, because mlx5 sets
netdev->real_num_tx_queues in mlx5e_update_tx_netdev_queues, as:

  nch * ntc + qos_queues

which when expanded is

  priv->channels.params.num_channels * mlx5e_get_dcb_num_tc + qos_queues

So, net/core/netdev-genl.c will be using 0 up to that expression as
i when calling mlx5e_get_queue_stats_tx.

In mlx5: 
  - mlx5e_activate_priv_channels calls mlx5e_build_txq_maps which
    generates priv->txq2sq[txq_ix] = sq for every mlx5e_get_dcb_num_tc
    of every priv->channels.num.
 
This seems to happen every time mlx5e_activate_priv_channels is
called, which I think means that priv->txq2sq is always up to date
and will give the right sq for a given txq_ix (assuming the device
isn't down).

Putting all of this together, I think that mlx5e_get_queue_stats_tx
might need to be something like:

  mutex_lock(&priv->state_lock);
  if (priv->channels.num > 0) {
          sq = priv->txq2sq[i];
          stats->packets = sq->stats->packets;
          stats->bytes = sq->stats->bytes;
  }
  mutex_unlock(&priv->state_lock);

Is this still incorrect somehow?

If so, could you explain a bit more about why a calculation is
needed? It seems like txq2sq would provide the mapping from txq_ix's
(which is what mlx5e_get_queue_stats_tx gets as 'i') and an sq,
which seems like all I would need?

Sorry if I am still not following here.

> > +			stats->packets = sq->stats->packets;
> > +			stats->bytes = sq->stats->bytes;
> > +			return;
> > +		}
> > +	}
> > +}
> > +
> > +static void mlx5e_get_base_stats(struct net_device *dev,
> > +				 struct netdev_queue_stats_rx *rx,
> > +				 struct netdev_queue_stats_tx *tx)
> > +{
> > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > +	int i, j;
> > +
> > +	if (!mlx5e_is_uplink_rep(priv)) {
> > +		rx->packets = 0;
> > +		rx->bytes = 0;
> > +		rx->alloc_fail = 0;
> > +
> > +		/* compute stats for deactivated RX queues
> > +		 *
> > +		 * if priv->channels.num == 0 the device is down, so compute
> > +		 * stats for every queue.
> > +		 *
> > +		 * otherwise, compute only the queues which have been deactivated.
> > +		 */
> > +		if (priv->channels.num == 0)
> > +			i = 0;
> > +		else
> > +			i = priv->channels.params.num_channels;
> > +
> > +		for (; i < priv->stats_nch; i++) {
> > +			struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +			struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> > +			struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> > +
> > +			rx->packets += rq_stats->packets + xskrq_stats->packets;
> > +			rx->bytes += rq_stats->bytes + xskrq_stats->bytes;
> > +			rx->alloc_fail += rq_stats->buff_alloc_err +
> > +					  xskrq_stats->buff_alloc_err;
> 
> Isn't this equivalent to mlx5e_get_queue_stats_rx(i) ?
> 
> > +		}
> > +
> > +		if (priv->rx_ptp_opened) {
> > +			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
> > +
> > +			rx->packets += rq_stats->packets;
> > +			rx->bytes += rq_stats->bytes;
> > +		}
> > +	}
> > +
> > +	tx->packets = 0;
> > +	tx->bytes = 0;
> > +
> > +	/* three TX cases to handle:
> > +	 *
> > +	 * case 1: priv->channels.num == 0, get the stats for every TC
> > +	 *         on every queue.
> > +	 *
> > +	 * case 2: priv->channel.num > 0, so get the stats for every TC on
> > +	 *         every deactivated queue.
> > +	 *
> > +	 * case 3: the number of TCs has changed, so get the stats for the
> > +	 *         inactive TCs on active TX queues (handled in the second loop
> > +	 *         below).
> > +	 */
> > +	if (priv->channels.num == 0)
> > +		i = 0;
> > +	else
> > +		i = priv->channels.params.num_channels;
> > +
> 
> All reads/writes to priv->channels must be under the priv->state_lock.
> 
> > +	for (; i < priv->stats_nch; i++) {
> > +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +
> > +		for (j = 0; j < priv->max_opened_tc; j++) {
> > +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> > +
> > +			tx->packets += sq_stats->packets;
> > +			tx->bytes += sq_stats->bytes;
> > +		}
> > +	}
> > +
> > +	/* Handle case 3 described above. */
> > +	for (i = 0; i < priv->channels.params.num_channels; i++) {
> > +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +		u8 dcb_num_tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> > +
> > +		for (j = dcb_num_tc; j < priv->max_opened_tc; j++) {
> > +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> > +
> > +			tx->packets += sq_stats->packets;
> > +			tx->bytes += sq_stats->bytes;
> > +		}
> > +	}
> > +
> > +	if (priv->tx_ptp_opened) {
> > +		for (j = 0; j < priv->max_opened_tc; j++) {
> > +			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[j];
> > +
> > +			tx->packets    += sq_stats->packets;
> > +			tx->bytes      += sq_stats->bytes;
> > +		}
> > +	}
> > +}
> > +
> > +static const struct netdev_stat_ops mlx5e_stat_ops = {
> > +	.get_queue_stats_rx     = mlx5e_get_queue_stats_rx,
> > +	.get_queue_stats_tx     = mlx5e_get_queue_stats_tx,
> > +	.get_base_stats         = mlx5e_get_base_stats,
> > +};
> > +
> >   static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >   {
> >   	struct mlx5e_priv *priv = netdev_priv(netdev);
> > @@ -5299,6 +5442,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >   	netdev->watchdog_timeo    = 15 * HZ;
> > +	netdev->stat_ops          = &mlx5e_stat_ops;
> >   	netdev->ethtool_ops	  = &mlx5e_ethtool_ops;
> >   	netdev->vlan_features    |= NETIF_F_SG;
Tariq Toukan May 23, 2024, 7:27 a.m. UTC | #8
On 15/05/2024 10:19, Joe Damato wrote:
> On Tue, May 14, 2024 at 09:44:37PM +0300, Tariq Toukan wrote:
>>
>>
>> On 10/05/2024 7:17, Joe Damato wrote:
>>> Add functions to support the netdev-genl per queue stats API.
>>>
>>> ./cli.py --spec netlink/specs/netdev.yaml \
>>> --dump qstats-get --json '{"scope": "queue"}'
>>>
>>> ...snip
>>>
>>>    {'ifindex': 7,
>>>     'queue-id': 62,
>>>     'queue-type': 'rx',
>>>     'rx-alloc-fail': 0,
>>>     'rx-bytes': 105965251,
>>>     'rx-packets': 179790},
>>>    {'ifindex': 7,
>>>     'queue-id': 0,
>>>     'queue-type': 'tx',
>>>     'tx-bytes': 9402665,
>>>     'tx-packets': 17551},
>>>
>>> ...snip
>>>
>>> Also tested with the script tools/testing/selftests/drivers/net/stats.py
>>> in several scenarios to ensure stats tallying was correct:
>>>
>>> - on boot (default queue counts)
>>> - adjusting queue count up or down (ethtool -L eth0 combined ...)
>>> - adding mqprio TCs
>>>
>>> Signed-off-by: Joe Damato <jdamato@fastly.com>
>>> ---
>>>    .../net/ethernet/mellanox/mlx5/core/en_main.c | 144 ++++++++++++++++++
>>>    1 file changed, 144 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index ffe8919494d5..4a675d8b31b5 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -39,6 +39,7 @@
>>>    #include <linux/debugfs.h>
>>>    #include <linux/if_bridge.h>
>>>    #include <linux/filter.h>
>>> +#include <net/netdev_queues.h>
>>>    #include <net/page_pool/types.h>
>>>    #include <net/pkt_sched.h>
>>>    #include <net/xdp_sock_drv.h>
>>> @@ -5282,6 +5283,148 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
>>>    	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
>>>    }
>>> +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
>>> +				     struct netdev_queue_stats_rx *stats)
>>> +{
>>> +	struct mlx5e_priv *priv = netdev_priv(dev);
>>> +
>>> +	if (mlx5e_is_uplink_rep(priv))
>>> +		return;
>>> +
>>> +	struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
>>> +	struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
>>> +	struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
>>> +
>>
>> Don't we allow variable declaration only at the beginning of a block?
>> Is this style accepted in the networking subsystem?
>>
>>> +	stats->packets = rq_stats->packets + xskrq_stats->packets;
>>> +	stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
>>> +	stats->alloc_fail = rq_stats->buff_alloc_err +
>>> +			    xskrq_stats->buff_alloc_err;
>>> +}
>>> +
>>> +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
>>> +				     struct netdev_queue_stats_tx *stats)
>>> +{
>>> +	struct mlx5e_priv *priv = netdev_priv(dev);
>>> +	struct net_device *netdev = priv->netdev;
>>> +	struct mlx5e_txqsq *sq;
>>> +	int j;
>>> +
>>> +	if (mlx5e_is_uplink_rep(priv))
>>> +		return;
>>> +
>>> +	for (j = 0; j < netdev->num_tx_queues; j++) {
>>> +		sq = priv->txq2sq[j];
>>
>> No sq instance in case interface is down.
> 
> This seems easily fixable by checking:
> 
>    priv->channels.num > 0
> 

yes, or test_bit(MLX5E_STATE_OPENED, &priv->state).

>> This should be a simple arithmetic calculation.
> 
> I'm not sure why I can't use txq2sq? Please see below for my
> explanation about why I think txq2sq might be all I need.
> 
>> Need to expose the proper functions for this calculation, and use it here
>> and in the sq create flows.
> 
> I re-read the code several times and my apologies, but I am probably
> still missing something.
> 
> I don't think a calculation function is necessary (see below), but
> if one is really needed, I'd probably add something like:
> 
>    static inline int tc_to_txq_ix(struct mlx5e_channel *c,
>                                   struct mlx5e_params *params,
>                                   int tc)
>    {
>           return c->ix + tc * params->num_channels;

We need the opposite direction.

The goal is to calculate the proper pair of (ch_ix, tc) in order to 
access the correct sq_stats struct in the stats array:
priv->channel_stats[ch_ix]->sq[tc];

Given i in [0, real_num_tx_queues), we should extract the pair as follows:

ch_ix = i % params->num_channels;
tc = i / params->num_channels;

>    }
> 
> And call it from mlx5e_open_sqs.
> 
> But, I don't understand why any calculation is needed in
> mlx5e_get_queue_stats_tx. Please see below for explanation.
> 
>>
>> Here it seems that you need a very involved user, so he passes the correct
>> index i of the SQ that he's interested in..
>>
>>> +		if (sq->ch_ix == i) {
>>
>> So you're looking for the first SQ on channel i?
>> But there might be multiple SQs on channel i...
>> Also, this SQ might be already included in the base stats.
>> In addition, this i might be too large for a channel index (num_tx_queues
>> can be 8 * num_channels)
>>
>> The logic here (of mapping from i in num_tx_queues to SQ stats) needs
>> careful definition.
> 
> I read your comments a few times and read the mlx5 source and I am
> probably still missing something obvious here; my apologies.
> 
> In net/core/netdev-genl.c, calls to the driver's get_queue_stats_tx
> appear to pass [0, netdev->real_num_tx_queues) as i.
> 
> I think this means i is a txq_ix in mlx5, because mlx5 sets
> netdev->real_num_tx_queues in mlx5e_update_tx_netdev_queues, as:
> 
>    nch * ntc + qos_queues
> 
> which when expanded is
> 
>    priv->channels.params.num_channels * mlx5e_get_dcb_num_tc + qos_queues
> 
> So, net/core/netdev-genl.c will be using 0 up to that expression as
> i when calling mlx5e_get_queue_stats_tx.
> 

Right.

> In mlx5:
>    - mlx5e_activate_priv_channels calls mlx5e_build_txq_maps which
>      generates priv->txq2sq[txq_ix] = sq for every mlx5e_get_dcb_num_tc
>      of every priv->channels.num.
>   
> This seems to happen every time mlx5e_activate_priv_channels is
> called, which I think means that priv->txq2sq is always up to date
> and will give the right sq for a given txq_ix (assuming the device
> isn't down).
> 

Right.

> Putting all of this together, I think that mlx5e_get_queue_stats_tx
> might need to be something like:
> 
>    mutex_lock(&priv->state_lock);
>    if (priv->channels.num > 0) {
>            sq = priv->txq2sq[i];
>            stats->packets = sq->stats->packets;
>            stats->bytes = sq->stats->bytes;
>    }
>    mutex_unlock(&priv->state_lock);
> 

Right.
But you can also access the sq_stats directly without going through the 
sq. This is important as the channels might be down.


> Is this still incorrect somehow?
> 
> If so, could you explain a bit more about why a calculation is
> needed? It seems like txq2sq would provide the mapping from txq_ix's
> (which is what mlx5e_get_queue_stats_tx gets as 'i') and an sq,
> which seems like all I would need?
> 
> Sorry if I am still not following here.
> 

I see two possible solutions:

1.
a. in the base, sum all stats for entries that are no longer available 
in the current configuration (independently to if the netdev is up or 
down), like sqs for ch_ix >= params->num_channels.
b. in mlx5e_get_queue_stats_tx, access the sq_stats without going 
through the sq (as it might not exist), this will be done for all i in 0 
ti real_num_tx_queues.

2.
a. in the base, sum all stats for all non existing sqs. if interface is 
down, then no sqs exist, so you sum the whole array.
b. in mlx5e_get_queue_stats_tx, go through the txq2sq and the sq, if it 
exists, if interface is down just return empty stats.

I don't have strong preference, although #1 looks slightly better to me.


>>> +			stats->packets = sq->stats->packets;
>>> +			stats->bytes = sq->stats->bytes;
>>> +			return;
>>> +		}
>>> +	}
>>> +}
>>> +
>>> +static void mlx5e_get_base_stats(struct net_device *dev,
>>> +				 struct netdev_queue_stats_rx *rx,
>>> +				 struct netdev_queue_stats_tx *tx)
>>> +{
>>> +	struct mlx5e_priv *priv = netdev_priv(dev);
>>> +	int i, j;
>>> +
>>> +	if (!mlx5e_is_uplink_rep(priv)) {
>>> +		rx->packets = 0;
>>> +		rx->bytes = 0;
>>> +		rx->alloc_fail = 0;
>>> +
>>> +		/* compute stats for deactivated RX queues
>>> +		 *
>>> +		 * if priv->channels.num == 0 the device is down, so compute
>>> +		 * stats for every queue.
>>> +		 *
>>> +		 * otherwise, compute only the queues which have been deactivated.
>>> +		 */
>>> +		if (priv->channels.num == 0)
>>> +			i = 0;
>>> +		else
>>> +			i = priv->channels.params.num_channels;
>>> +
>>> +		for (; i < priv->stats_nch; i++) {
>>> +			struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
>>> +			struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
>>> +			struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
>>> +
>>> +			rx->packets += rq_stats->packets + xskrq_stats->packets;
>>> +			rx->bytes += rq_stats->bytes + xskrq_stats->bytes;
>>> +			rx->alloc_fail += rq_stats->buff_alloc_err +
>>> +					  xskrq_stats->buff_alloc_err;
>>
>> Isn't this equivalent to mlx5e_get_queue_stats_rx(i) ?
>>
>>> +		}
>>> +
>>> +		if (priv->rx_ptp_opened) {
>>> +			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
>>> +
>>> +			rx->packets += rq_stats->packets;
>>> +			rx->bytes += rq_stats->bytes;
>>> +		}
>>> +	}
>>> +
>>> +	tx->packets = 0;
>>> +	tx->bytes = 0;
>>> +
>>> +	/* three TX cases to handle:
>>> +	 *
>>> +	 * case 1: priv->channels.num == 0, get the stats for every TC
>>> +	 *         on every queue.
>>> +	 *
>>> +	 * case 2: priv->channel.num > 0, so get the stats for every TC on
>>> +	 *         every deactivated queue.
>>> +	 *
>>> +	 * case 3: the number of TCs has changed, so get the stats for the
>>> +	 *         inactive TCs on active TX queues (handled in the second loop
>>> +	 *         below).
>>> +	 */
>>> +	if (priv->channels.num == 0)
>>> +		i = 0;
>>> +	else
>>> +		i = priv->channels.params.num_channels;
>>> +
>>
>> All reads/writes to priv->channels must be under the priv->state_lock.
>>
>>> +	for (; i < priv->stats_nch; i++) {
>>> +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
>>> +
>>> +		for (j = 0; j < priv->max_opened_tc; j++) {
>>> +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
>>> +
>>> +			tx->packets += sq_stats->packets;
>>> +			tx->bytes += sq_stats->bytes;
>>> +		}
>>> +	}
>>> +
>>> +	/* Handle case 3 described above. */
>>> +	for (i = 0; i < priv->channels.params.num_channels; i++) {
>>> +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
>>> +		u8 dcb_num_tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
>>> +
>>> +		for (j = dcb_num_tc; j < priv->max_opened_tc; j++) {
>>> +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
>>> +
>>> +			tx->packets += sq_stats->packets;
>>> +			tx->bytes += sq_stats->bytes;
>>> +		}
>>> +	}
>>> +
>>> +	if (priv->tx_ptp_opened) {
>>> +		for (j = 0; j < priv->max_opened_tc; j++) {
>>> +			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[j];
>>> +
>>> +			tx->packets    += sq_stats->packets;
>>> +			tx->bytes      += sq_stats->bytes;
>>> +		}
>>> +	}
>>> +}
>>> +
>>> +static const struct netdev_stat_ops mlx5e_stat_ops = {
>>> +	.get_queue_stats_rx     = mlx5e_get_queue_stats_rx,
>>> +	.get_queue_stats_tx     = mlx5e_get_queue_stats_tx,
>>> +	.get_base_stats         = mlx5e_get_base_stats,
>>> +};
>>> +
>>>    static void mlx5e_build_nic_netdev(struct net_device *netdev)
>>>    {
>>>    	struct mlx5e_priv *priv = netdev_priv(netdev);
>>> @@ -5299,6 +5442,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>>>    	netdev->watchdog_timeo    = 15 * HZ;
>>> +	netdev->stat_ops          = &mlx5e_stat_ops;
>>>    	netdev->ethtool_ops	  = &mlx5e_ethtool_ops;
>>>    	netdev->vlan_features    |= NETIF_F_SG;
Joe Damato May 23, 2024, 11:14 p.m. UTC | #9
On Thu, May 23, 2024 at 10:27:54AM +0300, Tariq Toukan wrote:
> 
> 
> On 15/05/2024 10:19, Joe Damato wrote:
> > On Tue, May 14, 2024 at 09:44:37PM +0300, Tariq Toukan wrote:
> > > 
> > > 
> > > On 10/05/2024 7:17, Joe Damato wrote:
> > > > Add functions to support the netdev-genl per queue stats API.
> > > > 
> > > > ./cli.py --spec netlink/specs/netdev.yaml \
> > > > --dump qstats-get --json '{"scope": "queue"}'
> > > > 
> > > > ...snip
> > > > 
> > > >    {'ifindex': 7,
> > > >     'queue-id': 62,
> > > >     'queue-type': 'rx',
> > > >     'rx-alloc-fail': 0,
> > > >     'rx-bytes': 105965251,
> > > >     'rx-packets': 179790},
> > > >    {'ifindex': 7,
> > > >     'queue-id': 0,
> > > >     'queue-type': 'tx',
> > > >     'tx-bytes': 9402665,
> > > >     'tx-packets': 17551},
> > > > 
> > > > ...snip
> > > > 
> > > > Also tested with the script tools/testing/selftests/drivers/net/stats.py
> > > > in several scenarios to ensure stats tallying was correct:
> > > > 
> > > > - on boot (default queue counts)
> > > > - adjusting queue count up or down (ethtool -L eth0 combined ...)
> > > > - adding mqprio TCs
> > > > 
> > > > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > > > ---
> > > >    .../net/ethernet/mellanox/mlx5/core/en_main.c | 144 ++++++++++++++++++
> > > >    1 file changed, 144 insertions(+)
> > > > 
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > > index ffe8919494d5..4a675d8b31b5 100644
> > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > > @@ -39,6 +39,7 @@
> > > >    #include <linux/debugfs.h>
> > > >    #include <linux/if_bridge.h>
> > > >    #include <linux/filter.h>
> > > > +#include <net/netdev_queues.h>
> > > >    #include <net/page_pool/types.h>
> > > >    #include <net/pkt_sched.h>
> > > >    #include <net/xdp_sock_drv.h>
> > > > @@ -5282,6 +5283,148 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
> > > >    	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
> > > >    }
> > > > +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> > > > +				     struct netdev_queue_stats_rx *stats)
> > > > +{
> > > > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > > > +
> > > > +	if (mlx5e_is_uplink_rep(priv))
> > > > +		return;
> > > > +
> > > > +	struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > > > +	struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> > > > +	struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> > > > +
> > > 
> > > Don't we allow variable declaration only at the beginning of a block?
> > > Is this style accepted in the networking subsystem?
> > > 
> > > > +	stats->packets = rq_stats->packets + xskrq_stats->packets;
> > > > +	stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
> > > > +	stats->alloc_fail = rq_stats->buff_alloc_err +
> > > > +			    xskrq_stats->buff_alloc_err;
> > > > +}
> > > > +
> > > > +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> > > > +				     struct netdev_queue_stats_tx *stats)
> > > > +{
> > > > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > > > +	struct net_device *netdev = priv->netdev;
> > > > +	struct mlx5e_txqsq *sq;
> > > > +	int j;
> > > > +
> > > > +	if (mlx5e_is_uplink_rep(priv))
> > > > +		return;
> > > > +
> > > > +	for (j = 0; j < netdev->num_tx_queues; j++) {
> > > > +		sq = priv->txq2sq[j];
> > > 
> > > No sq instance in case interface is down.
> > 
> > This seems easily fixable by checking:
> > 
> >    priv->channels.num > 0
> > 
> 
> yes, or test_bit(MLX5E_STATE_OPENED, &priv->state).
> 
> > > This should be a simple arithmetic calculation.
> > 
> > I'm not sure why I can't use txq2sq? Please see below for my
> > explanation about why I think txq2sq might be all I need.
> > 
> > > Need to expose the proper functions for this calculation, and use it here
> > > and in the sq create flows.
> > 
> > I re-read the code several times and my apologies, but I am probably
> > still missing something.
> > 
> > I don't think a calculation function is necessary (see below), but
> > if one is really needed, I'd probably add something like:
> > 
> >    static inline int tc_to_txq_ix(struct mlx5e_channel *c,
> >                                   struct mlx5e_params *params,
> >                                   int tc)
> >    {
> >           return c->ix + tc * params->num_channels;
> 
> We need the opposite direction.
> 
> The goal is to calculate the proper pair of (ch_ix, tc) in order to access
> the correct sq_stats struct in the stats array:
> priv->channel_stats[ch_ix]->sq[tc];
> 
> Given i in [0, real_num_tx_queues), we should extract the pair as follows:
> 
> ch_ix = i % params->num_channels;
> tc = i / params->num_channels;

Thanks -- yea I should have been more clear that I know we need the
opposite direction and what the formula is, but thank you for
confirming that.

> >    }
> > 
> > And call it from mlx5e_open_sqs.
> > 
> > But, I don't understand why any calculation is needed in
> > mlx5e_get_queue_stats_tx. Please see below for explanation.
> > 
> > > 
> > > Here it seems that you need a very involved user, so he passes the correct
> > > index i of the SQ that he's interested in..
> > > 
> > > > +		if (sq->ch_ix == i) {
> > > 
> > > So you're looking for the first SQ on channel i?
> > > But there might be multiple SQs on channel i...
> > > Also, this SQ might be already included in the base stats.
> > > In addition, this i might be too large for a channel index (num_tx_queues
> > > can be 8 * num_channels)
> > > 
> > > The logic here (of mapping from i in num_tx_queues to SQ stats) needs
> > > careful definition.
> > 
> > I read your comments a few times and read the mlx5 source and I am
> > probably still missing something obvious here; my apologies.
> > 
> > In net/core/netdev-genl.c, calls to the driver's get_queue_stats_tx
> > appear to pass [0, netdev->real_num_tx_queues) as i.
> > 
> > I think this means i is a txq_ix in mlx5, because mlx5 sets
> > netdev->real_num_tx_queues in mlx5e_update_tx_netdev_queues, as:
> > 
> >    nch * ntc + qos_queues
> > 
> > which when expanded is
> > 
> >    priv->channels.params.num_channels * mlx5e_get_dcb_num_tc + qos_queues
> > 
> > So, net/core/netdev-genl.c will be using 0 up to that expression as
> > i when calling mlx5e_get_queue_stats_tx.
> > 
> 
> Right.
> 
> > In mlx5:
> >    - mlx5e_activate_priv_channels calls mlx5e_build_txq_maps which
> >      generates priv->txq2sq[txq_ix] = sq for every mlx5e_get_dcb_num_tc
> >      of every priv->channels.num.
> > This seems to happen every time mlx5e_activate_priv_channels is
> > called, which I think means that priv->txq2sq is always up to date
> > and will give the right sq for a given txq_ix (assuming the device
> > isn't down).
> > 
> 
> Right.
> 
> > Putting all of this together, I think that mlx5e_get_queue_stats_tx
> > might need to be something like:
> > 
> >    mutex_lock(&priv->state_lock);
> >    if (priv->channels.num > 0) {
> >            sq = priv->txq2sq[i];
> >            stats->packets = sq->stats->packets;
> >            stats->bytes = sq->stats->bytes;
> >    }
> >    mutex_unlock(&priv->state_lock);
> > 
> 
> Right.
> But you can also access the sq_stats directly without going through the sq.
> This is important as the channels might be down.

Right, OK, thanks that makes sense to me.
 
> 
> > Is this still incorrect somehow?
> > 
> > If so, could you explain a bit more about why a calculation is
> > needed? It seems like txq2sq would provide the mapping from txq_ix's
> > (which is what mlx5e_get_queue_stats_tx gets as 'i') and an sq,
> > which seems like all I would need?
> > 
> > Sorry if I am still not following here.
> > 
> 
> I see two possible solutions:
> 
> 1.
> a. in the base, sum all stats for entries that are no longer available in
> the current configuration (independently to if the netdev is up or down),
> like sqs for ch_ix >= params->num_channels.
> b. in mlx5e_get_queue_stats_tx, access the sq_stats without going through
> the sq (as it might not exist), this will be done for all i in 0 ti
> real_num_tx_queues.
> 
> 2.
> a. in the base, sum all stats for all non existing sqs. if interface is
> down, then no sqs exist, so you sum the whole array.
> b. in mlx5e_get_queue_stats_tx, go through the txq2sq and the sq, if it
> exists, if interface is down just return empty stats.
> 
> I don't have strong preference, although #1 looks slightly better to me.

I think if I am understanding what you wrote correctly the
implementation I did for v2 for the base combined with the txq2sq
mapping I proposed above is equal to solution #2 you describe. I
think.

That said...  I can understand why you might prefer solution #1 so I
will start over and try implementing that one for the v3.

Thanks for the guidance.

> > > > +			stats->packets = sq->stats->packets;
> > > > +			stats->bytes = sq->stats->bytes;
> > > > +			return;
> > > > +		}
> > > > +	}
> > > > +}
> > > > +
> > > > +static void mlx5e_get_base_stats(struct net_device *dev,
> > > > +				 struct netdev_queue_stats_rx *rx,
> > > > +				 struct netdev_queue_stats_tx *tx)
> > > > +{
> > > > +	struct mlx5e_priv *priv = netdev_priv(dev);
> > > > +	int i, j;
> > > > +
> > > > +	if (!mlx5e_is_uplink_rep(priv)) {
> > > > +		rx->packets = 0;
> > > > +		rx->bytes = 0;
> > > > +		rx->alloc_fail = 0;
> > > > +
> > > > +		/* compute stats for deactivated RX queues
> > > > +		 *
> > > > +		 * if priv->channels.num == 0 the device is down, so compute
> > > > +		 * stats for every queue.
> > > > +		 *
> > > > +		 * otherwise, compute only the queues which have been deactivated.
> > > > +		 */
> > > > +		if (priv->channels.num == 0)
> > > > +			i = 0;
> > > > +		else
> > > > +			i = priv->channels.params.num_channels;
> > > > +
> > > > +		for (; i < priv->stats_nch; i++) {
> > > > +			struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > > > +			struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
> > > > +			struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
> > > > +
> > > > +			rx->packets += rq_stats->packets + xskrq_stats->packets;
> > > > +			rx->bytes += rq_stats->bytes + xskrq_stats->bytes;
> > > > +			rx->alloc_fail += rq_stats->buff_alloc_err +
> > > > +					  xskrq_stats->buff_alloc_err;
> > > 
> > > Isn't this equivalent to mlx5e_get_queue_stats_rx(i) ?
> > > 
> > > > +		}
> > > > +
> > > > +		if (priv->rx_ptp_opened) {
> > > > +			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
> > > > +
> > > > +			rx->packets += rq_stats->packets;
> > > > +			rx->bytes += rq_stats->bytes;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	tx->packets = 0;
> > > > +	tx->bytes = 0;
> > > > +
> > > > +	/* three TX cases to handle:
> > > > +	 *
> > > > +	 * case 1: priv->channels.num == 0, get the stats for every TC
> > > > +	 *         on every queue.
> > > > +	 *
> > > > +	 * case 2: priv->channel.num > 0, so get the stats for every TC on
> > > > +	 *         every deactivated queue.
> > > > +	 *
> > > > +	 * case 3: the number of TCs has changed, so get the stats for the
> > > > +	 *         inactive TCs on active TX queues (handled in the second loop
> > > > +	 *         below).
> > > > +	 */
> > > > +	if (priv->channels.num == 0)
> > > > +		i = 0;
> > > > +	else
> > > > +		i = priv->channels.params.num_channels;
> > > > +
> > > 
> > > All reads/writes to priv->channels must be under the priv->state_lock.
> > > 
> > > > +	for (; i < priv->stats_nch; i++) {
> > > > +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > > > +
> > > > +		for (j = 0; j < priv->max_opened_tc; j++) {
> > > > +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> > > > +
> > > > +			tx->packets += sq_stats->packets;
> > > > +			tx->bytes += sq_stats->bytes;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/* Handle case 3 described above. */
> > > > +	for (i = 0; i < priv->channels.params.num_channels; i++) {
> > > > +		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > > > +		u8 dcb_num_tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> > > > +
> > > > +		for (j = dcb_num_tc; j < priv->max_opened_tc; j++) {
> > > > +			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
> > > > +
> > > > +			tx->packets += sq_stats->packets;
> > > > +			tx->bytes += sq_stats->bytes;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	if (priv->tx_ptp_opened) {
> > > > +		for (j = 0; j < priv->max_opened_tc; j++) {
> > > > +			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[j];
> > > > +
> > > > +			tx->packets    += sq_stats->packets;
> > > > +			tx->bytes      += sq_stats->bytes;
> > > > +		}
> > > > +	}
> > > > +}
> > > > +
> > > > +static const struct netdev_stat_ops mlx5e_stat_ops = {
> > > > +	.get_queue_stats_rx     = mlx5e_get_queue_stats_rx,
> > > > +	.get_queue_stats_tx     = mlx5e_get_queue_stats_tx,
> > > > +	.get_base_stats         = mlx5e_get_base_stats,
> > > > +};
> > > > +
> > > >    static void mlx5e_build_nic_netdev(struct net_device *netdev)
> > > >    {
> > > >    	struct mlx5e_priv *priv = netdev_priv(netdev);
> > > > @@ -5299,6 +5442,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> > > >    	netdev->watchdog_timeo    = 15 * HZ;
> > > > +	netdev->stat_ops          = &mlx5e_stat_ops;
> > > >    	netdev->ethtool_ops	  = &mlx5e_ethtool_ops;
> > > >    	netdev->vlan_features    |= NETIF_F_SG;
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ffe8919494d5..4a675d8b31b5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -39,6 +39,7 @@ 
 #include <linux/debugfs.h>
 #include <linux/if_bridge.h>
 #include <linux/filter.h>
+#include <net/netdev_queues.h>
 #include <net/page_pool/types.h>
 #include <net/pkt_sched.h>
 #include <net/xdp_sock_drv.h>
@@ -5282,6 +5283,148 @@  static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
 	return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
 }
 
+static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
+				     struct netdev_queue_stats_rx *stats)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+
+	if (mlx5e_is_uplink_rep(priv))
+		return;
+
+	struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
+	struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
+	struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
+
+	stats->packets = rq_stats->packets + xskrq_stats->packets;
+	stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
+	stats->alloc_fail = rq_stats->buff_alloc_err +
+			    xskrq_stats->buff_alloc_err;
+}
+
+static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
+				     struct netdev_queue_stats_tx *stats)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct net_device *netdev = priv->netdev;
+	struct mlx5e_txqsq *sq;
+	int j;
+
+	if (mlx5e_is_uplink_rep(priv))
+		return;
+
+	for (j = 0; j < netdev->num_tx_queues; j++) {
+		sq = priv->txq2sq[j];
+		if (sq->ch_ix == i) {
+			stats->packets = sq->stats->packets;
+			stats->bytes = sq->stats->bytes;
+			return;
+		}
+	}
+}
+
+static void mlx5e_get_base_stats(struct net_device *dev,
+				 struct netdev_queue_stats_rx *rx,
+				 struct netdev_queue_stats_tx *tx)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	int i, j;
+
+	if (!mlx5e_is_uplink_rep(priv)) {
+		rx->packets = 0;
+		rx->bytes = 0;
+		rx->alloc_fail = 0;
+
+		/* compute stats for deactivated RX queues
+		 *
+		 * if priv->channels.num == 0 the device is down, so compute
+		 * stats for every queue.
+		 *
+		 * otherwise, compute only the queues which have been deactivated.
+		 */
+		if (priv->channels.num == 0)
+			i = 0;
+		else
+			i = priv->channels.params.num_channels;
+
+		for (; i < priv->stats_nch; i++) {
+			struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
+			struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
+			struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
+
+			rx->packets += rq_stats->packets + xskrq_stats->packets;
+			rx->bytes += rq_stats->bytes + xskrq_stats->bytes;
+			rx->alloc_fail += rq_stats->buff_alloc_err +
+					  xskrq_stats->buff_alloc_err;
+		}
+
+		if (priv->rx_ptp_opened) {
+			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
+
+			rx->packets += rq_stats->packets;
+			rx->bytes += rq_stats->bytes;
+		}
+	}
+
+	tx->packets = 0;
+	tx->bytes = 0;
+
+	/* three TX cases to handle:
+	 *
+	 * case 1: priv->channels.num == 0, get the stats for every TC
+	 *         on every queue.
+	 *
+	 * case 2: priv->channel.num > 0, so get the stats for every TC on
+	 *         every deactivated queue.
+	 *
+	 * case 3: the number of TCs has changed, so get the stats for the
+	 *         inactive TCs on active TX queues (handled in the second loop
+	 *         below).
+	 */
+	if (priv->channels.num == 0)
+		i = 0;
+	else
+		i = priv->channels.params.num_channels;
+
+	for (; i < priv->stats_nch; i++) {
+		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
+
+		for (j = 0; j < priv->max_opened_tc; j++) {
+			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
+
+			tx->packets += sq_stats->packets;
+			tx->bytes += sq_stats->bytes;
+		}
+	}
+
+	/* Handle case 3 described above. */
+	for (i = 0; i < priv->channels.params.num_channels; i++) {
+		struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
+		u8 dcb_num_tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
+
+		for (j = dcb_num_tc; j < priv->max_opened_tc; j++) {
+			struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
+
+			tx->packets += sq_stats->packets;
+			tx->bytes += sq_stats->bytes;
+		}
+	}
+
+	if (priv->tx_ptp_opened) {
+		for (j = 0; j < priv->max_opened_tc; j++) {
+			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[j];
+
+			tx->packets    += sq_stats->packets;
+			tx->bytes      += sq_stats->bytes;
+		}
+	}
+}
+
+static const struct netdev_stat_ops mlx5e_stat_ops = {
+	.get_queue_stats_rx     = mlx5e_get_queue_stats_rx,
+	.get_queue_stats_tx     = mlx5e_get_queue_stats_tx,
+	.get_base_stats         = mlx5e_get_base_stats,
+};
+
 static void mlx5e_build_nic_netdev(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -5299,6 +5442,7 @@  static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->watchdog_timeo    = 15 * HZ;
 
+	netdev->stat_ops          = &mlx5e_stat_ops;
 	netdev->ethtool_ops	  = &mlx5e_ethtool_ops;
 
 	netdev->vlan_features    |= NETIF_F_SG;