Message ID | 20231113230051.58229-9-saeed@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,01/14] net/mlx5: print change on SW reset semaphore returns busy | expand |
On 13/11/2023 15:00, Saeed Mahameed wrote: > From: Rahul Rameshbabu <rrameshbabu@nvidia.com> > > Track the number of times the a CQE was expected to not be delivered on PTP > Tx port timestamping CQ. A CQE is expected to not be delivered of a certain > amount of time passes since the corresponding CQE containing the DMA > timestamp information has arrived. Increment the late_cqe counter when such > a CQE does manage to be delivered to the CQ. > It looks like missed/late timestamps is common problem for NICs. What do you think about creating common counters in ethtool to have general interface to provide timestamps counters? It may simplify things a lot.
On Tue, 14 Nov, 2023 10:22:43 -0500 Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: > On 13/11/2023 15:00, Saeed Mahameed wrote: >> From: Rahul Rameshbabu <rrameshbabu@nvidia.com> >> Track the number of times the a CQE was expected to not be delivered on PTP >> Tx port timestamping CQ. A CQE is expected to not be delivered of a certain >> amount of time passes since the corresponding CQE containing the DMA >> timestamp information has arrived. Increment the late_cqe counter when such >> a CQE does manage to be delivered to the CQ. >> > > It looks like missed/late timestamps is common problem for NICs. What do > you think about creating common counters in ethtool to have general > interface to provide timestamps counters? It may simplify things a lot. Hi Vadim, I just took a look at the tree and believe devices supported by the following drivers have missed/late timestamps. - mlx5 - i40e - ice - stmicro The above is from a very precursory grep through the netdev tree and maybe inaccurate/incomplete. You probably saw that Saeed already pulled out our vendor specific stat counters from his v2 submission. Lets discuss the more appropriate common counters in ethtool. Similar to fec-stat in Documentation/netlink/specs/ethtool.yaml, should we make a new statistics group for these timestamp related counters (timestamp-stat) as follows? 1. Implement an ethtool_timestamp_stats struct in ethtool.h 2. Add the relevant callback support in ethtool 3. Add the correct spec changes in the ynl spec. 4. Implement the callback in the appropriate drivers 5. Separately prepare relevant userspace changes for ethtool. If this seems reasonable, I can start preparing an RFC to send out to the mailing list. -- Thanks, Rahul Rameshbabu
On 16/11/2023 14:51, Rahul Rameshbabu wrote: > On Tue, 14 Nov, 2023 10:22:43 -0500 Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote: >> On 13/11/2023 15:00, Saeed Mahameed wrote: >>> From: Rahul Rameshbabu <rrameshbabu@nvidia.com> >>> Track the number of times the a CQE was expected to not be delivered on PTP >>> Tx port timestamping CQ. A CQE is expected to not be delivered of a certain >>> amount of time passes since the corresponding CQE containing the DMA >>> timestamp information has arrived. Increment the late_cqe counter when such >>> a CQE does manage to be delivered to the CQ. >>> >> >> It looks like missed/late timestamps is common problem for NICs. What do >> you think about creating common counters in ethtool to have general >> interface to provide timestamps counters? It may simplify things a lot. > > Hi Vadim, > > I just took a look at the tree and believe devices supported by the > following drivers have missed/late timestamps. > > - mlx5 > - i40e > - ice > - stmicro > > The above is from a very precursory grep through the netdev tree and > maybe inaccurate/incomplete. > > You probably saw that Saeed already pulled out our vendor specific stat > counters from his v2 submission. Lets discuss the more appropriate > common counters in ethtool. > > Similar to fec-stat in Documentation/netlink/specs/ethtool.yaml, should > we make a new statistics group for these timestamp related counters > (timestamp-stat) as follows? > > 1. Implement an ethtool_timestamp_stats struct in ethtool.h > 2. Add the relevant callback support in ethtool > 3. Add the correct spec changes in the ynl spec. > 4. Implement the callback in the appropriate drivers > 5. Separately prepare relevant userspace changes for ethtool. > > If this seems reasonable, I can start preparing an RFC to send out to > the mailing list. Hi Rahul! Thanks for taking care of this. The list of drivers seems reasonable at the first look. But I believe more vendors will jump into it once the spec is ready. The new group in the ethtool yml spec seams reasonable. The steps provided look good, I'll be happy to review your RFC patches. Thanks, Vadim
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst index f69ee1ebee01..5464cd9e2694 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst @@ -702,6 +702,12 @@ the software port. the device typically ensures not posting the CQE. - Error + * - `ptp_cq[i]_lost_cqe` + - Number of times a CQE is expected to not be delivered on the PTP + timestamping CQE by the device due to a time delta elapsing. If such a + CQE is somehow delivered, `ptp_cq[i]_late_cqe` is incremented. + - Error + .. [#ring_global] The corresponding ring and global counters do not share the same name (i.e. do not follow the common naming scheme). diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c index bb11e644d24f..a81f9c672b69 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c @@ -169,6 +169,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq, WARN_ON_ONCE(!pos->inuse); pos->inuse = false; list_del(&pos->entry); + ptpsq->cq_stats->lost_cqe++; } spin_unlock(&cqe_list->tracker_list_lock); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c index 4b96ad657145..7e63d7c88894 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c @@ -2158,6 +2158,7 @@ static const struct counter_desc ptp_cq_stats_desc[] = { { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort) }, { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort_abs_diff_ns) }, { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, late_cqe) }, + { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, lost_cqe) }, }; static const struct counter_desc ptp_rq_stats_desc[] = { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h index 477c547dcc04..2584f049ec53 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h @@ -461,6 +461,7 @@ struct mlx5e_ptp_cq_stats { u64 abort; u64 abort_abs_diff_ns; u64 late_cqe; + u64 lost_cqe; }; struct mlx5e_rep_stats {