diff mbox series

[net-next,08/14] net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQ

Message ID 20231113230051.58229-9-saeed@kernel.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next,01/14] net/mlx5: print change on SW reset semaphore returns busy | expand

Checks

Context Check Description
netdev/series_format success Pull request is its own cover letter
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1135 this patch: 1135
netdev/cc_maintainers warning 3 maintainers not CCed: linux-doc@vger.kernel.org richardcochran@gmail.com corbet@lwn.net
netdev/build_clang success Errors and warnings before: 1161 this patch: 1161
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1164 this patch: 1164
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 33 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Saeed Mahameed Nov. 13, 2023, 11 p.m. UTC
From: Rahul Rameshbabu <rrameshbabu@nvidia.com>

Track the number of times the a CQE was expected to not be delivered on PTP
Tx port timestamping CQ. A CQE is expected to not be delivered of a certain
amount of time passes since the corresponding CQE containing the DMA
timestamp information has arrived. Increment the late_cqe counter when such
a CQE does manage to be delivered to the CQ.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5/counters.rst      | 6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c            | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c          | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h          | 1 +
 4 files changed, 9 insertions(+)

Comments

Vadim Fedorenko Nov. 14, 2023, 3:22 p.m. UTC | #1
On 13/11/2023 15:00, Saeed Mahameed wrote:
> From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> 
> Track the number of times the a CQE was expected to not be delivered on PTP
> Tx port timestamping CQ. A CQE is expected to not be delivered of a certain
> amount of time passes since the corresponding CQE containing the DMA
> timestamp information has arrived. Increment the late_cqe counter when such
> a CQE does manage to be delivered to the CQ.
> 

It looks like missed/late timestamps is common problem for NICs. What do
you think about creating common counters in ethtool to have general
interface to provide timestamps counters? It may simplify things a lot.
Rahul Rameshbabu Nov. 16, 2023, 7:51 p.m. UTC | #2
On Tue, 14 Nov, 2023 10:22:43 -0500 Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote:
> On 13/11/2023 15:00, Saeed Mahameed wrote:
>> From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
>> Track the number of times the a CQE was expected to not be delivered on PTP
>> Tx port timestamping CQ. A CQE is expected to not be delivered of a certain
>> amount of time passes since the corresponding CQE containing the DMA
>> timestamp information has arrived. Increment the late_cqe counter when such
>> a CQE does manage to be delivered to the CQ.
>> 
>
> It looks like missed/late timestamps is common problem for NICs. What do
> you think about creating common counters in ethtool to have general
> interface to provide timestamps counters? It may simplify things a lot.

Hi Vadim,

I just took a look at the tree and believe devices supported by the
following drivers have missed/late timestamps.

  - mlx5
  - i40e
  - ice
  - stmicro

The above is from a very precursory grep through the netdev tree and
maybe inaccurate/incomplete.

You probably saw that Saeed already pulled out our vendor specific stat
counters from his v2 submission. Lets discuss the more appropriate
common counters in ethtool.

Similar to fec-stat in Documentation/netlink/specs/ethtool.yaml, should
we make a new statistics group for these timestamp related counters
(timestamp-stat) as follows?

  1. Implement an ethtool_timestamp_stats struct in ethtool.h
  2. Add the relevant callback support in ethtool
  3. Add the correct spec changes in the ynl spec.
  4. Implement the callback in the appropriate drivers
  5. Separately prepare relevant userspace changes for ethtool.

If this seems reasonable, I can start preparing an RFC to send out to
the mailing list.

--
Thanks,

Rahul Rameshbabu
Vadim Fedorenko Nov. 18, 2023, 7 p.m. UTC | #3
On 16/11/2023 14:51, Rahul Rameshbabu wrote:
> On Tue, 14 Nov, 2023 10:22:43 -0500 Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote:
>> On 13/11/2023 15:00, Saeed Mahameed wrote:
>>> From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
>>> Track the number of times the a CQE was expected to not be delivered on PTP
>>> Tx port timestamping CQ. A CQE is expected to not be delivered of a certain
>>> amount of time passes since the corresponding CQE containing the DMA
>>> timestamp information has arrived. Increment the late_cqe counter when such
>>> a CQE does manage to be delivered to the CQ.
>>>
>>
>> It looks like missed/late timestamps is common problem for NICs. What do
>> you think about creating common counters in ethtool to have general
>> interface to provide timestamps counters? It may simplify things a lot.
> 
> Hi Vadim,
> 
> I just took a look at the tree and believe devices supported by the
> following drivers have missed/late timestamps.
> 
>    - mlx5
>    - i40e
>    - ice
>    - stmicro
> 
> The above is from a very precursory grep through the netdev tree and
> maybe inaccurate/incomplete.
> 
> You probably saw that Saeed already pulled out our vendor specific stat
> counters from his v2 submission. Lets discuss the more appropriate
> common counters in ethtool.
> 
> Similar to fec-stat in Documentation/netlink/specs/ethtool.yaml, should
> we make a new statistics group for these timestamp related counters
> (timestamp-stat) as follows?
> 
>    1. Implement an ethtool_timestamp_stats struct in ethtool.h
>    2. Add the relevant callback support in ethtool
>    3. Add the correct spec changes in the ynl spec.
>    4. Implement the callback in the appropriate drivers
>    5. Separately prepare relevant userspace changes for ethtool.
> 
> If this seems reasonable, I can start preparing an RFC to send out to
> the mailing list.

Hi Rahul!

Thanks for taking care of this. The list of drivers seems reasonable at
the first look. But I believe more vendors will jump into it once the
spec is ready.

The new group in the ethtool yml spec seams reasonable. The steps
provided look good, I'll be happy to review your RFC patches.

Thanks,
Vadim
diff mbox series

Patch

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index f69ee1ebee01..5464cd9e2694 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -702,6 +702,12 @@  the software port.
        the device typically ensures not posting the CQE.
      - Error
 
+   * - `ptp_cq[i]_lost_cqe`
+     - Number of times a CQE is expected to not be delivered on the PTP
+       timestamping CQE by the device due to a time delta elapsing. If such a
+       CQE is somehow delivered, `ptp_cq[i]_late_cqe` is incremented.
+     - Error
+
 .. [#ring_global] The corresponding ring and global counters do not share the
                   same name (i.e. do not follow the common naming scheme).
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index bb11e644d24f..a81f9c672b69 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -169,6 +169,7 @@  static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
 		WARN_ON_ONCE(!pos->inuse);
 		pos->inuse = false;
 		list_del(&pos->entry);
+		ptpsq->cq_stats->lost_cqe++;
 	}
 	spin_unlock(&cqe_list->tracker_list_lock);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 4b96ad657145..7e63d7c88894 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -2158,6 +2158,7 @@  static const struct counter_desc ptp_cq_stats_desc[] = {
 	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort) },
 	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort_abs_diff_ns) },
 	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, late_cqe) },
+	{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, lost_cqe) },
 };
 
 static const struct counter_desc ptp_rq_stats_desc[] = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 477c547dcc04..2584f049ec53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -461,6 +461,7 @@  struct mlx5e_ptp_cq_stats {
 	u64 abort;
 	u64 abort_abs_diff_ns;
 	u64 late_cqe;
+	u64 lost_cqe;
 };
 
 struct mlx5e_rep_stats {