[net-next,V2,0/2] mlx5: Expose NIC temperature via hwmon API

Message ID 20230807180507.22984-1-saeed@kernel.org (mailing list archive)

Message

Saeed Mahameed Aug. 7, 2023, 6:05 p.m. UTC
From: Saeed Mahameed <saeedm@nvidia.com>

V1->V2:
 - Remove internal tracker tags
 - Remove sanitized mlx5 sensor names
 - add HWMON dependency in the mlx5 Kconfig


Expose NIC temperature by implementing the hwmon kernel API, which makes
the current thermal zone kernel API implementation redundant.

For each one of the supported and exposed thermal diode sensors, expose
the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label.
4) Temperature critical max value:
   refers to the high threshold of the Warning Event. It is exposed as
   the read-only `tempY_crit` hwmon attribute. For example, on
   ConnectX5 HCAs this value is 105 degrees Celsius, 10 degrees lower
   than the HW shutdown temperature.
5) Temperature reset history: resets highest temperature.
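
For readers who want to consume these attributes from userspace, the
following is a minimal sketch, not part of the series. It assumes the
standard hwmon sysfs naming (tempN_input, tempN_highest, tempN_label,
tempN_crit, tempN_reset_history) and that temperature values are
reported in millidegrees Celsius, as is conventional for hwmon. The
helper names read_hwmon_temps() and reset_highest() are hypothetical,
and the hwmon directory for a given mlx5 device has to be located by
the caller.

```python
from pathlib import Path
import re


def read_hwmon_temps(hwmon_dir):
    """Return {channel: {attr: value}} for temp channels under hwmon_dir.

    hwmon_dir is a directory such as /sys/class/hwmon/hwmonX; which X
    belongs to the mlx5 device is an assumption left to the caller.
    """
    sensors = {}
    for path in sorted(Path(hwmon_dir).glob("temp*_*")):
        match = re.fullmatch(r"temp(\d+)_(input|highest|crit|label)", path.name)
        if match is None:
            continue  # skip reset_history and any attribute not listed above
        channel, attr = int(match.group(1)), match.group(2)
        text = path.read_text().strip()
        # Labels are strings; temperatures are converted from
        # millidegrees Celsius to degrees Celsius.
        value = text if attr == "label" else int(text) / 1000.0
        sensors.setdefault(channel, {})[attr] = value
    return sensors


def reset_highest(hwmon_dir, channel):
    """Write to tempN_reset_history to clear the recorded highest value."""
    Path(hwmon_dir, f"temp{channel}_reset_history").write_text("1")
```

Under these assumptions, a tempN_crit file containing 105000 would be
returned as 105.0, matching the ConnectX5 example above. Writing to
tempN_reset_history normally requires root privileges.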


Adham Faris (2):
  net/mlx5: Expose port.c/mlx5_query_module_num() function
  net/mlx5: Expose NIC temperature via hardware monitoring kernel API

 .../net/ethernet/mellanox/mlx5/core/Kconfig   |   1 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/hwmon.c   | 418 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/hwmon.h   |  24 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   8 +-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   1 +
 .../net/ethernet/mellanox/mlx5/core/port.c    |   2 +-
 .../net/ethernet/mellanox/mlx5/core/thermal.c | 114 -----
 .../net/ethernet/mellanox/mlx5/core/thermal.h |  20 -
 include/linux/mlx5/driver.h                   |   3 +-
 include/linux/mlx5/mlx5_ifc.h                 |  14 +-
 11 files changed, 465 insertions(+), 142 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/thermal.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/thermal.h

Comments

Simon Horman Aug. 8, 2023, 8:07 p.m. UTC | #1
On Mon, Aug 07, 2023 at 11:05:05AM -0700, Saeed Mahameed wrote:
> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> V1->V2:
>  - Remove internal tracker tags
>  - Remove sanitized mlx5 sensor names
>  - add HWMON dependency in the mlx5 Kconfig
> 
> 
> Expose NIC temperature by implementing the hwmon kernel API, which makes
> the current thermal zone kernel API implementation redundant.
> 
> For each one of the supported and exposed thermal diode sensors, expose
> the following attributes:
> 1) Input temperature.
> 2) Highest temperature.
> 3) Temperature label.
> 4) Temperature critical max value:
>    refers to the high threshold of the Warning Event. It is exposed as
>    the read-only `tempY_crit` hwmon attribute. For example, on
>    ConnectX5 HCAs this value is 105 degrees Celsius, 10 degrees lower
>    than the HW shutdown temperature.
> 5) Temperature reset history: resets highest temperature.
> 
> 
> Adham Faris (2):
>   net/mlx5: Expose port.c/mlx5_query_module_num() function
>   net/mlx5: Expose NIC temperature via hardware monitoring kernel API

For the series,

Reviewed-by: Simon Horman <horms@kernel.org>

patchwork-bot+netdevbpf@kernel.org Aug. 9, 2023, 11 p.m. UTC | #2
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  7 Aug 2023 11:05:05 -0700 you wrote:
> From: Saeed Mahameed <saeedm@nvidia.com>
> 
> V1->V2:
>  - Remove internal tracker tags
>  - Remove sanitized mlx5 sensor names
>  - add HWMON dependency in the mlx5 Kconfig
> 
> [...]

Here is the summary with links:
  - [net-next,V2,1/2] net/mlx5: Expose port.c/mlx5_query_module_num() function
    https://git.kernel.org/netdev/net-next/c/383a4de3b447
  - [net-next,V2,2/2] net/mlx5: Expose NIC temperature via hardware monitoring kernel API
    https://git.kernel.org/netdev/net-next/c/1f507e80c700

You are awesome, thank you!