diff mbox series

[net-next,4/4] net/mlx5: Add sensor name to temperature event message

Message ID 20250213094641.226501-5-tariqt@nvidia.com (mailing list archive)
State Accepted
Commit 46fd50cfcc12368bed9ae5257cc3beaea5b3c193
Delegated to: Netdev Maintainers
Series mlx5: Add sensor name in temperature message

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 9 of 9 maintainers
netdev/build_clang              success  Errors and warnings before: 4 this patch: 4
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               warning  WARNING: line length of 81 exceeds 80 columns
                                         WARNING: line length of 85 exceeds 80 columns
                                         WARNING: line length of 89 exceeds 80 columns
                                         WARNING: line length of 91 exceeds 80 columns
                                         WARNING: line length of 94 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-02-15--03-00 (tests: 891)

Commit Message

Tariq Toukan Feb. 13, 2025, 9:46 a.m. UTC
From: Shahar Shitrit <shshitrit@nvidia.com>

Previously, a temperature event message included a bitmap indicating
which sensors detect high temperatures.

To enhance clarity, we modify the message format to explicitly list
the names of the overheating sensors, alongside the sensors bitmap.
If HWMON is not configured, the event message remains unchanged.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/events.c  | 31 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/hwmon.c   |  5 +++
 .../net/ethernet/mellanox/mlx5/core/hwmon.h   |  1 +
 3 files changed, 34 insertions(+), 3 deletions(-)
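
The commit message describes turning the two 64-bit sensor words into a list of named sensors. The following is a minimal userspace sketch of that idea, not the driver code: the name table, get_sensor_name(), collect_channels() and report_sensors() are hypothetical stand-ins, and the sketch tests bits with plain shifts on a fixed-width uint64_t instead of the kernel's for_each_set_bit(). It also leaves out the compile-time CONFIG_HWMON guard and always prints names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical name table, for illustration only. The driver reads
 * sensor names from its hwmon channel descriptors instead.
 */
static const char *const sensor_names[] = { "asic", "module0", "module1" };
#define NUM_SENSORS (sizeof(sensor_names) / sizeof(sensor_names[0]))

/* Map a channel index to a name, falling back for unknown channels. */
static const char *get_sensor_name(unsigned int channel)
{
	return channel < NUM_SENSORS ? sensor_names[channel] : "unknown";
}

/* Store (bit index + offset) for every set bit of bit_set, lowest bit
 * first, writing at most max entries. Returns the number stored.
 */
static size_t collect_channels(uint64_t bit_set, unsigned int offset,
			       unsigned int *out, size_t max)
{
	size_t n = 0;
	unsigned int i;

	assert(out != NULL);
	for (i = 0; i < 64 && n < max; i++) {
		if (bit_set & (UINT64_C(1) << i))
			out[n++] = i + offset;
	}
	return n;
}

/* Print the raw bitmap, then one line per flagged sensor. The low word
 * covers channels 0..63 and the high word channels 64..127.
 * Returns the number of sensors reported.
 */
static size_t report_sensors(uint64_t msb, uint64_t lsb)
{
	unsigned int ch[128];
	size_t n, i;

	printf("High temperature on sensors with bit set %#llx %#llx.\n",
	       (unsigned long long)msb, (unsigned long long)lsb);

	n = collect_channels(lsb, 0, ch, 128);
	n += collect_channels(msb, 64, ch + n, 128 - n);

	for (i = 0; i < n; i++)
		printf("Sensor name[%u]: %s\n", ch[i], get_sensor_name(ch[i]));
	return n;
}
```

Calling report_sensors(0x1, 0x5) in this sketch reports channels 0, 2 and 64: the raw bitmap stays in the first line, and each set bit gets its own name line, which mirrors the message layout the patch introduces.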

Comments

Simon Horman Feb. 15, 2025, 7:29 p.m. UTC | #1
On Thu, Feb 13, 2025 at 11:46:41AM +0200, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> Previously, a temperature event message included a bitmap indicating
> which sensors detect high temperatures.
> 
> To enhance clarity, we modify the message format to explicitly list
> the names of the overheating sensors, alongside the sensors bitmap.
> If HWMON is not configured, the event message remains unchanged.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>

...

> +#if IS_ENABLED(CONFIG_HWMON)
> +static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
> +					  u64 bit_set, int bit_set_offset)
> +{
> +	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
> +	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
> +	int i;
> +
> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> +
> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> +	}
> +}

nit:

If you have to respin for some other reason, please consider limiting lines
to 80 columns wide or less here and elsewhere in this patch where it
doesn't reduce readability (subjective I know).

e.g.:

static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev,
                                          struct mlx5_hwmon *hwmon,
                                          u64 bit_set, int bit_set_offset)
{
        unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
        int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
        int i;

        for_each_set_bit(i, bit_set_ptr, num_bits) {
                const char *sensor_name;

                sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);

                mlx5_core_warn(dev, "Sensor name[%d]: %s\n",
                               i + bit_set_offset, sensor_name);
        }
}

...

Jakub Kicinski Feb. 18, 2025, 12:27 a.m. UTC | #2
On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
> > +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> > +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> > +
> > +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> > +	}
> > +}  
> 
> nit:
> 
> If you have to respin for some other reason, please consider limiting lines
> to 80 columns wide or less here and elsewhere in this patch where it
> doesn't reduce readability (subjective I know).

+1, please try to catch such situations going forward

Tariq Toukan Feb. 19, 2025, 1 p.m. UTC | #3
On 18/02/2025 2:27, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
>>> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
>>> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
>>> +
>>> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
>>> +	}
>>> +}
>>
>> nit:
>>
>> If you have to respin for some other reason, please consider limiting lines
>> to 80 columns wide or less here and elsewhere in this patch where it
>> doesn't reduce readability (subjective I know).
> 
> +1, please try to catch such situations going forward
> 

Hi Jakub,

This was not missed.
This is not a new thing...
We've been enforcing a max line length of 100 chars in mlx5 driver for 
the past few years.
I don't have the full image now, but I'm convinced that this dates back 
to an agreement between the mlx5 and netdev maintainers at that time.

80 chars could be too restrictive, especially with today's large 
monitors, while 100-chars is still highly readable.
This is subjective of course...

If you don't have a strong preference, we'll keep the current 100 chars 
limit. Otherwise, just let me know and we'll start enforcing the 
80-chars limit for future patches.

Regards,
Tariq

Jakub Kicinski Feb. 19, 2025, 3:28 p.m. UTC | #4
On Wed, 19 Feb 2025 15:00:57 +0200 Tariq Toukan wrote:
> >> If you have to respin for some other reason, please consider limiting lines
> >> to 80 columns wide or less here and elsewhere in this patch where it
> >> doesn't reduce readability (subjective I know).  
> > 
> > +1, please try to catch such situations going forward
> 
> This was not missed.
> This is not a new thing...
> We've been enforcing a max line length of 100 chars in mlx5 driver for 
> the past few years.
> I don't have the full image now, but I'm convinced that this dates back 
> to an agreement between the mlx5 and netdev maintainers at that time.
> 
> 80 chars could be too restrictive, especially with today's large 
> monitors, while 100-chars is still highly readable.
> This is subjective of course...
> 
> If you don't have a strong preference, we'll keep the current 100 chars 
> limit. Otherwise, just let me know and we'll start enforcing the 
> 80-chars limit for future patches.

Right, I think mlx5 is the only exception to the 80 column guidance.
I don't think it's resulting in more readable code, so yes, my
preference is to end this experiment.
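
For readers weighing the 80 versus 100 column limits discussed above, here is a rough sketch of how a line's width can be measured when tabs are expanded to 8-column stops, as kernel coding style specifies. The helpers display_width() and exceeds_limit() are hypothetical, assume ASCII input, and are not how checkpatch itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_WIDTH 8

/* Return the display width of one line, advancing each tab to the next
 * multiple of TAB_WIDTH. Counting stops at a newline or the end of the
 * string. Assumes one column per byte (ASCII).
 */
static size_t display_width(const char *line)
{
	size_t col = 0;

	assert(line != NULL);
	for (; *line && *line != '\n'; line++) {
		if (*line == '\t')
			col += TAB_WIDTH - (col % TAB_WIDTH);
		else
			col++;
	}
	return col;
}

/* Non-zero if the line is wider than limit columns (e.g. 80 or 100). */
static int exceeds_limit(const char *line, size_t limit)
{
	return display_width(line) > limit;
}
```

Under these assumptions, a line indented with two tabs starts its text at column 16, so deeply nested code reaches 80 columns well before it contains 80 characters.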

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index e85a9042e3c2..01c5f5990f9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -6,6 +6,7 @@ 
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/events.h"
+#include "hwmon.h"
 
 struct mlx5_event_nb {
 	struct mlx5_nb  nb;
@@ -153,11 +154,28 @@  static int any_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+#if IS_ENABLED(CONFIG_HWMON)
+static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
+					  u64 bit_set, int bit_set_offset)
+{
+	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
+	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
+	int i;
+
+	for_each_set_bit(i, bit_set_ptr, num_bits) {
+		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
+
+		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
+	}
+}
+#endif /* CONFIG_HWMON */
+
 /* type == MLX5_EVENT_TYPE_TEMP_WARN_EVENT */
 static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 {
 	struct mlx5_event_nb *event_nb = mlx5_nb_cof(nb, struct mlx5_event_nb, nb);
 	struct mlx5_events   *events   = event_nb->ctx;
+	struct mlx5_core_dev *dev      = events->dev;
 	struct mlx5_eqe      *eqe      = data;
 	u64 value_lsb;
 	u64 value_msb;
@@ -169,10 +187,17 @@  static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 	value_lsb &= 0x1;
 	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
 
-	if (net_ratelimit())
-		mlx5_core_warn(events->dev,
-			       "High temperature on sensors with bit set %#llx %#llx",
+	if (net_ratelimit()) {
+		mlx5_core_warn(dev, "High temperature on sensors with bit set %#llx %#llx.\n",
 			       value_msb, value_lsb);
+#if IS_ENABLED(CONFIG_HWMON)
+		if (dev->hwmon) {
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_lsb, 0);
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_msb,
+						      sizeof(value_lsb) * BITS_PER_BYTE);
+		}
+#endif
+	}
 
 	return NOTIFY_OK;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
index 353f81dccd1c..4ba2636d7fb6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
@@ -416,3 +416,8 @@  void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev)
 	mlx5_hwmon_free(hwmon);
 	mdev->hwmon = NULL;
 }
+
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel)
+{
+	return hwmon->temp_channel_desc[channel].sensor_name;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
index 999654a9b9da..f38271c22c10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
@@ -10,6 +10,7 @@ 
 
 int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev);
 void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev);
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel);
 
 #else
 static inline int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev)