diff mbox series

[1/1] hwmon (occ): Add temp sensor value check

Message ID 1555500396-11976-1-git-send-email-a.amelkin@yadro.com (mailing list archive)
State Changes Requested
Headers show
Series [1/1] hwmon (occ): Add temp sensor value check | expand

Commit Message

Alexander Amelkin April 17, 2019, 11:26 a.m. UTC
From: Alexander Soldatov <a.soldatov@yadro.com>

The occ driver supports two formats for the temp sensor value.

The OCC firmware for P8 supports only the first format, for which
no range checking or error processing is performed in the driver.
Inspecting the OCC sources for P8 reveals that OCC may send
a special value 0xFFFF to indicate that a sensor read timeout
has occured, see

https://github.com/open-power/occ/blob/master_p8/src/occ/cmdh/cmdh_fsp_cmds.c#L395

That situation wasn't handled in the driver. This patch adds invalid
temp value check for the sensor data format 1 and handles it the same
way as it is done for the format 2, where EREMOTEIO is reported for
this case.

Signed-off-by: Alexander Soldatov <a.soldatov@yadro.com>
Signed-off-by: Alexander Amelkin <a.amelkin@yadro.com>
Reviewed-by: Alexander Amelkin <a.amelkin@yadro.com>
Cc: Edward A. James <eajames@us.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
---
 drivers/hwmon/occ/common.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Guenter Roeck April 17, 2019, 1:03 p.m. UTC | #1
On 4/17/19 4:26 AM, Alexander Amelkin wrote:
> From: Alexander Soldatov <a.soldatov@yadro.com>
> 
> The occ driver supports two formats for the temp sensor value.
> 
> The OCC firmware for P8 supports only the first format, for which
> no range checking or error processing is performed in the driver.
> Inspecting the OCC sources for P8 reveals that OCC may send
> a special value 0xFFFF to indicate that a sensor read timeout
> has occured, see
> 
occurred

> https://github.com/open-power/occ/blob/master_p8/src/occ/cmdh/cmdh_fsp_cmds.c#L395
> 
> That situation wasn't handled in the driver. This patch adds invalid
> temp value check for the sensor data format 1 and handles it the same
> way as it is done for the format 2, where EREMOTEIO is reported for
> this case.
>
ETIMEDOUT ? Though that is really a corner case, so I guess both are fine.

> Signed-off-by: Alexander Soldatov <a.soldatov@yadro.com>
> Signed-off-by: Alexander Amelkin <a.amelkin@yadro.com>
> Reviewed-by: Alexander Amelkin <a.amelkin@yadro.com>
> Cc: Edward A. James <eajames@us.ibm.com>
> Cc: Joel Stanley <joel@jms.id.au>
> ---
>   drivers/hwmon/occ/common.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 4679acb..825631c 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -235,6 +235,10 @@ static ssize_t occ_show_temp_1(struct device *dev,
>   		val = get_unaligned_be16(&temp->sensor_id);
>   		break;
>   	case 1:
> +		/* If a sensor timed out long enough,

"timed out" is sufficient. "timed out long enough" is difficult to understand.

> +		   OCC returns 0xFFFF for that sensor.*/

/*
  * This is how multi-line comments look like
  */

Please run checkpatch on your patches.

Thanks,
Guenter

> +		if (temp->value == 0xFFFF)
> +			return -EREMOTEIO;
>   		val = get_unaligned_be16(&temp->value) * 1000;
>   		break;
>   	default:
>
Alexander Amelkin April 17, 2019, 5:46 p.m. UTC | #2
17.04.2019 16:03, Guenter Roeck wrote:
> On 4/17/19 4:26 AM, Alexander Amelkin wrote:
>> Inspecting the OCC sources for P8 reveals that OCC may send
>> a special value 0xFFFF to indicate that a sensor read timeout
>> has occured, see
>>
> occurred

Yup. A typo. Will fix.

>
>> https://github.com/open-power/occ/blob/master_p8/src/occ/cmdh/cmdh_fsp_cmds.c#L395
>>
>> That situation wasn't handled in the driver. This patch adds invalid
>> temp value check for the sensor data format 1 and handles it the same
>> way as it is done for the format 2, where EREMOTEIO is reported for
>> this case.
>>
> ETIMEDOUT ? Though that is really a corner case, so I guess both are fine.

We just reused the error code used for the same case for format 2 in common.c:309 (inside occ_show_temp_2() function).

We thought it would be strange to report different codes for the same case in different format versions.

Besides, it's quite a remote I/O error indeed.

>
>> Signed-off-by: Alexander Soldatov <a.soldatov@yadro.com>
>> Signed-off-by: Alexander Amelkin <a.amelkin@yadro.com>
>> Reviewed-by: Alexander Amelkin <a.amelkin@yadro.com>
>> Cc: Edward A. James <eajames@us.ibm.com>
>> Cc: Joel Stanley <joel@jms.id.au>
>> ---
>>   drivers/hwmon/occ/common.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
>> index 4679acb..825631c 100644
>> --- a/drivers/hwmon/occ/common.c
>> +++ b/drivers/hwmon/occ/common.c
>> @@ -235,6 +235,10 @@ static ssize_t occ_show_temp_1(struct device *dev,
>>           val = get_unaligned_be16(&temp->sensor_id);
>>           break;
>>       case 1:
>> +        /* If a sensor timed out long enough,
>
> "timed out" is sufficient. "timed out long enough" is difficult to understand.
Agreed. That's a weird wording, but I double-checked before submitting that it was just a copy-paste of the wording from the OCC sources for this case. You're probably right though that it's better to fix it.
>
>> +           OCC returns 0xFFFF for that sensor.*/
>
> /*
>  * This is how multi-line comments look like
>  */
>
> Please run checkpatch on your patches.

Mea culpa. Didn't check. Will fix tomorrow.

>
> Thanks,
> Guenter
>
Thanks.

With best regards,
Alexander Amelkin,
Leading BMC Software Engineer, YADRO
https://yadro.com
diff mbox series

Patch

diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index 4679acb..825631c 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -235,6 +235,10 @@  static ssize_t occ_show_temp_1(struct device *dev,
 		val = get_unaligned_be16(&temp->sensor_id);
 		break;
 	case 1:
+		/* If a sensor timed out long enough,
+		   OCC returns 0xFFFF for that sensor.*/
+		if (temp->value == 0xFFFF)
+			return -EREMOTEIO;
 		val = get_unaligned_be16(&temp->value) * 1000;
 		break;
 	default: