Message ID | 20250304-ntc_oob-v1-1-600d8992478d@gocontroll.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | hwmon: (ntc_thermistor) return error instead of clipping on OOB | expand |
On 3/4/25 06:10, Maud Spierings via B4 Relay wrote: > From: Maud Spierings <maudspierings@gocontroll.com> > > When the ntc is reading Out Of Bounds instead of clipping to the nearest > limit (min/max) return -ENODATA. This prevents malfunctioning sensors > from sending a device into a shutdown loop due to a critical trip. > > This implementation will only work for ntc type thermistors if a ptc > type is to be implemented the min/max ohm calculation must be adjusted > to take that into account. > > Signed-off-by: Maud Spierings <maudspierings@gocontroll.com> > --- > This patch is a continuation of another discussion [1]. I felt like it > should be a new patch, not a v2 as this is a very different change. > > I have left the clamping of n to INT_MAX in the code with a comment, but > it may be possible to find a better solution to it. One I thought of is > to make the ohm field of the ntc_compensation struct a signed int as > well. It would get rid of this weird edge case, but it doesn't make > sense to allow for negative resistances to be entered into the sensor > table. > > Currently the only feedback this provides to the user is when they > manually try to read the temperature and it returns the error. I have > added a simple printk to these error points to see how spammy it gets > and this is the result: > > dmesg | grep hwmon > [ 4.982682] hwmon: sensor out of bounds > [ 5.249758] hwmon: sensor out of bounds > [ 5.633729] hwmon: sensor out of bounds > [ 6.215285] hwmon: sensor out of bounds > [ 7.073882] hwmon: sensor out of bounds > [ 7.486620] hwmon: sensor out of bounds > [ 8.833765] hwmon: sensor out of bounds > [ 10.785969] hwmon: sensor out of bounds > [ 13.793722] hwmon: sensor out of bounds > [ 16.761124] hwmon: sensor out of bounds > [ 17.889706] hwmon: sensor out of bounds > [ 25.057715] hwmon: sensor out of bounds > [ 35.041725] hwmon: sensor out of bounds > [ 50.110346] hwmon: sensor out of bounds > [ 72.945283] hwmon: sensor out of bounds > [ 105.712619] hwmon: sensor out of bounds > [ 154.863976] hwmon: sensor out of bounds > [ 164.937104] hwmon: sensor out of bounds > [ 228.590909] hwmon: sensor out of bounds > [ 315.365777] hwmon: sensor out of bounds > [ 464.718403] hwmon: sensor out of bounds > [ 615.079123] hwmon: sensor out of bounds > [ 764.496780] hwmon: sensor out of bounds > > This is with polling-delay set to 1000, it seems to rate-limit itself? > But I feel there should be a better way to communicate the potential > sensor failure to the user, but I can't think of anything. > Hmm ... we could add "fault" attributes and report a sensor fault if uv == 0 or uv >= puv, together with the -ENODATA temperature reading. That could distinguish the fault from the "resistor value out of range" case. > [1]: https://lore.kernel.org/all/20250304-ntc_min_max-v1-1-b08e70e56459@gocontroll.com/ Most of the above should be after "---" since it is irrelevant for the commit log. > --- > drivers/hwmon/ntc_thermistor.c | 19 +++++++++++++------ > 1 file changed, 13 insertions(+), 6 deletions(-) > > diff --git a/drivers/hwmon/ntc_thermistor.c b/drivers/hwmon/ntc_thermistor.c > index 0d29c8f97ba7c2f264588b6309b91ca494012ad6..311a60498d409ea068a18590415b2d5b43e73eb1 100644 > --- a/drivers/hwmon/ntc_thermistor.c > +++ b/drivers/hwmon/ntc_thermistor.c > @@ -387,12 +387,9 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) > puo = data->pullup_ohm; > pdo = data->pulldown_ohm; > > - if (uv == 0) > - return (data->connect == NTC_CONNECTED_POSITIVE) ? > - INT_MAX : 0; > - if (uv >= puv) > - return (data->connect == NTC_CONNECTED_POSITIVE) ? > - 0 : INT_MAX; > + /* faulty adc value */ > + if (uv == 0 || uv >= puv) > + return -ENODATA; > > if (data->connect == NTC_CONNECTED_POSITIVE && puo == 0) > n = div_u64(pdo * (puv - uv), uv); > @@ -404,6 +401,16 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) > else > n = div64_u64_safe(pdo * puo * uv, pdo * (puv - uv) - puo * uv); > > + /* sensor out of bounds */ > + if (n > data->comp[0].ohm || n < data->comp[data->n_comp-1].ohm) > + return -ENODATA; > + > + /* > + * theoretically data->comp[0].ohm can be greater than INT_MAX as it is an > + * unsigned integer, but it doesn't make any sense for it to be so as the > + * maximum return value of this function is INT_MAX, so it will never be > + * able to properly calculate that temperature. > + */ > if (n > INT_MAX) > n = INT_MAX; I am not a friend of theoretic code or comments like this. Please drop. The original code was intended to catch out-of-bounds values which would otherwise have been reported as error, not to catch unrealistic resistor values in the compensation tables. Thanks, Guenter
From: Guenter Roeck <groeck7@gmail.com> on behalf of Guenter Roeck <linux@roeck-us.net> Sent: Tuesday, March 4, 2025 4:04 PM >On 3/4/25 06:10, Maud Spierings via B4 Relay wrote: >> From: Maud Spierings <maudspierings@gocontroll.com> >> >> When the ntc is reading Out Of Bounds instead of clipping to the nearest >> limit (min/max) return -ENODATA. This prevents malfunctioning sensors >> from sending a device into a shutdown loop due to a critical trip. >> >> This implementation will only work for ntc type thermistors if a ptc >> type is to be implemented the min/max ohm calculation must be adjusted >> to take that into account. >> >> Signed-off-by: Maud Spierings <maudspierings@gocontroll.com> >> --- >> This patch is a continuation of another discussion [1]. I felt like it >> should be a new patch, not a v2 as this is a very different change. thoughts about the theoretic code comment here. >> I have left the clamping of n to INT_MAX in the code with a comment, but >> it may be possible to find a better solution to it. One I thought of is >> to make the ohm field of the ntc_compensation struct a signed int as >> well. It would get rid of this weird edge case, but it doesn't make >> sense to allow for negative resistances to be entered into the sensor >> table. >> >> Currently the only feedback this provides to the user is when they >> manually try to read the temperature and it returns the error. I have >> added a simple printk to these error points to see how spammy it gets >> and this is the result: >> >> dmesg | grep hwmon >> [ 4.982682] hwmon: sensor out of bounds >> [ 5.249758] hwmon: sensor out of bounds >> [ 5.633729] hwmon: sensor out of bounds >> [ 6.215285] hwmon: sensor out of bounds >> [ 7.073882] hwmon: sensor out of bounds >> [ 7.486620] hwmon: sensor out of bounds >> [ 8.833765] hwmon: sensor out of bounds >> [ 10.785969] hwmon: sensor out of bounds >> [ 13.793722] hwmon: sensor out of bounds >> [ 16.761124] hwmon: sensor out of bounds >> [ 17.889706] hwmon: sensor out of bounds >> [ 25.057715] hwmon: sensor out of bounds >> [ 35.041725] hwmon: sensor out of bounds >> [ 50.110346] hwmon: sensor out of bounds >> [ 72.945283] hwmon: sensor out of bounds >> [ 105.712619] hwmon: sensor out of bounds >> [ 154.863976] hwmon: sensor out of bounds >> [ 164.937104] hwmon: sensor out of bounds >> [ 228.590909] hwmon: sensor out of bounds >> [ 315.365777] hwmon: sensor out of bounds >> [ 464.718403] hwmon: sensor out of bounds >> [ 615.079123] hwmon: sensor out of bounds >> [ 764.496780] hwmon: sensor out of bounds >> >> This is with polling-delay set to 1000, it seems to rate-limit itself? >> But I feel there should be a better way to communicate the potential >> sensor failure to the user, but I can't think of anything. >> > >Hmm ... we could add "fault" attributes and report a sensor fault >if uv == 0 or uv >= puv, together with the -ENODATA temperature reading. >That could distinguish the fault from the "resistor value out of range" case. I've done some working around that fault attribute, but I can't seem to find a satisfactory way to implement it. This fault also is not triggered if anything is disconnected, maybe the 0 case but I can't even get that. I think leaving it at this current solution is good for now. >> [1]: https://lore.kernel.org/all/20250304-ntc_min_max-v1-1-b08e70e56459@gocontroll.com/ > >Most of the above should be after "---" since it is irrelevant for the commit log. I believe my cover letter seperated it correctly after my signed-off-by tag. >> --- >> drivers/hwmon/ntc_thermistor.c | 19 +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/hwmon/ntc_thermistor.c b/drivers/hwmon/ntc_thermistor.c >> index 0d29c8f97ba7c2f264588b6309b91ca494012ad6..311a60498d409ea068a18590415b2d5b43e73eb1 100644 >> --- a/drivers/hwmon/ntc_thermistor.c >> +++ b/drivers/hwmon/ntc_thermistor.c >> @@ -387,12 +387,9 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) >> puo = data->pullup_ohm; >> pdo = data->pulldown_ohm; >> >> - if (uv == 0) >> - return (data->connect == NTC_CONNECTED_POSITIVE) ? >> - INT_MAX : 0; >> - if (uv >= puv) >> - return (data->connect == NTC_CONNECTED_POSITIVE) ? >> - 0 : INT_MAX; >> + /* faulty adc value */ >> + if (uv == 0 || uv >= puv) >> + return -ENODATA; >> >> if (data->connect == NTC_CONNECTED_POSITIVE && puo == 0) >> n = div_u64(pdo * (puv - uv), uv); >> @@ -404,6 +401,16 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) >> else >> n = div64_u64_safe(pdo * puo * uv, pdo * (puv - uv) - puo * uv); >> >> + /* sensor out of bounds */ >> + if (n > data->comp[0].ohm || n < data->comp[data->n_comp-1].ohm) >> + return -ENODATA; >> + >> + /* >> + * theoretically data->comp[0].ohm can be greater than INT_MAX as it is an >> + * unsigned integer, but it doesn't make any sense for it to be so as the >> + * maximum return value of this function is INT_MAX, so it will never be >> + * able to properly calculate that temperature. >> + */ >> if (n > INT_MAX) >> n = INT_MAX; > >I am not a friend of theoretic code or comments like this. Please drop. >The original code was intended to catch out-of-bounds values which would >otherwise have been reported as error, not to catch unrealistic resistor >values in the compensation tables. So, drop the check and comment? Or just drop the comment? I was thinking to fully remove that check in an earlier comment in my cover letter. Kind regards, Maud
On Thu, Mar 06, 2025 at 01:35:53PM +0000, Maud Spierings | GOcontroll wrote: > > > >Most of the above should be after "---" since it is irrelevant for the commit log. > > I believe my cover letter seperated it correctly after my signed-off-by > tag. > Yes, you are correct. I missed the second (or, rather, first) "---" above. Sorry for the noise. > >> + /* > >> + * theoretically data->comp[0].ohm can be greater than INT_MAX as it is an > >> + * unsigned integer, but it doesn't make any sense for it to be so as the > >> + * maximum return value of this function is INT_MAX, so it will never be > >> + * able to properly calculate that temperature. > >> + */ > >> if (n > INT_MAX) > >> n = INT_MAX; > > > >I am not a friend of theoretic code or comments like this. Please drop. > >The original code was intended to catch out-of-bounds values which would > >otherwise have been reported as error, not to catch unrealistic resistor > >values in the compensation tables. > > So, drop the check and comment? Or just drop the comment? I was thinking > to fully remove that check in an earlier comment in my cover letter. Drop both. Thanks, Guenter
diff --git a/drivers/hwmon/ntc_thermistor.c b/drivers/hwmon/ntc_thermistor.c index 0d29c8f97ba7c2f264588b6309b91ca494012ad6..311a60498d409ea068a18590415b2d5b43e73eb1 100644 --- a/drivers/hwmon/ntc_thermistor.c +++ b/drivers/hwmon/ntc_thermistor.c @@ -387,12 +387,9 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) puo = data->pullup_ohm; pdo = data->pulldown_ohm; - if (uv == 0) - return (data->connect == NTC_CONNECTED_POSITIVE) ? - INT_MAX : 0; - if (uv >= puv) - return (data->connect == NTC_CONNECTED_POSITIVE) ? - 0 : INT_MAX; + /* faulty adc value */ + if (uv == 0 || uv >= puv) + return -ENODATA; if (data->connect == NTC_CONNECTED_POSITIVE && puo == 0) n = div_u64(pdo * (puv - uv), uv); @@ -404,6 +401,16 @@ static int get_ohm_of_thermistor(struct ntc_data *data, unsigned int uv) else n = div64_u64_safe(pdo * puo * uv, pdo * (puv - uv) - puo * uv); + /* sensor out of bounds */ + if (n > data->comp[0].ohm || n < data->comp[data->n_comp-1].ohm) + return -ENODATA; + + /* + * theoretically data->comp[0].ohm can be greater than INT_MAX as it is an + * unsigned integer, but it doesn't make any sense for it to be so as the + * maximum return value of this function is INT_MAX, so it will never be + * able to properly calculate that temperature. + */ if (n > INT_MAX) n = INT_MAX; return n;