Message ID | 20220221164434.4169560-1-linux@roeck-us.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | hwmon: Handle failure to register sensor with thermal zone correctly | expand |
21.02.2022 19:44, Guenter Roeck пишет: > diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c > index 3501a3ead4ba..4bfe3791a5ba 100644 > --- a/drivers/hwmon/hwmon.c > +++ b/drivers/hwmon/hwmon.c > @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index) > > tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, > &hwmon_thermal_ops); > - /* > - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, > - * so ignore that error but forward any other error. > - */ > - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) > - return PTR_ERR(tzd); > + if (IS_ERR(tzd)) { > + if (PTR_ERR(tzd) != -ENODEV) > + return PTR_ERR(tzd); > + dev_warn(dev, "Failed to register temp%d_input with thermal zone\n", > + index + 1); Do we really need this warning? I suppose it should be okay if sensor isn't attached to any device in a device-tree and just reports temperature.
On 2/21/22 08:56, Dmitry Osipenko wrote: > 21.02.2022 19:44, Guenter Roeck пишет: >> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c >> index 3501a3ead4ba..4bfe3791a5ba 100644 >> --- a/drivers/hwmon/hwmon.c >> +++ b/drivers/hwmon/hwmon.c >> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index) >> >> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, >> &hwmon_thermal_ops); >> - /* >> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, >> - * so ignore that error but forward any other error. >> - */ >> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) >> - return PTR_ERR(tzd); >> + if (IS_ERR(tzd)) { >> + if (PTR_ERR(tzd) != -ENODEV) >> + return PTR_ERR(tzd); >> + dev_warn(dev, "Failed to register temp%d_input with thermal zone\n", >> + index + 1); > > Do we really need this warning? I suppose it should be okay if sensor > isn't attached to any device in a device-tree and just reports temperature. I'd rather leave it there for the time being. It will only affect devicetree systems (turns out there is already a check for of_node elsewhere). Thermal zone specification is not always easy and there may be a mismatch between what is reported by the driver and what the user (programmer) expects to see (which I think is what happens here). I don't want to silently ignore such problems without any notification. We could make it dev_notice and/or change the message (instead of "Failed to ..." just say "temp%d_input not registered with thermal zone" , maybe ?). Guenter
21.02.2022 20:12, Guenter Roeck пишет: > On 2/21/22 08:56, Dmitry Osipenko wrote: >> 21.02.2022 19:44, Guenter Roeck пишет: >>> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c >>> index 3501a3ead4ba..4bfe3791a5ba 100644 >>> --- a/drivers/hwmon/hwmon.c >>> +++ b/drivers/hwmon/hwmon.c >>> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct >>> device *dev, int index) >>> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, >>> &hwmon_thermal_ops); >>> - /* >>> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, >>> - * so ignore that error but forward any other error. >>> - */ >>> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) >>> - return PTR_ERR(tzd); >>> + if (IS_ERR(tzd)) { >>> + if (PTR_ERR(tzd) != -ENODEV) >>> + return PTR_ERR(tzd); >>> + dev_warn(dev, "Failed to register temp%d_input with thermal >>> zone\n", >>> + index + 1); >> >> Do we really need this warning? I suppose it should be okay if sensor >> isn't attached to any device in a device-tree and just reports >> temperature. > > I'd rather leave it there for the time being. It will only affect > devicetree > systems (turns out there is already a check for of_node elsewhere). Thermal > zone specification is not always easy and there may be a mismatch between > what is reported by the driver and what the user (programmer) expects to > see (which I think is what happens here). I don't want to silently > ignore such problems without any notification. > > We could make it dev_notice and/or change the message (instead of > "Failed to > ..." just say "temp%d_input not registered with thermal zone" , maybe ?). I'd change it to: dev_info(dev, "temp%d_input not attached to any thermal zone\n", index + 1); I'd also add an info message to print out to which tzd attachment happened.
On 2/21/22 09:59, Dmitry Osipenko wrote: > 21.02.2022 20:12, Guenter Roeck пишет: >> On 2/21/22 08:56, Dmitry Osipenko wrote: >>> 21.02.2022 19:44, Guenter Roeck пишет: >>>> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c >>>> index 3501a3ead4ba..4bfe3791a5ba 100644 >>>> --- a/drivers/hwmon/hwmon.c >>>> +++ b/drivers/hwmon/hwmon.c >>>> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct >>>> device *dev, int index) >>>> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, >>>> &hwmon_thermal_ops); >>>> - /* >>>> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, >>>> - * so ignore that error but forward any other error. >>>> - */ >>>> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) >>>> - return PTR_ERR(tzd); >>>> + if (IS_ERR(tzd)) { >>>> + if (PTR_ERR(tzd) != -ENODEV) >>>> + return PTR_ERR(tzd); >>>> + dev_warn(dev, "Failed to register temp%d_input with thermal >>>> zone\n", >>>> + index + 1); >>> >>> Do we really need this warning? I suppose it should be okay if sensor >>> isn't attached to any device in a device-tree and just reports >>> temperature. >> >> I'd rather leave it there for the time being. It will only affect >> devicetree >> systems (turns out there is already a check for of_node elsewhere). Thermal >> zone specification is not always easy and there may be a mismatch between >> what is reported by the driver and what the user (programmer) expects to >> see (which I think is what happens here). I don't want to silently >> ignore such problems without any notification. >> >> We could make it dev_notice and/or change the message (instead of >> "Failed to >> ..." just say "temp%d_input not registered with thermal zone" , maybe ?). > > I'd change it to: > > dev_info(dev, "temp%d_input not attached to any thermal zone\n", index + 1); > Makes sense. I just sent v2 with that change. > I'd also add an info message to print out to which tzd attachment happened. Oh, I don't know. Kernel logs are already way too noisy. Either case that would not be a bug fix, so such a change should not be part of this patch. Thanks, Guenter
diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c index 3501a3ead4ba..4bfe3791a5ba 100644 --- a/drivers/hwmon/hwmon.c +++ b/drivers/hwmon/hwmon.c @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index) tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, &hwmon_thermal_ops); - /* - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, - * so ignore that error but forward any other error. - */ - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) - return PTR_ERR(tzd); + if (IS_ERR(tzd)) { + if (PTR_ERR(tzd) != -ENODEV) + return PTR_ERR(tzd); + dev_warn(dev, "Failed to register temp%d_input with thermal zone\n", + index + 1); + devm_kfree(dev, tdata); + return 0; + } err = devm_add_action(dev, hwmon_thermal_remove_sensor, &tdata->node); if (err)
If an attempt is made to a sensor with a thermal zone and it fails, the call to devm_thermal_zone_of_sensor_register() may return -ENODEV. This may result in crashes similar to the following. Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd ... Internal error: Oops: 96000021 [#1] PREEMPT SMP ... pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mutex_lock+0x18/0x60 lr : thermal_zone_device_update+0x40/0x2e0 sp : ffff800014c4fc60 x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790 x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000 x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000 x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040 x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000 x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd Call trace: mutex_lock+0x18/0x60 hwmon_notify_event+0xfc/0x110 0xffffdde1cb7a0a90 0xffffdde1cb7a0b7c irq_thread_fn+0x2c/0xa0 irq_thread+0x134/0x240 kthread+0x178/0x190 ret_from_fork+0x10/0x20 Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02) Jon Hunter reports that the exact call sequence is: hwmon_notify_event() --> hwmon_thermal_notify() --> thermal_zone_device_update() --> update_temperature() --> mutex_lock() The hwmon core needs to handle all errors returned from calls to devm_thermal_zone_of_sensor_register(). If the call fails with -ENODEV, report the problem as warning but continue to register the hwmon device. Reported-by: Jon Hunter <jonathanh@nvidia.com> Cc: Dmitry Osipenko <digetx@gmail.com> Fixes: 1597b374af222 ("hwmon: Add notification support") Signed-off-by: Guenter Roeck <linux@roeck-us.net> --- drivers/hwmon/hwmon.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)