Message ID | 20220221182209.1795242-1-linux@roeck-us.net (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [v2] hwmon: Handle failure to register sensor with thermal zone correctly | expand |
On 2/21/22 21:22, Guenter Roeck wrote: > If an attempt is made to a sensor with a thermal zone and it fails, > the call to devm_thermal_zone_of_sensor_register() may return -ENODEV. > This may result in crashes similar to the following. > > Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd > ... > Internal error: Oops: 96000021 [#1] PREEMPT SMP > ... > pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : mutex_lock+0x18/0x60 > lr : thermal_zone_device_update+0x40/0x2e0 > sp : ffff800014c4fc60 > x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790 > x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000 > x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd > x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000 > x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040 > x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 > x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000 > x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd > x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd > Call trace: > mutex_lock+0x18/0x60 > hwmon_notify_event+0xfc/0x110 > 0xffffdde1cb7a0a90 > 0xffffdde1cb7a0b7c > irq_thread_fn+0x2c/0xa0 > irq_thread+0x134/0x240 > kthread+0x178/0x190 > ret_from_fork+0x10/0x20 > Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02) > > Jon Hunter reports that the exact call sequence is: > > hwmon_notify_event() > --> hwmon_thermal_notify() > --> thermal_zone_device_update() > --> update_temperature() > --> mutex_lock() > > The hwmon core needs to handle all errors returned from calls > to devm_thermal_zone_of_sensor_register(). If the call fails > with -ENODEV, report that the sensor was not attached to a > thermal zone but continue to register the hwmon device. > > Reported-by: Jon Hunter <jonathanh@nvidia.com> > Cc: Dmitry Osipenko <digetx@gmail.com> > Fixes: 1597b374af222 ("hwmon: Add notification support") > Signed-off-by: Guenter Roeck <linux@roeck-us.net> > --- > v2: Use dev_info instead of dev_warn, and change message to be > less alarming. > > drivers/hwmon/hwmon.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c > index 3501a3ead4ba..3ae961986fc3 100644 > --- a/drivers/hwmon/hwmon.c > +++ b/drivers/hwmon/hwmon.c > @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index) > > tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, > &hwmon_thermal_ops); > - /* > - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, > - * so ignore that error but forward any other error. > - */ > - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) > - return PTR_ERR(tzd); > + if (IS_ERR(tzd)) { > + if (PTR_ERR(tzd) != -ENODEV) > + return PTR_ERR(tzd); > + dev_info(dev, "temp%d_input not attached to any thermal zone\n", > + index + 1); > + devm_kfree(dev, tdata); > + return 0; > + } > > err = devm_add_action(dev, hwmon_thermal_remove_sensor, &tdata->node); > if (err) LGTM, thank you. But still needs a t-b from Jon. Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
On 2/21/22 10:49, Dmitry Osipenko wrote: > > On 2/21/22 21:22, Guenter Roeck wrote: >> If an attempt is made to a sensor with a thermal zone and it fails, >> the call to devm_thermal_zone_of_sensor_register() may return -ENODEV. >> This may result in crashes similar to the following. >> >> Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd >> ... >> Internal error: Oops: 96000021 [#1] PREEMPT SMP >> ... >> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : mutex_lock+0x18/0x60 >> lr : thermal_zone_device_update+0x40/0x2e0 >> sp : ffff800014c4fc60 >> x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790 >> x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000 >> x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd >> x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000 >> x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040 >> x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000 >> x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 >> x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000 >> x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd >> x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd >> Call trace: >> mutex_lock+0x18/0x60 >> hwmon_notify_event+0xfc/0x110 >> 0xffffdde1cb7a0a90 >> 0xffffdde1cb7a0b7c >> irq_thread_fn+0x2c/0xa0 >> irq_thread+0x134/0x240 >> kthread+0x178/0x190 >> ret_from_fork+0x10/0x20 >> Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02) >> >> Jon Hunter reports that the exact call sequence is: >> >> hwmon_notify_event() >> --> hwmon_thermal_notify() >> --> thermal_zone_device_update() >> --> update_temperature() >> --> mutex_lock() >> >> The hwmon core needs to handle all errors returned from calls >> to devm_thermal_zone_of_sensor_register(). If the call fails >> with -ENODEV, report that the sensor was not attached to a >> thermal zone but continue to register the hwmon device. >> >> Reported-by: Jon Hunter <jonathanh@nvidia.com> >> Cc: Dmitry Osipenko <digetx@gmail.com> >> Fixes: 1597b374af222 ("hwmon: Add notification support") >> Signed-off-by: Guenter Roeck <linux@roeck-us.net> >> --- >> v2: Use dev_info instead of dev_warn, and change message to be >> less alarming. >> >> drivers/hwmon/hwmon.c | 14 ++++++++------ >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c >> index 3501a3ead4ba..3ae961986fc3 100644 >> --- a/drivers/hwmon/hwmon.c >> +++ b/drivers/hwmon/hwmon.c >> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index) >> >> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, >> &hwmon_thermal_ops); >> - /* >> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, >> - * so ignore that error but forward any other error. >> - */ >> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) >> - return PTR_ERR(tzd); >> + if (IS_ERR(tzd)) { >> + if (PTR_ERR(tzd) != -ENODEV) >> + return PTR_ERR(tzd); >> + dev_info(dev, "temp%d_input not attached to any thermal zone\n", >> + index + 1); >> + devm_kfree(dev, tdata); >> + return 0; >> + } >> >> err = devm_add_action(dev, hwmon_thermal_remove_sensor, &tdata->node); >> if (err) > > LGTM, thank you. But still needs a t-b from Jon. > Yes, I'll wait for a tested-by: tag before sending the patch upstream to make sure that it actually fixes the problem. Thanks, Guenter > Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Hi Guenter, On 21/02/2022 18:22, Guenter Roeck wrote: > If an attempt is made to a sensor with a thermal zone and it fails, > the call to devm_thermal_zone_of_sensor_register() may return -ENODEV. > This may result in crashes similar to the following. > > Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd > ... > Internal error: Oops: 96000021 [#1] PREEMPT SMP > ... > pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : mutex_lock+0x18/0x60 > lr : thermal_zone_device_update+0x40/0x2e0 > sp : ffff800014c4fc60 > x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790 > x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000 > x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd > x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000 > x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040 > x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 > x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000 > x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd > x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd > Call trace: > mutex_lock+0x18/0x60 > hwmon_notify_event+0xfc/0x110 > 0xffffdde1cb7a0a90 > 0xffffdde1cb7a0b7c > irq_thread_fn+0x2c/0xa0 > irq_thread+0x134/0x240 > kthread+0x178/0x190 > ret_from_fork+0x10/0x20 > Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02) > > Jon Hunter reports that the exact call sequence is: > > hwmon_notify_event() > --> hwmon_thermal_notify() > --> thermal_zone_device_update() > --> update_temperature() > --> mutex_lock() > > The hwmon core needs to handle all errors returned from calls > to devm_thermal_zone_of_sensor_register(). If the call fails > with -ENODEV, report that the sensor was not attached to a > thermal zone but continue to register the hwmon device. > > Reported-by: Jon Hunter <jonathanh@nvidia.com> > Cc: Dmitry Osipenko <digetx@gmail.com> > Fixes: 1597b374af222 ("hwmon: Add notification support") > Signed-off-by: Guenter Roeck <linux@roeck-us.net> > --- > v2: Use dev_info instead of dev_warn, and change message to be > less alarming. > > drivers/hwmon/hwmon.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c > index 3501a3ead4ba..3ae961986fc3 100644 > --- a/drivers/hwmon/hwmon.c > +++ b/drivers/hwmon/hwmon.c > @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index) > > tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, > &hwmon_thermal_ops); > - /* > - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, > - * so ignore that error but forward any other error. > - */ > - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) > - return PTR_ERR(tzd); > + if (IS_ERR(tzd)) { > + if (PTR_ERR(tzd) != -ENODEV) > + return PTR_ERR(tzd); > + dev_info(dev, "temp%d_input not attached to any thermal zone\n", > + index + 1); > + devm_kfree(dev, tdata); > + return 0; > + } > > err = devm_add_action(dev, hwmon_thermal_remove_sensor, &tdata->node); > if (err) Thanks for the quick fix. Works for me! Tested-by: Jon Hunter <jonathanh@nvidia.com> Jon
diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c index 3501a3ead4ba..3ae961986fc3 100644 --- a/drivers/hwmon/hwmon.c +++ b/drivers/hwmon/hwmon.c @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index) tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata, &hwmon_thermal_ops); - /* - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV, - * so ignore that error but forward any other error. - */ - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV)) - return PTR_ERR(tzd); + if (IS_ERR(tzd)) { + if (PTR_ERR(tzd) != -ENODEV) + return PTR_ERR(tzd); + dev_info(dev, "temp%d_input not attached to any thermal zone\n", + index + 1); + devm_kfree(dev, tdata); + return 0; + } err = devm_add_action(dev, hwmon_thermal_remove_sensor, &tdata->node); if (err)
If an attempt is made to a sensor with a thermal zone and it fails, the call to devm_thermal_zone_of_sensor_register() may return -ENODEV. This may result in crashes similar to the following. Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd ... Internal error: Oops: 96000021 [#1] PREEMPT SMP ... pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mutex_lock+0x18/0x60 lr : thermal_zone_device_update+0x40/0x2e0 sp : ffff800014c4fc60 x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790 x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000 x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000 x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040 x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000 x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd Call trace: mutex_lock+0x18/0x60 hwmon_notify_event+0xfc/0x110 0xffffdde1cb7a0a90 0xffffdde1cb7a0b7c irq_thread_fn+0x2c/0xa0 irq_thread+0x134/0x240 kthread+0x178/0x190 ret_from_fork+0x10/0x20 Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02) Jon Hunter reports that the exact call sequence is: hwmon_notify_event() --> hwmon_thermal_notify() --> thermal_zone_device_update() --> update_temperature() --> mutex_lock() The hwmon core needs to handle all errors returned from calls to devm_thermal_zone_of_sensor_register(). If the call fails with -ENODEV, report that the sensor was not attached to a thermal zone but continue to register the hwmon device. Reported-by: Jon Hunter <jonathanh@nvidia.com> Cc: Dmitry Osipenko <digetx@gmail.com> Fixes: 1597b374af222 ("hwmon: Add notification support") Signed-off-by: Guenter Roeck <linux@roeck-us.net> --- v2: Use dev_info instead of dev_warn, and change message to be less alarming. drivers/hwmon/hwmon.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)