diff mbox

[RESEND,v3,2/2] thermal: core: introduce thermal zone device mode control

Message ID 20180226144118.24693-2-enric.balletbo@collabora.com (mailing list archive)
State Not Applicable, archived
Delegated to: Zhang Rui
Headers show

Commit Message

Enric Balletbo i Serra Feb. 26, 2018, 2:41 p.m. UTC
From: Zhang Rui <rui.zhang@intel.com>

Thermal "mode" sysfs attribute is introduced to enable/disable a thermal
zone, and .get_mode()/.set_mode() callback is introduced for platform
thermal driver to enable/disable the hardware thermal control logic. And
thermal core takes no action upon thermal zone enable/disable.

Actually, this is not quite right because thermal core still pokes those
disabled thermal zones occasionally, e.g. upon system resume.

To fix this, a new flag 'mode' is introduced in struct thermal_zone_device
to represent the thermal zone mode, and several decisions have been made
based on this flag, including
1. check the thermal zone mode right after it's registered.
2. skip updating thermal zone if the zone is disabled
3. stop the polling timer when the thermal zone is disabled

Note: basically, this patch doesn't affect the existing always-enabled
thermal zones much, with just one exception -
thermal zone .get_mode() must be well prepared to reflect the real thermal
zone status upon the thermal zone registration.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
---
That's another attempt to land these to patches that were sent long time
ago but never got merged, although, apparently, there is no issue with
it. Latest discussion about these here:

 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1436081.html


Changes in v3:
- [2/2] Rework by Zhang Rui.

Changes in v2:
- [2/2] Implement in thermal subsystem instead of ACPI thermal driver (Zhang Rui)

 drivers/thermal/thermal_core.c  | 37 +++++++++++++++++++++++++++++++++++--
 drivers/thermal/thermal_sysfs.c | 24 +++++++-----------------
 include/linux/thermal.h         |  3 +++
 3 files changed, 45 insertions(+), 19 deletions(-)

Comments

Andy Shevchenko Feb. 26, 2018, 2:48 p.m. UTC | #1
On Mon, Feb 26, 2018 at 4:41 PM, Enric Balletbo i Serra
<enric.balletbo@collabora.com> wrote:

> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c

>         if (!strncmp(buf, "enabled", sizeof("enabled") - 1))
> -               result = tz->ops->set_mode(tz, THERMAL_DEVICE_ENABLED);
> +               mode = THERMAL_DEVICE_ENABLED;
>         else if (!strncmp(buf, "disabled", sizeof("disabled") - 1))
> -               result = tz->ops->set_mode(tz, THERMAL_DEVICE_DISABLED);
> +               mode = THERMAL_DEVICE_DISABLED;
>         else
> -               result = -EINVAL;
> +               return -EINVAL;

sysfs_match_string()?
Matthias Kaehlcke June 29, 2018, 12:33 a.m. UTC | #2
Hi,

I stumbled across this patch since I'm currently poking around with
early thermal bringup on a platform and this patch has been integrated
in our development tree.

I'm seeing some unexpected behaviors, which could entirely due to
wrong expectation from my side. I only have some basic working
knowledge of the thermal framework, just want to double check and
perhaps learn a thing or two.

On Mon, Feb 26, 2018 at 03:41:18PM +0100, Enric Balletbo i Serra wrote:
> From: Zhang Rui <rui.zhang@intel.com>
> 
> Thermal "mode" sysfs attribute is introduced to enable/disable a thermal
> zone, and .get_mode()/.set_mode() callback is introduced for platform
> thermal driver to enable/disable the hardware thermal control logic. And
> thermal core takes no action upon thermal zone enable/disable.
> 
> Actually, this is not quite right because thermal core still pokes those
> disabled thermal zones occasionally, e.g. upon system resume.
> 
> To fix this, a new flag 'mode' is introduced in struct thermal_zone_device
> to represent the thermal zone mode, and several decisions have been made
> based on this flag, including
> 1. check the thermal zone mode right after it's registered.
> 2. skip updating thermal zone if the zone is disabled
> 3. stop the polling timer when the thermal zone is disabled
> 
> Note: basically, this patch doesn't affect the existing always-enabled
> thermal zones much, with just one exception -
> thermal zone .get_mode() must be well prepared to reflect the real thermal
> zone status upon the thermal zone registration.

From my perspective this looks like a pretty significant change. For
the platform I'm working on I added a thermal zone to the device tree,
with the expectation that it would be enabled. Judging from the code
without this patch this expectation seems to be naive, since
of-thermal.c sets tz->mode to THERMAL_DEVICE_DISABLED, so apparently
either userspace or some driver should call _set_mode(tz,
THERMAL_DEVICE_ENABLED). However even without this the thermal zone
appears to be active (I didn't really test end-to-end yet, but at
least thermal_zone_device_update() is called and calls
handle_thermal_trip()). Not sure why this is the case with
THERMAL_DEVICE_DISABLED, but before I learned about the existence of
the flag my expectation was that the zone would be enabled.

With this patch thermal_zone_device_update() is skipped if the zone
hasn't been explictly enabled, which may be consistent with the state
of 'tz->mode', but effectively changes the previous/current behavior.

Not sure if I'm just dumbly overlooking something obvious or if there
is an actual problem with of_thermal (and maybe others).

Any insights?

Thanks

Matthias
Matthias Kaehlcke July 3, 2018, 5:13 p.m. UTC | #3
On Thu, Jun 28, 2018 at 05:33:02PM -0700, Matthias Kaehlcke wrote:
> Hi,
> 
> I stumbled across this patch since I'm currently poking around with
> early thermal bringup on a platform and this patch has been integrated
> in our development tree.
> 
> I'm seeing some unexpected behaviors, which could entirely due to
> wrong expectation from my side. I only have some basic working
> knowledge of the thermal framework, just want to double check and
> perhaps learn a thing or two.
> 
> On Mon, Feb 26, 2018 at 03:41:18PM +0100, Enric Balletbo i Serra wrote:
> > From: Zhang Rui <rui.zhang@intel.com>
> > 
> > Thermal "mode" sysfs attribute is introduced to enable/disable a thermal
> > zone, and .get_mode()/.set_mode() callback is introduced for platform
> > thermal driver to enable/disable the hardware thermal control logic. And
> > thermal core takes no action upon thermal zone enable/disable.
> > 
> > Actually, this is not quite right because thermal core still pokes those
> > disabled thermal zones occasionally, e.g. upon system resume.
> > 
> > To fix this, a new flag 'mode' is introduced in struct thermal_zone_device
> > to represent the thermal zone mode, and several decisions have been made
> > based on this flag, including
> > 1. check the thermal zone mode right after it's registered.
> > 2. skip updating thermal zone if the zone is disabled
> > 3. stop the polling timer when the thermal zone is disabled
> > 
> > Note: basically, this patch doesn't affect the existing always-enabled
> > thermal zones much, with just one exception -
> > thermal zone .get_mode() must be well prepared to reflect the real thermal
> > zone status upon the thermal zone registration.
> 
> From my perspective this looks like a pretty significant change. For
> the platform I'm working on I added a thermal zone to the device tree,
> with the expectation that it would be enabled. Judging from the code
> without this patch this expectation seems to be naive, since
> of-thermal.c sets tz->mode to THERMAL_DEVICE_DISABLED, so apparently
> either userspace or some driver should call _set_mode(tz,
> THERMAL_DEVICE_ENABLED). However even without this the thermal zone
> appears to be active (I didn't really test end-to-end yet, but at
> least thermal_zone_device_update() is called and calls
> handle_thermal_trip()). Not sure why this is the case with
> THERMAL_DEVICE_DISABLED, but before I learned about the existence of
> the flag my expectation was that the zone would be enabled.
> 
> With this patch thermal_zone_device_update() is skipped if the zone
> hasn't been explictly enabled, which may be consistent with the state
> of 'tz->mode', but effectively changes the previous/current behavior.
> 
> Not sure if I'm just dumbly overlooking something obvious or if there
> is an actual problem with of_thermal (and maybe others).

The problem is that there are now two 'mode' fields, tzd->mode and the
other typically tzd->devdata->mode, and tzd->mode is never set to enabled.

> thermal zone .get_mode() must be well prepared to reflect the real thermal
> zone status upon the thermal zone registration.

For of_thermal tzd->mode is initialized with the result of .get_mode()
when the zone is registered. At this time no sensor has been added
to the zone, hence the zone is disabled. When a sensor is added later
tzd->devdata->mode is set to enabled, however tzd->mode remains disabled:

tzd->mode = DISABLED
tzd->devdata->mode = DISABLED

of_parse_thermal_zones
  thermal_zone_device_register
    tzd->mode = tzd->get_mode() // => DISABLED

<sensor>_probe
  thermal_zone_of_sensor_register
    tzd->set_mode(THERMAL_DEVICE_ENABLED)
      tzd->devdata->mode = ENABLED

One way to fix this for of_thermal could be to setting tzd->mode in
.set_mode() in addition to setting tzd->devdata->mode. However this
feels like a workaround/hack. Personally I find it confusing to have
two mode fields for a thermal zone device. Maybe tzd->mode should
replace tzd->devdata->mode?
Enric Balletbo i Serra July 4, 2018, 10:36 a.m. UTC | #4
Hi Matthias,

Sorry for late reply, my memory is bad so I need to look at this again. The
patch was send some time ago and there are pending changes to do but then I
switched. I'll take a look, but did you saw why this patch was not merged [1]?
Maybe that could answer some of your questions.

Best regards,
 Enric

[1] https://lkml.org/lkml/2018/2/27/910

On 03/07/18 19:13, Matthias Kaehlcke wrote:
> On Thu, Jun 28, 2018 at 05:33:02PM -0700, Matthias Kaehlcke wrote:
>> Hi,
>>
>> I stumbled across this patch since I'm currently poking around with
>> early thermal bringup on a platform and this patch has been integrated
>> in our development tree.
>>
>> I'm seeing some unexpected behaviors, which could entirely due to
>> wrong expectation from my side. I only have some basic working
>> knowledge of the thermal framework, just want to double check and
>> perhaps learn a thing or two.
>>
>> On Mon, Feb 26, 2018 at 03:41:18PM +0100, Enric Balletbo i Serra wrote:
>>> From: Zhang Rui <rui.zhang@intel.com>
>>>
>>> Thermal "mode" sysfs attribute is introduced to enable/disable a thermal
>>> zone, and .get_mode()/.set_mode() callback is introduced for platform
>>> thermal driver to enable/disable the hardware thermal control logic. And
>>> thermal core takes no action upon thermal zone enable/disable.
>>>
>>> Actually, this is not quite right because thermal core still pokes those
>>> disabled thermal zones occasionally, e.g. upon system resume.
>>>
>>> To fix this, a new flag 'mode' is introduced in struct thermal_zone_device
>>> to represent the thermal zone mode, and several decisions have been made
>>> based on this flag, including
>>> 1. check the thermal zone mode right after it's registered.
>>> 2. skip updating thermal zone if the zone is disabled
>>> 3. stop the polling timer when the thermal zone is disabled
>>>
>>> Note: basically, this patch doesn't affect the existing always-enabled
>>> thermal zones much, with just one exception -
>>> thermal zone .get_mode() must be well prepared to reflect the real thermal
>>> zone status upon the thermal zone registration.
>>
>> From my perspective this looks like a pretty significant change. For
>> the platform I'm working on I added a thermal zone to the device tree,
>> with the expectation that it would be enabled. Judging from the code
>> without this patch this expectation seems to be naive, since
>> of-thermal.c sets tz->mode to THERMAL_DEVICE_DISABLED, so apparently
>> either userspace or some driver should call _set_mode(tz,
>> THERMAL_DEVICE_ENABLED). However even without this the thermal zone
>> appears to be active (I didn't really test end-to-end yet, but at
>> least thermal_zone_device_update() is called and calls
>> handle_thermal_trip()). Not sure why this is the case with
>> THERMAL_DEVICE_DISABLED, but before I learned about the existence of
>> the flag my expectation was that the zone would be enabled.
>>
>> With this patch thermal_zone_device_update() is skipped if the zone
>> hasn't been explictly enabled, which may be consistent with the state
>> of 'tz->mode', but effectively changes the previous/current behavior.
>>
>> Not sure if I'm just dumbly overlooking something obvious or if there
>> is an actual problem with of_thermal (and maybe others).
> 
> The problem is that there are now two 'mode' fields, tzd->mode and the
> other typically tzd->devdata->mode, and tzd->mode is never set to enabled.
> 
>> thermal zone .get_mode() must be well prepared to reflect the real thermal
>> zone status upon the thermal zone registration.
> 
> For of_thermal tzd->mode is initialized with the result of .get_mode()
> when the zone is registered. At this time no sensor has been added
> to the zone, hence the zone is disabled. When a sensor is added later
> tzd->devdata->mode is set to enabled, however tzd->mode remains disabled:
> 
> tzd->mode = DISABLED
> tzd->devdata->mode = DISABLED
> 
> of_parse_thermal_zones
>   thermal_zone_device_register
>     tzd->mode = tzd->get_mode() // => DISABLED
> 
> <sensor>_probe
>   thermal_zone_of_sensor_register
>     tzd->set_mode(THERMAL_DEVICE_ENABLED)
>       tzd->devdata->mode = ENABLED
> 
> One way to fix this for of_thermal could be to setting tzd->mode in
> .set_mode() in addition to setting tzd->devdata->mode. However this
> feels like a workaround/hack. Personally I find it confusing to have
> two mode fields for a thermal zone device. Maybe tzd->mode should
> replace tzd->devdata->mode?
>
Matthias Kaehlcke July 6, 2018, 6:22 p.m. UTC | #5
Hi Enric,

On Wed, Jul 04, 2018 at 12:36:39PM +0200, Enric Balletbo i Serra wrote:
> Hi Matthias,
> 
> Sorry for late reply, my memory is bad so I need to look at this again. The
> patch was send some time ago and there are pending changes to do but then I
> switched. I'll take a look, but did you saw why this patch was not merged [1]?
> Maybe that could answer some of your questions.
> 
> Best regards,
>  Enric
> 
> [1] https://lkml.org/lkml/2018/2/27/910

I missed this, thanks for the pointer!

> On 03/07/18 19:13, Matthias Kaehlcke wrote:
> > On Thu, Jun 28, 2018 at 05:33:02PM -0700, Matthias Kaehlcke wrote:
> >> Hi,
> >>
> >> I stumbled across this patch since I'm currently poking around with
> >> early thermal bringup on a platform and this patch has been integrated
> >> in our development tree.
> >>
> >> I'm seeing some unexpected behaviors, which could entirely due to
> >> wrong expectation from my side. I only have some basic working
> >> knowledge of the thermal framework, just want to double check and
> >> perhaps learn a thing or two.
> >>
> >> On Mon, Feb 26, 2018 at 03:41:18PM +0100, Enric Balletbo i Serra wrote:
> >>> From: Zhang Rui <rui.zhang@intel.com>
> >>>
> >>> Thermal "mode" sysfs attribute is introduced to enable/disable a thermal
> >>> zone, and .get_mode()/.set_mode() callback is introduced for platform
> >>> thermal driver to enable/disable the hardware thermal control logic. And
> >>> thermal core takes no action upon thermal zone enable/disable.
> >>>
> >>> Actually, this is not quite right because thermal core still pokes those
> >>> disabled thermal zones occasionally, e.g. upon system resume.
> >>>
> >>> To fix this, a new flag 'mode' is introduced in struct thermal_zone_device
> >>> to represent the thermal zone mode, and several decisions have been made
> >>> based on this flag, including
> >>> 1. check the thermal zone mode right after it's registered.
> >>> 2. skip updating thermal zone if the zone is disabled
> >>> 3. stop the polling timer when the thermal zone is disabled
> >>>
> >>> Note: basically, this patch doesn't affect the existing always-enabled
> >>> thermal zones much, with just one exception -
> >>> thermal zone .get_mode() must be well prepared to reflect the real thermal
> >>> zone status upon the thermal zone registration.
> >>
> >> From my perspective this looks like a pretty significant change. For
> >> the platform I'm working on I added a thermal zone to the device tree,
> >> with the expectation that it would be enabled. Judging from the code
> >> without this patch this expectation seems to be naive, since
> >> of-thermal.c sets tz->mode to THERMAL_DEVICE_DISABLED, so apparently
> >> either userspace or some driver should call _set_mode(tz,
> >> THERMAL_DEVICE_ENABLED). However even without this the thermal zone
> >> appears to be active (I didn't really test end-to-end yet, but at
> >> least thermal_zone_device_update() is called and calls
> >> handle_thermal_trip()). Not sure why this is the case with
> >> THERMAL_DEVICE_DISABLED, but before I learned about the existence of
> >> the flag my expectation was that the zone would be enabled.
> >>
> >> With this patch thermal_zone_device_update() is skipped if the zone
> >> hasn't been explictly enabled, which may be consistent with the state
> >> of 'tz->mode', but effectively changes the previous/current behavior.
> >>
> >> Not sure if I'm just dumbly overlooking something obvious or if there
> >> is an actual problem with of_thermal (and maybe others).
> > 
> > The problem is that there are now two 'mode' fields, tzd->mode and the
> > other typically tzd->devdata->mode, and tzd->mode is never set to enabled.
> > 
> >> thermal zone .get_mode() must be well prepared to reflect the real thermal
> >> zone status upon the thermal zone registration.
> > 
> > For of_thermal tzd->mode is initialized with the result of .get_mode()
> > when the zone is registered. At this time no sensor has been added
> > to the zone, hence the zone is disabled. When a sensor is added later
> > tzd->devdata->mode is set to enabled, however tzd->mode remains disabled:
> > 
> > tzd->mode = DISABLED
> > tzd->devdata->mode = DISABLED
> > 
> > of_parse_thermal_zones
> >   thermal_zone_device_register
> >     tzd->mode = tzd->get_mode() // => DISABLED
> > 
> > <sensor>_probe
> >   thermal_zone_of_sensor_register
> >     tzd->set_mode(THERMAL_DEVICE_ENABLED)
> >       tzd->devdata->mode = ENABLED
> > 
> > One way to fix this for of_thermal could be to setting tzd->mode in
> > .set_mode() in addition to setting tzd->devdata->mode. However this
> > feels like a workaround/hack. Personally I find it confusing to have
> > two mode fields for a thermal zone device. Maybe tzd->mode should
> > replace tzd->devdata->mode?
> >
diff mbox

Patch

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2b1b0ba393a4..8716ba5b2761 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -306,9 +306,9 @@  static void monitor_thermal_zone(struct thermal_zone_device *tz)
 {
 	mutex_lock(&tz->lock);
 
-	if (tz->passive)
+	if (tz->mode == THERMAL_DEVICE_ENABLED && tz->passive)
 		thermal_zone_device_set_polling(tz, tz->passive_delay);
-	else if (tz->polling_delay)
+	else if (tz->mode == THERMAL_DEVICE_ENABLED && tz->polling_delay)
 		thermal_zone_device_set_polling(tz, tz->polling_delay);
 	else
 		thermal_zone_device_set_polling(tz, 0);
@@ -464,11 +464,35 @@  static void thermal_zone_device_reset(struct thermal_zone_device *tz)
 		pos->initialized = false;
 }
 
+int thermal_zone_set_mode(struct thermal_zone_device *tz,
+				 enum thermal_device_mode mode)
+{
+	int result;
+
+	if (!tz->ops->set_mode)
+		return -EPERM;
+
+	result = tz->ops->set_mode(tz, mode);
+	if (result)
+		return result;
+
+	if (tz->mode != mode) {
+		tz->mode = mode;
+		monitor_thermal_zone(tz);
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(thermal_zone_set_mode);
+
 void thermal_zone_device_update(struct thermal_zone_device *tz,
 				enum thermal_notify_event event)
 {
 	int count;
 
+	/* Do nothing if the thermal zone is disabled */
+	if (tz->mode == THERMAL_DEVICE_DISABLED)
+		return;
+
 	if (atomic_read(&in_suspend))
 		return;
 
@@ -1278,6 +1302,15 @@  thermal_zone_device_register(const char *type, int trips, int mask,
 	INIT_DELAYED_WORK(&tz->poll_queue, thermal_zone_device_check);
 
 	thermal_zone_device_reset(tz);
+
+	if (tz->ops->get_mode) {
+		enum thermal_device_mode mode;
+
+		result = tz->ops->get_mode(tz, &mode);
+		tz->mode = result ? THERMAL_DEVICE_ENABLED : mode;
+	} else
+		tz->mode = THERMAL_DEVICE_ENABLED;
+
 	/* Update the new thermal zone and mark it as already updated. */
 	if (atomic_cmpxchg(&tz->need_update, 1, 0))
 		thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index ba81c9080f6e..2746540289c4 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -51,18 +51,9 @@  static ssize_t
 mode_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct thermal_zone_device *tz = to_thermal_zone(dev);
-	enum thermal_device_mode mode;
-	int result;
-
-	if (!tz->ops->get_mode)
-		return -EPERM;
 
-	result = tz->ops->get_mode(tz, &mode);
-	if (result)
-		return result;
-
-	return sprintf(buf, "%s\n", mode == THERMAL_DEVICE_ENABLED ? "enabled"
-		       : "disabled");
+	return sprintf(buf, "%s\n", tz->mode == THERMAL_DEVICE_ENABLED ?
+		       "enabled" : "disabled");
 }
 
 static ssize_t
@@ -70,18 +61,17 @@  mode_store(struct device *dev, struct device_attribute *attr,
 	   const char *buf, size_t count)
 {
 	struct thermal_zone_device *tz = to_thermal_zone(dev);
+	enum thermal_device_mode mode;
 	int result;
 
-	if (!tz->ops->set_mode)
-		return -EPERM;
-
 	if (!strncmp(buf, "enabled", sizeof("enabled") - 1))
-		result = tz->ops->set_mode(tz, THERMAL_DEVICE_ENABLED);
+		mode = THERMAL_DEVICE_ENABLED;
 	else if (!strncmp(buf, "disabled", sizeof("disabled") - 1))
-		result = tz->ops->set_mode(tz, THERMAL_DEVICE_DISABLED);
+		mode = THERMAL_DEVICE_DISABLED;
 	else
-		result = -EINVAL;
+		return -EINVAL;
 
+	result = thermal_zone_set_mode(tz, mode);
 	if (result)
 		return result;
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 8c5302374eaa..ff60977c91da 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -211,6 +211,7 @@  struct thermal_zone_device {
 	struct thermal_attr *trip_type_attrs;
 	struct thermal_attr *trip_hyst_attrs;
 	void *devdata;
+	enum thermal_device_mode mode;
 	int trips;
 	unsigned long trips_disabled;	/* bitmap for disabled trips */
 	int passive_delay;
@@ -466,6 +467,8 @@  struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name);
 int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp);
 int thermal_zone_get_slope(struct thermal_zone_device *tz);
 int thermal_zone_get_offset(struct thermal_zone_device *tz);
+int thermal_zone_set_mode(struct thermal_zone_device *tz,
+			  enum thermal_device_mode mode);
 
 int get_tz_trend(struct thermal_zone_device *, int);
 struct thermal_instance *get_thermal_instance(struct thermal_zone_device *,