diff mbox

thermal: of: Allow selection of thermal governor in DT

Message ID 3b80853abb45a9e067cf7a16754b07bb67712457.1520274879.git.amit.kucheria@linaro.org (mailing list archive)
State Changes Requested
Delegated to: Eduardo Valentin
Headers show

Commit Message

Amit Kucheria March 5, 2018, 6:36 p.m. UTC
From: Ram Chandrasekar <rkumbako@codeaurora.org>

There is currently no way for the governor to be selected for each thermal
zone in devicetree. This results in the default governor being used for all
thermal zones even though no such restriction exists in the core code.

Add support for specifying the thermal governor to be used for a thermal
zone in the devicetree. The devicetree config should specify the governor
name as a string that matches any available governors. If not specified, we
maintain the current behaviour of using the default governor.

Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
---
 Documentation/devicetree/bindings/thermal/thermal.txt | 8 ++++++++
 drivers/thermal/of-thermal.c                          | 6 ++++++
 2 files changed, 14 insertions(+)

Comments

Rob Herring March 5, 2018, 8:08 p.m. UTC | #1
On Mon, Mar 5, 2018 at 12:36 PM, Amit Kucheria <amit.kucheria@linaro.org> wrote:
> From: Ram Chandrasekar <rkumbako@codeaurora.org>
>
> There is currently no way for the governor to be selected for each thermal
> zone in devicetree. This results in the default governor being used for all
> thermal zones even though no such restriction exists in the core code.
>
> Add support for specifying the thermal governor to be used for a thermal
> zone in the devicetree. The devicetree config should specify the governor
> name as a string that matches any available governors. If not specified, we
> maintain the current behaviour of using the default governor.
>
> Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
>  Documentation/devicetree/bindings/thermal/thermal.txt | 8 ++++++++
>  drivers/thermal/of-thermal.c                          | 6 ++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
> index 1719d47..fced9d3 100644
> --- a/Documentation/devicetree/bindings/thermal/thermal.txt
> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
> @@ -168,6 +168,14 @@ Optional property:
>                         by means of sensor ID. Additional coefficients are
>                         interpreted as constant offset.
>
> +- thermal-governor:     Thermal governor to be used for this thermal zone.
> +                       Expected values are:
> +                       "step_wise": Use step wise governor.
> +                       "fair_share": Use fair share governor.
> +                       "user_space": Use user space governor.
> +                       "power_allocator": Use power allocator governor.

This looks pretty Linux specific. Not that we can't have Linux
specific properties, but we try to avoid them.

What determines the selection? I'd imagine only certain governors make
sense for certain devices. We should perhaps describe those
characteristics which can then infer the best governor. Not really
sure though...

Rob
Daniel Lezcano March 5, 2018, 9:11 p.m. UTC | #2
On 05/03/2018 19:36, Amit Kucheria wrote:
> From: Ram Chandrasekar <rkumbako@codeaurora.org>
> 
> There is currently no way for the governor to be selected for each thermal
> zone in devicetree. This results in the default governor being used for all
> thermal zones even though no such restriction exists in the core code.
> 
> Add support for specifying the thermal governor to be used for a thermal
> zone in the devicetree. The devicetree config should specify the governor
> name as a string that matches any available governors. If not specified, we
> maintain the current behaviour of using the default governor.
> 
> Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>

Why not create a kernel parameter (eg. thermal.governor=) ? So everyone
can gain benefit of this feature. And in order to specify that from the
DT, add the 'chosen' node and bootargs with the desired kernel parameter?
Amit Kucheria March 6, 2018, 5:38 a.m. UTC | #3
On Tue, Mar 6, 2018 at 1:38 AM, Rob Herring <robh+dt@kernel.org> wrote:
> On Mon, Mar 5, 2018 at 12:36 PM, Amit Kucheria <amit.kucheria@linaro.org> wrote:
>> From: Ram Chandrasekar <rkumbako@codeaurora.org>
>>
>> There is currently no way for the governor to be selected for each thermal
>> zone in devicetree. This results in the default governor being used for all
>> thermal zones even though no such restriction exists in the core code.
>>
>> Add support for specifying the thermal governor to be used for a thermal
>> zone in the devicetree. The devicetree config should specify the governor
>> name as a string that matches any available governors. If not specified, we
>> maintain the current behaviour of using the default governor.
>>
>> Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
>> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
>> ---
>>  Documentation/devicetree/bindings/thermal/thermal.txt | 8 ++++++++
>>  drivers/thermal/of-thermal.c                          | 6 ++++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
>> index 1719d47..fced9d3 100644
>> --- a/Documentation/devicetree/bindings/thermal/thermal.txt
>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>> @@ -168,6 +168,14 @@ Optional property:
>>                         by means of sensor ID. Additional coefficients are
>>                         interpreted as constant offset.
>>
>> +- thermal-governor:     Thermal governor to be used for this thermal zone.
>> +                       Expected values are:
>> +                       "step_wise": Use step wise governor.
>> +                       "fair_share": Use fair share governor.
>> +                       "user_space": Use user space governor.
>> +                       "power_allocator": Use power allocator governor.
>
> This looks pretty Linux specific. Not that we can't have Linux
> specific properties, but we try to avoid them.
>
> What determines the selection? I'd imagine only certain governors make
> sense for certain devices. We should perhaps describe those
> characteristics which can then infer the best governor. Not really
> sure though...

I'm not sure if it would be easy to assign preferred governors to
device classes. It is dependent on what devices are present on the
system, what throttling knobs they expose and how the system designer
decided to integrate it all. e.g. A GPU driver might be controlled in
kernel or userspace depending on whether it exposes a devfreq knob or
some more esoteric statistics to userspace.

Bang Bang governor seems to be designed for Fans with a simple ON/OFF iterface.
Userspace governor is designed to move thermal policy to userspace
(e.g. through thermald). So backlight brightness, battery charging,
GPU scaling, even cpu frequency scaling can be offloaded to userspace.
On embedded platforms, modem control typically happens in userspace
Power allocator governor is designed for a closed-loop system to keep
the total TDP of the platform under control while allowing various
devices (cpu, gpu, modem, etc.) to dynamically increase or decrease
their individual budget depending on the usecase.

Regards,
Amit
Amit Kucheria March 6, 2018, 5:48 a.m. UTC | #4
On Tue, Mar 6, 2018 at 2:41 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 05/03/2018 19:36, Amit Kucheria wrote:
>> From: Ram Chandrasekar <rkumbako@codeaurora.org>
>>
>> There is currently no way for the governor to be selected for each thermal
>> zone in devicetree. This results in the default governor being used for all
>> thermal zones even though no such restriction exists in the core code.
>>
>> Add support for specifying the thermal governor to be used for a thermal
>> zone in the devicetree. The devicetree config should specify the governor
>> name as a string that matches any available governors. If not specified, we
>> maintain the current behaviour of using the default governor.
>>
>> Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
>> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
>
> Why not create a kernel parameter (eg. thermal.governor=) ? So everyone
> can gain benefit of this feature. And in order to specify that from the
> DT, add the 'chosen' node and bootargs with the desired kernel parameter?
>

This is supposed to be a per-thermal zone property. So specifying it
on the command-line, while possible, might be a little cumbersome. I'm
not even sure if kernel parameters can have a variable number of
arguments. IOW, thermal.tz0.governor=userspace,
thermal.tz1.governor=step_wise, thermal.tz2.governor=userspace, .....

I'm already seeing SoCs defining 8 or more thermal zones.
Sudeep Holla March 6, 2018, 12:32 p.m. UTC | #5
On 05/03/18 18:36, Amit Kucheria wrote:
> From: Ram Chandrasekar <rkumbako@codeaurora.org>
> 
> There is currently no way for the governor to be selected for each thermal
> zone in devicetree. 

How is that any different from cpufreq, cpuidle or devfreq subsystems ?

> This results in the default governor being used for all
> thermal zones even though no such restriction exists in the core code.
> 

What restrictions ?

> Add support for specifying the thermal governor to be used for a thermal
> zone in the devicetree.

Then what prevents us from doing the same for other subsystems with so
called /inefficient default/ governors ?

> The devicetree config should specify the governor
> name as a string that matches any available governors. If not specified, we
> maintain the current behaviour of using the default governor.

If one is specified, can the user change it from sysfs ? If
yes, then why do we need this binding at all ? If no, we are basically
restricting, then what would happen if the DT was shipped with wrong
governor specified because the firmware author didn't know about any
other governor ?

IMO, having this in DT is bad idea as it may cause more issues than
gain. We already have sysfs support like other PM subsystems and
userspace can deal with it.
Daniel Lezcano March 6, 2018, 3:43 p.m. UTC | #6
On 06/03/2018 06:48, Amit Kucheria wrote:
> On Tue, Mar 6, 2018 at 2:41 AM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 05/03/2018 19:36, Amit Kucheria wrote:
>>> From: Ram Chandrasekar <rkumbako@codeaurora.org>
>>>
>>> There is currently no way for the governor to be selected for each thermal
>>> zone in devicetree. This results in the default governor being used for all
>>> thermal zones even though no such restriction exists in the core code.
>>>
>>> Add support for specifying the thermal governor to be used for a thermal
>>> zone in the devicetree. The devicetree config should specify the governor
>>> name as a string that matches any available governors. If not specified, we
>>> maintain the current behaviour of using the default governor.
>>>
>>> Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
>>> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
>>
>> Why not create a kernel parameter (eg. thermal.governor=) ? So everyone
>> can gain benefit of this feature. And in order to specify that from the
>> DT, add the 'chosen' node and bootargs with the desired kernel parameter?
>>
> 
> This is supposed to be a per-thermal zone property. So specifying it
> on the command-line, while possible, might be a little cumbersome. I'm
> not even sure if kernel parameters can have a variable number of
> arguments. IOW, thermal.tz0.governor=userspace,
> thermal.tz1.governor=step_wise, thermal.tz2.governor=userspace, .....
> 
> I'm already seeing SoCs defining 8 or more thermal zones.

Ah yes indeed.
Amit Kucheria March 7, 2018, 10:59 a.m. UTC | #7
On Tue, Mar 6, 2018 at 6:02 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
>
> On 05/03/18 18:36, Amit Kucheria wrote:
>> From: Ram Chandrasekar <rkumbako@codeaurora.org>
>>
>> There is currently no way for the governor to be selected for each thermal
>> zone in devicetree.
>
> How is that any different from cpufreq, cpuidle or devfreq subsystems ?

Cpufreq/cpuidle are designed to control a single parameter while
thermal framework is trying to mitigate heat from several disparate
sources that are throttled in different ways. Besides, cpufreq/cpuidle
have somewhat mature governors. Cpuidle has only one governor (for
tickless) - menu governor, cpufreq has ondemand in mainline, replaced
by interactive in android and hopefully soon both will be replaced by
schedutil.

Badly configured cpufreq/cpuidle/devfreq only leads to wasted power,
while badly configured thermal zone leads to the loss of operation
e.g. reboots, too hot to touch, etc.

>> This results in the default governor being used for all
>> thermal zones even though no such restriction exists in the core code.
>>
>
> What restrictions ?

I said "no such restriction exists". IOW, the core code does allow the
governor to be changed.

>> Add support for specifying the thermal governor to be used for a thermal
>> zone in the devicetree.
>
> Then what prevents us from doing the same for other subsystems with so
> called /inefficient default/ governors ?

Nothing? :-)

It depends on how closely we want the board DT to represent a
production-ready setup. If a platform maintainer wants to ship a
well-tested configuration in DT, is that really a problem?

If however, you are requiring them to ship magic runes for userspace
daemons (e.g. per-board thermald config) to tweak the various
governors in sysfs, it takes a bit more work to get to a
production-ready setup.

We should of course continue to improve our govenors to arrive at the
one true governor, but that'll take some time and might never even
happen in the case of thermal IMHO.

>> The devicetree config should specify the governor
>> name as a string that matches any available governors. If not specified, we
>> maintain the current behaviour of using the default governor.
>
> If one is specified, can the user change it from sysfs ? If
> yes, then why do we need this binding at all ? If no, we are basically
> restricting, then what would happen if the DT was shipped with wrong
> governor specified because the firmware author didn't know about any
> other governor ?

The governor can be changed from sysfs today. So it is all override-able.

I guess your main worry is an unmodifiable DT with this option being
shipped in firmware, right? That is a fair point, though that could be
worked around from sysfs.

> IMO, having this in DT is bad idea as it may cause more issues than
> gain. We already have sysfs support like other PM subsystems and
> userspace can deal with it.

As mentioned above, I think thermal is a bit special because bad
configuration may lead to loss of operation. But iff we're strictly
enforcing the rule about no policy in board DTs, then I concede.

Personally, I'd like mainline board DTs to have enough policy to
provide a stable system so I don't have to futz around with userspace
to set it all up correctly (in 3 different distros).

Regards,
Amit
Viresh Kumar March 8, 2018, 4:49 a.m. UTC | #8
On Wed, Mar 7, 2018 at 4:29 PM, Amit Kucheria <amit.kucheria@linaro.org> wrote:
> Cpufreq/cpuidle are designed to control a single parameter while
> thermal framework is trying to mitigate heat from several disparate
> sources that are throttled in different ways. Besides, cpufreq/cpuidle
> have somewhat mature governors. Cpuidle has only one governor (for
> tickless) - menu governor, cpufreq has ondemand in mainline, replaced
> by interactive in android and hopefully soon both will be replaced by

Interactive and schedfreq are already removed from Android 4.4 and 4.9.
It used schedutil now.

> schedutil.
>
> Badly configured cpufreq/cpuidle/devfreq only leads to wasted power,
> while badly configured thermal zone leads to the loss of operation
> e.g. reboots, too hot to touch, etc.

I don't think such heat-ups will happen right during boot, where some
init.rc should
come up and change the governor.

Over that if we are worried about production images only, then what prevents us
to select the right default governor in the defconfig ? We shouldn't
be worried about
multi-platform kernels for production images.

--
viresh
Amit Kucheria March 8, 2018, 5:35 a.m. UTC | #9
On Thu, Mar 8, 2018 at 10:19 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On Wed, Mar 7, 2018 at 4:29 PM, Amit Kucheria <amit.kucheria@linaro.org> wrote:
>> Cpufreq/cpuidle are designed to control a single parameter while
>> thermal framework is trying to mitigate heat from several disparate
>> sources that are throttled in different ways. Besides, cpufreq/cpuidle
>> have somewhat mature governors. Cpuidle has only one governor (for
>> tickless) - menu governor, cpufreq has ondemand in mainline, replaced
>> by interactive in android and hopefully soon both will be replaced by
>
> Interactive and schedfreq are already removed from Android 4.4 and 4.9.
> It used schedutil now.

Good to hear.

>> schedutil.
>>
>> Badly configured cpufreq/cpuidle/devfreq only leads to wasted power,
>> while badly configured thermal zone leads to the loss of operation
>> e.g. reboots, too hot to touch, etc.
>
> I don't think such heat-ups will happen right during boot, where some
> init.rc should
> come up and change the governor.

Interestingly enough, there are some patches that make the thermal
framework start earlier to deal with boot-time thermal issues. I
expect to post them soon. :-) These are required, for example, if the
device goes into a reboot loop - the device might not even make it to
the point in userspace where the governors are setup. In such a loop,
the temperature constantly keeps increasing.

> Over that if we are worried about production images only, then what prevents us
> to select the right default governor in the defconfig ? We shouldn't
> be worried about
> multi-platform kernels for production images.

I was refering to the 'make defconfig' out-of-box experience that
makes the majority of developer boards/devices out there stable to
work on. I can't do that today.

Regards,
Amit
Viresh Kumar March 8, 2018, 5:42 a.m. UTC | #10
On 08-03-18, 11:05, Amit Kucheria wrote:
> Interestingly enough, there are some patches that make the thermal
> framework start earlier to deal with boot-time thermal issues. I
> expect to post them soon. :-) These are required, for example, if the
> device goes into a reboot loop - the device might not even make it to
> the point in userspace where the governors are setup. In such a loop,
> the temperature constantly keeps increasing.
> 
> I was refering to the 'make defconfig' out-of-box experience that
> makes the majority of developer boards/devices out there stable to
> work on. I can't do that today.

Fair enough, but I am still not sure if we should put something that defines our
thermal policy into DT which should describe hardware.
Sudeep Holla March 8, 2018, 12:41 p.m. UTC | #11
On 07/03/18 10:59, Amit Kucheria wrote:
> On Tue, Mar 6, 2018 at 6:02 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>> On 05/03/18 18:36, Amit Kucheria wrote:
>>> From: Ram Chandrasekar <rkumbako@codeaurora.org>
>>>
>>> There is currently no way for the governor to be selected for each thermal
>>> zone in devicetree.
>>
>> How is that any different from cpufreq, cpuidle or devfreq subsystems ?
> 
> Cpufreq/cpuidle are designed to control a single parameter while
> thermal framework is trying to mitigate heat from several disparate
> sources that are throttled in different ways. Besides, cpufreq/cpuidle
> have somewhat mature governors. Cpuidle has only one governor (for
> tickless) - menu governor, cpufreq has ondemand in mainline, replaced
> by interactive in android and hopefully soon both will be replaced by
> schedutil.
> 

While I agree on the facts, I don't think adding this information to DT
because thermal governors are not so mature compared to cpufreq/idle is
acceptable. It's give more reason not to add as new governors may get
added as things mature.

> Badly configured cpufreq/cpuidle/devfreq only leads to wasted power,
> while badly configured thermal zone leads to the loss of operation
> e.g. reboots, too hot to touch, etc.
> 

As Viresh pointed out, if the default hardware or firmware configuration
causes that then it's no good platform to use. If adding more
functionality in Linux(like cpufreq and others) causes this then Linux
needs to deal with it. If default thermal can't help in preventing such
damage then it shouldn't be default.

>>> This results in the default governor being used for all
>>> thermal zones even though no such restriction exists in the core code.
>>>
>>
>> What restrictions ?
> 
> I said "no such restriction exists". IOW, the core code does allow the
> governor to be changed.
> 

Ah ok.

>>> Add support for specifying the thermal governor to be used for a thermal
>>> zone in the devicetree.
>>
>> Then what prevents us from doing the same for other subsystems with so
>> called /inefficient default/ governors ?
> 
> Nothing? :-)
> 

Cool problem solved, just kidding.

> It depends on how closely we want the board DT to represent a
> production-ready setup. If a platform maintainer wants to ship a
> well-tested configuration in DT, is that really a problem?
>

You mean configuration + policy ? Then yes policy in DT is no-no.

> If however, you are requiring them to ship magic runes for userspace
> daemons (e.g. per-board thermald config) to tweak the various
> governors in sysfs, it takes a bit more work to get to a
> production-ready setup.
> 

Fair enough.

> We should of course continue to improve our govenors to arrive at the
> one true governor, but that'll take some time and might never even
> happen in the case of thermal IMHO.
> 

I am not sure about "one true governor" as different users have
different needs.

>>> The devicetree config should specify the governor
>>> name as a string that matches any available governors. If not specified, we
>>> maintain the current behaviour of using the default governor.
>>
>> If one is specified, can the user change it from sysfs ? If
>> yes, then why do we need this binding at all ? If no, we are basically
>> restricting, then what would happen if the DT was shipped with wrong
>> governor specified because the firmware author didn't know about any
>> other governor ?
> 
> The governor can be changed from sysfs today. So it is all override-able.
> 
> I guess your main worry is an unmodifiable DT with this option being
> shipped in firmware, right? That is a fair point, though that could be
> worked around from sysfs.
> 

Yes, and Viresh's point triggered more question to me. If DT has some
governor like interactive which will not in mainline or not configure in
the default build what will be done. If in such case default governor is
good enough to boot without burning or any other issues, why not always?

>> IMO, having this in DT is bad idea as it may cause more issues than
>> gain. We already have sysfs support like other PM subsystems and
>> userspace can deal with it.
> 
> As mentioned above, I think thermal is a bit special because bad
> configuration may lead to loss of operation. But iff we're strictly
> enforcing the rule about no policy in board DTs, then I concede.
> 

As I said IMO I see more issues with that, so I don't like it.

> Personally, I'd like mainline board DTs to have enough policy to
> provide a stable system so I don't have to futz around with userspace
> to set it all up correctly (in 3 different distros).
> 

Sysfs is more flexible than DT and requires some fuzzing to get the best
out of the platform if you need.
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
index 1719d47..fced9d3 100644
--- a/Documentation/devicetree/bindings/thermal/thermal.txt
+++ b/Documentation/devicetree/bindings/thermal/thermal.txt
@@ -168,6 +168,14 @@  Optional property:
 			by means of sensor ID. Additional coefficients are
 			interpreted as constant offset.
 
+- thermal-governor:     Thermal governor to be used for this thermal zone.
+			Expected values are:
+			"step_wise": Use step wise governor.
+			"fair_share": Use fair share governor.
+			"user_space": Use user space governor.
+			"power_allocator": Use power allocator governor.
+  Type: string
+
 - sustainable-power:	An estimate of the sustainable power (in mW) that the
   Type: unsigned	thermal zone can dissipate at the desired
   Size: one cell	control temperature.  For reference, the
diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index e09f035..a884b01 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -974,6 +974,7 @@  int __init of_parse_thermal_zones(void)
 		struct thermal_zone_params *tzp;
 		int i, mask = 0;
 		u32 prop;
+		const char *governor_name;
 
 		tz = thermal_of_build_thermal_zone(child);
 		if (IS_ERR(tz)) {
@@ -996,6 +997,11 @@  int __init of_parse_thermal_zones(void)
 		/* No hwmon because there might be hwmon drivers registering */
 		tzp->no_hwmon = true;
 
+		if (!of_property_read_string(child, "thermal-governor",
+						&governor_name))
+			strlcpy(tzp->governor_name, governor_name,
+					THERMAL_NAME_LENGTH);
+
 		if (!of_property_read_u32(child, "sustainable-power", &prop))
 			tzp->sustainable_power = prop;