
thermal/drivers/tegra: Getting rid of the get_thermal_instance() usage

Message ID fa2bd92a-f2ae-a671-b537-87c0f3c03dbd@linaro.org (mailing list archive)
State New, archived
Delegated to: Daniel Lezcano
Series thermal/drivers/tegra: Getting rid of the get_thermal_instance() usage

Commit Message

Daniel Lezcano Jan. 24, 2023, 7:57 p.m. UTC
Hi,

Does anyone know the purpose of the get_thermal_instance() usage
in this code:

https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623

The driver is using a function that is reserved for the thermal core.
It should not.

Is the following change OK?

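diff --git a/drivers/thermal/tegra/soctherm.c b/drivers/thermal/tegra/soctherm.c
index 220873298d77..5f552402d987 100644
--- a/drivers/thermal/tegra/soctherm.c
+++ b/drivers/thermal/tegra/soctherm.c
@@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct thermal_zone_device *tz, int trip
 				continue;

 			cdev = ts->throt_cfgs[i].cdev;
-			if (get_thermal_instance(tz, cdev, trip_id))
-				stc = find_throttle_cfg_by_name(ts, cdev->type);
-			else
+			stc = find_throttle_cfg_by_name(ts, cdev->type);
+			if (!stc)
 				continue;

 			return throttrip_program(dev, sg, stc, temp);
@@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device *dev,
 			continue;

 		cdev = ts->throt_cfgs[i].cdev;
-		if (get_thermal_instance(tz, cdev, trip))
-			stc = find_throttle_cfg_by_name(ts, cdev->type);
-		else
+
+		stc = find_throttle_cfg_by_name(ts, cdev->type);
+		if (!stc)
 			continue;

 		ret = throttrip_program(dev, sg, stc, temperature);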

Comments

Thierry Reding Jan. 26, 2023, 12:55 p.m. UTC | #1
On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
> 
> Hi,
> 
> does anyone know what is the purpose of the get_thermal_instance() usage in
> this code:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
> 
> The driver is using a function which is reserved for the thermal core. It
> should not.
> 
> Is the following change ok ?
> 
> diff --git a/drivers/thermal/tegra/soctherm.c
> b/drivers/thermal/tegra/soctherm.c
> index 220873298d77..5f552402d987 100644
> --- a/drivers/thermal/tegra/soctherm.c
> +++ b/drivers/thermal/tegra/soctherm.c
> @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
> thermal_zone_device *tz, int trip
>  				continue;
> 
>  			cdev = ts->throt_cfgs[i].cdev;
> -			if (get_thermal_instance(tz, cdev, trip_id))
> -				stc = find_throttle_cfg_by_name(ts, cdev->type);
> -			else
> +			stc = find_throttle_cfg_by_name(ts, cdev->type);
> +			if (!stc)
>  				continue;
> 
>  			return throttrip_program(dev, sg, stc, temp);
> @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
> *dev,
>  			continue;
> 
>  		cdev = ts->throt_cfgs[i].cdev;
> -		if (get_thermal_instance(tz, cdev, trip))
> -			stc = find_throttle_cfg_by_name(ts, cdev->type);
> -		else
> +
> +		stc = find_throttle_cfg_by_name(ts, cdev->type);
> +		if (!stc)
>  			continue;
> 
>  		ret = throttrip_program(dev, sg, stc, temperature);

There's a small difference in behavior after applying this patch. Prior
to this I get (on Tegra210):

	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC

and after these changes, it turns into:

	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC

The "programming throttle ..." messages are something I've added locally
to trace what gets called. So it looks like for "pll" and "mem" thermal
zones, we now program trip points whereas we previously didn't.

I'll take a closer look to see if we can replace the calls to
get_thermal_instance() by something else.

Thierry
Daniel Lezcano Jan. 26, 2023, 3:37 p.m. UTC | #2
On 26/01/2023 13:55, Thierry Reding wrote:
> On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
>>
>> Hi,
>>
>> does anyone know what is the purpose of the get_thermal_instance() usage in
>> this code:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
>>
>> The driver is using a function which is reserved for the thermal core. It
>> should not.
>>
>> Is the following change ok ?
>>
>> diff --git a/drivers/thermal/tegra/soctherm.c
>> b/drivers/thermal/tegra/soctherm.c
>> index 220873298d77..5f552402d987 100644
>> --- a/drivers/thermal/tegra/soctherm.c
>> +++ b/drivers/thermal/tegra/soctherm.c
>> @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
>> thermal_zone_device *tz, int trip
>>   				continue;
>>
>>   			cdev = ts->throt_cfgs[i].cdev;
>> -			if (get_thermal_instance(tz, cdev, trip_id))
>> -				stc = find_throttle_cfg_by_name(ts, cdev->type);
>> -			else
>> +			stc = find_throttle_cfg_by_name(ts, cdev->type);
>> +			if (!stc)
>>   				continue;
>>
>>   			return throttrip_program(dev, sg, stc, temp);
>> @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
>> *dev,
>>   			continue;
>>
>>   		cdev = ts->throt_cfgs[i].cdev;
>> -		if (get_thermal_instance(tz, cdev, trip))
>> -			stc = find_throttle_cfg_by_name(ts, cdev->type);
>> -		else
>> +
>> +		stc = find_throttle_cfg_by_name(ts, cdev->type);
>> +		if (!stc)
>>   			continue;
>>
>>   		ret = throttrip_program(dev, sg, stc, temperature);
> 
> There's a small difference in behavior after applying this patch. Prior
> to this I get (on Tegra210):
> 
> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 
> and after these changes, it turns into:
> 
> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> 
> The "programming throttle ..." messages are something I've added locally
> to trace what gets called. So it looks like for "pll" and "mem" thermal
> zones, we now program trip points whereas we previously didn't.

Hmm, yeah. I did go into the details of the driver, but if there is no
cooling device associated with a trip point, the result is a no-op
from the thermal framework's point of view. The check is done in the
governors by going through the thermal zone device list and cdev.


> I'll take a closer look to see if we can replace the calls to
> get_thermal_instance() by something else.

That is great, thanks !
Daniel Lezcano Feb. 6, 2023, 2:50 p.m. UTC | #3
Hi Thierry,

Did you have time to look at the get_thermal_instance() removal?


On 26/01/2023 13:55, Thierry Reding wrote:

> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 
> and after these changes, it turns into:
> 
> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> 
> The "programming throttle ..." messages are something I've added locally
> to trace what gets called. So it looks like for "pll" and "mem" thermal
> zones, we now program trip points whereas we previously didn't.
> 
> I'll take a closer look to see if we can replace the calls to
> get_thermal_instance() by something else.
> 
> Thierry
Thierry Reding Feb. 7, 2023, 12:18 p.m. UTC | #4
On Mon, Feb 06, 2023 at 03:50:22PM +0100, Daniel Lezcano wrote:
> 
> Hi Thierry,
> 
> did you have the time to look at the get_thermal_instance() removal ?
> 
> 
> On 26/01/2023 13:55, Thierry Reding wrote:
> 
> > 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > 
> > and after these changes, it turns into:
> > 
> > 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> > 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> > 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> > 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> > 
> > The "programming throttle ..." messages are something I've added locally
> > to trace what gets called. So it looks like for "pll" and "mem" thermal
> > zones, we now program trip points whereas we previously didn't.

My diagnosis above wasn't entirely correct. We're not actually skipping
trip point programming for PLL and MEM thermal zones in the current
code. Instead, we skip throttle programming. As far as I can tell this
is a mechanism built into ACTMON to allow it to automatically throttle
when a zone reaches a certain temperature.

This is modelled as a cooling device, but internally it's actually done
automatically, which is why we have this code that programs the throttle
at driver probe time, rather than the on-demand programming that a
typical cooling device (such as a fan) would do.

The reason why we have get_thermal_instance() here is to check if this
built-in cooling device has been configured for the "hot" trip point. If
not, we don't want the throttle programming to happen. This gives us the
flexibility of explicitly disabling the automatic throttling by ACTMON
and using another cooling device (or none at all) if that's what is
needed.

If we drop just the call to get_thermal_instance() and rely on
find_throttle_cfg_by_name(), that function will always return a valid
throttle configuration. This is slightly obfuscated because of this:

	cdev = ts->throt_cfgs[i].cdev;
	if (get_thermal_instance(tz, cdev, trip_id))
		stc = find_throttle_cfg_by_name(ts, cdev->type);

As far as I can tell this will always return &ts->throt_cfgs[i], so the
find_throttle_cfg_by_name() call is a bit redundant here. I'll look into
fixing that.

In any case, the important thing is that it would always find a valid
throttle configuration and therefore program the throttle, even if we
may not want to.
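
A minimal user-space sketch of the point above, not the driver code. All
names here (struct throttle_cfg, find_cfg_by_name(), count_programmed())
are hypothetical stand-ins; the bound_to_trip flag plays the role of
get_thermal_instance() returning non-NULL for the zone being looked at,
and config names are assumed to be unique:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one throttle configuration of the driver. */
struct throttle_cfg {
	const char *name;	/* plays the role of cdev->type */
	bool init;		/* configuration was parsed and is usable */
	bool bound_to_trip;	/* stand-in for get_thermal_instance() != NULL */
};

/* Name-based lookup, modelled on how the lookup is described above. */
static struct throttle_cfg *find_cfg_by_name(struct throttle_cfg *cfgs,
					      size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (cfgs[i].init && strcmp(cfgs[i].name, name) == 0)
			return &cfgs[i];

	return NULL;
}

/* Count the configs that would be programmed for one thermal zone. */
static int count_programmed(struct throttle_cfg *cfgs, size_t n,
			    bool check_binding)
{
	int programmed = 0;

	for (size_t i = 0; i < n; i++) {
		struct throttle_cfg *stc;

		if (!cfgs[i].init)
			continue;

		/* This is the gate that the proposed patch removes. */
		if (check_binding && !cfgs[i].bound_to_trip)
			continue;

		/* With unique names, the lookup returns the entry itself. */
		stc = find_cfg_by_name(cfgs, n, cfgs[i].name);
		assert(stc == &cfgs[i]);

		programmed++;
	}

	return programmed;
}
```

With one bound and one unbound config, count_programmed() yields 1 when
the binding is checked and 2 when it is not, which would correspond to
the extra "programming throttle" lines in the log above.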

Possibly we could work around that by removing this fiddly special case
and instead add a new callback for the cooling devices that can be run
when they are bound to a thermal zone. This would allow the throttle
programming to be initiated from within the thermal core rather than
"bolted on" like it is now and should allow us to achieve the same
effect but without calling into get_thermal_instance().
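
A rough user-space sketch of what such a callback could look like. This
is purely illustrative: the "bound" callback, struct cooling_ops,
struct cooling_dev and core_bind() are hypothetical names and do not
exist in the thermal core; the idea is only that the throttle gets
programmed when (and only when) the core binds the cooling device:

```c
#include <assert.h>
#include <stddef.h>

struct cooling_dev;

/* Hypothetical ops table; "bound" is NOT an existing thermal core callback. */
struct cooling_ops {
	int (*bound)(struct cooling_dev *cdev, int zone_id, int trip_temp);
};

struct cooling_dev {
	const struct cooling_ops *ops;
	int programmed_temp;	/* 0 means the throttle was never programmed */
};

/* Model of the core binding a cooling device to a trip of a zone. */
static int core_bind(struct cooling_dev *cdev, int zone_id, int trip_temp)
{
	assert(cdev != NULL);

	/* Notify the driver, if it asked to be told about bindings. */
	if (cdev->ops && cdev->ops->bound)
		return cdev->ops->bound(cdev, zone_id, trip_temp);

	return 0;
}

/* Driver side: program the built-in throttle from the callback. */
static int hw_throttle_bound(struct cooling_dev *cdev, int zone_id,
			     int trip_temp)
{
	(void)zone_id;	/* a real driver would select the sensor group here */
	cdev->programmed_temp = trip_temp;

	return 0;
}

static const struct cooling_ops hw_throttle_ops = {
	.bound = hw_throttle_bound,
};
```

A cooling device that is never passed to core_bind() keeps
programmed_temp at 0, so the driver would no longer need to ask the
core whether a binding exists.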

I'll try and prototype this, but feel free to suggest anything better if
you can think of something.

Thierry
Daniel Lezcano Feb. 7, 2023, 12:38 p.m. UTC | #5
On 07/02/2023 13:18, Thierry Reding wrote:
> On Mon, Feb 06, 2023 at 03:50:22PM +0100, Daniel Lezcano wrote:
>>
>> Hi Thierry,
>>
>> did you have the time to look at the get_thermal_instance() removal ?
>>
>>
>> On 26/01/2023 13:55, Thierry Reding wrote:
>>
>>> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>>
>>> and after these changes, it turns into:
>>>
>>> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
>>> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
>>> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
>>> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
>>>
>>> The "programming throttle ..." messages are something I've added locally
>>> to trace what gets called. So it looks like for "pll" and "mem" thermal
>>> zones, we now program trip points whereas we previously didn't.
> 
> My diagnosis above wasn't entirely correct. We're not actually skipping
> trip point programming for PLL and MEM thermal zones in the current
> code. Instead, we skip throttle programming. As far as I can tell this
> is a mechanism built into ACTMON to allow it to automatically throttle
> when a zone reaches a certain temperature.
> 
> This is modelled as a cooling device, but internally it's actually done
> automatically, which is why we have this code that programs the throttle
> at driver probe time, rather than the on-demand programming that typical
> cooling device would do (such as a fan).
> 
> The reason why we have get_thermal_instance() here is to check if this
> built-in cooling device has been configured for the "hot" trip point. If
> not, we don't want the throttle programming to happen. This adds the
> added flexibility of explicitly disabling the automatic throttling by
> ACTMON and using another cooling device (or none at all) if that's what
> is needed.
> 
> Dropping just the call to get_thermal_instance() and relying on the
> find_throttle_cfg_by_name() function will always return a valid throttle
> configuration. This is slightly obfuscated because of this:
> 
> 	cdev = ts->throt_cfgs[i].cdev;
> 	if (get_thermal_instance(tz, cdev, trip_id))
> 		stc = find_throttle_cfg_by_name(ts, cdev->type);
> 
> As far as I can tell this will always return &ts->throt_cfgs[i], so the
> find_throttle_cfg_by_name() call is a bit redundant here. I'll look into
> fixing that.
> 
> In any case, the important thing is that it would always find a valid
> throttle configuration and therefore program the throttle, even if we
> may not want to.

Why not rely on the thermal framework mechanism to set the hwtrips?

thermal_zone_device_register() calls thermal_zone_device_update(). This 
one calls thermal_zone_set_trips() which programs the hardware trip point.

When we suspend/resume, the PM notifiers are calling 
thermal_zone_device_update() which in turn sets the hw trip points.

Maybe I'm missing something, but isn't that enough for the sensor?
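
As a rough illustration of that flow, here is a simplified user-space
model, not the kernel implementation. struct trip, struct zone,
zone_set_trips() and record_window() are hypothetical names; the sketch
only shows a core deriving the low/high window around the current
temperature from the trip table and handing it to a driver callback:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Temperatures are in millidegrees Celsius. */
struct trip {
	int temperature;
	int hysteresis;
};

struct zone {
	int temperature;	/* last temperature read from the sensor */
	const struct trip *trips;
	size_t num_trips;
	/* driver callback that programs the hardware window */
	int (*set_trips)(struct zone *z, int low, int high);
	int hw_low, hw_high;	/* what the "hardware" was last told */
};

/* Pick the closest trip below and above the current temperature. */
static int zone_set_trips(struct zone *z)
{
	int low = -INT_MAX;
	int high = INT_MAX;

	assert(z != NULL);

	for (size_t i = 0; i < z->num_trips; i++) {
		int trip_low = z->trips[i].temperature - z->trips[i].hysteresis;

		if (trip_low < z->temperature && trip_low > low)
			low = trip_low;

		if (z->trips[i].temperature > z->temperature &&
		    z->trips[i].temperature < high)
			high = z->trips[i].temperature;
	}

	if (!z->set_trips)
		return 0;

	return z->set_trips(z, low, high);
}

/* Example driver callback: just record the window it was given. */
static int record_window(struct zone *z, int low, int high)
{
	z->hw_low = low;
	z->hw_high = high;

	return 0;
}
```

In this model, calling zone_set_trips() again after a resume or a
temperature change is all it takes to reprogram the window, with no
driver-specific hook involved.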


> Possibly we could work around that by removing this fiddly special case
> and instead add a new callback for the cooling devices that can be run
> when they are bound to a thermal zone. This would allow the throttle
> programming to be initiated from within the thermal core rather than
> "bolted on" like it is now and should allow us to achieve the same
> effect but without calling into get_thermal_instance().
> 
> I'll try and prototype this, but feel free to suggest anything better if
> you can think of something.
> 
> Thierry
Thierry Reding Feb. 7, 2023, 2:27 p.m. UTC | #6
On Tue, Feb 07, 2023 at 01:38:08PM +0100, Daniel Lezcano wrote:
> On 07/02/2023 13:18, Thierry Reding wrote:
> > On Mon, Feb 06, 2023 at 03:50:22PM +0100, Daniel Lezcano wrote:
> > > 
> > > Hi Thierry,
> > > 
> > > did you have the time to look at the get_thermal_instance() removal ?
> > > 
> > > 
> > > On 26/01/2023 13:55, Thierry Reding wrote:
> > > 
> > > > 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > > > 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > > > 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > > > 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > > > 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > > > 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > > > 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > > > 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > > > 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > > > 
> > > > and after these changes, it turns into:
> > > > 
> > > > 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > > > 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > > > 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > > > 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > > > 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > > > 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > > > 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > > > 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > > > 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> > > > 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> > > > 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > > > 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> > > > 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> > > > 
> > > > The "programming throttle ..." messages are something I've added locally
> > > > to trace what gets called. So it looks like for "pll" and "mem" thermal
> > > > zones, we now program trip points whereas we previously didn't.
> > 
> > My diagnosis above wasn't entirely correct. We're not actually skipping
> > trip point programming for PLL and MEM thermal zones in the current
> > code. Instead, we skip throttle programming. As far as I can tell this
> > is a mechanism built into ACTMON to allow it to automatically throttle
> > when a zone reaches a certain temperature.
> > 
> > This is modelled as a cooling device, but internally it's actually done
> > automatically, which is why we have this code that programs the throttle
> > at driver probe time, rather than the on-demand programming that typical
> > cooling device would do (such as a fan).
> > 
> > The reason why we have get_thermal_instance() here is to check if this
> > built-in cooling device has been configured for the "hot" trip point. If
> > not, we don't want the throttle programming to happen. This adds the
> > added flexibility of explicitly disabling the automatic throttling by
> > ACTMON and using another cooling device (or none at all) if that's what
> > is needed.
> > 
> > Dropping just the call to get_thermal_instance() and relying on the
> > find_throttle_cfg_by_name() function will always return a valid throttle
> > configuration. This is slightly obfuscated because of this:
> > 
> > 	cdev = ts->throt_cfgs[i].cdev;
> > 	if (get_thermal_instance(tz, cdev, trip_id))
> > 		stc = find_throttle_cfg_by_name(ts, cdev->type);
> > 
> > As far as I can tell this will always return &ts->throt_cfgs[i], so the
> > find_throttle_cfg_by_name() call is a bit redundant here. I'll look into
> > fixing that.
> > 
> > In any case, the important thing is that it would always find a valid
> > throttle configuration and therefore program the throttle, even if we
> > may not want to.
> 
> Why not rely on the thermal framework mechanism to set the hwtrips?
> 
> thermal_zone_device_register() calls thermal_zone_device_update(). This one
> calls thermal_zone_set_trips() which programs the hardware trip point.
> 
> When we suspend/resume, the PM notifiers are calling
> thermal_zone_device_update() which in turn sets the hw trip points.
> 
> Maybe I'm missing something, but isn't that enough for the sensor?

These aren't actually trip points getting programmed, but rather the
built-in throttling mechanism. That said, it might be possible to append
that programming to the driver's ->set_trips() implementation. I'll look
into that.

Thanks for the suggestion,
Thierry

> 
> 
> > Possibly we could work around that by removing this fiddly special case
> > and instead add a new callback for the cooling devices that can be run
> > when they are bound to a thermal zone. This would allow the throttle
> > programming to be initiated from within the thermal core rather than
> > "bolted on" like it is now and should allow us to achieve the same
> > effect but without calling into get_thermal_instance().
> > 
> > I'll try and prototype this, but feel free to suggest anything better if
> > you can think of something.
> > 
> > Thierry
Daniel Lezcano Feb. 10, 2023, 1:17 p.m. UTC | #7
Hi Thierry,

On Thu, Jan 26, 2023 at 01:55:52PM +0100, Thierry Reding wrote:
> On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
> > 
> > Hi,
> > 
> > does anyone know what is the purpose of the get_thermal_instance() usage in
> > this code:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
> > 
> > The driver is using a function which is reserved for the thermal core. It
> > should not.
> > 
> > Is the following change ok ?
> > 
> > diff --git a/drivers/thermal/tegra/soctherm.c
> > b/drivers/thermal/tegra/soctherm.c
> > index 220873298d77..5f552402d987 100644
> > --- a/drivers/thermal/tegra/soctherm.c
> > +++ b/drivers/thermal/tegra/soctherm.c
> > @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
> > thermal_zone_device *tz, int trip
> >  				continue;
> > 
> >  			cdev = ts->throt_cfgs[i].cdev;
> > -			if (get_thermal_instance(tz, cdev, trip_id))
> > -				stc = find_throttle_cfg_by_name(ts, cdev->type);
> > -			else
> > +			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > +			if (!stc)
> >  				continue;
> > 
> >  			return throttrip_program(dev, sg, stc, temp);
> > @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
> > *dev,
> >  			continue;
> > 
> >  		cdev = ts->throt_cfgs[i].cdev;
> > -		if (get_thermal_instance(tz, cdev, trip))
> > -			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > -		else
> > +
> > +		stc = find_throttle_cfg_by_name(ts, cdev->type);
> > +		if (!stc)
> >  			continue;
> > 
> >  		ret = throttrip_program(dev, sg, stc, temperature);
> 
> There's a small difference in behavior after applying this patch. Prior
> to this I get (on Tegra210):
> 
> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 
> and after these changes, it turns into:
> 
> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> 
> The "programming throttle ..." messages are something I've added locally
> to trace what gets called. So it looks like for "pll" and "mem" thermal
> zones, we now program trip points whereas we previously didn't.

The DT description (tegra210.dtsi) says one thing and the implementation says
something else.
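
To make the difference concrete, here is a minimal, self-contained C model of the lookup before and after the patch. It is only a sketch: struct zone_model, the bound flag and the helper names are made up for illustration, and which zones count as bound is inferred from the "before" log above rather than read from tegra210.dtsi.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the four Tegra210 thermal zones seen in the log.
 * "bound" stands in for what get_thermal_instance() reports: whether a
 * cooling map ties the throttle cooling device to a trip of that zone.
 * The values are inferred from the "before" log (assumption).
 */
struct zone_model {
	const char *name;
	bool bound;
};

static const struct zone_model zones[] = {
	{ "cpu", true  },
	{ "gpu", true  },
	{ "pll", false },
	{ "mem", false },
};

#define NUM_ZONES (sizeof(zones) / sizeof(zones[0]))

/* Old logic: only program the throttle trip when a binding exists. */
static bool programs_throttle_old(const struct zone_model *z)
{
	return z->bound;
}

/*
 * Patched logic: the binding is no longer consulted. The lookup by name
 * is assumed to succeed for every registered throttle config, so every
 * zone ends up programmed.
 */
static bool programs_throttle_new(const struct zone_model *z)
{
	(void)z;
	return true;
}

/* Count how many zones a given policy would program. */
static size_t count_programmed(bool (*policy)(const struct zone_model *))
{
	size_t i, n = 0;

	for (i = 0; i < NUM_ZONES; i++)
		if (policy(&zones[i]))
			n++;
	return n;
}
```

With this model the old policy programs two zones (cpu and gpu) and the patched one programs all four, which matches the extra "pll" and "mem" lines in the second log.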

If we refer to the PLL description, there is one 'hot' trip point and
one 'critical' trip point. No polling delay at all, so we need the
interrupts.

Logically, we should set the 'hot' trip point first; when that trip
point is crossed, we set up the next trip point, which is the critical one.

With these two trip points, the first one will send a notification to
the userspace and the second one will force a shutdown of the
system. For both, no cooling device is expected.

Well, actually I don't get the logic of the soctherm driver. It should
just rely on the thermal framework to set the trip point regardless
of the cooling devices.

The device tree also is strange. For example, the dram sets
cooling-device = <&emc 0 0>; an inoperative action for a 'nominal'
trip point ... If the goal is to stop the mitigation, that is already
done by the governor when the trip point is crossed on the way down. The
second trip point is of type 'active' but it refers to the emc
which is, at first glance, a passive cooling device.

The gpu description only describes hot and critical trip points. The
cooling device maps to the 'hot' trip point ! The governor is not used
in this case, so the cooling device is inoperative. Same for the cpu
thermal zone.

IOW, the driver is not correctly implemented and the device tree is
wrong. Thermal is not working correctly on these boards AFAICT.
Thierry Reding Feb. 10, 2023, 2:09 p.m. UTC | #8
On Fri, Feb 10, 2023 at 02:17:03PM +0100, Daniel Lezcano wrote:
> Hi Thierry,
> 
> On Thu, Jan 26, 2023 at 01:55:52PM +0100, Thierry Reding wrote:
> > On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
> > > 
> > > Hi,
> > > 
> > > does anyone know what is the purpose of the get_thermal_instance() usage in
> > > this code:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
> > > 
> > > The driver is using a function which is reserved for the thermal core. It
> > > should not.
> > > 
> > > Is the following change ok ?
> > > 
> > > diff --git a/drivers/thermal/tegra/soctherm.c
> > > b/drivers/thermal/tegra/soctherm.c
> > > index 220873298d77..5f552402d987 100644
> > > --- a/drivers/thermal/tegra/soctherm.c
> > > +++ b/drivers/thermal/tegra/soctherm.c
> > > @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
> > > thermal_zone_device *tz, int trip
> > >  				continue;
> > > 
> > >  			cdev = ts->throt_cfgs[i].cdev;
> > > -			if (get_thermal_instance(tz, cdev, trip_id))
> > > -				stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > -			else
> > > +			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > +			if (!stc)
> > >  				continue;
> > > 
> > >  			return throttrip_program(dev, sg, stc, temp);
> > > @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
> > > *dev,
> > >  			continue;
> > > 
> > >  		cdev = ts->throt_cfgs[i].cdev;
> > > -		if (get_thermal_instance(tz, cdev, trip))
> > > -			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > -		else
> > > +
> > > +		stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > +		if (!stc)
> > >  			continue;
> > > 
> > >  		ret = throttrip_program(dev, sg, stc, temperature);
> > 
> > There's a small difference in behavior after applying this patch. Prior
> > to this I get (on Tegra210):
> > 
> > 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > 
> > and after these changes, it turns into:
> > 
> > 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> > 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> > 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> > 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> > 
> > The "programming throttle ..." messages are something I've added locally
> > to trace what gets called. So it looks like for "pll" and "mem" thermal
> > zones, we now program trip points whereas we previously didn't.
> 
> The DT description (tegra210.dtsi) says one thing and the implementation says
> something else.
> 
> If we refer to the PLL description, there is one 'hot' trip point and
> one 'critical' trip point. No polling delay at all, so we need the
> interrupts.
> 
> Logically, we should set the 'hot' trip point first; when that trip
> point is crossed, we set up the next trip point, which is the critical one.
> 
> With these two trip points, the first one will send a notification to
> the userspace and the second one will force a shutdown of the
> system. For both, no cooling device is expected.

I think the intention here is to use the soctherm's built-in throttling
mechanism as a last resort measure to try and cool the system down. I
suppose that could count as "passive" cooling, so specifying it as the
cooling device for the "passive" trip point may be more appropriate.

The throttling that happens here is quite severe, so we don't want it to
happen too early. I would expect that our "passive" trip point shouldn't
be a lot less than the "hot" temperature. I suspect that's the reason
why the "hot" trip point was reused for this.

I'm also beginning to think that we should just not expose the soctherm
throttling as a cooling device and instead keep it internal to the
soctherm driver entirely.

> Well, actually I don't get the logic of the soctherm driver. It should
> just rely on the thermal framework to set the trip point regardless
> of the cooling devices.

Again, "throttrip" doesn't map well to the concept of trip points
because it's not a mechanism to notify when a certain temperature is
reached. It's an additional mechanism to automatically start throttling
once a given temperature threshold is crossed. So it's basically an
auto-cooling-device. If we program it only in response to a trip point
notification, there aren't any benefits to this throttle mechanism. So
again, I think we're probably better off just removing the cooling
device implementation for it and always program it with the "hot" or
"passive" trip point temperatures.
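
A rough sketch of that idea in plain C follows. Nothing here is soctherm code: the trip structure, the enum and throttle_temp() are hypothetical names, and preferring "hot" over "passive" when both exist is an assumption on my side.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a zone's trip points. */
enum trip_type { TRIP_ACTIVE, TRIP_PASSIVE, TRIP_HOT, TRIP_CRITICAL };

struct trip {
	enum trip_type type;
	int temp_mC;	/* millidegrees Celsius */
};

/*
 * Pick the temperature to program into the hardware throttle once, at
 * probe time, instead of reacting to trip notifications.
 * Returns true and fills *temp_mC when a "hot" trip (preferred) or a
 * "passive" trip exists; returns false when neither is present.
 */
static bool throttle_temp(const struct trip *trips, size_t n, int *temp_mC)
{
	bool have_passive = false;
	int passive = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (trips[i].type == TRIP_HOT) {
			*temp_mC = trips[i].temp_mC;
			return true;
		}
		if (trips[i].type == TRIP_PASSIVE && !have_passive) {
			passive = trips[i].temp_mC;
			have_passive = true;
		}
	}

	if (have_passive)
		*temp_mC = passive;
	return have_passive;
}
```

The point of the sketch is that the throttle threshold is derived directly from the zone description, so no cooling-device binding (and no get_thermal_instance() lookup) is involved.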

> The device tree also is strange. For example, the dram sets
> cooling-device = <&emc 0 0>; an inoperative action for a 'nominal'
> trip point ... If the goal is to stop the mitigation, that is already
> done by the governor when the trip point is crossed on the way down. The
> second trip point is of type 'active' but it refers to the emc
> which is, at first glance, a passive cooling device.

I think this is because for the mem-thermal zone, "passive" is
considered to be less "severe" than "active". My understanding is that
the severity goes "active", "passive", "hot", "critical". "Active" trip
points are those where we want to use active cooling devices (such as a
fan, for example) to try and cool the device. The "passive" trip points
should only be reached when active cooling devices aren't up to the job
and passive mechanisms need to be deployed. Passive in this case meaning
the hardware itself has to be throttled.

If you look at the temperatures defined for passive vs. active for the
"mem" thermal zone, then clearly they are reversed. <&emc 0 0> should be
used for active trip points, and <&emc 1 1> means throttling of the EMC
frequency, i.e. for passive trip points.

> The gpu description only describes hot and critical trip points. The
> cooling device maps to the 'hot' trip point ! The governor is not used
> in this case, so the cooling device is inoperative. Same for the cpu
> thermal zone.
> 
> IOW, the driver is not correctly implemented and the device tree is
> wrong. Thermal is not working correctly on these boards AFAICT.

I'll try to rework this. As I mentioned above I think we can just remove
that throttle_heavy cooling device and instead hard-code that in the
driver to a given temperature. Given that this is probably all defunct
anyway, the best would probably be to extend the soctherm's
throttle-cfgs node with a temperature field so we can avoid the reliance
on trip points (which would allow us to get rid of the calls to the
get_thermal_instance() helper).

On the DT side, I think most of the cooling maps can be cleaned up. We
can remove the entries for "critical" and "hot" trip points if the
driver unconditionally programs the automated throttling. For EMC we
want to reverse the "passive" and "active" trip points and possibly drop
the dram-passive cooling map as well, since you mentioned the core would
take care of disabling the cooling device automatically.

Thierry
Daniel Lezcano Feb. 10, 2023, 2:36 p.m. UTC | #9
On 10/02/2023 15:09, Thierry Reding wrote:
> On Fri, Feb 10, 2023 at 02:17:03PM +0100, Daniel Lezcano wrote:
>> Hi Thierry,
>>
>> On Thu, Jan 26, 2023 at 01:55:52PM +0100, Thierry Reding wrote:
>>> On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
>>>>
>>>> Hi,
>>>>
>>>> does anyone know what is the purpose of the get_thermal_instance() usage in
>>>> this code:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
>>>>
>>>> The driver is using a function which is reserved for the thermal core. It
>>>> should not.
>>>>
>>>> Is the following change ok ?
>>>>
>>>> diff --git a/drivers/thermal/tegra/soctherm.c
>>>> b/drivers/thermal/tegra/soctherm.c
>>>> index 220873298d77..5f552402d987 100644
>>>> --- a/drivers/thermal/tegra/soctherm.c
>>>> +++ b/drivers/thermal/tegra/soctherm.c
>>>> @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
>>>> thermal_zone_device *tz, int trip
>>>>   				continue;
>>>>
>>>>   			cdev = ts->throt_cfgs[i].cdev;
>>>> -			if (get_thermal_instance(tz, cdev, trip_id))
>>>> -				stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> -			else
>>>> +			stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> +			if (!stc)
>>>>   				continue;
>>>>
>>>>   			return throttrip_program(dev, sg, stc, temp);
>>>> @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
>>>> *dev,
>>>>   			continue;
>>>>
>>>>   		cdev = ts->throt_cfgs[i].cdev;
>>>> -		if (get_thermal_instance(tz, cdev, trip))
>>>> -			stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> -		else
>>>> +
>>>> +		stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> +		if (!stc)
>>>>   			continue;
>>>>
>>>>   		ret = throttrip_program(dev, sg, stc, temperature);
>>>
>>> There's a small difference in behavior after applying this patch. Prior
>>> to this I get (on Tegra210):
>>>
>>> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>>
>>> and after these changes, it turns into:
>>>
>>> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
>>> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
>>> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
>>> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
>>>
>>> The "programming throttle ..." messages are something I've added locally
>>> to trace what gets called. So it looks like for "pll" and "mem" thermal
>>> zones, we now program trip points whereas we previously didn't.
>>
>> The DT description (tegra210.dtsi) says one thing and the implementation says
>> something else.
>>
>> If we refer to the PLL description, there is one 'hot' trip point and
>> one 'critical' trip point. No polling delay at all, so we need the
>> interrupts.
>>
>> Logically, we should set the 'hot' trip point first; when that trip
>> point is crossed, we set up the next trip point, which is the critical one.
>>
>> With these two trip points, the first one will send a notification to
>> the userspace and the second one will force a shutdown of the
>> system. For both, no cooling device is expected.
> 
> I think the intention here is to use the soctherm's built-in throttling
> mechanism as a last resort measure to try and cool the system down. I
> suppose that could count as "passive" cooling, so specifying it as the
> cooling device for the "passive" trip point may be more appropriate.
> 
> The throttling that happens here is quite severe, so we don't want it to
> happen too early. I would expect that our "passive" trip point shouldn't
> be a lot less than the "hot" temperature. I suspect that's the reason
> why the "hot" trip point was reused for this.
> 
> I'm also beginning to think that we should just not expose the soctherm
> throttling as a cooling device and instead keep it internal to the
> soctherm driver entirely.

Yes, and perhaps separate it from the sensor driver.

There is a similar hardware limiter for the qcom platform [1]. The 
description in the device tree is separated from the sensor and the 
binding has temperatures to begin the mitigation [2].

There is no trip point associated as those are related to the in-kernel 
mitigation.

If this mitigation is a heavy one, beyond what the kernel is able to do
with a passive cooling device, it would make sense to have it
configured outside of the thermal zone.

So the configuration would be something like:

myperformance_limite {
	@ = <0x...>
	temperature_limit = <95000>;
};

thermal-zones {

	cpu {
		trips {
			alert: alert {
				temperature = <90000>;
				hysteresis = <2000>;
				type = "passive";
			};

			hot {
				temperature = <97000>;
				type = "hot";
			};

			critical {
				temperature = <100000>;
				hysteresis = <2000>;
				type = "critical";
			};
		};

		cooling-maps {
			map0 {
				trip = <&alert>;
				cooling-device = <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
			};
		};
	};
};

The behavior will be a passive mitigation first; if that fails, the
hardware limiter will take over; if that fails, 'hot' sends a
notification to userspace (giving it the opportunity to hotplug a cpu,
kill a task or suspend); if that fails too, the system shuts down.
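
The escalation order can be sketched as a small C function using the temperatures from the example above. The enum and function names are hypothetical, hysteresis is ignored, and a stage is assumed to start as soon as the temperature reaches its threshold.

```c
/* Thresholds taken from the example configuration, in millidegrees C. */
#define PASSIVE_MC	 90000	/* 'alert' passive trip */
#define HW_LIMIT_MC	 95000	/* hardware limiter, outside the thermal zone */
#define HOT_MC		 97000	/* 'hot' trip: notify userspace */
#define CRITICAL_MC	100000	/* 'critical' trip: shut down */

/* Hypothetical names for the successive lines of defence. */
enum stage {
	STAGE_NONE,
	STAGE_PASSIVE,
	STAGE_HW_LIMITER,
	STAGE_HOT,
	STAGE_CRITICAL,
};

/* Return the most severe stage that applies at the given temperature. */
static enum stage mitigation_stage(int temp_mC)
{
	if (temp_mC >= CRITICAL_MC)
		return STAGE_CRITICAL;
	if (temp_mC >= HOT_MC)
		return STAGE_HOT;
	if (temp_mC >= HW_LIMIT_MC)
		return STAGE_HW_LIMITER;
	if (temp_mC >= PASSIVE_MC)
		return STAGE_PASSIVE;
	return STAGE_NONE;
}
```

Each stage only matters if the previous one failed to bring the temperature down, which is why the hardware limiter sits between the passive and the hot thresholds.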

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/qcom/lmh.c?h=thermal/bleeding-edge

[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/qcom/sdm845.dtsi?h=thermal/bleeding-edge#n3922

[ ... ]

> On the DT side, I think most of the cooling maps can be cleaned up. We
> can remove the entries for "critical" and "hot" trip points if the
> driver unconditionally programs the automated throttling. 

You may want to keep the critical trip points at least. Even if the
hardware limiter is certainly very effective, having the critical point
is another fail-safe that allows the system to shut down gracefully
before a wild hardware reset.

> For EMC we
> want to reverse the "passive" and "active" trip points and possibly drop
> the dram-passive cooling map as well, since you mentioned the core would
> take care of disabling the cooling device automatically.
Thierry Reding Feb. 10, 2023, 3:12 p.m. UTC | #10
On Fri, Feb 10, 2023 at 03:36:59PM +0100, Daniel Lezcano wrote:
> On 10/02/2023 15:09, Thierry Reding wrote:
> > On Fri, Feb 10, 2023 at 02:17:03PM +0100, Daniel Lezcano wrote:
> > > Hi Thierry,
> > > 
> > > On Thu, Jan 26, 2023 at 01:55:52PM +0100, Thierry Reding wrote:
> > > > On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > does anyone know what is the purpose of the get_thermal_instance() usage in
> > > > > this code:
> > > > > 
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
> > > > > 
> > > > > The driver is using a function which is reserved for the thermal core. It
> > > > > should not.
> > > > > 
> > > > > Is the following change ok ?
> > > > > 
> > > > > diff --git a/drivers/thermal/tegra/soctherm.c
> > > > > b/drivers/thermal/tegra/soctherm.c
> > > > > index 220873298d77..5f552402d987 100644
> > > > > --- a/drivers/thermal/tegra/soctherm.c
> > > > > +++ b/drivers/thermal/tegra/soctherm.c
> > > > > @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
> > > > > thermal_zone_device *tz, int trip
> > > > >   				continue;
> > > > > 
> > > > >   			cdev = ts->throt_cfgs[i].cdev;
> > > > > -			if (get_thermal_instance(tz, cdev, trip_id))
> > > > > -				stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > > > -			else
> > > > > +			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > > > +			if (!stc)
> > > > >   				continue;
> > > > > 
> > > > >   			return throttrip_program(dev, sg, stc, temp);
> > > > > @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
> > > > > *dev,
> > > > >   			continue;
> > > > > 
> > > > >   		cdev = ts->throt_cfgs[i].cdev;
> > > > > -		if (get_thermal_instance(tz, cdev, trip))
> > > > > -			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > > > -		else
> > > > > +
> > > > > +		stc = find_throttle_cfg_by_name(ts, cdev->type);
> > > > > +		if (!stc)
> > > > >   			continue;
> > > > > 
> > > > >   		ret = throttrip_program(dev, sg, stc, temperature);
> > > > 
> > > > There's a small difference in behavior after applying this patch. Prior
> > > > to this I get (on Tegra210):
> > > > 
> > > > 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > > > 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > > > 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > > > 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > > > 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > > > 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > > > 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > > > 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > > > 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > > > 
> > > > and after these changes, it turns into:
> > > > 
> > > > 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > > > 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > > > 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > > > 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > > > 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > > > 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > > > 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > > > 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > > > 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> > > > 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> > > > 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > > > 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> > > > 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> > > > 
> > > > The "programming throttle ..." messages are something I've added locally
> > > > to trace what gets called. So it looks like for "pll" and "mem" thermal
> > > > zones, we now program trip points whereas we previously didn't.
> > > 
> > > The DT description (tegra210.dtsi) says one thing and the implementation says
> > > something else.
> > > 
> > > If we refer to the PLL description, there is one 'hot' trip point and
> > > one 'critical' trip point. No polling delay at all, so we need the
> > > interrupts.
> > > 
> > > Logically, we should set the 'hot' trip point first; when that trip
> > > point is crossed, we set up the next trip point, which is the critical one.
> > > 
> > > With these two trip points, the first one will send a notification to
> > > the userspace and the second one will force a shutdown of the
> > > system. For both, no cooling device is expected.
> > 
> > I think the intention here is to use the soctherm's built-in throttling
> > mechanism as a last resort measure to try and cool the system down. I
> > suppose that could count as "passive" cooling, so specifying it as the
> > cooling device for the "passive" trip point may be more appropriate.
> > 
> > The throttling that happens here is quite severe, so we don't want it to
> > happen too early. I would expect that our "passive" trip point shouldn't
> > be a lot less than the "hot" temperature. I suspect that's the reason
> > why the "hot" trip point was reused for this.
> > 
> > I'm also beginning to think that we should just not expose the soctherm
> > throttling as a cooling device and instead keep it internal to the
> > soctherm driver entirely.
> 
> Yes, and perhaps separate it from the sensor driver.
> 
> There is a similar hardware limiter for the qcom platform [1]. The
> description in the device tree is separated from the sensor and the binding
> has temperatures to begin the mitigation [2].

The hardware throttling is controlled using registers that are part of
the SOCTHERM block, so we can't separate it from the sensor driver. I
don't think that's much of a problem, though. The code for this already
exists in the current soctherm driver, so it's just a matter of removing
the cooling device registration code.

> 
> There is no trip point associated as those are related to the in-kernel
> mitigation.
> 
> If this mitigation is a heavy one, beyond what the kernel is able to
> do with a passive cooling device, it would make sense to have it
> configured outside of the thermal zone.
> 
> So the configuration would be something like:
> 
> myperformance_limite {
> 	@ = <0x...>
> 	temperature_limit = <95000>;
> };
> 
> thermal-zones {
> 
> 	cpu {
> 		trips {
> 			alert: alert {
> 				temperature = <90000>;
> 				hysteresis = <2000>;
> 				type = "passive";
> 			};
> 
> 			hot {
> 				temperature = <97000>;
> 				type = "hot";
> 			};
> 
> 			critical {
> 				temperature = <100000>;
> 				hysteresis = <2000>;
> 				type = "critical";
> 			};
> 		};
> 
> 		cooling-maps {
> 			map0 {
> 				trip = <&alert>;
> 				cooling-device = <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> 			};
> 		};
> 	};
> };
> 
> The behavior will be a passive mitigation first; if that fails, the
> hardware limiter will take over; if that fails, 'hot' sends a
> notification to userspace (giving it the opportunity to hotplug a cpu,
> kill a task or suspend); if that fails too, the system shuts down.

Yeah, that's exactly what I had in mind.

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/qcom/lmh.c?h=thermal/bleeding-edge
> 
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/qcom/sdm845.dtsi?h=thermal/bleeding-edge#n3922
> 
> [ ... ]
> 
> > On the DT side, I think most of the cooling maps can be cleaned up. We
> > can remove the entries for "critical" and "hot" trip points if the
> > driver unconditionally programs the automated throttling.
> 
> You may want to keep the critical trip points at least. Even if the hardware
> limiter is certainly very effective, having the critical point is another
> fail-safe that allows the system to shut down gracefully before a wild
> hardware reset.

Yeah. What I meant was to remove only the cooling map entries for
critical and hot since they would be unused. We absolutely want to
keep the trip points themselves around to make sure the system will
forcefully shut down as a last resort.

Thierry
Daniel Lezcano March 8, 2023, 5:21 p.m. UTC | #11
Hi Thierry,

Did you have time to look at the changes?

Or at least at a way to remove the get_thermal_instance() usage?

On 10/02/2023 15:09, Thierry Reding wrote:
> On Fri, Feb 10, 2023 at 02:17:03PM +0100, Daniel Lezcano wrote:
>> Hi Thierry,
>>
>> On Thu, Jan 26, 2023 at 01:55:52PM +0100, Thierry Reding wrote:
>>> On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
>>>>
>>>> Hi,
>>>>
>>>> does anyone know what is the purpose of the get_thermal_instance() usage in
>>>> this code:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
>>>>
>>>> The driver is using a function which is reserved for the thermal core. It
>>>> should not.
>>>>
>>>> Is the following change ok ?
>>>>
>>>> diff --git a/drivers/thermal/tegra/soctherm.c
>>>> b/drivers/thermal/tegra/soctherm.c
>>>> index 220873298d77..5f552402d987 100644
>>>> --- a/drivers/thermal/tegra/soctherm.c
>>>> +++ b/drivers/thermal/tegra/soctherm.c
>>>> @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
>>>> thermal_zone_device *tz, int trip
>>>>   				continue;
>>>>
>>>>   			cdev = ts->throt_cfgs[i].cdev;
>>>> -			if (get_thermal_instance(tz, cdev, trip_id))
>>>> -				stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> -			else
>>>> +			stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> +			if (!stc)
>>>>   				continue;
>>>>
>>>>   			return throttrip_program(dev, sg, stc, temp);
>>>> @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
>>>> *dev,
>>>>   			continue;
>>>>
>>>>   		cdev = ts->throt_cfgs[i].cdev;
>>>> -		if (get_thermal_instance(tz, cdev, trip))
>>>> -			stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> -		else
>>>> +
>>>> +		stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>> +		if (!stc)
>>>>   			continue;
>>>>
>>>>   		ret = throttrip_program(dev, sg, stc, temperature);
>>>
>>> There's a small difference in behavior after applying this patch. Prior
>>> to this I get (on Tegra210):
>>>
>>> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>>
>>> and after these changes, it turns into:
>>>
>>> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
>>> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
>>> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
>>> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
>>>
>>> The "programming throttle ..." messages are something I've added locally
>>> to trace what gets called. So it looks like for "pll" and "mem" thermal
>>> zones, we now program trip points whereas we previously didn't.
>>
>> The DT description (tegra210.dtsi) says one thing and the implementation says
>> something else.
>>
>> If we refer to the PLL description, there is one 'hot' trip point and
>> one 'critical' trip point. No polling delay at all, so we need the
>> interrupts.
>>
>> Logically, we should set the 'hot' trip point first, when the trip
>> point is crossed, we setup the next trip point, which is the critical.
>>
>> With these two trip points, the first one will send a notification to
>> the userspace and the second one will force a shutdown of the
>> system. For both, no cooling device is expected.
> 
> I think the intention here is to use the soctherm's built-in throttling
> mechanism as a last resort measure to try and cool the system down. I
> suppose that could count as "passive" cooling, so specifying it as the
> cooling device for the "passive" trip point may be more appropriate.
> 
> The throttling that happens here is quite severe, so we don't want it to
> happen too early. I would expect that our "passive" trip point shouldn't
> be a lot less than the "hot" temperature. I suspect that's the reason
> why the "hot" trip point was reused for this.
> 
> I'm also beginning to think that we should just not expose the soctherm
> throttling as a cooling device and instead keep it internal to the
> soctherm driver entirely.
> 
>> Well, actually I don't get the logic of the soctherm driver. It should
>> just rely on the thermal framework to set the trip point regardless
>> the cooling devices.
> 
> Again, "throttrip" doesn't map well to the concept of trip points
> because it's not a mechanism to notify when a certain temperature is
> reached. It's an additional mechanism to automatically start throttling
> once a given temperature threshold is crossed. So it's basically an
> auto-cooling-device. If we program it only in response to a trip point
> notification, there aren't any benefits to this throttle mechanism. So
> again, I think we're probably better off just removing the cooling
> device implementation for it and always program it with the "hot" or
> "passive" trip point temperatures.
> 
>> The device tree is also strange. For example, the dram sets
>> cooling-device = <&emc 0 0>; an inoperative action for a 'nominal'
>> trip point ... If the goal is to stop the mitigation, that is already
>> done by the governor when the trip point is crossed on the way down. The
>> second trip point is an 'active' cooling device but it refers to an emc
>> which is, at first glance, a passive cooling device.
> 
> I think this is because for the mem-thermal zone, "passive" is
> considered to be less "severe" than "active". My understanding is that
> the severity goes "active", "passive", "hot", "critical". "Active" trip
> points are those where we want to use active cooling devices (such as a
> fan, for example) to try and cool the device. The "passive" trip points
> should only be reached when active cooling devices aren't up to the job
> and passive mechanisms need to be deployed. Passive in this case meaning
> the hardware itself has to be throttled.
> 
> If you look at the temperatures defined for passive vs. active for the
> "mem" thermal zone, then clearly they are reversed. <&emc 0 0> should be
> used for active trip points, and <&emc 1 1> means throttling of the EMC
> frequency, i.e. for passive trip points.
> 
>> The gpu description only describes hot and critical trip points. The
>> cooling device maps to the 'hot' trip point ! The governor is not used
>> in this case, so the cooling device is inoperative. Same for the cpu
>> thermal zone.
>>
>> IOW, the driver is not correctly implemented and the device tree is
>> wrong. Thermal is not working correctly on these boards AFAICT.
> 
> I'll try to rework this. As I mentioned above I think we can just remove
> that throttle_heavy cooling device and instead hard-code that in the
> driver to a given temperature. Given that this is probably all defunct
> anyway, the best would probably be to extend the soctherm's
> throttle-cfgs node with a temperature field so we can avoid the reliance
> on trip points (which would allow us to get rid of the calls to the
> get_thermal_instance() helper).
> 
> On the DT side, I think most of the cooling maps can be cleaned up. We
> can remove the entries for "critical" and "hot" trip points if the
> driver unconditionally programs the automated throttling. For EMC we
> want to reverse the "passive" and "active" trip points and possibly drop
> the dram-passive cooling map as well, since you mentioned the core would
> take care of disabling the cooling device automatically.
> 
> Thierry
Daniel Lezcano April 11, 2023, 10:48 a.m. UTC | #12
Hi Thierry,

did you have time to look at this ?

This driver is the only one using get_thermal_instance(), and I would
like to remove this function along with the thermal_core.h inclusion in
this driver.

Thanks
   -- Daniel


On 10/02/2023 16:12, Thierry Reding wrote:
> On Fri, Feb 10, 2023 at 03:36:59PM +0100, Daniel Lezcano wrote:
>> On 10/02/2023 15:09, Thierry Reding wrote:
>>> On Fri, Feb 10, 2023 at 02:17:03PM +0100, Daniel Lezcano wrote:
>>>> Hi Thierry,
>>>>
>>>> On Thu, Jan 26, 2023 at 01:55:52PM +0100, Thierry Reding wrote:
>>>>> On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> does anyone know what is the purpose of the get_thermal_instance() usage in
>>>>>> this code:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
>>>>>>
>>>>>> The driver is using a function which is reserved for the thermal core. It
>>>>>> should not.
>>>>>>
>>>>>> Is the following change ok ?
>>>>>>
>>>>>> diff --git a/drivers/thermal/tegra/soctherm.c
>>>>>> b/drivers/thermal/tegra/soctherm.c
>>>>>> index 220873298d77..5f552402d987 100644
>>>>>> --- a/drivers/thermal/tegra/soctherm.c
>>>>>> +++ b/drivers/thermal/tegra/soctherm.c
>>>>>> @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
>>>>>> thermal_zone_device *tz, int trip
>>>>>>    				continue;
>>>>>>
>>>>>>    			cdev = ts->throt_cfgs[i].cdev;
>>>>>> -			if (get_thermal_instance(tz, cdev, trip_id))
>>>>>> -				stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>>>> -			else
>>>>>> +			stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>>>> +			if (!stc)
>>>>>>    				continue;
>>>>>>
>>>>>>    			return throttrip_program(dev, sg, stc, temp);
>>>>>> @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
>>>>>> *dev,
>>>>>>    			continue;
>>>>>>
>>>>>>    		cdev = ts->throt_cfgs[i].cdev;
>>>>>> -		if (get_thermal_instance(tz, cdev, trip))
>>>>>> -			stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>>>> -		else
>>>>>> +
>>>>>> +		stc = find_throttle_cfg_by_name(ts, cdev->type);
>>>>>> +		if (!stc)
>>>>>>    			continue;
>>>>>>
>>>>>>    		ret = throttrip_program(dev, sg, stc, temperature);
>>>>>
>>>>> There's a small difference in behavior after applying this patch. Prior
>>>>> to this I get (on Tegra210):
>>>>>
>>>>> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>>>> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>>>> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>>>> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>>>> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>>>> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>>>> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>>>> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>>>> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>>>>
>>>>> and after these changes, it turns into:
>>>>>
>>>>> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
>>>>> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
>>>>> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
>>>>> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
>>>>> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
>>>>> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
>>>>> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
>>>>> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
>>>>> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
>>>>> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
>>>>> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
>>>>> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
>>>>> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
>>>>>
>>>>> The "programming throttle ..." messages are something I've added locally
>>>>> to trace what gets called. So it looks like for "pll" and "mem" thermal
>>>>> zones, we now program trip points whereas we previously didn't.
>>>>
>>>> The DT description (tegra210.dtsi) says one thing and the implementation says
>>>> something else.
>>>>
>>>> If we refer to the PLL description, there is one 'hot' trip point and
>>>> one 'critical' trip point. No polling delay at all, so we need the
>>>> interrupts.
>>>>
>>>> Logically, we should set the 'hot' trip point first, when the trip
>>>> point is crossed, we setup the next trip point, which is the critical.
>>>>
>>>> With these two trip points, the first one will send a notification to
>>>> the userspace and the second one will force a shutdown of the
>>>> system. For both, no cooling device is expected.
>>>
>>> I think the intention here is to use the soctherm's built-in throttling
>>> mechanism as a last resort measure to try and cool the system down. I
>>> suppose that could count as "passive" cooling, so specifying it as the
>>> cooling device for the "passive" trip point may be more appropriate.
>>>
>>> The throttling that happens here is quite severe, so we don't want it to
>>> happen too early. I would expect that our "passive" trip point shouldn't
>>> be a lot less than the "hot" temperature. I suspect that's the reason
>>> why the "hot" trip point was reused for this.
>>>
>>> I'm also beginning to think that we should just not expose the soctherm
>>> throttling as a cooling device and instead keep it internal to the
>>> soctherm driver entirely.
>>
>> Yes, and perhaps separate it from the sensor driver.
>>
>> There is a similar hardware limiter for the qcom platform [1]. The
>> description in the device tree is separated from the sensor and the binding
>> has temperatures to begin the mitigation [2].
> 
> The hardware throttling is controlled using registers that are part of
> the SOCTHERM block, so we can't separate it from the sensor driver. I
> don't think that's much of a problem, though. The code for this already
> exists in the current soctherm driver, so it's just a matter of removing
> the cooling device registration code.
> 
>>
>> There is no trip point associated as those are related to the in-kernel
>> mitigation.
>>
>> If this mitigation is a heavy mitigation, above what the kernel is able to
>> do with a passive cooling device, it would make sense to just have it
>> configured outside of the thermal zone.
>>
>> So the configuration would be something like:
>>
>> myperformance_limite {
>> 	@ = <0x...>
>> 	temperature_limit = 95000;
>> };
>>
>> thermal_zone {
>>
>> 	cpu : {
>> 		trips {
>> 			alert {
>> 			temperature = 90000;
>> 			hysteresis = 2000;
>> 			type = passive;
>> 			};
>>
>> 			hot {
>> 			temperature = 97000;
>> 			type = hot;
>> 			};
>>
>> 			critical {
>> 			temperature = 100000;
>> 			hysteresis = 2000;
>> 			type = critical;
>> 			};
>>
>> 			cooling-maps = <&cpu NO_LIMIT NO_LIMIT>;
>> 		};
>> 	}
>> };
>>
>> The behavior will be a passive mitigation, if it fails the hardware limiter
>> will take over, if that fails then hot sends a notification to the userspace
>> (giving the opportunity to hotplug a cpu or kill a task or suspend), if that
>> fails then shutdown.
> 
> Yeah, that's exactly what I had in mind.
> 
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/qcom/lmh.c?h=thermal/bleeding-edge
>>
>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/arch/arm64/boot/dts/qcom/sdm845.dtsi?h=thermal/bleeding-edge#n3922
>>
>> [ ... ]
>>
>>> On the DT side, I think most of the cooling maps can be cleaned up. We
>>> can remove the entries for "critical" and "hot" trip points if the
>>> driver unconditionally programs the automated throttling.
>>
>> You may want to keep the critical trip points at least. Even if the hardware
>> limiter is certainly very effective, having the critical point is another
>> fail-safe that allows the system to shut down gracefully before a wild
>> hardware reset.
> 
> Yeah. What I meant was to remove only the cooling map entries for
> critical and hot since they would be unused. We absolutely want to
> keep the trip points themselves around to make sure the system will
> forcefully shut down as a last resort.
> 
> Thierry
Thierry Reding April 11, 2023, 4:30 p.m. UTC | #13
On Tue, Apr 11, 2023 at 12:48:25PM +0200, Daniel Lezcano wrote:
> 
> Hi Thierry,
> 
> did you have time to look at this ?
> 
> This driver is the only one using get_thermal_instance() and I would like to
> remove this function along with the thermal_core.h inclusion in this driver

Yeah, I've had work-in-progress patches for this for a few weeks but
haven't had the time to test these much. I'd like to take a bit longer
to test them before sending them out.

Thierry
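
The behavioral difference reported for this patch earlier in the thread can
be pictured with a small userspace model in C. This is only a sketch under
stated assumptions, not the driver code: struct zone_model, tegra210_zones
and count_programmed() are hypothetical names, and the "bound" flags are
inferred from the log output in the thread (throttle programming happened for
cpu and gpu before the patch, and for cpu, gpu, pll and mem after it). The
flag stands in for "get_thermal_instance() returned non-NULL".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thermal zone as seen by the driver loop. */
struct zone_model {
	const char *name;
	bool bound; /* assumed: a cooling map binds the throttle cdev to this zone */
};

/* Assumption: flags inferred from the before/after logs in the thread. */
static const struct zone_model tegra210_zones[] = {
	{ "cpu", true  },
	{ "gpu", true  },
	{ "pll", false },
	{ "mem", false },
};

#define NUM_ZONES (sizeof(tegra210_zones) / sizeof(tegra210_zones[0]))

/*
 * Count the zones for which the hardware throttle would be programmed.
 * check_binding == true models the old code path, which skipped zones
 * without a binding; false models the patched code path, which no longer
 * looks at the binding at all.
 */
static size_t count_programmed(const struct zone_model *zones, size_t n,
			       bool check_binding)
{
	size_t programmed = 0;

	for (size_t i = 0; i < n; i++) {
		if (check_binding && !zones[i].bound)
			continue; /* old behaviour: zone is skipped */
		programmed++;
	}

	return programmed;
}
```

In this model the old path programs two zones and the patched path programs
all four, which matches the extra "throttrip" lines for pll and mem in the
second log.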

Patch

diff --git a/drivers/thermal/tegra/soctherm.c b/drivers/thermal/tegra/soctherm.c
index 220873298d77..5f552402d987 100644
--- a/drivers/thermal/tegra/soctherm.c
+++ b/drivers/thermal/tegra/soctherm.c
@@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct thermal_zone_device *tz, int trip
  				continue;

  			cdev = ts->throt_cfgs[i].cdev;
-			if (get_thermal_instance(tz, cdev, trip_id))
-				stc = find_throttle_cfg_by_name(ts, cdev->type);
-			else
+			stc = find_throttle_cfg_by_name(ts, cdev->type);
+			if (!stc)
  				continue;