[1/2] arm64: dts: qcom: sm8650: setup cpu thermal with idle on high temperatures

Message ID 20250103-topic-sm8650-thermal-cpu-idle-v1-1-faa1f011ecd9@linaro.org (mailing list archive)
State Superseded
Series arm64: dts: qcom: sm8650: rework CPU & GPU thermal zones

Commit Message

Neil Armstrong Jan. 3, 2025, 2:38 p.m. UTC
On the SM8650, dynamic clock and voltage scaling (DCVS) is done in a
hardware-controlled loop using the LMH and EPSS blocks, with constraints and
OPPs programmed in the board firmware.

Since the hardware does a better job of keeping the CPU temperatures in an
acceptable range by taking into account more parameters, such as the die
characteristics or other factory-fused values, it makes no sense to try
to reproduce a similar set of constraints with the Linux cpufreq thermal
core.

In addition, the tsens IP is responsible for monitoring the temperature
across the SoC, and the current settings will heavily trigger the tsens
UP/LOW interrupts if the CPU temperatures reach the hardware thermal
constraints, which are currently defined in the DT. And since the CPUs
are not hooked to the thermal trip points, the potential interrupts and
calculations are a waste of system resources.

Instead, set higher temperatures in the CPU trip points, and hook a CPU
idle injector with a 100% duty cycle to the highest passive trip point in
case the hardware DCVS cannot handle the temperature surge, to try our best
to avoid reaching the critical temperature trip point, which should trigger
an inevitable thermal shutdown.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
 1 file changed, 214 insertions(+), 60 deletions(-)
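
For orientation, the wiring that this patch repeats for every CPU can be
condensed into a single fragment. This is only a sketch, not the exact patch
content: the thermal-idle node, the labels and the values are taken from the
diff, while unrelated CPU and zone properties are left out and the enclosing
nodes are abbreviated. The reading of the two cooling-device cells as minimum
and maximum idle percentage is an assumption based on the thermal-idle
binding.

```dts
/ {
	cpus {
		cpu0: cpu@0 {
			/* existing cpu0 properties omitted for brevity */

			/* Idle-injection cooling device attached to this CPU */
			cpu0_idle: thermal-idle {
				#cooling-cells = <2>;		/* <min-state max-state> */
				duration-us = <800000>;		/* length of an injected idle period */
				exit-latency-us = <10000>;	/* latency limit for the idle state used */
			};
		};
	};

	thermal-zones {
		cpu0-thermal {
			/* existing sensor and polling properties omitted for brevity */

			trips {
				cpu0_alert1: trip-point1 {
					temperature = <110000>;	/* millidegrees Celsius */
					hysteresis = <2000>;
					type = "passive";
				};
			};

			cooling-maps {
				map0 {
					trip = <&cpu0_alert1>;
					/* min = max = 100: assumed to request full idle injection */
					cooling-device = <&cpu0_idle 100 100>;
				};
			};
		};
	};
};
```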

Comments

Bjorn Andersson Jan. 6, 2025, 11:39 p.m. UTC | #1
On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
> hardware controlled loop using the LMH and EPSS blocks with constraints and
> OPPs programmed in the board firmware.
> 
> Since the Hardware does a better job at maintaining the CPUs temperature
> in an acceptable range by taking in account more parameters like the die
> characteristics or other factory fused values, it makes no sense to try
> and reproduce a similar set of constraints with the Linux cpufreq thermal
> core.
> 
> In addition, the tsens IP is responsible for monitoring the temperature
> across the SoC and the current settings will heavily trigger the tsens
> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
> constraints which are currently defined in the DT. And since the CPUs
> are not hooked in the thermal trip points, the potential interrupts and
> calculations are a waste of system resources.
> 
> Instead, set higher temperatures in the CPU trip points, and hook some CPU
> idle injector with a 100% duty cycle at the highest trip point in the case
> the hardware DCVS cannot handle the temperature surge, and try our best to
> avoid reaching the critical temperature trip point which should trigger an
> inevitable thermal shutdown.
> 

Are you able to hit these higher temperatures? Do you have a test case
where idle injection is shown to keep us from reaching the critical
temperature?

E.g. on the X13s (SC8280XP) we opted to rely on LMH/EPSS and defined only
the critical trip for when the hardware fails us.
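
For comparison, a zone that leaves throttling entirely to LMH/EPSS and keeps
only the last-resort trip could look roughly like this. It is a hypothetical
sketch, not copied from the SC8280XP sources: the zone name is illustrative,
the 115000 value is borrowed from this patch, and the sensor properties are
omitted because they are platform specific.

```dts
thermal-zones {
	cpu0-thermal {
		/* thermal-sensors and polling properties omitted (platform specific) */

		trips {
			/* Only trip left: shut down if the hardware loop fails */
			cpu0-critical {
				temperature = <115000>;	/* assumed value, taken from this patch */
				hysteresis = <1000>;
				type = "critical";
			};
		};
	};
};
```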


I have no concerns at all about "removing" the 90C trip point, that
makes total sense to me - let the hardware keep the cores as close to
max as possible, and then use some slower sensor for keeping the system
temperature in check (such as the x13s skin sensor).


PS. The described behavior should apply to anything SDM845 and newer, so
I'd like to see this set and document a precedent for other platforms.

Regards,
Bjorn

> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
>  1 file changed, 214 insertions(+), 60 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
> index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
> --- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
> @@ -99,6 +99,13 @@ l3_0: l3-cache {
>  					cache-unified;
>  				};
>  			};
> +
> +			cpu0_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
> +
>  		};
>  
>  		cpu1: cpu@100 {
> @@ -119,6 +126,12 @@ cpu1: cpu@100 {
>  			qcom,freq-domain = <&cpufreq_hw 0>;
>  
>  			#cooling-cells = <2>;
> +
> +			cpu1_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
>  		};
>  
>  		cpu2: cpu@200 {
> @@ -146,6 +159,12 @@ l2_200: l2-cache {
>  				cache-unified;
>  				next-level-cache = <&l3_0>;
>  			};
> +
> +			cpu2_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
>  		};
>  
>  		cpu3: cpu@300 {
> @@ -166,6 +185,12 @@ cpu3: cpu@300 {
>  			qcom,freq-domain = <&cpufreq_hw 3>;
>  
>  			#cooling-cells = <2>;
> +
> +			cpu3_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
>  		};
>  
>  		cpu4: cpu@400 {
> @@ -193,6 +218,12 @@ l2_400: l2-cache {
>  				cache-unified;
>  				next-level-cache = <&l3_0>;
>  			};
> +
> +			cpu4_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
>  		};
>  
>  		cpu5: cpu@500 {
> @@ -220,6 +251,12 @@ l2_500: l2-cache {
>  				cache-unified;
>  				next-level-cache = <&l3_0>;
>  			};
> +
> +			cpu5_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
>  		};
>  
>  		cpu6: cpu@600 {
> @@ -247,6 +284,12 @@ l2_600: l2-cache {
>  				cache-unified;
>  				next-level-cache = <&l3_0>;
>  			};
> +
> +			cpu6_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
>  		};
>  
>  		cpu7: cpu@700 {
> @@ -274,6 +317,12 @@ l2_700: l2-cache {
>  				cache-unified;
>  				next-level-cache = <&l3_0>;
>  			};
> +
> +			cpu7_idle: thermal-idle {
> +				#cooling-cells = <2>;
> +				duration-us = <800000>;
> +				exit-latency-us = <10000>;
> +			};
>  		};
>  
>  		cpu-map {
> @@ -5752,23 +5801,30 @@ cpu2-top-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu2_top_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu2-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu2_top_alert1>;
> +					cooling-device = <&cpu2_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu2-bottom-thermal {
> @@ -5776,23 +5832,30 @@ cpu2-bottom-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu2_bottom_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu2-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu2_bottom_alert1>;
> +					cooling-device = <&cpu2_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu3-top-thermal {
> @@ -5800,23 +5863,30 @@ cpu3-top-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu3_top_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu3-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu3_top_alert1>;
> +					cooling-device = <&cpu3_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu3-bottom-thermal {
> @@ -5824,23 +5894,30 @@ cpu3-bottom-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu3_bottom_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu3-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu3_bottom_alert1>;
> +					cooling-device = <&cpu3_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu4-top-thermal {
> @@ -5848,23 +5925,30 @@ cpu4-top-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu4_top_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu4-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu4_top_alert1>;
> +					cooling-device = <&cpu4_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu4-bottom-thermal {
> @@ -5872,23 +5956,30 @@ cpu4-bottom-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu4_bottom_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu4-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu4_bottom_alert1>;
> +					cooling-device = <&cpu4_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu5-top-thermal {
> @@ -5896,23 +5987,30 @@ cpu5-top-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu5_top_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu5-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu5_top_alert1>;
> +					cooling-device = <&cpu5_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu5-bottom-thermal {
> @@ -5920,23 +6018,30 @@ cpu5-bottom-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu5_bottom_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu5-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu5_bottom_alert1>;
> +					cooling-device = <&cpu5_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu6-top-thermal {
> @@ -5944,23 +6049,30 @@ cpu6-top-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu6_top_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu6-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu6_top_alert1>;
> +					cooling-device = <&cpu6_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu6-bottom-thermal {
> @@ -5968,23 +6080,30 @@ cpu6-bottom-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu6_bottom_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu6-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu6_bottom_alert1>;
> +					cooling-device = <&cpu6_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		aoss1-thermal {
> @@ -6010,23 +6129,30 @@ cpu7-top-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu7_top_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu7-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu7_top_alert1>;
> +					cooling-device = <&cpu7_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu7-middle-thermal {
> @@ -6034,23 +6160,30 @@ cpu7-middle-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu7_middle_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu7-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu7_middle_alert1>;
> +					cooling-device = <&cpu7_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu7-bottom-thermal {
> @@ -6058,23 +6191,30 @@ cpu7-bottom-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu7_bottom_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu7-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu7_bottom_alert1>;
> +					cooling-device = <&cpu7_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu0-thermal {
> @@ -6082,23 +6222,30 @@ cpu0-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu0_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu0-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu0_alert1>;
> +					cooling-device = <&cpu0_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		cpu1-thermal {
> @@ -6106,23 +6253,30 @@ cpu1-thermal {
>  
>  			trips {
>  				trip-point0 {
> -					temperature = <90000>;
> +					temperature = <108000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				trip-point1 {
> -					temperature = <95000>;
> +				cpu1_alert1: trip-point1 {
> +					temperature = <110000>;
>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
>  				cpu1-critical {
> -					temperature = <110000>;
> +					temperature = <115000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu1_alert1>;
> +					cooling-device = <&cpu1_idle 100 100>;
> +				};
> +			};
>  		};
>  
>  		nsphvx0-thermal {
> 
> -- 
> 2.34.1
>
Neil Armstrong Jan. 7, 2025, 8:13 a.m. UTC | #2
Hi,

On 07/01/2025 00:39, Bjorn Andersson wrote:
> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>> OPPs programmed in the board firmware.
>>
>> Since the Hardware does a better job at maintaining the CPUs temperature
>> in an acceptable range by taking in account more parameters like the die
>> characteristics or other factory fused values, it makes no sense to try
>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>> core.
>>
>> In addition, the tsens IP is responsible for monitoring the temperature
>> across the SoC and the current settings will heavily trigger the tsens
>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>> constraints which are currently defined in the DT. And since the CPUs
>> are not hooked in the thermal trip points, the potential interrupts and
>> calculations are a waste of system resources.
>>
>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>> idle injector with a 100% duty cycle at the highest trip point in the case
>> the hardware DCVS cannot handle the temperature surge, and try our best to
>> avoid reaching the critical temperature trip point which should trigger an
>> inevitable thermal shutdown.
>>
> 
> Are you able to hit these higher temperatures? Do you have some test
> case where the idle-injection shows to be successful in blocking us from
> reaching the critical temp?

No. I was able to test idle injection and observed a noticeable effect, but
I had to set lower trip points to do so. Do you know how I can easily "block"
LMH/EPSS from scaling down and let the temperature go higher?
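
One way to exercise the idle injector without reaching 110°C is a temporary,
test-only override of the labeled trip point from a board file. This is a
sketch, assuming the cpu0_alert1 label added by this patch; the 70000 value
is an arbitrary placeholder and not the value used in the test described
here.

```dts
/* Test only: lower the trip that cpu0_idle is mapped to */
&cpu0_alert1 {
	temperature = <70000>;	/* hypothetical test value, millidegrees Celsius */
};
```

trip-point0 carries no label in this patch, so it keeps its 108000 value and
would then sit above the overridden trip.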

> 
> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
> the critical trip for when the hardware fails us.

That's the goal here as well.

> 
> 
> I have no concerns at all about "removing" the 90C trip point, that
> makes total sense to me - let the hardware keep the cores as close to
> max as possible, and then use some slower sensor for keeping the system
> temperature in check (such as the x13s skin sensor).
> 
> 
> PS. The described behavior should apply to anything SDM845 and newer, so
> I'd like to see this set/document precedence for other platforms.
> 
> Regards,
> Bjorn
> 
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>   arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
>>   1 file changed, 214 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
>> index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
>> --- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
>> @@ -99,6 +99,13 @@ l3_0: l3-cache {
>>   					cache-unified;
>>   				};
>>   			};
>> +
>> +			cpu0_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>> +
>>   		};
>>   
>>   		cpu1: cpu@100 {
>> @@ -119,6 +126,12 @@ cpu1: cpu@100 {
>>   			qcom,freq-domain = <&cpufreq_hw 0>;
>>   
>>   			#cooling-cells = <2>;
>> +
>> +			cpu1_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>   		};
>>   
>>   		cpu2: cpu@200 {
>> @@ -146,6 +159,12 @@ l2_200: l2-cache {
>>   				cache-unified;
>>   				next-level-cache = <&l3_0>;
>>   			};
>> +
>> +			cpu2_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>   		};
>>   
>>   		cpu3: cpu@300 {
>> @@ -166,6 +185,12 @@ cpu3: cpu@300 {
>>   			qcom,freq-domain = <&cpufreq_hw 3>;
>>   
>>   			#cooling-cells = <2>;
>> +
>> +			cpu3_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>   		};
>>   
>>   		cpu4: cpu@400 {
>> @@ -193,6 +218,12 @@ l2_400: l2-cache {
>>   				cache-unified;
>>   				next-level-cache = <&l3_0>;
>>   			};
>> +
>> +			cpu4_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>   		};
>>   
>>   		cpu5: cpu@500 {
>> @@ -220,6 +251,12 @@ l2_500: l2-cache {
>>   				cache-unified;
>>   				next-level-cache = <&l3_0>;
>>   			};
>> +
>> +			cpu5_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>   		};
>>   
>>   		cpu6: cpu@600 {
>> @@ -247,6 +284,12 @@ l2_600: l2-cache {
>>   				cache-unified;
>>   				next-level-cache = <&l3_0>;
>>   			};
>> +
>> +			cpu6_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>   		};
>>   
>>   		cpu7: cpu@700 {
>> @@ -274,6 +317,12 @@ l2_700: l2-cache {
>>   				cache-unified;
>>   				next-level-cache = <&l3_0>;
>>   			};
>> +
>> +			cpu7_idle: thermal-idle {
>> +				#cooling-cells = <2>;
>> +				duration-us = <800000>;
>> +				exit-latency-us = <10000>;
>> +			};
>>   		};
>>   
>>   		cpu-map {
>> @@ -5752,23 +5801,30 @@ cpu2-top-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu2_top_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu2-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu2_top_alert1>;
>> +					cooling-device = <&cpu2_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu2-bottom-thermal {
>> @@ -5776,23 +5832,30 @@ cpu2-bottom-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu2_bottom_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu2-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu2_bottom_alert1>;
>> +					cooling-device = <&cpu2_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu3-top-thermal {
>> @@ -5800,23 +5863,30 @@ cpu3-top-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu3_top_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu3-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu3_top_alert1>;
>> +					cooling-device = <&cpu3_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu3-bottom-thermal {
>> @@ -5824,23 +5894,30 @@ cpu3-bottom-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu3_bottom_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu3-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu3_bottom_alert1>;
>> +					cooling-device = <&cpu3_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu4-top-thermal {
>> @@ -5848,23 +5925,30 @@ cpu4-top-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu4_top_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu4-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu4_top_alert1>;
>> +					cooling-device = <&cpu4_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu4-bottom-thermal {
>> @@ -5872,23 +5956,30 @@ cpu4-bottom-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu4_bottom_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu4-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu4_bottom_alert1>;
>> +					cooling-device = <&cpu4_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu5-top-thermal {
>> @@ -5896,23 +5987,30 @@ cpu5-top-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu5_top_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu5-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu5_top_alert1>;
>> +					cooling-device = <&cpu5_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu5-bottom-thermal {
>> @@ -5920,23 +6018,30 @@ cpu5-bottom-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu5_bottom_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu5-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu5_bottom_alert1>;
>> +					cooling-device = <&cpu5_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu6-top-thermal {
>> @@ -5944,23 +6049,30 @@ cpu6-top-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu6_top_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu6-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu6_top_alert1>;
>> +					cooling-device = <&cpu6_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu6-bottom-thermal {
>> @@ -5968,23 +6080,30 @@ cpu6-bottom-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu6_bottom_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu6-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu6_bottom_alert1>;
>> +					cooling-device = <&cpu6_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		aoss1-thermal {
>> @@ -6010,23 +6129,30 @@ cpu7-top-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu7_top_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu7-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu7_top_alert1>;
>> +					cooling-device = <&cpu7_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu7-middle-thermal {
>> @@ -6034,23 +6160,30 @@ cpu7-middle-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu7_middle_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu7-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu7_middle_alert1>;
>> +					cooling-device = <&cpu7_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu7-bottom-thermal {
>> @@ -6058,23 +6191,30 @@ cpu7-bottom-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu7_bottom_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu7-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu7_bottom_alert1>;
>> +					cooling-device = <&cpu7_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu0-thermal {
>> @@ -6082,23 +6222,30 @@ cpu0-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu0_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu0-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu0_alert1>;
>> +					cooling-device = <&cpu0_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		cpu1-thermal {
>> @@ -6106,23 +6253,30 @@ cpu1-thermal {
>>   
>>   			trips {
>>   				trip-point0 {
>> -					temperature = <90000>;
>> +					temperature = <108000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>> -				trip-point1 {
>> -					temperature = <95000>;
>> +				cpu1_alert1: trip-point1 {
>> +					temperature = <110000>;
>>   					hysteresis = <2000>;
>>   					type = "passive";
>>   				};
>>   
>>   				cpu1-critical {
>> -					temperature = <110000>;
>> +					temperature = <115000>;
>>   					hysteresis = <1000>;
>>   					type = "critical";
>>   				};
>>   			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu1_alert1>;
>> +					cooling-device = <&cpu1_idle 100 100>;
>> +				};
>> +			};
>>   		};
>>   
>>   		nsphvx0-thermal {
>>
>> -- 
>> 2.34.1
>>
Bjorn Andersson Jan. 8, 2025, 3:11 a.m. UTC | #3
On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
> Hi,
> 
> On 07/01/2025 00:39, Bjorn Andersson wrote:
> > On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
> > > On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in a
> > > hardware-controlled loop using the LMH and EPSS blocks with constraints and
> > > OPPs programmed in the board firmware.
> > > 
> > > Since the hardware does a better job at maintaining the CPU temperature
> > > in an acceptable range by taking into account more parameters like the die
> > > characteristics or other factory-fused values, it makes no sense to try
> > > and reproduce a similar set of constraints with the Linux cpufreq thermal
> > > core.
> > > 
> > > In addition, the tsens IP is responsible for monitoring the temperature
> > > across the SoC and the current settings will heavily trigger the tsens
> > > UP/LOW interrupts if the CPU temperatures reach the hardware thermal
> > > constraints which are currently defined in the DT. And since the CPUs
> > > are not hooked in the thermal trip points, the potential interrupts and
> > > calculations are a waste of system resources.
> > > 
> > > Instead, set higher temperatures in the CPU trip points, and hook some CPU
> > > idle injector with a 100% duty cycle at the highest trip point in the case
> > > the hardware DCVS cannot handle the temperature surge, and try our best to
> > > avoid reaching the critical temperature trip point which should trigger an
> > > inevitable thermal shutdown.
> > > 
> > 
> > Are you able to hit these higher temperatures? Do you have some test
> > case where the idle-injection shows to be successful in blocking us from
> > reaching the critical temp?
> 
> No, I've been able to test idle-injection and observed a noticeable effect,
> but I had to set a lower trip. Do you know how I can easily "block" LMH/EPSS from
> scaling down and let the temp go higher?
> 

I don't know how to override that configuration.

> > 
> > E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
> > the critical trip for when the hardware fails us.
> 
> It's the goal here as well
> 

How about simplifying the patch by removing the idle-injection step and
just relying on LMH/EPSS and the "critical" trip (at least until someone
can prove that there's value in the extra mitigation)?

Regards,
Bjorn

> > 
> > 
> > I have no concerns at all about "removing" the 90C trip point, that
> > makes total sense to me - let the hardware keep the cores as close to
> > max as possible, and then use some slower sensor for keeping the system
> > temperature in check (such as the x13s skin sensor).
> > 
> > 
> > PS. The described behavior should apply to anything SDM845 and newer, so
> > I'd like to see this set/document precedence for other platforms.
> > 
> > Regards,
> > Bjorn
> > 
> > > Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> > > ---
> > >   arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
> > >   1 file changed, 214 insertions(+), 60 deletions(-)
> > > 
> > > diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
> > > index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
> > > --- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
> > > @@ -99,6 +99,13 @@ l3_0: l3-cache {
> > >   					cache-unified;
> > >   				};
> > >   			};
> > > +
> > > +			cpu0_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > > +
> > >   		};
> > >   		cpu1: cpu@100 {
> > > @@ -119,6 +126,12 @@ cpu1: cpu@100 {
> > >   			qcom,freq-domain = <&cpufreq_hw 0>;
> > >   			#cooling-cells = <2>;
> > > +
> > > +			cpu1_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > >   		};
> > >   		cpu2: cpu@200 {
> > > @@ -146,6 +159,12 @@ l2_200: l2-cache {
> > >   				cache-unified;
> > >   				next-level-cache = <&l3_0>;
> > >   			};
> > > +
> > > +			cpu2_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > >   		};
> > >   		cpu3: cpu@300 {
> > > @@ -166,6 +185,12 @@ cpu3: cpu@300 {
> > >   			qcom,freq-domain = <&cpufreq_hw 3>;
> > >   			#cooling-cells = <2>;
> > > +
> > > +			cpu3_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > >   		};
> > >   		cpu4: cpu@400 {
> > > @@ -193,6 +218,12 @@ l2_400: l2-cache {
> > >   				cache-unified;
> > >   				next-level-cache = <&l3_0>;
> > >   			};
> > > +
> > > +			cpu4_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > >   		};
> > >   		cpu5: cpu@500 {
> > > @@ -220,6 +251,12 @@ l2_500: l2-cache {
> > >   				cache-unified;
> > >   				next-level-cache = <&l3_0>;
> > >   			};
> > > +
> > > +			cpu5_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > >   		};
> > >   		cpu6: cpu@600 {
> > > @@ -247,6 +284,12 @@ l2_600: l2-cache {
> > >   				cache-unified;
> > >   				next-level-cache = <&l3_0>;
> > >   			};
> > > +
> > > +			cpu6_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > >   		};
> > >   		cpu7: cpu@700 {
> > > @@ -274,6 +317,12 @@ l2_700: l2-cache {
> > >   				cache-unified;
> > >   				next-level-cache = <&l3_0>;
> > >   			};
> > > +
> > > +			cpu7_idle: thermal-idle {
> > > +				#cooling-cells = <2>;
> > > +				duration-us = <800000>;
> > > +				exit-latency-us = <10000>;
> > > +			};
> > >   		};
> > >   		cpu-map {
> > > @@ -5752,23 +5801,30 @@ cpu2-top-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu2_top_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu2-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu2_top_alert1>;
> > > +					cooling-device = <&cpu2_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu2-bottom-thermal {
> > > @@ -5776,23 +5832,30 @@ cpu2-bottom-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu2_bottom_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu2-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu2_bottom_alert1>;
> > > +					cooling-device = <&cpu2_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu3-top-thermal {
> > > @@ -5800,23 +5863,30 @@ cpu3-top-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu3_top_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu3-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu3_top_alert1>;
> > > +					cooling-device = <&cpu3_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu3-bottom-thermal {
> > > @@ -5824,23 +5894,30 @@ cpu3-bottom-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu3_bottom_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu3-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu3_bottom_alert1>;
> > > +					cooling-device = <&cpu3_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu4-top-thermal {
> > > @@ -5848,23 +5925,30 @@ cpu4-top-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu4_top_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu4-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu4_top_alert1>;
> > > +					cooling-device = <&cpu4_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu4-bottom-thermal {
> > > @@ -5872,23 +5956,30 @@ cpu4-bottom-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu4_bottom_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu4-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu4_bottom_alert1>;
> > > +					cooling-device = <&cpu4_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu5-top-thermal {
> > > @@ -5896,23 +5987,30 @@ cpu5-top-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu5_top_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu5-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu5_top_alert1>;
> > > +					cooling-device = <&cpu5_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu5-bottom-thermal {
> > > @@ -5920,23 +6018,30 @@ cpu5-bottom-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu5_bottom_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu5-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu5_bottom_alert1>;
> > > +					cooling-device = <&cpu5_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu6-top-thermal {
> > > @@ -5944,23 +6049,30 @@ cpu6-top-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu6_top_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu6-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu6_top_alert1>;
> > > +					cooling-device = <&cpu6_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu6-bottom-thermal {
> > > @@ -5968,23 +6080,30 @@ cpu6-bottom-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu6_bottom_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu6-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu6_bottom_alert1>;
> > > +					cooling-device = <&cpu6_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		aoss1-thermal {
> > > @@ -6010,23 +6129,30 @@ cpu7-top-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu7_top_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu7-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu7_top_alert1>;
> > > +					cooling-device = <&cpu7_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu7-middle-thermal {
> > > @@ -6034,23 +6160,30 @@ cpu7-middle-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu7_middle_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu7-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu7_middle_alert1>;
> > > +					cooling-device = <&cpu7_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu7-bottom-thermal {
> > > @@ -6058,23 +6191,30 @@ cpu7-bottom-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu7_bottom_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu7-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu7_bottom_alert1>;
> > > +					cooling-device = <&cpu7_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu0-thermal {
> > > @@ -6082,23 +6222,30 @@ cpu0-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu0_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu0-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu0_alert1>;
> > > +					cooling-device = <&cpu0_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		cpu1-thermal {
> > > @@ -6106,23 +6253,30 @@ cpu1-thermal {
> > >   			trips {
> > >   				trip-point0 {
> > > -					temperature = <90000>;
> > > +					temperature = <108000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > > -				trip-point1 {
> > > -					temperature = <95000>;
> > > +				cpu1_alert1: trip-point1 {
> > > +					temperature = <110000>;
> > >   					hysteresis = <2000>;
> > >   					type = "passive";
> > >   				};
> > >   				cpu1-critical {
> > > -					temperature = <110000>;
> > > +					temperature = <115000>;
> > >   					hysteresis = <1000>;
> > >   					type = "critical";
> > >   				};
> > >   			};
> > > +
> > > +			cooling-maps {
> > > +				map0 {
> > > +					trip = <&cpu1_alert1>;
> > > +					cooling-device = <&cpu1_idle 100 100>;
> > > +				};
> > > +			};
> > >   		};
> > >   		nsphvx0-thermal {
> > > 
> > > -- 
> > > 2.34.1
> > > 
>
Neil Armstrong Jan. 8, 2025, 9:15 a.m. UTC | #4
On 08/01/2025 04:11, Bjorn Andersson wrote:
> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>> Hi,
>>
>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in a
>>>> hardware-controlled loop using the LMH and EPSS blocks with constraints and
>>>> OPPs programmed in the board firmware.
>>>>
>>>> Since the hardware does a better job at maintaining the CPU temperature
>>>> in an acceptable range by taking into account more parameters like the die
>>>> characteristics or other factory-fused values, it makes no sense to try
>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>> core.
>>>>
>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>> across the SoC and the current settings will heavily trigger the tsens
>>>> UP/LOW interrupts if the CPU temperatures reach the hardware thermal
>>>> constraints which are currently defined in the DT. And since the CPUs
>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>> calculations are a waste of system resources.
>>>>
>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>> avoid reaching the critical temperature trip point which should trigger an
>>>> inevitable thermal shutdown.
>>>>
>>>
>>> Are you able to hit these higher temperatures? Do you have some test
>>> case where the idle-injection shows to be successful in blocking us from
>>> reaching the critical temp?
>>
>> No, I've been able to test idle-injection and observed a noticeable effect,
>> but I had to set a lower trip. Do you know how I can easily "block" LMH/EPSS from
>> scaling down and let the temp go higher?
>>
> 
> I don't know how to override that configuration.
> 
>>>
>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>> the critical trip for when the hardware fails us.
>>
>> It's the goal here as well
>>
> 
> How about simplifying the patch by removing the idle-injection step and
> just relying on LMH/EPSS and the "critical" trip (at least until someone
> can prove that there's value in the extra mitigation)?

OK, but I see value in this idle injection mitigation in the case where
LMH/EPSS fails: the only lever HLOS then controls is to stop scheduling
tasks, since the frequency can no longer be scaled.

Anyway, I agree it can be added later on, so should I drop the 2 trip points
and only leave the critical one?
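
For reference, a zone reduced to the critical trip alone could look roughly
like the sketch below. It is untested and shows cpu0-thermal only as an
example; it assumes the zone's existing thermal-sensors and polling
properties are left untouched, and 115000 is simply the critical value
proposed in this patch, not a validated limit.

```dts
cpu0-thermal {
	/* existing thermal-sensors and polling properties unchanged (assumed) */

	trips {
		/* passive trip-point0/trip-point1 dropped, no cooling-maps */
		cpu0-critical {
			temperature = <115000>;	/* value proposed in this patch */
			hysteresis = <1000>;
			type = "critical";
		};
	};
};
```

With this shape the per-CPU thermal-idle nodes and their cooling-maps are not
needed, so idle injection could be reintroduced later as a separate change.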

> 
> Regards,
> Bjorn
> 
>>>
>>>
>>> I have no concerns at all about "removing" the 90C trip point, that
>>> makes total sense to me - let the hardware keep the cores as close to
>>> max as possible, and then use some slower sensor for keeping the system
>>> temperature in check (such as the x13s skin sensor).
>>>
>>>
>>> PS. The described behavior should apply to anything SDM845 and newer, so
>>> I'd like to see this set/document precedence for other platforms.
>>>
>>> Regards,
>>> Bjorn
>>>
>>>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>>>> ---
>>>>    arch/arm64/boot/dts/qcom/sm8650.dtsi | 274 +++++++++++++++++++++++++++--------
>>>>    1 file changed, 214 insertions(+), 60 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
>>>> index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
>>>> --- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
>>>> +++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
>>>> @@ -99,6 +99,13 @@ l3_0: l3-cache {
>>>>    					cache-unified;
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cpu0_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>> +
>>>>    		};
>>>>    		cpu1: cpu@100 {
>>>> @@ -119,6 +126,12 @@ cpu1: cpu@100 {
>>>>    			qcom,freq-domain = <&cpufreq_hw 0>;
>>>>    			#cooling-cells = <2>;
>>>> +
>>>> +			cpu1_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>>    		};
>>>>    		cpu2: cpu@200 {
>>>> @@ -146,6 +159,12 @@ l2_200: l2-cache {
>>>>    				cache-unified;
>>>>    				next-level-cache = <&l3_0>;
>>>>    			};
>>>> +
>>>> +			cpu2_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>>    		};
>>>>    		cpu3: cpu@300 {
>>>> @@ -166,6 +185,12 @@ cpu3: cpu@300 {
>>>>    			qcom,freq-domain = <&cpufreq_hw 3>;
>>>>    			#cooling-cells = <2>;
>>>> +
>>>> +			cpu3_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>>    		};
>>>>    		cpu4: cpu@400 {
>>>> @@ -193,6 +218,12 @@ l2_400: l2-cache {
>>>>    				cache-unified;
>>>>    				next-level-cache = <&l3_0>;
>>>>    			};
>>>> +
>>>> +			cpu4_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>>    		};
>>>>    		cpu5: cpu@500 {
>>>> @@ -220,6 +251,12 @@ l2_500: l2-cache {
>>>>    				cache-unified;
>>>>    				next-level-cache = <&l3_0>;
>>>>    			};
>>>> +
>>>> +			cpu5_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>>    		};
>>>>    		cpu6: cpu@600 {
>>>> @@ -247,6 +284,12 @@ l2_600: l2-cache {
>>>>    				cache-unified;
>>>>    				next-level-cache = <&l3_0>;
>>>>    			};
>>>> +
>>>> +			cpu6_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>>    		};
>>>>    		cpu7: cpu@700 {
>>>> @@ -274,6 +317,12 @@ l2_700: l2-cache {
>>>>    				cache-unified;
>>>>    				next-level-cache = <&l3_0>;
>>>>    			};
>>>> +
>>>> +			cpu7_idle: thermal-idle {
>>>> +				#cooling-cells = <2>;
>>>> +				duration-us = <800000>;
>>>> +				exit-latency-us = <10000>;
>>>> +			};
>>>>    		};
>>>>    		cpu-map {
>>>> @@ -5752,23 +5801,30 @@ cpu2-top-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu2_top_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu2-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu2_top_alert1>;
>>>> +					cooling-device = <&cpu2_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu2-bottom-thermal {
>>>> @@ -5776,23 +5832,30 @@ cpu2-bottom-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu2_bottom_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu2-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu2_bottom_alert1>;
>>>> +					cooling-device = <&cpu2_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu3-top-thermal {
>>>> @@ -5800,23 +5863,30 @@ cpu3-top-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu3_top_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu3-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu3_top_alert1>;
>>>> +					cooling-device = <&cpu3_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu3-bottom-thermal {
>>>> @@ -5824,23 +5894,30 @@ cpu3-bottom-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu3_bottom_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu3-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu3_bottom_alert1>;
>>>> +					cooling-device = <&cpu3_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu4-top-thermal {
>>>> @@ -5848,23 +5925,30 @@ cpu4-top-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu4_top_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu4-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu4_top_alert1>;
>>>> +					cooling-device = <&cpu4_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu4-bottom-thermal {
>>>> @@ -5872,23 +5956,30 @@ cpu4-bottom-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu4_bottom_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu4-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu4_bottom_alert1>;
>>>> +					cooling-device = <&cpu4_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu5-top-thermal {
>>>> @@ -5896,23 +5987,30 @@ cpu5-top-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu5_top_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu5-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu5_top_alert1>;
>>>> +					cooling-device = <&cpu5_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu5-bottom-thermal {
>>>> @@ -5920,23 +6018,30 @@ cpu5-bottom-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu5_bottom_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu5-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu5_bottom_alert1>;
>>>> +					cooling-device = <&cpu5_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu6-top-thermal {
>>>> @@ -5944,23 +6049,30 @@ cpu6-top-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu6_top_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu6-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu6_top_alert1>;
>>>> +					cooling-device = <&cpu6_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu6-bottom-thermal {
>>>> @@ -5968,23 +6080,30 @@ cpu6-bottom-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu6_bottom_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu6-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu6_bottom_alert1>;
>>>> +					cooling-device = <&cpu6_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		aoss1-thermal {
>>>> @@ -6010,23 +6129,30 @@ cpu7-top-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu7_top_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu7-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu7_top_alert1>;
>>>> +					cooling-device = <&cpu7_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu7-middle-thermal {
>>>> @@ -6034,23 +6160,30 @@ cpu7-middle-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu7_middle_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu7-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu7_middle_alert1>;
>>>> +					cooling-device = <&cpu7_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu7-bottom-thermal {
>>>> @@ -6058,23 +6191,30 @@ cpu7-bottom-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu7_bottom_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu7-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu7_bottom_alert1>;
>>>> +					cooling-device = <&cpu7_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu0-thermal {
>>>> @@ -6082,23 +6222,30 @@ cpu0-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu0_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu0-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu0_alert1>;
>>>> +					cooling-device = <&cpu0_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		cpu1-thermal {
>>>> @@ -6106,23 +6253,30 @@ cpu1-thermal {
>>>>    			trips {
>>>>    				trip-point0 {
>>>> -					temperature = <90000>;
>>>> +					temperature = <108000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>> -				trip-point1 {
>>>> -					temperature = <95000>;
>>>> +				cpu1_alert1: trip-point1 {
>>>> +					temperature = <110000>;
>>>>    					hysteresis = <2000>;
>>>>    					type = "passive";
>>>>    				};
>>>>    				cpu1-critical {
>>>> -					temperature = <110000>;
>>>> +					temperature = <115000>;
>>>>    					hysteresis = <1000>;
>>>>    					type = "critical";
>>>>    				};
>>>>    			};
>>>> +
>>>> +			cooling-maps {
>>>> +				map0 {
>>>> +					trip = <&cpu1_alert1>;
>>>> +					cooling-device = <&cpu1_idle 100 100>;
>>>> +				};
>>>> +			};
>>>>    		};
>>>>    		nsphvx0-thermal {
>>>>
>>>> -- 
>>>> 2.34.1
>>>>
>>
Konrad Dybcio Jan. 9, 2025, 3:18 p.m. UTC | #5
On 8.01.2025 10:15 AM, Neil Armstrong wrote:
> On 08/01/2025 04:11, Bjorn Andersson wrote:
>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>>> Hi,
>>>
>>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>>> OPPs programmed in the board firmware.
>>>>>
>>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>>> in an acceptable range by taking in account more parameters like the die
>>>>> characteristics or other factory fused values, it makes no sense to try
>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>>> core.
>>>>>
>>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>>> across the SoC and the current settings will heavily trigger the tsens
>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>>> constraints which are currently defined in the DT. And since the CPUs
>>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>>> calculations are a waste of system resources.
>>>>>
>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>>> avoid reaching the critical temperature trip point which should trigger an
>>>>> inevitable thermal shutdown.
>>>>>
>>>>
>>>> Are you able to hit these higher temperatures? Do you have some test
>>>> case where the idle-injection shows to be successful in blocking us from
>>>> reaching the critical temp?
>>>
>>> No, I've been able to test idle-injection and observed a noticeable effect
>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>>> scaling down and let the temp go higher ?
>>>
>>
>> I don't know how to override that configuration.

I'll try to get some answers. SDM845 seems to expose a couple of SCM calls for
this purpose and they're already wired up in drivers/thermal/qcom/lmh.c

>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>>> the critical trip for when the hardware fails us.
>>>
>>> It's the goal here aswell
>>>
>>
>> How about simplifying the patch by removing the idle-injection step and
>> just rely on LMH/EPSS and the "critical" trip (at least until someone
>> can prove that there's value in the extra mitigation)?
> 
> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
> fails, the only factor in control of HLOS is by stopping scheduling tasks
> since frequency won't be able to scale anymore.

If LMH fails, your SoC is probably cooked already, anyway :(

I'm not sure why idle injection isn't enabled by default if no other cooling
methods are found. Perhaps that could be discussed with some thermal folks.

> Anyway, I agree it can be added later on, so should I drop the 2 trip points
> and only leave the critical one ?

I think sticking with a single critical trip at Tjmax plus
critical-action = "reboot" may be the way to go here.
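
For illustration, a critical-only zone along those lines might look roughly
like the sketch below. The sensor phandle and index are placeholders, and
115000 is just the critical value from this patch standing in for Tjmax,
which isn't stated in this thread. critical-action is the generic
thermal-zones binding property, set per zone.

```dts
/* Sketch only, not the respun patch: one zone shown, the others would follow suit. */
thermal-zones {
	cpu0-thermal {
		/* Placeholder sensor reference; keep the zone's existing one. */
		thermal-sensors = <&tsens0 1>;

		/* Reboot rather than the default shutdown when the critical trip is hit. */
		critical-action = "reboot";

		trips {
			cpu0-critical {
				/* 115 degrees C, taken from this patch as a stand-in for Tjmax. */
				temperature = <115000>;
				hysteresis = <1000>;
				type = "critical";
			};
		};
	};
};
```

With no passive trips and no cooling-maps left in the zone, the only thing
HLOS does here is react to the critical trip; everything below that is left
to LMH/EPSS.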

We may want to give some folks a heads up, so they can wire up skin sensors
on their devices ahead of these changes landing tree-wide.

Konrad
Bjorn Andersson Jan. 9, 2025, 9:01 p.m. UTC | #6
On Wed, Jan 08, 2025 at 10:15:34AM +0100, Neil Armstrong wrote:
> On 08/01/2025 04:11, Bjorn Andersson wrote:
> > On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
> > > Hi,
> > > 
> > > On 07/01/2025 00:39, Bjorn Andersson wrote:
> > > > On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
> > > > > On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
> > > > > hardware controlled loop using the LMH and EPSS blocks with constraints and
> > > > > OPPs programmed in the board firmware.
> > > > > 
> > > > > Since the Hardware does a better job at maintaining the CPUs temperature
> > > > > in an acceptable range by taking in account more parameters like the die
> > > > > characteristics or other factory fused values, it makes no sense to try
> > > > > and reproduce a similar set of constraints with the Linux cpufreq thermal
> > > > > core.
> > > > > 
> > > > > In addition, the tsens IP is responsible for monitoring the temperature
> > > > > across the SoC and the current settings will heavily trigger the tsens
> > > > > UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
> > > > > constraints which are currently defined in the DT. And since the CPUs
> > > > > are not hooked in the thermal trip points, the potential interrupts and
> > > > > calculations are a waste of system resources.
> > > > > 
> > > > > Instead, set higher temperatures in the CPU trip points, and hook some CPU
> > > > > idle injector with a 100% duty cycle at the highest trip point in the case
> > > > > the hardware DCVS cannot handle the temperature surge, and try our best to
> > > > > avoid reaching the critical temperature trip point which should trigger an
> > > > > inevitable thermal shutdown.
> > > > > 
> > > > 
> > > > Are you able to hit these higher temperatures? Do you have some test
> > > > case where the idle-injection shows to be successful in blocking us from
> > > > reaching the critical temp?
> > > 
> > > No, I've been able to test idle-injection and observed a noticeable effect
> > > but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
> > > scaling down and let the temp go higher ?
> > > 
> > 
> > I don't know how to override that configuration.
> > 
> > > > 
> > > > E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
> > > > the critical trip for when the hardware fails us.
> > > 
> > > It's the goal here aswell
> > > 
> > 
> > How about simplifying the patch by removing the idle-injection step and
> > just rely on LMH/EPSS and the "critical" trip (at least until someone
> > can prove that there's value in the extra mitigation)?
> 
> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
> fails, the only factor in control of HLOS is by stopping scheduling tasks
> since frequency won't be able to scale anymore.
> 

I think that sounds good, but afaict we don't have any indication of
this being a problem and we don't have any way to test that it actually
solves that problem.

> Anyway, I agree it can be added later on, so should I drop the 2 trip points
> and only leave the critical one ?
> 

I think that's a simple and functional starting point - and it solves
your IRQ issue.

Regards,
Bjorn
Neil Armstrong Jan. 10, 2025, 9:40 a.m. UTC | #7
On 09/01/2025 22:01, Bjorn Andersson wrote:
> On Wed, Jan 08, 2025 at 10:15:34AM +0100, Neil Armstrong wrote:
>> On 08/01/2025 04:11, Bjorn Andersson wrote:
>>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>>>> Hi,
>>>>
>>>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>>>> OPPs programmed in the board firmware.
>>>>>>
>>>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>>>> in an acceptable range by taking in account more parameters like the die
>>>>>> characteristics or other factory fused values, it makes no sense to try
>>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>>>> core.
>>>>>>
>>>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>>>> across the SoC and the current settings will heavily trigger the tsens
>>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>>>> constraints which are currently defined in the DT. And since the CPUs
>>>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>>>> calculations are a waste of system resources.
>>>>>>
>>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>>>> avoid reaching the critical temperature trip point which should trigger an
>>>>>> inevitable thermal shutdown.
>>>>>>
>>>>>
>>>>> Are you able to hit these higher temperatures? Do you have some test
>>>>> case where the idle-injection shows to be successful in blocking us from
>>>>> reaching the critical temp?
>>>>
>>>> No, I've been able to test idle-injection and observed a noticeable effect
>>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>>>> scaling down and let the temp go higher ?
>>>>
>>>
>>> I don't know how to override that configuration.
>>>
>>>>>
>>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>>>> the critical trip for when the hardware fails us.
>>>>
>>>> It's the goal here aswell
>>>>
>>>
>>> How about simplifying the patch by removing the idle-injection step and
>>> just rely on LMH/EPSS and the "critical" trip (at least until someone
>>> can prove that there's value in the extra mitigation)?
>>
>> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
>> fails, the only factor in control of HLOS is by stopping scheduling tasks
>> since frequency won't be able to scale anymore.
>>
> 
> I think that sounds good, but afaict we don't have any indication of
> this being a problem and we don't have any way to test that it actually
> solves that problem.

Sure, let's postpone the idle injection until we can actually test it.

> 
>> Anyway, I agree it can be added later on, so should I drop the 2 trip points
>> and only leave the critical one ?
>>
> 
> I think that's a simple and functional starting point - and it solves
> your IRQ issue.

Ack

Thanks,
Neil

> 
> Regards,
> Bjorn
Neil Armstrong Jan. 10, 2025, 9:41 a.m. UTC | #8
On 09/01/2025 16:18, Konrad Dybcio wrote:
> On 8.01.2025 10:15 AM, Neil Armstrong wrote:
>> On 08/01/2025 04:11, Bjorn Andersson wrote:
>>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>>>> Hi,
>>>>
>>>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>>>> OPPs programmed in the board firmware.
>>>>>>
>>>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>>>> in an acceptable range by taking in account more parameters like the die
>>>>>> characteristics or other factory fused values, it makes no sense to try
>>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>>>> core.
>>>>>>
>>>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>>>> across the SoC and the current settings will heavily trigger the tsens
>>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>>>> constraints which are currently defined in the DT. And since the CPUs
>>>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>>>> calculations are a waste of system resources.
>>>>>>
>>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>>>> avoid reaching the critical temperature trip point which should trigger an
>>>>>> inevitable thermal shutdown.
>>>>>>
>>>>>
>>>>> Are you able to hit these higher temperatures? Do you have some test
>>>>> case where the idle-injection shows to be successful in blocking us from
>>>>> reaching the critical temp?
>>>>
>>>> No, I've been able to test idle-injection and observed a noticeable effect
>>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>>>> scaling down and let the temp go higher ?
>>>>
>>>
>>> I don't know how to override that configuration.
> 
> I'll try to get some answers. SDM845 seems to expose a couple SCM calls for
> this purpose and it's already wired up in drivers/thermal/qcom/lmh.c

Would be great, thx

> 
>>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>>>> the critical trip for when the hardware fails us.
>>>>
>>>> It's the goal here aswell
>>>>
>>>
>>> How about simplifying the patch by removing the idle-injection step and
>>> just rely on LMH/EPSS and the "critical" trip (at least until someone
>>> can prove that there's value in the extra mitigation)?
>>
>> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
>> fails, the only factor in control of HLOS is by stopping scheduling tasks
>> since frequency won't be able to scale anymore.
> 
> If LMH fails, your SoC is probably cooked already, anyway :(
> 
> I'm not sure why idle injection isn't enabled by default if no other cooling
> methods are found. Perhaps that could be discussed with some thermal folks..

Yeah, this is a good question; this should probably be the default "hot" behaviour.

> 
>> Anyway, I agree it can be added later on, so should I drop the 2 trip points
>> and only leave the critical one ?
> 
> I think sticking with critical=Tjmax + critical-action = "reboot" may be the
> way to go here.
> 
> We may want to give some folks a heads up, so they can wire up skin sensors
> on their devices ahead of these changes landing tree-wide.

Yeah, that's also my goal; I will respin with only the critical trip.

Thanks,
Neil

> 
> Konrad

Patch

diff --git a/arch/arm64/boot/dts/qcom/sm8650.dtsi b/arch/arm64/boot/dts/qcom/sm8650.dtsi
index 25e47505adcb790d09f1d2726386438487255824..448374a32e07151e35727d92fab77356769aea8a 100644
--- a/arch/arm64/boot/dts/qcom/sm8650.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8650.dtsi
@@ -99,6 +99,13 @@  l3_0: l3-cache {
 					cache-unified;
 				};
 			};
+
+			cpu0_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
+
 		};
 
 		cpu1: cpu@100 {
@@ -119,6 +126,12 @@  cpu1: cpu@100 {
 			qcom,freq-domain = <&cpufreq_hw 0>;
 
 			#cooling-cells = <2>;
+
+			cpu1_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu2: cpu@200 {
@@ -146,6 +159,12 @@  l2_200: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu2_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu3: cpu@300 {
@@ -166,6 +185,12 @@  cpu3: cpu@300 {
 			qcom,freq-domain = <&cpufreq_hw 3>;
 
 			#cooling-cells = <2>;
+
+			cpu3_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu4: cpu@400 {
@@ -193,6 +218,12 @@  l2_400: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu4_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu5: cpu@500 {
@@ -220,6 +251,12 @@  l2_500: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu5_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu6: cpu@600 {
@@ -247,6 +284,12 @@  l2_600: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu6_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu7: cpu@700 {
@@ -274,6 +317,12 @@  l2_700: l2-cache {
 				cache-unified;
 				next-level-cache = <&l3_0>;
 			};
+
+			cpu7_idle: thermal-idle {
+				#cooling-cells = <2>;
+				duration-us = <800000>;
+				exit-latency-us = <10000>;
+			};
 		};
 
 		cpu-map {
@@ -5752,23 +5801,30 @@  cpu2-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu2_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu2-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu2_top_alert1>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
 		};
 
 		cpu2-bottom-thermal {
@@ -5776,23 +5832,30 @@  cpu2-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu2_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu2-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu2_bottom_alert1>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
 		};
 
 		cpu3-top-thermal {
@@ -5800,23 +5863,30 @@  cpu3-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu3_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu3-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu3_top_alert1>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
 		};
 
 		cpu3-bottom-thermal {
@@ -5824,23 +5894,30 @@  cpu3-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu3_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu3-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu3_bottom_alert1>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
 		};
 
 		cpu4-top-thermal {
@@ -5848,23 +5925,30 @@  cpu4-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu4_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu4-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu4_top_alert1>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
 		};
 
 		cpu4-bottom-thermal {
@@ -5872,23 +5956,30 @@  cpu4-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu4_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu4-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu4_bottom_alert1>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
 		};
 
 		cpu5-top-thermal {
@@ -5896,23 +5987,30 @@  cpu5-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu5_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu5-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu5_top_alert1>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
 		};
 
 		cpu5-bottom-thermal {
@@ -5920,23 +6018,30 @@  cpu5-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu5_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu5-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu5_bottom_alert1>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
 		};
 
 		cpu6-top-thermal {
@@ -5944,23 +6049,30 @@  cpu6-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu6_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu6-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu6_top_alert1>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
 		};
 
 		cpu6-bottom-thermal {
@@ -5968,23 +6080,30 @@  cpu6-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu6_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu6-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu6_bottom_alert1>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
 		};
 
 		aoss1-thermal {
@@ -6010,23 +6129,30 @@  cpu7-top-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu7_top_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_top_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};
 
 		cpu7-middle-thermal {
@@ -6034,23 +6160,30 @@  cpu7-middle-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu7_middle_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_middle_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};
 
 		cpu7-bottom-thermal {
@@ -6058,23 +6191,30 @@  cpu7-bottom-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu7_bottom_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu7-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu7_bottom_alert1>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
 		};
 
 		cpu0-thermal {
@@ -6082,23 +6222,30 @@  cpu0-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu0_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu0-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu0_alert1>;
+					cooling-device = <&cpu0_idle 100 100>;
+				};
+			};
 		};
 
 		cpu1-thermal {
@@ -6106,23 +6253,30 @@  cpu1-thermal {
 
 			trips {
 				trip-point0 {
-					temperature = <90000>;
+					temperature = <108000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
-				trip-point1 {
-					temperature = <95000>;
+				cpu1_alert1: trip-point1 {
+					temperature = <110000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
 
 				cpu1-critical {
-					temperature = <110000>;
+					temperature = <115000>;
 					hysteresis = <1000>;
 					type = "critical";
 				};
 			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu1_alert1>;
+					cooling-device = <&cpu1_idle 100 100>;
+				};
+			};
 		};
 
 		nsphvx0-thermal {