diff mbox

Exynos5422-odroidxu3: Incomplete thermal-zones definition

Message ID 9e32c530-0d15-223f-c280-617f8ae2540d@gmail.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Willy WOLFF Feb. 2, 2017, 5:14 p.m. UTC
Odroid XU3-familly boards has thermal sensors per A15 core, but the actual thermal-zones define only cooling-maps action for cpu0.
If the application is running on all cores but core4 (first core of the A15 cluster), the CPU can reach high temperature without cooling action.

Also, the comment for cpu_alert4 in cooling-maps definition is not accurate. 11 steps for A15 correspond to 700MHz.

Signed-off-by: Willy Wolff <willy.mh.wolff@gmail.com>
---
 arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi | 287 +++++++++++++++++++--
 1 file changed, 269 insertions(+), 18 deletions(-)

Comments

Krzysztof Kozlowski Feb. 7, 2017, 9:28 p.m. UTC | #1
Hi,

Thanks for the patch. Please use:
	git log --oneline arch/arm/boot/dts/exynos*
to get the appropriate commit subject prefix.

On Thu, Feb 02, 2017 at 05:14:40PM +0000, Willy WOLFF wrote:
> Odroid XU3-familly boards has thermal sensors per A15 core, but the actual thermal-zones define only cooling-maps action for cpu0.
> If the application is running on all cores but core4 (first core of the A15 cluster), the CPU can reach high temperature without cooling action.

This is still the same cluster so it is a little bit surprising that
only cpu0 stayed cool.

Any specific reproduction steps (other than setting process affinity?)?
It should also affect other SoCs, right?

> 
> Also, the comment for cpu_alert4 in cooling-maps definition is not accurate. 11 steps for A15 correspond to 700MHz.

Wrap lines using regular git or editor settings, which is 72-75
characters. See "The canonical patch format". Usually the git commit
editor does it already.

> 
> Signed-off-by: Willy Wolff <willy.mh.wolff@gmail.com>
> ---
>  arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi | 287 +++++++++++++++++++--
>  1 file changed, 269 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
> index 05b9afdd..92328c1 100644
> --- a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
> +++ b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
> @@ -64,22 +64,22 @@
>  			polling-delay-passive = <250>;
>  			polling-delay = <0>;
>  			trips {
> -				cpu_alert0: cpu-alert-0 {
> +				cpu0_alert0: cpu-alert-0 {
>  					temperature = <50000>; /* millicelsius */
>  					hysteresis = <5000>; /* millicelsius */
>  					type = "active";
>  				};
> -				cpu_alert1: cpu-alert-1 {
> +				cpu0_alert1: cpu-alert-1 {
>  					temperature = <60000>; /* millicelsius */
>  					hysteresis = <5000>; /* millicelsius */
>  					type = "active";
>  				};
> -				cpu_alert2: cpu-alert-2 {
> +				cpu0_alert2: cpu-alert-2 {
>  					temperature = <70000>; /* millicelsius */
>  					hysteresis = <5000>; /* millicelsius */
>  					type = "active";
>  				};
> -				cpu_crit0: cpu-crit-0 {
> +				cpu0_crit0: cpu-crit-0 {
>  					temperature = <120000>; /* millicelsius */
>  					hysteresis = <0>; /* millicelsius */
>  					type = "critical";
> @@ -88,59 +88,310 @@
>  				 * Exynos542x supports only 4 trip-points
>  				 * so for these polling mode is required.
>  				 * Start polling at temperature level of last
> -				 * interrupt-driven trip: cpu_alert2
> +				 * interrupt-driven trip: cpu0_alert2
>  				 */
> -				cpu_alert3: cpu-alert-3 {
> +				cpu0_alert3: cpu-alert-3 {
>  					temperature = <70000>; /* millicelsius */
>  					hysteresis = <10000>; /* millicelsius */
>  					type = "passive";
>  				};
> -				cpu_alert4: cpu-alert-4 {
> +				cpu0_alert4: cpu-alert-4 {
>  					temperature = <85000>; /* millicelsius */
>  					hysteresis = <10000>; /* millicelsius */
>  					type = "passive";
>  				};
> +			};
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu0_alert0>;
> +					cooling-device = <&fan0 0 1>;
> +				};
> +				map1 {
> +					trip = <&cpu0_alert1>;
> +					cooling-device = <&fan0 1 2>;
> +				};
> +				map2 {
> +					trip = <&cpu0_alert2>;
> +					cooling-device = <&fan0 2 3>;
> +				};
> +				/*
> +				 * When reaching cpu0_alert3, reduce CPU
> +				 * by 2 steps. On Exynos5422/5800 that would
> +				 * be: 1600 MHz and 1100 MHz.
> +				 */
> +				map3 {
> +					trip = <&cpu0_alert3>;
> +					cooling-device = <&cpu0 0 2>;
> +				};
> +				map4 {
> +					trip = <&cpu0_alert3>;
> +					cooling-device = <&cpu4 0 2>;
> +				};
> +
> +				/*
> +				 * When reaching cpu0_alert4, reduce CPU
> +				 * further, down to 600 MHz (12 steps for big,
> +				 * 7 steps for LITTLE).
> +				 */
> +				map5 {
> +					trip = <&cpu0_alert4>;
> +					cooling-device = <&cpu0 3 7>;
> +				};
> +				map6 {
> +					trip = <&cpu0_alert4>;
> +					cooling-device = <&cpu4 3 12>;
> +				};

That is a quite big code duplication with a lot of possible errors to do
(differs only by cpu number). We should do this in a smarter way.

How about extending the thermal driver for setting the trips also for
other thermal zones?


Best regards,
Krzysztof

> +			};
> +		};
> +		cpu1_thermal: cpu1-thermal {
> +			thermal-sensors = <&tmu_cpu1 0>;
> +			polling-delay-passive = <250>;
> +			polling-delay = <0>;
> +			trips {
> +				cpu1_alert0: cpu-alert-0 {
> +					temperature = <50000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu1_alert1: cpu-alert-1 {
> +					temperature = <60000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu1_alert2: cpu-alert-2 {
> +					temperature = <70000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu1_crit0: cpu-crit-0 {
> +					temperature = <120000>; /* millicelsius */
> +					hysteresis = <0>; /* millicelsius */
> +					type = "critical";
> +				};
> +				/*
> +				 * Exynos542x supports only 4 trip-points
> +				 * so for these polling mode is required.
> +				 * Start polling at temperature level of last
> +				 * interrupt-driven trip: cpu1_alert2
> +				 */
> +				cpu1_alert3: cpu-alert-3 {
> +					temperature = <70000>; /* millicelsius */
> +					hysteresis = <10000>; /* millicelsius */
> +					type = "passive";
> +				};
> +				cpu1_alert4: cpu-alert-4 {
> +					temperature = <85000>; /* millicelsius */
> +					hysteresis = <10000>; /* millicelsius */
> +					type = "passive";
> +				};
> +			};
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu1_alert0>;
> +					cooling-device = <&fan0 0 1>;
> +				};
> +				map1 {
> +					trip = <&cpu1_alert1>;
> +					cooling-device = <&fan0 1 2>;
> +				};
> +				map2 {
> +					trip = <&cpu1_alert2>;
> +					cooling-device = <&fan0 2 3>;
> +				};
> +				/*
> +				 * When reaching cpu1_alert3, reduce CPU
> +				 * by 2 steps. On Exynos5422/5800 that would
> +				 * be: 1600 MHz and 1100 MHz.
> +				 */
> +				map3 {
> +					trip = <&cpu1_alert3>;
> +					cooling-device = <&cpu0 0 2>;
> +				};
> +				map4 {
> +					trip = <&cpu1_alert3>;
> +					cooling-device = <&cpu4 0 2>;
> +				};
>  
> +				/*
> +				 * When reaching cpu1_alert4, reduce CPU
> +				 * further, down to 600 MHz (12 steps for big,
> +				 * 7 steps for LITTLE).
> +				 */
> +				map5 {
> +					trip = <&cpu1_alert4>;
> +					cooling-device = <&cpu0 3 7>;
> +				};
> +				map6 {
> +					trip = <&cpu1_alert4>;
> +					cooling-device = <&cpu4 3 12>;
> +				};
> +			};
> +		};
> +		cpu2_thermal: cpu2-thermal {
> +			thermal-sensors = <&tmu_cpu2 0>;
> +			polling-delay-passive = <250>;
> +			polling-delay = <0>;
> +			trips {
> +				cpu2_alert0: cpu-alert-0 {
> +					temperature = <50000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu2_alert1: cpu-alert-1 {
> +					temperature = <60000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu2_alert2: cpu-alert-2 {
> +					temperature = <70000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu2_crit0: cpu-crit-0 {
> +					temperature = <120000>; /* millicelsius */
> +					hysteresis = <0>; /* millicelsius */
> +					type = "critical";
> +				};
> +				/*
> +				 * Exynos542x supports only 4 trip-points
> +				 * so for these polling mode is required.
> +				 * Start polling at temperature level of last
> +				 * interrupt-driven trip: cpu2_alert2
> +				 */
> +				cpu2_alert3: cpu-alert-3 {
> +					temperature = <70000>; /* millicelsius */
> +					hysteresis = <10000>; /* millicelsius */
> +					type = "passive";
> +				};
> +				cpu2_alert4: cpu-alert-4 {
> +					temperature = <85000>; /* millicelsius */
> +					hysteresis = <10000>; /* millicelsius */
> +					type = "passive";
> +				};
> +			};
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu2_alert0>;
> +					cooling-device = <&fan0 0 1>;
> +				};
> +				map1 {
> +					trip = <&cpu2_alert1>;
> +					cooling-device = <&fan0 1 2>;
> +				};
> +				map2 {
> +					trip = <&cpu2_alert2>;
> +					cooling-device = <&fan0 2 3>;
> +				};
> +				/*
> +				 * When reaching cpu2_alert3, reduce CPU
> +				 * by 2 steps. On Exynos5422/5800 that would
> +				 * be: 1600 MHz and 1100 MHz.
> +				 */
> +				map3 {
> +					trip = <&cpu2_alert3>;
> +					cooling-device = <&cpu0 0 2>;
> +				};
> +				map4 {
> +					trip = <&cpu2_alert3>;
> +					cooling-device = <&cpu4 0 2>;
> +				};
> +
> +				/*
> +				 * When reaching cpu2_alert4, reduce CPU
> +				 * further, down to 600 MHz (12 steps for big,
> +				 * 7 steps for LITTLE).
> +				 */
> +				map5 {
> +					trip = <&cpu2_alert4>;
> +					cooling-device = <&cpu0 3 7>;
> +				};
> +				map6 {
> +					trip = <&cpu2_alert4>;
> +					cooling-device = <&cpu4 3 12>;
> +				};
> +			};
> +		};
> +		cpu3_thermal: cpu3-thermal {
> +			thermal-sensors = <&tmu_cpu3 0>;
> +			polling-delay-passive = <250>;
> +			polling-delay = <0>;
> +			trips {
> +				cpu3_alert0: cpu-alert-0 {
> +					temperature = <50000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu3_alert1: cpu-alert-1 {
> +					temperature = <60000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu3_alert2: cpu-alert-2 {
> +					temperature = <70000>; /* millicelsius */
> +					hysteresis = <5000>; /* millicelsius */
> +					type = "active";
> +				};
> +				cpu3_crit0: cpu-crit-0 {
> +					temperature = <120000>; /* millicelsius */
> +					hysteresis = <0>; /* millicelsius */
> +					type = "critical";
> +				};
> +				/*
> +				 * Exynos542x supports only 4 trip-points
> +				 * so for these polling mode is required.
> +				 * Start polling at temperature level of last
> +				 * interrupt-driven trip: cpu3_alert2
> +				 */
> +				cpu3_alert3: cpu-alert-3 {
> +					temperature = <70000>; /* millicelsius */
> +					hysteresis = <10000>; /* millicelsius */
> +					type = "passive";
> +				};
> +				cpu3_alert4: cpu-alert-4 {
> +					temperature = <85000>; /* millicelsius */
> +					hysteresis = <10000>; /* millicelsius */
> +					type = "passive";
> +				};
>  			};
>  			cooling-maps {
>  				map0 {
> -					trip = <&cpu_alert0>;
> +					trip = <&cpu3_alert0>;
>  					cooling-device = <&fan0 0 1>;
>  				};
>  				map1 {
> -					trip = <&cpu_alert1>;
> +					trip = <&cpu3_alert1>;
>  					cooling-device = <&fan0 1 2>;
>  				};
>  				map2 {
> -					trip = <&cpu_alert2>;
> +					trip = <&cpu3_alert2>;
>  					cooling-device = <&fan0 2 3>;
>  				};
>  				/*
> -				 * When reaching cpu_alert3, reduce CPU
> +				 * When reaching cpu3_alert3, reduce CPU
>  				 * by 2 steps. On Exynos5422/5800 that would
>  				 * be: 1600 MHz and 1100 MHz.
>  				 */
>  				map3 {
> -					trip = <&cpu_alert3>;
> +					trip = <&cpu3_alert3>;
>  					cooling-device = <&cpu0 0 2>;
>  				};
>  				map4 {
> -					trip = <&cpu_alert3>;
> +					trip = <&cpu3_alert3>;
>  					cooling-device = <&cpu4 0 2>;
>  				};
>  
>  				/*
> -				 * When reaching cpu_alert4, reduce CPU
> -				 * further, down to 600 MHz (11 steps for big,
> +				 * When reaching cpu3_alert4, reduce CPU
> +				 * further, down to 600 MHz (12 steps for big,
>  				 * 7 steps for LITTLE).
>  				 */
>  				map5 {
> -					trip = <&cpu_alert4>;
> +					trip = <&cpu3_alert4>;
>  					cooling-device = <&cpu0 3 7>;
>  				};
>  				map6 {
> -					trip = <&cpu_alert4>;
> -					cooling-device = <&cpu4 3 11>;
> +					trip = <&cpu3_alert4>;
> +					cooling-device = <&cpu4 3 12>;
>  				};
>  			};
>  		};
> -- 
> 2.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Kozlowski Feb. 10, 2017, 3:46 p.m. UTC | #2
On Tue, Feb 07, 2017 at 11:28:30PM +0200, Krzysztof Kozlowski wrote:

(...)

> > +				/*
> > +				 * When reaching cpu0_alert4, reduce CPU
> > +				 * further, down to 600 MHz (12 steps for big,
> > +				 * 7 steps for LITTLE).
> > +				 */
> > +				map5 {
> > +					trip = <&cpu0_alert4>;
> > +					cooling-device = <&cpu0 3 7>;
> > +				};
> > +				map6 {
> > +					trip = <&cpu0_alert4>;
> > +					cooling-device = <&cpu4 3 12>;
> > +				};
> 
> That is a quite big code duplication with a lot of possible errors to do
> (differs only by cpu number). We should do this in a smarter way.
> 
> How about extending the thermal driver for setting the trips also for
> other thermal zones?
> 
> > +			};
> > +		};
> > +		cpu1_thermal: cpu1-thermal {
> > +			thermal-sensors = <&tmu_cpu1 0>;

I confirmed your findings (taskset -c 5 dd if=/dev/zero of=/dev/null)
however thermal zones indices do not match CPU.

busy CPU              | the most rising temp | TMU zone for this diff
=====================================================================
CPU4 (first A15)      | 52                   | thermal_zone0
CPU5                  | 70                   | thermal_zone3
CPU6                  | 74                   | thermal_zone2
CPU7                  | 66                   | thermal_zone1
CPU5-CPU7             | 97                   | thermal_zone3 (but 2 similar)

The sensors are not showing the same temperature - neither in idle nor
in busy cycle. Maybe they are not properly calibrated or located in
symmetric location?

However the difference between idle and busy states is more or less
similar between thermal zones - making one A15 busy heats:
1. associated thermal zone by 21-26 degrees of C,
2. other thermal zones by 13-18 degrees (zone 3 is weird here).

Thermal zone 4 is probably GPU.

Anyway the bindings for thermal zones are not nice in this case and they
require huge duplication of data... Maybe this could be done smarter
way...

Full data:
https://docs.google.com/spreadsheets/d/1XOpRCK0YjnxaZgjLcasRdKXaFFVh3EqyvrDi_YcmBGE/edit?usp=sharing


Best regards,
Krzysztof

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willy WOLFF Feb. 11, 2017, 6:29 p.m. UTC | #3
Hi Krzysztof,

Thanks for your interest.

On 10/02/2017 15:46, Krzysztof Kozlowski wrote:
> On Tue, Feb 07, 2017 at 11:28:30PM +0200, Krzysztof Kozlowski wrote:
> 
> (...)
> 
>>> +				/*
>>> +				 * When reaching cpu0_alert4, reduce CPU
>>> +				 * further, down to 600 MHz (12 steps for big,
>>> +				 * 7 steps for LITTLE).
>>> +				 */
>>> +				map5 {
>>> +					trip = <&cpu0_alert4>;
>>> +					cooling-device = <&cpu0 3 7>;
>>> +				};
>>> +				map6 {
>>> +					trip = <&cpu0_alert4>;
>>> +					cooling-device = <&cpu4 3 12>;
>>> +				};
>>
>> That is a quite big code duplication with a lot of possible errors to do
>> (differs only by cpu number). We should do this in a smarter way.
>>
>> How about extending the thermal driver for setting the trips also for
>> other thermal zones?
>>
>>> +			};
>>> +		};
>>> +		cpu1_thermal: cpu1-thermal {
>>> +			thermal-sensors = <&tmu_cpu1 0>;
> 
> I confirmed your findings (taskset -c 5 dd if=/dev/zero of=/dev/null)
> however thermal zones indices do not match CPU.
> 
> busy CPU              | the most rising temp | TMU zone for this diff
> =====================================================================
> CPU4 (first A15)      | 52                   | thermal_zone0
> CPU5                  | 70                   | thermal_zone3
> CPU6                  | 74                   | thermal_zone2
> CPU7                  | 66                   | thermal_zone1
> CPU5-CPU7             | 97                   | thermal_zone3 (but 2 similar)
> 
> The sensors are not showing the same temperature - neither in idle nor
> in busy cycle. Maybe they are not properly calibrated or located in
> symmetric location?
I think it's both. I have four xu3, two of them give wrong values,
like around 20degC when busy, where the idle cores are around 55decC.
But my two other boards give sound values. So, maybe the industrial
process behind it's not the best, or something wrong happen on the initialisation ...

Also, even if all cores are idling, it's not surprising that they don't
have same thermal values. Depending on the floorplan, you will never have
exact same homogeneous behaviour.


> 
> However the difference between idle and busy states is more or less
> similar between thermal zones - making one A15 busy heats:
> 1. associated thermal zone by 21-26 degrees of C,
> 2. other thermal zones by 13-18 degrees (zone 3 is weird here).
> 
> Thermal zone 4 is probably GPU.
Yes it is, the actual definition for the gpu is made by 
includes chain exynos5800.dtsi -> exynos5420.dtsi
And no cooling maps action are defined. I don't use the gpu at all for now, 
so I don't check its thermal behaviour.

> Anyway the bindings for thermal zones are not nice in this case and they
> require huge duplication of data... Maybe this could be done smarter
> way...
Indeed! I check the thermal framework source code, and it's a known problem:
https://git.kernel.org/cgit/linux/kernel/git/arm/arm-soc.git/tree/drivers/thermal/of-thermal.c?h=for-next#n461
So, if we modify the thermal framework for handling multiple sensors by one
thermal zone, what we can actually do in the dts is something like:
[...]
cpu_thermal: cpu-thermal {
			thermal-sensors = <&tmu_cpu0 0 &tmu_cpu1 0 &tmu_cpu2 0 &tmu_cpu3 0>;
			polling-delay-passive = <250>;
			polling-delay = <0>;
[...]
gpu_thermal: ...
[...]

I plan to change that, but I will need some advice.
I will start working on it next month. But feel free to do it correctly,
I'm still young on kernel hacking.


By the way, what is the purpose of the parameter
 thermal-sensors = <&tmu_cpu0 0>; in it's current definition?
I try to understand, but found no clue. 
By https://git.kernel.org/cgit/linux/kernel/git/arm/arm-soc.git/tree/Documentation/devicetree/bindings/thermal/thermal.txt?h=for-next#n318
the parameters are used to identify the accounting sensor from the sensors rail definition.
But in the current definition, #thermal-sensor-cells is set to 0,
by includes chain: exynos5800.dtsi -> exynos5420.dtsi -> exynos4412-tmu-sensor-conf.dtsi

So something is wrong here, or I don't understand exactly.
I see no difference by removing this paramter.


I have only a xu3 board. I don't have any other derivate machine with exynos chip.
Does all exynos big.LITTLE based has a thermal sensor per big core?

> 
> Full data:
> https://docs.google.com/spreadsheets/d/1XOpRCK0YjnxaZgjLcasRdKXaFFVh3EqyvrDi_YcmBGE/edit?usp=sharing
On your test, it's two different kernel, but executed on the same board?
Stupid question but want to be sure.

> 
> 
> Best regards,
> Krzysztof
> 

Best Regards,
Willy
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Kozlowski Feb. 11, 2017, 7:02 p.m. UTC | #4
On Sat, Feb 11, 2017 at 06:29:55PM +0000, Willy WOLFF wrote:
> > I confirmed your findings (taskset -c 5 dd if=/dev/zero of=/dev/null)
> > however thermal zones indices do not match CPU.
> > 
> > busy CPU              | the most rising temp | TMU zone for this diff
> > =====================================================================
> > CPU4 (first A15)      | 52                   | thermal_zone0
> > CPU5                  | 70                   | thermal_zone3
> > CPU6                  | 74                   | thermal_zone2
> > CPU7                  | 66                   | thermal_zone1
> > CPU5-CPU7             | 97                   | thermal_zone3 (but 2 similar)
> > 
> > The sensors are not showing the same temperature - neither in idle nor
> > in busy cycle. Maybe they are not properly calibrated or located in
> > symmetric location?
> I think it's both. I have four xu3, two of them give wrong values,
> like around 20degC when busy, where the idle cores are around 55decC.
> But my two other boards give sound values. So, maybe the industrial
> process behind it's not the best, or something wrong happen on the initialisation ...

I am debugging this now (and getting angry because my XU3 board is 2500
km away...) but it looks like issue in the current mainline thermal
driver. The Hardkernel's 3.10 kernel shows different values.

> 
> Also, even if all cores are idling, it's not surprising that they don't
> have same thermal values. Depending on the floorplan, you will never have
> exact same homogeneous behaviour.

Yes, okay, but thermal zone0 should not have ~28 degrees which is just
few above room temperature. :)

> > However the difference between idle and busy states is more or less
> > similar between thermal zones - making one A15 busy heats:
> > 1. associated thermal zone by 21-26 degrees of C,
> > 2. other thermal zones by 13-18 degrees (zone 3 is weird here).
> > 
> > Thermal zone 4 is probably GPU.
> Yes it is, the actual definition for the gpu is made by 
> includes chain exynos5800.dtsi -> exynos5420.dtsi
> And no cooling maps action are defined. I don't use the gpu at all for now, 
> so I don't check its thermal behaviour.
> 
> > Anyway the bindings for thermal zones are not nice in this case and they
> > require huge duplication of data... Maybe this could be done smarter
> > way...
> Indeed! I check the thermal framework source code, and it's a known problem:
> https://git.kernel.org/cgit/linux/kernel/git/arm/arm-soc.git/tree/drivers/thermal/of-thermal.c?h=for-next#n461
> So, if we modify the thermal framework for handling multiple sensors by one
> thermal zone, what we can actually do in the dts is something like:
> [...]
> cpu_thermal: cpu-thermal {
> 			thermal-sensors = <&tmu_cpu0 0 &tmu_cpu1 0 &tmu_cpu2 0 &tmu_cpu3 0>;
> 			polling-delay-passive = <250>;
> 			polling-delay = <0>;

This would make sense. It needs however fixing thermal zone0 weird
levels.

> [...]
> gpu_thermal: ...
> [...]
> 
> I plan to change that, but I will need some advice.
> I will start working on it next month. But feel free to do it correctly,
> I'm still young on kernel hacking.
> 
> 
> By the way, what is the purpose of the parameter
>  thermal-sensors = <&tmu_cpu0 0>; in it's current definition?
> I try to understand, but found no clue. 
> By https://git.kernel.org/cgit/linux/kernel/git/arm/arm-soc.git/tree/Documentation/devicetree/bindings/thermal/thermal.txt?h=for-next#n318
> the parameters are used to identify the accounting sensor from the sensors rail definition.
> But in the current definition, #thermal-sensor-cells is set to 0,
> by includes chain: exynos5800.dtsi -> exynos5420.dtsi -> exynos4412-tmu-sensor-conf.dtsi
> 
> So something is wrong here, or I don't understand exactly.
> I see no difference by removing this paramter.
>

It is the index of sensor within device providing it - tmu_cpu0. Some
device might provide many sensors so it would be <&tmu_cpu0 0>,
<&tmu_cpu0 1> etc. In our case we have only one sensor per thermal
device (TMU).

This is documented in the bindings you linked
(Documentation/devicetree/bindings/thermal/thermal.txt) in chapter "(b)
- IC with several internal sensors".

Solution interesting in Exynos case might be "(c) - Several sensors within one
single thermal zone".

> I have only a xu3 board. I don't have any other derivate machine with exynos chip.
> Does all exynos big.LITTLE based has a thermal sensor per big core?

Exynos4 has only one TMU. Exynos5410 has the same as Exynos5422. It all
depends on the chipset.

Best regards,
Krzysztof
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
index 05b9afdd..92328c1 100644
--- a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
+++ b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
@@ -64,22 +64,22 @@ 
 			polling-delay-passive = <250>;
 			polling-delay = <0>;
 			trips {
-				cpu_alert0: cpu-alert-0 {
+				cpu0_alert0: cpu-alert-0 {
 					temperature = <50000>; /* millicelsius */
 					hysteresis = <5000>; /* millicelsius */
 					type = "active";
 				};
-				cpu_alert1: cpu-alert-1 {
+				cpu0_alert1: cpu-alert-1 {
 					temperature = <60000>; /* millicelsius */
 					hysteresis = <5000>; /* millicelsius */
 					type = "active";
 				};
-				cpu_alert2: cpu-alert-2 {
+				cpu0_alert2: cpu-alert-2 {
 					temperature = <70000>; /* millicelsius */
 					hysteresis = <5000>; /* millicelsius */
 					type = "active";
 				};
-				cpu_crit0: cpu-crit-0 {
+				cpu0_crit0: cpu-crit-0 {
 					temperature = <120000>; /* millicelsius */
 					hysteresis = <0>; /* millicelsius */
 					type = "critical";
@@ -88,59 +88,310 @@ 
 				 * Exynos542x supports only 4 trip-points
 				 * so for these polling mode is required.
 				 * Start polling at temperature level of last
-				 * interrupt-driven trip: cpu_alert2
+				 * interrupt-driven trip: cpu0_alert2
 				 */
-				cpu_alert3: cpu-alert-3 {
+				cpu0_alert3: cpu-alert-3 {
 					temperature = <70000>; /* millicelsius */
 					hysteresis = <10000>; /* millicelsius */
 					type = "passive";
 				};
-				cpu_alert4: cpu-alert-4 {
+				cpu0_alert4: cpu-alert-4 {
 					temperature = <85000>; /* millicelsius */
 					hysteresis = <10000>; /* millicelsius */
 					type = "passive";
 				};
+			};
+			cooling-maps {
+				map0 {
+					trip = <&cpu0_alert0>;
+					cooling-device = <&fan0 0 1>;
+				};
+				map1 {
+					trip = <&cpu0_alert1>;
+					cooling-device = <&fan0 1 2>;
+				};
+				map2 {
+					trip = <&cpu0_alert2>;
+					cooling-device = <&fan0 2 3>;
+				};
+				/*
+				 * When reaching cpu0_alert3, reduce CPU
+				 * by 2 steps. On Exynos5422/5800 that would
+				 * be: 1600 MHz and 1100 MHz.
+				 */
+				map3 {
+					trip = <&cpu0_alert3>;
+					cooling-device = <&cpu0 0 2>;
+				};
+				map4 {
+					trip = <&cpu0_alert3>;
+					cooling-device = <&cpu4 0 2>;
+				};
+
+				/*
+				 * When reaching cpu0_alert4, reduce CPU
+				 * further, down to 600 MHz (12 steps for big,
+				 * 7 steps for LITTLE).
+				 */
+				map5 {
+					trip = <&cpu0_alert4>;
+					cooling-device = <&cpu0 3 7>;
+				};
+				map6 {
+					trip = <&cpu0_alert4>;
+					cooling-device = <&cpu4 3 12>;
+				};
+			};
+		};
+		cpu1_thermal: cpu1-thermal {
+			thermal-sensors = <&tmu_cpu1 0>;
+			polling-delay-passive = <250>;
+			polling-delay = <0>;
+			trips {
+				cpu1_alert0: cpu-alert-0 {
+					temperature = <50000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu1_alert1: cpu-alert-1 {
+					temperature = <60000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu1_alert2: cpu-alert-2 {
+					temperature = <70000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu1_crit0: cpu-crit-0 {
+					temperature = <120000>; /* millicelsius */
+					hysteresis = <0>; /* millicelsius */
+					type = "critical";
+				};
+				/*
+				 * Exynos542x supports only 4 trip-points
+				 * so for these polling mode is required.
+				 * Start polling at temperature level of last
+				 * interrupt-driven trip: cpu1_alert2
+				 */
+				cpu1_alert3: cpu-alert-3 {
+					temperature = <70000>; /* millicelsius */
+					hysteresis = <10000>; /* millicelsius */
+					type = "passive";
+				};
+				cpu1_alert4: cpu-alert-4 {
+					temperature = <85000>; /* millicelsius */
+					hysteresis = <10000>; /* millicelsius */
+					type = "passive";
+				};
+			};
+			cooling-maps {
+				map0 {
+					trip = <&cpu1_alert0>;
+					cooling-device = <&fan0 0 1>;
+				};
+				map1 {
+					trip = <&cpu1_alert1>;
+					cooling-device = <&fan0 1 2>;
+				};
+				map2 {
+					trip = <&cpu1_alert2>;
+					cooling-device = <&fan0 2 3>;
+				};
+				/*
+				 * When reaching cpu1_alert3, reduce CPU
+				 * by 2 steps. On Exynos5422/5800 that would
+				 * be: 1600 MHz and 1100 MHz.
+				 */
+				map3 {
+					trip = <&cpu1_alert3>;
+					cooling-device = <&cpu0 0 2>;
+				};
+				map4 {
+					trip = <&cpu1_alert3>;
+					cooling-device = <&cpu4 0 2>;
+				};
 
+				/*
+				 * When reaching cpu1_alert4, reduce CPU
+				 * further, down to 600 MHz (12 steps for big,
+				 * 7 steps for LITTLE).
+				 */
+				map5 {
+					trip = <&cpu1_alert4>;
+					cooling-device = <&cpu0 3 7>;
+				};
+				map6 {
+					trip = <&cpu1_alert4>;
+					cooling-device = <&cpu4 3 12>;
+				};
+			};
+		};
+		cpu2_thermal: cpu2-thermal {
+			thermal-sensors = <&tmu_cpu2 0>;
+			polling-delay-passive = <250>;
+			polling-delay = <0>;
+			trips {
+				cpu2_alert0: cpu-alert-0 {
+					temperature = <50000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu2_alert1: cpu-alert-1 {
+					temperature = <60000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu2_alert2: cpu-alert-2 {
+					temperature = <70000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu2_crit0: cpu-crit-0 {
+					temperature = <120000>; /* millicelsius */
+					hysteresis = <0>; /* millicelsius */
+					type = "critical";
+				};
+				/*
+				 * Exynos542x supports only 4 trip-points
+				 * so for these polling mode is required.
+				 * Start polling at temperature level of last
+				 * interrupt-driven trip: cpu2_alert2
+				 */
+				cpu2_alert3: cpu-alert-3 {
+					temperature = <70000>; /* millicelsius */
+					hysteresis = <10000>; /* millicelsius */
+					type = "passive";
+				};
+				cpu2_alert4: cpu-alert-4 {
+					temperature = <85000>; /* millicelsius */
+					hysteresis = <10000>; /* millicelsius */
+					type = "passive";
+				};
+			};
+			cooling-maps {
+				map0 {
+					trip = <&cpu2_alert0>;
+					cooling-device = <&fan0 0 1>;
+				};
+				map1 {
+					trip = <&cpu2_alert1>;
+					cooling-device = <&fan0 1 2>;
+				};
+				map2 {
+					trip = <&cpu2_alert2>;
+					cooling-device = <&fan0 2 3>;
+				};
+				/*
+				 * When reaching cpu2_alert3, reduce CPU
+				 * by 2 steps. On Exynos5422/5800 that would
+				 * be: 1600 MHz and 1100 MHz.
+				 */
+				map3 {
+					trip = <&cpu2_alert3>;
+					cooling-device = <&cpu0 0 2>;
+				};
+				map4 {
+					trip = <&cpu2_alert3>;
+					cooling-device = <&cpu4 0 2>;
+				};
+
+				/*
+				 * When reaching cpu2_alert4, reduce CPU
+				 * further, down to 600 MHz (12 steps for big,
+				 * 7 steps for LITTLE).
+				 */
+				map5 {
+					trip = <&cpu2_alert4>;
+					cooling-device = <&cpu0 3 7>;
+				};
+				map6 {
+					trip = <&cpu2_alert4>;
+					cooling-device = <&cpu4 3 12>;
+				};
+			};
+		};
+		cpu3_thermal: cpu3-thermal {
+			thermal-sensors = <&tmu_cpu3 0>;
+			polling-delay-passive = <250>;
+			polling-delay = <0>;
+			trips {
+				cpu3_alert0: cpu-alert-0 {
+					temperature = <50000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu3_alert1: cpu-alert-1 {
+					temperature = <60000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu3_alert2: cpu-alert-2 {
+					temperature = <70000>; /* millicelsius */
+					hysteresis = <5000>; /* millicelsius */
+					type = "active";
+				};
+				cpu3_crit0: cpu-crit-0 {
+					temperature = <120000>; /* millicelsius */
+					hysteresis = <0>; /* millicelsius */
+					type = "critical";
+				};
+				/*
+				 * Exynos542x supports only 4 trip-points
+				 * so for these polling mode is required.
+				 * Start polling at temperature level of last
+				 * interrupt-driven trip: cpu3_alert2
+				 */
+				cpu3_alert3: cpu-alert-3 {
+					temperature = <70000>; /* millicelsius */
+					hysteresis = <10000>; /* millicelsius */
+					type = "passive";
+				};
+				cpu3_alert4: cpu-alert-4 {
+					temperature = <85000>; /* millicelsius */
+					hysteresis = <10000>; /* millicelsius */
+					type = "passive";
+				};
 			};
 			cooling-maps {
 				map0 {
-					trip = <&cpu_alert0>;
+					trip = <&cpu3_alert0>;
 					cooling-device = <&fan0 0 1>;
 				};
 				map1 {
-					trip = <&cpu_alert1>;
+					trip = <&cpu3_alert1>;
 					cooling-device = <&fan0 1 2>;
 				};
 				map2 {
-					trip = <&cpu_alert2>;
+					trip = <&cpu3_alert2>;
 					cooling-device = <&fan0 2 3>;
 				};
 				/*
-				 * When reaching cpu_alert3, reduce CPU
+				 * When reaching cpu3_alert3, reduce CPU
 				 * by 2 steps. On Exynos5422/5800 that would
 				 * be: 1600 MHz and 1100 MHz.
 				 */
 				map3 {
-					trip = <&cpu_alert3>;
+					trip = <&cpu3_alert3>;
 					cooling-device = <&cpu0 0 2>;
 				};
 				map4 {
-					trip = <&cpu_alert3>;
+					trip = <&cpu3_alert3>;
 					cooling-device = <&cpu4 0 2>;
 				};
 
 				/*
-				 * When reaching cpu_alert4, reduce CPU
-				 * further, down to 600 MHz (11 steps for big,
+				 * When reaching cpu3_alert4, reduce CPU
+				 * further, down to 600 MHz (12 steps for big,
 				 * 7 steps for LITTLE).
 				 */
 				map5 {
-					trip = <&cpu_alert4>;
+					trip = <&cpu3_alert4>;
 					cooling-device = <&cpu0 3 7>;
 				};
 				map6 {
-					trip = <&cpu_alert4>;
-					cooling-device = <&cpu4 3 11>;
+					trip = <&cpu3_alert4>;
+					cooling-device = <&cpu4 3 12>;
 				};
 			};
 		};