[RFC] ARM: dts: omap36xx: Enable thermal throttling
diff mbox series

Message ID 20190912183037.18449-1-aford173@gmail.com
State New
Headers show
Series
  • [RFC] ARM: dts: omap36xx: Enable thermal throttling
Related show

Commit Message

Adam Ford Sept. 12, 2019, 6:30 p.m. UTC
The thermal sensor in the omap3 family isn't accurate, but it's
better than nothing.  The various OPP's enabled for the omap3630
support up to OPP1G, however the datasheet for the DM3730 states
that OPP130 and OPP1G are not available above TJ of 90C.

This patch configures the thermal throttling to limit the
operating points of the omap3630 to Only OPP50 and OPP100 if
the thermal sensor reads a value above 90C.

Signed-off-by: Adam Ford <aford173@gmail.com>

Comments

Daniel Lezcano Sept. 12, 2019, 9:12 p.m. UTC | #1
On 12/09/2019 20:30, Adam Ford wrote:
> The thermal sensor in the omap3 family isn't accurate, but it's
> better than nothing.  The various OPP's enabled for the omap3630
> support up to OPP1G, however the datasheet for the DM3730 states
> that OPP130 and OPP1G are not available above TJ of 90C.
> 
> This patch configures the thermal throttling to limit the
> operating points of the omap3630 to Only OPP50 and OPP100 if
> the thermal sensor reads a value above 90C.

Out of curiosity, what are the OPP50 and OPP100 mentioned above? and
what does mean "OPP130 and OPP1G are not available above TJ of 90C"?

I don't see the connection between these OPP names and the definition in
the DT.

> Signed-off-by: Adam Ford <aford173@gmail.com>
> 
> diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
> index 4bb4f534afe2..58b9d347019f 100644
> --- a/arch/arm/boot/dts/omap36xx.dtsi
> +++ b/arch/arm/boot/dts/omap36xx.dtsi
> @@ -25,6 +25,7 @@
>  
>  			vbb-supply = <&abb_mpu_iva>;
>  			clock-latency = <300000>; /* From omap-cpufreq driver */
> +			#cooling-cells = <2>;
>  		};
>  	};
>  
> @@ -195,6 +196,31 @@
>  	};
>  };
>  
> +&cpu_thermal {
> +	cpu_trips: trips {
> +		/* OPP130 and OPP1G are not available above TJ of 90C. */
> +		cpu_alert0: cpu_alert {
> +			temperature = <90000>; /* millicelsius */
> +			hysteresis = <2000>; /* millicelsius */
> +			type = "passive";
> +		};
> +
> +		cpu_crit: cpu_crit {
> +			temperature = <125000>; /* millicelsius */
> +			hysteresis = <2000>; /* millicelsius */
> +			type = "critical";
> +		};
> +	};
> +
> +	cpu_cooling_maps: cooling-maps {
> +		map0 {
> +			trip = <&cpu_alert0>;
> +			/* Only allow OPP50 and OPP100 */
> +			cooling-device = <&cpu 0 1>;
> +		};
> +	};
> +};
> +
>  /* OMAP3630 needs dss_96m_fck for VENC */
>  &venc {
>  	clocks = <&dss_tv_fck>, <&dss_96m_fck>;
>
Adam Ford Sept. 12, 2019, 9:19 p.m. UTC | #2
On Thu, Sep 12, 2019 at 4:12 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 12/09/2019 20:30, Adam Ford wrote:
> > The thermal sensor in the omap3 family isn't accurate, but it's
> > better than nothing.  The various OPP's enabled for the omap3630
> > support up to OPP1G, however the datasheet for the DM3730 states
> > that OPP130 and OPP1G are not available above TJ of 90C.
> >
> > This patch configures the thermal throttling to limit the
> > operating points of the omap3630 to Only OPP50 and OPP100 if
> > the thermal sensor reads a value above 90C.
>
> Out of curiosity, what are the OPP50 and OPP100 mentioned above? and
> what does mean "OPP130 and OPP1G are not available above TJ of 90C"?
>
OPP130 is the 800 MHz and OPP1G is 1GHz operating point.
The 90C is the max junction temperature.  When the temperature exceeds
90C, the processor is not designed to operate at 800+ MHz.  The
statement itself is a direct quote from the public datasheet for the
dm3730, Table 4-19.

The datasheet is: http://www.ti.com/lit/ds/symlink/dm3730.pdf

The operating points were updated in [1], but they haven't yet been
fully applied yet, but during the discussion, the question came about
regarding how to limit the speed at high temp, so that's why this
patch was done.

[1] - https://patchwork.kernel.org/patch/11141643/


adam

> I don't see the connection between these OPP names and the definition in
> the DT.
>
> > Signed-off-by: Adam Ford <aford173@gmail.com>
> >
> > diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
> > index 4bb4f534afe2..58b9d347019f 100644
> > --- a/arch/arm/boot/dts/omap36xx.dtsi
> > +++ b/arch/arm/boot/dts/omap36xx.dtsi
> > @@ -25,6 +25,7 @@
> >
> >                       vbb-supply = <&abb_mpu_iva>;
> >                       clock-latency = <300000>; /* From omap-cpufreq driver */
> > +                     #cooling-cells = <2>;
> >               };
> >       };
> >
> > @@ -195,6 +196,31 @@
> >       };
> >  };
> >
> > +&cpu_thermal {
> > +     cpu_trips: trips {
> > +             /* OPP130 and OPP1G are not available above TJ of 90C. */
> > +             cpu_alert0: cpu_alert {
> > +                     temperature = <90000>; /* millicelsius */
> > +                     hysteresis = <2000>; /* millicelsius */
> > +                     type = "passive";
> > +             };
> > +
> > +             cpu_crit: cpu_crit {
> > +                     temperature = <125000>; /* millicelsius */
> > +                     hysteresis = <2000>; /* millicelsius */
> > +                     type = "critical";
> > +             };
> > +     };
> > +
> > +     cpu_cooling_maps: cooling-maps {
> > +             map0 {
> > +                     trip = <&cpu_alert0>;
> > +                     /* Only allow OPP50 and OPP100 */
> > +                     cooling-device = <&cpu 0 1>;
> > +             };
> > +     };
> > +};
> > +
> >  /* OMAP3630 needs dss_96m_fck for VENC */
> >  &venc {
> >       clocks = <&dss_tv_fck>, <&dss_96m_fck>;
> >
>
>
> --
>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
Daniel Lezcano Sept. 12, 2019, 10:33 p.m. UTC | #3
Hi Adam,

On 12/09/2019 23:19, Adam Ford wrote:
> On Thu, Sep 12, 2019 at 4:12 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 12/09/2019 20:30, Adam Ford wrote:
>>> The thermal sensor in the omap3 family isn't accurate, but it's
>>> better than nothing.  The various OPP's enabled for the omap3630
>>> support up to OPP1G, however the datasheet for the DM3730 states
>>> that OPP130 and OPP1G are not available above TJ of 90C.
>>>
>>> This patch configures the thermal throttling to limit the
>>> operating points of the omap3630 to Only OPP50 and OPP100 if
>>> the thermal sensor reads a value above 90C.

Oh, that's a very interesting use case.

AFAICT the thermal framework is not designed to deal with this
situation. I agree this setup may work (even if I'm not convinced about
the stability of the whole).

May be Viresh can help for the cpufreq side?

>> Out of curiosity, what are the OPP50 and OPP100 mentioned above? and
>> what does mean "OPP130 and OPP1G are not available above TJ of 90C"?
>>
> OPP130 is the 800 MHz and OPP1G is 1GHz operating point.
> The 90C is the max junction temperature.  When the temperature exceeds
> 90C, the processor is not designed to operate at 800+ MHz.  The
> statement itself is a direct quote from the public datasheet for the
> dm3730, Table 4-19.

> The datasheet is: http://www.ti.com/lit/ds/symlink/dm3730.pdf

It is ambiguous how it is stated:

"OPP130 and OPP1G are not available above TJ of 90C"

that can be interpreted the OPP is disabled by the hardware, no?

> The operating points were updated in [1], but they haven't yet been
> fully applied yet, but during the discussion, the question came about
> regarding how to limit the speed at high temp, so that's why this
> patch was done.
> 
> [1] - https://patchwork.kernel.org/patch/11141643/

I see, you switched to opp-v2.

Thanks for the detailed answer.



>> I don't see the connection between these OPP names and the definition in
>> the DT.
>>
>>> Signed-off-by: Adam Ford <aford173@gmail.com>
>>>
>>> diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
>>> index 4bb4f534afe2..58b9d347019f 100644
>>> --- a/arch/arm/boot/dts/omap36xx.dtsi
>>> +++ b/arch/arm/boot/dts/omap36xx.dtsi
>>> @@ -25,6 +25,7 @@
>>>
>>>                       vbb-supply = <&abb_mpu_iva>;
>>>                       clock-latency = <300000>; /* From omap-cpufreq driver */
>>> +                     #cooling-cells = <2>;
>>>               };
>>>       };
>>>
>>> @@ -195,6 +196,31 @@
>>>       };
>>>  };
>>>
>>> +&cpu_thermal {
>>> +     cpu_trips: trips {
>>> +             /* OPP130 and OPP1G are not available above TJ of 90C. */
>>> +             cpu_alert0: cpu_alert {
>>> +                     temperature = <90000>; /* millicelsius */
>>> +                     hysteresis = <2000>; /* millicelsius */
>>> +                     type = "passive";
>>> +             };
>>> +
>>> +             cpu_crit: cpu_crit {
>>> +                     temperature = <125000>; /* millicelsius */
>>> +                     hysteresis = <2000>; /* millicelsius */
>>> +                     type = "critical";
>>> +             };
>>> +     };
>>> +
>>> +     cpu_cooling_maps: cooling-maps {
>>> +             map0 {
>>> +                     trip = <&cpu_alert0>;
>>> +                     /* Only allow OPP50 and OPP100 */
>>> +                     cooling-device = <&cpu 0 1>;
>>> +             };
>>> +     };
>>> +};
>>> +
>>>  /* OMAP3630 needs dss_96m_fck for VENC */
>>>  &venc {
>>>       clocks = <&dss_tv_fck>, <&dss_96m_fck>;
>>>
>>
>>
>> --
>>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>>
>> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
>> <http://twitter.com/#!/linaroorg> Twitter |
>> <http://www.linaro.org/linaro-blog/> Blog
>>
H. Nikolaus Schaller Sept. 13, 2019, 6:55 a.m. UTC | #4
Hi Adam,

> Am 12.09.2019 um 20:30 schrieb Adam Ford <aford173@gmail.com>:
> 
> The thermal sensor in the omap3 family isn't accurate, but it's
> better than nothing.  The various OPP's enabled for the omap3630
> support up to OPP1G, however the datasheet for the DM3730 states
> that OPP130 and OPP1G are not available above TJ of 90C.

We may have to add similar things for omap34xx as well. See
data sheet 3.3 Recommended Operating Conditions

But when reading them they do not limit temperature but
number of operation hours of each OPP depending on temperature...
That is clearly beyond what a kernel can do (we would have to
have access to some NVRAM in the kernel counting hours).

> 
> This patch configures the thermal throttling to limit the
> operating points of the omap3630 to Only OPP50 and OPP100 if

s/Only/only/

> the thermal sensor reads a value above 90C.
> 
> Signed-off-by: Adam Ford <aford173@gmail.com>
> 
> diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
> index 4bb4f534afe2..58b9d347019f 100644
> --- a/arch/arm/boot/dts/omap36xx.dtsi
> +++ b/arch/arm/boot/dts/omap36xx.dtsi
> @@ -25,6 +25,7 @@
> 
> 			vbb-supply = <&abb_mpu_iva>;
> 			clock-latency = <300000>; /* From omap-cpufreq driver */
> +			#cooling-cells = <2>;
> 		};
> 	};
> 
> @@ -195,6 +196,31 @@
> 	};
> };
> 
> +&cpu_thermal {
> +	cpu_trips: trips {

Yes, that is comparable to what I have seen in omap5 DT where I know
that thermal throttling works.

> +		/* OPP130 and OPP1G are not available above TJ of 90C. */
> +		cpu_alert0: cpu_alert {
> +			temperature = <90000>; /* millicelsius */
> +			hysteresis = <2000>; /* millicelsius */
> +			type = "passive";
> +		};
> +
> +		cpu_crit: cpu_crit {
> +			temperature = <125000>; /* millicelsius */

Shouldn't this be 105°C for all omap3 chips (industrial temperature range)?

> +			hysteresis = <2000>; /* millicelsius */
> +			type = "critical";
> +		};
> +	};
> +
> +	cpu_cooling_maps: cooling-maps {
> +		map0 {
> +			trip = <&cpu_alert0>;
> +			/* Only allow OPP50 and OPP100 */
> +			cooling-device = <&cpu 0 1>;

omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
understand their meaning (and how it relates to the opp list).

> +		};
> +	};

Seems to make sense when comparing to omap4-cpu-thermal.dtsi...

Maybe we can add the trip points to omap3-cpu-thermal.dtsi
because they seem to be the same for all omap3 variants and
just have a SoC variant specific cooling map for omap36xx.dtsi.

> +};
> +
> /* OMAP3630 needs dss_96m_fck for VENC */
> &venc {
> 	clocks = <&dss_tv_fck>, <&dss_96m_fck>;
> -- 
> 2.17.1
> 

The question is how we can test that. Heating up the omap36xx to 90°C
or even 105°C isn't that easy like with omap5...

Maybe we can modify the millicelsius values for testing purposes to
something in reachable range, e.g. 60°C and 70°C and watch what happens?

BR,
Nikolaus
Adam Ford Sept. 13, 2019, 11:07 a.m. UTC | #5
On Fri, Sep 13, 2019 at 1:56 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>
> Hi Adam,
>
> > Am 12.09.2019 um 20:30 schrieb Adam Ford <aford173@gmail.com>:
> >
> > The thermal sensor in the omap3 family isn't accurate, but it's
> > better than nothing.  The various OPP's enabled for the omap3630
> > support up to OPP1G, however the datasheet for the DM3730 states
> > that OPP130 and OPP1G are not available above TJ of 90C.
>
> We may have to add similar things for omap34xx as well. See
> data sheet 3.3 Recommended Operating Conditions
>
> But when reading them they do not limit temperature but
> number of operation hours of each OPP depending on temperature...
> That is clearly beyond what a kernel can do (we would have to
> have access to some NVRAM in the kernel counting hours).
>
> >
> > This patch configures the thermal throttling to limit the
> > operating points of the omap3630 to Only OPP50 and OPP100 if
>
> s/Only/only/

I will fix when I do V2
>
> > the thermal sensor reads a value above 90C.
> >
> > Signed-off-by: Adam Ford <aford173@gmail.com>
> >
> > diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
> > index 4bb4f534afe2..58b9d347019f 100644
> > --- a/arch/arm/boot/dts/omap36xx.dtsi
> > +++ b/arch/arm/boot/dts/omap36xx.dtsi
> > @@ -25,6 +25,7 @@
> >
> >                       vbb-supply = <&abb_mpu_iva>;
> >                       clock-latency = <300000>; /* From omap-cpufreq driver */
> > +                     #cooling-cells = <2>;
> >               };
> >       };
> >
> > @@ -195,6 +196,31 @@
> >       };
> > };
> >
> > +&cpu_thermal {
> > +     cpu_trips: trips {
>
> Yes, that is comparable to what I have seen in omap5 DT where I know
> that thermal throttling works.
>
> > +             /* OPP130 and OPP1G are not available above TJ of 90C. */
> > +             cpu_alert0: cpu_alert {
> > +                     temperature = <90000>; /* millicelsius */
> > +                     hysteresis = <2000>; /* millicelsius */
> > +                     type = "passive";
> > +             };
> > +
> > +             cpu_crit: cpu_crit {
> > +                     temperature = <125000>; /* millicelsius */
>
> Shouldn't this be 105°C for all omap3 chips (industrial temperature range)?

You are correct.  I forgot to change this when I did my copy-paste.
>
> > +                     hysteresis = <2000>; /* millicelsius */
> > +                     type = "critical";
> > +             };
> > +     };
> > +
> > +     cpu_cooling_maps: cooling-maps {
> > +             map0 {
> > +                     trip = <&cpu_alert0>;
> > +                     /* Only allow OPP50 and OPP100 */
> > +                     cooling-device = <&cpu 0 1>;
>
> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> understand their meaning (and how it relates to the opp list).

I read through the documentation, but it wasn't completely clear to
me. AFAICT, the numbers after &cpu represent the min and max index in
the OPP table when the condition is hit.
>
> > +             };
> > +     };
>
> Seems to make sense when comparing to omap4-cpu-thermal.dtsi...
>
> Maybe we can add the trip points to omap3-cpu-thermal.dtsi
> because they seem to be the same for all omap3 variants and
> just have a SoC variant specific cooling map for omap36xx.dtsi.

The OPP's for OMAP3530 show that OPP5 and OPP6 are capable of
operating at 105C.  AM3517 is a little different also, so I didn't
want to make a generic omap3 throttling table.  Since my goal was to
try to remove the need for the turbo option from the newly supported
1GHz omap3630/3730, I was hoping to get this pushed first.  From
there, we can tweak the 34xx.dtsi and 3517.dtsi for their respective
thermal information.

>
> > +};
> > +
> > /* OMAP3630 needs dss_96m_fck for VENC */
> > &venc {
> >       clocks = <&dss_tv_fck>, <&dss_96m_fck>;
> > --
> > 2.17.1
> >
>
> The question is how we can test that. Heating up the omap36xx to 90°C
> or even 105°C isn't that easy like with omap5...
>
> Maybe we can modify the millicelsius values for testing purposes to
> something in reachable range, e.g. 60°C and 70°C and watch what happens?

I have access to a thermal chamber at work, but the guy who knows how
to use it is out for the rest of the week.  My plan was do as you
suggested and change the milicelsius values, but I wanted to get some
buy-in from TI people and/or Tony.  This also means enabling the omap3
thermal stuff which clearly throws a message that it's inaccurate.  I
don't know how much it's inaccurate, so we may have to make the 90C
value lower to compensate.

adam
>
> BR,
> Nikolaus
>
>
>
Adam Ford Sept. 13, 2019, 1:28 p.m. UTC | #6
On Fri, Sep 13, 2019 at 6:07 AM Adam Ford <aford173@gmail.com> wrote:
>
> On Fri, Sep 13, 2019 at 1:56 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >
> > Hi Adam,
> >
> > > Am 12.09.2019 um 20:30 schrieb Adam Ford <aford173@gmail.com>:
> > >
> > > The thermal sensor in the omap3 family isn't accurate, but it's
> > > better than nothing.  The various OPP's enabled for the omap3630
> > > support up to OPP1G, however the datasheet for the DM3730 states
> > > that OPP130 and OPP1G are not available above TJ of 90C.
> >
> > We may have to add similar things for omap34xx as well. See
> > data sheet 3.3 Recommended Operating Conditions
> >
> > But when reading them they do not limit temperature but
> > number of operation hours of each OPP depending on temperature...
> > That is clearly beyond what a kernel can do (we would have to
> > have access to some NVRAM in the kernel counting hours).
> >
> > >
> > > This patch configures the thermal throttling to limit the
> > > operating points of the omap3630 to Only OPP50 and OPP100 if
> >
> > s/Only/only/
>
> I will fix when I do V2
> >
> > > the thermal sensor reads a value above 90C.
> > >
> > > Signed-off-by: Adam Ford <aford173@gmail.com>
> > >
> > > diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
> > > index 4bb4f534afe2..58b9d347019f 100644
> > > --- a/arch/arm/boot/dts/omap36xx.dtsi
> > > +++ b/arch/arm/boot/dts/omap36xx.dtsi
> > > @@ -25,6 +25,7 @@
> > >
> > >                       vbb-supply = <&abb_mpu_iva>;
> > >                       clock-latency = <300000>; /* From omap-cpufreq driver */
> > > +                     #cooling-cells = <2>;
> > >               };
> > >       };
> > >
> > > @@ -195,6 +196,31 @@
> > >       };
> > > };
> > >
> > > +&cpu_thermal {
> > > +     cpu_trips: trips {
> >
> > Yes, that is comparable to what I have seen in omap5 DT where I know
> > that thermal throttling works.
> >
> > > +             /* OPP130 and OPP1G are not available above TJ of 90C. */
> > > +             cpu_alert0: cpu_alert {
> > > +                     temperature = <90000>; /* millicelsius */
> > > +                     hysteresis = <2000>; /* millicelsius */
> > > +                     type = "passive";
> > > +             };
> > > +
> > > +             cpu_crit: cpu_crit {
> > > +                     temperature = <125000>; /* millicelsius */
> >
> > Shouldn't this be 105°C for all omap3 chips (industrial temperature range)?
>
> You are correct.  I forgot to change this when I did my copy-paste.
> >
> > > +                     hysteresis = <2000>; /* millicelsius */
> > > +                     type = "critical";
> > > +             };
> > > +     };
> > > +
> > > +     cpu_cooling_maps: cooling-maps {
> > > +             map0 {
> > > +                     trip = <&cpu_alert0>;
> > > +                     /* Only allow OPP50 and OPP100 */
> > > +                     cooling-device = <&cpu 0 1>;
> >
> > omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> > understand their meaning (and how it relates to the opp list).
>
> I read through the documentation, but it wasn't completely clear to
> me. AFAICT, the numbers after &cpu represent the min and max index in
> the OPP table when the condition is hit.
> >
> > > +             };
> > > +     };
> >
> > Seems to make sense when comparing to omap4-cpu-thermal.dtsi...
> >
> > Maybe we can add the trip points to omap3-cpu-thermal.dtsi
> > because they seem to be the same for all omap3 variants and
> > just have a SoC variant specific cooling map for omap36xx.dtsi.
>
> The OPP's for OMAP3530 show that OPP5 and OPP6 are capable of
> operating at 105C.  AM3517 is a little different also, so I didn't
> want to make a generic omap3 throttling table.  Since my goal was to
> try to remove the need for the turbo option from the newly supported
> 1GHz omap3630/3730, I was hoping to get this pushed first.  From
> there, we can tweak the 34xx.dtsi and 3517.dtsi for their respective
> thermal information.
>
> >
> > > +};
> > > +
> > > /* OMAP3630 needs dss_96m_fck for VENC */
> > > &venc {
> > >       clocks = <&dss_tv_fck>, <&dss_96m_fck>;
> > > --
> > > 2.17.1
> > >
> >
> > The question is how we can test that. Heating up the omap36xx to 90°C
> > or even 105°C isn't that easy like with omap5...
> >
> > Maybe we can modify the millicelsius values for testing purposes to
> > something in reachable range, e.g. 60°C and 70°C and watch what happens?
>
> I have access to a thermal chamber at work, but the guy who knows how
> to use it is out for the rest of the week.  My plan was do as you
> suggested and change the milicelsius values, but I wanted to get some
> buy-in from TI people and/or Tony.  This also means enabling the omap3
> thermal stuff which clearly throws a message that it's inaccurate.  I
> don't know how much it's inaccurate, so we may have to make the 90C
> value lower to compensate.

I set the alert to 60C then booted the system.  It initially read
58.6C, then I ran a benchmark to push the processor over 60C.
Unfortunately, it didn't appear to throttle like I expected.  I was
expecting it to only make 300 amd 600 MHz available.

cat /sys/devices/virtual/thermal/thermal_zone0/temp
58500

whetstone 200000
Loops: 200000, Iterations: 1, Duration: 31 sec.
C Converted Double Precision Whetstones: 645.2 MIPS

cat /sys/devices/virtual/thermal/thermal_zone0/temp
62000

cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
300000 600000 800000

I am going to investigate how other processors do this.  I may have
the cpu reference wrong.

adam
>
> adam
> >
> > BR,
> > Nikolaus
> >
> >
> >
H. Nikolaus Schaller Sept. 13, 2019, 1:32 p.m. UTC | #7
Hi Adam,

> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:

>>> +     cpu_cooling_maps: cooling-maps {
>>> +             map0 {
>>> +                     trip = <&cpu_alert0>;
>>> +                     /* Only allow OPP50 and OPP100 */
>>> +                     cooling-device = <&cpu 0 1>;
>> 
>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
>> understand their meaning (and how it relates to the opp list).
> 
> I read through the documentation, but it wasn't completely clear to
> me. AFAICT, the numbers after &cpu represent the min and max index in
> the OPP table when the condition is hit.

Ok. It seems to use "cooling state" for those and the first is minimum
and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
no limits.

Since here we use the &cpu node it is likely that the "cooling state"
is the same as the OPP index currently in use.

I have looked through the .dts which use cpu_crit and the picture is
not unique...

omap4		seems to only define it
am57xx		has two different grade dtsi files
dra7		overwrites critical temperature value
am57xx-beagle	defines a gpio to control a fan

The fan is only a map0 for the board_alert0 starting to do something at 40°C
and above but there is nothing for the "critical" point.
But throttling is working on omap5...

Could there be some automatic handling by the govenors for the "critical"
trip points? So that we do not need to define any cooling-maps for "critical"?

Even for the exynos54xx where the crical trip point is only defined.

>> 
>>> +             };
>>> +     };
>> 
>> Seems to make sense when comparing to omap4-cpu-thermal.dtsi...
>> 
>> Maybe we can add the trip points to omap3-cpu-thermal.dtsi
>> because they seem to be the same for all omap3 variants and
>> just have a SoC variant specific cooling map for omap36xx.dtsi.
> 
> The OPP's for OMAP3530 show that OPP5 and OPP6 are capable of
> operating at 105C.  AM3517 is a little different also, so I didn't
> want to make a generic omap3 throttling table.  Since my goal was to
> try to remove the need for the turbo option from the newly supported
> 1GHz omap3630/3730, I was hoping to get this pushed first.  From
> there, we can tweak the 34xx.dtsi and 3517.dtsi for their respective
> thermal information.

My observation is that all omap3 have
commercial range	0°C .. 90°C
extended		-40°C .. 105°C

So there is no difference in defining these as trip points and
my proposal would be to have these in omap3.dtsi.

Only disabling OPPs differs. Which would make only the mapping
go to omap36xx.

> 
>> 
>>> +};
>>> +
>>> /* OMAP3630 needs dss_96m_fck for VENC */
>>> &venc {
>>>      clocks = <&dss_tv_fck>, <&dss_96m_fck>;
>>> --
>>> 2.17.1
>>> 
>> 
>> The question is how we can test that. Heating up the omap36xx to 90°C
>> or even 105°C isn't that easy like with omap5...
>> 
>> Maybe we can modify the millicelsius values for testing purposes to
>> something in reachable range, e.g. 60°C and 70°C and watch what happens?
> 
> I have access to a thermal chamber at work, but the guy who knows how
> to use it is out for the rest of the week.  My plan was do as you
> suggested and change the milicelsius values, but I wanted to get some
> buy-in from TI people and/or Tony.  This also means enabling the omap3
> thermal stuff which clearly throws a message that it's inaccurate.

Basically we need two different types of test:
1. is the DTS setup correct so that the bandgap sensor is read and
   triggers correct actions
2. is the bandgap sensor correct enough to that the data sheet limits
   are really met

>  I don't know how much it's inaccurate, so we may have to make the 90C
> value lower to compensate.

Hm. If the bandgap sensor is inaccurate, we should make sure it reports the
maximum value, just to be on the safe side. So that the real temperature
is guaranteed to be lower than what is reported.

Then we can use the data sheet limits of 90°C and 105°C in the trip point
table (which should not be tweaked for sensor inaccuracy).

BR,
Nikolaus
Adam Ford Sept. 13, 2019, 2:05 p.m. UTC | #8
On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>
> Hi Adam,
>
> > Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:
>
> >>> +     cpu_cooling_maps: cooling-maps {
> >>> +             map0 {
> >>> +                     trip = <&cpu_alert0>;
> >>> +                     /* Only allow OPP50 and OPP100 */
> >>> +                     cooling-device = <&cpu 0 1>;
> >>
> >> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> >> understand their meaning (and how it relates to the opp list).
> >
> > I read through the documentation, but it wasn't completely clear to
> > me. AFAICT, the numbers after &cpu represent the min and max index in
> > the OPP table when the condition is hit.
>
> Ok. It seems to use "cooling state" for those and the first is minimum
> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
> no limits.
>
> Since here we use the &cpu node it is likely that the "cooling state"
> is the same as the OPP index currently in use.
>
> I have looked through the .dts which use cpu_crit and the picture is
> not unique...
>
> omap4           seems to only define it
> am57xx          has two different grade dtsi files
> dra7            overwrites critical temperature value
> am57xx-beagle   defines a gpio to control a fan

Checkout rk3288-veyron-mickey.dts

They have almost_warm, warm, almost_hot, hot, hotter, very_hot, and
critical for trips, and they have as many corresponding cooling maps
which appear to limit the CPU speeds, but their index references are
still confusing to me.

For that device,
Warm and no limit first, then 4:   coolling-device = <&cpu0 THERMAL_NO_LIMIT 4>
...
very_hot uses a number then no limit: cooling-device = <&cpu0 8
THERMAL_NO_LIMIT>

This makes me wonder if the min and max are switched or the index
values go backwards.

>
> The fan is only a map0 for the board_alert0 starting to do something at 40°C
> and above but there is nothing for the "critical" point.
> But throttling is working on omap5...
>
> Could there be some automatic handling by the govenors for the "critical"
> trip points? So that we do not need to define any cooling-maps for "critical"?
>
> Even for the exynos54xx where the crical trip point is only defined.
>



> >>
> >>> +             };
> >>> +     };
> >>
> >> Seems to make sense when comparing to omap4-cpu-thermal.dtsi...
> >>
> >> Maybe we can add the trip points to omap3-cpu-thermal.dtsi
> >> because they seem to be the same for all omap3 variants and
> >> just have a SoC variant specific cooling map for omap36xx.dtsi.
> >
> > The OPP's for OMAP3530 show that OPP5 and OPP6 are capable of
> > operating at 105C.  AM3517 is a little different also, so I didn't
> > want to make a generic omap3 throttling table.  Since my goal was to
> > try to remove the need for the turbo option from the newly supported
> > 1GHz omap3630/3730, I was hoping to get this pushed first.  From
> > there, we can tweak the 34xx.dtsi and 3517.dtsi for their respective
> > thermal information.
>
> My observation is that all omap3 have
> commercial range        0°C .. 90°C
> extended                -40°C .. 105°C
>
> So there is no difference in defining these as trip points and
> my proposal would be to have these in omap3.dtsi.

I can do that.  I'll copy-paste the omap4 stuff using 90 and 105 as
limits without any mapping.  I'll then keep the mapping in omap3630

>
> Only disabling OPPs differs. Which would make only the mapping
> go to omap36xx.
>
> >
> >>
> >>> +};
> >>> +
> >>> /* OMAP3630 needs dss_96m_fck for VENC */
> >>> &venc {
> >>>      clocks = <&dss_tv_fck>, <&dss_96m_fck>;
> >>> --
> >>> 2.17.1
> >>>
> >>
> >> The question is how we can test that. Heating up the omap36xx to 90°C
> >> or even 105°C isn't that easy like with omap5...
> >>
> >> Maybe we can modify the millicelsius values for testing purposes to
> >> something in reachable range, e.g. 60°C and 70°C and watch what happens?
> >
> > I have access to a thermal chamber at work, but the guy who knows how
> > to use it is out for the rest of the week.  My plan was do as you
> > suggested and change the milicelsius values, but I wanted to get some
> > buy-in from TI people and/or Tony.  This also means enabling the omap3
> > thermal stuff which clearly throws a message that it's inaccurate.
>
> Basically we need two different types of test:
> 1. is the DTS setup correct so that the bandgap sensor is read and
>    triggers correct actions
> 2. is the bandgap sensor correct enough to that the data sheet limits
>    are really met
>
> >  I don't know how much it's inaccurate, so we may have to make the 90C
> > value lower to compensate.
>
> Hm. If the bandgap sensor is inaccurate, we should make sure it reports the
> maximum value, just to be on the safe side. So that the real temperature
> is guaranteed to be lower than what is reported.
>
> Then we can use the data sheet limits of 90°C and 105°C in the trip point
> table (which should not be tweaked for sensor inaccuracy).

I can see not compensating if it reads high, but if the temp reads
low, shouldn't compensate so we don't over temp the processor?

adam
>
> BR,
> Nikolaus
>
H. Nikolaus Schaller Sept. 13, 2019, 2:24 p.m. UTC | #9
> Am 13.09.2019 um 16:05 schrieb Adam Ford <aford173@gmail.com>:
> 
> On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>> 
>> Hi Adam,
>> 
>>> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:
>> 
>>>>> +     cpu_cooling_maps: cooling-maps {
>>>>> +             map0 {
>>>>> +                     trip = <&cpu_alert0>;
>>>>> +                     /* Only allow OPP50 and OPP100 */
>>>>> +                     cooling-device = <&cpu 0 1>;
>>>> 
>>>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
>>>> understand their meaning (and how it relates to the opp list).
>>> 
>>> I read through the documentation, but it wasn't completely clear to
>>> me. AFAICT, the numbers after &cpu represent the min and max index in
>>> the OPP table when the condition is hit.
>> 
>> Ok. It seems to use "cooling state" for those and the first is minimum
>> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
>> no limits.
>> 
>> Since here we use the &cpu node it is likely that the "cooling state"
>> is the same as the OPP index currently in use.
>> 
>> I have looked through the .dts which use cpu_crit and the picture is
>> not unique...
>> 
>> omap4           seems to only define it
>> am57xx          has two different grade dtsi files
>> dra7            overwrites critical temperature value
>> am57xx-beagle   defines a gpio to control a fan
> 
> Checkout rk3288-veyron-mickey.dts
> 
> They have almost_warm, warm, almost_hot, hot, hotter, very_hot, and
> critical for trips, and they have as many corresponding cooling maps
> which appear to limit the CPU speeds, but their index references are
> still confusing to me.

Seems to be quite sophistcated.

The arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi also has a lot
of trip points. So there may be very different needs...

But it has potentially helpful comments...

				/* 
				 * When reaching cpu0_alert3, reduce CPU
				 * by 2 steps. On Exynos5422/5800 that would
				 * be: 1600 MHz and 1100 MHz.
				 */
				map3 {
					trip = <&cpu0_alert3>;
					cooling-device = <&cpu0 0 2>;
				};
				map4 {
					trip = <&cpu0_alert3>;
					cooling-device = <&cpu4 0 2>;
				};
				/*
				 * When reaching cpu0_alert4, reduce CPU
				 * further, down to 600 MHz (12 steps for big,
				 * 7 steps for LITTLE).
				 */
				map5 {
					trip = <&cpu0_alert4>;
					cooling-device = <&cpu0 3 7>;
				};
				map6 {
					trip = <&cpu0_alert4>;
					cooling-device = <&cpu4 3 12>;
				};

That would mean the second integer is something about how
many steps to reduce.

But the first is not explained.

BTW: this also demonstrates how a single trip point can map to multiple
cooling-device actions (something we likely do not need).

> 
> For that device,
> Warm and no limit first, then 4:   coolling-device = <&cpu0 THERMAL_NO_LIMIT 4>
> ...
> very_hot uses a number then no limit: cooling-device = <&cpu0 8
> THERMAL_NO_LIMIT>
> 
> This makes me wonder if the min and max are switched or the index
> values go backwards.

It may depend on the specific cpu driver? Maybe even omap rk and exynos
have different interpretation in code?

>> 
>> Then we can use the data sheet limits of 90°C and 105°C in the trip point
>> table (which should not be tweaked for sensor inaccuracy).
> 
> I can see not compensating if it reads high, but if the temp reads
> low, shouldn't compensate so we don't over temp the processor?

I just mean that we must ensure that the TJ is <= 90° if the bandgap
ever reports 90°. So it may report 10 or 20 or even 30 degrees more than the
real temperature but never less (reaching the critical temperature too early
but not too late).

We can achieve that by adding bias or changing slope etc. in the bandgap sensor
driver.

If I find some time I am curious enough to look into the code and the data
sheets to understand why it is said to be inaccurate... Maybe there is
jitter from some LDO and it needs a median filter?

And why it seems to add a bias of ca. 10° as soon as I read it more than
for the first time. And how well temperature correlates to ambient temperature
(it definitively correlates to cpufreq-set -f).

But we should not modify the trip temperatures by 10 or 20 or 30 degrees.
IMHO they should have the values defined by the data sheet.

BR,
Nikolaus
Adam Ford Sept. 13, 2019, 3:01 p.m. UTC | #10
On Fri, Sep 13, 2019 at 9:24 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>
>
> > Am 13.09.2019 um 16:05 schrieb Adam Ford <aford173@gmail.com>:
> >
> > On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >>
> >> Hi Adam,
> >>
> >>> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:
> >>
> >>>>> +     cpu_cooling_maps: cooling-maps {
> >>>>> +             map0 {
> >>>>> +                     trip = <&cpu_alert0>;
> >>>>> +                     /* Only allow OPP50 and OPP100 */
> >>>>> +                     cooling-device = <&cpu 0 1>;
> >>>>
> >>>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> >>>> understand their meaning (and how it relates to the opp list).
> >>>
> >>> I read through the documentation, but it wasn't completely clear to
> >>> me. AFAICT, the numbers after &cpu represent the min and max index in
> >>> the OPP table when the condition is hit.
> >>
> >> Ok. It seems to use "cooling state" for those and the first is minimum
> >> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
> >> no limits.
> >>
> >> Since here we use the &cpu node it is likely that the "cooling state"
> >> is the same as the OPP index currently in use.
> >>
> >> I have looked through the .dts which use cpu_crit and the picture is
> >> not unique...
> >>
> >> omap4           seems to only define it
> >> am57xx          has two different grade dtsi files
> >> dra7            overwrites critical temperature value
> >> am57xx-beagle   defines a gpio to control a fan
> >

I am going to push a separate but related RFC with 2 patches in the
series.  This new one will setup the alerts and maps without any
throttling for all omap3's in the first patch.  The second patch will
consolidate the thermal references to omap3.dtsi so omap34, omap36 and
am35 can all use them without having to duplicate the entries.

It will make the omap36xx changes simpler to manage, because we can
just modify a portion of the entries instead of having the whole
table.

Once this parallel RFC gets comments/feedback, I'll re-integrate the
omap36xx throttling.

adam

> > Checkout rk3288-veyron-mickey.dts
> >
> > They have almost_warm, warm, almost_hot, hot, hotter, very_hot, and
> > critical for trips, and they have as many corresponding cooling maps
> > which appear to limit the CPU speeds, but their index references are
> > still confusing to me.
>
> Seems to be quite sophistcated.
>
> The arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi also has a lot
> of trip points. So there may be very different needs...
>
> But it has potentially helpful comments...
>
>                                 /*
>                                  * When reaching cpu0_alert3, reduce CPU
>                                  * by 2 steps. On Exynos5422/5800 that would
>                                  * be: 1600 MHz and 1100 MHz.
>                                  */
>                                 map3 {
>                                         trip = <&cpu0_alert3>;
>                                         cooling-device = <&cpu0 0 2>;
>                                 };
>                                 map4 {
>                                         trip = <&cpu0_alert3>;
>                                         cooling-device = <&cpu4 0 2>;
>                                 };
>                                 /*
>                                  * When reaching cpu0_alert4, reduce CPU
>                                  * further, down to 600 MHz (12 steps for big,
>                                  * 7 steps for LITTLE).
>                                  */
>                                 map5 {
>                                         trip = <&cpu0_alert4>;
>                                         cooling-device = <&cpu0 3 7>;
>                                 };
>                                 map6 {
>                                         trip = <&cpu0_alert4>;
>                                         cooling-device = <&cpu4 3 12>;
>                                 };
>
> That would mean the second integer is something about how
> many steps to reduce.
>
> But the first is not explained.
>
> BTW: this also demonstrates how a single trip point can map to multiple
> cooling-device actions (something we likely do not need).
>
> >
> > For that device,
> > Warm and no limit first, then 4:   coolling-device = <&cpu0 THERMAL_NO_LIMIT 4>
> > ...
> > very_hot uses a number then no limit: cooling-device = <&cpu0 8
> > THERMAL_NO_LIMIT>
> >
> > This makes me wonder if the min and max are switched or the index
> > values go backwards.
>
> It may depend on the specific cpu driver? Maybe even omap rk and exynos
> have different interpretation in code?
>
> >>
> >> Then we can use the data sheet limits of 90°C and 105°C in the trip point
> >> table (which should not be tweaked for sensor inaccuracy).
> >
> > I can see not compensating if it reads high, but if the temp reads
> > low, shouldn't compensate so we don't over temp the processor?
>
> I just mean that we must ensure that the TJ is <= 90° if the bandgap
> ever reports 90°. So it may report 10 or 20 or even 30 degrees more than the
> real temperature but never less (reaching the critical temperature too early
> but not too late).
>
> We can achieve that by adding bias or changing slope etc. in the bandgap sensor
> driver.
>
> If I find some time I am curious enough to look into the code and the data
> sheets to understand why it is said to be inaccurate... Maybe there is
> jitter from some LDO and it needs a median filter?
>
> And why it seems to add a bias of ca. 10° as soon as I read it more than
> for the first time. And how well temperature correlates to ambient temperature
> (it definitively correlates to cpufreq-set -f).
>
> But we should not modify the trip temperatures by 10 or 20 or 30 degrees.
> IMHO they should have the values defined by the data sheet.
>
> BR,
> Nikolaus
>
H. Nikolaus Schaller Sept. 13, 2019, 3:09 p.m. UTC | #11
> Am 13.09.2019 um 17:01 schrieb Adam Ford <aford173@gmail.com>:
> 
> On Fri, Sep 13, 2019 at 9:24 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>> 
>> 
>>> Am 13.09.2019 um 16:05 schrieb Adam Ford <aford173@gmail.com>:
>>> 
>>> On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>>> 
>>>> Hi Adam,
>>>> 
>>>>> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:
>>>> 
>>>>>>> +     cpu_cooling_maps: cooling-maps {
>>>>>>> +             map0 {
>>>>>>> +                     trip = <&cpu_alert0>;
>>>>>>> +                     /* Only allow OPP50 and OPP100 */
>>>>>>> +                     cooling-device = <&cpu 0 1>;
>>>>>> 
>>>>>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
>>>>>> understand their meaning (and how it relates to the opp list).
>>>>> 
>>>>> I read through the documentation, but it wasn't completely clear to
>>>>> me. AFAICT, the numbers after &cpu represent the min and max index in
>>>>> the OPP table when the condition is hit.
>>>> 
>>>> Ok. It seems to use "cooling state" for those and the first is minimum
>>>> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
>>>> no limits.
>>>> 
>>>> Since here we use the &cpu node it is likely that the "cooling state"
>>>> is the same as the OPP index currently in use.
>>>> 
>>>> I have looked through the .dts which use cpu_crit and the picture is
>>>> not unique...
>>>> 
>>>> omap4           seems to only define it
>>>> am57xx          has two different grade dtsi files
>>>> dra7            overwrites critical temperature value
>>>> am57xx-beagle   defines a gpio to control a fan
>>> 
> 
> I am going to push a separate but related RFC with 2 patches in the
> series.  This new one will setup the alerts and maps without any
> throttling for all omap3's in the first patch.  The second patch will
> consolidate the thermal references to omap3.dtsi so omap34, omap36 and
> am35 can all use them without having to duplicate the entries.
> 
> It will make the omap36xx changes simpler to manage, because we can
> just modify a portion of the entries instead of having the whole
> table.
> 
> Once this parallel RFC gets comments/feedback, I'll re-integrate the
> omap36xx throttling.

Good idea. I have looked over them and they seem to be ok.

> 
> adam

BR and thanks,
Nikolaus
Adam Ford Sept. 13, 2019, 4:35 p.m. UTC | #12
On Fri, Sep 13, 2019 at 10:09 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>
>
> > Am 13.09.2019 um 17:01 schrieb Adam Ford <aford173@gmail.com>:
> >
> > On Fri, Sep 13, 2019 at 9:24 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >>
> >>
> >>> Am 13.09.2019 um 16:05 schrieb Adam Ford <aford173@gmail.com>:
> >>>
> >>> On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >>>>
> >>>> Hi Adam,
> >>>>
> >>>>> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:
> >>>>
> >>>>>>> +     cpu_cooling_maps: cooling-maps {
> >>>>>>> +             map0 {
> >>>>>>> +                     trip = <&cpu_alert0>;
> >>>>>>> +                     /* Only allow OPP50 and OPP100 */
> >>>>>>> +                     cooling-device = <&cpu 0 1>;
> >>>>>>
> >>>>>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> >>>>>> understand their meaning (and how it relates to the opp list).
> >>>>>
> >>>>> I read through the documentation, but it wasn't completely clear to
> >>>>> me. AFAICT, the numbers after &cpu represent the min and max index in
> >>>>> the OPP table when the condition is hit.
> >>>>
> >>>> Ok. It seems to use "cooling state" for those and the first is minimum
> >>>> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
> >>>> no limits.
> >>>>
> >>>> Since here we use the &cpu node it is likely that the "cooling state"
> >>>> is the same as the OPP index currently in use.
> >>>>
> >>>> I have looked through the .dts which use cpu_crit and the picture is
> >>>> not unique...
> >>>>
> >>>> omap4           seems to only define it
> >>>> am57xx          has two different grade dtsi files
> >>>> dra7            overwrites critical temperature value
> >>>> am57xx-beagle   defines a gpio to control a fan
> >>>
> >
> > I am going to push a separate but related RFC with 2 patches in the
> > series.  This new one will setup the alerts and maps without any
> > throttling for all omap3's in the first patch.  The second patch will
> > consolidate the thermal references to omap3.dtsi so omap34, omap36 and
> > am35 can all use them without having to duplicate the entries.
> >
> > It will make the omap36xx changes simpler to manage, because we can
> > just modify a portion of the entries instead of having the whole
> > table.
> >
> > Once this parallel RFC gets comments/feedback, I'll re-integrate the
> > omap36xx throttling.
>
> Good idea. I have looked over them and they seem to be ok.

Rebasing my omap36xx throttling after the v2 RFC I pushed moving the
omap3-cpu thermal stuff, I modified the omap36xx accordingly to try
and force the alert condition:

&cpu_alert0 {
     temperature = <55000>; /* millicelsius */
};

&cpu_cooling_maps {
     map0 {
          /* OPP130 and OPP1G are not available above 90C */
          cooling-device = <&cpu 0 2>;
     };
};

I would expect that with the temperature set for 55C, it would have
done something, but it doesn't appear to be working as I would expect.

# cat /sys/devices/virtual/thermal/thermal_zone0/temp
58500

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
300000 600000 800000
#

:-(


>
> >
> > adam
>
> BR and thanks,
> Nikolaus
>
Adam Ford Sept. 13, 2019, 4:42 p.m. UTC | #13
On Fri, Sep 13, 2019 at 11:35 AM Adam Ford <aford173@gmail.com> wrote:
>
> On Fri, Sep 13, 2019 at 10:09 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >
> >
> > > Am 13.09.2019 um 17:01 schrieb Adam Ford <aford173@gmail.com>:
> > >
> > > On Fri, Sep 13, 2019 at 9:24 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> > >>
> > >>
> > >>> Am 13.09.2019 um 16:05 schrieb Adam Ford <aford173@gmail.com>:
> > >>>
> > >>> On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> > >>>>
> > >>>> Hi Adam,
> > >>>>
> > >>>>> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:
> > >>>>
> > >>>>>>> +     cpu_cooling_maps: cooling-maps {
> > >>>>>>> +             map0 {
> > >>>>>>> +                     trip = <&cpu_alert0>;
> > >>>>>>> +                     /* Only allow OPP50 and OPP100 */
> > >>>>>>> +                     cooling-device = <&cpu 0 1>;
> > >>>>>>
> > >>>>>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
> > >>>>>> understand their meaning (and how it relates to the opp list).
> > >>>>>
> > >>>>> I read through the documentation, but it wasn't completely clear to
> > >>>>> me. AFAICT, the numbers after &cpu represent the min and max index in
> > >>>>> the OPP table when the condition is hit.
> > >>>>
> > >>>> Ok. It seems to use "cooling state" for those and the first is minimum
> > >>>> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
> > >>>> no limits.
> > >>>>
> > >>>> Since here we use the &cpu node it is likely that the "cooling state"
> > >>>> is the same as the OPP index currently in use.
> > >>>>
> > >>>> I have looked through the .dts which use cpu_crit and the picture is
> > >>>> not unique...
> > >>>>
> > >>>> omap4           seems to only define it
> > >>>> am57xx          has two different grade dtsi files
> > >>>> dra7            overwrites critical temperature value
> > >>>> am57xx-beagle   defines a gpio to control a fan
> > >>>
> > >
> > > I am going to push a separate but related RFC with 2 patches in the
> > > series.  This new one will setup the alerts and maps without any
> > > throttling for all omap3's in the first patch.  The second patch will
> > > consolidate the thermal references to omap3.dtsi so omap34, omap36 and
> > > am35 can all use them without having to duplicate the entries.
> > >
> > > It will make the omap36xx changes simpler to manage, because we can
> > > just modify a portion of the entries instead of having the whole
> > > table.
> > >
> > > Once this parallel RFC gets comments/feedback, I'll re-integrate the
> > > omap36xx throttling.
> >
> > Good idea. I have looked over them and they seem to be ok.
>
> Rebasing my omap36xx throttling after the v2 RFC I pushed moving the
> omap3-cpu thermal stuff, I modified the omap36xx accordingly to try
> and force the alert condition:
>
> &cpu_alert0 {
>      temperature = <55000>; /* millicelsius */
> };
>
> &cpu_cooling_maps {
>      map0 {
>           /* OPP130 and OPP1G are not available above 90C */
>           cooling-device = <&cpu 0 2>;
>      };
> };
>
> I would expect that with the temperature set for 55C, it would have
> done something, but it doesn't appear to be working as I would expect.
>
> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
> 58500
>
> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
> 300000 600000 800000
> #
>
> :-(
>

Good news (I think)

With cooling-device = <&cpu 1 2> setup, I was able to ask the max
frequency and it returned 600MHz.

# cat /sys/devices/virtual/thermal/thermal_zone0/temp
58500
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
300000 600000 800000
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
scaling_max_freq  scaling_min_freq
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
600000

adam
>
> >
> > >
> > > adam
> >
> > BR and thanks,
> > Nikolaus
> >
H. Nikolaus Schaller Sept. 13, 2019, 4:51 p.m. UTC | #14
> Am 13.09.2019 um 18:42 schrieb Adam Ford <aford173@gmail.com>:
> 
> On Fri, Sep 13, 2019 at 11:35 AM Adam Ford <aford173@gmail.com> wrote:
>> 
>> On Fri, Sep 13, 2019 at 10:09 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>> 
>>> 
>>>> Am 13.09.2019 um 17:01 schrieb Adam Ford <aford173@gmail.com>:
>>>> 
>>>> On Fri, Sep 13, 2019 at 9:24 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>>>> 
>>>>> 
>>>>>> Am 13.09.2019 um 16:05 schrieb Adam Ford <aford173@gmail.com>:
>>>>>> 
>>>>>> On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>>>>>> 
>>>>>>> Hi Adam,
>>>>>>> 
>>>>>>>> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@gmail.com>:
>>>>>>> 
>>>>>>>>>> +     cpu_cooling_maps: cooling-maps {
>>>>>>>>>> +             map0 {
>>>>>>>>>> +                     trip = <&cpu_alert0>;
>>>>>>>>>> +                     /* Only allow OPP50 and OPP100 */
>>>>>>>>>> +                     cooling-device = <&cpu 0 1>;
>>>>>>>>> 
>>>>>>>>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
>>>>>>>>> understand their meaning (and how it relates to the opp list).
>>>>>>>> 
>>>>>>>> I read through the documentation, but it wasn't completely clear to
>>>>>>>> me. AFAICT, the numbers after &cpu represent the min and max index in
>>>>>>>> the OPP table when the condition is hit.
>>>>>>> 
>>>>>>> Ok. It seems to use "cooling state" for those and the first is minimum
>>>>>>> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
>>>>>>> no limits.
>>>>>>> 
>>>>>>> Since here we use the &cpu node it is likely that the "cooling state"
>>>>>>> is the same as the OPP index currently in use.
>>>>>>> 
>>>>>>> I have looked through the .dts which use cpu_crit and the picture is
>>>>>>> not unique...
>>>>>>> 
>>>>>>> omap4           seems to only define it
>>>>>>> am57xx          has two different grade dtsi files
>>>>>>> dra7            overwrites critical temperature value
>>>>>>> am57xx-beagle   defines a gpio to control a fan
>>>>>> 
>>>> 
>>>> I am going to push a separate but related RFC with 2 patches in the
>>>> series.  This new one will setup the alerts and maps without any
>>>> throttling for all omap3's in the first patch.  The second patch will
>>>> consolidate the thermal references to omap3.dtsi so omap34, omap36 and
>>>> am35 can all use them without having to duplicate the entries.
>>>> 
>>>> It will make the omap36xx changes simpler to manage, because we can
>>>> just modify a portion of the entries instead of having the whole
>>>> table.
>>>> 
>>>> Once this parallel RFC gets comments/feedback, I'll re-integrate the
>>>> omap36xx throttling.
>>> 
>>> Good idea. I have looked over them and they seem to be ok.
>> 
>> Rebasing my omap36xx throttling after the v2 RFC I pushed moving the
>> omap3-cpu thermal stuff, I modified the omap36xx accordingly to try
>> and force the alert condition:
>> 
>> &cpu_alert0 {
>>     temperature = <55000>; /* millicelsius */
>> };
>> 
>> &cpu_cooling_maps {
>>     map0 {
>>          /* OPP130 and OPP1G are not available above 90C */
>>          cooling-device = <&cpu 0 2>;
>>     };
>> };
>> 
>> I would expect that with the temperature set for 55C, it would have
>> done something, but it doesn't appear to be working as I would expect.
>> 
>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
>> 58500
>> 
>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
>> 300000 600000 800000
>> #
>> 
>> :-(
>> 
> 
> Good news (I think)
> 
> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
> frequency and it returned 600MHz.
> 
> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
> 58500
> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
> 300000 600000 800000
> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
> scaling_max_freq  scaling_min_freq
> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
> 600000

looks good!
But we have to understand what the <&cpu 1 2> exactly means...

Hopefully someone reading your RFCv2 can answer...

What happens with trip point 60000?
(unfortunately one has to reboot in between or can you kexec between two kernel/dtb versions?)

BR,
Nikolaus
Daniel Lezcano Sept. 13, 2019, 5:18 p.m. UTC | #15
On 13/09/2019 18:51, H. Nikolaus Schaller wrote:

[ ... ]

>> Good news (I think)
>>
>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
>> frequency and it returned 600MHz.
>>
>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
>> 58500
>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
>> 300000 600000 800000
>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
>> scaling_max_freq  scaling_min_freq
>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
>> 600000
> 
> looks good!
> But we have to understand what the <&cpu 1 2> exactly means...
> 
> Hopefully someone reading your RFCv2 can answer...

I may have missed the question :)

These are the states allowed for the cooling device (the one you can see
in the /sys/class/thermal/cooling_device0/max_state. As the logic is
inverted for cpufreq, that can be confusing.

If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
when the mitigation begins the cooling device state is 0 and then the
thermal governor increase the state until it sees a cooling effect.

If <&fan 0 2> is set, the governor won't set a state above 2 even if the
temperature increases.

When the cooling driver is able to return the number of states it
supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
governor to find the balance point.

Now if the cooling device is cpufreq, the state order is inverted,
because the cooling effects happens when decreasing the OPP.

If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
the state is 1, the cpufreq is throttle to the 6th OPP, 2 to the 5th OPP
etc.

Now the different combinations:

<&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
0 to 7.

<&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2

<&cpu 1 2> the governor will use the state 1 and 2. That means there is
always the cooling effect as the governor won't set it to zero thus
stopping the mitigation.


Does it clarify the DT spec?




> What happens with trip point 60000?
> (unfortunately one has to reboot in between or can you kexec between two kernel/dtb versions?)
> 
> BR,
> Nikolaus
>
Adam Ford Sept. 13, 2019, 6:46 p.m. UTC | #16
On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
>
> [ ... ]
>
> >> Good news (I think)
> >>
> >> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
> >> frequency and it returned 600MHz.
> >>
> >> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
> >> 58500
> >> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
> >> 300000 600000 800000
> >> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
> >> scaling_max_freq  scaling_min_freq
> >> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
> >> 600000
> >
> > looks good!
> > But we have to understand what the <&cpu 1 2> exactly means...
> >
> > Hopefully someone reading your RFCv2 can answer...
>
Daniel,

Thank you for replying.

> I may have missed the question :)
>
> These are the states allowed for the cooling device (the one you can see
> in the /sys/class/thermal/cooling_device0/max_state. As the logic is
> inverted for cpufreq, that can be confusing.

I think that's what has be confused.

>
> If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
> when the mitigation begins the cooling device state is 0 and then the
> thermal governor increase the state until it sees a cooling effect.
>
> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
> temperature increases.

I am not sure I know what you mean by 'state' in this context.

>
> When the cooling driver is able to return the number of states it
> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
> governor to find the balance point.

If the cooling driver is using cpufreq, is the number of supported
states equal to the number of operating points given to cpufreq?

>
> Now if the cooling device is cpufreq, the state order is inverted,
> because the cooling effects happens when decreasing the OPP.
>
> If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
> the state is 1, the cpufreq is throttle to the 6th OPP, 2 to the 5th OPP
> etc.

I am not sure how the state would be set to 2.

>
> Now the different combinations:
>
> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
> 0 to 7.
>
> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2

What would be the difference between  <&cpu THERMAL_NO_LIMIT 2>  and
<&cpu 0 2> ?
(if there is any)

>
> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
> always the cooling effect as the governor won't set it to zero thus
> stopping the mitigation.

For the purposes of the board in question, we have 4 operating points,
300MHz, 600MHz, 800MHz and 1GHz.  Once the board reaches 90C, we need
them to cease operation at 800MHz and 1GHz and only permit operation
at 300MHz and 600MHz.  I am going under the assumption that the cpu
index[0] would be for 300MHz, index[1] = 600MHz, etc.

If I am interpreting your comment correctly, I should set <&cpu
THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
to 600MHz and not exceed, is that correct?

>
>
> Does it clarify the DT spec?
>

I think your reply to my inquiry might.  If possible, it would be nice
to get this documented into the bindings doc for others in the future.
I can do it, but someone with a better understanding of the concept
maybe more qualified.  I can totally understand why some may want to
integrate this into their SoC device trees to slow the processor when
hot.

Thank you for taking the time to review this.  I appreciate it.

adam

>
>
>
> > What happens with trip point 60000?
> > (unfortunately one has to reboot in between or can you kexec between two kernel/dtb versions?)
> >
> > BR,
> > Nikolaus
> >
>
>
> --
>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
Adam Ford Sept. 13, 2019, 8:01 p.m. UTC | #17
On Fri, Sep 13, 2019 at 1:46 PM Adam Ford <aford173@gmail.com> wrote:
>
> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
> >
> > On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
> >
> > [ ... ]
> >
> > >> Good news (I think)
> > >>
> > >> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
> > >> frequency and it returned 600MHz.
> > >>
> > >> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
> > >> 58500
> > >> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
> > >> 300000 600000 800000
> > >> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
> > >> scaling_max_freq  scaling_min_freq
> > >> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
> > >> 600000
> > >
> > > looks good!
> > > But we have to understand what the <&cpu 1 2> exactly means...
> > >
> > > Hopefully someone reading your RFCv2 can answer...
> >
> Daniel,
>
> Thank you for replying.
>
> > I may have missed the question :)
> >
> > These are the states allowed for the cooling device (the one you can see
> > in the /sys/class/thermal/cooling_device0/max_state. As the logic is
> > inverted for cpufreq, that can be confusing.
>
> I think that's what has be confused.
>
> >
> > If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
> > when the mitigation begins the cooling device state is 0 and then the
> > thermal governor increase the state until it sees a cooling effect.
> >
> > If <&fan 0 2> is set, the governor won't set a state above 2 even if the
> > temperature increases.
>
> I am not sure I know what you mean by 'state' in this context.
>
> >
> > When the cooling driver is able to return the number of states it
> > supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
> > governor to find the balance point.
>
> If the cooling driver is using cpufreq, is the number of supported
> states equal to the number of operating points given to cpufreq?
>
> >
> > Now if the cooling device is cpufreq, the state order is inverted,
> > because the cooling effects happens when decreasing the OPP.
> >
> > If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
> > the state is 1, the cpufreq is throttle to the 6th OPP, 2 to the 5th OPP
> > etc.
>
> I am not sure how the state would be set to 2.
>
> >
> > Now the different combinations:
> >
> > <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
> > 0 to 7.
> >
> > <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
>
> What would be the difference between  <&cpu THERMAL_NO_LIMIT 2>  and
> <&cpu 0 2> ?
> (if there is any)
>
> >
> > <&cpu 1 2> the governor will use the state 1 and 2. That means there is
> > always the cooling effect as the governor won't set it to zero thus
> > stopping the mitigation.
>
> For the purposes of the board in question, we have 4 operating points,
> 300MHz, 600MHz, 800MHz and 1GHz.  Once the board reaches 90C, we need
> them to cease operation at 800MHz and 1GHz and only permit operation
> at 300MHz and 600MHz.  I am going under the assumption that the cpu
> index[0] would be for 300MHz, index[1] = 600MHz, etc.
>
> If I am interpreting your comment correctly, I should set <&cpu
> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
> to 600MHz and not exceed, is that correct?
>
> >
> >
> > Does it clarify the DT spec?
> >
>
> I think your reply to my inquiry might.  If possible, it would be nice
> to get this documented into the bindings doc for others in the future.
> I can do it, but someone with a better understanding of the concept
> maybe more qualified.  I can totally understand why some may want to
> integrate this into their SoC device trees to slow the processor when
> hot.
>
> Thank you for taking the time to review this.  I appreciate it.
>
> adam
>
> >
> >
> >
> > > What happens with trip point 60000?
> > > (unfortunately one has to reboot in between or can you kexec between two kernel/dtb versions?)

I set the trip point just above the ambient temp.  I then tried to run
some benchmarks in the background while constantly polling the max
frequency and it changes, but it needs to skip the 800MHz point and
jump right to the 600 MHz point (or lower)

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
800000

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
600000


Loops: 80000, Iterations: 1, Duration: 12 sec.
C Converted Double Precision Whetstones: 666.7 MIPS
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
800000

[1]+  Done                       whetstone 80000

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
800000

# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
1000000


> > >
> > > BR,
> > > Nikolaus
> > >
> >
> >
> > --
> >  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> >
> > Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> > <http://twitter.com/#!/linaroorg> Twitter |
> > <http://www.linaro.org/linaro-blog/> Blog
> >
Daniel Lezcano Sept. 13, 2019, 8:11 p.m. UTC | #18
On 13/09/2019 20:46, Adam Ford wrote:
> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
>>
>> [ ... ]
>>
>>>> Good news (I think)
>>>>
>>>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
>>>> frequency and it returned 600MHz.
>>>>
>>>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
>>>> 58500
>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
>>>> 300000 600000 800000
>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
>>>> scaling_max_freq  scaling_min_freq
>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
>>>> 600000
>>>
>>> looks good!
>>> But we have to understand what the <&cpu 1 2> exactly means...
>>>
>>> Hopefully someone reading your RFCv2 can answer...
>>
> Daniel,
> 
> Thank you for replying.
> 
>> I may have missed the question :)
>>
>> These are the states allowed for the cooling device (the one you can see
>> in the /sys/class/thermal/cooling_device0/max_state. As the logic is
>> inverted for cpufreq, that can be confusing.
> 
> I think that's what has be confused.
> 
>>
>> If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
>> when the mitigation begins the cooling device state is 0 and then the
>> thermal governor increase the state until it sees a cooling effect.
>>
>> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
>> temperature increases.
> 
> I am not sure I know what you mean by 'state' in this context.

A thermal zone is managed by the thermal framework as the following:
 - a sensor
 - a governor
 - a cooling device

The governor gets the temperature via the sensor and depending on the
temperature it will increase or decrease the cooling effect of the
cooling device. With a fan, that means it will increase or decrease its
speed. With cpufreq, it will decrease or increase the OPP.

These are discrete values the governor will use to set the cooling
effect. The state is one of these value (the current speed or the
current OPP index).

Depending on the cooling device, the number of states are different.

In the context above, the fan cooling device can be stopped (state=0),
running (state=1), running faster (state=2).

As the node tells to use no more than 2, then the governor will never go
to running much faster (state=3). (That's an example).

>> When the cooling driver is able to return the number of states it
>> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
>> governor to find the balance point.
> 
> If the cooling driver is using cpufreq, is the number of supported
> states equal to the number of operating points given to cpufreq?

Yes, absolutely if THERMAL_NO_LIMIT is set [1] (which is what is done
most of the cases). Otherwise it will use the boundaries set in <&cpu
limit_low limit_high>

When changing the limits, a state=1 has a different meaning.

For example: 7 OPPs available

<&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> : state=[0..7]

<&cpu 0 2> : state=[0..2] (1, 2)

<&cpu 5 7> : state=[0..3] (5, 6, 7)

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/cpu_cooling.c#n334

>> Now if the cooling device is cpufreq, the state order is inverted,
>> because the cooling effects happens when decreasing the OPP.
>>
>> If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
>> the state is 1, the cpufreq is throttle to the 6th OPP, 2 to the 5th OPP
>> etc.
> 
> I am not sure how the state would be set to 2.

That is a governor decision. Let me give an example with a hikey960
board which has very fast temperature transitions, so it is simpler to
illustrate the behavior. The trip point is 75°C.

Imagine the CPU gets loaded 100%, the cpufreq sets the OPP to the max
(2.36GHz), as the temperature is still under 75°C, there is no
mitigation yet, so the cooling device state is 0.

In a very few seconds the temperature reaches 75°C, that trigger the
monitoring of the thermal zone and the mitigation begins, then the
temperature continues to increase very quickly to 78°C, the governor see
we are above the trip point and increment the cooling device state
(state=>1). That leads to an OPP change from 2.36GHz to 2.11GHz.

The governor continues to read the temperature and see the temperature
is still increasing (even if it is that happens more slowly), so it
increases the state again (state=>2). That leads to an OPP change from
2.11GHz to 1.8GHz.

The governor continues to read the temperature and see the temperature
decrease, it does nothing.

The governor continues to read the temperature, see the temperature
decreases and is below 75°C, it decrease the state (state=>1), the OPP
change to 2.36GHz.

The temperature then increases, etc ...

Actually the governors do more than that but it is for the example.

So it is a bad idea to set boundaries for the cooling device state as
that may prevent the governor to take the right decision for the cooling
effect. Imagine in the example above, we set the max state to 1 for the
cooling device, that would mean the governor won't be able to stop the
temperature increasing, thus ending up to a hard reboot.

>> Now the different combinations:
>>
>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
>> 0 to 7.
>>
>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
> 
> What would be the difference between  <&cpu THERMAL_NO_LIMIT 2>  and
> <&cpu 0 2> ?
> (if there is any)

There is no difference.


>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
>> always the cooling effect as the governor won't set it to zero thus
>> stopping the mitigation.
> 
> For the purposes of the board in question, we have 4 operating points,
> 300MHz, 600MHz, 800MHz and 1GHz.  Once the board reaches 90C, we need
> them to cease operation at 800MHz and 1GHz and only permit operation
> at 300MHz and 600MHz.  I am going under the assumption that the cpu
> index[0] would be for 300MHz, index[1] = 600MHz, etc.
> 
> If I am interpreting your comment correctly, I should set <&cpu
> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
> to 600MHz and not exceed, is that correct?

Nope, it will mean the cooling device can only reduce to 800MHz and to
600MHz to mitigate.

Actually the thermal framework neither the kernel are designed to handle
this case. They assume the OPPs are stable whatever the thermal situation.

That is the reason why I think it is a very interesting use case because
it introduces a temperature constraint in addition to a duration for a
certain OPP. IMO, that could be an extension of the turbo-mode.

With what we have now, I doubt it is feasible.

The best we can do is preventing to reach the 90°C, so we remove the OPP
temperature constraint. I suppose 85°C is a safe temperature to stick on.

And in order to let the governor have free hand.

<&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>

I don't think that will have a significant impact on performances
compared to be able to run at a higher temperature with less OPPs.


>> Does it clarify the DT spec?
>>
> 
> I think your reply to my inquiry might.  If possible, it would be nice
> to get this documented into the bindings doc for others in the future.
> I can do it, but someone with a better understanding of the concept
> maybe more qualified.  I can totally understand why some may want to
> integrate this into their SoC device trees to slow the processor when
> hot.
> 
> Thank you for taking the time to review this.  I appreciate it.
> 
> adam
> 
>>
>>
>>
>>> What happens with trip point 60000?
>>> (unfortunately one has to reboot in between or can you kexec between two kernel/dtb versions?)
>>>
>>> BR,
>>> Nikolaus
>>>
>>
>>
>> --
>>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>>
>> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
>> <http://twitter.com/#!/linaroorg> Twitter |
>> <http://www.linaro.org/linaro-blog/> Blog
>>
H. Nikolaus Schaller Sept. 13, 2019, 8:34 p.m. UTC | #19
Hi Daniel,

> Am 13.09.2019 um 22:11 schrieb Daniel Lezcano <daniel.lezcano@linaro.org>:
> 
> On 13/09/2019 20:46, Adam Ford wrote:
>> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
>> <daniel.lezcano@linaro.org> wrote:
>>> 
>>> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
>>> 
>>> [ ... ]
>>> 
>>>>> Good news (I think)
>>>>> 
>>>>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
>>>>> frequency and it returned 600MHz.
>>>>> 
>>>>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
>>>>> 58500
>>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
>>>>> 300000 600000 800000
>>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
>>>>> scaling_max_freq  scaling_min_freq
>>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
>>>>> 600000
>>>> 
>>>> looks good!
>>>> But we have to understand what the <&cpu 1 2> exactly means...
>>>> 
>>>> Hopefully someone reading your RFCv2 can answer...
>>> 
>> Daniel,
>> 
>> Thank you for replying.
>> 
>>> I may have missed the question :)
>>> 
>>> These are the states allowed for the cooling device (the one you can see
>>> in the /sys/class/thermal/cooling_device0/max_state. As the logic is
>>> inverted for cpufreq, that can be confusing.
>> 
>> I think that's what has be confused.
>> 
>>> 
>>> If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
>>> when the mitigation begins the cooling device state is 0 and then the
>>> thermal governor increase the state until it sees a cooling effect.
>>> 
>>> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
>>> temperature increases.
>> 
>> I am not sure I know what you mean by 'state' in this context.
> 
> A thermal zone is managed by the thermal framework as the following:
> - a sensor
> - a governor
> - a cooling device
> 
> The governor gets the temperature via the sensor and depending on the
> temperature it will increase or decrease the cooling effect of the
> cooling device. With a fan, that means it will increase or decrease its
> speed. With cpufreq, it will decrease or increase the OPP.
> 
> These are discrete values the governor will use to set the cooling
> effect. The state is one of these value (the current speed or the
> current OPP index).
> 
> Depending on the cooling device, the number of states are different.
> 
> In the context above, the fan cooling device can be stopped (state=0),
> running (state=1), running faster (state=2).
> 
> As the node tells to use no more than 2, then the governor will never go
> to running much faster (state=3). (That's an example).
> 
>>> When the cooling driver is able to return the number of states it
>>> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
>>> governor to find the balance point.
>> 
>> If the cooling driver is using cpufreq, is the number of supported
>> states equal to the number of operating points given to cpufreq?
> 
> Yes, absolutely if THERMAL_NO_LIMIT is set [1] (which is what is done
> most of the cases). Otherwise it will use the boundaries set in <&cpu
> limit_low limit_high>
> 
> When changing the limits, a state=1 has a different meaning.
> 
> For example: 7 OPPs available
> 
> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> : state=[0..7]
> 
> <&cpu 0 2> : state=[0..2] (1, 2)
> 
> <&cpu 5 7> : state=[0..3] (5, 6, 7)
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/cpu_cooling.c#n334
> 
>>> Now if the cooling device is cpufreq, the state order is inverted,
>>> because the cooling effects happens when decreasing the OPP.
>>> 
>>> If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
>>> the state is 1, the cpufreq is throttle to the 6th OPP, 2 to the 5th OPP
>>> etc.
>> 
>> I am not sure how the state would be set to 2.
> 
> That is a governor decision. Let me give an example with a hikey960
> board which has very fast temperature transitions, so it is simpler to
> illustrate the behavior. The trip point is 75°C.
> 
> Imagine the CPU gets loaded 100%, the cpufreq sets the OPP to the max
> (2.36GHz), as the temperature is still under 75°C, there is no
> mitigation yet, so the cooling device state is 0.
> 
> In a very few seconds the temperature reaches 75°C, that trigger the
> monitoring of the thermal zone and the mitigation begins, then the
> temperature continues to increase very quickly to 78°C, the governor see
> we are above the trip point and increment the cooling device state
> (state=>1). That leads to an OPP change from 2.36GHz to 2.11GHz.
> 
> The governor continues to read the temperature and see the temperature
> is still increasing (even if it is that happens more slowly), so it
> increases the state again (state=>2). That leads to an OPP change from
> 2.11GHz to 1.8GHz.
> 
> The governor continues to read the temperature and see the temperature
> decrease, it does nothing.

Ah, I think our misunderstanding is that the govenor "enables" and
"disables" a set of OPPs. Rather it goes down or up in the list if
above or below a trip point.

> 
> The governor continues to read the temperature, see the temperature
> decreases and is below 75°C, it decrease the state (state=>1), the OPP
> change to 2.36GHz.
> 
> The temperature then increases, etc ...
> 
> Actually the governors do more than that but it is for the example.
> 
> So it is a bad idea to set boundaries for the cooling device state as
> that may prevent the governor to take the right decision for the cooling
> effect. Imagine in the example above, we set the max state to 1 for the
> cooling device, that would mean the governor won't be able to stop the
> temperature increasing, thus ending up to a hard reboot.

Well, the data sheet only requires that the high speed OPPs are only
used below 90°C. If I understand correctly if we set the trip point to
90°C it will simply go down through the full list of OPPs. This will
clearly avoid the high speed OPPs (and potentially some low-speed
ones, but that does not harm).

So our approach "how to make it disable these two OPPs" seems to be
wrong. Rather, we have to think "make sure the temperature
stays below 90°C".

And is it true that we do not have to define mapping for the "critical"
trip points?

> 
>>> Now the different combinations:
>>> 
>>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
>>> 0 to 7.
>>> 
>>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
>> 
>> What would be the difference between  <&cpu THERMAL_NO_LIMIT 2>  and
>> <&cpu 0 2> ?
>> (if there is any)
> 
> There is no difference.
> 
> 
>>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
>>> always the cooling effect as the governor won't set it to zero thus
>>> stopping the mitigation.
>> 
>> For the purposes of the board in question, we have 4 operating points,
>> 300MHz, 600MHz, 800MHz and 1GHz.  Once the board reaches 90C, we need
>> them to cease operation at 800MHz and 1GHz and only permit operation
>> at 300MHz and 600MHz.  I am going under the assumption that the cpu
>> index[0] would be for 300MHz, index[1] = 600MHz, etc.
>> 
>> If I am interpreting your comment correctly, I should set <&cpu
>> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
>> to 600MHz and not exceed, is that correct?
> 
> Nope, it will mean the cooling device can only reduce to 800MHz and to
> 600MHz to mitigate.
> 
> Actually the thermal framework neither the kernel are designed to handle
> this case. They assume the OPPs are stable whatever the thermal situation.
> 
> That is the reason why I think it is a very interesting use case because
> it introduces a temperature constraint in addition to a duration for a
> certain OPP. IMO, that could be an extension of the turbo-mode.
> 
> With what we have now, I doubt it is feasible.
> 
> The best we can do is preventing to reach the 90°C, so we remove the OPP
> temperature constraint. I suppose 85°C is a safe temperature to stick on.
> 
> And in order to let the governor have free hand.
> 
> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
> 
> I don't think that will have a significant impact on performances
> compared to be able to run at a higher temperature with less OPPs.
> 
> 
>>> Does it clarify the DT spec?
>>> 
>> 
>> I think your reply to my inquiry might.  If possible, it would be nice
>> to get this documented into the bindings doc for others in the future.
>> I can do it, but someone with a better understanding of the concept
>> maybe more qualified.  I can totally understand why some may want to
>> integrate this into their SoC device trees to slow the processor when
>> hot.
>> 
>> Thank you for taking the time to review this.  I appreciate it.
>> 
>> adam

BR,
Nikolaus
Adam Ford Sept. 13, 2019, 9:01 p.m. UTC | #20
On Fri, Sep 13, 2019 at 3:35 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>
> Hi Daniel,
>
> > Am 13.09.2019 um 22:11 schrieb Daniel Lezcano <daniel.lezcano@linaro.org>:
> >
> > On 13/09/2019 20:46, Adam Ford wrote:
> >> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
> >> <daniel.lezcano@linaro.org> wrote:
> >>>
> >>> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
> >>>
> >>> [ ... ]
> >>>
> >>>>> Good news (I think)
> >>>>>
> >>>>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
> >>>>> frequency and it returned 600MHz.
> >>>>>
> >>>>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
> >>>>> 58500
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
> >>>>> 300000 600000 800000
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
> >>>>> scaling_max_freq  scaling_min_freq
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
> >>>>> 600000
> >>>>
> >>>> looks good!
> >>>> But we have to understand what the <&cpu 1 2> exactly means...
> >>>>
> >>>> Hopefully someone reading your RFCv2 can answer...
> >>>
> >> Daniel,
> >>
> >> Thank you for replying.
> >>
> >>> I may have missed the question :)
> >>>
> >>> These are the states allowed for the cooling device (the one you can see
> >>> in the /sys/class/thermal/cooling_device0/max_state. As the logic is
> >>> inverted for cpufreq, that can be confusing.
> >>
> >> I think that's what has be confused.
> >>
> >>>
> >>> If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
> >>> when the mitigation begins the cooling device state is 0 and then the
> >>> thermal governor increase the state until it sees a cooling effect.
> >>>
> >>> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
> >>> temperature increases.
> >>
> >> I am not sure I know what you mean by 'state' in this context.
> >
> > A thermal zone is managed by the thermal framework as the following:
> > - a sensor
> > - a governor
> > - a cooling device
> >
> > The governor gets the temperature via the sensor and depending on the
> > temperature it will increase or decrease the cooling effect of the
> > cooling device. With a fan, that means it will increase or decrease its
> > speed. With cpufreq, it will decrease or increase the OPP.
> >
> > These are discrete values the governor will use to set the cooling
> > effect. The state is one of these value (the current speed or the
> > current OPP index).
> >
> > Depending on the cooling device, the number of states are different.
> >
> > In the context above, the fan cooling device can be stopped (state=0),
> > running (state=1), running faster (state=2).
> >
> > As the node tells to use no more than 2, then the governor will never go
> > to running much faster (state=3). (That's an example).
> >
> >>> When the cooling driver is able to return the number of states it
> >>> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
> >>> governor to find the balance point.
> >>
> >> If the cooling driver is using cpufreq, is the number of supported
> >> states equal to the number of operating points given to cpufreq?
> >
> > Yes, absolutely if THERMAL_NO_LIMIT is set [1] (which is what is done
> > most of the cases). Otherwise it will use the boundaries set in <&cpu
> > limit_low limit_high>
> >
> > When changing the limits, a state=1 has a different meaning.
> >
> > For example: 7 OPPs available
> >
> > <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> : state=[0..7]
> >
> > <&cpu 0 2> : state=[0..2] (1, 2)
> >
> > <&cpu 5 7> : state=[0..3] (5, 6, 7)
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/cpu_cooling.c#n334
> >
> >>> Now if the cooling device is cpufreq, the state order is inverted,
> >>> because the cooling effects happens when decreasing the OPP.
> >>>
> >>> If the boards support 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
> >>> the state is 1, the cpufreq is throttle to the 6th OPP, 2 to the 5th OPP
> >>> etc.
> >>
> >> I am not sure how the state would be set to 2.
> >
> > That is a governor decision. Let me give an example with a hikey960
> > board which has very fast temperature transitions, so it is simpler to
> > illustrate the behavior. The trip point is 75°C.
> >
> > Imagine the CPU gets loaded 100%, the cpufreq sets the OPP to the max
> > (2.36GHz), as the temperature is still under 75°C, there is no
> > mitigation yet, so the cooling device state is 0.
> >
> > In a very few seconds the temperature reaches 75°C, that trigger the
> > monitoring of the thermal zone and the mitigation begins, then the
> > temperature continues to increase very quickly to 78°C, the governor see
> > we are above the trip point and increment the cooling device state
> > (state=>1). That leads to an OPP change from 2.36GHz to 2.11GHz.
> >
> > The governor continues to read the temperature and see the temperature
> > is still increasing (even if it is that happens more slowly), so it
> > increases the state again (state=>2). That leads to an OPP change from
> > 2.11GHz to 1.8GHz.
> >
> > The governor continues to read the temperature and see the temperature
> > decrease, it does nothing.
>
> Ah, I think our misunderstanding is that the govenor "enables" and
> "disables" a set of OPPs. Rather it goes down or up in the list if
> above or below a trip point.
>
> >
> > The governor continues to read the temperature, see the temperature
> > decreases and is below 75°C, it decrease the state (state=>1), the OPP
> > change to 2.36GHz.
> >
> > The temperature then increases, etc ...
> >
> > Actually the governors do more than that but it is for the example.
> >
> > So it is a bad idea to set boundaries for the cooling device state as
> > that may prevent the governor to take the right decision for the cooling
> > effect. Imagine in the example above, we set the max state to 1 for the
> > cooling device, that would mean the governor won't be able to stop the
> > temperature increasing, thus ending up to a hard reboot.
>
> Well, the data sheet only requires that the high speed OPPs are only
> used below 90°C. If I understand correctly if we set the trip point to
> 90°C it will simply go down through the full list of OPPs. This will
> clearly avoid the high speed OPPs (and potentially some low-speed
> ones, but that does not harm).
>
> So our approach "how to make it disable these two OPPs" seems to be
> wrong. Rather, we have to think "make sure the temperature
> stays below 90°C".
>
> And is it true that we do not have to define mapping for the "critical"
> trip points?
>
> >
> >>> Now the different combinations:
> >>>
> >>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
> >>> 0 to 7.
> >>>
> >>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
> >>
> >> What would be the difference between  <&cpu THERMAL_NO_LIMIT 2>  and
> >> <&cpu 0 2> ?
> >> (if there is any)
> >
> > There is no difference.
> >
> >
> >>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
> >>> always the cooling effect as the governor won't set it to zero thus
> >>> stopping the mitigation.
> >>
> >> For the purposes of the board in question, we have 4 operating points,
> >> 300MHz, 600MHz, 800MHz and 1GHz.  Once the board reaches 90C, we need
> >> them to cease operation at 800MHz and 1GHz and only permit operation
> >> at 300MHz and 600MHz.  I am going under the assumption that the cpu
> >> index[0] would be for 300MHz, index[1] = 600MHz, etc.
> >>
> >> If I am interpreting your comment correctly, I should set <&cpu
> >> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
> >> to 600MHz and not exceed, is that correct?
> >
> > Nope, it will mean the cooling device can only reduce to 800MHz and to
> > 600MHz to mitigate.
> >
> > Actually the thermal framework neither the kernel are designed to handle
> > this case. They assume the OPPs are stable whatever the thermal situation.
> >
> > That is the reason why I think it is a very interesting use case because
> > it introduces a temperature constraint in addition to a duration for a
> > certain OPP. IMO, that could be an extension of the turbo-mode.
> >
> > With what we have now, I doubt it is feasible.
> >
> > The best we can do is preventing to reach the 90°C, so we remove the OPP
> > temperature constraint. I suppose 85°C is a safe temperature to stick on.
> >
> > And in order to let the governor have free hand.
> >
> > <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
> >
> > I don't think that will have a significant impact on performances
> > compared to be able to run at a higher temperature with less OPPs.

Thank you for the explanation.  I think I'll ask Tony to drop this RFC
since we have what you're proposing already in a separate series.  I
appreciate your explanations.

adam
> >
> >
> >>> Does it clarify the DT spec?
> >>>
> >>
> >> I think your reply to my inquiry might.  If possible, it would be nice
> >> to get this documented into the bindings doc for others in the future.
> >> I can do it, but someone with a better understanding of the concept
> >> maybe more qualified.  I can totally understand why some may want to
> >> integrate this into their SoC device trees to slow the processor when
> >> hot.
> >>
> >> Thank you for taking the time to review this.  I appreciate it.
> >>
> >> adam
>
> BR,
> Nikolaus
>
Daniel Lezcano Sept. 14, 2019, 9:53 a.m. UTC | #21
Hi Nikolauss,

On 13/
09/2019 22:34, H. Nikolaus Schaller wrote:

[ ... ]

>> The governor continues to read the temperature and see the temperature
>> decrease, it does nothing.
> 
> Ah, I think our misunderstanding is that the govenor "enables" and
> "disables" a set of OPPs. Rather it goes down or up in the list if
> above or below a trip point.

Right.

>> The governor continues to read the temperature, see the temperature
>> decreases and is below 75°C, it decrease the state (state=>1), the OPP
>> change to 2.36GHz.
>>
>> The temperature then increases, etc ...
>>
>> Actually the governors do more than that but it is for the example.
>>
>> So it is a bad idea to set boundaries for the cooling device state as
>> that may prevent the governor to take the right decision for the cooling
>> effect. Imagine in the example above, we set the max state to 1 for the
>> cooling device, that would mean the governor won't be able to stop the
>> temperature increasing, thus ending up to a hard reboot.
> 
> Well, the data sheet only requires that the high speed OPPs are only
> used below 90°C. If I understand correctly if we set the trip point to
> 90°C it will simply go down through the full list of OPPs. This will
> clearly avoid the high speed OPPs (and potentially some low-speed
> ones, but that does not harm).

Yes, right.

> So our approach "how to make it disable these two OPPs" seems to be
> wrong. Rather, we have to think "make sure the temperature
> stays below 90°C".

Your approach is not wrong, it proves there is a limitation in the
thermal/cpufreq framework.

There is the 'turbo mode' [1] which describes exactly what you want but
I'm not sure it is fully implemented.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/opp/opp.txt#n138

> And is it true that we do not have to define mapping for the "critical"
> trip points?

Right, you don't have to, it is optional. But the critical trip point
will make your system to shutdown in case something is going wrong, for
example an external heating source, like the sun or whatever. It is good
to set a high temperature to force the shutdown .
Usually it is below the hardware reset temperature.


>>>> Now the different combinations:
>>>>
>>>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
>>>> 0 to 7.
>>>>
>>>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
>>>
>>> What would be the difference between  <&cpu THERMAL_NO_LIMIT 2>  and
>>> <&cpu 0 2> ?
>>> (if there is any)
>>
>> There is no difference.
>>
>>
>>>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
>>>> always the cooling effect as the governor won't set it to zero thus
>>>> stopping the mitigation.
>>>
>>> For the purposes of the board in question, we have 4 operating points,
>>> 300MHz, 600MHz, 800MHz and 1GHz.  Once the board reaches 90C, we need
>>> them to cease operation at 800MHz and 1GHz and only permit operation
>>> at 300MHz and 600MHz.  I am going under the assumption that the cpu
>>> index[0] would be for 300MHz, index[1] = 600MHz, etc.
>>>
>>> If I am interpreting your comment correctly, I should set <&cpu
>>> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
>>> to 600MHz and not exceed, is that correct?
>>
>> Nope, it will mean the cooling device can only reduce to 800MHz and to
>> 600MHz to mitigate.
>>
>> Actually the thermal framework neither the kernel are designed to handle
>> this case. They assume the OPPs are stable whatever the thermal situation.
>>
>> That is the reason why I think it is a very interesting use case because
>> it introduces a temperature constraint in addition to a duration for a
>> certain OPP. IMO, that could be an extension of the turbo-mode.
>>
>> With what we have now, I doubt it is feasible.
>>
>> The best we can do is preventing to reach the 90°C, so we remove the OPP
>> temperature constraint. I suppose 85°C is a safe temperature to stick on.
>>
>> And in order to let the governor have free hand.
>>
>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
>>
>> I don't think that will have a significant impact on performances
>> compared to be able to run at a higher temperature with less OPPs.
>>
>>
>>>> Does it clarify the DT spec?
>>>>
>>>
>>> I think your reply to my inquiry might.  If possible, it would be nice
>>> to get this documented into the bindings doc for others in the future.
>>> I can do it, but someone with a better understanding of the concept
>>> maybe more qualified.  I can totally understand why some may want to
>>> integrate this into their SoC device trees to slow the processor when
>>> hot.
>>>
>>> Thank you for taking the time to review this.  I appreciate it.
>>>
>>> adam
> 
> BR,
> Nikolaus
>
Viresh Kumar Sept. 18, 2019, 9:24 a.m. UTC | #22
On 13-09-19, 00:33, Daniel Lezcano wrote:
> 
> Hi Adam,
> 
> On 12/09/2019 23:19, Adam Ford wrote:
> > On Thu, Sep 12, 2019 at 4:12 PM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 12/09/2019 20:30, Adam Ford wrote:
> >>> The thermal sensor in the omap3 family isn't accurate, but it's
> >>> better than nothing.  The various OPP's enabled for the omap3630
> >>> support up to OPP1G, however the datasheet for the DM3730 states
> >>> that OPP130 and OPP1G are not available above TJ of 90C.
> >>>
> >>> This patch configures the thermal throttling to limit the
> >>> operating points of the omap3630 to Only OPP50 and OPP100 if
> >>> the thermal sensor reads a value above 90C.
> 
> Oh, that's a very interesting use case.
> 
> AFAICT the thermal framework is not designed to deal with this
> situation. I agree this setup may work (even if I'm not convinced about
> the stability of the whole).
> 
> May be Viresh can help for the cpufreq side?

Sorry but I am not able to understand what's not supported by thermal framework
here and what can I do to help :)
H. Nikolaus Schaller Sept. 18, 2019, 9:36 a.m. UTC | #23
Hi,

> Am 18.09.2019 um 11:24 schrieb Viresh Kumar <viresh.kumar@linaro.org>:
> 
> On 13-09-19, 00:33, Daniel Lezcano wrote:
>> 
>> Hi Adam,
>> 
>> On 12/09/2019 23:19, Adam Ford wrote:
>>> On Thu, Sep 12, 2019 at 4:12 PM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>> 
>>>> On 12/09/2019 20:30, Adam Ford wrote:
>>>>> The thermal sensor in the omap3 family isn't accurate, but it's
>>>>> better than nothing.  The various OPP's enabled for the omap3630
>>>>> support up to OPP1G, however the datasheet for the DM3730 states
>>>>> that OPP130 and OPP1G are not available above TJ of 90C.
>>>>> 
>>>>> This patch configures the thermal throttling to limit the
>>>>> operating points of the omap3630 to Only OPP50 and OPP100 if
>>>>> the thermal sensor reads a value above 90C.
>> 
>> Oh, that's a very interesting use case.
>> 
>> AFAICT the thermal framework is not designed to deal with this
>> situation. I agree this setup may work (even if I'm not convinced about
>> the stability of the whole).
>> 
>> May be Viresh can help for the cpufreq side?
> 
> Sorry but I am not able to understand what's not supported by thermal framework
> here and what can I do to help :)

Well, it appears that it is not supported that the thermal
framework can enable/disable individual OPPs. Rather it disables
more and more of them from highest index to lowest (like a
fan spinning up in speed).

But it has turned out that we do not need that. It was only
a too verbatim interpretation of the data sheet which says
that certain OPPs (OPP130 and OPP1G) must not be enabled above
a thermal limit of 90°C while the others can be used above.

But just going down stepwise to lower frequency OPPs as long
as above thermal limit is fine for practical situations.

So instead of "limit the operating points of the omap3630 to
Only OPP50 and OPP100 if the thermal sensor reads a value above
90C" we can just "go down from OPP1G -> OPP130 -> OPP100 -> OPP50
until the thermal sensors reads a value below 90C". And
that is easily done with existing cpufreq and thermal framework
(as we have tested).

This will of course also exclude OPP100 and OPP50 above 90°C
where no thermal limit is given in the data sheet, but as far
as we know it is impossible (without running the board in a thermal
chamber) to get the chip above 90C in these low-frequency/voltage
OPPs.

Hope this clarifies.

BR and thanks,
Nikolaus
Daniel Lezcano Sept. 18, 2019, 9:37 a.m. UTC | #24
On 18/09/2019 11:24, Viresh Kumar wrote:
> On 13-09-19, 00:33, Daniel Lezcano wrote:
>>
>> Hi Adam,
>>
>> On 12/09/2019 23:19, Adam Ford wrote:
>>> On Thu, Sep 12, 2019 at 4:12 PM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> On 12/09/2019 20:30, Adam Ford wrote:
>>>>> The thermal sensor in the omap3 family isn't accurate, but it's
>>>>> better than nothing.  The various OPP's enabled for the omap3630
>>>>> support up to OPP1G, however the datasheet for the DM3730 states
>>>>> that OPP130 and OPP1G are not available above TJ of 90C.
>>>>>
>>>>> This patch configures the thermal throttling to limit the
>>>>> operating points of the omap3630 to Only OPP50 and OPP100 if
>>>>> the thermal sensor reads a value above 90C.
>>
>> Oh, that's a very interesting use case.
>>
>> AFAICT the thermal framework is not designed to deal with this
>> situation. I agree this setup may work (even if I'm not convinced about
>> the stability of the whole).
>>
>> May be Viresh can help for the cpufreq side?
> 
> Sorry but I am not able to understand what's not supported by thermal framework
> here and what can I do to help :)

The solution of preventing running above the 90°C by changing the OPPs
is fine and works. It is a way of workaround the spec.

But AFAIU, the specs of the board say the OPPs 800MHz and 1GHz are only
permitted during an amount of time above 90°C which makes the constraint
inverted and recall somehow the 'turbo-mode' description:

"
- turbo-mode: Marks the OPP to be used only for turbo modes. Turbo mode
is available on some platforms, where the device can run over its
operating frequency for a short duration of time limited by the device's
power, current and thermal limits.
"

This is where I thought you can give an input.

Patch
diff mbox series

diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi
index 4bb4f534afe2..58b9d347019f 100644
--- a/arch/arm/boot/dts/omap36xx.dtsi
+++ b/arch/arm/boot/dts/omap36xx.dtsi
@@ -25,6 +25,7 @@ 
 
 			vbb-supply = <&abb_mpu_iva>;
 			clock-latency = <300000>; /* From omap-cpufreq driver */
+			#cooling-cells = <2>;
 		};
 	};
 
@@ -195,6 +196,31 @@ 
 	};
 };
 
+&cpu_thermal {
+	cpu_trips: trips {
+		/* OPP130 and OPP1G are not available above TJ of 90C. */
+		cpu_alert0: cpu_alert {
+			temperature = <90000>; /* millicelsius */
+			hysteresis = <2000>; /* millicelsius */
+			type = "passive";
+		};
+
+		cpu_crit: cpu_crit {
+			temperature = <125000>; /* millicelsius */
+			hysteresis = <2000>; /* millicelsius */
+			type = "critical";
+		};
+	};
+
+	cpu_cooling_maps: cooling-maps {
+		map0 {
+			trip = <&cpu_alert0>;
+			/* Only allow OPP50 and OPP100 */
+			cooling-device = <&cpu 0 1>;
+		};
+	};
+};
+
 /* OMAP3630 needs dss_96m_fck for VENC */
 &venc {
 	clocks = <&dss_tv_fck>, <&dss_96m_fck>;