[RFC] OPP: Redefine bindings to overcome shortcomings

Message ID 52c403454c3b8fc201abe7ac74cf657638479311.1417691389.git.viresh.kumar@linaro.org (mailing list archive)
State New, archived

Commit Message

Viresh Kumar Dec. 4, 2014, 11:14 a.m. UTC
Hi Rob, et al..

The current OPP (Operating Performance Point) DT bindings have proven to be
insufficient on multiple occasions.

There have been multiple band-aid attempts to fix them (the latest one
being: http://www.mail-archive.com/devicetree@vger.kernel.org/msg53398.html).
For obvious reasons Rob rejected them and showed the right path forward. This
is the first attempt to put that down on paper.

The shortcomings we are trying to solve here:

- Some kind of compatibility string to probe the right cpufreq driver for
  platforms, when multiple drivers are available. For example: how to choose
  between cpufreq-dt and arm_big_little drivers.

- Getting clock sharing information between CPUs. Single shared clock vs.
  independent clock per core vs. shared clock per cluster.

- Support for turbo modes

- Other per OPP settings: transition latencies, disabled status, etc.?

The document below should be enough to describe how I am trying to fix these.
Please let me know what I need to fix; surely there will be lots of
obstacles. I am prepared to get beaten up :)

I accept in advance that the naming here is extremely bad; I definitely need
suggestions.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 Documentation/devicetree/bindings/power/opp.txt | 147 ++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

Comments

Lucas Stach Dec. 4, 2014, 11:34 a.m. UTC | #1
Hi Viresh,

not commenting on the overall structure as I have to think a bit more
about this. But small comments below.

On Thursday, 04.12.2014 at 16:44 +0530, Viresh Kumar wrote:
> Hi Rob, et al..
> 
> The current OPP (Operating Performance Point) DT bindings have proven to be
> insufficient on multiple occasions.
> 
> There have been multiple band-aid attempts to fix them (the latest one
> being: http://www.mail-archive.com/devicetree@vger.kernel.org/msg53398.html).
> For obvious reasons Rob rejected them and showed the right path forward. This
> is the first attempt to put that down on paper.
> 
> The shortcomings we are trying to solve here:
> 
> - Some kind of compatibility string to probe the right cpufreq driver for
>   platforms, when multiple drivers are available. For example: how to choose
>   between cpufreq-dt and arm_big_little drivers.
> 
> - Getting clock sharing information between CPUs. Single shared clock vs.
>   independent clock per core vs. shared clock per cluster.
> 
> - Support for turbo modes
> 
> - Other per OPP settings: transition latencies, disabled status, etc.?
> 
> The document below should be enough to describe how I am trying to fix these.
> Please let me know what I need to fix; surely there will be lots of
> obstacles. I am prepared to get beaten up :)
> 
> I accept in advance that the naming here is extremely bad; I definitely need
> suggestions.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  Documentation/devicetree/bindings/power/opp.txt | 147 ++++++++++++++++++++++++
>  1 file changed, 147 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> index 74499e5..5efd8d4 100644
> --- a/Documentation/devicetree/bindings/power/opp.txt
> +++ b/Documentation/devicetree/bindings/power/opp.txt
> @@ -4,6 +4,153 @@ SoCs have a standard set of tuples consisting of frequency and
>  voltage pairs that the device will support per voltage domain. These
>  are called Operating Performance Points or OPPs.
>  
> +This document defines the OPP bindings with their required/optional properties.
> +OPPs can be defined for any device; this file uses a CPU device as an example to
> +illustrate how to define OPPs.
> +
> +linux,operating-points, opp-lists and opps:
> +
> +- linux,operating-points:
> +  Container of all OPP nodes.
> +
> +  Required properties:
> +  - opp nodes (explained below)
> +
> +  Optional properties:
> +  - compatible: allow OPPs to express their compatibility with devices
> +
> +
> +- opp-list@*:
> +  List of nodes defining performance points. The following properties belong
> +  to the nodes within the opp-lists.
> +
> +  Required properties:
> +  - frequency-kHz: Frequency in kHz
> +  - voltage-uV: Voltage in microvolts
> +
> +  Optional properties:
> +  - turbo-mode: Marks the volt-freq pair as turbo pair.
> +  - status: Marks the node enabled/disabled.

What about devices with multiple different turbo states? We have seen
CPUs that boost to different states in the x86 world, surely we will
encounter something like this in the ARM world too. Do we just mark them
all as turbo OPPs and let the driver decide what to do? If we want to
keep using cpufreq-dt for as many devices as possible, is it really
sufficient to know that this is a turbo state, without knowing the
conditions required for activating the state?
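
As a rough illustration of the question: with only the properties from the
posted patch, two boost states can merely be tagged with the same boolean, so
nothing in the tree tells a driver under which conditions each one may be
entered. The node names 'entry0'..'entry2', the 'opp_list0' label and the
1300000 kHz / 1050000 uV pair are placeholders chosen for this sketch (the
posted example leaves the nodes unnamed).

```dts
/dts-v1/;

/ {
	/* Label uses an underscore: dtc does not accept '-' in labels. */
	opp_list0: opp-list@0 {
		entry0 {
			frequency-kHz = <1100000>;
			voltage-uV = <1000000>;
		};
		/* First boost state. */
		entry1 {
			frequency-kHz = <1200000>;
			voltage-uV = <1025000>;
			turbo-mode;
		};
		/* Second, higher boost state: same flag, no extra conditions. */
		entry2 {
			frequency-kHz = <1300000>;	/* illustrative value */
			voltage-uV = <1050000>;		/* illustrative value */
			turbo-mode;
		};
	};
};
```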

> +
> +
> +- opp@*:
> +  Operating performance point node per device. Multiple devices sharing it can
> +  use its phandle in their 'opps' property.
> +
> +  Required properties:
> +  - opp-list: phandle to opp-list defined above.
> +
> +  Optional properties:
> +  - clocks: Tuple of clock providers
> +  - clock-names: Clock names
> +  - opp-supply: phandle to the parent supply/regulator node
> +  - voltage-tolerance: Specify the CPU voltage tolerance in percentage.

This is extremely ill-defined. It doesn't say in which direction the
tolerance is to be applied. Can you go below or above the OPP specified
voltage? For now everyone just assumes that it has to work both ways.
Also with this binding the tolerance is applied for all OPPs, whereas it
very much depends on the individual OPP.

If you are going to redefine OPPs anyway I would really like to see this
property die and rather have a min/max voltage per OPP. That way you can
properly express the OPP constraints. Most OPPs will likely allow a much
higher voltage than their minimal specified one, except when you go over
thermal limits with a high clock/voltage combination.
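
A minimal sketch of what such a per-OPP range might look like. The property
names 'voltage-min-uV' and 'voltage-max-uV', the node names and the maximum
values are assumptions made up for illustration; none of them appear in the
posted patch.

```dts
/dts-v1/;

/ {
	opp_list0: opp-list@0 {
		/* Low OPP: plenty of headroom above the minimum voltage. */
		entry0 {
			frequency-kHz = <1000000>;
			voltage-min-uV = <975000>;	/* hypothetical property */
			voltage-max-uV = <1100000>;	/* hypothetical property */
		};
		/* High OPP: narrow window to stay within thermal limits. */
		entry1 {
			frequency-kHz = <1200000>;
			voltage-min-uV = <1025000>;	/* hypothetical property */
			voltage-max-uV = <1050000>;	/* hypothetical property */
		};
	};
};
```

Expressed this way, the allowed direction is explicit for every OPP, and a
single percentage applied to the whole list is no longer needed.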

> +  - clock-latency: Specify the possible maximum transition latency for clock,
> +    in unit of nanoseconds.

Why do we need this? This is a property of the clock. We should be able to
handle this completely internally in the kernel. I don't know if the
clock API has something like this right now, but it should be a trivial
addition.

Regards,
Lucas
Viresh Kumar Dec. 4, 2014, 2:07 p.m. UTC | #2
Hi Lucas,

On 4 December 2014 at 17:04, Lucas Stach <l.stach@pengutronix.de> wrote:

>> +- opp-list@*:
>> +  List of nodes defining performance points. The following properties belong
>> +  to the nodes within the opp-lists.
>> +
>> +  Required properties:
>> +  - frequency-kHz: Frequency in kHz
>> +  - voltage-uV: Voltage in microvolts
>> +
>> +  Optional properties:
>> +  - turbo-mode: Marks the volt-freq pair as turbo pair.
>> +  - status: Marks the node enabled/disabled.
>
> What about devices with multiple different turbo states? We have seen

You mean that a state may or may not be turbo at some point in time?

> CPUs that boost to different states in the x86 world, surely we will
> encounter something like this in the ARM world too. Do we just mark them
> all as turbo OPPs and let the driver decide what to do? If we want to

Maybe yes. But the good thing about the binding this time is that it is
extensible. So, if there is a future need that we can't think of today, we can
surely make incremental changes here.

> keep using cpufreq-dt for as many devices as possible, is it really

It's not about cpufreq-dt alone. We may be using other drivers as well.

> sufficient to know that this is a turbo state, without knowing the
> conditions required for activating the state?

Can you elaborate more on this? If something is required and we
know what exactly it is, then we can put up the right binding right
now as well..

>> +- opp@*:
>> +  Operating performance point node per device. Multiple devices sharing it can
>> +  use its phandle in their 'opps' property.
>> +
>> +  Required properties:
>> +  - opp-list: phandle to opp-list defined above.
>> +
>> +  Optional properties:
>> +  - clocks: Tuple of clock providers
>> +  - clock-names: Clock names
>> +  - opp-supply: phandle to the parent supply/regulator node
>> +  - voltage-tolerance: Specify the CPU voltage tolerance in percentage.
>
> This is extremely ill-defined. It doesn't say in which direction the
> tolerance is to be applied. Can you go below or above the OPP specified
> voltage? For now everyone just assumes that it has to work both ways.

Yes, the binding is as per today's requirements (or rather implementations).
So it is both ways. But if everybody agrees on it, we can improve it..

> Also with this binding the tolerance is applied for all OPPs, whereas it
> very much depends on the individual OPP.

Hmm, Not only this but the same is true for clock latency as well. We
*may* need that per opp node sometime..

> If you are going to redefine OPPs anyway I would really like to see this
> property die and rather have a min/max voltage per OPP. That way you can

Maybe yes.

> properly express the OPP constraints. Most OPPs will likely allow a much
> higher voltage than their minimal specified one, except when you go over
> thermal limits with a high clock/voltage combination.

Yes.

>> +  - clock-latency: Specify the possible maximum transition latency for clock,
>> +    in unit of nanoseconds.
>
> Why do we need this? This is a property of the clock. We should be able to
> handle this completely internally in the kernel. I don't know if the
> clock API has something like this right now, but it should be a trivial
> addition.

This is not only the clock's latency, but is somehow named this way. It should
give the time it takes to change from frequency A to frequency B, which includes
the change in supplies as well. So this is probably dvfs-latency.

This is required by cpufreq right now, but would be useful for the energy-aware
scheduler as well. So, yes, this is important. It might also need to be
per OPP.

Probably we can use voltage-tolerance and clock-latency at both levels:
list level and OPP level, with the list level taking higher priority?
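
A sketch of what "both levels" could look like, with the two properties given
once on the per-device node and once on an individual entry. Allowing them
inside an entry is an assumption (the posted patch only defines them on the
'opp@*' node), the node names and the per-entry values are placeholders, and
which value wins when both are present is exactly the open question above.

```dts
/dts-v1/;

/ {
	opp_list0: opp-list@0 {
		entry0 {
			frequency-kHz = <1000000>;
			voltage-uV = <975000>;
			/* nothing set here; only the opp@0 values exist */
		};
		entry1 {
			frequency-kHz = <1200000>;
			voltage-uV = <1025000>;
			/* hypothetical per-entry placement of both properties */
			voltage-tolerance = <1>;	/* percentage */
			clock-latency = <400000>;	/* nanoseconds */
		};
	};

	opp0: opp@0 {
		opp-list = <&opp_list0>;
		voltage-tolerance = <2>;	/* percentage */
		clock-latency = <300000>;	/* nanoseconds */
	};
};
```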

Thanks for your quick comments :)
Mark Brown Dec. 4, 2014, 5:18 p.m. UTC | #3
On Thu, Dec 04, 2014 at 12:34:28PM +0100, Lucas Stach wrote:
> On Thursday, 04.12.2014 at 16:44 +0530, Viresh Kumar wrote:

> > +  - voltage-tolerance: Specify the CPU voltage tolerance in percentage.

> This is extremely ill-defined. It doesn't say in which direction the
> tolerance is to be applied. Can you go below or above the OPP specified
> voltage? For now everyone just assumes that it has to work both ways.
> Also with this binding the tolerance is applied for all OPPs, whereas it
> very much depends on the individual OPP.

Almost all specifications for voltages are done as either min/typ/max or
+/- a target voltage.

> If you are going to redefine OPPs anyway I would really like to see this
> property die and rather have a min/max voltage per OPP. That way you can
> properly express the OPP constraints. Most OPPs will likely allow a much
> higher voltage than their minimal specified one, except when you go over
> thermal limits with a high clock/voltage combination.

If you've got a minimum and maximum you also need to specify a target,
generally it's going to be better to go for the target voltage which may
not be the midpoint and is unlikely to be one of the bounds.  I do think
it's sensible to have the option of doing both to more closely match
datasheets.
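
The two datasheet styles mentioned above could be sketched as follows. The
'voltage-range-uV' property, the per-entry use of 'voltage-tolerance', the
node names and all range values are assumptions for illustration only; the
posted patch defines none of them in this form.

```dts
/dts-v1/;

/ {
	opp_list0: opp-list@0 {
		/* Style 1: <min typ max>; typ is neither a bound nor the midpoint. */
		entry0 {
			frequency-kHz = <1000000>;
			voltage-range-uV = <975000 1000000 1050000>;	/* hypothetical */
		};
		/* Style 2: target voltage plus a symmetric +/- tolerance. */
		entry1 {
			frequency-kHz = <1200000>;
			voltage-uV = <1025000>;
			voltage-tolerance = <2>;	/* percentage, hypothetical here */
		};
	};
};
```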

> > +  - clock-latency: Specify the possible maximum transition latency for clock,
> > +    in unit of nanoseconds.

> Why do we need this? This is a property of the clock. We should be able to
> handle this completely internally in the kernel. I don't know if the
> clock API has something like this right now, but it should be a trivial
> addition.

Or have it be part of the clock binding at any rate.
Viresh Kumar Dec. 5, 2014, 5:24 a.m. UTC | #4
On 4 December 2014 at 19:37, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> This is not only the clock's latency, but is somehow named this way. It should
> give the time it takes to change from frequency A to frequency B, which includes
> the change in supplies as well. So this is probably dvfs-latency.

Oops. No this is just clock-latency. We are calculating voltage-latency
separately.

Patch

diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
index 74499e5..5efd8d4 100644
--- a/Documentation/devicetree/bindings/power/opp.txt
+++ b/Documentation/devicetree/bindings/power/opp.txt
@@ -4,6 +4,153 @@  SoCs have a standard set of tuples consisting of frequency and
 voltage pairs that the device will support per voltage domain. These
 are called Operating Performance Points or OPPs.
 
+This document defines the OPP bindings with their required/optional properties.
+OPPs can be defined for any device; this file uses a CPU device as an example to
+illustrate how to define OPPs.
+
+linux,operating-points, opp-lists and opps:
+
+- linux,operating-points:
+  Container of all OPP nodes.
+
+  Required properties:
+  - opp nodes (explained below)
+
+  Optional properties:
+  - compatible: allow OPPs to express their compatibility with devices
+
+
+- opp-list@*:
+  List of nodes defining performance points. The following properties belong
+  to the nodes within the opp-lists.
+
+  Required properties:
+  - frequency-kHz: Frequency in kHz
+  - voltage-uV: Voltage in microvolts
+
+  Optional properties:
+  - turbo-mode: Marks the volt-freq pair as turbo pair.
+  - status: Marks the node enabled/disabled.
+
+
+- opp@*:
+  Operating performance point node per device. Multiple devices sharing it can
+  use its phandle in their 'opps' property.
+
+  Required properties:
+  - opp-list: phandle to opp-list defined above.
+
+  Optional properties:
+  - clocks: Tuple of clock providers
+  - clock-names: Clock names
+  - opp-supply: phandle to the parent supply/regulator node
+  - voltage-tolerance: Specify the CPU voltage tolerance in percentage.
+  - clock-latency: Specify the possible maximum transition latency for clock,
+    in unit of nanoseconds.
+
+Example: Multi-cluster system with separate clock lines for clusters. All CPUs
+         in the clusters share same clock lines.
+
+/ {
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		linux,operating-points {
+			compatible = "linux,cpufreq-dt";
+
+			opp-list0: opp-list@0 {
+				{
+					frequency-kHz = <1000000>;
+					voltage-uV = <975000>;
+					status = "okay";
+				};
+				{
+					frequency-kHz = <1100000>;
+					voltage-uV = <1000000>;
+					status = "okay";
+				};
+				{
+					frequency-kHz = <1200000>;
+					voltage-uV = <1025000>;
+					status = "okay";
+					turbo-mode;
+				};
+			};
+
+			opp-list1: opp-list@1 {
+				{
+					frequency-kHz = <1300000>;
+					voltage-uV = <1050000>;
+					status = "okay";
+				};
+				{
+					frequency-kHz = <1400000>;
+					voltage-uV = <1075000>;
+					status = "disabled";
+				};
+				{
+					frequency-kHz = <1500000>;
+					voltage-uV = <1100000>;
+					status = "okay";
+					turbo-mode;
+				};
+			};
+
+			opp0: opp@0 {
+				clocks = <&clk-controller 0>;
+				clock-names = "cpu";
+				opp-supply = <&cpu-supply0>;
+				voltage-tolerance = <2>; /* percentage */
+				clock-latency = <300000>;
+				opp-list = <&opp-list0>;
+			};
+
+			opp1: opp@1 {
+				clocks = <&clk-controller 1>;
+				clock-names = "cpu";
+				opp-supply = <&cpu-supply1>;
+				voltage-tolerance = <2>; /* percentage */
+				clock-latency = <400000>;
+				opp-list = <&opp-list1>;
+			};
+		};
+
+		cpu@0 {
+			compatible = "arm,cortex-a7";
+			reg = <0>;
+			next-level-cache = <&L2>;
+			opps = <&opp0>;
+		};
+
+		cpu@1 {
+			compatible = "arm,cortex-a7";
+			reg = <1>;
+			next-level-cache = <&L2>;
+			opps = <&opp0>;
+		};
+
+		cpu@100 {
+			compatible = "arm,cortex-a15";
+			reg = <0x100>;
+			next-level-cache = <&L2>;
+			opps = <&opp1>;
+		};
+
+		cpu@101 {
+			compatible = "arm,cortex-a15";
+			reg = <0x101>;
+			next-level-cache = <&L2>;
+			opps = <&opp1>;
+		};
+	};
+};
+
+
+
+Deprecated Bindings
+-------------------
+
 Properties:
 - operating-points: An array of 2-tuples items, and each item consists
   of frequency and voltage like <freq-kHz vol-uV>.