[V4,1/3] OPP: Redefine bindings to overcome shortcomings

Message ID	d225e73f183e01fa0b71e4b9248b6a19a3f7d697.1430394884.git.viresh.kumar@linaro.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Viresh Kumar <viresh.kumar@linaro.org> To: Rafael Wysocki <rjw@rjwysocki.net>, rob.herring@linaro.org, arnd.bergmann@linaro.org, nm@ti.com, broonie@kernel.org, mike.turquette@linaro.org, sboyd@codeaurora.org Subject: [PATCH V4 1/3] OPP: Redefine bindings to overcome shortcomings Date: Thu, 30 Apr 2015 17:37:59 +0530 Message-Id: <d225e73f183e01fa0b71e4b9248b6a19a3f7d697.1430394884.git.viresh.kumar@linaro.org> In-Reply-To: <cover.1430394884.git.viresh.kumar@linaro.org> References: <cover.1430394884.git.viresh.kumar@linaro.org> In-Reply-To: <cover.1430394884.git.viresh.kumar@linaro.org> References: <cover.1430394884.git.viresh.kumar@linaro.org> Cc: thomas.petazzoni@free-electrons.com, devicetree@vger.kernel.org, kesavan.abhilash@gmail.com, linaro-kernel@lists.linaro.org, ta.omasab@gmail.com, linux-pm@vger.kernel.org, viswanath.puttagunta@linaro.org, Viresh Kumar <viresh.kumar@linaro.org>, santosh.shilimkar@oracle.com, olof@lixom.net, khilman@linaro.org, Sudeep.Holla@arm.com, grant.likely@linaro.org, linux-arm-kernel@lists.infradead.org, l.stach@pengutronix.de Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Viresh Kumar April 30, 2015, 12:07 p.m. UTC

Current OPP (Operating performance point) DT bindings are proven to be
insufficient at multiple instances.

The shortcomings we are trying to solve here:

- Getting clock sharing information between CPUs. Single shared clock vs
  independent clock per core vs shared clock per cluster.

- Support for specifying current levels along with voltages.

- Support for multiple regulators.

- Support for turbo modes.

- Other per OPP settings: transition latencies, disabled status, etc.?

- Expandability of OPPs in future.

This patch introduces new bindings "operating-points-v2" to get these problems
solved. Refer to the bindings for more details.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++-
 1 file changed, 362 insertions(+), 4 deletions(-)

Mark Brown May 4, 2015, 12:12 p.m. UTC | #1

On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:

> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's current is specified with an array of size one or three.
> +  Single entry is for target current and three entries are for <target min max>
> +  currents.

What is this for - are you trying to define OPPs for current regulators?
If you are that's worrying, I can't think of a sensible use case.  If
that's not what's happening then the binding needs to be more specific
about what this is supposed to mean.

Viresh Kumar May 5, 2015, 10:48 a.m. UTC | #2

On 4 May 2015 at 17:42, Mark Brown <broonie@kernel.org> wrote:
> On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:
>
>> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
>> +  regulators.
>> +
>> +  A single regulator's current is specified with an array of size one or three.
>> +  Single entry is for target current and three entries are for <target min max>
>> +  currents.
>
> What is this for - are you trying to define OPPs for current regulators?
> If you are that's worrying, I can't think of a sensible use case.  If
> that's not what's happening then the binding needs to be more specific
> about what this is supposed to mean.

Its another property of the same voltage regulator we are using in
opp-microvolt.
I hope that makes sense ?

And that's why I wrote this as well:

+  Entries for multiple regulators must be present in the same order as
+  regulators are specified in device's DT node. If few regulators don't provide
+  capability to configure current, then values for then should be marked as
+  zero.

i.e. this is optional but the voltage regulator isn't..

But, perhaps I need to write it with more clarity? Or this field doesn't make
sense ?

FWIW, its an outcome of this request from Stephen:

https://www.marc.info/?l=linux-arm-kernel&m=142370250522589&w=3

Mark Brown May 5, 2015, 10:57 a.m. UTC | #3

On Tue, May 05, 2015 at 04:18:59PM +0530, Viresh Kumar wrote:
> On 4 May 2015 at 17:42, Mark Brown <broonie@kernel.org> wrote:
> > On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:

> >> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
> >> +  regulators.
> >> +
> >> +  A single regulator's current is specified with an array of size one or three.
> >> +  Single entry is for target current and three entries are for <target min max>
> >> +  currents.

> > What is this for - are you trying to define OPPs for current regulators?
> > If you are that's worrying, I can't think of a sensible use case.  If
> > that's not what's happening then the binding needs to be more specific
> > about what this is supposed to mean.

> Its another property of the same voltage regulator we are using in
> opp-microvolt.
> I hope that makes sense ?

No, it doesn't - you're not answering the question about what this is
for.

> But, perhaps I need to write it with more clarity? Or this field doesn't make
> sense ?

To know if this makes sense I need to know what you beleive "setting the
current" does.  If you literally mean setting the current it makes no
sense at all.  If you mean something else that something else should
probably be written into the binding.

Nishanth Menon May 11, 2015, 1:02 a.m. UTC | #4

On 04/30/2015 07:07 AM, Viresh Kumar wrote:
> Current OPP (Operating performance point) DT bindings are proven to be
> insufficient at multiple instances.
> 
> The shortcomings we are trying to solve here:
> 
> - Getting clock sharing information between CPUs. Single shared clock vs
>   independent clock per core vs shared clock per cluster.
> 
> - Support for specifying current levels along with voltages.
> 
> - Support for multiple regulators.
> 
> - Support for turbo modes.
> 
> - Other per OPP settings: transition latencies, disabled status, etc.?
> 
> - Expandability of OPPs in future.
> 
> This patch introduces new bindings "operating-points-v2" to get these problems
> solved. Refer to the bindings for more details.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++-
>  1 file changed, 362 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> index 74499e5033fc..3b67a5c8d965 100644
> --- a/Documentation/devicetree/bindings/power/opp.txt
> +++ b/Documentation/devicetree/bindings/power/opp.txt
> @@ -1,8 +1,366 @@
> -* Generic OPP Interface
> +Generic OPP (Operating Performance Points) Bindings
> +----------------------------------------------------
>  
> -SoCs have a standard set of tuples consisting of frequency and
> -voltage pairs that the device will support per voltage domain. These
> -are called Operating Performance Points or OPPs.
> +Devices work at voltage-current-frequency triplets and some implementations have
> +the liberty of choosing these. These triplets are called Operating Performance

I suggest we dont call them as triplets either -> note, we have added
clk transition latency per OPP, so they are not exactly triplets either.

> +Points aka OPPs. This document defines bindings for these OPPs applicable across
> +wide range of devices. For illustration purpose, this document uses CPU as a
> +device.
> +
> +

Would be good to point to operating-points (the legacy document) as
well. It might be good to state that only one of the properties should
be used for the device node OPP description.

> +* Property: operating-points-v2
> +
> +Devices supporting OPPs must set their "operating-points-v2" property with
> +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle
> +to find the operating points for the device.
> +
> +
> +* OPP Descriptor Node
> +
> +This describes the OPPs belonging to a device. This node can have following
> +properties:
> +
> +Required properties:
> +- compatible: Allow OPPs to express their compatibility. It should be:
> +  "operating-points-v2".
> +- OPP nodes: One or more OPP nodes describing voltage-current-frequency
> +  triplets. Their name isn't significant but their phandle can be used to

we might want to use something generic instead of "triplets" here. ->
considering that voltage, current are optional properties, we may not
even need to describe a triplet.

> +  reference an OPP.
> +
> +Optional properties:
> +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> +  Missing property means devices have independent clock/voltage/current lines,
> +  but they share OPP tables.

Is'nt this property redundant? by the fact that phandle is reused for
cpu1 should indicate shared OPP table, correct?

> +
> +
> +* OPP Node
> +
> +This defines voltage-current-frequency triplets along with other related
> +properties.
> +
> +Required properties:
> +- opp-khz: Frequency in kHz
> +
> +Optional properties:
> +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's voltage is specified with an array of size one or three.
> +  Single entry is for target voltage and three entries are for <target min max>
> +  voltages.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node.
> +
> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's current is specified with an array of size one or three.
> +  Single entry is for target current and three entries are for <target min max>
> +  currents.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node. If few regulators don't provide
> +  capability to configure current, then values for then should be marked as
> +  zero.


> +
> +- clock-latency-ns: Specifies the maximum possible transition latency (in
> +  nanoseconds) for switching to this OPP from any other OPP.
> +- turbo-mode: Marks the OPP to be used only for turbo modes.

Might be great to add some description for what "turbo mode" means here.

That said, flexibility of this approach is awesome, thanks for doing
this, I believe we did have a discussion about "safe OPP" for thermal
behavior -> now that we have phandles for OPPs, we  can give phandle
pointer to the thermal framework. in addition, we can also use phandle
for marking throttling information in thermal framework instead of
indices as well. which allows a lot of flexibility.

in some systems, we'd have need to mark certain OPPs for entry into
suspend(tegra?) or during shutdown (for example) - do we allow for such
description as phandle pointer back to the OPP nodes (from cpu0 device)
OR do we describe them as additional properties?

> +- status: Marks the node enabled/disabled.

One question we'd probably want the new framework to address is the need
for OPPs based on Efuse options - Am I correct in believing that the
solution that iMX, and TI SoCs probably need is something similar to
modifying the status to disabled? or do we have Vendor specfic modifier
properties that would be allowed?


One additional behavior need I noticed in various old threads is a need
to restrict transition OPPs -> certain SoCs might have constraints on
next OPP they can transition from certain OPPs in-order to achieve a new
OPP. again, such behavior could be phandle lists OR properties that link
the chain up together. Number of such needs did sound less, but still
just thought to bring it up.

> +
> +Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
> +
> +/ {
> +	cpus {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		cpu@0 {
> +			compatible = "arm,cortex-a9";
> +			reg = <0>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 0>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply0>;
> +			operating-points-v2 = <&cpu0_opp>;
> +		};
> +
> +		cpu@1 {
> +			compatible = "arm,cortex-a9";
> +			reg = <1>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 0>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply0>;
> +			operating-points-v2 = <&cpu0_opp>;
> +		};
> +	};
> +
> +	cpu0_opp: opp0 {
> +		compatible = "operating-points-v2";
> +		shared-opp;
> +
> +		entry00 {
> +			opp-khz = <1000000>;
> +			opp-microvolt = <970000 975000 985000>;
> +			opp-microamp = <70000 75000 85000>;
> +			clock-latency-ns = <300000>;
> +		};
> +		entry01 {
> +			opp-khz = <1100000>;
> +			opp-microvolt = <980000 1000000 1010000>;
> +			opp-microamp = <80000 81000 82000>;
> +			clock-latency-ns = <310000>;
> +		};
> +		entry02 {
> +			opp-khz = <1200000>;
> +			opp-microvolt = <1025000>;
> +			clock-latency-ns = <290000>;
> +			turbo-mode;
> +		};
> +	};
> +};
> +
> +Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states
> +independently.
> +
> +/ {
> +	cpus {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		cpu@0 {
> +			compatible = "qcom,krait";
> +			reg = <0>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 0>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply0>;
> +			operating-points-v2 = <&cpu0_opp>;
> +		};
> +
> +		cpu@1 {
> +			compatible = "qcom,krait";
> +			reg = <1>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 1>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply1>;
> +			operating-points-v2 = <&cpu0_opp>;
> +		};
> +
> +		cpu@2 {
> +			compatible = "qcom,krait";
> +			reg = <2>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 2>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply2>;
> +			operating-points-v2 = <&cpu0_opp>;
> +		};
> +
> +		cpu@3 {
> +			compatible = "qcom,krait";
> +			reg = <3>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 3>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply3>;
> +			operating-points-v2 = <&cpu0_opp>;
> +		};
> +	};
> +
> +	cpu0_opp: opp0 {
> +		compatible = "operating-points-v2";
> +
> +		/*
> +		 * Missing shared-opp property means CPUs switch DVFS states
> +		 * independently.
> +		 */
> +
> +		entry00 {
> +			opp-khz = <1000000>;
> +			opp-microvolt = <970000 975000 985000>;
> +			opp-microamp = <70000 75000 85000>;
> +			clock-latency-ns = <300000>;
> +		};
> +		entry01 {
> +			opp-khz = <1100000>;
> +			opp-microvolt = <980000 1000000 1010000>;
> +			opp-microamp = <80000 81000 82000>;
> +			clock-latency-ns = <310000>;
> +		};
> +		entry02 {
> +			opp-khz = <1200000>;
> +			opp-microvolt = <1025000>;
> +			opp-microamp = <90000;
> +			lock-latency-ns = <290000>;
> +			turbo-mode;
> +		};
> +	};
> +};
> +
> +Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch
> +DVFS state together.
> +
> +/ {
> +	cpus {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		cpu@0 {
> +			compatible = "arm,cortex-a7";
> +			reg = <0>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 0>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply0>;
> +			operating-points-v2 = <&cluster0_opp>;
> +		};
> +
> +		cpu@1 {
> +			compatible = "arm,cortex-a7";
> +			reg = <1>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 0>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply0>;
> +			operating-points-v2 = <&cluster0_opp>;
> +		};
> +
> +		cpu@100 {
> +			compatible = "arm,cortex-a15";
> +			reg = <100>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 1>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply1>;
> +			operating-points-v2 = <&cluster1_opp>;
> +		};
> +
> +		cpu@101 {
> +			compatible = "arm,cortex-a15";
> +			reg = <101>;
> +			next-level-cache = <&L2>;
> +			clocks = <&clk_controller 1>;
> +			clock-names = "cpu";
> +			opp-supply = <&cpu_supply1>;
> +			operating-points-v2 = <&cluster1_opp>;
> +		};
> +	};
> +
> +	cluster0_opp: opp0 {
> +		compatible = "operating-points-v2";
> +		shared-opp;
> +
> +		entry00 {
> +			opp-khz = <1000000>;
> +			opp-microvolt = <970000 975000 985000>;
> +			opp-microamp = <70000 75000 85000>;
> +			clock-latency-ns = <300000>;
> +		};
> +		entry01 {
> +			opp-khz = <1100000>;
> +			opp-microvolt = <980000 1000000 1010000>;
> +			opp-microamp = <80000 81000 82000>;
> +			clock-latency-ns = <310000>;
> +		};
> +		entry02 {
> +			opp-khz = <1200000>;
> +			opp-microvolt = <1025000>;
> +			opp-microamp = <90000>;
> +			clock-latency-ns = <290000>;
> +			turbo-mode;
> +		};
> +	};
> +
> +	cluster1_opp: opp1 {
> +		compatible = "operating-points-v2";
> +		shared-opp;
> +
> +		entry10 {
> +			opp-khz = <1300000>;
> +			opp-microvolt = <1045000 1050000 1055000>;
> +			opp-microamp = <95000 100000 105000>;
> +			clock-latency-ns = <400000>;
> +		};
> +		entry11 {
> +			opp-khz = <1400000>;
> +			opp-microvolt = <1075000>;
> +			opp-microamp = <100000>;
> +			clock-latency-ns = <400000>;
> +		};
> +		entry12 {
> +			opp-khz = <1500000>;
> +			opp-microvolt = <1010000 1100000 1110000>;
> +			opp-microamp = <95000 100000 105000>;
> +			clock-latency-ns = <400000>;
> +			turbo-mode;
> +		};
> +	};
> +};
> +
> +Example 4: Handling multiple regulators
> +
> +/ {
> +	cpus {
> +		cpu@0 {
> +			compatible = "arm,cortex-a7";
> +			...
> +
> +			opp-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
> +			operating-points-v2 = <&cpu0_opp>;
> +		};
> +	};
> +
> +	cpu0_opp: opp0 {
> +		compatible = "operating-points-v2";
> +		shared-opp;
> +
> +		entry00 {
> +			opp-khz = <1000000>;
> +			opp-microvolt = <970000>, /* Supply 0 */
> +					<960000>, /* Supply 1 */
> +					<960000>; /* Supply 2 */
> +			opp-microamp =  <70000>,  /* Supply 0 */
> +					<70000>,  /* Supply 1 */
> +					<70000>;  /* Supply 2 */
> +			clock-latency-ns = <300000>;
> +		};
> +
> +		/* OR */
> +
> +		entry00 {
> +			opp-khz = <1000000>;
> +			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
> +					<960000 965000 975000>, /* Supply 1 */
> +					<960000 965000 975000>; /* Supply 2 */
> +			opp-microamp =  <70000  75000  85000>,  /* Supply 0 */
> +					<70000  75000  85000>,  /* Supply 1 */
> +					<70000  75000  85000>;  /* Supply 2 */
> +			clock-latency-ns = <300000>;
> +		};
> +
> +		/* OR */
> +
> +		entry00 {
> +			opp-khz = <1000000>;
> +			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
> +					<960000 965000 975000>, /* Supply 1 */
> +					<960000 965000 975000>; /* Supply 2 */
> +			opp-microamp =  <70000  75000  85000>,  /* Supply 0 */
> +					<0	0      0>,	/* Supply 1 doesn't support current change */
> +					<70000  75000  85000>;  /* Supply 2 */
> +			clock-latency-ns = <300000>;
> +		};
> +	};
> +};
> +
> +
> +Deprecated Bindings
> +-------------------
>  
>  Properties:
>  - operating-points: An array of 2-tuples items, and each item consists
>

Viresh Kumar May 12, 2015, 5:16 a.m. UTC | #5

Hi Nishanth,

On 10-05-15, 20:02, Nishanth Menon wrote:
> On 04/30/2015 07:07 AM, Viresh Kumar wrote:

> > +the liberty of choosing these. These triplets are called Operating Performance
> 
> I suggest we dont call them as triplets either -> note, we have added
> clk transition latency per OPP, so they are not exactly triplets either.

Sure.

> > +Points aka OPPs. This document defines bindings for these OPPs applicable across
> > +wide range of devices. For illustration purpose, this document uses CPU as a
> > +device.
> > +
> > +
> 
> Would be good to point to operating-points (the legacy document) as
> well. It might be good to state that only one of the properties should
> be used for the device node OPP description.

Hmm, okay.

> > +Optional properties:
> > +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> > +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> > +  Missing property means devices have independent clock/voltage/current lines,
> > +  but they share OPP tables.
> 
> Is'nt this property redundant? by the fact that phandle is reused for
> cpu1 should indicate shared OPP table, correct?

There are two things we can share:
- OPP tables (And not the clock lines themselves). For example,
  consider a quad-core platform with independent clock/voltage lines
  for all CPUs but all the CPUs have same OPP table.

- Clock/voltage rails: This is described by the above property. I
  thought the examples should have made this clear, anything left
  there ?

> > +- turbo-mode: Marks the OPP to be used only for turbo modes.
> 
> Might be great to add some description for what "turbo mode" means here.

Okay.

> That said, flexibility of this approach is awesome, thanks for doing
> this, I believe we did have a discussion about "safe OPP" for thermal
> behavior -> now that we have phandles for OPPs, we  can give phandle
> pointer to the thermal framework. in addition, we can also use phandle
> for marking throttling information in thermal framework instead of
> indices as well. which allows a lot of flexibility.
> 
> in some systems, we'd have need to mark certain OPPs for entry into
> suspend(tegra?) or during shutdown (for example) - do we allow for such
> description as phandle pointer back to the OPP nodes (from cpu0 device)
> OR do we describe them as additional properties?

Haven't thought about it earlier. In case we need them, probably it
should go in the OPP table.

NOTE: Mike T. once told me that we shouldn't over-abuse the bindings
by putting every detail there. And I am not sure at all on how to draw
that line.

> > +- status: Marks the node enabled/disabled.
> 
> One question we'd probably want the new framework to address is the need
> for OPPs based on Efuse options - Am I correct in believing that the
> solution that iMX, and TI SoCs probably need is something similar to
> modifying the status to disabled? or do we have Vendor specfic modifier
> properties that would be allowed?

See PATCH 2/3 for that.
 
> One additional behavior need I noticed in various old threads is a need
> to restrict transition OPPs -> certain SoCs might have constraints on
> next OPP they can transition from certain OPPs in-order to achieve a new
> OPP. again, such behavior could be phandle lists OR properties that link
> the chain up together. Number of such needs did sound less, but still
> just thought to bring it up.

See 0/3 and 3/3 for that. Its present in my solution but Mike T. is
strictly against keeping that in OPP table. And so 3/3 may get Nak'd.

Thanks a lot for your reviews.

Nishanth Menon May 12, 2015, 4:04 p.m. UTC | #6

Hi Viresh,
On 05/12/2015 12:16 AM, Viresh Kumar wrote:
[..]
> On 10-05-15, 20:02, Nishanth Menon wrote:
>> On 04/30/2015 07:07 AM, Viresh Kumar wrote:

One additional comment - after re-reading the bindings again..
> 
> +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's voltage is specified with an array of size one or three.
> +  Single entry is for target voltage and three entries are for <target min max>
> +  voltages.

Just curious -> is'nt it better to just have min<->max range? binding
as it stands right now is open to interpretation as to what will be
attempted and in what sequence? with a valid min, target or max -
is'nt it more power efficient always to go for a "min" than a target?

Further, min<->max implies anywhere in that range and is more
consistent with "regulator like" description.


>> That said, flexibility of this approach is awesome, thanks for doing
>> this, I believe we did have a discussion about "safe OPP" for thermal
>> behavior -> now that we have phandles for OPPs, we  can give phandle
>> pointer to the thermal framework. in addition, we can also use phandle
>> for marking throttling information in thermal framework instead of
>> indices as well. which allows a lot of flexibility.
>>
>> in some systems, we'd have need to mark certain OPPs for entry into
>> suspend(tegra?) or during shutdown (for example) - do we allow for such
>> description as phandle pointer back to the OPP nodes (from cpu0 device)
>> OR do we describe them as additional properties?
> 
> Haven't thought about it earlier. In case we need them, probably it
> should go in the OPP table.

> 
> NOTE: Mike T. once told me that we shouldn't over-abuse the bindings
> by putting every detail there. And I am not sure at all on how to draw
> that line.

True. one option might be to allow for vendor specific property
extensions - that will let vendors add in additional quirky data
custom to their SoCs as needed.

> 
>>> +- status: Marks the node enabled/disabled.
>>
>> One question we'd probably want the new framework to address is the need
>> for OPPs based on Efuse options - Am I correct in believing that the
>> solution that iMX, and TI SoCs probably need is something similar to
>> modifying the status to disabled? or do we have Vendor specfic modifier
>> properties that would be allowed?
> 
> See PATCH 2/3 for that.

Sorry about that. I had kind of expected all bindings to be a single
patch :)..

>  
>> One additional behavior need I noticed in various old threads is a need
>> to restrict transition OPPs -> certain SoCs might have constraints on
>> next OPP they can transition from certain OPPs in-order to achieve a new
>> OPP. again, such behavior could be phandle lists OR properties that link
>> the chain up together. Number of such needs did sound less, but still
>> just thought to bring it up.
> 
> See 0/3 and 3/3 for that. Its present in my solution but Mike T. is
> strictly against keeping that in OPP table. And so 3/3 may get Nak'd.
> 

missed this as well :(

Felipe Balbi May 12, 2015, 4:19 p.m. UTC | #7

On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:
> Current OPP (Operating performance point) DT bindings are proven to be
> insufficient at multiple instances.
> 
> The shortcomings we are trying to solve here:
> 
> - Getting clock sharing information between CPUs. Single shared clock vs
>   independent clock per core vs shared clock per cluster.
> 
> - Support for specifying current levels along with voltages.
> 
> - Support for multiple regulators.
> 
> - Support for turbo modes.
> 
> - Other per OPP settings: transition latencies, disabled status, etc.?
> 
> - Expandability of OPPs in future.
> 
> This patch introduces new bindings "operating-points-v2" to get these problems

operating-points-v2 is kinda awkward, how about
operating-points-extended, much like interrupts-extended ?

Other than this, I very much like the new approach.

Regards

Mike Turquette May 12, 2015, 9:42 p.m. UTC | #8

Quoting Viresh Kumar (2015-04-30 05:07:59)
> Current OPP (Operating performance point) DT bindings are proven to be
> insufficient at multiple instances.

Hi Viresh,

Sorry for chiming in so late in the process. Also, sorry for the long
email. Lots of repetition below:

Why should this new binding exist? Is Devicetree really the right place
to put all of this data? If DT is the right place for some users, some
of the time ... is it always the right place for all users, all of the
time?

> 
> The shortcomings we are trying to solve here:
> 
> - Getting clock sharing information between CPUs. Single shared clock vs
>   independent clock per core vs shared clock per cluster.
> 
> - Support for specifying current levels along with voltages.
> 
> - Support for multiple regulators.
> 
> - Support for turbo modes.
> 
> - Other per OPP settings: transition latencies, disabled status, etc.?

I agree that operating-points-v1 does not do this stuff. There is a need
to express this data, but is DT the right place to do so? Why doesn't a
driver contain this data?

> 
> - Expandability of OPPs in future.

This point above gives me heartburn. It appears that this binding is not
meant to be "sub-classed" by vendor-specific compatible strings. Please
let me know if I'm wrong on that.

The problem with this approach is that every little problem that someone
has with the binding will have to be solved by changing the generic opp
binding. I do not see how this can scale given the complexity of data,
programming sequences and other stuff associated with operating points
and dvfs transitions.

> 
> This patch introduces new bindings "operating-points-v2" to get these problems
> solved. Refer to the bindings for more details.

I like the existing operating points binding. It is very simple. The
data can be entirely encoded in DT and then used by the driver for
simple dvfs use cases.

Maybe we should just stop there and keep it simple? If the data needed
to express an operating point or dvfs transition is very complex, why
use DT for that?

We went through the same issue with the clock bindings where some people
(myself included) wanted to use DT as a data-driven interface into the
kernel. Turns out DT isn't great for that. What we have now is more
scalable: clock providers (usually a clock generator IP block) get a
node in DT. Clock consumers reference that provider node plus an index
value which corresponds to a specific clock signal that the downstream
node consumes. That's it. The binding basically only creates connections
across devices using phandles.

All of the data about clock rates, parents, programming sequences,
register definitions and bitfields, supported operations, etc all belong
in drivers.

For very simple clock providers (such as a fixed-rate crystal) we do
have a dedicated binding that allows for all of the data to belong in DT
(just the clock name and rate is required). I think this is analogous to
the existing operating-points-v1 today, which works nicely for the
simple case.

For the complex case, why not just create a link between the device that
provides the operating points (e.g. a pmic, or system controller, or
clock provider, or whatever) and the device that consumes them (e.g.
cpufreq driver, gpu driver, io controller driver, etc)? Leave all of the
data in C.

> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++-
>  1 file changed, 362 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> index 74499e5033fc..3b67a5c8d965 100644
> --- a/Documentation/devicetree/bindings/power/opp.txt
> +++ b/Documentation/devicetree/bindings/power/opp.txt
> @@ -1,8 +1,366 @@
> -* Generic OPP Interface
> +Generic OPP (Operating Performance Points) Bindings
> +----------------------------------------------------
>  
> -SoCs have a standard set of tuples consisting of frequency and
> -voltage pairs that the device will support per voltage domain. These
> -are called Operating Performance Points or OPPs.
> +Devices work at voltage-current-frequency triplets and some implementations have
> +the liberty of choosing these. These triplets are called Operating Performance
> +Points aka OPPs. This document defines bindings for these OPPs applicable across
> +wide range of devices. For illustration purpose, this document uses CPU as a
> +device.
> +
> +
> +* Property: operating-points-v2
> +
> +Devices supporting OPPs must set their "operating-points-v2" property with
> +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle
> +to find the operating points for the device.
> +
> +
> +* OPP Descriptor Node
> +
> +This describes the OPPs belonging to a device. This node can have following
> +properties:
> +
> +Required properties:
> +- compatible: Allow OPPs to express their compatibility. It should be:
> +  "operating-points-v2".
> +- OPP nodes: One or more OPP nodes describing voltage-current-frequency
> +  triplets. Their name isn't significant but their phandle can be used to
> +  reference an OPP.
> +
> +Optional properties:
> +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> +  Missing property means devices have independent clock/voltage/current lines,
> +  but they share OPP tables.

What is the behavior of not setting 'shared-opp'? The same table is
re-used by multiple consumers/devices?

I think a provider/consumer model works better here. E.g. if we have 4
cpus that scale independently then there would be 4 opp providers, each
provider corresponding to the unique frequency and voltage domains per
cpu. If multiple cpu nodes consume the same opp phandle then the sharing
becomes implicit: those cpus are in the same frequency and power
domains.

This is how it works for other resources, such as specifying a clock or
regulator in DT. If two device nodes reference that same resource then
it clear that they are using the same physical hardware. Having just a
table of data in a node that does not clearly map onto hardware (or a
software abstraction that provides access to that hardware) is not as
nice IMO.

> +
> +
> +* OPP Node
> +
> +This defines voltage-current-frequency triplets along with other related
> +properties.
> +
> +Required properties:
> +- opp-khz: Frequency in kHz
> +
> +Optional properties:
> +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's voltage is specified with an array of size one or three.
> +  Single entry is for target voltage and three entries are for <target min max>
> +  voltages.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node.
> +
> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's current is specified with an array of size one or three.
> +  Single entry is for target current and three entries are for <target min max>
> +  currents.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node. If few regulators don't provide
> +  capability to configure current, then values for then should be marked as
> +  zero.
> +
> +- clock-latency-ns: Specifies the maximum possible transition latency (in
> +  nanoseconds) for switching to this OPP from any other OPP.
> +- turbo-mode: Marks the OPP to be used only for turbo modes.
> +- status: Marks the node enabled/disabled.

s/clock/transition/

Scaling the regulator can take longer than the clock. Better to reflect
that this value is capturing the total transition time.

> +
> +Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
> +
> +/ {
> +       cpus {
> +               #address-cells = <1>;
> +               #size-cells = <0>;
> +
> +               cpu@0 {
> +                       compatible = "arm,cortex-a9";
> +                       reg = <0>;
> +                       next-level-cache = <&L2>;
> +                       clocks = <&clk_controller 0>;
> +                       clock-names = "cpu";
> +                       opp-supply = <&cpu_supply0>;
> +                       operating-points-v2 = <&cpu0_opp>;
> +               };
> +
> +               cpu@1 {
> +                       compatible = "arm,cortex-a9";
> +                       reg = <1>;
> +                       next-level-cache = <&L2>;
> +                       clocks = <&clk_controller 0>;
> +                       clock-names = "cpu";
> +                       opp-supply = <&cpu_supply0>;
> +                       operating-points-v2 = <&cpu0_opp>;
> +               };
> +       };
> +
> +       cpu0_opp: opp0 {
> +               compatible = "operating-points-v2";
> +               shared-opp;
> +
> +               entry00 {
> +                       opp-khz = <1000000>;
> +                       opp-microvolt = <970000 975000 985000>;
> +                       opp-microamp = <70000 75000 85000>;
> +                       clock-latency-ns = <300000>;
> +               };
> +               entry01 {
> +                       opp-khz = <1100000>;
> +                       opp-microvolt = <980000 1000000 1010000>;
> +                       opp-microamp = <80000 81000 82000>;
> +                       clock-latency-ns = <310000>;
> +               };
> +               entry02 {
> +                       opp-khz = <1200000>;
> +                       opp-microvolt = <1025000>;
> +                       clock-latency-ns = <290000>;
> +                       turbo-mode;
> +               };
> +       };
> +};

It seems wrong to me that the clock and supply data is owned by the cpu
node, and not the opp descriptor. Everything about the opp transition
should belong to a provider node. Then the cpu simply needs to consume
that via a phandle.

<snip>

> +Deprecated Bindings
> +-------------------
>  
>  Properties:
>  - operating-points: An array of 2-tuples items, and each item consists

I think we should keep these. They work for the simple case.

I know that DT bindings are notorious for bike shedding. But in this
case I'm not debating the color of the bike shed but instead questioning
whether we need the shed at all :-)

Regards,
Mike

> -- 
> 2.3.0.rc0.44.ga94655d
>

Viresh Kumar May 13, 2015, 4:45 a.m. UTC | #9

On 12-05-15, 11:19, Felipe Balbi wrote:
> operating-points-v2 is kinda awkward, how about
> operating-points-extended, much like interrupts-extended ?

What about the next version if required ? :)

> Other than this, I very much like the new approach.

Thanks.

Viresh Kumar May 13, 2015, 5:05 a.m. UTC | #10

On 12-05-15, 11:04, Nishanth Menon wrote:
> Just curious -> is'nt it better to just have min<->max range? binding
> as it stands right now is open to interpretation as to what will be
> attempted and in what sequence? with a valid min, target or max -
> is'nt it more power efficient always to go for a "min" than a target?
> 
> Further, min<->max implies anywhere in that range and is more
> consistent with "regulator like" description.

It came out after some discussions on the list, you may want to go
through that.

https://lists.linaro.org/pipermail/linaro-kernel/2015-January/019844.html

> True. one option might be to allow for vendor specific property
> extensions - that will let vendors add in additional quirky data
> custom to their SoCs as needed.

Yeah, I am planning to support them.

Viresh Kumar May 13, 2015, 8:55 a.m. UTC | #11

On 12-05-15, 14:42, Michael Turquette wrote:
> Quoting Viresh Kumar (2015-04-30 05:07:59)

> Sorry for chiming in so late in the process. Also, sorry for the long
> email. Lots of repetition below:

You are always welcome..

> Why should this new binding exist?

The answer to this particular query is perhaps simple, i.e. we have
unsolved problems that we wanted to solve in a generic way.

But probably the bigger question is "Should we really put the OPPs
(new or old bindings) in DT".

Lets start by agreeing on what can be kept in DT. AFAIU, anything that
describes the device in a OS independent way. Like:
  - base address
  - irq line + parent
  - clocks, regulators, etc..
  - What about things like register information? That's not what we do
    normally, but why?

    Perhaps because we want to get rid of redundancy as much as
    possible.
    - An implementation of the same device is going to be same across
      all platforms (unless the platform has tweaked it). And so there
      is no fun passing the same register information from multiple DT
      files.
    - And so we better keep things like register information, bit
      field descriptions, etc. in driver itself.
    - For example consider a proprietary controller that has three
      registers A, B and C. Now the bitwise description/behavior of
      all the registers in perfectly same for every implementation but
      the order of these in memory varies per implementation.

      Now we can keep the offsets of these registers in either DT or C
      code. If only few implementations are using it, then we might
      keep it in C code, but if there are 10-20 implementations using
      it in order ABC or BAC or CAB, then it will probably be a good
      option to try keeping these offsets in DT.

So its not really that we just describe connectivity between
devices/nodes in DT, it can be anything related to the device.

What if OPPs were kept in C code instead ?
- In most of the cases (not all of course), we will end up replicating
  code for platforms. But yes we can still do it and this is already
  allowed if you really think your platform is special.
- (Copied from booting-without-of.txt):
  "It also makes it more flexible for board vendors to do minor
  hardware upgrades without significantly impacting the kernel code or
  cluttering it with special cases."

> Is Devicetree really the right place
> to put all of this data? If DT is the right place for some users, some
> of the time ... is it always the right place for all users, all of the
> time?

This is just an attempt to reuse generic code. And platforms are free
to choose DT or non-DT way for keeping this information.

For some users it might be a good place to keep it, while for others
it may not be. For example, if a platform really needs to do some stuff
from its platform code, then it is free to do so.

> > - Expandability of OPPs in future.
> 
> This point above gives me heartburn. It appears that this binding is not
> meant to be "sub-classed" by vendor-specific compatible strings. Please
> let me know if I'm wrong on that.
> 
> The problem with this approach is that every little problem that someone
> has with the binding will have to be solved by changing the generic opp
> binding. I do not see how this can scale given the complexity of data,
> programming sequences and other stuff associated with operating points
> and dvfs transitions.

(I thought about it a bit more after our offline discussion):

If there is something generic enough, which is required by multiple
platforms, then of course we can make additions to the bindings.

But I believe we can 'sub-class' this by vendor-specific compatible
strings as well.

What I told you earlier (in our offline discussion) was that it isn't
allowed to put driver specific strings here to just choose a driver
amongst few available at runtime. For example, choosing between
cpufreq-dt or arm_big_little or any platform driver.

But if a Vendor do need few bindings just for his platforms, then we
can have a compatible sting for that. As the compatible string now
will express device's compatibility.

> I like the existing operating points binding. It is very simple. The
> data can be entirely encoded in DT and then used by the driver for
> simple dvfs use cases.

The new ones are also simple :)

> Maybe we should just stop there and keep it simple? If the data needed
> to express an operating point or dvfs transition is very complex, why
> use DT for that?

Its simple but not complete. And I don't really agree that things are
way too complex here. In some cases, yes they can be.

> > +Optional properties:
> > +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> > +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> > +  Missing property means devices have independent clock/voltage/current lines,
> > +  but they share OPP tables.
> 
> What is the behavior of not setting 'shared-opp'? The same table is
> re-used by multiple consumers/devices?

Yes.

> I think a provider/consumer model works better here. E.g. if we have 4
> cpus that scale independently then there would be 4 opp providers, each
> provider corresponding to the unique frequency and voltage domains per
> cpu. If multiple cpu nodes consume the same opp phandle then the sharing
> becomes implicit: those cpus are in the same frequency and power
> domains.
> 
> This is how it works for other resources, such as specifying a clock or
> regulator in DT. If two device nodes reference that same resource then
> it clear that they are using the same physical hardware. Having just a
> table of data in a node that does not clearly map onto hardware (or a
> software abstraction that provides access to that hardware) is not as
> nice IMO.

I partially agree to what you are saying. It is written this way to
reduce redundancy in DT. For example, in a quad-core system with
independently scalable CPUs that have exactly same tables for all
CPUs, we do not want to replicate the same set of 20 OPPs, four times.

But perhaps we can improve it, based on your suggestion. The way I
have written it today:

/ {
	cpus {
		cpu@0 {
			operating-points-v2 = <&cpu_opp>;
		};

		cpu@1 {
			operating-points-v2 = <&cpu_opp>;
		};
	};

	cpu_opp: opp {
		compatible = "operating-points-v2";
		// shared-opp; Not an shared OPP

		entry00 {
			opp-khz = <1000000>;
		};
		entry01 {
			opp-khz = <1100000>;
		};
	};
};

And we can rewrite it as:

/ {
	cpus {
		cpu@0 {
			operating-points-v2 = <&opp0_provider>;
		};

		cpu@1 {
			operating-points-v2 = <&opp1_provider>;
		};
	};

        /* Table is shared between providers */
	opp_table: opp_table {
		entry00 {
			opp-khz = <1000000>;
		};
		entry01 {
			opp-khz = <1100000>;
		};
	};

	opp0_provider: opp0 {
		compatible = "operating-points-v2";
                opp_table = <&opp_table>;
	};

	opp1_provider: opp1 {
		compatible = "operating-points-v2";
                opp_table = <&opp_table>;
	};
};


But this is similar to what I proposed initially and people objected
to it.

> > +- clock-latency-ns: Specifies the maximum possible transition latency (in
> > +  nanoseconds) for switching to this OPP from any other OPP.
> > +- turbo-mode: Marks the OPP to be used only for turbo modes.
> > +- status: Marks the node enabled/disabled.
> 
> s/clock/transition/
> 
> Scaling the regulator can take longer than the clock. Better to reflect
> that this value is capturing the total transition time.

So this is how we are doing it today and I am not sure if we want to
get it from DT now.

	ret = regulator_set_voltage_time(cpu_reg, min_uV, max_uV);
	if (ret > 0)
		transition_latency += ret * 1000;

> It seems wrong to me that the clock and supply data is owned by the cpu
> node, and not the opp descriptor. Everything about the opp transition
> should belong to a provider node. Then the cpu simply needs to consume
> that via a phandle.

https://lists.linaro.org/pipermail/linaro-kernel/2014-December/019505.html

> > +Deprecated Bindings
> > +-------------------
> >  
> >  Properties:
> >  - operating-points: An array of 2-tuples items, and each item consists
> 
> I think we should keep these. They work for the simple case.

Hmm, maybe.

Mark Brown May 13, 2015, 11:03 a.m. UTC | #12

On Wed, May 13, 2015 at 02:25:28PM +0530, Viresh Kumar wrote:
> On 12-05-15, 14:42, Michael Turquette wrote:
> > Quoting Viresh Kumar (2015-04-30 05:07:59)
> 
> > Why should this new binding exist?

> The answer to this particular query is perhaps simple, i.e. we have
> unsolved problems that we wanted to solve in a generic way.

> But probably the bigger question is "Should we really put the OPPs
> (new or old bindings) in DT".

And also is trying to do this in a completely generic manner the right
way of going about things - do we really understand the problem area
well enough to create a completely generic solution for all cases,
bearing in mind that once things go into a DT binding they become an ABI?

Nishanth Menon May 13, 2015, 3 p.m. UTC | #13

On 05/13/2015 12:05 AM, Viresh Kumar wrote:
> On 12-05-15, 11:04, Nishanth Menon wrote:
>> Just curious -> is'nt it better to just have min<->max range? binding
>> as it stands right now is open to interpretation as to what will be
>> attempted and in what sequence? with a valid min, target or max -
>> is'nt it more power efficient always to go for a "min" than a target?
>>
>> Further, min<->max implies anywhere in that range and is more
>> consistent with "regulator like" description.
> 
> It came out after some discussions on the list, you may want to go
> through that.
> 
> https://lists.linaro.org/pipermail/linaro-kernel/2015-January/019844.html

I see the thread saying that voltage-tolerance is a crappy idea -> I
agree to that.

What I dont see in the thread, and the point I raised here, why have
nominal/typical voltage at all? min<->max should be sufficient,
correct? If the device cannot function at min, then it should not be
documented as part of valid range at all.

Mark Brown May 13, 2015, 3:16 p.m. UTC | #14

On Wed, May 13, 2015 at 10:00:41AM -0500, Nishanth Menon wrote:

> What I dont see in the thread, and the point I raised here, why have
> nominal/typical voltage at all? min<->max should be sufficient,
> correct? If the device cannot function at min, then it should not be
> documented as part of valid range at all.

Going for the minimum specified voltage is asking for trouble with
regard to tolerances and so on, see also your concern about process
corners.  If the electrical engineers specify things as X +/- Y the
most conservative thing to do is to try to hit X rather than going
straight for the limits.

Nishanth Menon May 13, 2015, 4:14 p.m. UTC | #15

On 05/13/2015 10:16 AM, Mark Brown wrote:
> On Wed, May 13, 2015 at 10:00:41AM -0500, Nishanth Menon wrote:
> 
>> What I dont see in the thread, and the point I raised here, why have
>> nominal/typical voltage at all? min<->max should be sufficient,
>> correct? If the device cannot function at min, then it should not be
>> documented as part of valid range at all.
> 
> Going for the minimum specified voltage is asking for trouble with
> regard to tolerances and so on, see also your concern about process
> corners.  If the electrical engineers specify things as X +/- Y the
> most conservative thing to do is to try to hit X rather than going
> straight for the limits.

I've had the same debate with my company's SoC designers as well on
various occasions as well. At least for the SoCs I deal with, X +/- Y
range specification involves PMIC/Board variations, where X is the
least voltage that is attempted to be set on PMIC, X-Y is the min
voltage allowed at the SoC ball, due to SMPS noise/IRDrop and other
board specific behavior.

I am not saying all SoC vendors do specifications the same way, but
our interest from device tree description is the operating voltage
range for the device that we can control on the PMIC. if setting (X-Y)
is not stable voltage for the device, then it should not be stated in
the description.

To illustrate: What does it mean for a driver using regulator API?
Attempt X <-> (X+Y), if that fails (example PMIC SMPS max < X), try
(X-Y)<->(X)? If yes, then we do expect (X-Y)<->(X+Y) should be stable
for all operating conditions for the device, correct? If this is is
not stable at (X-Y), then we have a wrong specification for the device
for claiming X +/- Y is a valid range for the device.

Mark Brown May 13, 2015, 4:21 p.m. UTC | #16

On Wed, May 13, 2015 at 11:14:07AM -0500, Nishanth Menon wrote:

> To illustrate: What does it mean for a driver using regulator API?
> Attempt X <-> (X+Y), if that fails (example PMIC SMPS max < X), try
> (X-Y)<->(X)? If yes, then we do expect (X-Y)<->(X+Y) should be stable
> for all operating conditions for the device, correct? If this is is
> not stable at (X-Y), then we have a wrong specification for the device
> for claiming X +/- Y is a valid range for the device.

We have an explicit regulator API call for setting by tolerance which
tries to get as close as possible to the target voltage (currently the
implementation is very cheap and nasty but ideally it'll get improved).

Nishanth Menon May 13, 2015, 4:34 p.m. UTC | #17

On 05/13/2015 11:21 AM, Mark Brown wrote:
> On Wed, May 13, 2015 at 11:14:07AM -0500, Nishanth Menon wrote:
> 
>> To illustrate: What does it mean for a driver using regulator API?
>> Attempt X <-> (X+Y), if that fails (example PMIC SMPS max < X), try
>> (X-Y)<->(X)? If yes, then we do expect (X-Y)<->(X+Y) should be stable
>> for all operating conditions for the device, correct? If this is is
>> not stable at (X-Y), then we have a wrong specification for the device
>> for claiming X +/- Y is a valid range for the device.
> 
> We have an explicit regulator API call for setting by tolerance which
> tries to get as close as possible to the target voltage (currently the
> implementation is very cheap and nasty but ideally it'll get improved).
> 

Thanks - I think the original implementation was triggered by the old
"voltage-tolerance" mess, so wont be too surprised.

With respect to the property itself, I still am not completely
convinced that having typ actually helps. but having it does not hurt
either, since min could be == typ in SoC description if desired.

So, since I dont strongly feel either way here to reject things
completely, and supposedly, it does help some other SoC vendors(which
I am not convinced how), will leave the discussion alone for the moment.

Mike Turquette May 14, 2015, 12:32 a.m. UTC | #18

Quoting Mark Brown (2015-05-13 04:03:57)
> On Wed, May 13, 2015 at 02:25:28PM +0530, Viresh Kumar wrote:
> > On 12-05-15, 14:42, Michael Turquette wrote:
> > > Quoting Viresh Kumar (2015-04-30 05:07:59)
> > 
> > > Why should this new binding exist?
> 
> > The answer to this particular query is perhaps simple, i.e. we have
> > unsolved problems that we wanted to solve in a generic way.
> 
> > But probably the bigger question is "Should we really put the OPPs
> > (new or old bindings) in DT".
> 
> And also is trying to do this in a completely generic manner the right
> way of going about things - do we really understand the problem area
> well enough to create a completely generic solution for all cases,
> bearing in mind that once things go into a DT binding they become an ABI?

No, we don't understand the problem space well enough to form an ABI.
And even if we did understand it perfectly today, the constraints and
expressions will change tomorrow.

Putting this stuff in C without any philosophical constraints on whether
or not we can change it later is the way to go.

Regards,
Mike

Viresh Kumar May 15, 2015, 2:15 p.m. UTC | #19

On 14-05-15, 03:25, Michael Turquette wrote:
> No, we don't understand the problem space well enough to form an ABI.

And why do you think so? We have been facing many problems since a
long time which we are trying to solve here.

I agree that it might not be right to try too many things which may
not be required later, but most of the things we have now in new
bindings are actually required.

> Putting this stuff in C without any philosophical constraints on whether
> or not we can change it later is the way to go.

I don't agree to that :)

Nishanth Menon May 15, 2015, 3:43 p.m. UTC | #20

On 05/15/2015 09:15 AM, Viresh Kumar wrote:
> On 14-05-15, 03:25, Michael Turquette wrote:
>> No, we don't understand the problem space well enough to form an ABI.
> 
> And why do you think so? We have been facing many problems since a
> long time which we are trying to solve here.

I would state "problem space is better defined now based on data made
public by developers on various SoCs", this new binding seems to
address majority of the concerns (esp with vendor specific
extensions). OPP behavior is very SoC vendor specific -> it can only
evolve with an extensible framework - which is what this new binding
provides. This is something that was badly missing in the older
binding and framework (I should blame myself for it), even though the
previous definitions were simple, in effect it was inflexible to the
detriment of many SoCs.

Do we know 100% if the new binding solves every SoC's issues - we wont
be able to do that unless folks speak up - but then, providing ability
for vendor specific extension allows to consolidate and make common as
necessary.

Point blank rejection might be a bit of an overkill, IMHO.

> 
> I agree that it might not be right to try too many things which may
> not be required later, but most of the things we have now in new
> bindings are actually required.
> 
>> Putting this stuff in C without any philosophical constraints on whether
>> or not we can change it later is the way to go.
> 
> I don't agree to that :)
> 
I second Viresh on this. Benefit of forcing data separation into
device tree has provided the flexibility now to be able to loadup OPPs
from bootloader OR over DTC overlay as desired - that is the right
choice rather than embedding it within C code, providing kludgy
extension options to provide dynamic data updates.

Rob Herring May 15, 2015, 5:27 p.m. UTC | #21

On Wed, May 13, 2015 at 7:32 PM, Michael Turquette
<mike.turquette@linaro.org> wrote:
> Quoting Mark Brown (2015-05-13 04:03:57)
>> On Wed, May 13, 2015 at 02:25:28PM +0530, Viresh Kumar wrote:
>> > On 12-05-15, 14:42, Michael Turquette wrote:
>> > > Quoting Viresh Kumar (2015-04-30 05:07:59)
>> >
>> > > Why should this new binding exist?
>>
>> > The answer to this particular query is perhaps simple, i.e. we have
>> > unsolved problems that we wanted to solve in a generic way.
>>
>> > But probably the bigger question is "Should we really put the OPPs
>> > (new or old bindings) in DT".
>>
>> And also is trying to do this in a completely generic manner the right
>> way of going about things - do we really understand the problem area
>> well enough to create a completely generic solution for all cases,
>> bearing in mind that once things go into a DT binding they become an ABI?
>
> No, we don't understand the problem space well enough to form an ABI.
> And even if we did understand it perfectly today, the constraints and
> expressions will change tomorrow.
>
> Putting this stuff in C without any philosophical constraints on whether
> or not we can change it later is the way to go.

Having a binding does not preclude you from doing your own thing in C.
You can choose to ignore it (or parts of it). It also doesn't mean you
have to use a generic driver with a generic binding.

While yes, we could do the same thing in C, the reality is that we
often don't force common data into common structures. This leads to
obfuscating what are the real h/w differences because everyone does
things their own way. Conveniently, struct dev_pm_opp is a perfect
example here. We already know that just freq and volt alone are not
enough, yet that is all the kernel structure has.

The way you avoid the ABI issue is you create a new one which is what
we've done here. I'm not saying we want 10 different OPP bindings. If
we are back here in a year with v3, then yes you are probably right.
But I believe v2 is a framework that can last. Most of the review
issues have resulted in simply dropping properties (until needed)
without otherwise changing the binding. I think that demonstrates this
bindiing is flexible and extendable.

Rob

Stephen Boyd May 20, 2015, 12:51 a.m. UTC | #22

On 04/30/15 05:07, Viresh Kumar wrote:
> Current OPP (Operating performance point) DT bindings are proven to be
> insufficient at multiple instances.
>
> The shortcomings we are trying to solve here:
>
> - Getting clock sharing information between CPUs. Single shared clock vs
>   independent clock per core vs shared clock per cluster.
>
> - Support for specifying current levels along with voltages.
>
> - Support for multiple regulators.
>
> - Support for turbo modes.
>
> - Other per OPP settings: transition latencies, disabled status, etc.?
>
> - Expandability of OPPs in future.
>
> This patch introduces new bindings "operating-points-v2" to get these problems
> solved. Refer to the bindings for more details.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

This looks like it covers my usecases. I'm not sure if I'm going to put
all of this data in DT because of the philosophical question regarding
what should be in DT and the technical decisions that some platforms I'm
working on have made to encode parts of the OPPs in fuses, but at least
the binding looks to support all the scenarios I have today.

> ---
>  Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++-
>  1 file changed, 362 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> index 74499e5033fc..3b67a5c8d965 100644
> --- a/Documentation/devicetree/bindings/power/opp.txt
> +++ b/Documentation/devicetree/bindings/power/opp.txt
> @@ -1,8 +1,366 @@
> -* Generic OPP Interface
> +Generic OPP (Operating Performance Points) Bindings
> +----------------------------------------------------
>  
> -SoCs have a standard set of tuples consisting of frequency and
> -voltage pairs that the device will support per voltage domain. These
> -are called Operating Performance Points or OPPs.
> +Devices work at voltage-current-frequency triplets and some implementations have
> +the liberty of choosing these. These triplets are called Operating Performance
> +Points aka OPPs. This document defines bindings for these OPPs applicable across
> +wide range of devices. For illustration purpose, this document uses CPU as a
> +device.
> +
> +
> +* Property: operating-points-v2
> +
> +Devices supporting OPPs must set their "operating-points-v2" property with
> +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle
> +to find the operating points for the device.
> +
> +
> +* OPP Descriptor Node
> +
> +This describes the OPPs belonging to a device. This node can have following
> +properties:
> +
> +Required properties:
> +- compatible: Allow OPPs to express their compatibility. It should be:
> +  "operating-points-v2".
> +- OPP nodes: One or more OPP nodes describing voltage-current-frequency
> +  triplets. Their name isn't significant but their phandle can be used to
> +  reference an OPP.
> +
> +Optional properties:
> +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> +  Missing property means devices have independent clock/voltage/current lines,
> +  but they share OPP tables.
> +
> +
> +* OPP Node
> +
> +This defines voltage-current-frequency triplets along with other related
> +properties.
> +
> +Required properties:
> +- opp-khz: Frequency in kHz

Can this be in Hz? I thought there were some problems with kHz and Hz
before where we wanted to make this into Hz so that rounding problems
don't arise?

Also I wonder if all properties should be optional? I don't have this
scenario today, but perhaps the frequencies could be encoded in fuses,
but the voltages wouldn't be and so we might want to read out the
frequencies for a fixed set of voltages. Of course, if there's nothing
in the OPP node at all, it's not very useful, so perhaps some statement
that at least one of the frequency/voltage/amperage properties should be
present.

> +
> +Optional properties:
> +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's voltage is specified with an array of size one or three.
> +  Single entry is for target voltage and three entries are for <target min max>
> +  voltages.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node.
> +
> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's current is specified with an array of size one or three.
> +  Single entry is for target current and three entries are for <target min max>
> +  currents.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node. If few regulators don't provide
> +  capability to configure current, then values for then should be marked as

s/then/them/

> +  zero.
> +
> +- clock-latency-ns: Specifies the maximum possible transition latency (in
> +  nanoseconds) for switching to this OPP from any other OPP.
> +- turbo-mode: Marks the OPP to be used only for turbo modes.
> +- status: Marks the node enabled/disabled.

Please put some space between properties so they can be found quickly:

clock-latency-ns:

turbo-mode:

status:

...

Viresh Kumar May 20, 2015, 2:07 a.m. UTC | #23

On 19-05-15, 17:51, Stephen Boyd wrote:
> This looks like it covers my usecases. I'm not sure if I'm going to put
> all of this data in DT because of the philosophical question regarding
> what should be in DT and the technical decisions that some platforms I'm
> working on have made to encode parts of the OPPs in fuses, but at least
> the binding looks to support all the scenarios I have today.

Thanks.

> > +Required properties:
> > +- opp-khz: Frequency in kHz
> 
> Can this be in Hz? I thought there were some problems with kHz and Hz
> before where we wanted to make this into Hz so that rounding problems
> don't arise?

Yeah, good point.

> Also I wonder if all properties should be optional? I don't have this
> scenario today, but perhaps the frequencies could be encoded in fuses,
> but the voltages wouldn't be and so we might want to read out the
> frequencies for a fixed set of voltages. Of course, if there's nothing
> in the OPP node at all, it's not very useful, so perhaps some statement
> that at least one of the frequency/voltage/amperage properties should be
> present.

I am not sure. What we are trying to do (fill partially in DT and
partially in platform), is a trick and not the right use of bindings.

Ideally whatever is passed in DT should be complete by itself and
doesn't require platform to tweak it (which it can't). For example,
the cpufreq-dt driver will try to initialize OPPs from the DT directly
and wouldn't know about the platform tweaks. That can work eventually
as platform will add OPPs for the same bindings before cpufreq driver
will try to do, but that's a trick.

And then its all about frequency in the first place, and so marking
that optional looks wrong. Probably not the right use of these
bindings.

> > +  Entries for multiple regulators must be present in the same order as
> > +  regulators are specified in device's DT node. If few regulators don't provide
> > +  capability to configure current, then values for then should be marked as
> 
> s/then/them/

Thanks. But this part is already re-written and doesn't have this
issue anymore.

> Please put some space between properties so they can be found quickly:

Sure, thanks.

Stephen Boyd May 20, 2015, 7:39 p.m. UTC | #24

On 05/19/15 19:07, Viresh Kumar wrote:
> On 19-05-15, 17:51, Stephen Boyd wrote:
>
>> Also I wonder if all properties should be optional? I don't have this
>> scenario today, but perhaps the frequencies could be encoded in fuses,
>> but the voltages wouldn't be and so we might want to read out the
>> frequencies for a fixed set of voltages. Of course, if there's nothing
>> in the OPP node at all, it's not very useful, so perhaps some statement
>> that at least one of the frequency/voltage/amperage properties should be
>> present.
> I am not sure. What we are trying to do (fill partially in DT and
> partially in platform), is a trick and not the right use of bindings.
>
> Ideally whatever is passed in DT should be complete by itself and
> doesn't require platform to tweak it (which it can't). For example,
> the cpufreq-dt driver will try to initialize OPPs from the DT directly
> and wouldn't know about the platform tweaks. That can work eventually
> as platform will add OPPs for the same bindings before cpufreq driver
> will try to do, but that's a trick.
>
> And then its all about frequency in the first place, and so marking
> that optional looks wrong. Probably not the right use of these
> bindings.

Ok then I won't be using these bindings on any of the new platforms I
have where half the data is in one place, and half in another. But for
some of Krait based platforms I have they should be useable.

Viresh Kumar May 21, 2015, 4:33 a.m. UTC | #25

On 20-05-15, 12:39, Stephen Boyd wrote:
> On 05/19/15 19:07, Viresh Kumar wrote:
> > On 19-05-15, 17:51, Stephen Boyd wrote:
> >
> >> Also I wonder if all properties should be optional? I don't have this
> >> scenario today, but perhaps the frequencies could be encoded in fuses,
> >> but the voltages wouldn't be and so we might want to read out the
> >> frequencies for a fixed set of voltages. Of course, if there's nothing
> >> in the OPP node at all, it's not very useful, so perhaps some statement
> >> that at least one of the frequency/voltage/amperage properties should be
> >> present.
> > I am not sure. What we are trying to do (fill partially in DT and
> > partially in platform), is a trick and not the right use of bindings.
> >
> > Ideally whatever is passed in DT should be complete by itself and
> > doesn't require platform to tweak it (which it can't). For example,
> > the cpufreq-dt driver will try to initialize OPPs from the DT directly
> > and wouldn't know about the platform tweaks. That can work eventually
> > as platform will add OPPs for the same bindings before cpufreq driver
> > will try to do, but that's a trick.
> >
> > And then its all about frequency in the first place, and so marking
> > that optional looks wrong. Probably not the right use of these
> > bindings.
> 
> Ok then I won't be using these bindings on any of the new platforms I
> have where half the data is in one place, and half in another. But for
> some of Krait based platforms I have they should be useable.

You are not the only one, I have seen other requests (even for the
existing bindings) to fill stuff partially in DT as they want freq to
come from bootloader.

@Rob: What do you suggest for such platforms? Let them keep (ab)using
old or new OPP DT bindings or create another binding to just pass on
freq table?

@Stephen: Can you please provide your feedback on the updated version
please. Thanks.

Nishanth Menon May 21, 2015, 6:02 a.m. UTC | #26

On 05/13/2015 03:55 AM, Viresh Kumar wrote:
[...]
>> It seems wrong to me that the clock and supply data is owned by the cpu
>> node, and not the opp descriptor. Everything about the opp transition
>> should belong to a provider node. Then the cpu simply needs to consume
>> that via a phandle.
> 
> https://lists.linaro.org/pipermail/linaro-kernel/2014-December/019505.html

This argues that clock is an input to the cpu, this is not in-correct,
but, it could also be argued that OPP tables are clock dependent.

For example, with multiple clock source options that a device might
choose to select from internally(by some means.. lets not just restrict
ourselves to just CPUs here for a moment), the tables might be
different. We can always debate that this then is the responsibility of
the driver handling the description for that device and we might want
possibility of vice versa as well - same OPP table used by different
clock source selections as well.

Viresh Kumar May 22, 2015, 2:04 p.m. UTC | #27

On 21-05-15, 01:02, Nishanth Menon wrote:
> This argues that clock is an input to the cpu, this is not in-correct,
> but, it could also be argued that OPP tables are clock dependent.
> 
> For example, with multiple clock source options that a device might
> choose to select from internally(by some means.. lets not just restrict
> ourselves to just CPUs here for a moment), the tables might be
> different. We can always debate that this then is the responsibility of
> the driver handling the description for that device and we might want
> possibility of vice versa as well - same OPP table used by different
> clock source selections as well.

@Rob: Any inputs ?

Rob Herring May 22, 2015, 4:04 p.m. UTC | #28

On Fri, May 22, 2015 at 9:04 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 21-05-15, 01:02, Nishanth Menon wrote:
>> This argues that clock is an input to the cpu, this is not in-correct,
>> but, it could also be argued that OPP tables are clock dependent.

What piece of h/w is the clock an input to then?

>> For example, with multiple clock source options that a device might
>> choose to select from internally(by some means.. lets not just restrict
>> ourselves to just CPUs here for a moment), the tables might be
>> different. We can always debate that this then is the responsibility of
>> the driver handling the description for that device and we might want
>> possibility of vice versa as well - same OPP table used by different
>> clock source selections as well.
>
> @Rob: Any inputs ?

If you are going to describe this clock mux in DT, then that mux
should be part of the h/w block that controls it. You could always add
entries that describe what parent clock must be used for a given OPP,
but that is a new binding and not part of the existing clock binding.

If things get too complicated, then don't try to describe this in DT.
That is always an option.

Rob

Nishanth Menon May 22, 2015, 5:42 p.m. UTC | #29

On 05/22/2015 11:04 AM, Rob Herring wrote:
> On Fri, May 22, 2015 at 9:04 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> On 21-05-15, 01:02, Nishanth Menon wrote:
>>> This argues that clock is an input to the cpu, this is not in-correct,
>>> but, it could also be argued that OPP tables are clock dependent.
> 
> What piece of h/w is the clock an input to then?

The case(from one of the SoCs I deal with) I had in my mind was
something as follows:
GPU can get it's clock from a tree derived out of DPLL X OR a clock
derived out of DPLL Y.

Frequencies a,b,c can only be supported on DPLL X, where as frequencies
d,e,f on DPLL Y based tree.

There are many ways to skin the cat in this case.. if the clk mux is
glitchless, then a-f can be supported, else, we have to have two opp
tables which are selected by driver based on clock parent.

With the case where we cannot switch between the DPLLs, the question was
interesting if we decided to hook up the clock into the opp table based
on the fact that the frequencies are based of the clock.

The final clock feeding in to GPU, is the mux clock - every thing else
gets hidden by CCF, unless the driver is aware of the clock topology
(which we dont like to see in driver, since the driver is supposed to
work across multiple SoCs) - Now, the OPP tables would obviously be
based on which DPLL the source is from.

Our interest was not to have SoC specificity inside the driver and help
try and describe everything within DT itself - including the choice of
DPLL X based or DPLL Y based selection being made based on board (each
board tends to be targetted for some unique performance needs based on
usecase the board is being planned to be used for).

Ofcourse, with some platform specific code, this might be very easy to
do as well.. so not really very hard reasoning for me to have clock in
OPP table description itself.

> 
>>> For example, with multiple clock source options that a device might
>>> choose to select from internally(by some means.. lets not just restrict
>>> ourselves to just CPUs here for a moment), the tables might be
>>> different. We can always debate that this then is the responsibility of
>>> the driver handling the description for that device and we might want
>>> possibility of vice versa as well - same OPP table used by different
>>> clock source selections as well.
>>
>> @Rob: Any inputs ?
> 
> If you are going to describe this clock mux in DT, then that mux
> should be part of the h/w block that controls it. You could always add
> entries that describe what parent clock must be used for a given OPP,
> but that is a new binding and not part of the existing clock binding.
> 
> If things get too complicated, then don't try to describe this in DT.
> That is always an option.

Yes, ofcourse... it is always an option, we just tend to like to have
all data in the same place if possible - DT is the preferred approach.

Viresh Kumar May 25, 2015, 11:59 a.m. UTC | #30

On 21-05-15, 10:03, Viresh Kumar wrote:
> On 20-05-15, 12:39, Stephen Boyd wrote:
> > On 05/19/15 19:07, Viresh Kumar wrote:
> > > On 19-05-15, 17:51, Stephen Boyd wrote:
> > >
> > >> Also I wonder if all properties should be optional? I don't have this
> > >> scenario today, but perhaps the frequencies could be encoded in fuses,
> > >> but the voltages wouldn't be and so we might want to read out the
> > >> frequencies for a fixed set of voltages. Of course, if there's nothing
> > >> in the OPP node at all, it's not very useful, so perhaps some statement
> > >> that at least one of the frequency/voltage/amperage properties should be
> > >> present.
> > > I am not sure. What we are trying to do (fill partially in DT and
> > > partially in platform), is a trick and not the right use of bindings.
> > >
> > > Ideally whatever is passed in DT should be complete by itself and
> > > doesn't require platform to tweak it (which it can't). For example,
> > > the cpufreq-dt driver will try to initialize OPPs from the DT directly
> > > and wouldn't know about the platform tweaks. That can work eventually
> > > as platform will add OPPs for the same bindings before cpufreq driver
> > > will try to do, but that's a trick.
> > >
> > > And then its all about frequency in the first place, and so marking
> > > that optional looks wrong. Probably not the right use of these
> > > bindings.
> > 
> > Ok then I won't be using these bindings on any of the new platforms I
> > have where half the data is in one place, and half in another. But for
> > some of Krait based platforms I have they should be useable.
> 
> You are not the only one, I have seen other requests (even for the
> existing bindings) to fill stuff partially in DT as they want freq to
> come from bootloader.
> 
> @Rob: What do you suggest for such platforms? Let them keep (ab)using
> old or new OPP DT bindings or create another binding to just pass on
> freq table?

Rob: Can you please comment on this one as well ? Thanks.

Viresh Kumar May 26, 2015, 5:25 a.m. UTC | #31

On 22-05-15, 12:42, Nishanth Menon wrote:
> Yes, ofcourse... it is always an option, we just tend to like to have
> all data in the same place if possible - DT is the preferred approach.

If we are done, may I send the next version incorporating your
suggestions ?

[V4,1/3] OPP: Redefine bindings to overcome shortcomings

Commit Message

Comments

Patch