
[RFC,v4,3/3] Documentation: arm: define DT idle states bindings

Message ID 1392724051-11950-4-git-send-email-lorenzo.pieralisi@arm.com
State New, archived

Commit Message

Lorenzo Pieralisi Feb. 18, 2014, 11:47 a.m. UTC
ARM based platforms implement a variety of power management schemes that
allow processors to enter idle states at run-time.
The parameters defining these idle states vary on a per-platform basis, forcing
the OS to hardcode the state parameters in platform-specific static tables
whose size grows as the number of platforms supported in the kernel increases;
this also hampers device driver standardization.

Therefore, this patch aims at standardizing idle state device tree bindings for
ARM platforms. Bindings define idle state parameters inclusive of entry methods
and state latencies, to allow operating systems to retrieve the configuration
entries from the device tree and initialize the related power management
drivers, paving the way for common code in the kernel to deal with idle
states and removing the need for static data in current and previous kernel
versions.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 Documentation/devicetree/bindings/arm/cpus.txt        |  10 +
 Documentation/devicetree/bindings/arm/idle-states.txt | 781 +++++
 2 files changed, 791 insertions(+)

Comments

Sebastian Capella Feb. 19, 2014, 4:04 p.m. UTC | #1
Quoting Lorenzo Pieralisi (2014-02-18 03:47:31)
> +       - index
> +               Usage: Required
> +               Value type: <u32>
> +               Definition: It represents the idle state index.
> +                           An increasing index value implies less power
> +                           consumption. Index must be given a sequential
> +                           value = {0, 1, ....}, starting from 0.
One minor comment.  In the example, it can be tricky to see how this is sequential
since the states interleave.  Not sure if it merits rewording here?

These look good to me!

Thanks!

Sebastian
Lorenzo Pieralisi March 10, 2014, 6:01 p.m. UTC | #2
On Wed, Feb 19, 2014 at 04:04:49PM +0000, Sebastian Capella wrote:
> Quoting Lorenzo Pieralisi (2014-02-18 03:47:31)
> > +       - index
> > +               Usage: Required
> > +               Value type: <u32>
> > +               Definition: It represents the idle state index.
> > +                           An increasing index value implies less power
> > +                           consumption. Index must be given a sequential
> > +                           value = {0, 1, ....}, starting from 0.
> One minor comment.  In the example, it can be tricky to see how this is sequential
> since the states interleave.  Not sure if it merits rewording here?

- index
	Usage: Required
	Value type: <u32>
	Definition: It represents the idle state index.
		    The index must be given an increasing
		    value = {0, 1, ....}, starting from 0, with higher
		    values implying less power consumption.
		    Indices must be unique as seen from a cpu
		    perspective, ie phandles in the cpu nodes [1]
		    cpu-idle-states array property are not allowed to
		    point at idle state nodes having the same index
		    value.
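
Read strictly, the reworded definition implies two checks an OS could run for each cpu node: the index values reached through its cpu-idle-states phandles must be unique, and they must cover {0, 1, ..., n - 1} with no gaps. A minimal C sketch, assuming the index values have already been read from the referenced state nodes into an array; idle_state_indices_valid() and MAX_IDLE_STATES are hypothetical names, not an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on idle states per CPU (illustration only). */
#define MAX_IDLE_STATES 16

/*
 * Return true if the index values of the states referenced by one cpu
 * node's cpu-idle-states property are unique and form {0, 1, ..., n - 1}.
 */
bool idle_state_indices_valid(const uint32_t *idx, size_t n)
{
	bool seen[MAX_IDLE_STATES] = { false };

	assert(idx != NULL || n == 0);
	if (n > MAX_IDLE_STATES)
		return false;

	for (size_t i = 0; i < n; i++) {
		/* An index outside 0..n-1 means the sequence has a gap. */
		if (idx[i] >= n)
			return false;
		/* Two phandles point at states with the same index. */
		if (seen[idx[i]])
			return false;
		seen[idx[i]] = true;
	}
	/* n unique values, all within 0..n-1: no gaps, no duplicates. */
	return true;
}
```

For instance, {2, 0, 1} passes whatever the node order in the DT, while {0, 2} fails because index 1 is missing and {0, 0} fails the uniqueness rule.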

Ack ?

I will post a v5; it should be final.

Lorenzo
Sebastian Capella March 10, 2014, 6:22 p.m. UTC | #3
On 10 March 2014 11:01, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> Ack ?

Hi Lorenzo,

Yes, ack from me.

 Acked-by: Sebastian Capella <sebastian.capella@linaro.org>

Thanks!

Sebastian
Rob Herring March 10, 2014, 7:13 p.m. UTC | #4
On Tue, Feb 18, 2014 at 5:47 AM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> ARM based platforms implement a variety of power management schemes that
> allow processors to enter idle states at run-time.
> The parameters defining these idle states vary on a per-platform basis, forcing
> the OS to hardcode the state parameters in platform-specific static tables
> whose size grows as the number of platforms supported in the kernel increases;
> this also hampers device driver standardization.
>
> Therefore, this patch aims at standardizing idle state device tree bindings for
> ARM platforms. Bindings define idle state parameters inclusive of entry methods
> and state latencies, to allow operating systems to retrieve the configuration
> entries from the device tree and initialize the related power management
> drivers, paving the way for common code in the kernel to deal with idle
> states and removing the need for static data in current and previous kernel
> versions.
>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  Documentation/devicetree/bindings/arm/cpus.txt        |  10 +
>  Documentation/devicetree/bindings/arm/idle-states.txt | 781 +++++
>  2 files changed, 791 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
> index 9130435..fd1fd8d 100644
> --- a/Documentation/devicetree/bindings/arm/cpus.txt
> +++ b/Documentation/devicetree/bindings/arm/cpus.txt
> @@ -191,6 +191,13 @@ nodes to be present and contain the properties described below.
>                           property identifying a 64-bit zero-initialised
>                           memory location.
>
> +       - cpu-idle-states
> +               Usage: Optional
> +               Value type: <prop-encoded-array>
> +               Definition:
> +                       # List of phandles to idle state nodes supported
> +                         by this cpu [1].
> +
>  Example 1 (dual-cluster big.LITTLE system 32-bit):
>
>         cpus {
> @@ -382,3 +389,6 @@ cpus {
>                 cpu-release-addr = <0 0x20000000>;
>         };
>  };
> +
> +[1] ARM Linux kernel documentation - idle states bindings
> +    Documentation/devicetree/bindings/arm/idle-states.txt
> diff --git a/Documentation/devicetree/bindings/arm/idle-states.txt b/Documentation/devicetree/bindings/arm/idle-states.txt
> new file mode 100644
> index 0000000..f9a48a1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/idle-states.txt
> @@ -0,0 +1,781 @@
> +==========================================
> +ARM idle states binding description
> +==========================================
> +
> +==========================================
> +1 - Introduction
> +==========================================
> +
> +ARM systems contain HW capable of managing power consumption dynamically,
> +where cores can be put in different low-power states (ranging from simple
> +wfi to power gating) according to OSPM policies. The CPU states representing
> +the range of dynamic idle states that a processor can enter at run-time can be
> +specified through device tree bindings representing the parameters required
> +to enter/exit specific idle states on a given processor.
> +
> +According to the Server Base System Architecture document (SBSA, [4]), the
> +power states an ARM CPU can be put into are identified by the following list:
> +
> +- Running
> +- Idle_standby
> +- Idle_retention
> +- Sleep
> +- Off
> +
> +The power states described in the SBSA document define the basic CPU states on
> +top of which ARM platforms implement power management schemes that allow an OS
> +PM implementation to put the processor in different idle states (which include
> +states listed above; "off" state is not an idle state since it does not have
> +wake-up capabilities, hence it is not considered in this document).

Are SBSA compliant systems your only target? If so, we obviously don't
need this, since those will all be using ACPI. :)

Either way I'd like to see some real usage of this binding. We
continue to add more and more complexity to cpu related DT bindings
with very little actual use. We don't need bindings for how ARM thinks
h/w should work. We need bindings for how h/w actually works.

I continue to be confused why we added cpu topology bindings yet don't
add information that applies to certain levels in the topology. This
makes me think the topology should just be built into /cpus.
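
The SBSA state list quoted above, combined with the rules later in the quoted patch (Off is excluded because it has no wake-up capability; Idle_standby, entered via wfi, is standard on all ARM platforms and must not be listed), leaves Idle_retention and Sleep as the SBSA classes a state node would describe, with platform-specific states built on top of them. A minimal C sketch of that reading; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* CPU power states as listed in the SBSA excerpt quoted in the patch. */
enum sbsa_power_state {
	SBSA_RUNNING,
	SBSA_IDLE_STANDBY,
	SBSA_IDLE_RETENTION,
	SBSA_SLEEP,
	SBSA_OFF,
};

/*
 * Hypothetical helper: could a DT state node describe a state of this class?
 *  - Running is not an idle state.
 *  - Idle_standby (wfi) is standard on all ARM platforms: never listed.
 *  - Off has no wake-up capability, so it is not an idle state.
 */
bool sbsa_state_needs_dt_node(enum sbsa_power_state s)
{
	assert((unsigned int)s <= SBSA_OFF);

	switch (s) {
	case SBSA_IDLE_RETENTION:
	case SBSA_SLEEP:
		return true;
	default:
		return false;
	}
}
```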

> +
> +Idle state parameters (eg entry latency) are platform specific and need to be
> +characterized with bindings that provide the required information to OSPM
> +code so that it can build the required tables and use them at runtime.
> +
> +The device tree binding definition for ARM idle states is the subject of this
> +document.
> +
> +===========================================
> +2 - idle-states node
> +===========================================
> +
> +ARM processor idle states are defined within the idle-states node, which is
> +a direct child of the cpus node and provides a container where the processor
> +idle states, defined as device tree nodes, are listed.
> +
> +- idle-states node
> +
> +       Usage: Optional - On ARM systems, it is a container of processor idle
> +                         state nodes. If the system does not provide CPU
> +                         power management capabilities, or the processor only
> +                         supports idle_standby, an idle-states node is not
> +                         required.
> +
> +       Description: idle-states node is a container node, where its
> +                    subnodes describe the CPU idle states.
> +
> +       Node name must be "idle-states".
> +
> +       The idle-states node's parent node must be the cpus node.
> +
> +       The idle-states node's child nodes can be:
> +
> +       - one or more state nodes
> +
> +       Any other configuration is considered invalid.
> +
> +       An idle-states node defines the following properties:
> +
> +       - entry-method
> +               Usage: Required
> +               Value type: <stringlist>
> +               Definition: Describes the method by which a CPU enters the
> +                           idle states. This property is required and must be
> +                           one of:
> +
> +                           - "arm,psci-cpu-suspend"
> +                             ARM PSCI firmware interface, CPU suspend
> +                             method [3].
> +
> +                           - "[vendor],[method]"
> +                             An implementation dependent string with
> +                             format "vendor,method", where vendor is a string
> +                             denoting the name of the manufacturer and
> +                             method is a string specifying the mechanism
> +                             used to enter the idle state.
> +
> +The nodes describing the idle states (state) can only be defined within the
> +idle-states node.
> +
> +Any other configuration is considered invalid and therefore must be ignored.
> +
> +===========================================
> +3 - state node
> +===========================================
> +
> +A state node represents an idle state description and must be defined as
> +follows:
> +
> +- state node
> +
> +       Description: must be a child of either the idle-states node or
> +                    a state node.
> +
> +       The state node name shall follow standard device tree naming
> +       rules ([6], 2.2.1 "Node names"), in particular state nodes which
> +       are siblings within a single common parent must be given a unique name.
> +
> +       The idle state entered by executing the wfi instruction (idle_standby
> +       SBSA,[4][5]) is considered standard on all ARM platforms and therefore
> +       must not be listed.
> +
> +       A state node can contain state child nodes. A state node with
> +       children represents a hierarchical state, which is a superset of
> +       the child states. Hierarchical states require all CPUs on which
> +       they are valid (ie cpu nodes [1] containing cpu-idle-states arrays
> +       having a phandle to the state) to request the state in order for it
> +       to be entered.
> +
> +       A state node defines the following properties:
> +
> +       - compatible
> +               Usage: Required
> +               Value type: <stringlist>
> +               Definition: Must be "arm,idle-state".
> +
> +       - index
> +               Usage: Required
> +               Value type: <u32>
> +               Definition: It represents the idle state index.
> +                           An increasing index value implies less power
> +                           consumption. Index must be given a sequential
> +                           value = {0, 1, ....}, starting from 0.
> +                           Phandles in the cpu nodes [1] cpu-idle-states
> +                           array property are not allowed to point at idle
> +                           state nodes having the same index value.

Generally, we don't do indexes in DT. Why is this not just the order
of states defined in the DT?

cpuidle wants to know the power consumption for a state as well as
latencies. While I'm not for just putting what Linux wants into DT,
that does seem like a h/w property. How do you plan to handle that?
Maybe it is deemed to not really be useful information. After all, I
just made shit up for highbank.

> +
> +       - logic-state-retained
> +               Usage: See definition
> +               Value type: <none>
> +               Definition: If present, logic is retained on state entry,
> +                           otherwise it is lost.
> +
> +       - cache-state-retained
> +               Usage: See definition
> +               Value type: <none>
> +               Definition: If present, cache memory is retained on state entry,
> +                           otherwise it is lost.
> +
> +       - entry-method-param
> +               Usage: See definition.
> +               Value type: <u32>
> +               Definition: Depends on the idle-states node entry-method
> +                           property value. Refer to the entry-method bindings
> +                           for this property value definition.
> +
> +       - entry-latency
> +               Usage: Required
> +               Value type: <prop-encoded-array>
> +               Definition: u32 value representing worst case latency
> +                           in microseconds required to enter the idle state.

Append times with the unit. "-us" in this case.

> +
> +       - exit-latency
> +               Usage: Required
> +               Value type: <prop-encoded-array>
> +               Definition: u32 value representing worst case latency
> +                           in microseconds required to exit the idle state.

ditto

> +
> +       - min-residency
> +               Usage: Required
> +               Value type: <prop-encoded-array>
> +               Definition: u32 value representing time in microseconds
> +                           required for the CPU to be in the idle state to
> +                           break even in power consumption terms compared
> +                           to idle state idle_standby ([4][5]).

ditto
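
The three timing properties quoted above are what an OS idle governor would consume. The sketch below picks the deepest state (highest index, hence lowest power under the binding's ordering) whose min-residency fits the predicted idle time and whose entry plus exit latency fits a latency limit, falling back to idle_standby (wfi), which the binding says is never listed. The struct, the field names (which borrow Rob's suggested "-us" suffix), and the policy of summing entry and exit latency are assumptions for illustration; this is not the Linux cpuidle governor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of one state node's timing properties. */
struct idle_state {
	uint32_t index;            /* "index": higher means less power */
	uint32_t entry_latency_us; /* worst-case entry latency */
	uint32_t exit_latency_us;  /* worst-case exit latency */
	uint32_t min_residency_us; /* break-even time vs. idle_standby */
};

/* Returned when no listed state fits: use plain wfi (idle_standby). */
#define IDLE_STANDBY (-1)

/*
 * Pick the deepest listed state that (a) is expected to last at least its
 * min-residency and (b) whose entry + exit latency stays within the limit.
 * States may be given in any order; depth is taken from the index.
 */
int select_idle_state(const struct idle_state *s, size_t n,
		      uint32_t predicted_idle_us, uint32_t latency_limit_us)
{
	int best = IDLE_STANDBY;
	uint32_t best_index = 0;

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t wake_cost = (uint64_t)s[i].entry_latency_us +
				     s[i].exit_latency_us;

		if (s[i].min_residency_us > predicted_idle_us)
			continue; /* would not break even */
		if (wake_cost > latency_limit_us)
			continue; /* would violate the latency limit */
		if (best == IDLE_STANDBY || s[i].index > best_index) {
			best = (int)i;
			best_index = s[i].index;
		}
	}
	return best; /* position in s[], or IDLE_STANDBY */
}
```

With the two states visible in the patch's example (cpu-ret-0: 20/40/30 us, cluster-ret-0: 50/100/250 us), a predicted idle time of 100 us selects cpu-ret-0, while 1000 us with a 1000 us latency limit selects cluster-ret-0.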

> +
> +       - power-domains
> +               Usage: Optional
> +               Value type: <prop-encoded-array>
> +               Definition: List of power domain specifiers ([2]) describing
> +                           the power domains that are affected by the idle
> +                           state entry. All devices whose power-domain phandle
> +                           points at one of the power domains listed in this
> +                           property are affected by the idle state entry.
> +
> +
> +===========================================
> +4 - Examples
> +===========================================
> +
> +Example 1 (ARM 64-bit, 16-cpu system):
> +
> +pd_clusters: power-domain-clusters@80002000 {
> +       compatible = "arm,power-controller";
> +       reg = <0x0 0x80002000 0x0 0x1000>;
> +       #power-domain-cells = <1>;
> +       #address-cells = <2>;
> +       #size-cells = <2>;
> +
> +       pd_cores: power-domain-cores@80000000 {
> +               compatible = "arm,power-controller";
> +               reg = <0x0 0x80000000 0x0 0x1000>;
> +               #power-domain-cells = <1>;
> +       };
> +};
> +
> +cpus {
> +       #size-cells = <0>;
> +       #address-cells = <2>;
> +
> +       idle-states {
> +               entry-method = "arm,psci-cpu-suspend";
> +
> +               CLUSTER_RET_0: cluster-ret-0 {
> +                       /* cluster retention */
> +                       compatible = "arm,idle-state";
> +                       index = <2>;
> +                       logic-state-retained;
> +                       cache-state-retained;
> +                       entry-method-param = <0x1010000>;
> +                       entry-latency = <50>;
> +                       exit-latency = <100>;
> +                       min-residency = <250>;
> +                       power-domains = <&pd_clusters 0>;
> +                       CPU_RET_0_0: cpu-ret-0 {

As I pointed out, here we have a topology definition and it is
independent of the cpu topology binding.

I'd prefer to see retention spelled out.

> +                               /* cpu retention */

then the comment wouldn't be needed.

> +                               compatible = "arm,idle-state";
> +                               index = <0>;
> +                               cache-state-retained;
> +                               entry-method-param = <0x0010000>;
> +                               entry-latency = <20>;
> +                               exit-latency = <40>;
> +                               min-residency = <30>;
> +                               power-domains = <&pd_cores 0>,
> +                                               <&pd_cores 1>,
> +                                               <&pd_cores 2>,
> +                                               <&pd_cores 3>,
> +                                               <&pd_cores 4>,
> +                                               <&pd_cores 5>,
> +                                               <&pd_cores 6>,
> +                                               <&pd_cores 7>;

I don't like this. The power domain phandle for a core belongs with the core.

What if you have groups of 2 cores in 1 domain? It doesn't work and
that's a very common scenario in current h/w.

Rob
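
The power-domains definition quoted in this review reduces to specifier matching: a device is affected by a state's entry if its power-domain specifier equals one of those listed in the state's power-domains property. A minimal C sketch, assuming one-cell specifiers (#power-domain-cells = <1>, as in the example) and modeling a phandle as a plain integer; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A power domain specifier with #power-domain-cells = <1>, i.e.
 * <&controller cell>. The phandle is modeled as a plain integer.
 */
struct pd_spec {
	uint32_t phandle;
	uint32_t cell;
};

/* True if dev_pd matches any specifier in a state's power-domains list. */
bool state_affects_device(const struct pd_spec *state_pds, size_t n,
			  struct pd_spec dev_pd)
{
	assert(state_pds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (state_pds[i].phandle == dev_pd.phandle &&
		    state_pds[i].cell == dev_pd.cell)
			return true;
	}
	return false;
}
```

The check only says which domains a state touches; it does not by itself say which core sits in which domain, which is the concern raised in Rob's comment above.
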
Lorenzo Pieralisi March 11, 2014, 12:51 p.m. UTC | #5
On Mon, Mar 10, 2014 at 07:13:04PM +0000, Rob Herring wrote:
> On Tue, Feb 18, 2014 at 5:47 AM, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
> > ARM based platforms implement a variety of power management schemes that
> > allow processors to enter idle states at run-time.
> > The parameters defining these idle states vary on a per-platform basis, forcing
> > the OS to hardcode the state parameters in platform-specific static tables
> > whose size grows as the number of platforms supported in the kernel increases;
> > this also hampers device driver standardization.
> >
> > Therefore, this patch aims at standardizing idle state device tree bindings for
> > ARM platforms. Bindings define idle state parameters inclusive of entry methods
> > and state latencies, to allow operating systems to retrieve the configuration
> > entries from the device tree and initialize the related power management
> > drivers, paving the way for common code in the kernel to deal with idle
> > states and removing the need for static data in current and previous kernel
> > versions.
> >
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > ---
> >  Documentation/devicetree/bindings/arm/cpus.txt        |  10 +
> >  Documentation/devicetree/bindings/arm/idle-states.txt | 781 +++++
> >  2 files changed, 791 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
> > index 9130435..fd1fd8d 100644
> > --- a/Documentation/devicetree/bindings/arm/cpus.txt
> > +++ b/Documentation/devicetree/bindings/arm/cpus.txt
> > @@ -191,6 +191,13 @@ nodes to be present and contain the properties described below.
> >                           property identifying a 64-bit zero-initialised
> >                           memory location.
> >
> > +       - cpu-idle-states
> > +               Usage: Optional
> > +               Value type: <prop-encoded-array>
> > +               Definition:
> > +                       # List of phandles to idle state nodes supported
> > +                         by this cpu [1].
> > +
> >  Example 1 (dual-cluster big.LITTLE system 32-bit):
> >
> >         cpus {
> > @@ -382,3 +389,6 @@ cpus {
> >                 cpu-release-addr = <0 0x20000000>;
> >         };
> >  };
> > +
> > +[1] ARM Linux kernel documentation - idle states bindings
> > +    Documentation/devicetree/bindings/arm/idle-states.txt
> > diff --git a/Documentation/devicetree/bindings/arm/idle-states.txt b/Documentation/devicetree/bindings/arm/idle-states.txt
> > new file mode 100644
> > index 0000000..f9a48a1
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/arm/idle-states.txt
> > @@ -0,0 +1,781 @@
> > +==========================================
> > +ARM idle states binding description
> > +==========================================
> > +
> > +==========================================
> > +1 - Introduction
> > +==========================================
> > +
> > +ARM systems contain HW capable of managing power consumption dynamically,
> > +where cores can be put in different low-power states (ranging from simple
> > +wfi to power gating) according to OSPM policies. The CPU states representing
> > +the range of dynamic idle states that a processor can enter at run-time can be
> > +specified through device tree bindings representing the parameters required
> > +to enter/exit specific idle states on a given processor.
> > +
> > +According to the Server Base System Architecture document (SBSA, [4]), the
> > +power states an ARM CPU can be put into are identified by the following list:
> > +
> > +- Running
> > +- Idle_standby
> > +- Idle_retention
> > +- Sleep
> > +- Off
> > +
> > +The power states described in the SBSA document define the basic CPU states on
> > +top of which ARM platforms implement power management schemes that allow an OS
> > +PM implementation to put the processor in different idle states (which include
> > +states listed above; "off" state is not an idle state since it does not have
> > +wake-up capabilities, hence it is not considered in this document).
> 
> Are SBSA compliant systems your only target? If so, we obviously don't
> need this, since those will all be using ACPI. :)

SBSA defines nomenclature "on top of which ARM platforms implement power
management schemes". I think that's proper wording, ACPI or DT.

> Either way I'd like to see some real usage of this binding. We
> continue to add more and more complexity to cpu related DT bindings
> with very little actual use. We don't need bindings for how ARM thinks
> h/w should work. We need bindings for how h/w actually works.

That's great and that's what these bindings are meant for.
If you and other reviewers out there spot inconsistencies with how
"h/w actually works (TM)", flag this up. I am not posting these bindings
to define how ARM thinks h/w should work; I really do not understand
why you think that's the case.

I will be posting a generic PSCI based CPU idle driver soon.

> I continue to be confused why we added cpu topology bindings yet don't
> add information that applies to certain levels in the topology. This
> makes me think the topology should just be built into /cpus.

And how is that different from cpu-map ?

Are you referring to OPPs ? What do you mean by "built into /cpus" ?

The first reason why we defined the cpu-map was to override MPIDR
configurations. If we want to use that for other reasons (use phandle to
topology nodes to group CPUs) that's still fine.

I told you already: it was not an easy decision to make, and I am
always open to suggestions. If you have a solution in mind, post it.

> > +
> > +Idle state parameters (eg entry latency) are platform specific and need to be
> > +characterized with bindings that provide the required information to OSPM
> > +code so that it can build the required tables and use them at runtime.
> > +
> > +The device tree binding definition for ARM idle states is the subject of this
> > +document.
> > +
> > +===========================================
> > +2 - idle-states node
> > +===========================================
> > +
> > +ARM processor idle states are defined within the idle-states node, which is
> > +a direct child of the cpus node and provides a container where the processor
> > +idle states, defined as device tree nodes, are listed.
> > +
> > +- idle-states node
> > +
> > +       Usage: Optional - On ARM systems, it is a container of processor idle
> > +                         state nodes. If the system does not provide CPU
> > +                         power management capabilities, or the processor only
> > +                         supports idle_standby, an idle-states node is not
> > +                         required.
> > +
> > +       Description: idle-states node is a container node, where its
> > +                    subnodes describe the CPU idle states.
> > +
> > +       Node name must be "idle-states".
> > +
> > +       The idle-states node's parent node must be the cpus node.
> > +
> > +       The idle-states node's child nodes can be:
> > +
> > +       - one or more state nodes
> > +
> > +       Any other configuration is considered invalid.
> > +
> > +       An idle-states node defines the following properties:
> > +
> > +       - entry-method
> > +               Usage: Required
> > +               Value type: <stringlist>
> > +               Definition: Describes the method by which a CPU enters the
> > +                           idle states. This property is required and must be
> > +                           one of:
> > +
> > +                           - "arm,psci-cpu-suspend"
> > +                             ARM PSCI firmware interface, CPU suspend
> > +                             method [3].
> > +
> > +                           - "[vendor],[method]"
> > +                             An implementation dependent string with
> > +                             format "vendor,method", where vendor is a string
> > +                             denoting the name of the manufacturer and
> > +                             method is a string specifying the mechanism
> > +                             used to enter the idle state.
> > +
> > +The nodes describing the idle states (state) can only be defined within the
> > +idle-states node.
> > +
> > +Any other configuration is considered invalid and therefore must be ignored.
> > +
> > +===========================================
> > +3 - state node
> > +===========================================
> > +
> > +A state node represents an idle state description and must be defined as
> > +follows:
> > +
> > +- state node
> > +
> > +       Description: must be a child of either the idle-states node or
> > +                    a state node.
> > +
> > +       The state node name shall follow standard device tree naming
> > +       rules ([6], 2.2.1 "Node names"), in particular state nodes which
> > +       are siblings within a single common parent must be given a unique name.
> > +
> > +       The idle state entered by executing the wfi instruction (idle_standby
> > +       SBSA,[4][5]) is considered standard on all ARM platforms and therefore
> > +       must not be listed.
> > +
> > +       A state node can contain state child nodes. A state node with
> > +       children represents a hierarchical state, which is a superset of
> > +       the child states. Hierarchical states require all CPUs on which
> > +       they are valid (ie cpu nodes [1] containing cpu-idle-states arrays
> > +       having a phandle to the state) to request the state in order for it
> > +       to be entered.
> > +
> > +       A state node defines the following properties:
> > +
> > +       - compatible
> > +               Usage: Required
> > +               Value type: <stringlist>
> > +               Definition: Must be "arm,idle-state".
> > +
> > +       - index
> > +               Usage: Required
> > +               Value type: <u32>
> > +               Definition: It represents the idle state index.
> > +                           An increasing index value implies less power
> > +                           consumption. Index must be given a sequential
> > +                           value = {0, 1, ....}, starting from 0.
> > +                           Phandles in the cpu nodes [1] cpu-idle-states
> > +                           array property are not allowed to point at idle
> > +                           state nodes having the same index value.
> 
> Generally, we don't do indexes in DT. Why is this not just the order
> of states defined in the DT?

Because I need a way to order states in terms of power consumption.

> cpuidle wants to know the power consumption for a state as well as
> latencies. While I'm not for just putting what Linux wants into DT,
> that does seem like a h/w property. How do you plan to handle that?

Linux does not require power consumption for a state anymore. Ordering
is needed (by Linux and possibly other OSes), and that's what index is
supposed to do, increasing indices meaning less power consumption.

Adding a h/w property for power consumption is extremely hard to define
because it depends on loads of parameters and buys us nothing. Ordering
is important, though.
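
The ordering argument can be sketched in C: an OS collects state nodes in whatever order it walks the tree (in the quoted example, cluster-ret-0 with index 2 is the parent of cpu-ret-0 with index 0) and then sorts them by index to get a table running from the shallowest state (index 0) to the deepest. struct parsed_state and sort_states_by_index() are hypothetical names; qsort() is the standard C library routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one parsed state node. */
struct parsed_state {
	const char *name; /* DT node name, e.g. "cpu-ret-0" */
	uint32_t index;   /* value of the "index" property */
};

static int cmp_by_index(const void *a, const void *b)
{
	const struct parsed_state *x = a, *y = b;

	/* Compare without subtraction so large u32 values cannot overflow. */
	return (x->index > y->index) - (x->index < y->index);
}

/*
 * Order states from shallowest (index 0) to deepest (highest index),
 * independently of the order in which the DT nodes were visited.
 */
void sort_states_by_index(struct parsed_state *s, size_t n)
{
	if (n < 2)
		return; /* nothing to sort; also avoids qsort() on NULL */
	assert(s != NULL);
	qsort(s, n, sizeof(*s), cmp_by_index);
}
```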

> Maybe it is deemed to not really be useful information. After all, I
> just made shit up for highbank.

That's great to read; maybe we should NAK this patch and make all data
up in the kernel for the upcoming CPU idle drivers.

Or we improve it and get it in the kernel to revert that status quo.

> > +
> > +       - logic-state-retained
> > +               Usage: See definition
> > +               Value type: <none>
> > +               Definition: If present, logic is retained on state entry,
> > +                           otherwise it is lost.
> > +
> > +       - cache-state-retained
> > +               Usage: See definition
> > +               Value type: <none>
> > +               Definition: If present, cache memory is retained on state entry,
> > +                           otherwise it is lost.
> > +
> > +       - entry-method-param
> > +               Usage: See definition.
> > +               Value type: <u32>
> > +               Definition: Depends on the idle-states node entry-method
> > +                           property value. Refer to the entry-method bindings
> > +                           for this property value definition.
> > +
> > +       - entry-latency
> > +               Usage: Required
> > +               Value type: <prop-encoded-array>
> > +               Definition: u32 value representing worst case latency
> > +                           in microseconds required to enter the idle state.
> 
> Append times with the unit. "-us" in this case.

Ok.

> 
> > +
> > +       - exit-latency
> > +               Usage: Required
> > +               Value type: <prop-encoded-array>
> > +               Definition: u32 value representing worst case latency
> > +                           in microseconds required to exit the idle state.
> 
> ditto
> 
> > +
> > +       - min-residency
> > +               Usage: Required
> > +               Value type: <prop-encoded-array>
> > +               Definition: u32 value representing time in microseconds
> > +                           required for the CPU to be in the idle state to
> > +                           break even in power consumption terms compared
> > +                           to idle state idle_standby ([4][5]).
> 
> ditto
> 
> > +
> > +       - power-domains
> > +               Usage: Optional
> > +               Value type: <prop-encoded-array>
> > +               Definition: List of power domain specifiers ([2]) describing
> > +                           the power domains that are affected by the idle
> > +                           state entry. All devices whose power-domain phandle
> > +                           points at one of the power domains listed in this
> > +                           property are affected by the idle state entry.
> > +
> > +
> > +===========================================
> > +4 - Examples
> > +===========================================
> > +
> > +Example 1 (ARM 64-bit, 16-cpu system):
> > +
> > +pd_clusters: power-domain-clusters@80002000 {
> > +       compatible = "arm,power-controller";
> > +       reg = <0x0 0x80002000 0x0 0x1000>;
> > +       #power-domain-cells = <1>;
> > +       #address-cells = <2>;
> > +       #size-cells = <2>;
> > +
> > +       pd_cores: power-domain-cores@80000000 {
> > +               compatible = "arm,power-controller";
> > +               reg = <0x0 0x80000000 0x0 0x1000>;
> > +               #power-domain-cells = <1>;
> > +       };
> > +};
> > +
> > +cpus {
> > +       #size-cells = <0>;
> > +       #address-cells = <2>;
> > +
> > +       idle-states {
> > +               entry-method = "arm,psci-cpu-suspend";
> > +
> > +               CLUSTER_RET_0: cluster-ret-0 {
> > +                       /* cluster retention */
> > +                       compatible = "arm,idle-state";
> > +                       index = <2>;
> > +                       logic-state-retained;
> > +                       cache-state-retained;
> > +                       entry-method-param = <0x1010000>;
> > +                       entry-latency = <50>;
> > +                       exit-latency = <100>;
> > +                       min-residency = <250>;
> > +                       power-domains = <&pd_clusters 0>;
> > +                       CPU_RET_0_0: cpu-ret-0 {
> 
> As I pointed out, here we have topology definition and it is
> independent of the cpu topology binding.

Early version of the patches used cpu-map here to define on which CPUs
the state is valid. I can remove the cpu-idle-states list of phandles
from the cpu nodes and define a phandle in every idle state pointing at
topology nodes to describe on which CPUs that state is valid.

Is that what you want to see ? BTW, this is the only reason why I have
not posted the generic idle code yet, I want to understand if there is a
dependency on cpu-map parsing code first.

There is another and more important reason: what if the power domain layout
does not follow the topology (a power domain for only two cores in a
cluster of 4) ? Weird, but possible. I am just trying to cater for all
sensible cases from the beginning, and not as an afterthought.

> I'd prefer to see retention spelled out.

Both node name and tag ? That's cumbersome, but I will do it.

> 
> > +                               /* cpu retention */
> 
> then the comment wouldn't be needed.
> 
> > +                               compatible = "arm,idle-state";
> > +                               index = <0>;
> > +                               cache-state-retained;
> > +                               entry-method-param = <0x0010000>;
> > +                               entry-latency = <20>;
> > +                               exit-latency = <40>;
> > +                               min-residency = <30>;
> > +                               power-domains = <&pd_cores 0>,
> > +                                               <&pd_cores 1>,
> > +                                               <&pd_cores 2>,
> > +                                               <&pd_cores 3>,
> > +                                               <&pd_cores 4>,
> > +                                               <&pd_cores 5>,
> > +                                               <&pd_cores 6>,
> > +                                               <&pd_cores 7>;
> 
> I don't like this. The power domain phandle for a core belongs with the core.

The power domains list defines all power domains affected by the idle
state entry. In that specific case, it is a core power-gating state valid on
some of the CPUs, and the list defines all power domains affected.
If we have a separate power domain for caches, or CPU peripherals this
allows us to define what "CPU" components are affected by the idle state
entry.

A CPU becomes just another device, attached to a list of power domains.

And in the process, it avoids replicating the same idle state for every
given CPU.
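A minimal C sketch of the matching rule described above (and in the power-domains property definition): a device is affected by an idle state if its power-domain specifier equals one of the specifiers in the state's power-domains list. The struct layout, the PD_CORES/PD_CLUSTERS provider identifiers and the function name are hypothetical stand-ins for parsed phandles, not an existing kernel API; the tables reuse the CPU_RET_0_0 and CLUSTER_RET_0 domain lists from Example 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical providers standing in for the &pd_clusters/&pd_cores phandles. */
enum pd_provider { PD_CLUSTERS = 1, PD_CORES = 2 };

/* One power domain specifier, e.g. <&pd_cores 3> (#power-domain-cells = <1>). */
struct pd_spec {
	enum pd_provider provider;
	unsigned int id;
};

/* Hypothetical parsed view of a state node's power-domains property. */
struct idle_state_domains {
	const char *name;
	const struct pd_spec *domains;
	size_t nr_domains;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* True if the device's power-domain specifier is listed by the state. */
static bool state_affects_device(const struct idle_state_domains *st,
				 const struct pd_spec *dev_pd)
{
	assert(st != NULL && dev_pd != NULL);
	for (size_t i = 0; i < st->nr_domains; i++) {
		if (st->domains[i].provider == dev_pd->provider &&
		    st->domains[i].id == dev_pd->id)
			return true;
	}
	return false;
}

/* Domain lists copied from CPU_RET_0_0 and CLUSTER_RET_0 in Example 1. */
static const struct pd_spec cpu_ret_0_pds[] = {
	{ PD_CORES, 0 }, { PD_CORES, 1 }, { PD_CORES, 2 }, { PD_CORES, 3 },
	{ PD_CORES, 4 }, { PD_CORES, 5 }, { PD_CORES, 6 }, { PD_CORES, 7 },
};
static const struct pd_spec cluster_ret_0_pds[] = { { PD_CLUSTERS, 0 } };

static const struct idle_state_domains cpu_ret_0_0 = {
	"CPU_RET_0_0", cpu_ret_0_pds, ARRAY_LEN(cpu_ret_0_pds)
};
static const struct idle_state_domains cluster_ret_0 = {
	"CLUSTER_RET_0", cluster_ret_0_pds, ARRAY_LEN(cluster_ret_0_pds)
};
```

With these tables, CPU0's L1 (<&pd_cores 0>) is affected by CPU_RET_0_0, while L2_0 (<&pd_clusters 0>) is only affected by CLUSTER_RET_0.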

> What if you have groups of 2 cores in 1 domain? It doesn't work and
> that's a very common scenario in current h/w.

You define an idle state, attach it to that domain and the two cores point
at it in their cpu-idle-states phandle list. Or if you prefer, the idle
state points at a node in the cpu-map defining the two cores (but please
see my comment above).

I just need to fix a discrepancy related to the definition of hierarchical
states: which CPUs are affected by which state can be detected by using the
power domain phandles in the cpu node.

Thanks for having a look,
Lorenzo
Antti P Miettinen March 17, 2014, 11:15 a.m. UTC | #6
Sorry for having been lazy in commenting..

From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Date: Tue, 18 Feb 2014 11:47:31 +0000
> +	- min-residency
> +		Usage: Required
> +		Value type: <prop-encoded-array>
> +		Definition: u32 value representing time in microseconds
> +			    required for the CPU to be in the idle state to
> +			    break even in power consumption terms compared
> +			    to idle state idle_standby ([4][5]).

To me this continues to be a bit ill-defined. Say we have three states:
0,1,2. State 0 is the idle_standby. Providing a minimum residency for
state 1 compared to state 0 sort of makes sense, but if we provide a
minimum residency for state 2 compared to state 0 the break even time
is going to be smaller than break even when comparing state 1 and
state 2. With this data we'd enter state 2 when we'd be better off
entering state 1.

	--Antti
Lorenzo Pieralisi March 17, 2014, 11:53 a.m. UTC | #7
Hi Antti,

On Mon, Mar 17, 2014 at 11:15:07AM +0000, Antti P Miettinen wrote:
> Sorry for having been lazy in commenting..

No worries, comments always welcome.

> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Date: Tue, 18 Feb 2014 11:47:31 +0000
> > +	- min-residency
> > +		Usage: Required
> > +		Value type: <prop-encoded-array>
> > +		Definition: u32 value representing time in microseconds
> > +			    required for the CPU to be in the idle state to
> > +			    break even in power consumption terms compared
> > +			    to idle state idle_standby ([4][5]).
> 
> To me this continues to be a bit ill-defined. Say we have three states:
> 0,1,2. State 0 is the idle_standby. Providing a minimum residency for
> state 1 compared to state 0 sort of makes sense, but if we provide a
> minimum residency for state 2 compared to state 0 the break even time
> is going to be smaller than break even when comparing state 1 and
> state 2. With this data we'd enter state 2 when we'd be better off
> entering state 1.

I am not sure I got your reply right, but min-residency for
state 2 will be higher than state 1, since it has to cater for the
dynamic power consumed by entering the state (but burns less power
than state 1 when _in_ the state).

Entering a state has a power cost and min-residency should take that into
account, worst-case as per other stats.

min-residency (and so the break-even) should take into account that
entering the state is not for free.

I think that comparing against idle_standby is the only sane way we can
define that parameter, either that or we remove it.

Does it make sense ?

Thanks !
Lorenzo
Antti P Miettinen March 17, 2014, 1:49 p.m. UTC | #8
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Hi Antti,
> 
> On Mon, Mar 17, 2014 at 11:15:07AM +0000, Antti P Miettinen wrote:
>> Sorry for having been lazy in commenting..
> 
> No worries, comments always welcome.
> 
>> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> Date: Tue, 18 Feb 2014 11:47:31 +0000
>> > +	- min-residency
>> > +		Usage: Required
>> > +		Value type: <prop-encoded-array>
>> > +		Definition: u32 value representing time in microseconds
>> > +			    required for the CPU to be in the idle state to
>> > +			    break even in power consumption terms compared
>> > +			    to idle state idle_standby ([4][5]).
>> 
>> To me this continues to be a bit ill-defined. Say we have three states:
>> 0,1,2. State 0 is the idle_standby. Providing a minimum residency for
>> state 1 compared to state 0 sort of makes sense, but if we provide a
>> minimum residency for state 2 compared to state 0 the break even time
>> is going to be smaller than break even when comparing state 1 and
>> state 2. With this data we'd enter state 2 when we'd be better off
>> entering state 1.
> 
> I am not sure I got your reply right, but min-residency for
> state 2 will be higher than state 1, since it has to cater for the
> dynamic power consumed by entering the state (but burns less power
> than state 1 when _in_ the state).
> 
> Entering a state has a power cost and min-residency should take that into
> account, worst-case as per other stats.
> 
> min-residency (and so the break-even) should take into account that
> entering the state is not for free.
> 
> I think that comparing against idle_standby is the only sane way we can
> define that parameter, either that or we remove it.
> 
> Does it make sense ?
> 
> Thanks !
> Lorenzo

The point is that if you compare breakeven between state 0 and state 2
the breakeven time will be smaller than when you compare the breakeven
between state 1 and state 2. Assuming states ordered by "deepness" in
the sense that deeper states have lower in-state power and longer
entry/exit times.

I guess you could specify that the min-residency defines the time when
the state breaks even compared to the previous (shallower) state.

	--Antti
Lorenzo Pieralisi March 17, 2014, 2:45 p.m. UTC | #9
On Mon, Mar 17, 2014 at 01:49:40PM +0000, Antti P Miettinen wrote:
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Hi Antti,
> > 
> > On Mon, Mar 17, 2014 at 11:15:07AM +0000, Antti P Miettinen wrote:
> >> Sorry for having been lazy in commenting..
> > 
> > No worries, comments always welcome.
> > 
> >> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> >> Date: Tue, 18 Feb 2014 11:47:31 +0000
> >> > +	- min-residency
> >> > +		Usage: Required
> >> > +		Value type: <prop-encoded-array>
> >> > +		Definition: u32 value representing time in microseconds
> >> > +			    required for the CPU to be in the idle state to
> >> > +			    break even in power consumption terms compared
> >> > +			    to idle state idle_standby ([4][5]).
> >> 
> >> To me this continues to be a bit ill-defined. Say we have three states:
> >> 0,1,2. State 0 is the idle_standby. Providing a minimum residency for
> >> state 1 compared to state 0 sort of makes sense, but if we provide a
> >> minimum residency for state 2 compared to state 0 the break even time
> >> is going to be smaller than break even when comparing state 1 and
> >> state 2. With this data we'd enter state 2 when we'd be better off
> >> entering state 1.
> > 
> > I am not sure I got your reply right, but min-residency for
> > state 2 will be higher than state 1, since it has to cater for the
> > dynamic power consumed by entering the state (but burns less power
> > than state 1 when _in_ the state).
> > 
> > Entering a state has a power cost and min-residency should take that into
> > account, worst-case as per other stats.
> > 
> > min-residency (and so the break-even) should take into account that
> > entering the state is not for free.
> > 
> > I think that comparing against idle_standby is the only sane way we can
> > define that parameter, either that or we remove it.
> > 
> > Does it make sense ?
> > 
> > Thanks !
> > Lorenzo
> 
> The point is that if you compare breakeven between state 0 and state 2
> the breakeven time will be smaller than when you compare the breakeven
> between state 1 and state 2. Assuming states ordered by "deepness" in
> the sense that deeper states have lower in-state power and longer
> entry/exit times.
> 
> I guess you could specify that the min-residency defines the time when
> the state breaks even compared to the previous (shallower) state.

I am not following, Antti, I am sorry. States are ordered in terms of
power consumption which also means that deeper idle states have a longer
required min-residency to break even against idle_standby in order to actually
save power.

When we decide which idle state to enter, all we do (and that's OS
agnostic) is predict the next IRQ (and check the next event) and see if it
is worth entering a state or not. We have to compare it against a baseline,
which is the processor being in standbywfi, and that's what these bindings
define.

I do not understand why you want to define min-residency against the
previous shallower state.

What this binding says is: standbywfi is the shallowest idle state in
power consumption terms. Deeper idle states save more power than
standbywfi if the residency in that state is at least min-residency.

I do not see where the problem is to be honest, maybe I need an example.
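As a rough illustration of the selection described above, the following C sketch picks the deepest listed state whose min-residency does not exceed the predicted idle time and whose exit latency fits a latency limit, and otherwise falls back to idle_standby (wfi, which is never listed). It is loosely in the spirit of how Linux cpuidle governors weigh target residency and exit latency, but it is not kernel code: the struct, the function name and NO_LATENCY_LIMIT are hypothetical, and hierarchical states (which need every CPU to request them) are not modelled. The values are CPU0's states from Example 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NO_LATENCY_LIMIT UINT_MAX	/* hypothetical "no constraint" value */

/* Hypothetical parsed form of one "arm,idle-state" node (times in us). */
struct idle_state_params {
	unsigned int index;
	unsigned int exit_latency_us;
	unsigned int min_residency_us;
};

/*
 * Return the index of the deepest state expected to save power against
 * wfi (min-residency <= predicted idle time) that can also be left within
 * the latency limit, or -1 to stay in idle_standby (wfi).
 */
static int select_idle_state(const struct idle_state_params *states, size_t n,
			     unsigned int predicted_us,
			     unsigned int latency_limit_us)
{
	int best = -1;

	assert(states != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct idle_state_params *s = &states[i];

		if (s->min_residency_us > predicted_us)
			continue;	/* would not break even against wfi */
		if (s->exit_latency_us > latency_limit_us)
			continue;	/* waking up would take too long */
		if ((int)s->index > best)
			best = (int)s->index;
	}
	return best;
}

/* CPU0's states from Example 1: index, exit-latency, min-residency. */
static const struct idle_state_params cpu0_states[] = {
	{ 0,   40,   30 },	/* CPU_RET_0_0 */
	{ 1,  500,  350 },	/* CPU_SLEEP_0_0 */
	{ 2,  100,  250 },	/* CLUSTER_RET_0 */
	{ 3, 1100, 2700 },	/* CLUSTER_SLEEP_0 */
};
```

For example, with a predicted idle time of 300 us and no latency limit it returns index 2 (CLUSTER_RET_0), since CPU_SLEEP_0_0's 350 us min-residency does not fit; below 30 us it returns -1, i.e. stay in wfi.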

Thanks!
Lorenzo
Antti P Miettinen March 17, 2014, 6:26 p.m. UTC | #10
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> When we decide which idle state to enter, all we do (and that's OS
> agnostic) is predict the next IRQ (and check the next event) and see if it
> is worth entering a state or not. We have to compare it against a baseline,
> which is the processor being in standbywfi, and that's what these bindings
> define.
> 
> I do not understand why you want to define min-residency against the
> previous shallower state.
> 
> What this binding says is: standbywfi is the shallowest idle state in
> power consumption terms. Deeper idle states save more power than
> standbywfi if the residency in that state is at least min-residency.
> 
> I do not see where the problem is to be honest, maybe I need an example.
> 
> Thanks!
> Lorenzo

Sorry, I should have explained myself more clearly. I've been
pondering about these issues somewhat lately so I'm perhaps suffering
from a bit of a tunnel vision.

In short, when we choose an idle state based on expected idle duration
we are not comparing wfi against all possible idle states in turn and
making a decision between wfi and state X. Instead we want to choose
among all states the one that gives minimum energy for the expected
idle time. I'll try to elaborate..

Entering and exiting idle states takes time at nonzero power. We indeed
want the time in the idle state to be long enough to make up for this
lost energy. Now the important question here is "make up compared to
what?".

The energy over the idle time can also be interpreted as average
power. When the idle time increases the average power for a state
approaches the in-state power. A deeper idle state would be a state
with lower in-state power and longer entry/exit time. Therefore the
average power for a deeper idle state drops more slowly as a function of
idle time than the average power for a shallower idle state. If we'd
plot the average power for a number of idle states as a function of idle
duration, we'd get a set of "constant over idle time plus constant"
style curves. Average power for state 0 will drop fastest towards the
in-state power of state 0. Average power for state 1 will drop more
slowly and approach the in-state power of state 1, average power for
state 2 will drop even more slowly and approach the in-state power of
state 2.
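The curves described above can be written out under a simplified linear model (an assumption for illustration, not part of the binding): state i costs a fixed entry/exit energy (written \hat{E}_i below) plus in-state power P_i over the idle duration t, with \hat{E}_0 = 0 for state 0 (wfi). The break-even time t_{ij} is where a deeper state j starts to beat a shallower state i.

```latex
\[
E_i(t) = \hat{E}_i + P_i\,t, \qquad
\bar{P}_i(t) = \frac{E_i(t)}{t} = \frac{\hat{E}_i}{t} + P_i
\]
\[
t_{ij} = \frac{\hat{E}_j - \hat{E}_i}{P_i - P_j},
\qquad P_j < P_i,\ \hat{E}_j > \hat{E}_i
\]
\[
t_{01} < t_{02} \;\Longrightarrow\; t_{12} > t_{02}
\]
```

The last line holds because at t = t_{02} state 1 is already cheaper than state 0, and therefore cheaper than state 2, which only matches state 0 there; since P_2 < P_1, state 2 overtakes state 1 only at a later time.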

To define that the min-residency is the breakeven time against state 0
means that we are looking at the curves and asking "when does the
average power for state X cross the average power for state 0?". But
that would be the guideline for making a decision between state 0 and
the state in question. Even if average power for state 2 is below
the average power of state 0 it is not necessarily yet below the
average power of state 1. To break even against state 1 the idle
duration needs to be longer.

Yet another way to look at this: for three states we can define three
times of interest:
- t1: the time when state1 breaks even against state0
- t2: the time when state2 breaks even against state0
- t3: the time when state2 breaks even against state1
and t3 would typically be larger than t2.
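A small numeric check of this argument, as a C sketch under a simplified linear energy model (entry/exit cost folded into a fixed overhead; the three states and their numbers are made up for illustration and do not come from any real platform). With these values t1 is 20 us, t2 is about 55.6 us and t3 is 100 us, and choosing the minimum-energy state for a 70 us idle period picks state 1, even though state 2 already beats state 0 there.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed linear model: energy over an idle period of t us is
 * overhead_nj + power_mw * t. Since mW * us = nJ, both terms are in nJ.
 */
struct state_energy {
	const char *name;
	double overhead_nj;	/* entry + exit cost */
	double power_mw;	/* in-state power */
};

static double energy_nj(const struct state_energy *s, double t_us)
{
	return s->overhead_nj + s->power_mw * t_us;
}

/* Idle time at which deeper state d starts to beat shallower state s. */
static double breakeven_us(const struct state_energy *s,
			   const struct state_energy *d)
{
	assert(s->power_mw > d->power_mw);
	return (d->overhead_nj - s->overhead_nj) / (s->power_mw - d->power_mw);
}

/* Pick the state with minimum energy for the expected idle time. */
static size_t best_state(const struct state_energy *st, size_t n, double t_us)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (energy_nj(&st[i], t_us) < energy_nj(&st[best], t_us))
			best = i;
	return best;
}

/* Hypothetical numbers: state 0 is wfi with no entry/exit overhead. */
static const struct state_energy example_states[] = {
	{ "state 0 (wfi)",    0.0, 100.0 },
	{ "state 1",       1000.0,  50.0 },
	{ "state 2",       5000.0,  10.0 },
};
```

A selection rule that only checks each state's break-even time against state 0 would choose state 2 at 70 us, which is exactly the mismatch described above.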

	--Antti
Lorenzo Pieralisi March 17, 2014, 7:24 p.m. UTC | #11
On Mon, Mar 17, 2014 at 06:26:38PM +0000, Antti P Miettinen wrote:
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > When we decide which idle state to enter, all we do (and that's OS
> > agnostic) is predict the next IRQ (and check the next event) and see if it
> > is worth entering a state or not. We have to compare it against a baseline,
> > which is the processor being in standbywfi, and that's what these bindings
> > define.
> > 
> > I do not understand why you want to define min-residency against the
> > previous shallower state.
> > 
> > What this binding says is: standbywfi is the shallowest idle state in
> > power consumption terms. Deeper idle states save more power than
> > standbywfi if the residency in that state is at least min-residency.
> > 
> > I do not see where the problem is to be honest, maybe I need an example.
> > 
> > Thanks!
> > Lorenzo
> 
> Sorry, I should have explained myself more clearly. I've been
> pondering about these issues somewhat lately so I'm perhaps suffering
> from a bit of a tunnel vision.
> 
> In short, when we choose an idle state based on expected idle duration
> we are not comparing wfi against all possible idle states in turn and
> making a decision between wfi and state X. Instead we want to choose
> among all states the one that gives minimum energy for the expected
> idle time. I'll try to elaborate..
> 
> Entering and exiting idle states takes time at nonzero power. We indeed
> want the time in the idle state to be long enough to make up for this
> lost energy. Now the important question here is "make up compared to
> what?".
> 
> The energy over the idle time can also be interpreted as average
> power. When the idle time increases the average power for a state
> approaches the in-state power. A deeper idle state would be a state
> with lower in-state power and longer entry/exit time. Therefore the
> average power for a deeper idle state drops more slowly as a function of
> idle time than the average power for a shallower idle state. If we'd
> plot the average power for a number of idle states as a function of idle
> duration, we'd get a set of "constant over idle time plus constant"
> style curves. Average power for state 0 will drop fastest towards the
> in-state power of state 0. Average power for state 1 will drop more
> slowly and approach the in-state power of state 1, average power for
> state 2 will drop even more slowly and approach the in-state power of
> state 2.
> 
> To define that the min-residency is the breakeven time against state 0
> means that we are looking at the curves and asking "when does the
> average power for state X cross the average power for state 0?". But
> that would be the guideline for making a decision between state 0 and
> the state in question. Even if average power for state 2 is below
> the average power of state 0 it is not necessarily yet below the
> average power of state 1. To break even against state 1 the idle
> duration needs to be longer.
> 
> Yet another way to look at this: for three states we can define three
> times of interest:
> - t1: the time when state1 breaks even against state0
> - t2: the time when state2 breaks even against state0
> - t3: the time when state2 breaks even against state1
> and t3 would typically be larger than t2.

Now it is crystal clear, and you are absolutely right, sorry for
misunderstanding.

Help me define it then please:

- min-residency-us

"u32 value representing time in microseconds required for the CPU to be in
the idle state to guarantee power savings maximization".

Rather vague (on purpose), if anyone comes up with a better definition please
shout.

Thanks !
Lorenzo

Patch

diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index 9130435..fd1fd8d 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -191,6 +191,13 @@  nodes to be present and contain the properties described below.
 			  property identifying a 64-bit zero-initialised
 			  memory location.
 
+	- cpu-idle-states
+		Usage: Optional
+		Value type: <prop-encoded-array>
+		Definition:
+			# List of phandles to idle state nodes supported
+			  by this cpu [1].
+
 Example 1 (dual-cluster big.LITTLE system 32-bit):
 
 	cpus {
@@ -382,3 +389,6 @@  cpus {
 		cpu-release-addr = <0 0x20000000>;
 	};
 };
+
+[1] ARM Linux kernel documentation - idle states bindings
+    Documentation/devicetree/bindings/arm/idle-states.txt
diff --git a/Documentation/devicetree/bindings/arm/idle-states.txt b/Documentation/devicetree/bindings/arm/idle-states.txt
new file mode 100644
index 0000000..f9a48a1
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/idle-states.txt
@@ -0,0 +1,781 @@ 
+==========================================
+ARM idle states binding description
+==========================================
+
+==========================================
+1 - Introduction
+==========================================
+
+ARM systems contain HW capable of managing power consumption dynamically,
+where cores can be put in different low-power states (ranging from simple
+wfi to power gating) according to OSPM policies. The CPU states representing
+the range of dynamic idle states that a processor can enter at run-time can be
+specified through device tree bindings representing the parameters required
+to enter/exit specific idle states on a given processor.
+
+According to the Server Base System Architecture document (SBSA, [4]), the
+power states an ARM CPU can be put into are identified by the following list:
+
+- Running
+- Idle_standby
+- Idle_retention
+- Sleep
+- Off
+
+The power states described in the SBSA document define the basic CPU states on
+top of which ARM platforms implement power management schemes that allow an OS
+PM implementation to put the processor in different idle states (which include
+the states listed above; the "off" state is not an idle state since it has no
+wake-up capabilities, hence it is not considered in this document).
+
+Idle state parameters (e.g. entry latency) are platform-specific and need to be
+characterized with bindings that provide the required information to OSPM
+code so that it can build the required tables and use them at runtime.
+
+The device tree binding definition for ARM idle states is the subject of this
+document.
+
+===========================================
+2 - idle-states node
+===========================================
+
+ARM processor idle states are defined within the idle-states node, which is
+a direct child of the cpus node and provides a container where the processor
+idle states, defined as device tree nodes, are listed.
+
+- idle-states node
+
+	Usage: Optional - On ARM systems, it is a container of processor idle
+			  state nodes. If the system does not provide CPU
+			  power management capabilities or the processor only
+			  supports idle_standby, an idle-states node is not
+			  required.
+
+	Description: the idle-states node is a container node whose
+		     subnodes describe the CPU idle states.
+
+	Node name must be "idle-states".
+
+	The idle-states node's parent node must be the cpus node.
+
+	The idle-states node's child nodes can be:
+
+	- one or more state nodes
+
+	Any other configuration is considered invalid.
+
+	An idle-states node defines the following properties:
+
+	- entry-method
+		Usage: Required
+		Value type: <stringlist>
+		Definition: Describes the method by which a CPU enters the
+			    idle states. This property is required and must be
+			    one of:
+
+			    - "arm,psci-cpu-suspend"
+			      ARM PSCI firmware interface, CPU suspend
+			      method [3].
+
+			    - "[vendor],[method]"
+			      An implementation dependent string with
+			      format "vendor,method", where vendor is a string
+			      denoting the name of the manufacturer and
+			      method is a string specifying the mechanism
+			      used to enter the idle state.
+
+The nodes describing the idle states (state) can only be defined within the
+idle-states node, either directly or nested within another state node.
+
+Any other configuration is considered invalid and therefore must be ignored.
+
+===========================================
+3 - state node
+===========================================
+
+A state node represents an idle state description and must be defined as
+follows:
+
+- state node
+
+	Description: must be a child of either the idle-states node or
+		     a state node.
+
+	The state node name shall follow standard device tree naming
+	rules ([6], 2.2.1 "Node names"), in particular state nodes which
+	are siblings within a single common parent must be given a unique name.
+
+	The idle state entered by executing the wfi instruction (idle_standby,
+	SBSA [4][5]) is considered standard on all ARM platforms and therefore
+	must not be listed.
+
+	A state node can contain state child nodes. A state node with
+	children represents a hierarchical state, which is a superset of
+	the child states. Hierarchical states require all CPUs on which
+	they are valid (i.e. cpu nodes [1] whose cpu-idle-states arrays
+	contain a phandle to the state) to request the state in order for it
+	to be entered.
+
+	A state node defines the following properties:
+
+	- compatible
+		Usage: Required
+		Value type: <stringlist>
+		Definition: Must be "arm,idle-state".
+
+	- index
+		Usage: Required
+		Value type: <u32>
+		Definition: It represents the idle state index.
+			    An increasing index value implies less power
+			    consumption. Index must be given a sequential
+			    value = {0, 1, 2, ...}, starting from 0.
+			    Phandles in a cpu node's [1] cpu-idle-states
+			    array property must not point at idle state
+			    nodes having the same index value.
+
+	- logic-state-retained
+		Usage: See definition
+		Value type: <none>
+		Definition: if present, logic is retained on state entry,
+			    otherwise it is lost.
+
+	- cache-state-retained
+		Usage: See definition
+		Value type: <none>
+		Definition: if present, cache memory is retained on state entry,
+			    otherwise it is lost.
+
+	- entry-method-param
+		Usage: See definition
+		Value type: <u32>
+		Definition: Depends on the idle-states node entry-method
+			    property value. Refer to the entry-method bindings
+			    for this property value definition.
+
+	- entry-latency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: u32 value representing worst-case latency
+			    in microseconds required to enter the idle state.
+
+	- exit-latency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: u32 value representing worst-case latency
+			    in microseconds required to exit the idle state.
+
+	- min-residency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: u32 value representing time in microseconds
+			    required for the CPU to be in the idle state to
+			    break even in power consumption terms compared
+			    to idle state idle_standby ([4][5]).
+
+	- power-domains
+		Usage: Optional
+		Value type: <prop-encoded-array>
+		Definition: List of power domain specifiers ([2]) describing
+			    the power domains that are affected by the idle
+			    state entry. All devices whose power-domain phandle
+			    points at one of the power domains listed in this
+			    property are affected by the idle state entry.
+
+
+===========================================
+4 - Examples
+===========================================
+
+Example 1 (ARM 64-bit, 16-cpu system):
+
+pd_clusters: power-domain-clusters@80002000 {
+	compatible = "arm,power-controller";
+	reg = <0x0 0x80002000 0x0 0x1000>;
+	#power-domain-cells = <1>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	pd_cores: power-domain-cores@80000000 {
+		compatible = "arm,power-controller";
+		reg = <0x0 0x80000000 0x0 0x1000>;
+		#power-domain-cells = <1>;
+	};
+};
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <2>;
+
+	idle-states {
+		entry-method = "arm,psci-cpu-suspend";
+
+		CLUSTER_RET_0: cluster-ret-0 {
+			/* cluster retention */
+			compatible = "arm,idle-state";
+			index = <2>;
+			logic-state-retained;
+			cache-state-retained;
+			entry-method-param = <0x1010000>;
+			entry-latency = <50>;
+			exit-latency = <100>;
+			min-residency = <250>;
+			power-domains = <&pd_clusters 0>;
+			CPU_RET_0_0: cpu-ret-0 {
+				/* cpu retention */
+				compatible = "arm,idle-state";
+				index = <0>;
+				cache-state-retained;
+				entry-method-param = <0x0010000>;
+				entry-latency = <20>;
+				exit-latency = <40>;
+				min-residency = <30>;
+				power-domains = <&pd_cores 0>,
+						<&pd_cores 1>,
+						<&pd_cores 2>,
+						<&pd_cores 3>,
+						<&pd_cores 4>,
+						<&pd_cores 5>,
+						<&pd_cores 6>,
+						<&pd_cores 7>;
+			};
+		};
+
+		CLUSTER_SLEEP_0: cluster-sleep-0 {
+			/* cluster sleep */
+			compatible = "arm,idle-state";
+			index = <3>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <600>;
+			exit-latency = <1100>;
+			min-residency = <2700>;
+			power-domains = <&pd_clusters 0>;
+			CPU_SLEEP_0_0: cpu-sleep-0 {
+				/* cpu sleep */
+				compatible = "arm,idle-state";
+				index = <1>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <250>;
+				exit-latency = <500>;
+				min-residency = <350>;
+				power-domains = <&pd_cores 0>,
+						<&pd_cores 1>,
+						<&pd_cores 2>,
+						<&pd_cores 3>,
+						<&pd_cores 4>,
+						<&pd_cores 5>,
+						<&pd_cores 6>,
+						<&pd_cores 7>;
+			};
+		};
+		CLUSTER_RET_1: cluster-ret-1 {
+			/* cluster retention */
+			compatible = "arm,idle-state";
+			index = <2>;
+			logic-state-retained;
+			cache-state-retained;
+			entry-method-param = <0x1010000>;
+			entry-latency = <50>;
+			exit-latency = <100>;
+			min-residency = <270>;
+			power-domains = <&pd_clusters 1>;
+			CPU_RET_1_0: cpu-ret-0 {
+				/* cpu retention */
+				compatible = "arm,idle-state";
+				index = <0>;
+				cache-state-retained;
+				entry-method-param = <0x0010000>;
+				entry-latency = <20>;
+				exit-latency = <40>;
+				min-residency = <30>;
+				power-domains = <&pd_cores 8>,
+						<&pd_cores 9>,
+						<&pd_cores 10>,
+						<&pd_cores 11>,
+						<&pd_cores 12>,
+						<&pd_cores 13>,
+						<&pd_cores 14>,
+						<&pd_cores 15>;
+			};
+		};
+
+		CLUSTER_SLEEP_1: cluster-sleep-1 {
+			/* cluster sleep */
+			compatible = "arm,idle-state";
+			index = <3>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <500>;
+			exit-latency = <1200>;
+			min-residency = <3500>;
+			power-domains = <&pd_clusters 1>;
+			CPU_SLEEP_1_0: cpu-sleep-0 {
+				/* cpu sleep */
+				compatible = "arm,idle-state";
+				index = <1>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <70>;
+				exit-latency = <100>;
+				min-residency = <100>;
+				power-domains = <&pd_cores 8>,
+						<&pd_cores 9>,
+						<&pd_cores 10>,
+						<&pd_cores 11>,
+						<&pd_cores 12>,
+						<&pd_cores 13>,
+						<&pd_cores 14>,
+						<&pd_cores 15>;
+			};
+		};
+	};
+
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x0>;
+		enable-method = "psci";
+		next-level-cache = <&L1_0>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_0: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 0>;
+		};
+		L2_0: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 0>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x1>;
+		enable-method = "psci";
+		next-level-cache = <&L1_1>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_1: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 1>;
+		};
+	};
+
+	CPU2: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_2>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_2: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 2>;
+		};
+	};
+
+	CPU3: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_3>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_3: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 3>;
+		};
+	};
+
+	CPU4: cpu@10000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10000>;
+		enable-method = "psci";
+		next-level-cache = <&L1_4>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_4: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 4>;
+		};
+	};
+
+	CPU5: cpu@10001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10001>;
+		enable-method = "psci";
+		next-level-cache = <&L1_5>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_5: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 5>;
+		};
+	};
+
+	CPU6: cpu@10100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_6>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_6: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 6>;
+		};
+	};
+
+	CPU7: cpu@10101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_7>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_7: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 7>;
+		};
+	};
+
+	CPU8: cpu@100000000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x0>;
+		enable-method = "psci";
+		next-level-cache = <&L1_8>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_8: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 8>;
+		};
+		L2_1: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU9: cpu@100000001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x1>;
+		enable-method = "psci";
+		next-level-cache = <&L1_9>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_9: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 9>;
+		};
+	};
+
+	CPU10: cpu@100000100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_10>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_10: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 10>;
+		};
+	};
+
+	CPU11: cpu@100000101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_11>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_11: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 11>;
+		};
+	};
+
+	CPU12: cpu@100010000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10000>;
+		enable-method = "psci";
+		next-level-cache = <&L1_12>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_12: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 12>;
+		};
+	};
+
+	CPU13: cpu@100010001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10001>;
+		enable-method = "psci";
+		next-level-cache = <&L1_13>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_13: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 13>;
+		};
+	};
+
+	CPU14: cpu@100010100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_14>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_14: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 14>;
+		};
+	};
+
+	CPU15: cpu@100010101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_15>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_15: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 15>;
+		};
+	};
+};
+
+Example 2 (ARM 32-bit, 8-cpu system, two clusters):
+
+pd_clusters: power-domain-clusters@80002000 {
+	compatible = "arm,power-controller";
+	reg = <0x80002000 0x1000>;
+	#power-domain-cells = <1>;
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	pd_cores: power-domain-cores@80000000 {
+		compatible = "arm,power-controller";
+		reg = <0x80000000 0x1000>;
+		#power-domain-cells = <1>;
+	};
+};
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <1>;
+
+	idle-states {
+		entry-method = "arm,psci-cpu-suspend";
+
+		CLUSTER_SLEEP_0: cluster-sleep-0 {
+			compatible = "arm,idle-state";
+			index = <1>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <1000>;
+			exit-latency = <1500>;
+			min-residency = <1500>;
+			power-domains = <&pd_clusters 0>;
+			CPU_SLEEP_0_0: cpu-sleep-0 {
+				compatible = "arm,idle-state";
+				index = <0>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <400>;
+				exit-latency = <500>;
+				min-residency = <300>;
+				power-domains = <&pd_cores 0>,
+						<&pd_cores 1>,
+						<&pd_cores 2>,
+						<&pd_cores 3>;
+			};
+		};
+
+		CLUSTER_SLEEP_1: cluster-sleep-1 {
+			compatible = "arm,idle-state";
+			index = <1>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <800>;
+			exit-latency = <2000>;
+			min-residency = <6500>;
+			power-domains = <&pd_clusters 1>;
+			CPU_SLEEP_1_0: cpu-sleep-0 {
+				compatible = "arm,idle-state";
+				index = <0>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <300>;
+				exit-latency = <500>;
+				min-residency = <500>;
+				power-domains = <&pd_cores 4>,
+						<&pd_cores 5>,
+						<&pd_cores 6>,
+						<&pd_cores 7>;
+			};
+		};
+	};
+
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x0>;
+		next-level-cache = <&L1_0>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_0: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 0>;
+		};
+		L2_0: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 0>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x1>;
+		next-level-cache = <&L1_1>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_1: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 1>;
+		};
+	};
+
+	CPU2: cpu@2 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x2>;
+		next-level-cache = <&L1_2>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_2: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 2>;
+		};
+	};
+
+	CPU3: cpu@3 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x3>;
+		next-level-cache = <&L1_3>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_3: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 3>;
+		};
+	};
+
+	CPU4: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x100>;
+		next-level-cache = <&L1_4>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_4: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 4>;
+		};
+		L2_1: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU5: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x101>;
+		next-level-cache = <&L1_5>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_5: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 5>;
+		};
+	};
+
+	CPU6: cpu@102 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x102>;
+		next-level-cache = <&L1_6>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_6: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 6>;
+		};
+	};
+
+	CPU7: cpu@103 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x103>;
+		next-level-cache = <&L1_7>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_7: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 7>;
+		};
+	};
+};
+
+===========================================
+4 - References
+===========================================
+
+[1] ARM Linux Kernel documentation - CPU bindings
+    Documentation/devicetree/bindings/arm/cpus.txt
+
+[2] ARM Linux Kernel documentation - power domain bindings
+    Documentation/devicetree/bindings/power/power_domain.txt
+
+[3] ARM Linux Kernel documentation - PSCI bindings
+    Documentation/devicetree/bindings/arm/psci.txt
+
+[4] ARM Server Base System Architecture (SBSA)
+    http://infocenter.arm.com/help/index.jsp
+
+[5] ARM Architecture Reference Manuals
+    http://infocenter.arm.com/help/index.jsp
+
+[6] Power.org Standard for Embedded Power Architecture Platform Requirements
+    (ePAPR), version 1.1
+    https://www.power.org/documentation/epapr-version-1-1/