diff mbox series

[v3,3/3] dt-bindings: thermal: Add yaml bindings for thermal zones

Message ID 9c447186008ef2e3f4c3e712458dc0ddcd8a8b03.1585117436.git.amit.kucheria@linaro.org (mailing list archive)
State Superseded
Headers show
Series Convert thermal bindings to yaml | expand

Commit Message

Amit Kucheria March 25, 2020, 6:34 a.m. UTC
As part of moving the thermal bindings to YAML, split it up into 3
bindings: thermal sensors, cooling devices and thermal zones.

The thermal-zone binding is a software abstraction to capture the
properties of each zone - how often they should be checked, the
temperature thresholds (trips) at which mitigation actions need to be
taken and the level of mitigation needed at those thresholds.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
---
Changes since v2:
- Addressed review comment from Rob
- Added required properties for thermal-zones node
- Added select: true to thermal-cooling-devices.yaml
- Fixed up example to pass dt_binding_check

 .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
 1 file changed, 324 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml

Comments

Lukasz Luba March 25, 2020, 11:06 a.m. UTC | #1
On 3/25/20 6:34 AM, Amit Kucheria wrote:
> As part of moving the thermal bindings to YAML, split it up into 3
> bindings: thermal sensors, cooling devices and thermal zones.
> 
> The thermal-zone binding is a software abstraction to capture the
> properties of each zone - how often they should be checked, the
> temperature thresholds (trips) at which mitigation actions need to be
> taken and the level of mitigation needed at those thresholds.
> 
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
> Changes since v2:
> - Addressed review comment from Rob
> - Added required properties for thermal-zones node
> - Added select: true to thermal-cooling-devices.yaml
> - Fixed up example to pass dt_binding_check
> 
>   .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
>   1 file changed, 324 insertions(+)
>   create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> 
> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> new file mode 100644
> index 000000000000..5632304dcf62
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> @@ -0,0 +1,324 @@
> +# SPDX-License-Identifier: (GPL-2.0)
> +# Copyright 2020 Linaro Ltd.
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
> +$schema: http://devicetree.org/meta-schemas/base.yaml#
> +
> +title: Thermal zone binding
> +
> +maintainers:
> +  - Amit Kucheria <amitk@kernel.org>
> +
> +description: |
> +  Thermal management is achieved in devicetree by describing the sensor hardware
> +  and the software abstraction of cooling devices and thermal zones required to
> +  take appropriate action to mitigate thermal overloads.
> +
> +  The following node types are used to completely describe a thermal management
> +  system in devicetree:
> +   - thermal-sensor: device that measures temperature, has SoC-specific bindings
> +   - cooling-device: device used to dissipate heat either passively or actively
> +   - thermal-zones: a container of the following node types used to describe all
> +     thermal data for the platform
> +
> +  This binding describes the thermal-zones.
> +
> +  The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
> +  (temperature derivative over time) in two situations for a thermal zone:
> +    1. when passive cooling is activated (polling-delay-passive)
> +    2. when the zone just needs to be monitored (polling-delay) or when
> +       active cooling is activated.
> +
> +  The maximum dT/dt is highly bound to hardware power consumption and
> +  dissipation capability. The delays should be chosen to account for said
> +  max dT/dt, such that a device does not cross several trip boundaries
> +  unexpectedly between polls. Choosing the right polling delays shall avoid
> +  having the device in temperature ranges that may damage the silicon structures
> +  and reduce silicon lifetime.
> +
> +properties:
> +  $nodename:
> +    const: thermal-zones
> +    description:
> +      A /thermal-zones node is required in order to use the thermal framework to
> +      manage input from the various thermal zones in the system in order to
> +      mitigate thermal overload conditions. It does not represent a real device
> +      in the system, but acts as a container to link thermal sensor devices,

I would say 'thermal sensor device', since there is 1-to-1 mapping and
aggregating a few sensors inside one tz is not allowed (or I missed
some patches queuing).

> +      platform-data regarding temperature thresholds and the mitigation actions
> +      to take when the temperature crosses those thresholds.
> +
> +patternProperties:
> +  "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
> +    type: object
> +    description:
> +      Each thermal zone node contains information about how frequently it
> +      must be checked, the sensor responsible for reporting temperature for
> +      this zone, one sub-node containing the various trip points for this
> +      zone and one sub-node containing all the zone cooling-maps.
> +
> +    properties:
> +      polling-delay:
> +        $ref: /schemas/types.yaml#/definitions/uint32
> +        description:
> +          The maximum number of milliseconds to wait between polls when
> +          checking this thermal zone. Setting this to 0 disables the polling
> +          timers setup by the thermal framework and assumes that the thermal
> +          sensors in this zone support interrupts.
> +
> +      polling-delay-passive:
> +        $ref: /schemas/types.yaml#/definitions/uint32
> +        description:
> +          The maximum number of milliseconds to wait between polls when
> +          checking this thermal zone while doing passive cooling. Setting
> +          this to 0 disables the polling timers setup by the thermal
> +          framework and assumes that the thermal sensors in this zone
> +          support interrupts.
> +
> +      thermal-sensors:
> +        $ref: /schemas/types.yaml#/definitions/phandle-array
> +        description:
> +          A list of thermal sensor phandles and sensor specifiers used to
> +          monitor this thermal zone.

I don't know why it's not consistent with the actual code in
of-thermal.c, where there is even a comment stated:
/* For now, thermal framework supports only 1 sensor per zone */

I think this is the place where developers should be informed about
the limitation and not even try to put more sensors into the list.

> +
> +      trips:
> +        type: object
> +        description:
> +          This node describes a set of points in the temperature domain at
> +          which the thermal framework needs to takes action. The actions to

s/needs to takes/needs to take/

> +          be taken are defined in another node called cooling-maps.
> +
> +        patternProperties:
> +          "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
> +            type: object
> +
> +            properties:
> +              temperature:
> +                $ref: /schemas/types.yaml#/definitions/int32
> +                minimum: -273000
> +                maximum: 200000
> +                description:
> +                  An integer expressing the trip temperature in millicelsius.
> +
> +              hysteresis:
> +                $ref: /schemas/types.yaml#/definitions/uint32
> +                description:
> +                  An unsigned integer expressing the hysteresis delta with
> +                  respect to the trip temperature property above, also in
> +                  millicelsius.

This property is worth a bit longer description.

> +
> +              type:
> +                $ref: /schemas/types.yaml#/definitions/string
> +                enum:
> +                  - active   # enable active cooling e.g. fans
> +                  - passive  # enable passive cooling e.g. throttling cpu
> +                  - hot      # send notification to driver
> +                  - critical # send notification to driver, trigger shutdown
> +                description: |
> +                  There are four valid trip types: active, passive, hot,
> +                  critical.

[snip]

> +
> +    thermal-zones {
> +            cpu0-thermal {
> +                    polling-delay-passive = <250>;
> +                    polling-delay = <1000>;
> +
> +                    thermal-sensors = <&tsens0 1>;
> +
> +                    trips {
> +                            cpu0_alert0: trip-point0 {
> +                                    temperature = <90000>;
> +                                    hysteresis = <2000>;
> +                                    type = "passive";
> +                            };
> +
> +                            cpu0_alert1: trip-point1 {
> +                                    temperature = <95000>;
> +                                    hysteresis = <2000>;
> +                                    type = "passive";
> +                            };
> +
> +                            cpu0_crit: cpu_crit {
> +                                    temperature = <110000>;
> +                                    hysteresis = <1000>;
> +                                    type = "critical";
> +                            };
> +                    };
> +
> +                    cooling-maps {
> +                            map0 {
> +                                    trip = <&cpu0_alert0>;
> +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU1 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU2 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU3 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>;
> +                            };
> +
> +                            map1 {
> +                                    trip = <&cpu0_alert1>;
> +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU1 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU2 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU3 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>;

 From this two examples of handling cpu0_alert0 and cpu0_alert1 you
cannot conclude anything (if you don't understand thermal framework (and
probably IPA). As a simple example it would be better to put a comment
with a description and limit min, max to a specific OPP:

map0 {
     trip = <&cpu0_alert0>;
     /* Corresponds to 1400MHz in OPP table */
     cooling-device = <&CPU0 3 3>, <&CPU1 3 3>, <&CPU2 3 3>, <&CPU3 3 3>;
};

map1 {
     trip = <&cpu0_alert1>;
     /* Corresponds to 1000MHz in OPP table */
     cooling-device = <&CPU0 5 5>, <&CPU1 5 5>, <&CPU2 5 5>, <&CPU3 5 5>;
};

IMHO this kind of example would tell more to an avg driver developer.

Regards,
Lukasz
Amit Kucheria March 25, 2020, 3:42 p.m. UTC | #2
On Wed, Mar 25, 2020 at 4:36 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 3/25/20 6:34 AM, Amit Kucheria wrote:
> > As part of moving the thermal bindings to YAML, split it up into 3
> > bindings: thermal sensors, cooling devices and thermal zones.
> >
> > The thermal-zone binding is a software abstraction to capture the
> > properties of each zone - how often they should be checked, the
> > temperature thresholds (trips) at which mitigation actions need to be
> > taken and the level of mitigation needed at those thresholds.
> >
> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > ---
> > Changes since v2:
> > - Addressed review comment from Rob
> > - Added required properties for thermal-zones node
> > - Added select: true to thermal-cooling-devices.yaml
> > - Fixed up example to pass dt_binding_check
> >
> >   .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
> >   1 file changed, 324 insertions(+)
> >   create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > new file mode 100644
> > index 000000000000..5632304dcf62
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > @@ -0,0 +1,324 @@
> > +# SPDX-License-Identifier: (GPL-2.0)
> > +# Copyright 2020 Linaro Ltd.
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
> > +$schema: http://devicetree.org/meta-schemas/base.yaml#
> > +
> > +title: Thermal zone binding
> > +
> > +maintainers:
> > +  - Amit Kucheria <amitk@kernel.org>
> > +
> > +description: |
> > +  Thermal management is achieved in devicetree by describing the sensor hardware
> > +  and the software abstraction of cooling devices and thermal zones required to
> > +  take appropriate action to mitigate thermal overloads.
> > +
> > +  The following node types are used to completely describe a thermal management
> > +  system in devicetree:
> > +   - thermal-sensor: device that measures temperature, has SoC-specific bindings
> > +   - cooling-device: device used to dissipate heat either passively or actively
> > +   - thermal-zones: a container of the following node types used to describe all
> > +     thermal data for the platform
> > +
> > +  This binding describes the thermal-zones.
> > +
> > +  The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
> > +  (temperature derivative over time) in two situations for a thermal zone:
> > +    1. when passive cooling is activated (polling-delay-passive)
> > +    2. when the zone just needs to be monitored (polling-delay) or when
> > +       active cooling is activated.
> > +
> > +  The maximum dT/dt is highly bound to hardware power consumption and
> > +  dissipation capability. The delays should be chosen to account for said
> > +  max dT/dt, such that a device does not cross several trip boundaries
> > +  unexpectedly between polls. Choosing the right polling delays shall avoid
> > +  having the device in temperature ranges that may damage the silicon structures
> > +  and reduce silicon lifetime.
> > +
> > +properties:
> > +  $nodename:
> > +    const: thermal-zones
> > +    description:
> > +      A /thermal-zones node is required in order to use the thermal framework to
> > +      manage input from the various thermal zones in the system in order to
> > +      mitigate thermal overload conditions. It does not represent a real device
> > +      in the system, but acts as a container to link thermal sensor devices,
>
> I would say 'thermal sensor device', since there is 1-to-1 mapping and
> aggregating a few sensors inside one tz is not allowed (or I missed
> some patches queuing).

See below.

>
> > +      platform-data regarding temperature thresholds and the mitigation actions
> > +      to take when the temperature crosses those thresholds.
> > +
> > +patternProperties:
> > +  "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
> > +    type: object
> > +    description:
> > +      Each thermal zone node contains information about how frequently it
> > +      must be checked, the sensor responsible for reporting temperature for
> > +      this zone, one sub-node containing the various trip points for this
> > +      zone and one sub-node containing all the zone cooling-maps.
> > +
> > +    properties:
> > +      polling-delay:
> > +        $ref: /schemas/types.yaml#/definitions/uint32
> > +        description:
> > +          The maximum number of milliseconds to wait between polls when
> > +          checking this thermal zone. Setting this to 0 disables the polling
> > +          timers setup by the thermal framework and assumes that the thermal
> > +          sensors in this zone support interrupts.
> > +
> > +      polling-delay-passive:
> > +        $ref: /schemas/types.yaml#/definitions/uint32
> > +        description:
> > +          The maximum number of milliseconds to wait between polls when
> > +          checking this thermal zone while doing passive cooling. Setting
> > +          this to 0 disables the polling timers setup by the thermal
> > +          framework and assumes that the thermal sensors in this zone
> > +          support interrupts.
> > +
> > +      thermal-sensors:
> > +        $ref: /schemas/types.yaml#/definitions/phandle-array
> > +        description:
> > +          A list of thermal sensor phandles and sensor specifiers used to
> > +          monitor this thermal zone.
>
> I don't know why it's not consistent with the actual code in
> of-thermal.c, where there is even a comment stated:
> /* For now, thermal framework supports only 1 sensor per zone */
>
> I think this is the place where developers should be informed about
> the limitation and not even try to put more sensors into the list.

That is a good point. I'm currently "porting" the existing binding as
described in thermal.txt to yaml. If you look at some of the example
(c) in there, the bindings allow many sensors to a zone mapping but
the thermal core doesn't implement that functionality.

So should we fix the core code or change the bindings? Thoughts - Rob,
Daniel, Rui?

> > +
> > +      trips:
> > +        type: object
> > +        description:
> > +          This node describes a set of points in the temperature domain at
> > +          which the thermal framework needs to takes action. The actions to
>
> s/needs to takes/needs to take/

Will fix.

> > +          be taken are defined in another node called cooling-maps.
> > +
> > +        patternProperties:
> > +          "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
> > +            type: object
> > +
> > +            properties:
> > +              temperature:
> > +                $ref: /schemas/types.yaml#/definitions/int32
> > +                minimum: -273000
> > +                maximum: 200000
> > +                description:
> > +                  An integer expressing the trip temperature in millicelsius.
> > +
> > +              hysteresis:
> > +                $ref: /schemas/types.yaml#/definitions/uint32
> > +                description:
> > +                  An unsigned integer expressing the hysteresis delta with
> > +                  respect to the trip temperature property above, also in
> > +                  millicelsius.
>
> This property is worth a bit longer description.

Will improve the description.

> > +
> > +              type:
> > +                $ref: /schemas/types.yaml#/definitions/string
> > +                enum:
> > +                  - active   # enable active cooling e.g. fans
> > +                  - passive  # enable passive cooling e.g. throttling cpu
> > +                  - hot      # send notification to driver
> > +                  - critical # send notification to driver, trigger shutdown
> > +                description: |
> > +                  There are four valid trip types: active, passive, hot,
> > +                  critical.
>
> [snip]
>
> > +
> > +    thermal-zones {
> > +            cpu0-thermal {
> > +                    polling-delay-passive = <250>;
> > +                    polling-delay = <1000>;
> > +
> > +                    thermal-sensors = <&tsens0 1>;
> > +
> > +                    trips {
> > +                            cpu0_alert0: trip-point0 {
> > +                                    temperature = <90000>;
> > +                                    hysteresis = <2000>;
> > +                                    type = "passive";
> > +                            };
> > +
> > +                            cpu0_alert1: trip-point1 {
> > +                                    temperature = <95000>;
> > +                                    hysteresis = <2000>;
> > +                                    type = "passive";
> > +                            };
> > +
> > +                            cpu0_crit: cpu_crit {
> > +                                    temperature = <110000>;
> > +                                    hysteresis = <1000>;
> > +                                    type = "critical";
> > +                            };
> > +                    };
> > +
> > +                    cooling-maps {
> > +                            map0 {
> > +                                    trip = <&cpu0_alert0>;
> > +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU1 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU2 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU3 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>;
> > +                            };
> > +
> > +                            map1 {
> > +                                    trip = <&cpu0_alert1>;
> > +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU1 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU2 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU3 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>;
>
>  From this two examples of handling cpu0_alert0 and cpu0_alert1 you
> cannot conclude anything (if you don't understand thermal framework (and
> probably IPA). As a simple example it would be better to put a comment
> with a description and limit min, max to a specific OPP:
>
> map0 {
>      trip = <&cpu0_alert0>;
>      /* Corresponds to 1400MHz in OPP table */
>      cooling-device = <&CPU0 3 3>, <&CPU1 3 3>, <&CPU2 3 3>, <&CPU3 3 3>;
> };
>
> map1 {
>      trip = <&cpu0_alert1>;
>      /* Corresponds to 1000MHz in OPP table */
>      cooling-device = <&CPU0 5 5>, <&CPU1 5 5>, <&CPU2 5 5>, <&CPU3 5 5>;
> };
>
> IMHO this kind of example would tell more to an avg driver developer.

Will fix.

Thanks for the review.

Regards,
Amit
Amit Kucheria March 30, 2020, 10:34 a.m. UTC | #3
On Wed, Mar 25, 2020 at 9:12 PM Amit Kucheria <amit.kucheria@linaro.org> wrote:
>
> On Wed, Mar 25, 2020 at 4:36 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >
> >
> >
> > On 3/25/20 6:34 AM, Amit Kucheria wrote:
> > > As part of moving the thermal bindings to YAML, split it up into 3
> > > bindings: thermal sensors, cooling devices and thermal zones.
> > >
> > > The thermal-zone binding is a software abstraction to capture the
> > > properties of each zone - how often they should be checked, the
> > > temperature thresholds (trips) at which mitigation actions need to be
> > > taken and the level of mitigation needed at those thresholds.
> > >
> > > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > > ---
> > > Changes since v2:
> > > - Addressed review comment from Rob
> > > - Added required properties for thermal-zones node
> > > - Added select: true to thermal-cooling-devices.yaml
> > > - Fixed up example to pass dt_binding_check
> > >
> > >   .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
> > >   1 file changed, 324 insertions(+)
> > >   create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > >
> > > diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > > new file mode 100644
> > > index 000000000000..5632304dcf62
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > > @@ -0,0 +1,324 @@
> > > +# SPDX-License-Identifier: (GPL-2.0)
> > > +# Copyright 2020 Linaro Ltd.
> > > +%YAML 1.2
> > > +---
> > > +$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
> > > +$schema: http://devicetree.org/meta-schemas/base.yaml#
> > > +
> > > +title: Thermal zone binding
> > > +
> > > +maintainers:
> > > +  - Amit Kucheria <amitk@kernel.org>
> > > +
> > > +description: |
> > > +  Thermal management is achieved in devicetree by describing the sensor hardware
> > > +  and the software abstraction of cooling devices and thermal zones required to
> > > +  take appropriate action to mitigate thermal overloads.
> > > +
> > > +  The following node types are used to completely describe a thermal management
> > > +  system in devicetree:
> > > +   - thermal-sensor: device that measures temperature, has SoC-specific bindings
> > > +   - cooling-device: device used to dissipate heat either passively or actively
> > > +   - thermal-zones: a container of the following node types used to describe all
> > > +     thermal data for the platform
> > > +
> > > +  This binding describes the thermal-zones.
> > > +
> > > +  The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
> > > +  (temperature derivative over time) in two situations for a thermal zone:
> > > +    1. when passive cooling is activated (polling-delay-passive)
> > > +    2. when the zone just needs to be monitored (polling-delay) or when
> > > +       active cooling is activated.
> > > +
> > > +  The maximum dT/dt is highly bound to hardware power consumption and
> > > +  dissipation capability. The delays should be chosen to account for said
> > > +  max dT/dt, such that a device does not cross several trip boundaries
> > > +  unexpectedly between polls. Choosing the right polling delays shall avoid
> > > +  having the device in temperature ranges that may damage the silicon structures
> > > +  and reduce silicon lifetime.
> > > +
> > > +properties:
> > > +  $nodename:
> > > +    const: thermal-zones
> > > +    description:
> > > +      A /thermal-zones node is required in order to use the thermal framework to
> > > +      manage input from the various thermal zones in the system in order to
> > > +      mitigate thermal overload conditions. It does not represent a real device
> > > +      in the system, but acts as a container to link thermal sensor devices,
> >
> > I would say 'thermal sensor device', since there is 1-to-1 mapping and
> > aggregating a few sensors inside one tz is not allowed (or I missed
> > some patches queuing).
>
> See below.
>
> >
> > > +      platform-data regarding temperature thresholds and the mitigation actions
> > > +      to take when the temperature crosses those thresholds.
> > > +
> > > +patternProperties:
> > > +  "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
> > > +    type: object
> > > +    description:
> > > +      Each thermal zone node contains information about how frequently it
> > > +      must be checked, the sensor responsible for reporting temperature for
> > > +      this zone, one sub-node containing the various trip points for this
> > > +      zone and one sub-node containing all the zone cooling-maps.
> > > +
> > > +    properties:
> > > +      polling-delay:
> > > +        $ref: /schemas/types.yaml#/definitions/uint32
> > > +        description:
> > > +          The maximum number of milliseconds to wait between polls when
> > > +          checking this thermal zone. Setting this to 0 disables the polling
> > > +          timers setup by the thermal framework and assumes that the thermal
> > > +          sensors in this zone support interrupts.
> > > +
> > > +      polling-delay-passive:
> > > +        $ref: /schemas/types.yaml#/definitions/uint32
> > > +        description:
> > > +          The maximum number of milliseconds to wait between polls when
> > > +          checking this thermal zone while doing passive cooling. Setting
> > > +          this to 0 disables the polling timers setup by the thermal
> > > +          framework and assumes that the thermal sensors in this zone
> > > +          support interrupts.
> > > +
> > > +      thermal-sensors:
> > > +        $ref: /schemas/types.yaml#/definitions/phandle-array
> > > +        description:
> > > +          A list of thermal sensor phandles and sensor specifiers used to
> > > +          monitor this thermal zone.
> >
> > I don't know why it's not consistent with the actual code in
> > of-thermal.c, where there is even a comment stated:
> > /* For now, thermal framework supports only 1 sensor per zone */
> >
> > I think this is the place where developers should be informed about
> > the limitation and not even try to put more sensors into the list.
>
> That is a good point. I'm currently "porting" the existing binding as
> described in thermal.txt to yaml. If you look at some of the example
> (c) in there, the bindings allow many sensors to a zone mapping but
> the thermal core doesn't implement that functionality.
>
> So should we fix the core code or change the bindings? Thoughts - Rob,
> Daniel, Rui?

Rob, Daniel: Any comments? We don't have any concerns for Linux
backward compatibility since multiple sensors per zone isn't used
anywhere. But asking since bindings are supposed to be OS-agnostic.

> > > +
> > > +      trips:
> > > +        type: object
> > > +        description:
> > > +          This node describes a set of points in the temperature domain at
> > > +          which the thermal framework needs to takes action. The actions to
> >
> > s/needs to takes/needs to take/
>
> Will fix.
>
> > > +          be taken are defined in another node called cooling-maps.
> > > +
> > > +        patternProperties:
> > > +          "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
> > > +            type: object
> > > +
> > > +            properties:
> > > +              temperature:
> > > +                $ref: /schemas/types.yaml#/definitions/int32
> > > +                minimum: -273000
> > > +                maximum: 200000
> > > +                description:
> > > +                  An integer expressing the trip temperature in millicelsius.
> > > +
> > > +              hysteresis:
> > > +                $ref: /schemas/types.yaml#/definitions/uint32
> > > +                description:
> > > +                  An unsigned integer expressing the hysteresis delta with
> > > +                  respect to the trip temperature property above, also in
> > > +                  millicelsius.
> >
> > This property is worth a bit longer description.
>
> Will improve the description.
>
> > > +
> > > +              type:
> > > +                $ref: /schemas/types.yaml#/definitions/string
> > > +                enum:
> > > +                  - active   # enable active cooling e.g. fans
> > > +                  - passive  # enable passive cooling e.g. throttling cpu
> > > +                  - hot      # send notification to driver
> > > +                  - critical # send notification to driver, trigger shutdown
> > > +                description: |
> > > +                  There are four valid trip types: active, passive, hot,
> > > +                  critical.
> >
> > [snip]
> >
> > > +
> > > +    thermal-zones {
> > > +            cpu0-thermal {
> > > +                    polling-delay-passive = <250>;
> > > +                    polling-delay = <1000>;
> > > +
> > > +                    thermal-sensors = <&tsens0 1>;
> > > +
> > > +                    trips {
> > > +                            cpu0_alert0: trip-point0 {
> > > +                                    temperature = <90000>;
> > > +                                    hysteresis = <2000>;
> > > +                                    type = "passive";
> > > +                            };
> > > +
> > > +                            cpu0_alert1: trip-point1 {
> > > +                                    temperature = <95000>;
> > > +                                    hysteresis = <2000>;
> > > +                                    type = "passive";
> > > +                            };
> > > +
> > > +                            cpu0_crit: cpu_crit {
> > > +                                    temperature = <110000>;
> > > +                                    hysteresis = <1000>;
> > > +                                    type = "critical";
> > > +                            };
> > > +                    };
> > > +
> > > +                    cooling-maps {
> > > +                            map0 {
> > > +                                    trip = <&cpu0_alert0>;
> > > +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>,
> > > +                                                     <&CPU1 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>,
> > > +                                                     <&CPU2 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>,
> > > +                                                     <&CPU3 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>;
> > > +                            };
> > > +
> > > +                            map1 {
> > > +                                    trip = <&cpu0_alert1>;
> > > +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>,
> > > +                                                     <&CPU1 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>,
> > > +                                                     <&CPU2 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>,
> > > +                                                     <&CPU3 THERMAL_NO_LIMIT
> > > +                                                            THERMAL_NO_LIMIT>;
> >
> >  From this two examples of handling cpu0_alert0 and cpu0_alert1 you
> > cannot conclude anything (if you don't understand thermal framework (and
> > probably IPA). As a simple example it would be better to put a comment
> > with a description and limit min, max to a specific OPP:
> >
> > map0 {
> >      trip = <&cpu0_alert0>;
> >      /* Corresponds to 1400MHz in OPP table */
> >      cooling-device = <&CPU0 3 3>, <&CPU1 3 3>, <&CPU2 3 3>, <&CPU3 3 3>;
> > };
> >
> > map1 {
> >      trip = <&cpu0_alert1>;
> >      /* Corresponds to 1000MHz in OPP table */
> >      cooling-device = <&CPU0 5 5>, <&CPU1 5 5>, <&CPU2 5 5>, <&CPU3 5 5>;
> > };
> >
> > IMHO this kind of example would tell more to an avg driver developer.
>
> Will fix.
>
> Thanks for the review.
>
> Regards,
> Amit
Daniel Lezcano March 30, 2020, 1:07 p.m. UTC | #4
Hi Amit,

On 30/03/2020 12:34, Amit Kucheria wrote:

[ ... ]

>>> I don't know why it's not consistent with the actual code in
>>> of-thermal.c, where there is even a comment stated: /* For now,
>>> thermal framework supports only 1 sensor per zone */
>>>
>>> I think this is the place where developers should be informed
>>> about the limitation and not even try to put more sensors into
>>> the list.
>>
>> That is a good point. I'm currently "porting" the existing
>> binding as described in thermal.txt to yaml. If you look at some
>> of the example (c) in there, the bindings allow many sensors to a
>> zone mapping but the thermal core doesn't implement that
>> functionality.
>>
>> So should we fix the core code or change the bindings? Thoughts -
>> Rob, Daniel, Rui?
>
> Rob, Daniel: Any comments? We don't have any concerns for Linux
> backward compatibility since multiple sensors per zone isn't used
> anywhere. But asking since bindings are supposed to be
> OS-agnostic.

IMO, we should remove it as it is not used anywhere.

We still have to decide how we aggregate multiple sensors.
Rob Herring (Arm) March 31, 2020, 9:13 p.m. UTC | #5
On Mon, Mar 30, 2020 at 03:07:53PM +0200, Daniel Lezcano wrote:
> 
> Hi Amit,
> 
> On 30/03/2020 12:34, Amit Kucheria wrote:
> 
> [ ... ]
> 
> >>> I don't know why it's not consistent with the actual code in
> >>> of-thermal.c, where there is even a comment stated: /* For now,
> >>> thermal framework supports only 1 sensor per zone */
> >>>
> >>> I think this is the place where developers should be informed
> >>> about the limitation and not even try to put more sensors into
> >>> the list.
> >>
> >> That is a good point. I'm currently "porting" the existing
> >> binding as described in thermal.txt to yaml. If you look at some
> >> of the example (c) in there, the bindings allow many sensors to a
> >> zone mapping but the thermal core doesn't implement that
> >> functionality.
> >>
> >> So should we fix the core code or change the bindings? Thoughts -
> >> Rob, Daniel, Rui?
> >
> > Rob, Daniel: Any comments? We don't have any concerns for Linux
> > backward compatibility since multiple sensors per zone isn't used
> > anywhere. But asking since bindings are supposed to be
> > OS-agnostic.
> 
> IMO, we should remove it as it is not used anywhere.
> 
> We still have to decide how we aggregate multiple sensors.

The schema only needs to pass what currently exists (assuming no 
errors), so extending it later is fine with me.

Rob
Rob Herring (Arm) March 31, 2020, 9:20 p.m. UTC | #6
On Wed, 25 Mar 2020 12:04:54 +0530, Amit Kucheria wrote:
> As part of moving the thermal bindings to YAML, split it up into 3
> bindings: thermal sensors, cooling devices and thermal zones.
> 
> The thermal-zone binding is a software abstraction to capture the
> properties of each zone - how often they should be checked, the
> temperature thresholds (trips) at which mitigation actions need to be
> taken and the level of mitigation needed at those thresholds.
> 
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
> Changes since v2:
> - Addressed review comment from Rob
> - Added required properties for thermal-zones node
> - Added select: true to thermal-cooling-devices.yaml
> - Fixed up example to pass dt_binding_check
> 
>  .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
>  1 file changed, 324 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> 

The change discussed doesn't affect the schema other than adding a 
'maxItems: 1', so:

Reviewed-by: Rob Herring <robh@kernel.org>
diff mbox series

Patch

diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
new file mode 100644
index 000000000000..5632304dcf62
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
@@ -0,0 +1,324 @@ 
+# SPDX-License-Identifier: (GPL-2.0)
+# Copyright 2020 Linaro Ltd.
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
+$schema: http://devicetree.org/meta-schemas/base.yaml#
+
+title: Thermal zone binding
+
+maintainers:
+  - Amit Kucheria <amitk@kernel.org>
+
+description: |
+  Thermal management is achieved in devicetree by describing the sensor hardware
+  and the software abstraction of cooling devices and thermal zones required to
+  take appropriate action to mitigate thermal overloads.
+
+  The following node types are used to completely describe a thermal management
+  system in devicetree:
+   - thermal-sensor: device that measures temperature, has SoC-specific bindings
+   - cooling-device: device used to dissipate heat either passively or actively
+   - thermal-zones: a container of the following node types used to describe all
+     thermal data for the platform
+
+  This binding describes the thermal-zones.
+
+  The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
+  (temperature derivative over time) in two situations for a thermal zone:
+    1. when passive cooling is activated (polling-delay-passive)
+    2. when the zone just needs to be monitored (polling-delay) or when
+       active cooling is activated.
+
+  The maximum dT/dt is highly bound to hardware power consumption and
+  dissipation capability. The delays should be chosen to account for said
+  max dT/dt, such that a device does not cross several trip boundaries
+  unexpectedly between polls. Choosing the right polling delays shall avoid
+  having the device in temperature ranges that may damage the silicon structures
+  and reduce silicon lifetime.
+
+properties:
+  $nodename:
+    const: thermal-zones
+    description:
+      A /thermal-zones node is required in order to use the thermal framework to
+      manage input from the various thermal zones in the system in order to
+      mitigate thermal overload conditions. It does not represent a real device
+      in the system, but acts as a container to link thermal sensor devices,
+      platform-data regarding temperature thresholds and the mitigation actions
+      to take when the temperature crosses those thresholds.
+
+patternProperties:
+  "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
+    type: object
+    description:
+      Each thermal zone node contains information about how frequently it
+      must be checked, the sensor responsible for reporting temperature for
+      this zone, one sub-node containing the various trip points for this
+      zone and one sub-node containing all the zone cooling-maps.
+
+    properties:
+      polling-delay:
+        $ref: /schemas/types.yaml#/definitions/uint32
+        description:
+          The maximum number of milliseconds to wait between polls when
+          checking this thermal zone. Setting this to 0 disables the polling
+          timers setup by the thermal framework and assumes that the thermal
+          sensors in this zone support interrupts.
+
+      polling-delay-passive:
+        $ref: /schemas/types.yaml#/definitions/uint32
+        description:
+          The maximum number of milliseconds to wait between polls when
+          checking this thermal zone while doing passive cooling. Setting
+          this to 0 disables the polling timers setup by the thermal
+          framework and assumes that the thermal sensors in this zone
+          support interrupts.
+
+      thermal-sensors:
+        $ref: /schemas/types.yaml#/definitions/phandle-array
+        description:
+          A list of thermal sensor phandles and sensor specifiers used to
+          monitor this thermal zone.
+
+      trips:
+        type: object
+        description:
+          This node describes a set of points in the temperature domain at
+          which the thermal framework needs to takes action. The actions to
+          be taken are defined in another node called cooling-maps.
+
+        patternProperties:
+          "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
+            type: object
+
+            properties:
+              temperature:
+                $ref: /schemas/types.yaml#/definitions/int32
+                minimum: -273000
+                maximum: 200000
+                description:
+                  An integer expressing the trip temperature in millicelsius.
+
+              hysteresis:
+                $ref: /schemas/types.yaml#/definitions/uint32
+                description:
+                  An unsigned integer expressing the hysteresis delta with
+                  respect to the trip temperature property above, also in
+                  millicelsius.
+
+              type:
+                $ref: /schemas/types.yaml#/definitions/string
+                enum:
+                  - active   # enable active cooling e.g. fans
+                  - passive  # enable passive cooling e.g. throttling cpu
+                  - hot      # send notification to driver
+                  - critical # send notification to driver, trigger shutdown
+                description: |
+                  There are four valid trip types: active, passive, hot,
+                  critical.
+
+                  The critical trip type is used to set the maximum
+                  temperature threshold above which the HW becomes
+                  unstable and underlying firmware might even trigger a
+                  reboot. Hitting the critical threshold triggers a system
+                  shutdown.
+
+                  The hot trip type can be used to send a notification to
+                  the thermal driver (if a .notify callback is registered).
+                  The action to be taken is left to the driver.
+
+                  The passive trip type can be used to slow down HW e.g. run
+                  the CPU, GPU, bus at a lower frequency.
+
+                  The active trip type can be used to control other HW to
+                  help in cooling e.g. fans can be sped up or slowed down
+
+            required:
+              - temperature
+              - hysteresis
+              - type
+            additionalProperties: false
+
+        additionalProperties: false
+
+      cooling-maps:
+        type: object
+        description:
+          This node describes the action to be taken when a thermal zone
+          crosses one of the temperature thresholds described in the trips
+          node. The action takes the form of a mapping relation between a
+          trip and the target cooling device state.
+
+        patternProperties:
+          "^map[-a-zA-Z0-9]*$":
+            type: object
+
+            properties:
+              trip:
+                $ref: /schemas/types.yaml#/definitions/phandle
+                description:
+                  A phandle of a trip point node within this thermal zone.
+
+              cooling-device:
+                $ref: /schemas/types.yaml#/definitions/phandle-array
+                description:
+                  A list of cooling device phandles along with the minimum
+                  and maximum cooling state specifiers for each cooling
+                  device. Using the THERMAL_NO_LIMIT (-1UL) constant in the
+                  cooling-device phandle limit specifier lets the framework
+                  use the minimum and maximum cooling state for that cooling
+                  device automatically.
+
+              contribution:
+                $ref: /schemas/types.yaml#/definitions/uint32
+                minimum: 0
+                maximum: 100
+                description:
+                  The contribution of the cooling devices at the trip
+                  temperature, both referenced in this map, to this thermal
+                  zone as a percentage.
+
+            required:
+              - trip
+              - cooling-device
+            additionalProperties: false
+
+    required:
+      - polling-delay
+      - polling-delay-passive
+      - thermal-sensors
+      - trips
+    additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+    #include <dt-bindings/thermal/thermal.h>
+
+    // Example 1: SDM845 TSENS
+    soc: soc@0 {
+            #address-cells = <2>;
+            #size-cells = <2>;
+
+            /* ... */
+
+            tsens0: thermal-sensor@c263000 {
+                    compatible = "qcom,sdm845-tsens", "qcom,tsens-v2";
+                    reg = <0 0x0c263000 0 0x1ff>, /* TM */
+                          <0 0x0c222000 0 0x1ff>; /* SROT */
+                    #qcom,sensors = <13>;
+                    interrupts = <GIC_SPI 506 IRQ_TYPE_LEVEL_HIGH>,
+                                 <GIC_SPI 508 IRQ_TYPE_LEVEL_HIGH>;
+                    interrupt-names = "uplow", "critical";
+                    #thermal-sensor-cells = <1>;
+            };
+
+            tsens1: thermal-sensor@c265000 {
+                    compatible = "qcom,sdm845-tsens", "qcom,tsens-v2";
+                    reg = <0 0x0c265000 0 0x1ff>, /* TM */
+                          <0 0x0c223000 0 0x1ff>; /* SROT */
+                    #qcom,sensors = <8>;
+                    interrupts = <GIC_SPI 507 IRQ_TYPE_LEVEL_HIGH>,
+                                 <GIC_SPI 509 IRQ_TYPE_LEVEL_HIGH>;
+                    interrupt-names = "uplow", "critical";
+                    #thermal-sensor-cells = <1>;
+            };
+    };
+
+    /* ... */
+
+    thermal-zones {
+            cpu0-thermal {
+                    polling-delay-passive = <250>;
+                    polling-delay = <1000>;
+
+                    thermal-sensors = <&tsens0 1>;
+
+                    trips {
+                            cpu0_alert0: trip-point0 {
+                                    temperature = <90000>;
+                                    hysteresis = <2000>;
+                                    type = "passive";
+                            };
+
+                            cpu0_alert1: trip-point1 {
+                                    temperature = <95000>;
+                                    hysteresis = <2000>;
+                                    type = "passive";
+                            };
+
+                            cpu0_crit: cpu_crit {
+                                    temperature = <110000>;
+                                    hysteresis = <1000>;
+                                    type = "critical";
+                            };
+                    };
+
+                    cooling-maps {
+                            map0 {
+                                    trip = <&cpu0_alert0>;
+                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>,
+                                                     <&CPU1 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>,
+                                                     <&CPU2 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>,
+                                                     <&CPU3 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>;
+                            };
+
+                            map1 {
+                                    trip = <&cpu0_alert1>;
+                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>,
+                                                     <&CPU1 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>,
+                                                     <&CPU2 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>,
+                                                     <&CPU3 THERMAL_NO_LIMIT
+                                                            THERMAL_NO_LIMIT>;
+                            };
+                    };
+            };
+
+            /* ... */
+
+            cluster0-thermal {
+                    polling-delay-passive = <250>;
+                    polling-delay = <1000>;
+
+                    thermal-sensors = <&tsens0 5>;
+
+                    trips {
+                            cluster0_alert0: trip-point0 {
+                                    temperature = <90000>;
+                                    hysteresis = <2000>;
+                                    type = "hot";
+                            };
+                            cluster0_crit: cluster0_crit {
+                                    temperature = <110000>;
+                                    hysteresis = <2000>;
+                                    type = "critical";
+                            };
+                    };
+            };
+
+            /* ... */
+
+            gpu-top-thermal {
+                    polling-delay-passive = <250>;
+                    polling-delay = <1000>;
+
+                    thermal-sensors = <&tsens0 11>;
+
+                    trips {
+                            gpu1_alert0: trip-point0 {
+                                    temperature = <90000>;
+                                    hysteresis = <2000>;
+                                    type = "hot";
+                            };
+                    };
+            };
+    };
+...