diff mbox

[PATCHv6,02/16] drivers: thermal: introduce device tree parser

Message ID 1379540136-28378-1-git-send-email-eduardo.valentin@ti.com (mailing list archive)
State Superseded, archived
Delegated to: Zhang Rui
Headers show

Commit Message

Eduardo Valentin Sept. 18, 2013, 9:35 p.m. UTC
This patch introduces a device tree bindings for
describing the hardware thermal behavior and limits.
Also a parser to read and interpret the data and feed
it in the thermal framework is presented.

This patch introduces a thermal data parser for device
tree. The parsed data is used to build thermal zones
and thermal binding parameters. The output data
can then be used to deploy thermal policies.

This patch adds also documentation regarding this
API and how to define tree nodes to use
this infrastructure.

Note that, in order to be able to have control
on the sensor registration on the DT thermal zone,
it was required to allow changing the thermal zone
.get_temp callback. For this reason, this patch
also removes the 'const' modifier from the .ops
field of thermal zone devices.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
---
 .../devicetree/bindings/thermal/thermal.txt        | 498 +++++++++++++
 drivers/thermal/Kconfig                            |  13 +
 drivers/thermal/Makefile                           |   1 +
 drivers/thermal/of-thermal.c                       | 775 +++++++++++++++++++++
 drivers/thermal/thermal_core.c                     |   9 +-
 drivers/thermal/thermal_core.h                     |   9 +
 include/dt-bindings/thermal/thermal.h              |  27 +
 include/linux/thermal.h                            |  28 +-
 8 files changed, 1357 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
 create mode 100644 drivers/thermal/of-thermal.c
 create mode 100644 include/dt-bindings/thermal/thermal.h

---

Hello all,

Thanks Joe Perches for the effort of reviewing this code, I really appreciate it.

Here is a version with a refactored init function without leaks in the failure
patch. I also added a couple of comments to better understand the intention of
that function. Besides, when there are no memory available, the function
rolls back what ever thermal zones have been added.

Thanks,

Eduardo

Comments

Mark Rutland Sept. 23, 2013, 10:40 a.m. UTC | #1
Hi Eduardo,

Apologies for having taken so long to get back you on this.

I have several comments on the binding and the way it's parsed.

On Wed, Sep 18, 2013 at 10:35:36PM +0100, Eduardo Valentin wrote:
> This patch introduces a device tree bindings for
> describing the hardware thermal behavior and limits.
> Also a parser to read and interpret the data and feed
> it in the thermal framework is presented.
> 
> This patch introduces a thermal data parser for device
> tree. The parsed data is used to build thermal zones
> and thermal binding parameters. The output data
> can then be used to deploy thermal policies.
> 
> This patch adds also documentation regarding this
> API and how to define tree nodes to use
> this infrastructure.
> 
> Note that, in order to be able to have control
> on the sensor registration on the DT thermal zone,
> it was required to allow changing the thermal zone
> .get_temp callback. For this reason, this patch
> also removes the 'const' modifier from the .ops
> field of thermal zone devices.
> 
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
> ---
>  .../devicetree/bindings/thermal/thermal.txt        | 498 +++++++++++++
>  drivers/thermal/Kconfig                            |  13 +
>  drivers/thermal/Makefile                           |   1 +
>  drivers/thermal/of-thermal.c                       | 775 +++++++++++++++++++++
>  drivers/thermal/thermal_core.c                     |   9 +-
>  drivers/thermal/thermal_core.h                     |   9 +
>  include/dt-bindings/thermal/thermal.h              |  27 +
>  include/linux/thermal.h                            |  28 +-
>  8 files changed, 1357 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
>  create mode 100644 drivers/thermal/of-thermal.c
>  create mode 100644 include/dt-bindings/thermal/thermal.h
> 
> ---
> 
> Hello all,
> 
> Thanks Joe Perches for the effort of reviewing this code, I really appreciate it.
> 
> Here is a version with a refactored init function without leaks in the failure
> patch. I also added a couple of comments to better understand the intention of
> that function. Besides, when there are no memory available, the function
> rolls back what ever thermal zones have been added.
> 
> Thanks,
> 
> Eduardo
> 
> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
> new file mode 100644
> index 0000000..6664533
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
> @@ -0,0 +1,498 @@
> +* Thermal Framework Device Tree descriptor
> +
> +Generic binding to provide a way of defining hardware thermal
> +structure using device tree. A thermal structure includes thermal
> +zones and their components, such as trip points, polling intervals,
> +sensors and cooling devices binding descriptors.
> +
> +The target of device tree thermal descriptors is to describe only
> +the hardware thermal aspects, not how the system must control or which
> +algorithm or policy must be taken in place.
> +
> +There are five types of nodes involved to describe thermal bindings:
> +- sensors: used to describe the device source of temperature sensing;
> +- cooling devices: used to describe devices source of power dissipation control;
> +- trip points: used to describe points in temperature domain defined to
> +make the system aware of hardware limits;
> +- cooling attachments: used to describe links between trip points and
> +cooling devices;

I think "attachments" sounds a bit odd, though I don't have a better
naming suggestion.

> +- thermal zones: used to describe thermal data within the hardware;
> +
> +It follows a description of each type of these device tree nodes.
> +
> +* Sensor devices
> +
> +Sensor devices are nodes providing temperature sensing capabilities on thermal
> +zones. Typical devices are I2C ADC converters and bandgaps. Theses are nodes
> +providing temperature data to thermal zones. Temperature sensor devices may
> +control one or more internal sensors.
> +
> +Required property:
> +- #sensor-cells:       Used to provide sensor device specific information
> +                       while referring to it. Must be at least 1, in order
> +                       to identify uniquely the sensor instances within
> +                       the IC. See thermal zone binding for more details
> +                       on how consumers refer to sensor devices.

I don't see why this needs to be at least one cell -- if an IC only has
one temperature sensor it should be fine for this to be zero and have no
ambiguity.

It may make sense to call these thermal sensor devices, there are plenty
of other sensors we might want to describe in the dt for other purposes,
and we can impose some consistency if we remove the ambiguity.

> +
> +* Cooling device nodes
> +
> +Cooling devices are nodes providing control on power dissipation. There
> +are essentially two ways to provide control on power dissipation. First
> +is by means of regulating device performance, which is known as passive
> +cooling. Second is by means of activating devices in order to remove
> +the dissipated heat, which is known as active cooling, e.g. regulating
> +fan speeds. In both cases, cooling devices shall have a way to determine
> +the level of cooling.
> +
> +Required property:
> +- cooling-min-level:   A unsigned integer indicating the smallest
> +                       cooling level accepted. Typically 0.
> +- cooling-max-level:   An unsigned integer indicating the largest
> +                       cooling level accepted.

I'm not sure what a "cooling level" means. It seems a bit abstract. Is
this binding specific?

How does this relate to cooling-cells? These seem to be a fixed size.

> +- #cooling-cells:      Used to provide cooling device specific information
> +                       while referring to it. Must be at least 2, in order
> +                       to specify minimum and maximum cooling level used
> +                       in the reference. See Cooling device attachments section
> +                       below for more details on how consumers refer to
> +                       cooling devices.

Are these cooling cells always expected to cover min and max, and are
min and max always expected to take the same number of cells? The above
cooling-*-level properties imply they each take up one cell.

Does #cooling-cells = <3> make sense?. If this covers additional
information (e.g. which fan on a multi-fan controller), how does the OS
determine which cells are min and max, and how do these relate to
cooling-level-min and cooling-level-max?

> +
> +* Trip points
> +
> +The trip node is a node to describe a point in the temperature domain
> +in which the system takes an action. This node describes just the point,
> +not the action.

I'm still not sure on this, as it sounds like embedding policy into the
DT, rather than a description of the hardware. I realise we need to
prevent devices burning out, so describing the max acceptable
temperature and possibly a preferred range makes sense, but do we need
this level of flexibility? Maybe we do.

> +
> +Required properties:
> +- temperature:         the trip temperature level, in milliCelsius.

Preferably, s/milliCelsius/millicelsius/ over the document.

Given that we have signed values elsewhere, is this signed or unsigned?

> +- hysteresis:          a (low) hysteresis value on 'temperature'. This is a
> +                       relative value, in milliCelsius.
> +- type:                        the trip type. Here is the type mapping:
> +       THERMAL_TRIP_ACTIVE     0:      A trip point to enable active cooling
> +       THERMAL_TRIP_PASSIVE    1:      A trip point to enable passive cooling
> +       THERMAL_TRIP_HOT        2:      A trip point to notify emergency
> +       THERMAL_TRIP_CRITICAL   3:      Hardware not reliable.
> +
> +Refer to include/dt-bindings/thermal/thermal.h for definition of these consts.

Why not use a string for this? We do so for other type parameters in
other bindings (see dr_mode in
Documentation/devicetree/bindings/usb/generic.txt, phy-mode for ethernet
PHYs, etc).

> +
> +* Cooling device attachments
> +
> +The cooling device attachments node is a node to describe how cooling devices
> +get assigned to trip points of the zone. The cooling devices are expected
> +to be loaded in the target system.
> +
> +Required properties:
> +- cooling-device:      A phandle of a cooling device with its parameters,
> +                       referring to which cooling device is used in this
> +                       binding. The required parameters are: the minimum
> +                       cooling level and the maximum cooling level used
> +                       in this attach.

Might this be a list in general? The code doesn't have to support that
yet.

It would be nice to have a name for the cells after a phandle which
describe the cooling device's configuration. You've called them
parameters here, but it probably makes sense to call them a
cooling-specifier (following clock-specifier and interrupt-specifier).

> +- trip:                        A phandle of a trip point node within the same thermal
> +                       zone.
> +
> +Optional property:
> +- contribution:                The cooling contribution to the thermal zone of the
> +                       referred cooling device at the referred trip point.
> +                       The contribution is a value from 0 to 100. The sum
> +                       of all cooling contributions within a thermal zone
> +                       must never exceed 100.

This is a bit arbitrary. Couldn't we sum all contributions to find the
total automatically, so this could be a simpler ratio?

> +
> +Note: Using the THERMAL_NO_LIMIT (-1L) constant in the cooling-device phandle
> +limit parameters means:
> +(i)   - minimum level allowed for minimum cooling level used in the reference.
> +(ii)  - maximum level allowed for maximum cooling level used in the reference.
> +Refer to include/dt-bindings/thermal/thermal.h for definition of this constant.
> +
> +* Thermal zones
> +
> +The thermal-zone node is the node containing all the required info
> +for describing a thermal zone, including its cdev bindings. The thermal_zone
> +node must contain, apart from its own properties, one node containing
> +trip nodes and one node containing all the zone cooling attachments.

s/cdev/cooling device/ ?

> +
> +Required properties:
> +- passive-delay:       The maximum number of milliseconds to wait between polls
> +                       when performing passive cooling.
> +- polling-delay:       The maximum number of milliseconds to wait between polls
> +                       when checking this thermal zone.

How about polling-delay-passive, polling-delay-active? I'm still not

> +- sensors:             A list of sensor phandles and their parameters. The
> +                       required parameter is the sensor id, in order to
> +                       identify internal sensors when the sensor IC features
> +                       several sensing units.

As mentioned above, I'm not sure you even need that one cell.

- sensors:                A list of sensor phandle + thermal-sensor-specifier
                          cells describing the sensors monitoring the thermal
			  zone.

> +- trips:               A sub-node containing several trip point nodes required
> +                       to describe the thermal zone.
> +- cooling-attachments  A sub-node containing several cooling device attaches
> +                       nodes, used to describe the relation between trips
> +                       and cooling devices.
> +
> +Optional property:
> +- coefficients:                An array of integers (one signed cell) containing
> +                       coefficients to compose a linear relation between
> +                       the sensors described in the sensors property.
> +                       Coefficients defaults to 1, in case this property
> +                       is not specified. A simple linear polynomial is used:
> +                       Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1) + cn.
> +
> +                       The coefficients are ordered and they match with sensors
> +                       by means of sensor ID. Additional coefficients are
> +                       interpreted as constant offsets.

What is the end result of this? Presumably this is meant to result in an
estimate of the average temperature of the thermal zone, rather than an
arbitrary value? This should be mentioned.

This doesn't seem to be used the in the code below...

> +
> +Note: The delay properties are bound to the maximum dT/dt (temperature
> +derivative over time) in two situations for a thermal zone:
> +(i)  - when active cooling is activated (passive-delay); and
> +(ii) - when the zone just needs to be monitored (polling-delay).
> +The maximum dT/dt is highly bound to hardware power consumption and dissipation
> +capability.

I'm not sure what you mean by this, could you elaborate?

I guess you mean that the delays should be chosen to account for said
max dT/dt, such that a device can't unexpectedly cross several trip
boundaries between polls?

> +
> +* Examples
> +
> +Below are several examples on how to use thermal data descriptors
> +using device tree bindings:
> +
> +(a) - CPU thermal zone
> +
> +The CPU thermal zone example below describes how to setup one thermal zone
> +using one single sensor as temperature source and many cooling devices and
> +power dissipation control sources.
> +
> +#include <dt-bindings/thermal/thermal.h>
> +
> +cpus {
> +       cpu0: cpu@0 {
> +               ...
> +               cooling-min-level = <0>;
> +               cooling-max-level = <3>;
> +               #cooling-cells = <2>; /* min followed by max */
> +       };

What do those min and max mean in this context? What do the values in
the range [0,3] correspond to?

I'm not sure it makes sense to describe passive cooling in this way --
the precise way an OS does less work on a device is fundamentally tied
to the design of the OS (it might be able to rate-limit requests, it
might be able to clock the device down, it might only be able to turn
the device off).

I'm not sure there is anything we can describe for passive cooling
(beyond the thermal limits the OS should attempt to stick to).

> +       ...
> +};
> +
> +&i2c1 {
> +       ...
> +       fan0: fan@0x48 {
> +               ...
> +               cooling-min-level = <0>;
> +               cooling-max-level = <9>;
> +               #cooling-cells = <2>; /* min followed by max */
> +       };

What do min and max mean here for the fan?

> +};
> +
> +bandgap0: bandgap@0x0000ED00 {
> +       ...
> +       #sensor-cells = <1>;
> +};
> +
> +cpu-thermal: cpu-thermal {

How do the thermal zones get probed?

> +       passive-delay = <250>; /* milliseconds */
> +       polling-delay = <1000>; /* milliseconds */
> +
> +               /* sensor       ID */
> +        sensors = <&bandgap0     0>;
> +
> +        trips {
> +                cpu-alert0: cpu-alert {
> +                        temperature = <90000>; /* milliCelsius */
> +                        hysteresis = <2000>; /* milliCelsius */
> +                        type = <THERMAL_TRIP_ACTIVE>;
> +                };
> +                cpu-alert1: cpu-alert {
> +                        temperature = <100000>; /* milliCelsius */
> +                        hysteresis = <2000>; /* milliCelsius */
> +                        type = <THERMAL_TRIP_PASSIVE>;
> +                };
> +                cpu-crit: cpu-crit {
> +                        temperature = <125000>; /* milliCelsius */
> +                        hysteresis = <2000>; /* milliCelsius */
> +                        type = <THERMAL_TRIP_CRITICAL>;
> +                };
> +        };
> +
> +       cooling-attachments {
> +               attach0 {
> +                       trip = <&cpu-alert0>;
> +                       cooling-device = <&fan0 THERMAL_NO_LIMITS 4>;
> +               };
> +               attach1 {
> +                       trip = <&cpu-alert1>;
> +                       cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
> +               };
> +               attach2 {
> +                       trip = <&cpu-alert1>;
> +                       cooling-device =
> +                               <&cpu0 THERMAL_NO_LIMITS THERMAL_NO_LIMITS>;
> +               };
> +       };

Was there a good reason for splitting trips and attachment?

> +};
> +
> +In the example above, the ADC sensor at address 0x0000ED00 is used to monitor
> +the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan device controlled
> +via I2C bus 1, at adress 0x48, is used to remove the heat out of the thermal
> +zone 'cpu-thermal' using its cooling levels from its minimum to 4, when it
> +reaches trip point 'cpu-alert0' at 90C, as an example of active cooling. The
> +same cooling device is used at 'cpu-alert1', but from 5 to its maximum level.
> +The cpu@0 device is also linked to the same thermal zone, 'cpu-thermal', as a
> +passive cooling device, using all its cooling levels at trip point 'cpu-alert1',
> +which is a trip point at 100C.
> +

[...]

> +(c) - Several sensors within one single thermal zone
> +
> +The example below illustrates how to use more than one sensor within
> +one thermal zone.
> +
> +#include <dt-bindings/thermal/thermal.h>
> +
> +&i2c1 {
> +       ...
> +       adc: sensor@0x49 {
> +               ...
> +               #sensor-cells = <1>;
> +       };
> +};
> +
> +bandgap0: bandgap@0x0000ED00 {
> +       ...
> +       #sensor-cells = <1>;
> +};
> +
> +cpu-thermal: cpu-thermal {
> +       passive-delay = <250>; /* milliseconds */
> +       polling-delay = <1000>; /* milliseconds */
> +
> +               /* sensor       ID */
> +        sensors = <&bandgap0   0>,
> +                 <&adc         0>;
> +
> +               /* hotspot = 100 * bandgap - 120 * adc + 484 */
> +       coefficients =          <100    -120    484>;

Aha, so these are signed. This *must* be mentioned in the documentation.
The types of all properties should be described in their definition.

[...]

> +struct thermal_zone_device *
> +thermal_zone_of_sensor_register(struct device *dev, int sensor_id,
> +                               void *data, int (*get_temp)(void *, long *),
> +                               int (*get_trend)(void *, long *))
> +{
> +       struct device_node *np, *child, *sensor_np;
> +
> +       np = of_find_node_by_name(NULL, "thermal-zones");

This is the first instance of "thermal-zones" in this patch. Presumably
this is the container for thermal zones that allows them to be probed
(answering my question above). This *must* be described in the binding.

> +       if (!np)
> +               return ERR_PTR(-ENODEV);
> +
> +       if (!dev || !dev->of_node)
> +               return ERR_PTR(-EINVAL);
> +
> +       sensor_np = dev->of_node;
> +
> +       for_each_child_of_node(np, child) {
> +               struct of_phandle_args sensor_specs;
> +               int ret;
> +
> +               /* For now, thermal framework supports only 1 sensor per zone */
> +               ret = of_parse_phandle_with_args(child, "sensors",
> +                                                "#sensor-cells",
> +                                                0, &sensor_specs);
> +               if (ret)
> +                       continue;
> +
> +               if (sensor_specs.args_count < 1)
> +                       continue;

Why? I fail to see why a single sensor *must* have some configuration cells.

> +
> +               if (sensor_specs.np == sensor_np &&
> +                   sensor_specs.args[0] == sensor_id) {
> +                       of_node_put(np);
> +                       return thermal_zone_of_add_sensor(child, sensor_np,
> +                                                         data,
> +                                                         get_temp,
> +                                                         get_trend);
> +               }
> +       }
> +       of_node_put(np);
> +
> +       return ERR_PTR(-ENODEV);
> +}
> +EXPORT_SYMBOL_GPL(thermal_zone_of_sensor_register);

[...]

> +static int thermal_of_populate_bind_params(struct device_node *np,
> +                                          struct __thermal_bind_params *__tbp,
> +                                          struct __thermal_trip *trips,
> +                                          int ntrips)
> +{
> +       struct of_phandle_args cooling_spec;
> +       struct device_node *trip;
> +       int ret, i;
> +       u32 prop;
> +
> +       /* Default weight. Usage is optional */
> +       __tbp->usage = 0;
> +       ret = of_property_read_u32(np, "usage", &prop);

That wasn't described in the binding. Should this be reading the
"contribution" property?

> +       if (ret == 0)
> +               __tbp->usage = prop;
> +
> +       trip = of_parse_phandle(np, "trip", 0);
> +       if (!trip) {
> +               pr_err("missing trip property\n");
> +               return -ENODEV;
> +       }
> +
> +       /* match using device_node */
> +       for (i = 0; i < ntrips; i++)
> +               if (trip == trips[i].np) {
> +                       __tbp->trip_id = i;
> +                       break;
> +               }
> +
> +       if (i == ntrips) {
> +               ret = -ENODEV;
> +               goto end;
> +       }
> +
> +       ret = of_parse_phandle_with_args(np, "cooling-device", "#cooling-cells",
> +                                        0, &cooling_spec);
> +       if (ret < 0) {
> +               pr_err("missing cooling_device property\n");
> +               goto end;
> +       }
> +       __tbp->cooling_device = cooling_spec.np;
> +       if (cooling_spec.args_count >= 2) { /* at least min and max */
> +               __tbp->min = cooling_spec.args[0];
> +               __tbp->max = cooling_spec.args[1];

Ah, so the first two cells are meant to be min and max, not any
arbitrary cells. Why is this necessary?

> +       } else {
> +               pr_err("wrong reference to cooling device, missing limits\n");
> +       }
> +
> +end:
> +       of_node_put(trip);
> +
> +       return ret;
> +}
> +
> +/**
> + * thermal_of_populate_trip - parse and fill one trip point data
> + * @np: DT node containing a trip point node
> + * @trip: trip point data structure to be filled up
> + *
> + * This function parses a trip point type of node represented by
> + * @np parameter and fills the read data into @trip data structure.
> + *
> + * Return: 0 on success, proper error code otherwise
> + */
> +static int thermal_of_populate_trip(struct device_node *np,
> +                                   struct __thermal_trip *trip)
> +{
> +       int prop;
> +       int ret;
> +
> +       ret = of_property_read_u32(np, "temperature", &prop);
> +       if (ret < 0) {
> +               pr_err("missing temperature property\n");
> +               return ret;
> +       }
> +       trip->temperature = prop;
> +
> +       ret = of_property_read_u32(np, "hysteresis", &prop);
> +       if (ret < 0) {
> +               pr_err("missing hysteresis property\n");
> +               return ret;
> +       }
> +       trip->hysteresis = prop;
> +
> +       ret = of_property_read_u32(np, "type", &prop);
> +       if (ret < 0) {
> +               pr_err("missing type property\n");
> +               return ret;
> +       }
> +       trip->type = prop;

No sanity checking?

I'd prefer a string and a table from string to Linux internal ID. Others
may have differing opinions.

> +
> +       /* Required for cooling attachment matching */
> +       trip->np = np;
> +
> +       return 0;
> +}

[...]

> +
> +/**
> + * thermal_of_build_thermal_zone - parse and fill one thermal zone data
> + * @np: DT node containing a thermal zone node
> + *
> + * This function parses a thermal zone type of node represented by
> + * @np parameter and fills the read data into a __thermal_zone data structure
> + * and return this pointer.
> + *
> + * Return: On success returns a valid struct __thermal_zone,
> + * otherwise, it returns a corresponding ERR_PTR(). Caller must
> + * check the return value with help of IS_ERR() helper.
> + */
> +static struct __thermal_zone *
> +thermal_of_build_thermal_zone(struct device_node *np)
> +{
> +       struct device_node *child, *gchild;
> +       struct __thermal_zone *tz;
> +       int ret, i;
> +       u32 prop;
> +
> +       if (!np) {
> +               pr_err("no thermal zone np\n");
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       tz = kzalloc(sizeof(*tz), GFP_KERNEL);
> +       if (!tz)
> +               return ERR_PTR(-ENOMEM);
> +
> +       ret = of_property_read_u32(np, "passive-delay", &prop);
> +       if (ret < 0) {
> +               pr_err("missing passive_delay property\n");

Inconsistent '-' and '_' between the parsing and the error.

> +               return ERR_PTR(ret);
> +       }
> +       tz->passive_delay = prop;
> +
> +       ret = of_property_read_u32(np, "polling-delay", &prop);
> +       if (ret < 0) {
> +               pr_err("missing polling_delay property\n");

Same here.

> +               return ERR_PTR(ret);
> +       }
> +       tz->polling_delay = prop;
> +
> +       /* trips */
> +       child = of_get_child_by_name(np, "trips");
> +
> +       /* No trips provided */
> +       if (!child)
> +               goto finish;
> +
> +       tz->ntrips = of_get_child_count(child);

What if there are no children, or this fails (returning zero)?

> +       tz->trips = kzalloc(tz->ntrips * sizeof(*tz->trips), GFP_KERNEL);

Here kzalloc could return ZERO_SIZE_PTR ((void*) 16). So the check below
isn't sufficient to stop us continuing if the node has no children. We
should check tz->ntrips above before calling kzalloc.

> +       if (!tz->trips)
> +               return ERR_PTR(-ENOMEM);
> +       i = 0;
> +       for_each_child_of_node(child, gchild)
> +               thermal_of_populate_trip(gchild, &tz->trips[i++]);

What if this fails for a child node?

> +
> +       of_node_put(child);
> +
> +       /* cooling-attachments */
> +       child = of_get_child_by_name(np, "cooling-attachments");
> +
> +       /* cooling-attachments provided */
> +       if (!child)
> +               goto finish;
> +
> +       tz->num_tbps = of_get_child_count(child);
> +       tz->tbps = kzalloc(tz->num_tbps * sizeof(*tz->tbps), GFP_KERNEL);
> +       if (!tz->tbps)
> +               return ERR_PTR(-ENOMEM);
> +       i = 0;
> +       for_each_child_of_node(child, gchild)
> +               thermal_of_populate_bind_params(gchild, &tz->tbps[i++],
> +                                               tz->trips, tz->ntrips);
> +
> +finish:
> +       tz->mode = THERMAL_DEVICE_DISABLED;
> +
> +       return tz;
> +}

What about all that useless data we may have just allocated memory for?

[...]

> +int __init of_parse_thermal_zones(void)
> +{
> +       struct device_node *np, *child;
> +       struct __thermal_zone *tz;
> +       struct thermal_zone_device_ops *ops;
> +
> +       np = of_find_node_by_name(NULL, "thermal-zones");
> +       if (!np) {
> +               pr_debug("unable to find thermal zones\n");
> +               return 0; /* Run successfully on systems without thermal DT */
> +       }
> +
> +       for_each_child_of_node(np, child) {
> +               struct thermal_zone_device *zone;
> +               struct thermal_zone_params *tzp;

So each child of thermal-zones must be a thermal zone (we can't embed
other nodes of information)?

> +
> +               tz = thermal_of_build_thermal_zone(child);
> +               if (IS_ERR(tz)) {
> +                       pr_err("failed to build thermal zone %s: %ld\n",
> +                              child->name,
> +                              PTR_ERR(tz));
> +                       continue;
> +               }
> +
> +               ops = kmemdup(&of_thermal_ops, sizeof(*ops), GFP_KERNEL);
> +               if (!ops)
> +                       goto exit_free;
> +
> +               tzp = kzalloc(sizeof(*tzp), GFP_KERNEL);
> +               if (!tzp) {
> +                       kfree(ops);
> +                       goto exit_free;
> +               }
> +
> +               /* No hwmon because there might be hwmon drivers registering */
> +               tzp->no_hwmon = true;
> +
> +               zone = thermal_zone_device_register(child->name, tz->ntrips,
> +                                                   0, tz,
> +                                                   ops, tzp,
> +                                                   tz->passive_delay,
> +                                                   tz->polling_delay);
> +               if (IS_ERR(zone)) {
> +                       pr_err("Failed to build %s zone %ld\n", child->name,
> +                              PTR_ERR(zone));
> +                       kfree(tzp);
> +                       kfree(ops);
> +                       of_thermal_free_zone(tz);
> +                       /* attempting to build remaining zones still */
> +               }
> +       }
> +
> +       return 0;
> +
> +exit_free:
> +       of_thermal_free_zone(tz);
> +
> +       /* no memory available, so free what we have built */
> +       of_thermal_destroy_zones();
> +
> +       return -ENOMEM;
> +}

Cheers,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Sept. 23, 2013, 3:39 p.m. UTC | #2
On 23-09-2013 06:40, Mark Rutland wrote:
> Hi Eduardo,
> 

Hi Mark,

> Apologies for having taken so long to get back you on this.
> 

Well, at least we are keeping it up somehow. I just would like to have
an agreement, at least for the binding part, as I mentioned before.

> I have several comments on the binding and the way it's parsed.

OK. I will try to clarify that.

> 
> On Wed, Sep 18, 2013 at 10:35:36PM +0100, Eduardo Valentin wrote:
>> This patch introduces a device tree bindings for
>> describing the hardware thermal behavior and limits.
>> Also a parser to read and interpret the data and feed
>> it in the thermal framework is presented.
>>
>> This patch introduces a thermal data parser for device
>> tree. The parsed data is used to build thermal zones
>> and thermal binding parameters. The output data
>> can then be used to deploy thermal policies.
>>
>> This patch adds also documentation regarding this
>> API and how to define tree nodes to use
>> this infrastructure.
>>
>> Note that, in order to be able to have control
>> on the sensor registration on the DT thermal zone,
>> it was required to allow changing the thermal zone
>> .get_temp callback. For this reason, this patch
>> also removes the 'const' modifier from the .ops
>> field of thermal zone devices.
>>
>> Cc: Zhang Rui <rui.zhang@intel.com>
>> Cc: linux-pm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
>> ---
>>  .../devicetree/bindings/thermal/thermal.txt        | 498 +++++++++++++
>>  drivers/thermal/Kconfig                            |  13 +
>>  drivers/thermal/Makefile                           |   1 +
>>  drivers/thermal/of-thermal.c                       | 775 +++++++++++++++++++++
>>  drivers/thermal/thermal_core.c                     |   9 +-
>>  drivers/thermal/thermal_core.h                     |   9 +
>>  include/dt-bindings/thermal/thermal.h              |  27 +
>>  include/linux/thermal.h                            |  28 +-
>>  8 files changed, 1357 insertions(+), 3 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
>>  create mode 100644 drivers/thermal/of-thermal.c
>>  create mode 100644 include/dt-bindings/thermal/thermal.h
>>
>> ---
>>
>> Hello all,
>>
>> Thanks Joe Perches for the effort of reviewing this code, I really appreciate it.
>>
>> Here is a version with a refactored init function without leaks in the failure
>> patch. I also added a couple of comments to better understand the intention of
>> that function. Besides, when there are no memory available, the function
>> rolls back what ever thermal zones have been added.
>>
>> Thanks,
>>
>> Eduardo
>>
>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
>> new file mode 100644
>> index 0000000..6664533
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>> @@ -0,0 +1,498 @@
>> +* Thermal Framework Device Tree descriptor
>> +
>> +Generic binding to provide a way of defining hardware thermal
>> +structure using device tree. A thermal structure includes thermal
>> +zones and their components, such as trip points, polling intervals,
>> +sensors and cooling devices binding descriptors.
>> +
>> +The target of device tree thermal descriptors is to describe only
>> +the hardware thermal aspects, not how the system must control or which
>> +algorithm or policy must be taken in place.
>> +
>> +There are five types of nodes involved to describe thermal bindings:
>> +- sensors: used to describe the device source of temperature sensing;
>> +- cooling devices: used to describe devices source of power dissipation control;
>> +- trip points: used to describe points in temperature domain defined to
>> +make the system aware of hardware limits;
>> +- cooling attachments: used to describe links between trip points and
>> +cooling devices;
> 
> I think "attachments" sounds a bit odd, though I don't have a better
> naming suggestion.

I understand. But I also could not find a better naming. We use the term
'cooling binding'. However, while rewriting and rereading this binding
document  I realized that the 'binding' term could be confused between
'cooling binding' and 'device tree binding'. So I chose to use another
name and 'attachments' as the best I could come out with :-). I am open
to suggestions here, if you have something better.

> 
>> +- thermal zones: used to describe thermal data within the hardware;
>> +
>> +It follows a description of each type of these device tree nodes.
>> +
>> +* Sensor devices
>> +
>> +Sensor devices are nodes providing temperature sensing capabilities on thermal
>> +zones. Typical devices are I2C ADC converters and bandgaps. Theses are nodes
>> +providing temperature data to thermal zones. Temperature sensor devices may
>> +control one or more internal sensors.
>> +
>> +Required property:
>> +- #sensor-cells:       Used to provide sensor device specific information
>> +                       while referring to it. Must be at least 1, in order
>> +                       to identify uniquely the sensor instances within
>> +                       the IC. See thermal zone binding for more details
>> +                       on how consumers refer to sensor devices.
> 
> I don't see why this needs to be at least one cell -- if an IC only has
> one temperature sensor it should be fine for this to be zero and have no
> ambiguity.

Well, idea was to have all these entries uniform. Besides, making sure
one has added a #sensor-cells entry to a device tree node seams to be
sign that the writer of that node really wants it to be a temperature
sensor used within this type of binding.

> 
> It may make sense to call these thermal sensor devices, there are plenty
> of other sensors we might want to describe in the dt for other purposes,
> and we can impose some consistency if we remove the ambiguity.

Sounds fair to me.

> 
>> +
>> +* Cooling device nodes
>> +
>> +Cooling devices are nodes providing control on power dissipation. There
>> +are essentially two ways to provide control on power dissipation. First
>> +is by means of regulating device performance, which is known as passive
>> +cooling. Second is by means of activating devices in order to remove
>> +the dissipated heat, which is known as active cooling, e.g. regulating
>> +fan speeds. In both cases, cooling devices shall have a way to determine
>> +the level of cooling.
>> +
>> +Required property:
>> +- cooling-min-level:   A unsigned integer indicating the smallest
>> +                       cooling level accepted. Typically 0.
>> +- cooling-max-level:   An unsigned integer indicating the largest
>> +                       cooling level accepted.
> 
> I'm not sure what a "cooling level" means. It seems a bit abstract. Is
> this binding specific?

It is just a state to describe the level of cooling provided. I tried to
explained it in the above paragraph, but based on your comment, it was
not enough :-).

A fan device may come with three different speeds. Then it would be seen
as having three cooling levels. Similarly, a cpu may come with four
operating points (predefined pairs of voltage and frequency set), then
it would be seen as having four cooling levels. Pretty straight forward.

> 
> How does this relate to cooling-cells? These seem to be a fixed size.
> 

The cooling cells was simply to say which levels would be used. E.g. If
a cpu has 10 different operating points, at a certain trip point, the
designer may choose only the 6th until the 10th operating point
available to cool off the thermal zone.

>> +- #cooling-cells:      Used to provide cooling device specific information
>> +                       while referring to it. Must be at least 2, in order
>> +                       to specify minimum and maximum cooling level used
>> +                       in the reference. See Cooling device attachments section
>> +                       below for more details on how consumers refer to
>> +                       cooling devices.
> 
> Are these cooling cells always expected to cover min and max, and are
> min and max always expected to take the same number of cells? The above
> cooling-*-level properties imply they each take up one cell.

Yes, they take only one cell. And yes, so far I see only one need so
far, to describe min and max level used while referring to the cooling
device.

> 
> Does #cooling-cells = <3> make sense?. If this covers additional
> information (e.g. which fan on a multi-fan controller), how does the OS
> determine which cells are min and max, and how do these relate to
> cooling-level-min and cooling-level-max?

Well, I haven't seen such case. But would make sense to me, yes. Not
that we would be using though.

> 
>> +
>> +* Trip points
>> +
>> +The trip node is a node to describe a point in the temperature domain
>> +in which the system takes an action. This node describes just the point,
>> +not the action.
> 
> I'm still not sure on this, as it sounds like embedding policy into the
> DT, rather than a description of the hardware. I realise we need to

Why?

> prevent devices burning out, so describing the max acceptable
> temperature and possibly a preferred range makes sense, but do we need
> this level of flexibility? Maybe we do.
> 

Can you please be more specific? Any counter example on how to be less
flexible? I believe this shall give a good level of flexibility. Nothing
less than needed, and nothing that can open for abusing the notation.

>> +
>> +Required properties:
>> +- temperature:         the trip temperature level, in milliCelsius.
> 
> Preferably, s/milliCelsius/millicelsius/ over the document.
> 

No issues.

> Given that we have signed values elsewhere, is this signed or unsigned?


Signed. But withing the current code implementation, and design too, we
are concerned with high positive temperatures, not low negative
temperatures.

> 
>> +- hysteresis:          a (low) hysteresis value on 'temperature'. This is a
>> +                       relative value, in milliCelsius.
>> +- type:                        the trip type. Here is the type mapping:
>> +       THERMAL_TRIP_ACTIVE     0:      A trip point to enable active cooling
>> +       THERMAL_TRIP_PASSIVE    1:      A trip point to enable passive cooling
>> +       THERMAL_TRIP_HOT        2:      A trip point to notify emergency
>> +       THERMAL_TRIP_CRITICAL   3:      Hardware not reliable.
>> +
>> +Refer to include/dt-bindings/thermal/thermal.h for definition of these consts.
> 
> Why not use a string for this? We do so for other type parameters in
> other bindings (see dr_mode in
> Documentation/devicetree/bindings/usb/generic.txt, phy-mode for ethernet
> PHYs, etc).

I think this is a matter of taste. I am not sure I see the advantage of
changing this to string constants. If it is the case of having better
readability, I would say we do not loose readability while using macros,
that is descriptive enough.

> 
>> +
>> +* Cooling device attachments
>> +
>> +The cooling device attachments node is a node to describe how cooling devices
>> +get assigned to trip points of the zone. The cooling devices are expected
>> +to be loaded in the target system.
>> +
>> +Required properties:
>> +- cooling-device:      A phandle of a cooling device with its parameters,
>> +                       referring to which cooling device is used in this
>> +                       binding. The required parameters are: the minimum
>> +                       cooling level and the maximum cooling level used
>> +                       in this attach.
> 
> Might this be a list in general? The code doesn't have to support that
> yet.

Well, I thought of having a list at first, but then we would loose the
'contribution' property, which is specific to a cooling device, not to a
list of cooling devices.

> 
> It would be nice to have a name for the cells after a phandle which
> describe the cooling device's configuration. You've called them
> parameters here, but it probably makes sense to call them a
> cooling-specifier (following clock-specifier and interrupt-specifier).

I think I do not follow this suggestion properly. Can you please give an
example? The parameters I was referring to was just the min and max
cooling levels.

> 
>> +- trip:                        A phandle of a trip point node within the same thermal
>> +                       zone.
>> +
>> +Optional property:
>> +- contribution:                The cooling contribution to the thermal zone of the
>> +                       referred cooling device at the referred trip point.
>> +                       The contribution is a value from 0 to 100. The sum
>> +                       of all cooling contributions within a thermal zone
>> +                       must never exceed 100.
> 
> This is a bit arbitrary. Couldn't we sum all contributions to find the
> total automatically, so this could be a simpler ratio?

Well, both representations do the job. I fail to see the advantage
between one and another.

> 
>> +
>> +Note: Using the THERMAL_NO_LIMIT (-1L) constant in the cooling-device phandle
>> +limit parameters means:
>> +(i)   - minimum level allowed for minimum cooling level used in the reference.
>> +(ii)  - maximum level allowed for maximum cooling level used in the reference.
>> +Refer to include/dt-bindings/thermal/thermal.h for definition of this constant.
>> +
>> +* Thermal zones
>> +
>> +The thermal-zone node is the node containing all the required info
>> +for describing a thermal zone, including its cdev bindings. The thermal_zone
>> +node must contain, apart from its own properties, one node containing
>> +trip nodes and one node containing all the zone cooling attachments.
> 
> s/cdev/cooling device/ ?

Yes.

> 
>> +
>> +Required properties:
>> +- passive-delay:       The maximum number of milliseconds to wait between polls
>> +                       when performing passive cooling.
>> +- polling-delay:       The maximum number of milliseconds to wait between polls
>> +                       when checking this thermal zone.
> 
> How about polling-delay-passive, polling-delay-active? I'm still not

I am ok with the name suggestion

> 
>> +- sensors:             A list of sensor phandles and their parameters. The
>> +                       required parameter is the sensor id, in order to
>> +                       identify internal sensors when the sensor IC features
>> +                       several sensing units.
> 
> As mentioned above, I'm not sure you even need that one cell.

To make then uniform, as mentioned. Can you please give an example of
existing similar case, in which we simply avoid using #.*-cells?

> 
> - sensors:                A list of sensor phandle + thermal-sensor-specifier
>                           cells describing the sensors monitoring the thermal
> 			  zone.

I see your suggestion.

> 
>> +- trips:               A sub-node containing several trip point nodes required
>> +                       to describe the thermal zone.
>> +- cooling-attachments  A sub-node containing several cooling device attaches
>> +                       nodes, used to describe the relation between trips
>> +                       and cooling devices.
>> +
>> +Optional property:
>> +- coefficients:                An array of integers (one signed cell) containing
>> +                       coefficients to compose a linear relation between
>> +                       the sensors described in the sensors property.
>> +                       Coefficients defaults to 1, in case this property
>> +                       is not specified. A simple linear polynomial is used:
>> +                       Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1) + cn.
>> +
>> +                       The coefficients are ordered and they match with sensors
>> +                       by means of sensor ID. Additional coefficients are
>> +                       interpreted as constant offsets.
> 
> What is the end result of this? Presumably this is meant to result in an
> estimate of the average temperature of the thermal zone, rather than an
> arbitrary value? This should be mentioned.

Well not exactly an average, but at least a linear relation, as already
mentioned. Not only that, but with the above property, one can use also
one single sensor + offset, which is a very common use case.

> 
> This doesn't seem to be used the in the code below...

It is in the TODO list. But you are right, it is not implemented.

> 
>> +
>> +Note: The delay properties are bound to the maximum dT/dt (temperature
>> +derivative over time) in two situations for a thermal zone:
>> +(i)  - when active cooling is activated (passive-delay); and
>> +(ii) - when the zone just needs to be monitored (polling-delay).
>> +The maximum dT/dt is highly bound to hardware power consumption and dissipation
>> +capability.
> 
> I'm not sure what you mean by this, could you elaborate?
> 
> I guess you mean that the delays should be chosen to account for said
> max dT/dt, such that a device can't unexpectedly cross several trip
> boundaries between polls?

Yes, that is exactly what I meant. And that is why I have been saying
that I see these polling values are part of hardware description. Worst
case is when device unexpectedly cross several trip boundaries between
polls without the system seeing, but also reaches temperature ranges
that compromise silicon lifetime or internal structures.

These is highly bound to maximum power delta that the circuitry may be
exposed to.

> 
>> +
>> +* Examples
>> +
>> +Below are several examples on how to use thermal data descriptors
>> +using device tree bindings:
>> +
>> +(a) - CPU thermal zone
>> +
>> +The CPU thermal zone example below describes how to setup one thermal zone
>> +using one single sensor as temperature source and many cooling devices and
>> +power dissipation control sources.
>> +
>> +#include <dt-bindings/thermal/thermal.h>
>> +
>> +cpus {
>> +       cpu0: cpu@0 {
>> +               ...
>> +               cooling-min-level = <0>;
>> +               cooling-max-level = <3>;
>> +               #cooling-cells = <2>; /* min followed by max */
>> +       };
> 
> What do those min and max mean in this context? What do the values in
> the range [0,3] correspond to?

In the case the CPU is DVFS-capable, which is pretty common case these
days, they represent that the referred cpu has four possible operating
points. In this case, 0 means the maximum operating point, 3 means the
minimum operating point.

> 
> I'm not sure it makes sense to describe passive cooling in this way --
> the precise way an OS does less work on a device is fundamentally tied
> to the design of the OS (it might be able to rate-limit requests, it
> might be able to clock the device down, it might only be able to turn
> the device off).

No, this is not about the work the OS capability of doing work on a CPU.
This is more tied to the frequency and voltage available on a CPU. That
is, the hardware capability of cooling itself, in this case.

> 
> I'm not sure there is anything we can describe for passive cooling
> (beyond the thermal limits the OS should attempt to stick to).
> 

I am not sure what you mean here, same level of cooling applies for a
cpu capable of doing dvfs, for instance.

>> +       ...
>> +};
>> +
>> +&i2c1 {
>> +       ...
>> +       fan0: fan@0x48 {
>> +               ...
>> +               cooling-min-level = <0>;
>> +               cooling-max-level = <9>;
>> +               #cooling-cells = <2>; /* min followed by max */
>> +       };
> 
> What do min and max mean here for the fan?

speeds

> 
>> +};
>> +
>> +bandgap0: bandgap@0x0000ED00 {
>> +       ...
>> +       #sensor-cells = <1>;
>> +};
>> +
>> +cpu-thermal: cpu-thermal {
> 
> How do the thermal zones get probed?

Regarding the implementation for Linux, I proposed a function that
parses all of them, when the thermal framework is initialized. They will
be represented as thermal zone devices in the thermal framework. But
they will only be enabled when the sensor drivers register themselves to
the framework.

> 
>> +       passive-delay = <250>; /* milliseconds */
>> +       polling-delay = <1000>; /* milliseconds */
>> +
>> +               /* sensor       ID */
>> +        sensors = <&bandgap0     0>;
>> +
>> +        trips {
>> +                cpu-alert0: cpu-alert {
>> +                        temperature = <90000>; /* milliCelsius */
>> +                        hysteresis = <2000>; /* milliCelsius */
>> +                        type = <THERMAL_TRIP_ACTIVE>;
>> +                };
>> +                cpu-alert1: cpu-alert {
>> +                        temperature = <100000>; /* milliCelsius */
>> +                        hysteresis = <2000>; /* milliCelsius */
>> +                        type = <THERMAL_TRIP_PASSIVE>;
>> +                };
>> +                cpu-crit: cpu-crit {
>> +                        temperature = <125000>; /* milliCelsius */
>> +                        hysteresis = <2000>; /* milliCelsius */
>> +                        type = <THERMAL_TRIP_CRITICAL>;
>> +                };
>> +        };
>> +
>> +       cooling-attachments {
>> +               attach0 {
>> +                       trip = <&cpu-alert0>;
>> +                       cooling-device = <&fan0 THERMAL_NO_LIMITS 4>;
>> +               };
>> +               attach1 {
>> +                       trip = <&cpu-alert1>;
>> +                       cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
>> +               };
>> +               attach2 {
>> +                       trip = <&cpu-alert1>;
>> +                       cooling-device =
>> +                               <&cpu0 THERMAL_NO_LIMITS THERMAL_NO_LIMITS>;
>> +               };
>> +       };
> 
> Was there a good reason for splitting trips and attachment?

Same reason I mentioned for not having a list of cooling devices in each
attachment, we could loose information specific to the attachment trip,
cooling device.

> 
>> +};
>> +
>> +In the example above, the ADC sensor at address 0x0000ED00 is used to monitor
>> +the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan device controlled
>> +via I2C bus 1, at adress 0x48, is used to remove the heat out of the thermal
>> +zone 'cpu-thermal' using its cooling levels from its minimum to 4, when it
>> +reaches trip point 'cpu-alert0' at 90C, as an example of active cooling. The
>> +same cooling device is used at 'cpu-alert1', but from 5 to its maximum level.
>> +The cpu@0 device is also linked to the same thermal zone, 'cpu-thermal', as a
>> +passive cooling device, using all its cooling levels at trip point 'cpu-alert1',
>> +which is a trip point at 100C.
>> +
> 
> [...]
> 
>> +(c) - Several sensors within one single thermal zone
>> +
>> +The example below illustrates how to use more than one sensor within
>> +one thermal zone.
>> +
>> +#include <dt-bindings/thermal/thermal.h>
>> +
>> +&i2c1 {
>> +       ...
>> +       adc: sensor@0x49 {
>> +               ...
>> +               #sensor-cells = <1>;
>> +       };
>> +};
>> +
>> +bandgap0: bandgap@0x0000ED00 {
>> +       ...
>> +       #sensor-cells = <1>;
>> +};
>> +
>> +cpu-thermal: cpu-thermal {
>> +       passive-delay = <250>; /* milliseconds */
>> +       polling-delay = <1000>; /* milliseconds */
>> +
>> +               /* sensor       ID */
>> +        sensors = <&bandgap0   0>,
>> +                 <&adc         0>;
>> +
>> +               /* hotspot = 100 * bandgap - 120 * adc + 484 */
>> +       coefficients =          <100    -120    484>;
> 
> Aha, so these are signed. This *must* be mentioned in the documentation.
> The types of all properties should be described in their definition.
> 

OK.


> [...]
> 
>> +struct thermal_zone_device *
>> +thermal_zone_of_sensor_register(struct device *dev, int sensor_id,
>> +                               void *data, int (*get_temp)(void *, long *),
>> +                               int (*get_trend)(void *, long *))
>> +{
>> +       struct device_node *np, *child, *sensor_np;
>> +
>> +       np = of_find_node_by_name(NULL, "thermal-zones");
> 
> This is the first instance of "thermal-zones" in this patch. Presumably
> this is the container for thermal zones that allows them to be probed
> (answering my question above). This *must* be described in the binding.

Hmm.. I am not sure I follow you properly. Are you asking me to mention
implementation details in the binding document (how thermal zones are
probed)?

> 
>> +       if (!np)
>> +               return ERR_PTR(-ENODEV);
>> +
>> +       if (!dev || !dev->of_node)
>> +               return ERR_PTR(-EINVAL);
>> +
>> +       sensor_np = dev->of_node;
>> +
>> +       for_each_child_of_node(np, child) {
>> +               struct of_phandle_args sensor_specs;
>> +               int ret;
>> +
>> +               /* For now, thermal framework supports only 1 sensor per zone */
>> +               ret = of_parse_phandle_with_args(child, "sensors",
>> +                                                "#sensor-cells",
>> +                                                0, &sensor_specs);
>> +               if (ret)
>> +                       continue;
>> +
>> +               if (sensor_specs.args_count < 1)
>> +                       continue;
> 
> Why? I fail to see why a single sensor *must* have some configuration cells.

I guess by now you know my answer :-). Idea was to have all of them
uniform, but if you have an existing example that have this property
optional, then I can consider it.

> 
>> +
>> +               if (sensor_specs.np == sensor_np &&
>> +                   sensor_specs.args[0] == sensor_id) {
>> +                       of_node_put(np);
>> +                       return thermal_zone_of_add_sensor(child, sensor_np,
>> +                                                         data,
>> +                                                         get_temp,
>> +                                                         get_trend);
>> +               }
>> +       }
>> +       of_node_put(np);
>> +
>> +       return ERR_PTR(-ENODEV);
>> +}
>> +EXPORT_SYMBOL_GPL(thermal_zone_of_sensor_register);
> 
> [...]
> 
>> +static int thermal_of_populate_bind_params(struct device_node *np,
>> +                                          struct __thermal_bind_params *__tbp,
>> +                                          struct __thermal_trip *trips,
>> +                                          int ntrips)
>> +{
>> +       struct of_phandle_args cooling_spec;
>> +       struct device_node *trip;
>> +       int ret, i;
>> +       u32 prop;
>> +
>> +       /* Default weight. Usage is optional */
>> +       __tbp->usage = 0;
>> +       ret = of_property_read_u32(np, "usage", &prop);
> 
> That wasn't described in the binding. Should this be reading the
> "contribution" property?

Yes, this is my bad, it is supposed to be 'contribution'.

> 
>> +       if (ret == 0)
>> +               __tbp->usage = prop;
>> +
>> +       trip = of_parse_phandle(np, "trip", 0);
>> +       if (!trip) {
>> +               pr_err("missing trip property\n");
>> +               return -ENODEV;
>> +       }
>> +
>> +       /* match using device_node */
>> +       for (i = 0; i < ntrips; i++)
>> +               if (trip == trips[i].np) {
>> +                       __tbp->trip_id = i;
>> +                       break;
>> +               }
>> +
>> +       if (i == ntrips) {
>> +               ret = -ENODEV;
>> +               goto end;
>> +       }
>> +
>> +       ret = of_parse_phandle_with_args(np, "cooling-device", "#cooling-cells",
>> +                                        0, &cooling_spec);
>> +       if (ret < 0) {
>> +               pr_err("missing cooling_device property\n");
>> +               goto end;
>> +       }
>> +       __tbp->cooling_device = cooling_spec.np;
>> +       if (cooling_spec.args_count >= 2) { /* at least min and max */
>> +               __tbp->min = cooling_spec.args[0];
>> +               __tbp->max = cooling_spec.args[1];
> 
> Ah, so the first two cells are meant to be min and max, not any
> arbitrary cells. Why is this necessary?

Same example I answered above, one cooling device that has 10 levels of
cooling can be used at trip point at 100C only from 6th to 10th cooling
level. And on different trip point, from 4th to 5th, and so on so forth...

> 
>> +       } else {
>> +               pr_err("wrong reference to cooling device, missing limits\n");
>> +       }
>> +
>> +end:
>> +       of_node_put(trip);
>> +
>> +       return ret;
>> +}
>> +
>> +/**
>> + * thermal_of_populate_trip - parse and fill one trip point data
>> + * @np: DT node containing a trip point node
>> + * @trip: trip point data structure to be filled up
>> + *
>> + * This function parses a trip point type of node represented by
>> + * @np parameter and fills the read data into @trip data structure.
>> + *
>> + * Return: 0 on success, proper error code otherwise
>> + */
>> +static int thermal_of_populate_trip(struct device_node *np,
>> +                                   struct __thermal_trip *trip)
>> +{
>> +       int prop;
>> +       int ret;
>> +
>> +       ret = of_property_read_u32(np, "temperature", &prop);
>> +       if (ret < 0) {
>> +               pr_err("missing temperature property\n");
>> +               return ret;
>> +       }
>> +       trip->temperature = prop;
>> +
>> +       ret = of_property_read_u32(np, "hysteresis", &prop);
>> +       if (ret < 0) {
>> +               pr_err("missing hysteresis property\n");
>> +               return ret;
>> +       }
>> +       trip->hysteresis = prop;
>> +
>> +       ret = of_property_read_u32(np, "type", &prop);
>> +       if (ret < 0) {
>> +               pr_err("missing type property\n");
>> +               return ret;
>> +       }
>> +       trip->type = prop;
> 
> No sanity checking?
> 

Yes, I can added it.

> I'd prefer a string and a table from string to Linux internal ID. Others
> may have differing opinions.

I just fail to see any advantage of using string constants, as I already
mentioned.

> 
>> +
>> +       /* Required for cooling attachment matching */
>> +       trip->np = np;
>> +
>> +       return 0;
>> +}
> 
> [...]
> 
>> +
>> +/**
>> + * thermal_of_build_thermal_zone - parse and fill one thermal zone data
>> + * @np: DT node containing a thermal zone node
>> + *
>> + * This function parses a thermal zone type of node represented by
>> + * @np parameter and fills the read data into a __thermal_zone data structure
>> + * and return this pointer.
>> + *
>> + * Return: On success returns a valid struct __thermal_zone,
>> + * otherwise, it returns a corresponding ERR_PTR(). Caller must
>> + * check the return value with help of IS_ERR() helper.
>> + */
>> +static struct __thermal_zone *
>> +thermal_of_build_thermal_zone(struct device_node *np)
>> +{
>> +       struct device_node *child, *gchild;
>> +       struct __thermal_zone *tz;
>> +       int ret, i;
>> +       u32 prop;
>> +
>> +       if (!np) {
>> +               pr_err("no thermal zone np\n");
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>> +       tz = kzalloc(sizeof(*tz), GFP_KERNEL);
>> +       if (!tz)
>> +               return ERR_PTR(-ENOMEM);
>> +
>> +       ret = of_property_read_u32(np, "passive-delay", &prop);
>> +       if (ret < 0) {
>> +               pr_err("missing passive_delay property\n");
> 
> Inconsistent '-' and '_' between the parsing and the error.

Right! I will fix the message errors.

> 
>> +               return ERR_PTR(ret);
>> +       }
>> +       tz->passive_delay = prop;
>> +
>> +       ret = of_property_read_u32(np, "polling-delay", &prop);
>> +       if (ret < 0) {
>> +               pr_err("missing polling_delay property\n");
> 
> Same here.


ok.

> 
>> +               return ERR_PTR(ret);
>> +       }
>> +       tz->polling_delay = prop;
>> +
>> +       /* trips */
>> +       child = of_get_child_by_name(np, "trips");
>> +
>> +       /* No trips provided */
>> +       if (!child)
>> +               goto finish;
>> +
>> +       tz->ntrips = of_get_child_count(child);
> 
> What if there are no children, or this fails (returning zero)?

OK. I can add the check for zero children.

> 
>> +       tz->trips = kzalloc(tz->ntrips * sizeof(*tz->trips), GFP_KERNEL);
> 
> Here kzalloc could return ZERO_SIZE_PTR ((void*) 16). So the check below
> isn't sufficient to stop us continuing if the node has no children. We
> should check tz->ntrips above before calling kzalloc.

got it. adding the check for zero children above.

> 
>> +       if (!tz->trips)
>> +               return ERR_PTR(-ENOMEM);
>> +       i = 0;
>> +       for_each_child_of_node(child, gchild)
>> +               thermal_of_populate_trip(gchild, &tz->trips[i++]);
> 
> What if this fails for a child node?

I will add a check to avoid continuing. There is no sense to me to
continue with a child that is wrongly described.

> 
>> +
>> +       of_node_put(child);
>> +
>> +       /* cooling-attachments */
>> +       child = of_get_child_by_name(np, "cooling-attachments");
>> +
>> +       /* cooling-attachments provided */
>> +       if (!child)
>> +               goto finish;
>> +
>> +       tz->num_tbps = of_get_child_count(child);
>> +       tz->tbps = kzalloc(tz->num_tbps * sizeof(*tz->tbps), GFP_KERNEL);
>> +       if (!tz->tbps)
>> +               return ERR_PTR(-ENOMEM);
>> +       i = 0;
>> +       for_each_child_of_node(child, gchild)
>> +               thermal_of_populate_bind_params(gchild, &tz->tbps[i++],
>> +                                               tz->trips, tz->ntrips);
>> +
>> +finish:
>> +       tz->mode = THERMAL_DEVICE_DISABLED;
>> +
>> +       return tz;
>> +}
> 
> What about all that useless data we may have just allocated memory for?

Which useless data? The allocated data is used to hold the parsed data
which in turn is used to make the thermal zone device work properly.
They are freed when the thermal framework is unloaded, in which each
thermal zone device is unregistered and all related data is freed, see
the end of the file.

> 
> [...]
> 
>> +int __init of_parse_thermal_zones(void)
>> +{
>> +       struct device_node *np, *child;
>> +       struct __thermal_zone *tz;
>> +       struct thermal_zone_device_ops *ops;
>> +
>> +       np = of_find_node_by_name(NULL, "thermal-zones");
>> +       if (!np) {
>> +               pr_debug("unable to find thermal zones\n");
>> +               return 0; /* Run successfully on systems without thermal DT */
>> +       }
>> +
>> +       for_each_child_of_node(np, child) {
>> +               struct thermal_zone_device *zone;
>> +               struct thermal_zone_params *tzp;
> 
> So each child of thermal-zones must be a thermal zone (we can't embed
> other nodes of information)?

Yes, each child is a thermal zone. Well, I don't see which other
information you would embed in it.

> 
>> +
>> +               tz = thermal_of_build_thermal_zone(child);
>> +               if (IS_ERR(tz)) {
>> +                       pr_err("failed to build thermal zone %s: %ld\n",
>> +                              child->name,
>> +                              PTR_ERR(tz));
>> +                       continue;
>> +               }
>> +
>> +               ops = kmemdup(&of_thermal_ops, sizeof(*ops), GFP_KERNEL);
>> +               if (!ops)
>> +                       goto exit_free;
>> +
>> +               tzp = kzalloc(sizeof(*tzp), GFP_KERNEL);
>> +               if (!tzp) {
>> +                       kfree(ops);
>> +                       goto exit_free;
>> +               }
>> +
>> +               /* No hwmon because there might be hwmon drivers registering */
>> +               tzp->no_hwmon = true;
>> +
>> +               zone = thermal_zone_device_register(child->name, tz->ntrips,
>> +                                                   0, tz,
>> +                                                   ops, tzp,
>> +                                                   tz->passive_delay,
>> +                                                   tz->polling_delay);
>> +               if (IS_ERR(zone)) {
>> +                       pr_err("Failed to build %s zone %ld\n", child->name,
>> +                              PTR_ERR(zone));
>> +                       kfree(tzp);
>> +                       kfree(ops);
>> +                       of_thermal_free_zone(tz);
>> +                       /* attempting to build remaining zones still */
>> +               }
>> +       }
>> +
>> +       return 0;
>> +
>> +exit_free:
>> +       of_thermal_free_zone(tz);
>> +
>> +       /* no memory available, so free what we have built */
>> +       of_thermal_destroy_zones();
>> +
>> +       return -ENOMEM;
>> +}
> 
> Cheers,
> Mark.


All best,

Eduardo
> 
>
hongbo.zhang@freescale.com Sept. 24, 2013, 8:11 a.m. UTC | #3
On 09/23/2013 06:40 PM, Mark Rutland wrote:
> Hi Eduardo,
>
> Apologies for having taken so long to get back you on this.
>
> I have several comments on the binding and the way it's parsed.
>
> On Wed, Sep 18, 2013 at 10:35:36PM +0100, Eduardo Valentin wrote:
>> This patch introduces a device tree bindings for
>> describing the hardware thermal behavior and limits.
>> Also a parser to read and interpret the data and feed
>> it in the thermal framework is presented.
>>
>> This patch introduces a thermal data parser for device
>> tree. The parsed data is used to build thermal zones
>> and thermal binding parameters. The output data
>> can then be used to deploy thermal policies.
>>
>> This patch adds also documentation regarding this
>> API and how to define tree nodes to use
>> this infrastructure.
>>
>> Note that, in order to be able to have control
>> on the sensor registration on the DT thermal zone,
>> it was required to allow changing the thermal zone
>> .get_temp callback. For this reason, this patch
>> also removes the 'const' modifier from the .ops
>> field of thermal zone devices.
>>
>> Cc: Zhang Rui <rui.zhang@intel.com>
>> Cc: linux-pm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
>> ---
>>   .../devicetree/bindings/thermal/thermal.txt        | 498 +++++++++++++
>>   drivers/thermal/Kconfig                            |  13 +
>>   drivers/thermal/Makefile                           |   1 +
>>   drivers/thermal/of-thermal.c                       | 775 +++++++++++++++++++++
>>   drivers/thermal/thermal_core.c                     |   9 +-
>>   drivers/thermal/thermal_core.h                     |   9 +
>>   include/dt-bindings/thermal/thermal.h              |  27 +
>>   include/linux/thermal.h                            |  28 +-
>>   8 files changed, 1357 insertions(+), 3 deletions(-)
>>   create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
>>   create mode 100644 drivers/thermal/of-thermal.c
>>   create mode 100644 include/dt-bindings/thermal/thermal.h
>>
>> ---
>>
>> Hello all,
>>
>> Thanks Joe Perches for the effort of reviewing this code, I really appreciate it.
>>
>> Here is a version with a refactored init function without leaks in the failure
>> patch. I also added a couple of comments to better understand the intention of
>> that function. Besides, when there are no memory available, the function
>> rolls back what ever thermal zones have been added.
>>
>> Thanks,
>>
>> Eduardo
>>
>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
>> new file mode 100644
>> index 0000000..6664533
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>> @@ -0,0 +1,498 @@
>> +* Thermal Framework Device Tree descriptor
>> +
>> +Generic binding to provide a way of defining hardware thermal
>> +structure using device tree. A thermal structure includes thermal
>> +zones and their components, such as trip points, polling intervals,
>> +sensors and cooling devices binding descriptors.
>> +
>> +The target of device tree thermal descriptors is to describe only
>> +the hardware thermal aspects, not how the system must control or which
>> +algorithm or policy must be taken in place.
>> +
>> +There are five types of nodes involved to describe thermal bindings:
>> +- sensors: used to describe the device source of temperature sensing;
>> +- cooling devices: used to describe devices source of power dissipation control;
>> +- trip points: used to describe points in temperature domain defined to
>> +make the system aware of hardware limits;
>> +- cooling attachments: used to describe links between trip points and
>> +cooling devices;
> I think "attachments" sounds a bit odd, though I don't have a better
> naming suggestion.

"map" or "mapping" is better?

>> +- thermal zones: used to describe thermal data within the hardware;
>> +
>> +It follows a description of each type of these device tree nodes.
>> +
>> +* Sensor devices
>> +
>> +Sensor devices are nodes providing temperature sensing capabilities on thermal
>> +zones. Typical devices are I2C ADC converters and bandgaps. Theses are nodes
>> +providing temperature data to thermal zones. Temperature sensor devices may
>> +control one or more internal sensors.
>> +
>> +Required property:
>> +- #sensor-cells:       Used to provide sensor device specific information
>> +                       while referring to it. Must be at least 1, in order
>> +                       to identify uniquely the sensor instances within
>> +                       the IC. See thermal zone binding for more details
>> +                       on how consumers refer to sensor devices.
> I don't see why this needs to be at least one cell -- if an IC only has
> one temperature sensor it should be fine for this to be zero and have no
> ambiguity.
>
> It may make sense to call these thermal sensor devices, there are plenty
> of other sensors we might want to describe in the dt for other purposes,
> and we can impose some consistency if we remove the ambiguity.
>
>> +
>> +* Cooling device nodes
>> +
>> +Cooling devices are nodes providing control on power dissipation. There
>> +are essentially two ways to provide control on power dissipation. First
>> +is by means of regulating device performance, which is known as passive
>> +cooling. Second is by means of activating devices in order to remove
>> +the dissipated heat, which is known as active cooling, e.g. regulating
>> +fan speeds. In both cases, cooling devices shall have a way to determine
>> +the level of cooling.
>> +
>> +Required property:
>> +- cooling-min-level:   A unsigned integer indicating the smallest
>> +                       cooling level accepted. Typically 0.
>> +- cooling-max-level:   An unsigned integer indicating the largest
>> +                       cooling level accepted.
> I'm not sure what a "cooling level" means. It seems a bit abstract. Is
> this binding specific?

It means cooling ability of a cooling device, such as speed of a fan.
But it is better to use cooling-min/max-state I think, because the 
thermal management code is using "cooling state", concepts in dt and c 
should be identical.

> How does this relate to cooling-cells? These seem to be a fixed size.

I also think cooling-cells is redundant.

>> +- #cooling-cells:      Used to provide cooling device specific information
>> +                       while referring to it. Must be at least 2, in order
>> +                       to specify minimum and maximum cooling level used
>> +                       in the reference. See Cooling device attachments section
>> +                       below for more details on how consumers refer to
>> +                       cooling devices.
> Are these cooling cells always expected to cover min and max, and are
> min and max always expected to take the same number of cells? The above
> cooling-*-level properties imply they each take up one cell.
>
> Does #cooling-cells = <3> make sense?. If this covers additional
> information (e.g. which fan on a multi-fan controller), how does the OS
> determine which cells are min and max, and how do these relate to
> cooling-level-min and cooling-level-max?
>
>> +
>> +* Trip points
>> +
>> +The trip node is a node to describe a point in the temperature domain
>> +in which the system takes an action. This node describes just the point,
>> +not the action.
> I'm still not sure on this, as it sounds like embedding policy into the
> DT, rather than a description of the hardware. I realise we need to
> prevent devices burning out, so describing the max acceptable
> temperature and possibly a preferred range makes sense, but do we need
> this level of flexibility? Maybe we do.
>
Putting platform data into dt is acceptable, right?
In a narrow sense, trip points are platform data, but in a broad sense, 
both trip point itself and its corresponding cooling device are all 
platform data.

>> +
>> +Required properties:
>> +- temperature:         the trip temperature level, in milliCelsius.
> Preferably, s/milliCelsius/millicelsius/ over the document.
>
> Given that we have signed values elsewhere, is this signed or unsigned?
>
>> +- hysteresis:          a (low) hysteresis value on 'temperature'. This is a
>> +                       relative value, in milliCelsius.
>> +- type:                        the trip type. Here is the type mapping:
>> +       THERMAL_TRIP_ACTIVE     0:      A trip point to enable active cooling
>> +       THERMAL_TRIP_PASSIVE    1:      A trip point to enable passive cooling
>> +       THERMAL_TRIP_HOT        2:      A trip point to notify emergency
>> +       THERMAL_TRIP_CRITICAL   3:      Hardware not reliable.
>> +
>> +Refer to include/dt-bindings/thermal/thermal.h for definition of these consts.
> Why not use a string for this? We do so for other type parameters in
> other bindings (see dr_mode in
> Documentation/devicetree/bindings/usb/generic.txt, phy-mode for ethernet
> PHYs, etc).
>
>> +
>> +* Cooling device attachments
>> +
>> +The cooling device attachments node is a node to describe how cooling devices
>> +get assigned to trip points of the zone. The cooling devices are expected
>> +to be loaded in the target system.
>> +
>> +Required properties:
>> +- cooling-device:      A phandle of a cooling device with its parameters,
>> +                       referring to which cooling device is used in this
>> +                       binding. The required parameters are: the minimum
>> +                       cooling level and the maximum cooling level used
>> +                       in this attach.
> Might this be a list in general? The code doesn't have to support that
> yet.
>
> It would be nice to have a name for the cells after a phandle which
> describe the cooling device's configuration. You've called them
> parameters here, but it probably makes sense to call them a
> cooling-specifier (following clock-specifier and interrupt-specifier).
>
>> +- trip:                        A phandle of a trip point node within the same thermal
>> +                       zone.
>> +
>> +Optional property:
>> +- contribution:                The cooling contribution to the thermal zone of the
>> +                       referred cooling device at the referred trip point.
>> +                       The contribution is a value from 0 to 100. The sum
>> +                       of all cooling contributions within a thermal zone
>> +                       must never exceed 100.
> This is a bit arbitrary. Couldn't we sum all contributions to find the
> total automatically, so this could be a simpler ratio?
>
>> +
>> +Note: Using the THERMAL_NO_LIMIT (-1L) constant in the cooling-device phandle
>> +limit parameters means:
>> +(i)   - minimum level allowed for minimum cooling level used in the reference.
>> +(ii)  - maximum level allowed for maximum cooling level used in the reference.
>> +Refer to include/dt-bindings/thermal/thermal.h for definition of this constant.
>> +
>> +* Thermal zones
>> +
>> +The thermal-zone node is the node containing all the required info
>> +for describing a thermal zone, including its cdev bindings. The thermal_zone
>> +node must contain, apart from its own properties, one node containing
>> +trip nodes and one node containing all the zone cooling attachments.
> s/cdev/cooling device/ ?
>
>> +
>> +Required properties:
>> +- passive-delay:       The maximum number of milliseconds to wait between polls
>> +                       when performing passive cooling.
>> +- polling-delay:       The maximum number of milliseconds to wait between polls
>> +                       when checking this thermal zone.
> How about polling-delay-passive, polling-delay-active? I'm still not
>
>> +- sensors:             A list of sensor phandles and their parameters. The
>> +                       required parameter is the sensor id, in order to
>> +                       identify internal sensors when the sensor IC features
>> +                       several sensing units.
> As mentioned above, I'm not sure you even need that one cell.
>
> - sensors:                A list of sensor phandle + thermal-sensor-specifier
>                            cells describing the sensors monitoring the thermal
> 			  zone.
>
>> +- trips:               A sub-node containing several trip point nodes required
>> +                       to describe the thermal zone.
>> +- cooling-attachments  A sub-node containing several cooling device attaches
>> +                       nodes, used to describe the relation between trips
>> +                       and cooling devices.
>> +
>> +Optional property:
>> +- coefficients:                An array of integers (one signed cell) containing
>> +                       coefficients to compose a linear relation between
>> +                       the sensors described in the sensors property.
>> +                       Coefficients defaults to 1, in case this property
>> +                       is not specified. A simple linear polynomial is used:
>> +                       Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1) + cn.
>> +
>> +                       The coefficients are ordered and they match with sensors
>> +                       by means of sensor ID. Additional coefficients are
>> +                       interpreted as constant offsets.
> What is the end result of this? Presumably this is meant to result in an
> estimate of the average temperature of the thermal zone, rather than an
> arbitrary value? This should be mentioned.
>
> This doesn't seem to be used the in the code below...
>
>> +
>> +Note: The delay properties are bound to the maximum dT/dt (temperature
>> +derivative over time) in two situations for a thermal zone:
>> +(i)  - when active cooling is activated (passive-delay); and
>> +(ii) - when the zone just needs to be monitored (polling-delay).
>> +The maximum dT/dt is highly bound to hardware power consumption and dissipation
>> +capability.
> I'm not sure what you mean by this, could you elaborate?
>
> I guess you mean that the delays should be chosen to account for said
> max dT/dt, such that a device can't unexpectedly cross several trip
> boundaries between polls?
>
>> +
>> +* Examples
>> +
>> +Below are several examples on how to use thermal data descriptors
>> +using device tree bindings:
>> +
>> +(a) - CPU thermal zone
>> +
>> +The CPU thermal zone example below describes how to setup one thermal zone
>> +using one single sensor as temperature source and many cooling devices and
>> +power dissipation control sources.
>> +
>> +#include <dt-bindings/thermal/thermal.h>
>> +
>> +cpus {
>> +       cpu0: cpu@0 {
>> +               ...
>> +               cooling-min-level = <0>;
>> +               cooling-max-level = <3>;
>> +               #cooling-cells = <2>; /* min followed by max */
>> +       };
> What do those min and max mean in this context? What do the values in
> the range [0,3] correspond to?
>
> I'm not sure it makes sense to describe passive cooling in this way --
> the precise way an OS does less work on a device is fundamentally tied
> to the design of the OS (it might be able to rate-limit requests, it
> might be able to clock the device down, it might only be able to turn
> the device off).
>
> I'm not sure there is anything we can describe for passive cooling
> (beyond the thermal limits the OS should attempt to stick to).
>
>> +       ...
>> +};
>> +
>> +&i2c1 {
>> +       ...
>> +       fan0: fan@0x48 {
>> +               ...
>> +               cooling-min-level = <0>;
>> +               cooling-max-level = <9>;
>> +               #cooling-cells = <2>; /* min followed by max */
>> +       };
> What do min and max mean here for the fan?
>
It means the speed level of a fan. Even if a fan can run at a continuous 
speed range, but when it is controlled under thermal management 
framework, it can only run at discrete speeds.
>> +};
>> +
>> +bandgap0: bandgap@0x0000ED00 {
>> +       ...
>> +       #sensor-cells = <1>;
>> +};
>> +
>> +cpu-thermal: cpu-thermal {
> How do the thermal zones get probed?
>
>> +       passive-delay = <250>; /* milliseconds */
>> +       polling-delay = <1000>; /* milliseconds */
>> +
>> +               /* sensor       ID */
>> +        sensors = <&bandgap0     0>;
>> +
>> +        trips {
>> +                cpu-alert0: cpu-alert {
>> +                        temperature = <90000>; /* milliCelsius */
>> +                        hysteresis = <2000>; /* milliCelsius */
>> +                        type = <THERMAL_TRIP_ACTIVE>;
>> +                };
>> +                cpu-alert1: cpu-alert {
>> +                        temperature = <100000>; /* milliCelsius */
>> +                        hysteresis = <2000>; /* milliCelsius */
>> +                        type = <THERMAL_TRIP_PASSIVE>;
>> +                };
>> +                cpu-crit: cpu-crit {
>> +                        temperature = <125000>; /* milliCelsius */
>> +                        hysteresis = <2000>; /* milliCelsius */
>> +                        type = <THERMAL_TRIP_CRITICAL>;
>> +                };
>> +        };
>> +
>> +       cooling-attachments {
>> +               attach0 {
>> +                       trip = <&cpu-alert0>;
>> +                       cooling-device = <&fan0 THERMAL_NO_LIMITS 4>;
>> +               };
>> +               attach1 {
>> +                       trip = <&cpu-alert1>;
>> +                       cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
>> +               };
>> +               attach2 {
>> +                       trip = <&cpu-alert1>;
>> +                       cooling-device =
>> +                               <&cpu0 THERMAL_NO_LIMITS THERMAL_NO_LIMITS>;
>> +               };
>> +       };
> Was there a good reason for splitting trips and attachment?

This is for adding/removing cooling device dynamically.

>> +};
>> +
>> +In the example above, the ADC sensor at address 0x0000ED00 is used to monitor
>> +the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan device controlled
>> +via I2C bus 1, at adress 0x48, is used to remove the heat out of the thermal
>> +zone 'cpu-thermal' using its cooling levels from its minimum to 4, when it
>> +reaches trip point 'cpu-alert0' at 90C, as an example of active cooling. The
>> +same cooling device is used at 'cpu-alert1', but from 5 to its maximum level.
>> +The cpu@0 device is also linked to the same thermal zone, 'cpu-thermal', as a
>> +passive cooling device, using all its cooling levels at trip point 'cpu-alert1',
>> +which is a trip point at 100C.
>> +
>



--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Sept. 24, 2013, 3:50 p.m. UTC | #4
Zhang,

I appreciate your interest. I am replying a couple of your comments.
Please also check my answers to Mark's queries on other email thread.

On 24-09-2013 04:11, Hongbo Zhang wrote:
> On 09/23/2013 06:40 PM, Mark Rutland wrote:
>> Hi Eduardo,
>>
>> Apologies for having taken so long to get back you on this.
>>
>> I have several comments on the binding and the way it's parsed.
>>
>> On Wed, Sep 18, 2013 at 10:35:36PM +0100, Eduardo Valentin wrote:
>>> This patch introduces a device tree bindings for
>>> describing the hardware thermal behavior and limits.
>>> Also a parser to read and interpret the data and feed
>>> it in the thermal framework is presented.
>>>
>>> This patch introduces a thermal data parser for device
>>> tree. The parsed data is used to build thermal zones
>>> and thermal binding parameters. The output data
>>> can then be used to deploy thermal policies.
>>>
>>> This patch adds also documentation regarding this
>>> API and how to define tree nodes to use
>>> this infrastructure.
>>>
>>> Note that, in order to be able to have control
>>> on the sensor registration on the DT thermal zone,
>>> it was required to allow changing the thermal zone
>>> .get_temp callback. For this reason, this patch
>>> also removes the 'const' modifier from the .ops
>>> field of thermal zone devices.
>>>
>>> Cc: Zhang Rui <rui.zhang@intel.com>
>>> Cc: linux-pm@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
>>> ---
>>>   .../devicetree/bindings/thermal/thermal.txt        | 498 +++++++++++++
>>>   drivers/thermal/Kconfig                            |  13 +
>>>   drivers/thermal/Makefile                           |   1 +
>>>   drivers/thermal/of-thermal.c                       | 775
>>> +++++++++++++++++++++
>>>   drivers/thermal/thermal_core.c                     |   9 +-
>>>   drivers/thermal/thermal_core.h                     |   9 +
>>>   include/dt-bindings/thermal/thermal.h              |  27 +
>>>   include/linux/thermal.h                            |  28 +-
>>>   8 files changed, 1357 insertions(+), 3 deletions(-)
>>>   create mode 100644
>>> Documentation/devicetree/bindings/thermal/thermal.txt
>>>   create mode 100644 drivers/thermal/of-thermal.c
>>>   create mode 100644 include/dt-bindings/thermal/thermal.h
>>>
>>> ---
>>>
>>> Hello all,
>>>
>>> Thanks Joe Perches for the effort of reviewing this code, I really
>>> appreciate it.
>>>
>>> Here is a version with a refactored init function without leaks in
>>> the failure
>>> patch. I also added a couple of comments to better understand the
>>> intention of
>>> that function. Besides, when there are no memory available, the function
>>> rolls back what ever thermal zones have been added.
>>>
>>> Thanks,
>>>
>>> Eduardo
>>>
>>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt
>>> b/Documentation/devicetree/bindings/thermal/thermal.txt
>>> new file mode 100644
>>> index 0000000..6664533
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>>> @@ -0,0 +1,498 @@
>>> +* Thermal Framework Device Tree descriptor
>>> +
>>> +Generic binding to provide a way of defining hardware thermal
>>> +structure using device tree. A thermal structure includes thermal
>>> +zones and their components, such as trip points, polling intervals,
>>> +sensors and cooling devices binding descriptors.
>>> +
>>> +The target of device tree thermal descriptors is to describe only
>>> +the hardware thermal aspects, not how the system must control or which
>>> +algorithm or policy must be taken in place.
>>> +
>>> +There are five types of nodes involved to describe thermal bindings:
>>> +- sensors: used to describe the device source of temperature sensing;
>>> +- cooling devices: used to describe devices source of power
>>> dissipation control;
>>> +- trip points: used to describe points in temperature domain defined to
>>> +make the system aware of hardware limits;
>>> +- cooling attachments: used to describe links between trip points and
>>> +cooling devices;
>> I think "attachments" sounds a bit odd, though I don't have a better
>> naming suggestion.
> 
> "map" or "mapping" is better?

Well,  yeah, that could work.

> 
>>> +- thermal zones: used to describe thermal data within the hardware;
>>> +
>>> +It follows a description of each type of these device tree nodes.
>>> +
>>> +* Sensor devices
>>> +
>>> +Sensor devices are nodes providing temperature sensing capabilities
>>> on thermal
>>> +zones. Typical devices are I2C ADC converters and bandgaps. Theses
>>> are nodes
>>> +providing temperature data to thermal zones. Temperature sensor
>>> devices may
>>> +control one or more internal sensors.
>>> +
>>> +Required property:
>>> +- #sensor-cells:       Used to provide sensor device specific
>>> information
>>> +                       while referring to it. Must be at least 1, in
>>> order
>>> +                       to identify uniquely the sensor instances within
>>> +                       the IC. See thermal zone binding for more
>>> details
>>> +                       on how consumers refer to sensor devices.
>> I don't see why this needs to be at least one cell -- if an IC only has
>> one temperature sensor it should be fine for this to be zero and have no
>> ambiguity.
>>
>> It may make sense to call these thermal sensor devices, there are plenty
>> of other sensors we might want to describe in the dt for other purposes,
>> and we can impose some consistency if we remove the ambiguity.
>>
>>> +
>>> +* Cooling device nodes
>>> +
>>> +Cooling devices are nodes providing control on power dissipation. There
>>> +are essentially two ways to provide control on power dissipation. First
>>> +is by means of regulating device performance, which is known as passive
>>> +cooling. Second is by means of activating devices in order to remove
>>> +the dissipated heat, which is known as active cooling, e.g. regulating
>>> +fan speeds. In both cases, cooling devices shall have a way to
>>> determine
>>> +the level of cooling.
>>> +
>>> +Required property:
>>> +- cooling-min-level:   A unsigned integer indicating the smallest
>>> +                       cooling level accepted. Typically 0.
>>> +- cooling-max-level:   An unsigned integer indicating the largest
>>> +                       cooling level accepted.
>> I'm not sure what a "cooling level" means. It seems a bit abstract. Is
>> this binding specific?
> 
> It means cooling ability of a cooling device, such as speed of a fan.
> But it is better to use cooling-min/max-state I think, because the
> thermal management code is using "cooling state", concepts in dt and c
> should be identical.

I think the idea is just to be clear. Just because the thermal
management implementation uses a terminology, it does not necessarily
mean it is best fit for describing in device tree bindings. But I dont
really have strong opinion on level or state at this point.

> 
>> How does this relate to cooling-cells? These seem to be a fixed size.
> 
> I also think cooling-cells is redundant.

Cooling cells is not redundant, please check my explanation on the other
thread. It is required to determine which level/states are used while
binding.

> 
>>> +- #cooling-cells:      Used to provide cooling device specific
>>> information
>>> +                       while referring to it. Must be at least 2, in
>>> order
>>> +                       to specify minimum and maximum cooling level
>>> used
>>> +                       in the reference. See Cooling device
>>> attachments section
>>> +                       below for more details on how consumers refer to
>>> +                       cooling devices.
>> Are these cooling cells always expected to cover min and max, and are
>> min and max always expected to take the same number of cells? The above
>> cooling-*-level properties imply they each take up one cell.
>>
>> Does #cooling-cells = <3> make sense?. If this covers additional
>> information (e.g. which fan on a multi-fan controller), how does the OS
>> determine which cells are min and max, and how do these relate to
>> cooling-level-min and cooling-level-max?
>>
>>> +
>>> +* Trip points
>>> +
>>> +The trip node is a node to describe a point in the temperature domain
>>> +in which the system takes an action. This node describes just the
>>> point,
>>> +not the action.
>> I'm still not sure on this, as it sounds like embedding policy into the
>> DT, rather than a description of the hardware. I realise we need to
>> prevent devices burning out, so describing the max acceptable
>> temperature and possibly a preferred range makes sense, but do we need
>> this level of flexibility? Maybe we do.
>>
> Putting platform data into dt is acceptable, right?
> In a narrow sense, trip points are platform data, but in a broad sense,
> both trip point itself and its corresponding cooling device are all
> platform data.
> 

I think Mark needs to elaborate a bit more to clarify why he sees this
as embedding policy into DT. And I believe it is fine to pass platform
data, as long as it represents hardware and it is  not representing policy.

>>> +
>>> +Required properties:
>>> +- temperature:         the trip temperature level, in milliCelsius.
>> Preferably, s/milliCelsius/millicelsius/ over the document.
>>
>> Given that we have signed values elsewhere, is this signed or unsigned?
>>
>>> +- hysteresis:          a (low) hysteresis value on 'temperature'.
>>> This is a
>>> +                       relative value, in milliCelsius.
>>> +- type:                        the trip type. Here is the type mapping:
>>> +       THERMAL_TRIP_ACTIVE     0:      A trip point to enable active
>>> cooling
>>> +       THERMAL_TRIP_PASSIVE    1:      A trip point to enable
>>> passive cooling
>>> +       THERMAL_TRIP_HOT        2:      A trip point to notify emergency
>>> +       THERMAL_TRIP_CRITICAL   3:      Hardware not reliable.
>>> +
>>> +Refer to include/dt-bindings/thermal/thermal.h for definition of
>>> these consts.
>> Why not use a string for this? We do so for other type parameters in
>> other bindings (see dr_mode in
>> Documentation/devicetree/bindings/usb/generic.txt, phy-mode for ethernet
>> PHYs, etc).
>>
>>> +
>>> +* Cooling device attachments
>>> +
>>> +The cooling device attachments node is a node to describe how
>>> cooling devices
>>> +get assigned to trip points of the zone. The cooling devices are
>>> expected
>>> +to be loaded in the target system.
>>> +
>>> +Required properties:
>>> +- cooling-device:      A phandle of a cooling device with its
>>> parameters,
>>> +                       referring to which cooling device is used in
>>> this
>>> +                       binding. The required parameters are: the
>>> minimum
>>> +                       cooling level and the maximum cooling level used
>>> +                       in this attach.
>> Might this be a list in general? The code doesn't have to support that
>> yet.
>>
>> It would be nice to have a name for the cells after a phandle which
>> describe the cooling device's configuration. You've called them
>> parameters here, but it probably makes sense to call them a
>> cooling-specifier (following clock-specifier and interrupt-specifier).
>>
>>> +- trip:                        A phandle of a trip point node within
>>> the same thermal
>>> +                       zone.
>>> +
>>> +Optional property:
>>> +- contribution:                The cooling contribution to the
>>> thermal zone of the
>>> +                       referred cooling device at the referred trip
>>> point.
>>> +                       The contribution is a value from 0 to 100.
>>> The sum
>>> +                       of all cooling contributions within a thermal
>>> zone
>>> +                       must never exceed 100.
>> This is a bit arbitrary. Couldn't we sum all contributions to find the
>> total automatically, so this could be a simpler ratio?
>>
>>> +
>>> +Note: Using the THERMAL_NO_LIMIT (-1L) constant in the
>>> cooling-device phandle
>>> +limit parameters means:
>>> +(i)   - minimum level allowed for minimum cooling level used in the
>>> reference.
>>> +(ii)  - maximum level allowed for maximum cooling level used in the
>>> reference.
>>> +Refer to include/dt-bindings/thermal/thermal.h for definition of
>>> this constant.
>>> +
>>> +* Thermal zones
>>> +
>>> +The thermal-zone node is the node containing all the required info
>>> +for describing a thermal zone, including its cdev bindings. The
>>> thermal_zone
>>> +node must contain, apart from its own properties, one node containing
>>> +trip nodes and one node containing all the zone cooling attachments.
>> s/cdev/cooling device/ ?
>>
>>> +
>>> +Required properties:
>>> +- passive-delay:       The maximum number of milliseconds to wait
>>> between polls
>>> +                       when performing passive cooling.
>>> +- polling-delay:       The maximum number of milliseconds to wait
>>> between polls
>>> +                       when checking this thermal zone.
>> How about polling-delay-passive, polling-delay-active? I'm still not
>>
>>> +- sensors:             A list of sensor phandles and their
>>> parameters. The
>>> +                       required parameter is the sensor id, in order to
>>> +                       identify internal sensors when the sensor IC
>>> features
>>> +                       several sensing units.
>> As mentioned above, I'm not sure you even need that one cell.
>>
>> - sensors:                A list of sensor phandle +
>> thermal-sensor-specifier
>>                            cells describing the sensors monitoring the
>> thermal
>>               zone.
>>
>>> +- trips:               A sub-node containing several trip point
>>> nodes required
>>> +                       to describe the thermal zone.
>>> +- cooling-attachments  A sub-node containing several cooling device
>>> attaches
>>> +                       nodes, used to describe the relation between
>>> trips
>>> +                       and cooling devices.
>>> +
>>> +Optional property:
>>> +- coefficients:                An array of integers (one signed
>>> cell) containing
>>> +                       coefficients to compose a linear relation
>>> between
>>> +                       the sensors described in the sensors property.
>>> +                       Coefficients defaults to 1, in case this
>>> property
>>> +                       is not specified. A simple linear polynomial
>>> is used:
>>> +                       Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1)
>>> + cn.
>>> +
>>> +                       The coefficients are ordered and they match
>>> with sensors
>>> +                       by means of sensor ID. Additional
>>> coefficients are
>>> +                       interpreted as constant offsets.
>> What is the end result of this? Presumably this is meant to result in an
>> estimate of the average temperature of the thermal zone, rather than an
>> arbitrary value? This should be mentioned.
>>
>> This doesn't seem to be used the in the code below...
>>
>>> +
>>> +Note: The delay properties are bound to the maximum dT/dt (temperature
>>> +derivative over time) in two situations for a thermal zone:
>>> +(i)  - when active cooling is activated (passive-delay); and
>>> +(ii) - when the zone just needs to be monitored (polling-delay).
>>> +The maximum dT/dt is highly bound to hardware power consumption and
>>> dissipation
>>> +capability.
>> I'm not sure what you mean by this, could you elaborate?
>>
>> I guess you mean that the delays should be chosen to account for said
>> max dT/dt, such that a device can't unexpectedly cross several trip
>> boundaries between polls?
>>
>>> +
>>> +* Examples
>>> +
>>> +Below are several examples on how to use thermal data descriptors
>>> +using device tree bindings:
>>> +
>>> +(a) - CPU thermal zone
>>> +
>>> +The CPU thermal zone example below describes how to setup one
>>> thermal zone
>>> +using one single sensor as temperature source and many cooling
>>> devices and
>>> +power dissipation control sources.
>>> +
>>> +#include <dt-bindings/thermal/thermal.h>
>>> +
>>> +cpus {
>>> +       cpu0: cpu@0 {
>>> +               ...
>>> +               cooling-min-level = <0>;
>>> +               cooling-max-level = <3>;
>>> +               #cooling-cells = <2>; /* min followed by max */
>>> +       };
>> What do those min and max mean in this context? What do the values in
>> the range [0,3] correspond to?
>>
>> I'm not sure it makes sense to describe passive cooling in this way --
>> the precise way an OS does less work on a device is fundamentally tied
>> to the design of the OS (it might be able to rate-limit requests, it
>> might be able to clock the device down, it might only be able to turn
>> the device off).
>>
>> I'm not sure there is anything we can describe for passive cooling
>> (beyond the thermal limits the OS should attempt to stick to).
>>
>>> +       ...
>>> +};
>>> +
>>> +&i2c1 {
>>> +       ...
>>> +       fan0: fan@0x48 {
>>> +               ...
>>> +               cooling-min-level = <0>;
>>> +               cooling-max-level = <9>;
>>> +               #cooling-cells = <2>; /* min followed by max */
>>> +       };
>> What do min and max mean here for the fan?
>>
> It means the speed level of a fan. Even if a fan can run at a continuous
> speed range, but when it is controlled under thermal management
> framework, it can only run at discrete speeds.


Please, an important point here, this is not about thermal framework
requirement. But how we describe hardware. As you mentioned, even if
there are fan devices out there that uses continuous speed range, there
other devices, even fan devices or cpu frequency steps for instance,
that are discrete. Again, *this is not a thermal framework requirement*.

>>> +};
>>> +
>>> +bandgap0: bandgap@0x0000ED00 {
>>> +       ...
>>> +       #sensor-cells = <1>;
>>> +};
>>> +
>>> +cpu-thermal: cpu-thermal {
>> How do the thermal zones get probed?
>>
>>> +       passive-delay = <250>; /* milliseconds */
>>> +       polling-delay = <1000>; /* milliseconds */
>>> +
>>> +               /* sensor       ID */
>>> +        sensors = <&bandgap0     0>;
>>> +
>>> +        trips {
>>> +                cpu-alert0: cpu-alert {
>>> +                        temperature = <90000>; /* milliCelsius */
>>> +                        hysteresis = <2000>; /* milliCelsius */
>>> +                        type = <THERMAL_TRIP_ACTIVE>;
>>> +                };
>>> +                cpu-alert1: cpu-alert {
>>> +                        temperature = <100000>; /* milliCelsius */
>>> +                        hysteresis = <2000>; /* milliCelsius */
>>> +                        type = <THERMAL_TRIP_PASSIVE>;
>>> +                };
>>> +                cpu-crit: cpu-crit {
>>> +                        temperature = <125000>; /* milliCelsius */
>>> +                        hysteresis = <2000>; /* milliCelsius */
>>> +                        type = <THERMAL_TRIP_CRITICAL>;
>>> +                };
>>> +        };
>>> +
>>> +       cooling-attachments {
>>> +               attach0 {
>>> +                       trip = <&cpu-alert0>;
>>> +                       cooling-device = <&fan0 THERMAL_NO_LIMITS 4>;
>>> +               };
>>> +               attach1 {
>>> +                       trip = <&cpu-alert1>;
>>> +                       cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
>>> +               };
>>> +               attach2 {
>>> +                       trip = <&cpu-alert1>;
>>> +                       cooling-device =
>>> +                               <&cpu0 THERMAL_NO_LIMITS
>>> THERMAL_NO_LIMITS>;
>>> +               };
>>> +       };
>> Was there a good reason for splitting trips and attachment?
> 
> This is for adding/removing cooling device dynamically.

Well not really. I don't think this would prevent us of representing it
inside the trip point. It is, as I replied to Mark, because I wanted to
keep the flexibility to allow describing better properties related to
the cooling device while associated to a trip point alone, e.g.
'contribution'.

> 
>>> +};
>>> +
>>> +In the example above, the ADC sensor at address 0x0000ED00 is used
>>> to monitor
>>> +the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan
>>> device controlled
>>> +via I2C bus 1, at adress 0x48, is used to remove the heat out of the
>>> thermal
>>> +zone 'cpu-thermal' using its cooling levels from its minimum to 4,
>>> when it
>>> +reaches trip point 'cpu-alert0' at 90C, as an example of active
>>> cooling. The
>>> +same cooling device is used at 'cpu-alert1', but from 5 to its
>>> maximum level.
>>> +The cpu@0 device is also linked to the same thermal zone,
>>> 'cpu-thermal', as a
>>> +passive cooling device, using all its cooling levels at trip point
>>> 'cpu-alert1',
>>> +which is a trip point at 100C.
>>> +
>>
> 
> 
> 
> 
>
Eduardo Valentin Sept. 24, 2013, 9:33 p.m. UTC | #5
On 23-09-2013 06:40, Mark Rutland wrote:
> 
> It would be nice to have a name for the cells after a phandle which
> describe the cooling device's configuration. You've called them
> parameters here, but it probably makes sense to call them a
> cooling-specifier (following clock-specifier and interrupt-specifier).

Maybe it was not very clear, and I am working on improving this, but
what I am proposing is simply to have:
 cooling-device = <&cdev min max>

where min and max are one cell unsigned values referring to minimum
cooling level and maximum cooling level, for this reference. Note that
'cdev' may have 10 levels, but in this reference we may use only from 6
to 10:
 cooling-device = <&cdev 6 10>;

I don't see a need to have a cooling-names for this case. And for now, I
also don't see why we would use other specifiers. But we can leave it
open for future extensions.

It does make sense to have thermal-sensor-names (using thermal-sensor as
per your suggestion). Because it makes clear where the sensor is in the
case of using several sensors in one zone. Just like in the example I
already gave:
> +cpu-thermal: cpu-thermal {
> +	polling-delay-passive = <250>; /* milliseconds */
> +	polling-delay = <1000>; /* milliseconds */
> +
> +		/* sensor       ID */
> +     thermal-sensors = <&bandgap0	0>,
> +		  	  <&adc		0>;
> +	thermal-sensors-names = "cpu", "pcb north";
> +
> +		/* hotspot = 100 * bandgap - 120 * adc + 484 */
> +	coefficients = 		<100	-120	484>;
> +
> +        trips {
> +		...
> +        };
> +
> +	cooling-attachments {
> +		...
> +	};
> +};
hongbo.zhang@freescale.com Sept. 25, 2013, 5:39 a.m. UTC | #6
On 09/24/2013 11:50 PM, Eduardo Valentin wrote:
> Zhang,
>
> I appreciate your interest. I am replying a couple of your comments.
> Please also check my answers to Mark's queries on other email thread.
When I replied to Mark yesterday, I missed your mail replying him already.
I should check that mail and reply it for further concerns.
> On 24-09-2013 04:11, Hongbo Zhang wrote:
>> On 09/23/2013 06:40 PM, Mark Rutland wrote:
>>> Hi Eduardo,
>>>
>>> Apologies for having taken so long to get back you on this.
>>>
>>> I have several comments on the binding and the way it's parsed.
>>>
>>> On Wed, Sep 18, 2013 at 10:35:36PM +0100, Eduardo Valentin wrote:
>>>> This patch introduces a device tree bindings for
>>>> describing the hardware thermal behavior and limits.
>>>> Also a parser to read and interpret the data and feed
>>>> it in the thermal framework is presented.
>>>>
>>>> This patch introduces a thermal data parser for device
>>>> tree. The parsed data is used to build thermal zones
>>>> and thermal binding parameters. The output data
>>>> can then be used to deploy thermal policies.
>>>>
>>>> This patch adds also documentation regarding this
>>>> API and how to define tree nodes to use
>>>> this infrastructure.
>>>>
>>>> Note that, in order to be able to have control
>>>> on the sensor registration on the DT thermal zone,
>>>> it was required to allow changing the thermal zone
>>>> .get_temp callback. For this reason, this patch
>>>> also removes the 'const' modifier from the .ops
>>>> field of thermal zone devices.
>>>>
>>>> Cc: Zhang Rui <rui.zhang@intel.com>
>>>> Cc: linux-pm@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
>>>> ---
>>>>    .../devicetree/bindings/thermal/thermal.txt        | 498 +++++++++++++
>>>>    drivers/thermal/Kconfig                            |  13 +
>>>>    drivers/thermal/Makefile                           |   1 +
>>>>    drivers/thermal/of-thermal.c                       | 775
>>>> +++++++++++++++++++++
>>>>    drivers/thermal/thermal_core.c                     |   9 +-
>>>>    drivers/thermal/thermal_core.h                     |   9 +
>>>>    include/dt-bindings/thermal/thermal.h              |  27 +
>>>>    include/linux/thermal.h                            |  28 +-
>>>>    8 files changed, 1357 insertions(+), 3 deletions(-)
>>>>    create mode 100644
>>>> Documentation/devicetree/bindings/thermal/thermal.txt
>>>>    create mode 100644 drivers/thermal/of-thermal.c
>>>>    create mode 100644 include/dt-bindings/thermal/thermal.h
>>>>
>>>> ---
>>>>
>>>> Hello all,
>>>>
>>>> Thanks Joe Perches for the effort of reviewing this code, I really
>>>> appreciate it.
>>>>
>>>> Here is a version with a refactored init function without leaks in
>>>> the failure
>>>> patch. I also added a couple of comments to better understand the
>>>> intention of
>>>> that function. Besides, when there are no memory available, the function
>>>> rolls back what ever thermal zones have been added.
>>>>
>>>> Thanks,
>>>>
>>>> Eduardo
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt
>>>> b/Documentation/devicetree/bindings/thermal/thermal.txt
>>>> new file mode 100644
>>>> index 0000000..6664533
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>>>> @@ -0,0 +1,498 @@
>>>> +* Thermal Framework Device Tree descriptor
>>>> +
>>>> +Generic binding to provide a way of defining hardware thermal
>>>> +structure using device tree. A thermal structure includes thermal
>>>> +zones and their components, such as trip points, polling intervals,
>>>> +sensors and cooling devices binding descriptors.
>>>> +
>>>> +The target of device tree thermal descriptors is to describe only
>>>> +the hardware thermal aspects, not how the system must control or which
>>>> +algorithm or policy must be taken in place.
>>>> +
>>>> +There are five types of nodes involved to describe thermal bindings:
>>>> +- sensors: used to describe the device source of temperature sensing;
>>>> +- cooling devices: used to describe devices source of power
>>>> dissipation control;
>>>> +- trip points: used to describe points in temperature domain defined to
>>>> +make the system aware of hardware limits;
>>>> +- cooling attachments: used to describe links between trip points and
>>>> +cooling devices;
>>> I think "attachments" sounds a bit odd, though I don't have a better
>>> naming suggestion.
>> "map" or "mapping" is better?
> Well,  yeah, that could work.
:)
>>>> +- thermal zones: used to describe thermal data within the hardware;
>>>> +
>>>> +It follows a description of each type of these device tree nodes.
>>>> +
>>>> +* Sensor devices
>>>> +
>>>> +Sensor devices are nodes providing temperature sensing capabilities
>>>> on thermal
>>>> +zones. Typical devices are I2C ADC converters and bandgaps. Theses
>>>> are nodes
>>>> +providing temperature data to thermal zones. Temperature sensor
>>>> devices may
>>>> +control one or more internal sensors.
>>>> +
>>>> +Required property:
>>>> +- #sensor-cells:       Used to provide sensor device specific
>>>> information
>>>> +                       while referring to it. Must be at least 1, in
>>>> order
>>>> +                       to identify uniquely the sensor instances within
>>>> +                       the IC. See thermal zone binding for more
>>>> details
>>>> +                       on how consumers refer to sensor devices.
>>> I don't see why this needs to be at least one cell -- if an IC only has
>>> one temperature sensor it should be fine for this to be zero and have no
>>> ambiguity.
>>>
>>> It may make sense to call these thermal sensor devices, there are plenty
>>> of other sensors we might want to describe in the dt for other purposes,
>>> and we can impose some consistency if we remove the ambiguity.
>>>
>>>> +
>>>> +* Cooling device nodes
>>>> +
>>>> +Cooling devices are nodes providing control on power dissipation. There
>>>> +are essentially two ways to provide control on power dissipation. First
>>>> +is by means of regulating device performance, which is known as passive
>>>> +cooling. Second is by means of activating devices in order to remove
>>>> +the dissipated heat, which is known as active cooling, e.g. regulating
>>>> +fan speeds. In both cases, cooling devices shall have a way to
>>>> determine
>>>> +the level of cooling.
>>>> +
>>>> +Required property:
>>>> +- cooling-min-level:   A unsigned integer indicating the smallest
>>>> +                       cooling level accepted. Typically 0.
>>>> +- cooling-max-level:   An unsigned integer indicating the largest
>>>> +                       cooling level accepted.
>>> I'm not sure what a "cooling level" means. It seems a bit abstract. Is
>>> this binding specific?
>> It means cooling ability of a cooling device, such as speed of a fan.
>> But it is better to use cooling-min/max-state I think, because the
>> thermal management code is using "cooling state", concepts in dt and c
>> should be identical.
> I think the idea is just to be clear. Just because the thermal
> management implementation uses a terminology, it does not necessarily
> mean it is best fit for describing in device tree bindings. But I dont
> really have strong opinion on level or state at this point.
>
>>> How does this relate to cooling-cells? These seem to be a fixed size.
>> I also think cooling-cells is redundant.
> Cooling cells is not redundant, please check my explanation on the other
> thread. It is required to determine which level/states are used while
> binding.
>
OK

>>>> +- #cooling-cells:      Used to provide cooling device specific
>>>> information
>>>> +                       while referring to it. Must be at least 2, in
>>>> order
>>>> +                       to specify minimum and maximum cooling level
>>>> used
>>>> +                       in the reference. See Cooling device
>>>> attachments section
>>>> +                       below for more details on how consumers refer to
>>>> +                       cooling devices.
>>> Are these cooling cells always expected to cover min and max, and are
>>> min and max always expected to take the same number of cells? The above
>>> cooling-*-level properties imply they each take up one cell.
>>>
>>> Does #cooling-cells = <3> make sense?. If this covers additional
>>> information (e.g. which fan on a multi-fan controller), how does the OS
>>> determine which cells are min and max, and how do these relate to
>>> cooling-level-min and cooling-level-max?
>>>
>>>> +
>>>> +* Trip points
>>>> +
>>>> +The trip node is a node to describe a point in the temperature domain
>>>> +in which the system takes an action. This node describes just the
>>>> point,
>>>> +not the action.
>>> I'm still not sure on this, as it sounds like embedding policy into the
>>> DT, rather than a description of the hardware. I realise we need to
>>> prevent devices burning out, so describing the max acceptable
>>> temperature and possibly a preferred range makes sense, but do we need
>>> this level of flexibility? Maybe we do.
>>>
>> Putting platform data into dt is acceptable, right?
>> In a narrow sense, trip points are platform data, but in a broad sense,
>> both trip point itself and its corresponding cooling device are all
>> platform data.
>>
> I think Mark needs to elaborate a bit more to clarify why he sees this
> as embedding policy into DT. And I believe it is fine to pass platform
> data, as long as it represents hardware and it is  not representing policy.
>
>>>> +
>>>> +Required properties:
>>>> +- temperature:         the trip temperature level, in milliCelsius.
>>> Preferably, s/milliCelsius/millicelsius/ over the document.
>>>
>>> Given that we have signed values elsewhere, is this signed or unsigned?
>>>
>>>> +- hysteresis:          a (low) hysteresis value on 'temperature'.
>>>> This is a
>>>> +                       relative value, in milliCelsius.
>>>> +- type:                        the trip type. Here is the type mapping:
>>>> +       THERMAL_TRIP_ACTIVE     0:      A trip point to enable active
>>>> cooling
>>>> +       THERMAL_TRIP_PASSIVE    1:      A trip point to enable
>>>> passive cooling
>>>> +       THERMAL_TRIP_HOT        2:      A trip point to notify emergency
>>>> +       THERMAL_TRIP_CRITICAL   3:      Hardware not reliable.
>>>> +
>>>> +Refer to include/dt-bindings/thermal/thermal.h for definition of
>>>> these consts.
>>> Why not use a string for this? We do so for other type parameters in
>>> other bindings (see dr_mode in
>>> Documentation/devicetree/bindings/usb/generic.txt, phy-mode for ethernet
>>> PHYs, etc).
>>>
>>>> +
>>>> +* Cooling device attachments
>>>> +
>>>> +The cooling device attachments node is a node to describe how
>>>> cooling devices
>>>> +get assigned to trip points of the zone. The cooling devices are
>>>> expected
>>>> +to be loaded in the target system.
>>>> +
>>>> +Required properties:
>>>> +- cooling-device:      A phandle of a cooling device with its
>>>> parameters,
>>>> +                       referring to which cooling device is used in
>>>> this
>>>> +                       binding. The required parameters are: the
>>>> minimum
>>>> +                       cooling level and the maximum cooling level used
>>>> +                       in this attach.
>>> Might this be a list in general? The code doesn't have to support that
>>> yet.
>>>
>>> It would be nice to have a name for the cells after a phandle which
>>> describe the cooling device's configuration. You've called them
>>> parameters here, but it probably makes sense to call them a
>>> cooling-specifier (following clock-specifier and interrupt-specifier).
>>>
>>>> +- trip:                        A phandle of a trip point node within
>>>> the same thermal
>>>> +                       zone.
>>>> +
>>>> +Optional property:
>>>> +- contribution:                The cooling contribution to the
>>>> thermal zone of the
>>>> +                       referred cooling device at the referred trip
>>>> point.
>>>> +                       The contribution is a value from 0 to 100.
>>>> The sum
>>>> +                       of all cooling contributions within a thermal
>>>> zone
>>>> +                       must never exceed 100.
>>> This is a bit arbitrary. Couldn't we sum all contributions to find the
>>> total automatically, so this could be a simpler ratio?
>>>
>>>> +
>>>> +Note: Using the THERMAL_NO_LIMIT (-1L) constant in the
>>>> cooling-device phandle
>>>> +limit parameters means:
>>>> +(i)   - minimum level allowed for minimum cooling level used in the
>>>> reference.
>>>> +(ii)  - maximum level allowed for maximum cooling level used in the
>>>> reference.
>>>> +Refer to include/dt-bindings/thermal/thermal.h for definition of
>>>> this constant.
>>>> +
>>>> +* Thermal zones
>>>> +
>>>> +The thermal-zone node is the node containing all the required info
>>>> +for describing a thermal zone, including its cdev bindings. The
>>>> thermal_zone
>>>> +node must contain, apart from its own properties, one node containing
>>>> +trip nodes and one node containing all the zone cooling attachments.
>>> s/cdev/cooling device/ ?
>>>
>>>> +
>>>> +Required properties:
>>>> +- passive-delay:       The maximum number of milliseconds to wait
>>>> between polls
>>>> +                       when performing passive cooling.
>>>> +- polling-delay:       The maximum number of milliseconds to wait
>>>> between polls
>>>> +                       when checking this thermal zone.
>>> How about polling-delay-passive, polling-delay-active? I'm still not
>>>
>>>> +- sensors:             A list of sensor phandles and their
>>>> parameters. The
>>>> +                       required parameter is the sensor id, in order to
>>>> +                       identify internal sensors when the sensor IC
>>>> features
>>>> +                       several sensing units.
>>> As mentioned above, I'm not sure you even need that one cell.
>>>
>>> - sensors:                A list of sensor phandle +
>>> thermal-sensor-specifier
>>>                             cells describing the sensors monitoring the
>>> thermal
>>>                zone.
>>>
>>>> +- trips:               A sub-node containing several trip point
>>>> nodes required
>>>> +                       to describe the thermal zone.
>>>> +- cooling-attachments  A sub-node containing several cooling device
>>>> attaches
>>>> +                       nodes, used to describe the relation between
>>>> trips
>>>> +                       and cooling devices.
>>>> +
>>>> +Optional property:
>>>> +- coefficients:                An array of integers (one signed
>>>> cell) containing
>>>> +                       coefficients to compose a linear relation
>>>> between
>>>> +                       the sensors described in the sensors property.
>>>> +                       Coefficients defaults to 1, in case this
>>>> property
>>>> +                       is not specified. A simple linear polynomial
>>>> is used:
>>>> +                       Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1)
>>>> + cn.
>>>> +
>>>> +                       The coefficients are ordered and they match
>>>> with sensors
>>>> +                       by means of sensor ID. Additional
>>>> coefficients are
>>>> +                       interpreted as constant offsets.
>>> What is the end result of this? Presumably this is meant to result in an
>>> estimate of the average temperature of the thermal zone, rather than an
>>> arbitrary value? This should be mentioned.
>>>
>>> This doesn't seem to be used the in the code below...
>>>
>>>> +
>>>> +Note: The delay properties are bound to the maximum dT/dt (temperature
>>>> +derivative over time) in two situations for a thermal zone:
>>>> +(i)  - when active cooling is activated (passive-delay); and
>>>> +(ii) - when the zone just needs to be monitored (polling-delay).
>>>> +The maximum dT/dt is highly bound to hardware power consumption and
>>>> dissipation
>>>> +capability.
>>> I'm not sure what you mean by this, could you elaborate?
>>>
>>> I guess you mean that the delays should be chosen to account for said
>>> max dT/dt, such that a device can't unexpectedly cross several trip
>>> boundaries between polls?
>>>
>>>> +
>>>> +* Examples
>>>> +
>>>> +Below are several examples on how to use thermal data descriptors
>>>> +using device tree bindings:
>>>> +
>>>> +(a) - CPU thermal zone
>>>> +
>>>> +The CPU thermal zone example below describes how to setup one
>>>> thermal zone
>>>> +using one single sensor as temperature source and many cooling
>>>> devices and
>>>> +power dissipation control sources.
>>>> +
>>>> +#include <dt-bindings/thermal/thermal.h>
>>>> +
>>>> +cpus {
>>>> +       cpu0: cpu@0 {
>>>> +               ...
>>>> +               cooling-min-level = <0>;
>>>> +               cooling-max-level = <3>;
>>>> +               #cooling-cells = <2>; /* min followed by max */
>>>> +       };
>>> What do those min and max mean in this context? What do the values in
>>> the range [0,3] correspond to?
>>>
>>> I'm not sure it makes sense to describe passive cooling in this way --
>>> the precise way an OS does less work on a device is fundamentally tied
>>> to the design of the OS (it might be able to rate-limit requests, it
>>> might be able to clock the device down, it might only be able to turn
>>> the device off).
>>>
>>> I'm not sure there is anything we can describe for passive cooling
>>> (beyond the thermal limits the OS should attempt to stick to).
>>>
>>>> +       ...
>>>> +};
>>>> +
>>>> +&i2c1 {
>>>> +       ...
>>>> +       fan0: fan@0x48 {
>>>> +               ...
>>>> +               cooling-min-level = <0>;
>>>> +               cooling-max-level = <9>;
>>>> +               #cooling-cells = <2>; /* min followed by max */
>>>> +       };
>>> What do min and max mean here for the fan?
>>>
>> It means the speed level of a fan. Even if a fan can run at a continuous
>> speed range, but when it is controlled under thermal management
>> framework, it can only run at discrete speeds.
>
> Please, an important point here, this is not about thermal framework
> requirement. But how we describe hardware. As you mentioned, even if
> there are fan devices out there that uses continuous speed range, there
> other devices, even fan devices or cpu frequency steps for instance,
> that are discrete. Again, *this is not a thermal framework requirement*.
>
>>>> +};
>>>> +
>>>> +bandgap0: bandgap@0x0000ED00 {
>>>> +       ...
>>>> +       #sensor-cells = <1>;
>>>> +};
>>>> +
>>>> +cpu-thermal: cpu-thermal {
>>> How do the thermal zones get probed?
>>>
>>>> +       passive-delay = <250>; /* milliseconds */
>>>> +       polling-delay = <1000>; /* milliseconds */
>>>> +
>>>> +               /* sensor       ID */
>>>> +        sensors = <&bandgap0     0>;
>>>> +
>>>> +        trips {
>>>> +                cpu-alert0: cpu-alert {
>>>> +                        temperature = <90000>; /* milliCelsius */
>>>> +                        hysteresis = <2000>; /* milliCelsius */
>>>> +                        type = <THERMAL_TRIP_ACTIVE>;
>>>> +                };
>>>> +                cpu-alert1: cpu-alert {
>>>> +                        temperature = <100000>; /* milliCelsius */
>>>> +                        hysteresis = <2000>; /* milliCelsius */
>>>> +                        type = <THERMAL_TRIP_PASSIVE>;
>>>> +                };
>>>> +                cpu-crit: cpu-crit {
>>>> +                        temperature = <125000>; /* milliCelsius */
>>>> +                        hysteresis = <2000>; /* milliCelsius */
>>>> +                        type = <THERMAL_TRIP_CRITICAL>;
>>>> +                };
>>>> +        };
>>>> +
>>>> +       cooling-attachments {
>>>> +               attach0 {
>>>> +                       trip = <&cpu-alert0>;
>>>> +                       cooling-device = <&fan0 THERMAL_NO_LIMITS 4>;
>>>> +               };
>>>> +               attach1 {
>>>> +                       trip = <&cpu-alert1>;
>>>> +                       cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
>>>> +               };
>>>> +               attach2 {
>>>> +                       trip = <&cpu-alert1>;
>>>> +                       cooling-device =
>>>> +                               <&cpu0 THERMAL_NO_LIMITS
>>>> THERMAL_NO_LIMITS>;
>>>> +               };
>>>> +       };
>>> Was there a good reason for splitting trips and attachment?
>> This is for adding/removing cooling device dynamically.
> Well not really. I don't think this would prevent us of representing it
> inside the trip point. It is, as I replied to Mark, because I wanted to
> keep the flexibility to allow describing better properties related to
> the cooling device while associated to a trip point alone, e.g.
> 'contribution'.

Understand your purpose and explanation.
But this can also really help to add/remove cooling device freely, there 
were already drivers with separated files of thermal zone and cooling 
device.

>>>> +};
>>>> +
>>>> +In the example above, the ADC sensor at address 0x0000ED00 is used
>>>> to monitor
>>>> +the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan
>>>> device controlled
>>>> +via I2C bus 1, at adress 0x48, is used to remove the heat out of the
>>>> thermal
>>>> +zone 'cpu-thermal' using its cooling levels from its minimum to 4,
>>>> when it
>>>> +reaches trip point 'cpu-alert0' at 90C, as an example of active
>>>> cooling. The
>>>> +same cooling device is used at 'cpu-alert1', but from 5 to its
>>>> maximum level.
>>>> +The cpu@0 device is also linked to the same thermal zone,
>>>> 'cpu-thermal', as a
>>>> +passive cooling device, using all its cooling levels at trip point
>>>> 'cpu-alert1',
>>>> +which is a trip point at 100C.
>>>> +
>>
>>
>>
>>
>



--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
hongbo.zhang@freescale.com Sept. 25, 2013, 7:13 a.m. UTC | #7
On 09/19/2013 05:35 AM, Eduardo Valentin wrote:
>   
> [...]
> +
> +/***   sensor API   ***/
> +

You are introducing new concept here, the original framework and drivers 
cannot use this, right? any further plan to update original framework 
for this new feature?

> +static struct thermal_zone_device *
> +thermal_zone_of_add_sensor(struct device_node *zone,
> +			   struct device_node *sensor, void *data,
> +			   int (*get_temp)(void *, long *),
> +			   int (*get_trend)(void *, long *))
> +{
> +	struct thermal_zone_device *tzd;
> +	struct __thermal_zone *tz;
> +
> +	tzd = thermal_zone_get_zone_by_name(zone->name);
> +	if (IS_ERR(tzd))
> +		return ERR_PTR(-EPROBE_DEFER);
> +
> [...]
> +
> +/*
> + * Here are the thermal trip types. This must
> + * match with enum thermal_trip_type at
> + * include/linux/thermal.h
> + */
> +#define THERMAL_TRIP_ACTIVE		0
> +#define THERMAL_TRIP_PASSIVE		1
> +#define THERMAL_TRIP_HOT		2
> +#define THERMAL_TRIP_CRITICAL		3
> +

These macros seem duplicated with enum thermal_trip_type in thermal.h, 
do you have further plan to merge them?
Or by using string "active",  "passive" etc in the dts, then you can 
reuse the original enum definition.

> +/* On cooling devices upper and lower limits */
> +#define THERMAL_NO_LIMIT		(-1UL)
> +
> +#endif
> [...]


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Sept. 25, 2013, 2:15 p.m. UTC | #8
On 25-09-2013 03:13, Hongbo Zhang wrote:
> On 09/19/2013 05:35 AM, Eduardo Valentin wrote:
>>   [...]
>> +
>> +/***   sensor API   ***/
>> +
> 
> You are introducing new concept here, the original framework and drivers
> cannot use this, right? any further plan to update original framework
> for this new feature?
> 

Well, not new as such. Just a specific way to register sensors to
thermal framework. What is really new is the fact that we really need to
have sensors decoupled from thermal zone devices, and today we have
these concepts pretty merged together.

To answer your question, for now I am more concerned with the bindings
definition. Once that is at least agreed, then we can follow up with the
migration of existing drivers. For now, there are two examples in this
series, first one is using one existing thermal driver, which is the TI
SoC thermal driver, and the second one is the hwmon drivers, which are
existing sensor drivers, but are not thermal drivers.

The plan forward, once this series is accepted is to migrate existing
drivers, yes, so that they can use device tree uniformly. Of course,
this needs help from driver authors.

My proposal will be to follow up this series with a two fold migration.
First step to change the existing thermal drivers to have both, the
current support and the device tree support. And second step, for those
who wish to, we could remove the old code containing thermal data and
have only dt support. Of course, this requires drivers authors input.



>> +static struct thermal_zone_device *
>> +thermal_zone_of_add_sensor(struct device_node *zone,
>> +               struct device_node *sensor, void *data,
>> +               int (*get_temp)(void *, long *),
>> +               int (*get_trend)(void *, long *))
>> +{
>> +    struct thermal_zone_device *tzd;
>> +    struct __thermal_zone *tz;
>> +
>> +    tzd = thermal_zone_get_zone_by_name(zone->name);
>> +    if (IS_ERR(tzd))
>> +        return ERR_PTR(-EPROBE_DEFER);
>> +
>> [...]
>> +
>> +/*
>> + * Here are the thermal trip types. This must
>> + * match with enum thermal_trip_type at
>> + * include/linux/thermal.h
>> + */
>> +#define THERMAL_TRIP_ACTIVE        0
>> +#define THERMAL_TRIP_PASSIVE        1
>> +#define THERMAL_TRIP_HOT        2
>> +#define THERMAL_TRIP_CRITICAL        3
>> +
> 
> These macros seem duplicated with enum thermal_trip_type in thermal.h,
> do you have further plan to merge them?
> Or by using string "active",  "passive" etc in the dts, then you can
> reuse the original enum definition.

I am changing this so that in DT we have string constants, and we keep a
map from string to enum, just like we have for phy-mode, as suggested by
Mark.

You can have a taste of it here:
https://git.kernel.org/cgit/linux/kernel/git/evalenti/linux.git/commit/?h=thermal_work/thermal_core/dt_parser_rfc_v4&id=73f16c27fc763495188fba7d6e17b9c986efc6ac

I will be reposting this version once we are done with this thread
discussion and I am finished with my current test.

If you have the time, I would appreciate if you could try the series on
your board, as I am don't have access to your hardware. It would be
really nice to see how this work is behaving in other environments then
the one I have.

Thanks for your interest in this work.

> 
>> +/* On cooling devices upper and lower limits */
>> +#define THERMAL_NO_LIMIT        (-1UL)
>> +
>> +#endif
>> [...]
> 
> 
> 
>
Eduardo Valentin Sept. 25, 2013, 2:23 p.m. UTC | #9
On 25-09-2013 01:39, Hongbo Zhang wrote:
> On 09/24/2013 11:50 PM, Eduardo Valentin wrote:
>> Zhang,
>>
>> I appreciate your interest. I am replying a couple of your comments.
>> Please also check my answers to Mark's queries on other email thread.
> When I replied to Mark yesterday, I missed your mail replying him already.
> I should check that mail and reply it for further concerns.
>> On 24-09-2013 04:11, Hongbo Zhang wrote:
>>> On 09/23/2013 06:40 PM, Mark Rutland wrote:
>>>> Hi Eduardo,
>>>>
>>>> Apologies for having taken so long to get back you on this.
>>>>
>>>> I have several comments on the binding and the way it's parsed.
>>>>
>>>> On Wed, Sep 18, 2013 at 10:35:36PM +0100, Eduardo Valentin wrote:
>>>>> This patch introduces a device tree bindings for
>>>>> describing the hardware thermal behavior and limits.
>>>>> Also a parser to read and interpret the data and feed
>>>>> it in the thermal framework is presented.
>>>>>
>>>>> This patch introduces a thermal data parser for device
>>>>> tree. The parsed data is used to build thermal zones
>>>>> and thermal binding parameters. The output data
>>>>> can then be used to deploy thermal policies.
>>>>>
>>>>> This patch adds also documentation regarding this
>>>>> API and how to define tree nodes to use
>>>>> this infrastructure.
>>>>>
>>>>> Note that, in order to be able to have control
>>>>> on the sensor registration on the DT thermal zone,
>>>>> it was required to allow changing the thermal zone
>>>>> .get_temp callback. For this reason, this patch
>>>>> also removes the 'const' modifier from the .ops
>>>>> field of thermal zone devices.
>>>>>
>>>>> Cc: Zhang Rui <rui.zhang@intel.com>
>>>>> Cc: linux-pm@vger.kernel.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
>>>>> ---
>>>>>    .../devicetree/bindings/thermal/thermal.txt        | 498
>>>>> +++++++++++++
>>>>>    drivers/thermal/Kconfig                            |  13 +
>>>>>    drivers/thermal/Makefile                           |   1 +
>>>>>    drivers/thermal/of-thermal.c                       | 775
>>>>> +++++++++++++++++++++
>>>>>    drivers/thermal/thermal_core.c                     |   9 +-
>>>>>    drivers/thermal/thermal_core.h                     |   9 +
>>>>>    include/dt-bindings/thermal/thermal.h              |  27 +
>>>>>    include/linux/thermal.h                            |  28 +-
>>>>>    8 files changed, 1357 insertions(+), 3 deletions(-)
>>>>>    create mode 100644
>>>>> Documentation/devicetree/bindings/thermal/thermal.txt
>>>>>    create mode 100644 drivers/thermal/of-thermal.c
>>>>>    create mode 100644 include/dt-bindings/thermal/thermal.h
>>>>>
>>>>> ---
>>>>>
>>>>> Hello all,
>>>>>
>>>>> Thanks Joe Perches for the effort of reviewing this code, I really
>>>>> appreciate it.
>>>>>
>>>>> Here is a version with a refactored init function without leaks in
>>>>> the failure
>>>>> patch. I also added a couple of comments to better understand the
>>>>> intention of
>>>>> that function. Besides, when there are no memory available, the
>>>>> function
>>>>> rolls back what ever thermal zones have been added.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Eduardo
>>>>>
>>>>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt
>>>>> b/Documentation/devicetree/bindings/thermal/thermal.txt
>>>>> new file mode 100644
>>>>> index 0000000..6664533
>>>>> --- /dev/null
>>>>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>>>>> @@ -0,0 +1,498 @@
>>>>> +* Thermal Framework Device Tree descriptor
>>>>> +
>>>>> +Generic binding to provide a way of defining hardware thermal
>>>>> +structure using device tree. A thermal structure includes thermal
>>>>> +zones and their components, such as trip points, polling intervals,
>>>>> +sensors and cooling devices binding descriptors.
>>>>> +
>>>>> +The target of device tree thermal descriptors is to describe only
>>>>> +the hardware thermal aspects, not how the system must control or
>>>>> which
>>>>> +algorithm or policy must be taken in place.
>>>>> +
>>>>> +There are five types of nodes involved to describe thermal bindings:
>>>>> +- sensors: used to describe the device source of temperature sensing;
>>>>> +- cooling devices: used to describe devices source of power
>>>>> dissipation control;
>>>>> +- trip points: used to describe points in temperature domain
>>>>> defined to
>>>>> +make the system aware of hardware limits;
>>>>> +- cooling attachments: used to describe links between trip points and
>>>>> +cooling devices;
>>>> I think "attachments" sounds a bit odd, though I don't have a better
>>>> naming suggestion.
>>> "map" or "mapping" is better?
>> Well,  yeah, that could work.
> :)
>>>>> +- thermal zones: used to describe thermal data within the hardware;
>>>>> +
>>>>> +It follows a description of each type of these device tree nodes.
>>>>> +
>>>>> +* Sensor devices
>>>>> +
>>>>> +Sensor devices are nodes providing temperature sensing capabilities
>>>>> on thermal
>>>>> +zones. Typical devices are I2C ADC converters and bandgaps. Theses
>>>>> are nodes
>>>>> +providing temperature data to thermal zones. Temperature sensor
>>>>> devices may
>>>>> +control one or more internal sensors.
>>>>> +
>>>>> +Required property:
>>>>> +- #sensor-cells:       Used to provide sensor device specific
>>>>> information
>>>>> +                       while referring to it. Must be at least 1, in
>>>>> order
>>>>> +                       to identify uniquely the sensor instances
>>>>> within
>>>>> +                       the IC. See thermal zone binding for more
>>>>> details
>>>>> +                       on how consumers refer to sensor devices.
>>>> I don't see why this needs to be at least one cell -- if an IC only has
>>>> one temperature sensor it should be fine for this to be zero and
>>>> have no
>>>> ambiguity.
>>>>
>>>> It may make sense to call these thermal sensor devices, there are
>>>> plenty
>>>> of other sensors we might want to describe in the dt for other
>>>> purposes,
>>>> and we can impose some consistency if we remove the ambiguity.
>>>>
>>>>> +
>>>>> +* Cooling device nodes
>>>>> +
>>>>> +Cooling devices are nodes providing control on power dissipation.
>>>>> There
>>>>> +are essentially two ways to provide control on power dissipation.
>>>>> First
>>>>> +is by means of regulating device performance, which is known as
>>>>> passive
>>>>> +cooling. Second is by means of activating devices in order to remove
>>>>> +the dissipated heat, which is known as active cooling, e.g.
>>>>> regulating
>>>>> +fan speeds. In both cases, cooling devices shall have a way to
>>>>> determine
>>>>> +the level of cooling.
>>>>> +
>>>>> +Required property:
>>>>> +- cooling-min-level:   A unsigned integer indicating the smallest
>>>>> +                       cooling level accepted. Typically 0.
>>>>> +- cooling-max-level:   An unsigned integer indicating the largest
>>>>> +                       cooling level accepted.
>>>> I'm not sure what a "cooling level" means. It seems a bit abstract. Is
>>>> this binding specific?
>>> It means cooling ability of a cooling device, such as speed of a fan.
>>> But it is better to use cooling-min/max-state I think, because the
>>> thermal management code is using "cooling state", concepts in dt and c
>>> should be identical.
>> I think the idea is just to be clear. Just because the thermal
>> management implementation uses a terminology, it does not necessarily
>> mean it is best fit for describing in device tree bindings. But I dont
>> really have strong opinion on level or state at this point.
>>
>>>> How does this relate to cooling-cells? These seem to be a fixed size.
>>> I also think cooling-cells is redundant.
>> Cooling cells is not redundant, please check my explanation on the other
>> thread. It is required to determine which level/states are used while
>> binding.
>>
> OK
> 
>>>>> +- #cooling-cells:      Used to provide cooling device specific
>>>>> information
>>>>> +                       while referring to it. Must be at least 2, in
>>>>> order
>>>>> +                       to specify minimum and maximum cooling level
>>>>> used
>>>>> +                       in the reference. See Cooling device
>>>>> attachments section
>>>>> +                       below for more details on how consumers
>>>>> refer to
>>>>> +                       cooling devices.
>>>> Are these cooling cells always expected to cover min and max, and are
>>>> min and max always expected to take the same number of cells? The above
>>>> cooling-*-level properties imply they each take up one cell.
>>>>
>>>> Does #cooling-cells = <3> make sense?. If this covers additional
>>>> information (e.g. which fan on a multi-fan controller), how does the OS
>>>> determine which cells are min and max, and how do these relate to
>>>> cooling-level-min and cooling-level-max?
>>>>
>>>>> +
>>>>> +* Trip points
>>>>> +
>>>>> +The trip node is a node to describe a point in the temperature domain
>>>>> +in which the system takes an action. This node describes just the
>>>>> point,
>>>>> +not the action.
>>>> I'm still not sure on this, as it sounds like embedding policy into the
>>>> DT, rather than a description of the hardware. I realise we need to
>>>> prevent devices burning out, so describing the max acceptable
>>>> temperature and possibly a preferred range makes sense, but do we need
>>>> this level of flexibility? Maybe we do.
>>>>
>>> Putting platform data into dt is acceptable, right?
>>> In a narrow sense, trip points are platform data, but in a broad sense,
>>> both trip point itself and its corresponding cooling device are all
>>> platform data.
>>>
>> I think Mark needs to elaborate a bit more to clarify why he sees this
>> as embedding policy into DT. And I believe it is fine to pass platform
>> data, as long as it represents hardware and it is  not representing
>> policy.
>>
>>>>> +
>>>>> +Required properties:
>>>>> +- temperature:         the trip temperature level, in milliCelsius.
>>>> Preferably, s/milliCelsius/millicelsius/ over the document.
>>>>
>>>> Given that we have signed values elsewhere, is this signed or unsigned?
>>>>
>>>>> +- hysteresis:          a (low) hysteresis value on 'temperature'.
>>>>> This is a
>>>>> +                       relative value, in milliCelsius.
>>>>> +- type:                        the trip type. Here is the type
>>>>> mapping:
>>>>> +       THERMAL_TRIP_ACTIVE     0:      A trip point to enable active
>>>>> cooling
>>>>> +       THERMAL_TRIP_PASSIVE    1:      A trip point to enable
>>>>> passive cooling
>>>>> +       THERMAL_TRIP_HOT        2:      A trip point to notify
>>>>> emergency
>>>>> +       THERMAL_TRIP_CRITICAL   3:      Hardware not reliable.
>>>>> +
>>>>> +Refer to include/dt-bindings/thermal/thermal.h for definition of
>>>>> these consts.
>>>> Why not use a string for this? We do so for other type parameters in
>>>> other bindings (see dr_mode in
>>>> Documentation/devicetree/bindings/usb/generic.txt, phy-mode for
>>>> ethernet
>>>> PHYs, etc).
>>>>
>>>>> +
>>>>> +* Cooling device attachments
>>>>> +
>>>>> +The cooling device attachments node is a node to describe how
>>>>> cooling devices
>>>>> +get assigned to trip points of the zone. The cooling devices are
>>>>> expected
>>>>> +to be loaded in the target system.
>>>>> +
>>>>> +Required properties:
>>>>> +- cooling-device:      A phandle of a cooling device with its
>>>>> parameters,
>>>>> +                       referring to which cooling device is used in
>>>>> this
>>>>> +                       binding. The required parameters are: the
>>>>> minimum
>>>>> +                       cooling level and the maximum cooling level
>>>>> used
>>>>> +                       in this attach.
>>>> Might this be a list in general? The code doesn't have to support that
>>>> yet.
>>>>
>>>> It would be nice to have a name for the cells after a phandle which
>>>> describe the cooling device's configuration. You've called them
>>>> parameters here, but it probably makes sense to call them a
>>>> cooling-specifier (following clock-specifier and interrupt-specifier).
>>>>
>>>>> +- trip:                        A phandle of a trip point node within
>>>>> the same thermal
>>>>> +                       zone.
>>>>> +
>>>>> +Optional property:
>>>>> +- contribution:                The cooling contribution to the
>>>>> thermal zone of the
>>>>> +                       referred cooling device at the referred trip
>>>>> point.
>>>>> +                       The contribution is a value from 0 to 100.
>>>>> The sum
>>>>> +                       of all cooling contributions within a thermal
>>>>> zone
>>>>> +                       must never exceed 100.
>>>> This is a bit arbitrary. Couldn't we sum all contributions to find the
>>>> total automatically, so this could be a simpler ratio?
>>>>
>>>>> +
>>>>> +Note: Using the THERMAL_NO_LIMIT (-1L) constant in the
>>>>> cooling-device phandle
>>>>> +limit parameters means:
>>>>> +(i)   - minimum level allowed for minimum cooling level used in the
>>>>> reference.
>>>>> +(ii)  - maximum level allowed for maximum cooling level used in the
>>>>> reference.
>>>>> +Refer to include/dt-bindings/thermal/thermal.h for definition of
>>>>> this constant.
>>>>> +
>>>>> +* Thermal zones
>>>>> +
>>>>> +The thermal-zone node is the node containing all the required info
>>>>> +for describing a thermal zone, including its cdev bindings. The
>>>>> thermal_zone
>>>>> +node must contain, apart from its own properties, one node containing
>>>>> +trip nodes and one node containing all the zone cooling attachments.
>>>> s/cdev/cooling device/ ?
>>>>
>>>>> +
>>>>> +Required properties:
>>>>> +- passive-delay:       The maximum number of milliseconds to wait
>>>>> between polls
>>>>> +                       when performing passive cooling.
>>>>> +- polling-delay:       The maximum number of milliseconds to wait
>>>>> between polls
>>>>> +                       when checking this thermal zone.
>>>> How about polling-delay-passive, polling-delay-active? I'm still not
>>>>
>>>>> +- sensors:             A list of sensor phandles and their
>>>>> parameters. The
>>>>> +                       required parameter is the sensor id, in
>>>>> order to
>>>>> +                       identify internal sensors when the sensor IC
>>>>> features
>>>>> +                       several sensing units.
>>>> As mentioned above, I'm not sure you even need that one cell.
>>>>
>>>> - sensors:                A list of sensor phandle +
>>>> thermal-sensor-specifier
>>>>                             cells describing the sensors monitoring the
>>>> thermal
>>>>                zone.
>>>>
>>>>> +- trips:               A sub-node containing several trip point
>>>>> nodes required
>>>>> +                       to describe the thermal zone.
>>>>> +- cooling-attachments  A sub-node containing several cooling device
>>>>> attaches
>>>>> +                       nodes, used to describe the relation between
>>>>> trips
>>>>> +                       and cooling devices.
>>>>> +
>>>>> +Optional property:
>>>>> +- coefficients:                An array of integers (one signed
>>>>> cell) containing
>>>>> +                       coefficients to compose a linear relation
>>>>> between
>>>>> +                       the sensors described in the sensors property.
>>>>> +                       Coefficients defaults to 1, in case this
>>>>> property
>>>>> +                       is not specified. A simple linear polynomial
>>>>> is used:
>>>>> +                       Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1)
>>>>> + cn.
>>>>> +
>>>>> +                       The coefficients are ordered and they match
>>>>> with sensors
>>>>> +                       by means of sensor ID. Additional
>>>>> coefficients are
>>>>> +                       interpreted as constant offsets.
>>>> What is the end result of this? Presumably this is meant to result
>>>> in an
>>>> estimate of the average temperature of the thermal zone, rather than an
>>>> arbitrary value? This should be mentioned.
>>>>
>>>> This doesn't seem to be used the in the code below...
>>>>
>>>>> +
>>>>> +Note: The delay properties are bound to the maximum dT/dt
>>>>> (temperature
>>>>> +derivative over time) in two situations for a thermal zone:
>>>>> +(i)  - when active cooling is activated (passive-delay); and
>>>>> +(ii) - when the zone just needs to be monitored (polling-delay).
>>>>> +The maximum dT/dt is highly bound to hardware power consumption and
>>>>> dissipation
>>>>> +capability.
>>>> I'm not sure what you mean by this, could you elaborate?
>>>>
>>>> I guess you mean that the delays should be chosen to account for said
>>>> max dT/dt, such that a device can't unexpectedly cross several trip
>>>> boundaries between polls?
>>>>
>>>>> +
>>>>> +* Examples
>>>>> +
>>>>> +Below are several examples on how to use thermal data descriptors
>>>>> +using device tree bindings:
>>>>> +
>>>>> +(a) - CPU thermal zone
>>>>> +
>>>>> +The CPU thermal zone example below describes how to setup one
>>>>> thermal zone
>>>>> +using one single sensor as temperature source and many cooling
>>>>> devices and
>>>>> +power dissipation control sources.
>>>>> +
>>>>> +#include <dt-bindings/thermal/thermal.h>
>>>>> +
>>>>> +cpus {
>>>>> +       cpu0: cpu@0 {
>>>>> +               ...
>>>>> +               cooling-min-level = <0>;
>>>>> +               cooling-max-level = <3>;
>>>>> +               #cooling-cells = <2>; /* min followed by max */
>>>>> +       };
>>>> What do those min and max mean in this context? What do the values in
>>>> the range [0,3] correspond to?
>>>>
>>>> I'm not sure it makes sense to describe passive cooling in this way --
>>>> the precise way an OS does less work on a device is fundamentally tied
>>>> to the design of the OS (it might be able to rate-limit requests, it
>>>> might be able to clock the device down, it might only be able to turn
>>>> the device off).
>>>>
>>>> I'm not sure there is anything we can describe for passive cooling
>>>> (beyond the thermal limits the OS should attempt to stick to).
>>>>
>>>>> +       ...
>>>>> +};
>>>>> +
>>>>> +&i2c1 {
>>>>> +       ...
>>>>> +       fan0: fan@0x48 {
>>>>> +               ...
>>>>> +               cooling-min-level = <0>;
>>>>> +               cooling-max-level = <9>;
>>>>> +               #cooling-cells = <2>; /* min followed by max */
>>>>> +       };
>>>> What do min and max mean here for the fan?
>>>>
>>> It means the speed level of a fan. Even if a fan can run at a continuous
>>> speed range, but when it is controlled under thermal management
>>> framework, it can only run at discrete speeds.
>>
>> Please, an important point here, this is not about thermal framework
>> requirement. But how we describe hardware. As you mentioned, even if
>> there are fan devices out there that uses continuous speed range, there
>> other devices, even fan devices or cpu frequency steps for instance,
>> that are discrete. Again, *this is not a thermal framework requirement*.
>>
>>>>> +};
>>>>> +
>>>>> +bandgap0: bandgap@0x0000ED00 {
>>>>> +       ...
>>>>> +       #sensor-cells = <1>;
>>>>> +};
>>>>> +
>>>>> +cpu-thermal: cpu-thermal {
>>>> How do the thermal zones get probed?
>>>>
>>>>> +       passive-delay = <250>; /* milliseconds */
>>>>> +       polling-delay = <1000>; /* milliseconds */
>>>>> +
>>>>> +               /* sensor       ID */
>>>>> +        sensors = <&bandgap0     0>;
>>>>> +
>>>>> +        trips {
>>>>> +                cpu-alert0: cpu-alert {
>>>>> +                        temperature = <90000>; /* milliCelsius */
>>>>> +                        hysteresis = <2000>; /* milliCelsius */
>>>>> +                        type = <THERMAL_TRIP_ACTIVE>;
>>>>> +                };
>>>>> +                cpu-alert1: cpu-alert {
>>>>> +                        temperature = <100000>; /* milliCelsius */
>>>>> +                        hysteresis = <2000>; /* milliCelsius */
>>>>> +                        type = <THERMAL_TRIP_PASSIVE>;
>>>>> +                };
>>>>> +                cpu-crit: cpu-crit {
>>>>> +                        temperature = <125000>; /* milliCelsius */
>>>>> +                        hysteresis = <2000>; /* milliCelsius */
>>>>> +                        type = <THERMAL_TRIP_CRITICAL>;
>>>>> +                };
>>>>> +        };
>>>>> +
>>>>> +       cooling-attachments {
>>>>> +               attach0 {
>>>>> +                       trip = <&cpu-alert0>;
>>>>> +                       cooling-device = <&fan0 THERMAL_NO_LIMITS 4>;
>>>>> +               };
>>>>> +               attach1 {
>>>>> +                       trip = <&cpu-alert1>;
>>>>> +                       cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
>>>>> +               };
>>>>> +               attach2 {
>>>>> +                       trip = <&cpu-alert1>;
>>>>> +                       cooling-device =
>>>>> +                               <&cpu0 THERMAL_NO_LIMITS
>>>>> THERMAL_NO_LIMITS>;
>>>>> +               };
>>>>> +       };
>>>> Was there a good reason for splitting trips and attachment?
>>> This is for adding/removing cooling device dynamically.
>> Well not really. I don't think this would prevent us of representing it
>> inside the trip point. It is, as I replied to Mark, because I wanted to
>> keep the flexibility to allow describing better properties related to
>> the cooling device while associated to a trip point alone, e.g.
>> 'contribution'.
> 
> Understand your purpose and explanation.
> But this can also really help to add/remove cooling device freely, there
> were already drivers with separated files of thermal zone and cooling
> device.

Right. The idea is of course to probe cooling devices independently from
thermal zone devices, and also from thermal sensor devices. With this
series we will achieve that state.

Currently, thermal drivers have several responsibilities: control the
sensor, report the thermal zone requirements and probe cooling devices.

On top of this series, using device tree, cooling devices are supposed
to be probed based on their nodes. I have started to do this with the
cpufreq cooling device. And it is usable already. Thermal zone devices
will be probed by the of-thermal.c code. And thermal sensors will be
probed by their own driver, but they need to be registered to a thermal
zone, so that the zone gets enabled.

However, the specific point we are talking about, is more about how the
zone gets connected to the cooling device. But the probing of this
cooling device is not triggered by the node above, which is just
describing the connection. As I mentioned, the cooling device needs to
be probed based on the node that this connection points to, by means of
a phandle.

Just like regulators, or dma, or interrupts are mapped. Here the thermal
zone is a consumer, in a way, of the cooling device and of the sensor
device.

I hope this is clear in the documentation I am writing. Please let me
know if not.

> 
>>>>> +};
>>>>> +
>>>>> +In the example above, the ADC sensor at address 0x0000ED00 is used
>>>>> to monitor
>>>>> +the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan
>>>>> device controlled
>>>>> +via I2C bus 1, at adress 0x48, is used to remove the heat out of the
>>>>> thermal
>>>>> +zone 'cpu-thermal' using its cooling levels from its minimum to 4,
>>>>> when it
>>>>> +reaches trip point 'cpu-alert0' at 90C, as an example of active
>>>>> cooling. The
>>>>> +same cooling device is used at 'cpu-alert1', but from 5 to its
>>>>> maximum level.
>>>>> +The cpu@0 device is also linked to the same thermal zone,
>>>>> 'cpu-thermal', as a
>>>>> +passive cooling device, using all its cooling levels at trip point
>>>>> 'cpu-alert1',
>>>>> +which is a trip point at 100C.
>>>>> +
>>>
>>>
>>>
>>>
>>
> 
> 
> 
> 
>
Mark Rutland Sept. 30, 2013, 3:40 p.m. UTC | #10
On Tue, Sep 24, 2013 at 10:33:19PM +0100, Eduardo Valentin wrote:
> On 23-09-2013 06:40, Mark Rutland wrote:
> > 
> > It would be nice to have a name for the cells after a phandle which
> > describe the cooling device's configuration. You've called them
> > parameters here, but it probably makes sense to call them a
> > cooling-specifier (following clock-specifier and interrupt-specifier).
> 
> Maybe it was not very clear, and I am working on improving this, but
> what I am proposing is simply to have:
>  cooling-device = <&cdev min max>
> 
> where min and max are one cell unsigned values referring to minimum
> cooling level and maximum cooling level, for this reference. Note that
> 'cdev' may have 10 levels, but in this reference we may use only from 6
> to 10:
>  cooling-device = <&cdev 6 10>;
> 
> I don't see a need to have a cooling-names for this case. And for now, I
> also don't see why we would use other specifiers. But we can leave it
> open for future extensions.

There seesm to be some confusion ehre, so let me clarify.  I was asking
for consistent terminology (i.e. "cooling-specifier" rather than
"parameters"), not *-names properties. Consistent terminology and style
makes binding far easier to read.

> 
> It does make sense to have thermal-sensor-names (using thermal-sensor as
> per your suggestion). Because it makes clear where the sensor is in the
> case of using several sensors in one zone. Just like in the example I
> already gave:

Is there a way this can be useful at run-time, or could this just be
achieved with comments in the dt?

> > +cpu-thermal: cpu-thermal {
> > +	polling-delay-passive = <250>; /* milliseconds */
> > +	polling-delay = <1000>; /* milliseconds */
> > +
> > +		/* sensor       ID */
> > +     thermal-sensors = <&bandgap0	0>,
> > +		  	  <&adc		0>;
> > +	thermal-sensors-names = "cpu", "pcb north";
> > +
> > +		/* hotspot = 100 * bandgap - 120 * adc + 484 */
> > +	coefficients = 		<100	-120	484>;
> > +
> > +        trips {
> > +		...
> > +        };
> > +
> > +	cooling-attachments {
> > +		...
> > +	};
> > +};

Cheers,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
hongbo.zhang@freescale.com Oct. 8, 2013, 8:45 a.m. UTC | #11
On 09/25/2013 10:15 PM, Eduardo Valentin wrote:
> On 25-09-2013 03:13, Hongbo Zhang wrote:
>> On 09/19/2013 05:35 AM, Eduardo Valentin wrote:
>>>    [...]
>>> +
>>> +/***   sensor API   ***/
>>> +
>> You are introducing new concept here, the original framework and drivers
>> cannot use this, right? any further plan to update original framework
>> for this new feature?
>>
> Well, not new as such. Just a specific way to register sensors to
> thermal framework. What is really new is the fact that we really need to
> have sensors decoupled from thermal zone devices, and today we have
> these concepts pretty merged together.
> To answer your question, for now I am more concerned with the bindings
> definition. Once that is at least agreed, then we can follow up with the
> migration of existing drivers. For now, there are two examples in this
> series, first one is using one existing thermal driver, which is the TI
> SoC thermal driver, and the second one is the hwmon drivers, which are
> existing sensor drivers, but are not thermal drivers.
>
> The plan forward, once this series is accepted is to migrate existing
> drivers, yes, so that they can use device tree uniformly. Of course,
> this needs help from driver authors.
>
> My proposal will be to follow up this series with a two fold migration.
> First step to change the existing thermal drivers to have both, the
> current support and the device tree support. And second step, for those
> who wish to, we could remove the old code containing thermal data and
> have only dt support. Of course, this requires drivers authors input.
>
>
>
>>> +static struct thermal_zone_device *
>>> +thermal_zone_of_add_sensor(struct device_node *zone,
>>> +               struct device_node *sensor, void *data,
>>> +               int (*get_temp)(void *, long *),
>>> +               int (*get_trend)(void *, long *))
>>> +{
>>> +    struct thermal_zone_device *tzd;
>>> +    struct __thermal_zone *tz;
>>> +
>>> +    tzd = thermal_zone_get_zone_by_name(zone->name);
>>> +    if (IS_ERR(tzd))
>>> +        return ERR_PTR(-EPROBE_DEFER);
>>> +
>>> [...]
>>> +
>>> +/*
>>> + * Here are the thermal trip types. This must
>>> + * match with enum thermal_trip_type at
>>> + * include/linux/thermal.h
>>> + */
>>> +#define THERMAL_TRIP_ACTIVE        0
>>> +#define THERMAL_TRIP_PASSIVE        1
>>> +#define THERMAL_TRIP_HOT        2
>>> +#define THERMAL_TRIP_CRITICAL        3
>>> +
>> These macros seem duplicated with enum thermal_trip_type in thermal.h,
>> do you have further plan to merge them?
>> Or by using string "active",  "passive" etc in the dts, then you can
>> reuse the original enum definition.
> I am changing this so that in DT we have string constants, and we keep a
> map from string to enum, just like we have for phy-mode, as suggested by
> Mark.
>
> You can have a taste of it here:
> https://git.kernel.org/cgit/linux/kernel/git/evalenti/linux.git/commit/?h=thermal_work/thermal_core/dt_parser_rfc_v4&id=73f16c27fc763495188fba7d6e17b9c986efc6ac
>
> I will be reposting this version once we are done with this thread
> discussion and I am finished with my current test.
>
> If you have the time, I would appreciate if you could try the series on
> your board, as I am don't have access to your hardware. It would be
> really nice to see how this work is behaving in other environments then
> the one I have.

We have thermal management plan in next year, currently I don't have 
proper board to test this.
I would like to do it when I have time and board, but I will track these 
thermal treads.
>
> Thanks for your interest in this work.
>
>>> +/* On cooling devices upper and lower limits */
>>> +#define THERMAL_NO_LIMIT        (-1UL)
>>> +
>>> +#endif
>>> [...]
>>
>>
>>
>



--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Oct. 8, 2013, 2:25 p.m. UTC | #12
On 08-10-2013 04:45, Hongbo Zhang wrote:
> On 09/25/2013 10:15 PM, Eduardo Valentin wrote:
>> On 25-09-2013 03:13, Hongbo Zhang wrote:
>>> On 09/19/2013 05:35 AM, Eduardo Valentin wrote:
>>>>    [...]
>>>> +
>>>> +/***   sensor API   ***/
>>>> +
>>> You are introducing new concept here, the original framework and drivers
>>> cannot use this, right? any further plan to update original framework
>>> for this new feature?
>>>
>> Well, not new as such. Just a specific way to register sensors to
>> thermal framework. What is really new is the fact that we really need to
>> have sensors decoupled from thermal zone devices, and today we have
>> these concepts pretty merged together.
>> To answer your question, for now I am more concerned with the bindings
>> definition. Once that is at least agreed, then we can follow up with the
>> migration of existing drivers. For now, there are two examples in this
>> series, first one is using one existing thermal driver, which is the TI
>> SoC thermal driver, and the second one is the hwmon drivers, which are
>> existing sensor drivers, but are not thermal drivers.
>>
>> The plan forward, once this series is accepted is to migrate existing
>> drivers, yes, so that they can use device tree uniformly. Of course,
>> this needs help from driver authors.
>>
>> My proposal will be to follow up this series with a two fold migration.
>> First step to change the existing thermal drivers to have both, the
>> current support and the device tree support. And second step, for those
>> who wish to, we could remove the old code containing thermal data and
>> have only dt support. Of course, this requires drivers authors input.
>>
>>
>>
>>>> +static struct thermal_zone_device *
>>>> +thermal_zone_of_add_sensor(struct device_node *zone,
>>>> +               struct device_node *sensor, void *data,
>>>> +               int (*get_temp)(void *, long *),
>>>> +               int (*get_trend)(void *, long *))
>>>> +{
>>>> +    struct thermal_zone_device *tzd;
>>>> +    struct __thermal_zone *tz;
>>>> +
>>>> +    tzd = thermal_zone_get_zone_by_name(zone->name);
>>>> +    if (IS_ERR(tzd))
>>>> +        return ERR_PTR(-EPROBE_DEFER);
>>>> +
>>>> [...]
>>>> +
>>>> +/*
>>>> + * Here are the thermal trip types. This must
>>>> + * match with enum thermal_trip_type at
>>>> + * include/linux/thermal.h
>>>> + */
>>>> +#define THERMAL_TRIP_ACTIVE        0
>>>> +#define THERMAL_TRIP_PASSIVE        1
>>>> +#define THERMAL_TRIP_HOT        2
>>>> +#define THERMAL_TRIP_CRITICAL        3
>>>> +
>>> These macros seem duplicated with enum thermal_trip_type in thermal.h,
>>> do you have further plan to merge them?
>>> Or by using string "active",  "passive" etc in the dts, then you can
>>> reuse the original enum definition.
>> I am changing this so that in DT we have string constants, and we keep a
>> map from string to enum, just like we have for phy-mode, as suggested by
>> Mark.
>>
>> You can have a taste of it here:
>> https://git.kernel.org/cgit/linux/kernel/git/evalenti/linux.git/commit/?h=thermal_work/thermal_core/dt_parser_rfc_v4&id=73f16c27fc763495188fba7d6e17b9c986efc6ac
>>
>>
>> I will be reposting this version once we are done with this thread
>> discussion and I am finished with my current test.
>>
>> If you have the time, I would appreciate if you could try the series on
>> your board, as I am don't have access to your hardware. It would be
>> really nice to see how this work is behaving in other environments then
>> the one I have.
> 
> We have thermal management plan in next year, currently I don't have
> proper board to test this.
> I would like to do it when I have time and board, but I will track these
> thermal treads.

OK Zhang, no worries then.

>>
>> Thanks for your interest in this work.
>>
>>>> +/* On cooling devices upper and lower limits */
>>>> +#define THERMAL_NO_LIMIT        (-1UL)
>>>> +
>>>> +#endif
>>>> [...]
>>>
>>>
>>>
>>
> 
> 
> 
> 
>
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
new file mode 100644
index 0000000..6664533
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/thermal.txt
@@ -0,0 +1,498 @@ 
+* Thermal Framework Device Tree descriptor
+
+Generic binding to provide a way of defining hardware thermal
+structure using device tree. A thermal structure includes thermal
+zones and their components, such as trip points, polling intervals,
+sensors and cooling devices binding descriptors.
+
+The target of device tree thermal descriptors is to describe only
+the hardware thermal aspects, not how the system must control or which
+algorithm or policy must be taken in place.
+
+There are five types of nodes involved to describe thermal bindings:
+- sensors: used to describe the device source of temperature sensing;
+- cooling devices: used to describe devices source of power dissipation control;
+- trip points: used to describe points in temperature domain defined to
+make the system aware of hardware limits;
+- cooling attachments: used to describe links between trip points and
+cooling devices;
+- thermal zones: used to describe thermal data within the hardware;
+
+It follows a description of each type of these device tree nodes.
+
+* Sensor devices
+
+Sensor devices are nodes providing temperature sensing capabilities on thermal
+zones. Typical devices are I2C ADC converters and bandgaps. Theses are nodes
+providing temperature data to thermal zones. Temperature sensor devices may
+control one or more internal sensors.
+
+Required property:
+- #sensor-cells:	Used to provide sensor device specific information
+			while referring to it. Must be at least 1, in order
+			to identify uniquely the sensor instances within
+			the IC. See thermal zone binding for more details
+			on how consumers refer to sensor devices.
+
+* Cooling device nodes
+
+Cooling devices are nodes providing control on power dissipation. There
+are essentially two ways to provide control on power dissipation. First
+is by means of regulating device performance, which is known as passive
+cooling. Second is by means of activating devices in order to remove
+the dissipated heat, which is known as active cooling, e.g. regulating
+fan speeds. In both cases, cooling devices shall have a way to determine
+the level of cooling.
+
+Required property:
+- cooling-min-level:	A unsigned integer indicating the smallest
+			cooling level accepted. Typically 0.
+- cooling-max-level:	An unsigned integer indicating the largest
+			cooling level accepted.
+- #cooling-cells:	Used to provide cooling device specific information
+			while referring to it. Must be at least 2, in order
+			to specify minimum and maximum cooling level used
+			in the reference. See Cooling device attachments section
+			below for more details on how consumers refer to
+			cooling devices.
+
+* Trip points
+
+The trip node is a node to describe a point in the temperature domain
+in which the system takes an action. This node describes just the point,
+not the action.
+
+Required properties:
+- temperature:		the trip temperature level, in milliCelsius.
+- hysteresis:		a (low) hysteresis value on 'temperature'. This is a
+			relative value, in milliCelsius.
+- type:			the trip type. Here is the type mapping:
+	THERMAL_TRIP_ACTIVE	0:	A trip point to enable active cooling
+	THERMAL_TRIP_PASSIVE	1:	A trip point to enable passive cooling
+	THERMAL_TRIP_HOT	2:	A trip point to notify emergency
+	THERMAL_TRIP_CRITICAL	3:	Hardware not reliable.
+
+Refer to include/dt-bindings/thermal/thermal.h for definition of these consts.
+
+* Cooling device attachments
+
+The cooling device attachments node is a node to describe how cooling devices
+get assigned to trip points of the zone. The cooling devices are expected
+to be loaded in the target system.
+
+Required properties:
+- cooling-device:	A phandle of a cooling device with its parameters,
+			referring to which cooling device is used in this
+			binding. The required parameters are: the minimum
+			cooling level and the maximum cooling level used
+			in this attach.
+- trip:			A phandle of a trip point node within the same thermal
+			zone.
+
+Optional property:
+- contribution:		The cooling contribution to the thermal zone of the
+			referred cooling device at the referred trip point.
+			The contribution is a value from 0 to 100. The sum
+			of all cooling contributions within a thermal zone
+			must never exceed 100.
+
+Note: Using the THERMAL_NO_LIMIT (-1L) constant in the cooling-device phandle
+limit parameters means:
+(i)   - minimum level allowed for minimum cooling level used in the reference.
+(ii)  - maximum level allowed for maximum cooling level used in the reference.
+Refer to include/dt-bindings/thermal/thermal.h for definition of this constant.
+
+* Thermal zones
+
+The thermal-zone node is the node containing all the required info
+for describing a thermal zone, including its cdev bindings. The thermal_zone
+node must contain, apart from its own properties, one node containing
+trip nodes and one node containing all the zone cooling attachments.
+
+Required properties:
+- passive-delay:	The maximum number of milliseconds to wait between polls
+			when performing passive cooling.
+- polling-delay:	The maximum number of milliseconds to wait between polls
+			when checking this thermal zone.
+- sensors:		A list of sensor phandles and their parameters. The
+			required parameter is the sensor id, in order to
+			identify internal sensors when the sensor IC features
+			several sensing units.
+- trips:		A sub-node containing several trip point nodes required
+			to describe the thermal zone.
+- cooling-attachments	A sub-node containing several cooling device attaches
+			nodes, used to describe the relation between trips
+			and cooling devices.
+
+Optional property:
+- coefficients:		An array of integers (one signed cell) containing
+			coefficients to compose a linear relation between
+			the sensors described in the sensors property.
+			Coefficients defaults to 1, in case this property
+			is not specified. A simple linear polynomial is used:
+			Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1) + cn.
+
+			The coefficients are ordered and they match with sensors
+			by means of sensor ID. Additional coefficients are
+			interpreted as constant offsets.
+
+Note: The delay properties are bound to the maximum dT/dt (temperature
+derivative over time) in two situations for a thermal zone:
+(i)  - when active cooling is activated (passive-delay); and
+(ii) - when the zone just needs to be monitored (polling-delay).
+The maximum dT/dt is highly bound to hardware power consumption and dissipation
+capability.
+
+* Examples
+
+Below are several examples on how to use thermal data descriptors
+using device tree bindings:
+
+(a) - CPU thermal zone
+
+The CPU thermal zone example below describes how to setup one thermal zone
+using one single sensor as temperature source and many cooling devices and
+power dissipation control sources.
+
+#include <dt-bindings/thermal/thermal.h>
+
+cpus {
+	cpu0: cpu@0 {
+		...
+		cooling-min-level = <0>;
+		cooling-max-level = <3>;
+		#cooling-cells = <2>; /* min followed by max */
+	};
+	...
+};
+
+&i2c1 {
+	...
+	fan0: fan@0x48 {
+		...
+		cooling-min-level = <0>;
+		cooling-max-level = <9>;
+		#cooling-cells = <2>; /* min followed by max */
+	};
+};
+
+bandgap0: bandgap@0x0000ED00 {
+	...
+	#sensor-cells = <1>;
+};
+
+cpu-thermal: cpu-thermal {
+	passive-delay = <250>; /* milliseconds */
+	polling-delay = <1000>; /* milliseconds */
+
+		/* sensor       ID */
+        sensors = <&bandgap0     0>;
+
+        trips {
+                cpu-alert0: cpu-alert {
+                        temperature = <90000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_ACTIVE>;
+                };
+                cpu-alert1: cpu-alert {
+                        temperature = <100000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_PASSIVE>;
+                };
+                cpu-crit: cpu-crit {
+                        temperature = <125000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_CRITICAL>;
+                };
+        };
+
+	cooling-attachments {
+		attach0 {
+			trip = <&cpu-alert0>;
+			cooling-device = <&fan0 THERMAL_NO_LIMITS 4>;
+		};
+		attach1 {
+			trip = <&cpu-alert1>;
+			cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
+		};
+		attach2 {
+			trip = <&cpu-alert1>;
+			cooling-device =
+				<&cpu0 THERMAL_NO_LIMITS THERMAL_NO_LIMITS>;
+		};
+	};
+};
+
+In the example above, the ADC sensor at address 0x0000ED00 is used to monitor
+the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan device controlled
+via I2C bus 1, at adress 0x48, is used to remove the heat out of the thermal
+zone 'cpu-thermal' using its cooling levels from its minimum to 4, when it
+reaches trip point 'cpu-alert0' at 90C, as an example of active cooling. The
+same cooling device is used at 'cpu-alert1', but from 5 to its maximum level.
+The cpu@0 device is also linked to the same thermal zone, 'cpu-thermal', as a
+passive cooling device, using all its cooling levels at trip point 'cpu-alert1',
+which is a trip point at 100C.
+
+(b) - IC with several internal sensors
+
+The example below describes how to deploy several thermal zones based off a
+single sensor IC, assuming it has several internal sensors. This is a common
+case on SoC designs with several internal IPs that may need different thermal
+requirements, and thus may have their own sensor to monitor or detect internal
+hotspots in their silicon.
+
+#include <dt-bindings/thermal/thermal.h>
+
+bandgap0: bandgap@0x0000ED00 {
+	...
+	#sensor-cells = <1>;
+};
+
+cpu-thermal: cpu-thermal {
+	passive-delay = <250>; /* milliseconds */
+	polling-delay = <1000>; /* milliseconds */
+
+		/* sensor       ID */
+        sensors = <&bandgap0     0>;
+
+        trips {
+		/* each zone within the SoC may have its own trips */
+                cpu-alert: cpu-alert {
+                        temperature = <100000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_PASSIVE>;
+                };
+                cpu-crit: cpu-crit {
+                        temperature = <125000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_CRITICAL>;
+                };
+        };
+
+	cooling-attachments {
+		/* each zone within the SoC may have its own cooling */
+		...
+	};
+};
+
+gpu-thermal: gpu-thermal {
+	passive-delay = <120>; /* milliseconds */
+	polling-delay = <1000>; /* milliseconds */
+
+		/* sensor       ID */
+        sensors = <&bandgap0     1>;
+
+        trips {
+		/* each zone within the SoC may have its own trips */
+                gpu-alert: gpu-alert {
+                        temperature = <90000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_PASSIVE>;
+                };
+                gpu-crit: gpu-crit {
+                        temperature = <105000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_CRITICAL>;
+                };
+        };
+
+	cooling-attachments {
+		/* each zone within the SoC may have its own cooling */
+		...
+	};
+};
+
+dsp-thermal: dsp-thermal {
+	passive-delay = <50>; /* milliseconds */
+	polling-delay = <1000>; /* milliseconds */
+
+		/* sensor       ID */
+        sensors = <&bandgap0     2>;
+
+        trips {
+		/* each zone within the SoC may have its own trips */
+                dsp-alert: gpu-alert {
+                        temperature = <90000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_PASSIVE>;
+                };
+                dsp-crit: gpu-crit {
+                        temperature = <135000>; /* milliCelsius */
+                        hysteresis = <2000>; /* milliCelsius */
+                        type = <THERMAL_TRIP_CRITICAL>;
+                };
+        };
+
+	cooling-attachments {
+		/* each zone within the SoC may have its own cooling */
+		...
+	};
+};
+
+In the example above there is one bandgap IC which has the capability to
+monitor three sensors. The hardware has been designed so that sensors are
+placed on different places in the DIE to monitor different temperature
+hotspots: one for CPU thermal zone, one for GPU thermal zone and the
+other to monitor a DSP thermal zone.
+
+Thus, there is a need to assign each sensor provided by the bandgap IC
+to different thermal zones. This is achieved by means of using the
+#sensor-cells property and using the first parameter as sensor ID.
+In the example, then, bandgap.sensor0 is used to monitor CPU thermal zone,
+bandgap.sensor1 is used to monitor GPU thermal zone and bandgap.sensor2
+is used to monitor DSP thermal zone. Each zone may be uncorrelated,
+having its own dT/dt requirements, trips and cooling attachments.
+
+
+(c) - Several sensors within one single thermal zone
+
+The example below illustrates how to use more than one sensor within
+one thermal zone.
+
+#include <dt-bindings/thermal/thermal.h>
+
+&i2c1 {
+	...
+	adc: sensor@0x49 {
+		...
+		#sensor-cells = <1>;
+	};
+};
+
+bandgap0: bandgap@0x0000ED00 {
+	...
+	#sensor-cells = <1>;
+};
+
+cpu-thermal: cpu-thermal {
+	passive-delay = <250>; /* milliseconds */
+	polling-delay = <1000>; /* milliseconds */
+
+		/* sensor       ID */
+        sensors = <&bandgap0	0>,
+		  <&adc		0>;
+
+		/* hotspot = 100 * bandgap - 120 * adc + 484 */
+	coefficients = 		<100	-120	484>;
+
+        trips {
+		...
+        };
+
+	cooling-attachments {
+		...
+	};
+};
+
+In some cases, there is a need to use more than one sensor to extrapolate
+a thermal hotspot in the silicon. The above example illustrate this situation.
+For instance, it may be the case that a sensor external to CPU IP may be place
+close to CPU hotspot and together with internal CPU sensor, it is used
+to determine the hotspot. The hyppotetical extrapolation rule would be:
+		hotspot = 100 * bandgap - 120 * adc + 484
+
+The same idea can be used to add fixed offset:
+	passive-delay = <1000>; /* milliseconds */
+	polling-delay = <2500>; /* milliseconds */
+		hotspot = 1 * adc + 6000
+
+In the above equation, the hotspot is always 6C higher than what is read
+from the sensor ADC. The binding would be then:
+		/* sensor       ID */
+        sensors =  <&adc	0>;
+
+		/* hotspot = 1 * adc + 6000 */
+	coefficients = 		<1	6000>;
+
+(d) - Board thermal
+
+The board thermal example below illustrates how to setup one thermal zone
+with many sensors and many cooling devices.
+
+#include <dt-bindings/thermal/thermal.h>
+
+&i2c1 {
+	...
+	adc-dummy: sensor@0x50 {
+		...
+		#sensor-cells = <1>; /* sensor internal ID */
+	};
+};
+
+batt-thermal {
+	passive-delay = <500>; /* milliseconds */
+	polling-delay = <2500>; /* milliseconds */
+
+	/* sensor       ID */
+	sensors = <&adc-dummy     4>;
+
+	trips {
+		...
+	};
+
+	cooling-attachments {
+		...
+	};
+};
+
+board-thermal: board-thermal {
+	passive-delay = <1000>; /* milliseconds */
+	polling-delay = <2500>; /* milliseconds */
+
+		/* sensor       ID */
+	sensors = <&adc-dummy     0>,
+		  <&adc-dummy     1>,
+		  <&adc-dymmy     2>;
+			/*
+			 * An array of coefficients describing the sensor
+			 * linear relation. E.g.:
+			 * z = c1*x1 + c2*x2 + c3*x3
+			 */
+	coefficients =		<1200	-345	890>;
+
+	trips {
+		/* Trips are based on resulting linear equation */
+		cpu-trip: cpu-trip {
+			temperature = <60000>; /* milliCelsius */
+			hysteresis = <2000>; /* milliCelsius */
+			type = <THERMAL_TRIP_PASSIVE>;
+		};
+		gpu-trip: gpu-trip {
+			temperature = <55000>; /* milliCelsius */
+			hysteresis = <2000>; /* milliCelsius */
+			type = <THERMAL_TRIP_PASSIVE>;
+		}
+		lcd-trip: lcp-trip {
+			temperature = <53000>; /* milliCelsius */
+			hysteresis = <2000>; /* milliCelsius */
+			type = <THERMAL_TRIP_PASSIVE>;
+		};
+		crit-trip: crit-trip {
+			temperature = <68000>; /* milliCelsius */
+			hysteresis = <2000>; /* milliCelsius */
+			type = <THERMAL_TRIP_CRITICAL>;
+		};
+	};
+
+	cooling-attachments {
+		attach0 {
+			trip = <&cpu-trip>;
+			cooling-device = <&cpu0 0 2>;
+			contribution = <55>;
+		};
+		attach1 {
+			trip = <&gpu-trip>;
+			cooling-device = <&gpu0 0 2>;
+			contribution = <20>;
+		};
+		attach2 {
+			trip = <&lcd-trip>;
+			cooling-device = <&lcd0 5 10>;
+			contribution = <15>;
+		};
+	};
+};
+
+The above example is a mix of previous examples, a sensor IP with several internal
+sensors used to monitor different zones, one of them is composed by several sensors and
+with different cooling devices.
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index dbfc390..dd81eb8 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -29,6 +29,19 @@  config THERMAL_HWMON
 	  Say 'Y' here if you want all thermal sensors to
 	  have hwmon sysfs interface too.
 
+config THERMAL_OF
+	bool
+	prompt "APIs to parse thermal data out of device tree"
+	depends on OF
+	default y
+	help
+	  This options provides helpers to add the support to
+	  read and parse thermal data definitions out of the
+	  device tree blob.
+
+	  Say 'Y' here if you need to build thermal infrastructure
+	  based on device tree.
+
 choice
 	prompt "Default Thermal governor"
 	default THERMAL_DEFAULT_GOV_STEP_WISE
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 584b363..4b03956 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -7,6 +7,7 @@  thermal_sys-y			+= thermal_core.o
 
 # interface to/from other layers providing sensors
 thermal_sys-$(CONFIG_THERMAL_HWMON)		+= thermal_hwmon.o
+thermal_sys-$(CONFIG_THERMAL_OF)		+= of-thermal.o
 
 # governors
 thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE)	+= fair_share.o
diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
new file mode 100644
index 0000000..a32f393
--- /dev/null
+++ b/drivers/thermal/of-thermal.c
@@ -0,0 +1,775 @@ 
+/*
+ *  of-thermal.c - Generic Thermal Management device tree support.
+ *
+ *  Copyright (C) 2013 Texas Instruments
+ *  Copyright (C) 2013 Eduardo Valentin <eduardo.valentin@ti.com>
+ *
+ *
+ *  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, write to the Free Software Foundation, Inc.,
+ *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+#include <linux/thermal.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/of_device.h>
+#include <linux/of_platform.h>
+#include <linux/err.h>
+#include <linux/export.h>
+#include <linux/string.h>
+
+#include "thermal_core.h"
+
+/***   Private data structures to represent thermal device tree data ***/
+
+/**
+ * struct __thermal_trip - representation of a point in temperature domain
+ * @np: pointer to struct device_node that this trip point was created from
+ * @temperature: temperature value in miliCelsius
+ * @hysteresis: relative hysteresis in miliCelsius
+ * @type: trip point type
+ */
+
+struct __thermal_trip {
+	struct device_node *np;
+	unsigned long int temperature;
+	unsigned long int hysteresis;
+	enum thermal_trip_type type;
+};
+
+/**
+ * struct __thermal_bind_param - a match between trip and cooling device
+ * @cooling_device: a pointer to identify the referred cooling device
+ * @trip_id: the trip point index
+ * @usage: the percentage (from 0 to 100) of cooling contribution
+ * @min: minimum cooling level used at this trip point
+ * @max: maximum cooling level used at this trip point
+ */
+
+struct __thermal_bind_params {
+	struct device_node *cooling_device;
+	unsigned int trip_id;
+	unsigned int usage;
+	unsigned long min;
+	unsigned long max;
+};
+
+/**
+ * struct __thermal_zone - internal representation of a thermal zone
+ * @mode: current thermal zone device mode (enabled/disabled)
+ * @passive_delay: polling interval while passive cooling is activated
+ * @polling_delay: zone polling interval
+ * @ntrips: number of trip points
+ * @trips: an array of trip points (0..ntrips - 1)
+ * @num_tbps: number of thermal bind params
+ * @tbps: an array of thermal bind params (0..num_tbps - 1)
+ * @sensor_data: sensor private data used while reading temperature and trend
+ * @get_temp: sensor callback to read temperature
+ * @get_trend: sensor callback to read temperature trend
+ */
+
+struct __thermal_zone {
+	enum thermal_device_mode mode;
+	int passive_delay;
+	int polling_delay;
+
+	/* trip data */
+	int ntrips;
+	struct __thermal_trip *trips;
+
+	/* cooling binding data */
+	int num_tbps;
+	struct __thermal_bind_params *tbps;
+
+	/* sensor interface */
+	void *sensor_data;
+	int (*get_temp)(void *, long *);
+	int (*get_trend)(void *, long *);
+};
+
+/***   DT thermal zone device callbacks   ***/
+
+static int of_thermal_get_temp(struct thermal_zone_device *tz,
+			       unsigned long *temp)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	if (!data->get_temp)
+		return -EINVAL;
+
+	return data->get_temp(data->sensor_data, temp);
+}
+
+static int of_thermal_get_trend(struct thermal_zone_device *tz, int trip,
+				enum thermal_trend *trend)
+{
+	struct __thermal_zone *data = tz->devdata;
+	long dev_trend;
+	int r;
+
+	if (!data->get_trend)
+		return -EINVAL;
+
+	r = data->get_trend(data->sensor_data, &dev_trend);
+	if (r)
+		return r;
+
+	/* TODO: These intervals might have some thresholds, but in core code */
+	if (dev_trend > 0)
+		*trend = THERMAL_TREND_RAISING;
+	else if (dev_trend < 0)
+		*trend = THERMAL_TREND_DROPPING;
+	else
+		*trend = THERMAL_TREND_STABLE;
+
+	return 0;
+}
+
+static int of_thermal_bind(struct thermal_zone_device *thermal,
+			   struct thermal_cooling_device *cdev)
+{
+	struct __thermal_zone *data = thermal->devdata;
+	int i;
+
+	if (!data || IS_ERR(data))
+		return -ENODEV;
+
+	/* find where to bind */
+	for (i = 0; i < data->num_tbps; i++) {
+		struct __thermal_bind_params *tbp = data->tbps + i;
+
+		if (tbp->cooling_device == cdev->np) {
+			int ret;
+
+			ret = thermal_zone_bind_cooling_device(thermal,
+						tbp->trip_id, cdev,
+						tbp->min,
+						tbp->max);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int of_thermal_unbind(struct thermal_zone_device *thermal,
+			     struct thermal_cooling_device *cdev)
+{
+	struct __thermal_zone *data = thermal->devdata;
+	int i;
+
+	if (!data || IS_ERR(data))
+		return -ENODEV;
+
+	/* find where to unbind */
+	for (i = 0; i < data->num_tbps; i++) {
+		struct __thermal_bind_params *tbp = data->tbps + i;
+
+		if (tbp->cooling_device == cdev->np) {
+			int ret;
+
+			ret = thermal_zone_unbind_cooling_device(thermal,
+						tbp->trip_id, cdev);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int of_thermal_get_mode(struct thermal_zone_device *tz,
+			       enum thermal_device_mode *mode)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	*mode = data->mode;
+
+	return 0;
+}
+
+static int of_thermal_set_mode(struct thermal_zone_device *tz,
+			       enum thermal_device_mode mode)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	mutex_lock(&tz->lock);
+
+	if (mode == THERMAL_DEVICE_ENABLED)
+		tz->polling_delay = data->polling_delay;
+	else
+		tz->polling_delay = 0;
+
+	mutex_unlock(&tz->lock);
+
+	data->mode = mode;
+	thermal_zone_device_update(tz);
+
+	return 0;
+}
+
+static int of_thermal_get_trip_type(struct thermal_zone_device *tz, int trip,
+				    enum thermal_trip_type *type)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	*type = data->trips[trip].type;
+
+	return 0;
+}
+
+static int of_thermal_get_trip_temp(struct thermal_zone_device *tz, int trip,
+				    unsigned long *temp)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	*temp = data->trips[trip].temperature;
+
+	return 0;
+}
+
+static int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
+				    unsigned long temp)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	/* thermal fw should take care of data->mask & (1 << trip) */
+	data->trips[trip].temperature = temp;
+
+	return 0;
+}
+
+static int of_thermal_get_trip_hyst(struct thermal_zone_device *tz, int trip,
+				    unsigned long *hyst)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	*hyst = data->trips[trip].hysteresis;
+
+	return 0;
+}
+
+static int of_thermal_set_trip_hyst(struct thermal_zone_device *tz, int trip,
+				    unsigned long hyst)
+{
+	struct __thermal_zone *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	/* thermal fw should take care of data->mask & (1 << trip) */
+	data->trips[trip].hysteresis = hyst;
+
+	return 0;
+}
+
+static int of_thermal_get_crit_temp(struct thermal_zone_device *tz,
+				    unsigned long *temp)
+{
+	struct __thermal_zone *data = tz->devdata;
+	int i;
+
+	for (i = 0; i < data->ntrips; i++)
+		if (data->trips[i].type == THERMAL_TRIP_CRITICAL) {
+			*temp = data->trips[i].temperature;
+			return 0;
+		}
+
+	return -EINVAL;
+}
+
+static struct thermal_zone_device_ops of_thermal_ops = {
+	.get_mode = of_thermal_get_mode,
+	.set_mode = of_thermal_set_mode,
+
+	.get_trip_type = of_thermal_get_trip_type,
+	.get_trip_temp = of_thermal_get_trip_temp,
+	.set_trip_temp = of_thermal_set_trip_temp,
+	.get_trip_hyst = of_thermal_get_trip_hyst,
+	.set_trip_hyst = of_thermal_set_trip_hyst,
+	.get_crit_temp = of_thermal_get_crit_temp,
+
+	.bind = of_thermal_bind,
+	.unbind = of_thermal_unbind,
+};
+
+/***   sensor API   ***/
+
+static struct thermal_zone_device *
+thermal_zone_of_add_sensor(struct device_node *zone,
+			   struct device_node *sensor, void *data,
+			   int (*get_temp)(void *, long *),
+			   int (*get_trend)(void *, long *))
+{
+	struct thermal_zone_device *tzd;
+	struct __thermal_zone *tz;
+
+	tzd = thermal_zone_get_zone_by_name(zone->name);
+	if (IS_ERR(tzd))
+		return ERR_PTR(-EPROBE_DEFER);
+
+	tz = tzd->devdata;
+
+	mutex_lock(&tzd->lock);
+	tz->get_temp = get_temp;
+	tz->get_trend = get_trend;
+	tz->sensor_data = data;
+
+	tzd->ops->get_temp = of_thermal_get_temp;
+	tzd->ops->get_trend = of_thermal_get_trend;
+	mutex_unlock(&tzd->lock);
+
+	return tzd;
+}
+
+/**
+ * thermal_zone_of_sensor_register - registers a sensor to a DT thermal zone
+ * @dev: a valid struct device pointer of a sensor device. Must contain
+ *       a valid .of_node, for the sensor node.
+ * @sensor_id: a sensor identifier, in case the sensor IP has more
+ *             than one sensors
+ * @data: a private pointer (owned by the caller) that will be passed
+ *        back, when a temperature reading is needed.
+ * @get_temp: a pointer to a function that reads the sensor temperature.
+ * @get_trend: a pointer to a function that reads the sensor temperature trend.
+ *
+ * This function will search the list of thermal zones described in device
+ * tree and look for the zone that refer to the sensor device pointed by
+ * @dev->of_node as temperature providers. For the zone pointing to the
+ * sensor node, the sensor will be added to the DT thermal zone device.
+ *
+ * The thermal zone temperature is provided by the @get_temp function
+ * pointer. When called, it will have the private pointer @data back.
+ *
+ * The thermal zone temperature trend is provided by the @get_trend function
+ * pointer. When called, it will have the private pointer @data back.
+ *
+ * TODO:
+ * 01 - This function must enqueue the new sensor instead of using
+ * it as the only source of temperature values.
+ *
+ * 02 - There must be a way to match the sensor with all thermal zones
+ * that refer to it.
+ *
+ * Return: On success returns a valid struct thermal_zone_device,
+ * otherwise, it returns a corresponding ERR_PTR(). Caller must
+ * check the return value with help of IS_ERR() helper.
+ */
+struct thermal_zone_device *
+thermal_zone_of_sensor_register(struct device *dev, int sensor_id,
+				void *data, int (*get_temp)(void *, long *),
+				int (*get_trend)(void *, long *))
+{
+	struct device_node *np, *child, *sensor_np;
+
+	np = of_find_node_by_name(NULL, "thermal-zones");
+	if (!np)
+		return ERR_PTR(-ENODEV);
+
+	if (!dev || !dev->of_node)
+		return ERR_PTR(-EINVAL);
+
+	sensor_np = dev->of_node;
+
+	for_each_child_of_node(np, child) {
+		struct of_phandle_args sensor_specs;
+		int ret;
+
+		/* For now, thermal framework supports only 1 sensor per zone */
+		ret = of_parse_phandle_with_args(child, "sensors",
+						 "#sensor-cells",
+						 0, &sensor_specs);
+		if (ret)
+			continue;
+
+		if (sensor_specs.args_count < 1)
+			continue;
+
+		if (sensor_specs.np == sensor_np &&
+		    sensor_specs.args[0] == sensor_id) {
+			of_node_put(np);
+			return thermal_zone_of_add_sensor(child, sensor_np,
+							  data,
+							  get_temp,
+							  get_trend);
+		}
+	}
+	of_node_put(np);
+
+	return ERR_PTR(-ENODEV);
+}
+EXPORT_SYMBOL_GPL(thermal_zone_of_sensor_register);
+
+/**
+ * thermal_zone_of_sensor_unregister - unregisters a sensor from a DT thermal zone
+ * @dev: a valid struct device pointer of a sensor device. Must contain
+ *       a valid .of_node, for the sensor node.
+ * @tzd: a pointer to struct thermal_zone_device where the sensor is registered.
+ *
+ * This function removes the sensor callbacks and private data from the
+ * thermal zone device registered with thermal_zone_of_sensor_register()
+ * API. It will also silent the zone by remove the .get_temp() and .get_trend()
+ * thermal zone device callbacks.
+ *
+ * TODO: When the support to several sensors per zone is added, this
+ * function must search the sensor list based on @dev parameter.
+ *
+ */
+void thermal_zone_of_sensor_unregister(struct device *dev,
+				       struct thermal_zone_device *tzd)
+{
+	struct __thermal_zone *tz;
+
+	if (!dev || !tzd || !tzd->devdata)
+		return;
+
+	tz = tzd->devdata;
+
+	/* no __thermal_zone, nothing to be done */
+	if (!tz)
+		return;
+
+	mutex_lock(&tzd->lock);
+	tzd->ops->get_temp = NULL;
+	tzd->ops->get_trend = NULL;
+
+	tz->get_temp = NULL;
+	tz->get_trend = NULL;
+	tz->sensor_data = NULL;
+	mutex_unlock(&tzd->lock);
+}
+EXPORT_SYMBOL_GPL(thermal_zone_of_sensor_unregister);
+
+/***   functions parsing device tree nodes   ***/
+
+/**
+ * thermal_of_populate_bind_params - parse and fill cooling attachment data
+ * @np: DT node containing a cooling-attachment node
+ * @__tbp: data structure to be filled with cooling attachment info
+ * @trips: array of thermal zone trip points
+ * @ntrips: number of trip points inside trips.
+ *
+ * This function parses a cooling-attachment type of node represented by
+ * @np parameter and fills the read data into @__tbp data structure.
+ * It needs the already parsed array of trip points of the thermal zone
+ * in consideration.
+ *
+ * Return: 0 on success, proper error code otherwise
+ */
+static int thermal_of_populate_bind_params(struct device_node *np,
+					   struct __thermal_bind_params *__tbp,
+					   struct __thermal_trip *trips,
+					   int ntrips)
+{
+	struct of_phandle_args cooling_spec;
+	struct device_node *trip;
+	int ret, i;
+	u32 prop;
+
+	/* Default weight. Usage is optional */
+	__tbp->usage = 0;
+	ret = of_property_read_u32(np, "usage", &prop);
+	if (ret == 0)
+		__tbp->usage = prop;
+
+	trip = of_parse_phandle(np, "trip", 0);
+	if (!trip) {
+		pr_err("missing trip property\n");
+		return -ENODEV;
+	}
+
+	/* match using device_node */
+	for (i = 0; i < ntrips; i++)
+		if (trip == trips[i].np) {
+			__tbp->trip_id = i;
+			break;
+		}
+
+	if (i == ntrips) {
+		ret = -ENODEV;
+		goto end;
+	}
+
+	ret = of_parse_phandle_with_args(np, "cooling-device", "#cooling-cells",
+					 0, &cooling_spec);
+	if (ret < 0) {
+		pr_err("missing cooling_device property\n");
+		goto end;
+	}
+	__tbp->cooling_device = cooling_spec.np;
+	if (cooling_spec.args_count >= 2) { /* at least min and max */
+		__tbp->min = cooling_spec.args[0];
+		__tbp->max = cooling_spec.args[1];
+	} else {
+		pr_err("wrong reference to cooling device, missing limits\n");
+	}
+
+end:
+	of_node_put(trip);
+
+	return ret;
+}
+
+/**
+ * thermal_of_populate_trip - parse and fill one trip point data
+ * @np: DT node containing a trip point node
+ * @trip: trip point data structure to be filled up
+ *
+ * This function parses a trip point type of node represented by
+ * @np parameter and fills the read data into @trip data structure.
+ *
+ * Return: 0 on success, proper error code otherwise
+ */
+static int thermal_of_populate_trip(struct device_node *np,
+				    struct __thermal_trip *trip)
+{
+	int prop;
+	int ret;
+
+	ret = of_property_read_u32(np, "temperature", &prop);
+	if (ret < 0) {
+		pr_err("missing temperature property\n");
+		return ret;
+	}
+	trip->temperature = prop;
+
+	ret = of_property_read_u32(np, "hysteresis", &prop);
+	if (ret < 0) {
+		pr_err("missing hysteresis property\n");
+		return ret;
+	}
+	trip->hysteresis = prop;
+
+	ret = of_property_read_u32(np, "type", &prop);
+	if (ret < 0) {
+		pr_err("missing type property\n");
+		return ret;
+	}
+	trip->type = prop;
+
+	/* Required for cooling attachment matching */
+	trip->np = np;
+
+	return 0;
+}
+
+/**
+ * thermal_of_build_thermal_zone - parse and fill one thermal zone data
+ * @np: DT node containing a thermal zone node
+ *
+ * This function parses a thermal zone type of node represented by
+ * @np parameter and fills the read data into a __thermal_zone data structure
+ * and return this pointer.
+ *
+ * Return: On success returns a valid struct __thermal_zone,
+ * otherwise, it returns a corresponding ERR_PTR(). Caller must
+ * check the return value with help of IS_ERR() helper.
+ */
+static struct __thermal_zone *
+thermal_of_build_thermal_zone(struct device_node *np)
+{
+	struct device_node *child, *gchild;
+	struct __thermal_zone *tz;
+	int ret, i;
+	u32 prop;
+
+	if (!np) {
+		pr_err("no thermal zone np\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	tz = kzalloc(sizeof(*tz), GFP_KERNEL);
+	if (!tz)
+		return ERR_PTR(-ENOMEM);
+
+	ret = of_property_read_u32(np, "passive-delay", &prop);
+	if (ret < 0) {
+		pr_err("missing passive_delay property\n");
+		return ERR_PTR(ret);
+	}
+	tz->passive_delay = prop;
+
+	ret = of_property_read_u32(np, "polling-delay", &prop);
+	if (ret < 0) {
+		pr_err("missing polling_delay property\n");
+		return ERR_PTR(ret);
+	}
+	tz->polling_delay = prop;
+
+	/* trips */
+	child = of_get_child_by_name(np, "trips");
+
+	/* No trips provided */
+	if (!child)
+		goto finish;
+
+	tz->ntrips = of_get_child_count(child);
+	tz->trips = kzalloc(tz->ntrips * sizeof(*tz->trips), GFP_KERNEL);
+	if (!tz->trips)
+		return ERR_PTR(-ENOMEM);
+	i = 0;
+	for_each_child_of_node(child, gchild)
+		thermal_of_populate_trip(gchild, &tz->trips[i++]);
+
+	of_node_put(child);
+
+	/* cooling-attachments */
+	child = of_get_child_by_name(np, "cooling-attachments");
+
+	/* cooling-attachments provided */
+	if (!child)
+		goto finish;
+
+	tz->num_tbps = of_get_child_count(child);
+	tz->tbps = kzalloc(tz->num_tbps * sizeof(*tz->tbps), GFP_KERNEL);
+	if (!tz->tbps)
+		return ERR_PTR(-ENOMEM);
+	i = 0;
+	for_each_child_of_node(child, gchild)
+		thermal_of_populate_bind_params(gchild, &tz->tbps[i++],
+						tz->trips, tz->ntrips);
+
+finish:
+	tz->mode = THERMAL_DEVICE_DISABLED;
+
+	return tz;
+}
+
+static inline void of_thermal_free_zone(struct __thermal_zone *tz)
+{
+	kfree(tz->tbps);
+	kfree(tz->trips);
+	kfree(tz);
+}
+
+/**
+ * of_parse_thermal_zones - parse device tree thermal data
+ *
+ * Initialization function that can be called by machine initialization
+ * code to parse thermal data and populate the thermal framework
+ * with hardware thermal zones info. This function only parses thermal zones.
+ * Cooling devices and sensor devices nodes are supposed to be parsed
+ * by their respective drivers.
+ *
+ * Return: 0 on success, proper error code otherwise
+ *
+ */
+int __init of_parse_thermal_zones(void)
+{
+	struct device_node *np, *child;
+	struct __thermal_zone *tz;
+	struct thermal_zone_device_ops *ops;
+
+	np = of_find_node_by_name(NULL, "thermal-zones");
+	if (!np) {
+		pr_debug("unable to find thermal zones\n");
+		return 0; /* Run successfully on systems without thermal DT */
+	}
+
+	for_each_child_of_node(np, child) {
+		struct thermal_zone_device *zone;
+		struct thermal_zone_params *tzp;
+
+		tz = thermal_of_build_thermal_zone(child);
+		if (IS_ERR(tz)) {
+			pr_err("failed to build thermal zone %s: %ld\n",
+			       child->name,
+			       PTR_ERR(tz));
+			continue;
+		}
+
+		ops = kmemdup(&of_thermal_ops, sizeof(*ops), GFP_KERNEL);
+		if (!ops)
+			goto exit_free;
+
+		tzp = kzalloc(sizeof(*tzp), GFP_KERNEL);
+		if (!tzp) {
+			kfree(ops);
+			goto exit_free;
+		}
+
+		/* No hwmon because there might be hwmon drivers registering */
+		tzp->no_hwmon = true;
+
+		zone = thermal_zone_device_register(child->name, tz->ntrips,
+						    0, tz,
+						    ops, tzp,
+						    tz->passive_delay,
+						    tz->polling_delay);
+		if (IS_ERR(zone)) {
+			pr_err("Failed to build %s zone %ld\n", child->name,
+			       PTR_ERR(zone));
+			kfree(tzp);
+			kfree(ops);
+			of_thermal_free_zone(tz);
+			/* attempting to build remaining zones still */
+		}
+	}
+
+	return 0;
+
+exit_free:
+	of_thermal_free_zone(tz);
+
+	/* no memory available, so free what we have built */
+	of_thermal_destroy_zones();
+
+	return -ENOMEM;
+}
+
+/**
+ * of_thermal_destroy_zones - remove all zones parsed and allocated resources
+ *
+ * Finds all zones parsed and added to the thermal framework and remove them
+ * from the system, together with their resources.
+ *
+ */
+void __exit of_thermal_destroy_zones(void)
+{
+	struct device_node *np, *child;
+
+	np = of_find_node_by_name(NULL, "thermal-zones");
+	if (!np) {
+		pr_err("unable to find thermal zones\n");
+		return;
+	}
+
+	for_each_child_of_node(np, child) {
+		struct thermal_zone_device *zone;
+
+		zone = thermal_zone_get_zone_by_name(child->name);
+		if (IS_ERR(zone))
+			continue;
+
+		thermal_zone_device_unregister(zone);
+		kfree(zone->tzp);
+		kfree(zone->ops);
+		of_thermal_free_zone(zone->devdata);
+	}
+}
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 8a94300..a733241 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1371,7 +1371,7 @@  static void remove_trip_attrs(struct thermal_zone_device *tz)
  */
 struct thermal_zone_device *thermal_zone_device_register(const char *type,
 	int trips, int mask, void *devdata,
-	const struct thermal_zone_device_ops *ops,
+	struct thermal_zone_device_ops *ops,
 	const struct thermal_zone_params *tzp,
 	int passive_delay, int polling_delay)
 {
@@ -1751,8 +1751,14 @@  static int __init thermal_init(void)
 	if (result)
 		goto unregister_class;
 
+	result = of_parse_thermal_zones();
+	if (result)
+		goto exit_netlink;
+
 	return 0;
 
+exit_netlink:
+	genetlink_exit();
 unregister_governors:
 	thermal_unregister_governors();
 unregister_class:
@@ -1768,6 +1774,7 @@  error:
 
 static void __exit thermal_exit(void)
 {
+	of_thermal_destroy_zones();
 	genetlink_exit();
 	class_unregister(&thermal_class);
 	thermal_unregister_governors();
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 7cf2f66..3db339f 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -77,4 +77,13 @@  static inline int thermal_gov_user_space_register(void) { return 0; }
 static inline void thermal_gov_user_space_unregister(void) {}
 #endif /* CONFIG_THERMAL_GOV_USER_SPACE */
 
+/* device tree support */
+#ifdef CONFIG_THERMAL_OF
+int of_parse_thermal_zones(void);
+void of_thermal_destroy_zones(void);
+#else
+static inline int of_parse_thermal_zones(void) { return 0; }
+static inline void of_thermal_destroy_zones(void) { }
+#endif
+
 #endif /* __THERMAL_CORE_H__ */
diff --git a/include/dt-bindings/thermal/thermal.h b/include/dt-bindings/thermal/thermal.h
new file mode 100644
index 0000000..6dd6ccd
--- /dev/null
+++ b/include/dt-bindings/thermal/thermal.h
@@ -0,0 +1,27 @@ 
+/*
+ * This header provides constants for most thermal bindings.
+ *
+ * Copyright (C) 2013 Texas Instruments
+ *	Eduardo Valentin <eduardo.valentin@ti.com>
+ *
+ * GPLv2 only
+ */
+
+#ifndef _DT_BINDINGS_THERMAL_THERMAL_H
+#define _DT_BINDINGS_THERMAL_THERMAL_H
+
+/*
+ * Here are the thermal trip types. This must
+ * match with enum thermal_trip_type at
+ * include/linux/thermal.h
+ */
+#define THERMAL_TRIP_ACTIVE		0
+#define THERMAL_TRIP_PASSIVE		1
+#define THERMAL_TRIP_HOT		2
+#define THERMAL_TRIP_CRITICAL		3
+
+/* On cooling devices upper and lower limits */
+#define THERMAL_NO_LIMIT		(-1UL)
+
+#endif
+
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index b268d3c..b780c5b 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -143,6 +143,7 @@  struct thermal_cooling_device {
 	int id;
 	char type[THERMAL_NAME_LENGTH];
 	struct device device;
+	struct device_node *np;
 	void *devdata;
 	const struct thermal_cooling_device_ops *ops;
 	bool updated; /* true if the cooling device does not need update */
@@ -172,7 +173,7 @@  struct thermal_zone_device {
 	int emul_temperature;
 	int passive;
 	unsigned int forced_passive;
-	const struct thermal_zone_device_ops *ops;
+	struct thermal_zone_device_ops *ops;
 	const struct thermal_zone_params *tzp;
 	struct thermal_governor *governor;
 	struct list_head thermal_instances;
@@ -242,8 +243,31 @@  struct thermal_genl_event {
 };
 
 /* Function declarations */
+#ifdef CONFIG_THERMAL_OF
+struct thermal_zone_device *
+thermal_zone_of_sensor_register(struct device *dev, int id,
+				void *data, int (*get_temp)(void *, long *),
+				int (*get_trend)(void *, long *));
+void thermal_zone_of_sensor_unregister(struct device *dev,
+				       struct thermal_zone_device *tz);
+#else
+static inline struct thermal_zone_device *
+thermal_zone_of_sensor_register(struct device *dev, int id,
+				void *data, int (*get_temp)(void *, long *),
+				int (*get_trend)(void *, long *))
+{
+	return NULL;
+}
+
+static inline
+void thermal_zone_of_sensor_unregister(struct device *dev,
+				       struct thermal_zone_device *tz)
+{
+}
+
+#endif
 struct thermal_zone_device *thermal_zone_device_register(const char *, int, int,
-		void *, const struct thermal_zone_device_ops *,
+		void *, struct thermal_zone_device_ops *,
 		const struct thermal_zone_params *, int, int);
 void thermal_zone_device_unregister(struct thermal_zone_device *);