diff mbox

[RFC,02/14] drivers: thermal: introduce device tree parser

Message ID 1377299755-5134-3-git-send-email-eduardo.valentin@ti.com (mailing list archive)
State Superseded, archived
Delegated to: Zhang Rui
Headers show

Commit Message

Eduardo Valentin Aug. 23, 2013, 11:15 p.m. UTC
In order to be able to build thermal policies
based on generic sensors, like I2C device, that
can be places in different points on different boards,
there is a need to have a way to feed board dependent
data into the thermal framework.

This patch introduces a thermal data parser for device
tree. The parsed data is used to build thermal zones
and thermal binding parameters. The output data
can then be used to deploy thermal policies.

This patch adds also documentation regarding this
API and how to define tree nodes to use
this infrastructure.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
---
 .../devicetree/bindings/thermal/thermal.txt        | 160 +++++++
 drivers/thermal/Kconfig                            |  13 +
 drivers/thermal/Makefile                           |   1 +
 drivers/thermal/thermal_dt.c                       | 473 +++++++++++++++++++++
 include/dt-bindings/thermal/thermal.h              |  27 ++
 include/linux/thermal.h                            |   3 +
 6 files changed, 677 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
 create mode 100644 drivers/thermal/thermal_dt.c
 create mode 100644 include/dt-bindings/thermal/thermal.h

Comments

Mark Rutland Aug. 27, 2013, 10:22 a.m. UTC | #1
Hi,

On Sat, Aug 24, 2013 at 12:15:43AM +0100, Eduardo Valentin wrote:
> In order to be able to build thermal policies
> based on generic sensors, like I2C device, that
> can be places in different points on different boards,
> there is a need to have a way to feed board dependent
> data into the thermal framework.
> 
> This patch introduces a thermal data parser for device
> tree. The parsed data is used to build thermal zones
> and thermal binding parameters. The output data
> can then be used to deploy thermal policies.
> 
> This patch adds also documentation regarding this
> API and how to define tree nodes to use
> this infrastructure.
> 
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
> ---
>  .../devicetree/bindings/thermal/thermal.txt        | 160 +++++++
>  drivers/thermal/Kconfig                            |  13 +
>  drivers/thermal/Makefile                           |   1 +
>  drivers/thermal/thermal_dt.c                       | 473 +++++++++++++++++++++
>  include/dt-bindings/thermal/thermal.h              |  27 ++
>  include/linux/thermal.h                            |   3 +
>  6 files changed, 677 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
>  create mode 100644 drivers/thermal/thermal_dt.c
>  create mode 100644 include/dt-bindings/thermal/thermal.h
> 
> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
> new file mode 100644
> index 0000000..af20ab0
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
> @@ -0,0 +1,160 @@
> +----------------------------------------
> +Thermal Framework Device Tree descriptor
> +----------------------------------------
> +
> +This file describes how to define a thermal structure using device tree.
> +A thermal structure includes thermal zones and their components, such
> +as name, trip points, polling intervals and cooling devices binding
> +descriptors. A binding descriptor may contain information on how to
> +react, with a cooling stepped action or a weight on a fair share.
> +The target of device tree thermal descriptors is to describe only
> +the hardware thermal aspects, not how the system must control or which
> +algorithm or policy must take place. It follows a description of each
> +type of device tree node.
> +
> +****
> +trip
> +****
> +
> +The trip node is a node to describe a point in the temperature domain
> +in which the system takes an action. This node describes just the point,
> +not the action.
> +
> +A node describing a trip must contain:
> +- temperature: the trip temperature level, in milliCelsius.
> +- hysteresis: a (low) hysteresis value on 'temperature'. This is a relative
> +value, in milliCelsius.

Presumably this is a single (u32) cell?

> +- type: the trip type. Here is the type mapping:
> +       THERMAL_TRIP_ACTIVE
> +       THERMAL_TRIP_PASSIVE
> +       THERMAL_TRIP_HOT
> +       THERMAL_TRIP_CRITICAL

What type is this property?

What are these values? Do you need to refer to a header somewhere?
Symbolic constants should have an associated value for an ABI.

> +
> +**********
> +bind_param
> +**********
> +
> +The bind_param node is a node to describe how cooling devices get assigned
> +to trip points of the zone. The cooling devices are expected to be loaded
> +in the target system.
> +
> +A node describing a bind_param must contain:
> +- cooling_device: A string with the cooling device name.

Please use '-' rather than '_' in bindings. That's the normal devicetree
convention, and there's no point gonig against it and causing more
confusion.

What are valid values of this string, and what do they mean?

Why not a phandle + cells binding to cooling devices?

> +- weight: The 'influence' of a particular cooling device on this zone.
> +             This is on a percentage scale. The sum of all these weights
> +             (for a particular zone) cannot exceed 100.

This is a bit fuzzy, and certainly needs more guidance.

> +- trip_mask: This is a bit mask that gives the binding relation between
> +               this thermal zone and cdev, for a particular trip point.
> +               If nth bit is set, then the cdev and thermal zone are bound
> +               for trip point n.

What type is this?

The nth bit from which end?

Could you explain what "bound for trip point n" means?

Just because this is a bitmask in Linux does not mean it needs to be so
in the dt.

> +- limits: An array integers consisting of 2-tuples items, and each item means
> +  lower and upper state limits, like <min-state max-state>.
> +
> +**************************
> +temperature sensor devices
> +**************************
> +
> +Temperature sensor devices have to be linked to a specific thermal zone.
> +This is done by means of the 'monitored-zones' list.
> +- monitored-zones: A list of thermal zones phandles, identifying which
> +thermal zones the temperature sensor device is used to monitor.

Why does this need to be described that way around?

Can the zone not have a link to the sensor(s) it needs?

> +
> +************
> +thermal_zone
> +************
> +
> +The thermal_zone node is the node containing all the required info
> +for describing a thermal zone, including its cdev bindings. The thermal_zone
> +node must contain, apart from its own properties, one node containing
> +trip nodes and one node containing all the zone bind parameters.
> +
> +Required properties:
> +- mask: Bit string: If 'n'th bit is set, then trip point 'n' is writeable.

What type is a bit string in devicetree? I'm aware of cells and (byte)
strings, but not bit strings.

What does 'writeable' mean in this context?

There's presumably a better name than 'mask'.

> +
> +- passive_delay: number of milliseconds to wait between polls when
> +  performing passive cooling.
> +
> +- polling_delay: number of milliseconds to wait between polls when checking.

These sound very much like configuration of the driver rather than
hardware description. What if my temperature sensor can generate an
interrupt at some trigger temperature? Why would I need polling times?

Why does this need to be in the dt at all? Surely the OS can choose some
sensible polling period, possibly dynamically?

> +
> +- #thermal-cells: An integer that is used to specify how many cells compose
> +  sensor ids.

I don't see why this is necessary. If the zone itself described its
related sensors rather than the other way around, we wouldn't need it at
all. Most bindings so far have consumers link to their providers rather
than the other way around.

> +
> +Below is an example:
> +cpu_thermal: thermal_zone {
> +            #thermal-cells = <1>;
> +            mask = <0x03>; /* trips writability */
> +            passive_delay = <250>; /* milliseconds */
> +            polling_delay = <1000>; /* milliseconds */
> +            trips {
> +                    alert@100000{
> +                            temperature = <100000>; /* milliCelsius */
> +                            hysteresis = <0>; /* milliCelsius */
> +                            type = <THERMAL_TRIP_PASSIVE>;
> +                    };
> +                    crit@125000{
> +                            temperature = <125000>; /* milliCelsius */
> +                            hysteresis = <0>; /* milliCelsius */
> +                            type = <THERMAL_TRIP_CRITICAL>;
> +                    };
> +            };
> +            bind_params {
> +                    action@0{
> +                            cooling_device = "thermal-cpufreq";

NAK. That value was not defined anywhere, and represents Linux
infrastructure rather than hardware.

For this case, we could just as easily describe the thermal points, and
if no physical cooling device (e.g. a fan) is present, assume
cpufreq-base cooling.  Other OSs could choose to do otherwise, and we
could change later...

> +                            weight = <100>; /* percentage */
> +                            mask = <0x01>;

This should presumably be trip-mask (assuming my suggested corrections).

What is it supposed to mean here?

> +                           limits = <
> +                             /* lower-state            upper-state */
> +                               1                       THERMAL_NO_LIMITS
> +                               THERMAL_NO_LIMITS       THERMAL_NO_LIMITS
> +                           >;
> +                    };
> +            };
> +};
> +
> +sensor: adc@0xFFFF {
> +       ...
> +       monitored-zones = <&cpu_thermal 0>;
> +};
> +
> +cpu@0 {
> +       ...
> +       cooling-zones = <&cpu_thermal>;
> +};
> +
> +In the example above, the ADC sensor at address 0xFFFF is used to monitor
> +the zone 'cpu_thermal' using its the sensor 0. The cpu@0 device is also linked
> +to the same thermal zone 'cpu_thermal' as a cooling device.

Huh? That seems to imply the the '0' in '<&cpu_thermal 0>' on the sensor
node describes a property of the sensor rather than the thermal node,
unless I've misunderstood?

> +
> +***************
> +cpufreq-cooling
> +***************
> +
> +The generic cpu cooling (freq clipping) can be loaded by the
> +generic cpufreq-cpu0 driver in case the device tree node informs
> +this need.
> +
> +In order to load the cpufreq cooling device, it is possible to
> +inform that the cpu needs cooling by adding the list of thermal zones
> +in the 'cooling-zones' property at the cpu0 phandle.

This should be reworded so as to describe the binding rather than the
Linux behaviour based on the binding.

> +
> +Example:
> +       cpus {
> +               cpu@0 {
> +                       operating-points = <
> +                               /* kHz    uV */
> +                               800000  1313000
> +                               1008000 1375000
> +                       >;
> +                       cooling-zones = <&cpu_thermal>;
> +               };
> +       ...
> +       };
> +
> +The above will cause the cpufreq-cpu0 driver to understand that
> +the cpu will need passive cooling and while probing that node it will
> +also load the cpufreq cpu cooling device in that system.

The above describe that the CPU is part of a cooling (thermal?) zone and
requires cooling.

> +
> +However, be advised that this flag will not describe what needs to
> +be done or how the cpufreq cooling device will be used. In order to
> +accomplish this, one needs to write a phandle for a thermal zone, as
> +described in the section 'thermal_zone'.

The above already has a phandle to a thermal (cooling?) zone.

Please choose either cooling zone or thermal zone, having the two names
makes this more confusing than necessary.

> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index 7fb16bc..753f0dc 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -29,6 +29,19 @@ config THERMAL_HWMON
>           Say 'Y' here if you want all thermal sensors to
>           have hwmon sysfs interface too.
> 
> +config THERMAL_OF
> +       bool
> +       prompt "APIs to parse thermal data out of device tree"
> +       depends on OF
> +       default y
> +       help
> +         This options provides helpers to add the support to
> +         read and parse thermal data definitions out of the
> +         device tree blob.
> +
> +         Say 'Y' here if you need to build thermal infrastructure
> +         based on device tree.
> +
>  choice
>         prompt "Default Thermal governor"
>         default THERMAL_DEFAULT_GOV_STEP_WISE
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 24cb894..eedb273 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -7,6 +7,7 @@ thermal_sys-y                   += thermal_core.o
> 
>  # interface to/from other layers providing sensors
>  thermal_sys-$(CONFIG_THERMAL_HWMON)            += thermal_hwmon.o
> +thermal_sys-$(CONFIG_THERMAL_OF)               += thermal_dt.o
> 
>  # governors
>  thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE)   += fair_share.o
> diff --git a/drivers/thermal/thermal_dt.c b/drivers/thermal/thermal_dt.c
> new file mode 100644
> index 0000000..cc35485
> --- /dev/null
> +++ b/drivers/thermal/thermal_dt.c
> @@ -0,0 +1,473 @@
> +/*
> + *  thermal_dt.c - Generic Thermal Management device tree support.
> + *
> + *  Copyright (C) 2013 Texas Instruments
> + *  Copyright (C) 2013 Eduardo Valentin <eduardo.valentin@ti.com>
> + *  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; version 2 of the License.
> + *
> + *  This program is distributed in the hope that it will be useful, but
> + *  WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *  General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License along
> + *  with this program; if not, write to the Free Software Foundation, Inc.,
> + *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
> + *
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + */
> +#include <linux/thermal.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/of_device.h>
> +#include <linux/of_platform.h>
> +#include <linux/err.h>
> +#include <linux/export.h>
> +#include <linux/string.h>
> +
> +struct __thermal_bind_params {
> +       char cooling_device[THERMAL_NAME_LENGTH];
> +};
> +
> +static
> +int thermal_of_match(struct thermal_zone_device *tz,
> +                    struct thermal_cooling_device *cdev);
> +
> +static int thermal_of_populate_bind_params(struct device *dev,
> +                                          struct device_node *node,
> +                                          struct __thermal_bind_params *__tbp,
> +                                          struct thermal_bind_params *tbp,
> +                                          int ntrips)
> +{
> +       const char *cdev_name;
> +       u32 *limits;
> +       int ret;
> +       u32 prop;
> +
> +       ret = of_property_read_u32(node, "weight", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing weight property\n");
> +               return ret;
> +       }
> +       tbp->weight = prop;
> +
> +       ret = of_property_read_u32(node, "mask", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing mask property\n");
> +               return ret;
> +       }
> +       tbp->trip_mask = prop;
> +
> +       /* this will allow us to bind with cooling devices */
> +       tbp->match = thermal_of_match;
> +
> +       ret = of_property_read_string(node, "cooling_device", &cdev_name);
> +       if (ret < 0) {
> +               dev_err(dev, "missing cooling_device property\n");
> +               return ret;
> +       }
> +       strncpy(__tbp->cooling_device, cdev_name,
> +               sizeof(__tbp->cooling_device));
> +
> +       limits = kzalloc(ntrips * sizeof(*limits) * 2, GFP_KERNEL);

Why do you use kzalloc here, but devm_kzalloc elsewhere? That'll lead to
problems as the driver gets refactored over time.

> +       if (!limits) {
> +               dev_err(dev, "not enough mem for reading limits\n");
> +               return -ENOMEM;
> +       }
> +       ret = of_property_read_u32_array(node, "limits", limits, 2 * ntrips);
> +       if (!ret) {
> +               int i = ntrips;
> +
> +               tbp->binding_limits = devm_kzalloc(dev,
> +                                                  ntrips * 2 *
> +                                                  sizeof(*tbp->binding_limits),
> +                                                  GFP_KERNEL);
> +               if (!tbp->binding_limits) {
> +                       dev_err(dev, "not enough mem for storing limits\n");
> +                       return -ENOMEM;
> +               }
> +               while (i--)
> +                       tbp->binding_limits[i] = limits[i];
> +       }
> +       kfree(limits);
> +
> +       return 0;

What if you failed to read the array? Surely you should return an error?

> +}
> +
> +struct __thermal_trip {
> +       unsigned long int temperature;
> +       unsigned long int hysteresis;
> +       enum thermal_trip_type type;
> +};
> +
> +static
> +int thermal_of_populate_trip(struct device *dev,
> +                            struct device_node *node,
> +                            struct __thermal_trip *trip)
> +{
> +       int ret;
> +       int prop;
> +
> +       ret = of_property_read_u32(node, "temperature", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing temperature property\n");
> +               return ret;
> +       }
> +       trip->temperature = prop;
> +
> +       ret = of_property_read_u32(node, "hysteresis", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing hysteresis property\n");
> +               return ret;
> +       }
> +       trip->hysteresis = prop;
> +
> +       ret = of_property_read_u32(node, "type", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing type property\n");
> +               return ret;
> +       }
> +       trip->type = prop;

This will require sanity checking.

> +
> +       return 0;
> +}
> +
> +struct __thermal_zone_device {
> +       enum thermal_device_mode mode;
> +       int passive_delay;
> +       int polling_delay;
> +       int mask;
> +       int ntrips;
> +       char type[THERMAL_NAME_LENGTH];
> +       struct __thermal_trip *trips;
> +       struct __thermal_bind_params *bind_params;
> +       struct thermal_bind_params *tbps;
> +       struct thermal_zone_params zone_params;
> +       int (*get_temp)(void *, unsigned long *);
> +       void *devdata;
> +};
> +
> +static struct __thermal_zone_device *
> +thermal_of_build_thermal_zone(struct device *dev, struct device_node *node)
> +{
> +       struct device_node *child, *gchild;
> +       struct __thermal_zone_device *tz;
> +       int ret, i;
> +       u32 prop;
> +
> +       if (!node) {
> +               dev_err(dev, "no thermal zone node\n");
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       tz = devm_kzalloc(dev, sizeof(*tz), GFP_KERNEL);
> +       if (!tz) {
> +               dev_err(dev, "not enough memory for thermal of zone\n");
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       ret = of_property_read_u32(node, "passive_delay", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing passive_delay property\n");
> +               return ERR_PTR(ret);
> +       }
> +       tz->passive_delay = prop;
> +
> +       ret = of_property_read_u32(node, "polling_delay", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing polling_delay property\n");
> +               return ERR_PTR(ret);
> +       }
> +       tz->polling_delay = prop;
> +
> +       ret = of_property_read_u32(node, "mask", &prop);
> +       if (ret < 0) {
> +               dev_err(dev, "missing mask property\n");
> +               return ERR_PTR(ret);
> +       }
> +       tz->mask = prop;
> +
> +       /* assume node name as our zone type */
> +       strncpy(tz->type, node->name, sizeof(tz->type));
> +
> +       /* default policy */
> +       strncpy(tz->zone_params.governor_name, "step_wise",
> +               sizeof(tz->zone_params.governor_name));
> +
> +       /* trips */
> +       child = of_find_node_by_name(node, "trips");

That might not be a child node, as of_find_node_by_name walks over the
list of all nodes. Use of_get_child_by_name.

> +
> +       /* No trips provided */
> +       if (!child)
> +               goto finish;
> +
> +       tz->ntrips = of_get_child_count(child);
> +       tz->trips = devm_kzalloc(dev, tz->ntrips * sizeof(*tz->trips),
> +                                GFP_KERNEL);
> +       if (!tz->trips)
> +               return ERR_PTR(-ENOMEM);
> +       i = 0;
> +       for_each_child_of_node(child, gchild)
> +               thermal_of_populate_trip(dev, gchild, &tz->trips[i++]);

What if a child node was not a valid trip node? It seems
thermal_of_populate_trip returns errors you're ignoring.

> +
> +       /* bind_params */
> +       child = of_find_node_by_name(node, "bind_params");

Another of_get_child_by_name candidate.

> +
> +       /* No binding info */
> +       if (!child)
> +               goto finish;
> +
> +       tz->zone_params.num_tbps = of_get_child_count(child);
> +       tz->bind_params = devm_kzalloc(dev,
> +                                      tz->zone_params.num_tbps *
> +                                               sizeof(*tz->bind_params),
> +                                      GFP_KERNEL);
> +       if (!tz->bind_params)
> +               return ERR_PTR(-ENOMEM);
> +       tz->zone_params.tbp = devm_kzalloc(dev,
> +                                          tz->zone_params.num_tbps *
> +                                               sizeof(*tz->zone_params.tbp),
> +                                          GFP_KERNEL);
> +       if (!tz->zone_params.tbp)
> +               return ERR_PTR(-ENOMEM);
> +       i = 0;
> +       for_each_child_of_node(child, gchild) {
> +               thermal_of_populate_bind_params(dev, gchild,
> +                                               &tz->bind_params[i],
> +                                               &tz->zone_params.tbp[i],
> +                                               tz->ntrips);

Similarly, you're ignoring error conditions here. What if a bind_params
node was malformed?

> +               i++;
> +       }
> +
> +finish:
> +       tz->mode = THERMAL_DEVICE_ENABLED;
> +
> +       return tz;
> +}
> +
> +static
> +int thermal_of_match(struct thermal_zone_device *tz,
> +                    struct thermal_cooling_device *cdev)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +       int i;
> +
> +       for (i = 0; i < data->zone_params.num_tbps; i++) {
> +               if (!strncmp(data->bind_params[i].cooling_device,
> +                            cdev->type,
> +                            strlen(data->bind_params[i].cooling_device)) &&
> +                   (data->zone_params.tbp[i].trip_mask & (1 << i)))
> +                       return 0;
> +       }
> +
> +       return -EINVAL;
> +}

I'm not keen on binding to cooling devices using a string, as suddenly a
name to allow humans to inspect the system is turned into an ABI.

> +
> +static
> +int of_thermal_get_temp(struct thermal_zone_device *tz,
> +                       unsigned long *temp)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       return data->get_temp(data->devdata, temp);
> +}
> +
> +static
> +int of_thermal_get_mode(struct thermal_zone_device *tz,
> +                       enum thermal_device_mode *mode)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       *mode = data->mode;
> +
> +       return 0;
> +}
> +
> +static
> +int of_thermal_set_mode(struct thermal_zone_device *tz,
> +                       enum thermal_device_mode mode)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       mutex_lock(&tz->lock);
> +
> +       if (mode == THERMAL_DEVICE_ENABLED)
> +               tz->polling_delay = data->polling_delay;
> +       else
> +               tz->polling_delay = 0;
> +
> +       mutex_unlock(&tz->lock);
> +
> +       data->mode = mode;
> +       thermal_zone_device_update(tz);
> +
> +       return 0;
> +}
> +
> +static
> +int of_thermal_get_trip_type(struct thermal_zone_device *tz, int trip,
> +                            enum thermal_trip_type *type)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       if (trip >= data->ntrips || trip < 0)
> +               return -EDOM;

It's far more common to use -EINVAL.

> +
> +       *type = data->trips[trip].type;
> +
> +       return 0;
> +}
> +
> +static
> +int of_thermal_get_trip_temp(struct thermal_zone_device *tz, int trip,
> +                            unsigned long *temp)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       if (trip >= data->ntrips || trip < 0)
> +               return -EDOM;
> +
> +       *temp = data->trips[trip].temperature;
> +
> +       return 0;
> +}
> +
> +static
> +int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
> +                            unsigned long temp)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       if (trip >= data->ntrips || trip < 0)
> +               return -EDOM;
> +
> +       /* thermal fw should take care of data->mask & (1 << trip) */
> +       data->trips[trip].temperature = temp;
> +
> +       return 0;
> +}
> +
> +static
> +int of_thermal_get_trip_hyst(struct thermal_zone_device *tz, int trip,
> +                            unsigned long *hyst)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       if (trip >= data->ntrips || trip < 0)
> +               return -EDOM;
> +
> +       *hyst = data->trips[trip].hysteresis;
> +
> +       return 0;
> +}
> +
> +static
> +int of_thermal_set_trip_hyst(struct thermal_zone_device *tz, int trip,
> +                            unsigned long hyst)
> +{
> +       struct __thermal_zone_device *data = tz->devdata;
> +
> +       if (trip >= data->ntrips || trip < 0)
> +               return -EDOM;
> +
> +       /* thermal fw should take care of data->mask & (1 << trip) */

Could you elaborate on that comment?

> +       data->trips[trip].hysteresis = hyst;
> +
> +       return 0;
> +}

[...]

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Aug. 27, 2013, 1:44 p.m. UTC | #2
Hello Mark,


First of all, thanks for taking the time to review in such level of
detail. Lets try to align and have a common understanding. Answers go
inline. Please let me know if I missed something or if I made it more
confusing :-)

On 27-08-2013 06:22, Mark Rutland wrote:
> Hi,
> 
> On Sat, Aug 24, 2013 at 12:15:43AM +0100, Eduardo Valentin wrote:
>> In order to be able to build thermal policies
>> based on generic sensors, like I2C device, that
>> can be places in different points on different boards,
>> there is a need to have a way to feed board dependent
>> data into the thermal framework.
>>
>> This patch introduces a thermal data parser for device
>> tree. The parsed data is used to build thermal zones
>> and thermal binding parameters. The output data
>> can then be used to deploy thermal policies.
>>
>> This patch adds also documentation regarding this
>> API and how to define tree nodes to use
>> this infrastructure.
>>
>> Cc: Zhang Rui <rui.zhang@intel.com>
>> Cc: linux-pm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
>> ---
>>  .../devicetree/bindings/thermal/thermal.txt        | 160 +++++++
>>  drivers/thermal/Kconfig                            |  13 +
>>  drivers/thermal/Makefile                           |   1 +
>>  drivers/thermal/thermal_dt.c                       | 473 +++++++++++++++++++++
>>  include/dt-bindings/thermal/thermal.h              |  27 ++
>>  include/linux/thermal.h                            |   3 +
>>  6 files changed, 677 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
>>  create mode 100644 drivers/thermal/thermal_dt.c
>>  create mode 100644 include/dt-bindings/thermal/thermal.h
>>


General question on device tree documentation. Assuming there is such an
effort to make it Linux independent, is there other places out of Linux
tree that these binding must be documented?


>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
>> new file mode 100644
>> index 0000000..af20ab0
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>> @@ -0,0 +1,160 @@
>> +----------------------------------------
>> +Thermal Framework Device Tree descriptor
>> +----------------------------------------
>> +
>> +This file describes how to define a thermal structure using device tree.
>> +A thermal structure includes thermal zones and their components, such
>> +as name, trip points, polling intervals and cooling devices binding
>> +descriptors. A binding descriptor may contain information on how to
>> +react, with a cooling stepped action or a weight on a fair share.
>> +The target of device tree thermal descriptors is to describe only
>> +the hardware thermal aspects, not how the system must control or which
>> +algorithm or policy must take place. It follows a description of each
>> +type of device tree node.
>> +
>> +****
>> +trip
>> +****
>> +
>> +The trip node is a node to describe a point in the temperature domain
>> +in which the system takes an action. This node describes just the point,
>> +not the action.
>> +
>> +A node describing a trip must contain:
>> +- temperature: the trip temperature level, in milliCelsius.
>> +- hysteresis: a (low) hysteresis value on 'temperature'. This is a relative
>> +value, in milliCelsius.
> 
> Presumably this is a single (u32) cell?

Yes, both of them above are a single u32.

> 
>> +- type: the trip type. Here is the type mapping:
>> +       THERMAL_TRIP_ACTIVE
>> +       THERMAL_TRIP_PASSIVE
>> +       THERMAL_TRIP_HOT
>> +       THERMAL_TRIP_CRITICAL
> 
> What type is this property?
> 
> What are these values? Do you need to refer to a header somewhere?

There is a header, introduced in this patch.
include/dt-bindings/thermal/thermal.h

> Symbolic constants should have an associated value for an ABI.

OK. Sounds reasonable to me. Shall we be descriptive and enlist the
values in this doc too or just pointing to header is fine?

> 
>> +
>> +**********
>> +bind_param
>> +**********
>> +
>> +The bind_param node is a node to describe how cooling devices get assigned
>> +to trip points of the zone. The cooling devices are expected to be loaded
>> +in the target system.
>> +
>> +A node describing a bind_param must contain:
>> +- cooling_device: A string with the cooling device name.
> 
> Please use '-' rather than '_' in bindings. That's the normal devicetree
> convention, and there's no point gonig against it and causing more
> confusion.

Agreed.

> 
> What are valid values of this string, and what do they mean?

These are the name of the cooling devices to match to.

> 
> Why not a phandle + cells binding to cooling devices?
> 

Yeah, this could work too. I am going to check how that would look like
and come back to you.

>> +- weight: The 'influence' of a particular cooling device on this zone.
>> +             This is on a percentage scale. The sum of all these weights
>> +             (for a particular zone) cannot exceed 100.
> 
> This is a bit fuzzy, and certainly needs more guidance.

OK. This property is used to describe the amount of cooling contribution
when activating a cooling device at a certain trip point. I will work on
a better guidance.

> 
>> +- trip_mask: This is a bit mask that gives the binding relation between
>> +               this thermal zone and cdev, for a particular trip point.
>> +               If nth bit is set, then the cdev and thermal zone are bound
>> +               for trip point n.
> 
> What type is this?
> 
> The nth bit from which end?
> 
> Could you explain what "bound for trip point n" means?
> 

It is just a way to match a cooling device with a set of trip points.
Say you want to match one specific cooling device with trips 0 and 1,
you would then create a bind_param with trip_mask = 0x3.

> Just because this is a bitmask in Linux does not mean it needs to be so
> in the dt.

Indeed.

In case we work with phandle + cell binging parameters, the trip and
weight could be passed as params.

	cooling-devices = <&devX tripY weightZ>, <... >;



> 
>> +- limits: An array integers consisting of 2-tuples items, and each item means
>> +  lower and upper state limits, like <min-state max-state>.
>> +
>> +**************************
>> +temperature sensor devices
>> +**************************
>> +
>> +Temperature sensor devices have to be linked to a specific thermal zone.
>> +This is done by means of the 'monitored-zones' list.
>> +- monitored-zones: A list of thermal zones phandles, identifying which
>> +thermal zones the temperature sensor device is used to monitor.
> 
> Why does this need to be described that way around?
> 
> Can the zone not have a link to the sensor(s) it needs?
> 

Do you see any issues with it?

Idea is that sensor nodes would list which zones they are used.  Same
idea goes to cooling devices. It can be the other way around, yes. It
depends on what is the best way to describe these relations. Idea would
be so that sensors and cooling devices would be composing a thermal zone.

>> +
>> +************
>> +thermal_zone
>> +************
>> +
>> +The thermal_zone node is the node containing all the required info
>> +for describing a thermal zone, including its cdev bindings. The thermal_zone
>> +node must contain, apart from its own properties, one node containing
>> +trip nodes and one node containing all the zone bind parameters.
>> +
>> +Required properties:
>> +- mask: Bit string: If 'n'th bit is set, then trip point 'n' is writeable.
> 
> What type is a bit string in devicetree? I'm aware of cells and (byte)
> strings, but not bit strings.
> 
> What does 'writeable' mean in this context?
> 
> There's presumably a better name than 'mask'.
> 

This is used to describe if trip points are writeable under sysfs. I
think this is a OS implementation leak. I will be removing this from
device tree and making them RO by default.

>> +
>> +- passive_delay: number of milliseconds to wait between polls when
>> +  performing passive cooling.
>> +
>> +- polling_delay: number of milliseconds to wait between polls when checking.
> 
> These sound very much like configuration of the driver rather than
> hardware description. What if my temperature sensor can generate an
> interrupt at some trigger temperature? Why would I need polling times?
> 
> Why does this need to be in the dt at all? Surely the OS can choose some
> sensible polling period, possibly dynamically?

Well, let me answer this one with another question. How the OS will
determine how fast is the temperature rise on a thermal zone described
in device tree?

> 
>> +
>> +- #thermal-cells: An integer that is used to specify how many cells compose
>> +  sensor ids.
> 
> I don't see why this is necessary. If the zone itself described its
> related sensors rather than the other way around, we wouldn't need it at
> all. Most bindings so far have consumers link to their providers rather
> than the other way around.
> 

I see.


>> +
>> +Below is an example:
>> +cpu_thermal: thermal_zone {
>> +            #thermal-cells = <1>;
>> +            mask = <0x03>; /* trips writability */
>> +            passive_delay = <250>; /* milliseconds */
>> +            polling_delay = <1000>; /* milliseconds */
>> +            trips {
>> +                    alert@100000{
>> +                            temperature = <100000>; /* milliCelsius */
>> +                            hysteresis = <0>; /* milliCelsius */
>> +                            type = <THERMAL_TRIP_PASSIVE>;
>> +                    };
>> +                    crit@125000{
>> +                            temperature = <125000>; /* milliCelsius */
>> +                            hysteresis = <0>; /* milliCelsius */
>> +                            type = <THERMAL_TRIP_CRITICAL>;
>> +                    };
>> +            };
>> +            bind_params {
>> +                    action@0{
>> +                            cooling_device = "thermal-cpufreq";
> 
> NAK. That value was not defined anywhere, and represents Linux
> infrastructure rather than hardware.
> 

Yes, this was another leak as discussed above in the upper part of the
doc file.

> For this case, we could just as easily describe the thermal points, and
> if no physical cooling device (e.g. a fan) is present, assume
> cpufreq-base cooling.  Other OSs could choose to do otherwise, and we
> could change later...
> 

The way other specifications do (aka ACPI) is to have trip types. In
case a trip type is passive cooling, it means one is supposed to
modulate the dissipated power rather than activating a device to remove
heat (active cooling).

The part of 'assuming' things that bothers me. I would rather have a way
to really say one wants to have passive cooling at a specific trip point.


>> +                            weight = <100>; /* percentage */
>> +                            mask = <0x01>;
> 
> This should presumably be trip-mask (assuming my suggested corrections).
> 
> What is it supposed to mean here?

As stated above, it means the trip points that this binding represents.

> 
>> +                           limits = <
>> +                             /* lower-state            upper-state */
>> +                               1                       THERMAL_NO_LIMITS
>> +                               THERMAL_NO_LIMITS       THERMAL_NO_LIMITS
>> +                           >;
>> +                    };
>> +            };
>> +};
>> +
>> +sensor: adc@0xFFFF {
>> +       ...
>> +       monitored-zones = <&cpu_thermal 0>;
>> +};
>> +
>> +cpu@0 {
>> +       ...
>> +       cooling-zones = <&cpu_thermal>;
>> +};
>> +
>> +In the example above, the ADC sensor at address 0xFFFF is used to monitor
>> +the zone 'cpu_thermal' using its the sensor 0. The cpu@0 device is also linked
>> +to the same thermal zone 'cpu_thermal' as a cooling device.
> 
> Huh? That seems to imply the the '0' in '<&cpu_thermal 0>' on the sensor
> node describes a property of the sensor rather than the thermal node,
> unless I've misunderstood?
> 

It is essentially which sensor id is used to monitor a specific zone. In
case of sensor IPs that feature several internal sensors, one needs to
have a way to describe which sensor monitors which zone. Example of this
type of IPs: lm90 (up to 3 sensors) and TI bandgap IP (can contain up to
5 internal sensors). The probe happens for the IP device and not for the
internal sensors.


>> +
>> +***************
>> +cpufreq-cooling
>> +***************
>> +
>> +The generic cpu cooling (freq clipping) can be loaded by the
>> +generic cpufreq-cpu0 driver in case the device tree node informs
>> +this need.
>> +
>> +In order to load the cpufreq cooling device, it is possible to
>> +inform that the cpu needs cooling by adding the list of thermal zones
>> +in the 'cooling-zones' property at the cpu0 phandle.
> 
> This should be reworded so as to describe the binding rather than the
> Linux behaviour based on the binding.

OK.

> 
>> +
>> +Example:
>> +       cpus {
>> +               cpu@0 {
>> +                       operating-points = <
>> +                               /* kHz    uV */
>> +                               800000  1313000
>> +                               1008000 1375000
>> +                       >;
>> +                       cooling-zones = <&cpu_thermal>;
>> +               };
>> +       ...
>> +       };
>> +
>> +The above will cause the cpufreq-cpu0 driver to understand that
>> +the cpu will need passive cooling and while probing that node it will
>> +also load the cpufreq cpu cooling device in that system.
> 
> The above describe that the CPU is part of a cooling (thermal?) zone and
> requires cooling.

That says the cpu participates in a thermal zone as a cooling
device/mechanism, thus cpu cooling (passive cooling) is required in
target system.

> 
>> +
>> +However, be advised that this flag will not describe what needs to
>> +be done or how the cpufreq cooling device will be used. In order to
>> +accomplish this, one needs to write a phandle for a thermal zone, as
>> +described in the section 'thermal_zone'.
> 
> The above already has a phandle to a thermal (cooling?) zone.
> 
> Please choose either cooling zone or thermal zone, having the two names
> makes this more confusing than necessary.

I see. Idea is to have thermal zone as a node. cooling-zone is just a
list of thermal zones that the referred device will be used as cooling
device.

> 
>> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
>> index 7fb16bc..753f0dc 100644
>> --- a/drivers/thermal/Kconfig
>> +++ b/drivers/thermal/Kconfig
>> @@ -29,6 +29,19 @@ config THERMAL_HWMON
>>           Say 'Y' here if you want all thermal sensors to
>>           have hwmon sysfs interface too.
>>
>> +config THERMAL_OF
>> +       bool
>> +       prompt "APIs to parse thermal data out of device tree"
>> +       depends on OF
>> +       default y
>> +       help
>> +         This options provides helpers to add the support to
>> +         read and parse thermal data definitions out of the
>> +         device tree blob.
>> +
>> +         Say 'Y' here if you need to build thermal infrastructure
>> +         based on device tree.
>> +
>>  choice
>>         prompt "Default Thermal governor"
>>         default THERMAL_DEFAULT_GOV_STEP_WISE
>> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
>> index 24cb894..eedb273 100644
>> --- a/drivers/thermal/Makefile
>> +++ b/drivers/thermal/Makefile
>> @@ -7,6 +7,7 @@ thermal_sys-y                   += thermal_core.o
>>
>>  # interface to/from other layers providing sensors
>>  thermal_sys-$(CONFIG_THERMAL_HWMON)            += thermal_hwmon.o
>> +thermal_sys-$(CONFIG_THERMAL_OF)               += thermal_dt.o
>>
>>  # governors
>>  thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE)   += fair_share.o
>> diff --git a/drivers/thermal/thermal_dt.c b/drivers/thermal/thermal_dt.c
>> new file mode 100644
>> index 0000000..cc35485
>> --- /dev/null
>> +++ b/drivers/thermal/thermal_dt.c
>> @@ -0,0 +1,473 @@
>> +/*
>> + *  thermal_dt.c - Generic Thermal Management device tree support.
>> + *
>> + *  Copyright (C) 2013 Texas Instruments
>> + *  Copyright (C) 2013 Eduardo Valentin <eduardo.valentin@ti.com>
>> + *  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> + *
>> + *  This program is free software; you can redistribute it and/or modify
>> + *  it under the terms of the GNU General Public License as published by
>> + *  the Free Software Foundation; version 2 of the License.
>> + *
>> + *  This program is distributed in the hope that it will be useful, but
>> + *  WITHOUT ANY WARRANTY; without even the implied warranty of
>> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + *  General Public License for more details.
>> + *
>> + *  You should have received a copy of the GNU General Public License along
>> + *  with this program; if not, write to the Free Software Foundation, Inc.,
>> + *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
>> + *
>> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> + */
>> +#include <linux/thermal.h>
>> +#include <linux/slab.h>
>> +#include <linux/types.h>
>> +#include <linux/of_device.h>
>> +#include <linux/of_platform.h>
>> +#include <linux/err.h>
>> +#include <linux/export.h>
>> +#include <linux/string.h>
>> +
>> +struct __thermal_bind_params {
>> +       char cooling_device[THERMAL_NAME_LENGTH];
>> +};
>> +
>> +static
>> +int thermal_of_match(struct thermal_zone_device *tz,
>> +                    struct thermal_cooling_device *cdev);
>> +
>> +static int thermal_of_populate_bind_params(struct device *dev,
>> +                                          struct device_node *node,
>> +                                          struct __thermal_bind_params *__tbp,
>> +                                          struct thermal_bind_params *tbp,
>> +                                          int ntrips)
>> +{
>> +       const char *cdev_name;
>> +       u32 *limits;
>> +       int ret;
>> +       u32 prop;
>> +
>> +       ret = of_property_read_u32(node, "weight", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing weight property\n");
>> +               return ret;
>> +       }
>> +       tbp->weight = prop;
>> +
>> +       ret = of_property_read_u32(node, "mask", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing mask property\n");
>> +               return ret;
>> +       }
>> +       tbp->trip_mask = prop;
>> +
>> +       /* this will allow us to bind with cooling devices */
>> +       tbp->match = thermal_of_match;
>> +
>> +       ret = of_property_read_string(node, "cooling_device", &cdev_name);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing cooling_device property\n");
>> +               return ret;
>> +       }
>> +       strncpy(__tbp->cooling_device, cdev_name,
>> +               sizeof(__tbp->cooling_device));
>> +
>> +       limits = kzalloc(ntrips * sizeof(*limits) * 2, GFP_KERNEL);
> 
> Why do you use kzalloc here, but devm_kzalloc elsewhere? That'll lead to
> problems as the driver gets refactored over time.

Because this memory is used only  within this function and thus released
at the bottom of it.

> 
>> +       if (!limits) {
>> +               dev_err(dev, "not enough mem for reading limits\n");
>> +               return -ENOMEM;
>> +       }
>> +       ret = of_property_read_u32_array(node, "limits", limits, 2 * ntrips);
>> +       if (!ret) {
>> +               int i = ntrips;
>> +
>> +               tbp->binding_limits = devm_kzalloc(dev,
>> +                                                  ntrips * 2 *
>> +                                                  sizeof(*tbp->binding_limits),
>> +                                                  GFP_KERNEL);
>> +               if (!tbp->binding_limits) {
>> +                       dev_err(dev, "not enough mem for storing limits\n");
>> +                       return -ENOMEM;
>> +               }
>> +               while (i--)
>> +                       tbp->binding_limits[i] = limits[i];
>> +       }
>> +       kfree(limits);

here.

>> +
>> +       return 0;
> 
> What if you failed to read the array? Surely you should return an error?

Attempting to make this property not mandatory.

> 
>> +}
>> +
>> +struct __thermal_trip {
>> +       unsigned long int temperature;
>> +       unsigned long int hysteresis;
>> +       enum thermal_trip_type type;
>> +};
>> +
>> +static
>> +int thermal_of_populate_trip(struct device *dev,
>> +                            struct device_node *node,
>> +                            struct __thermal_trip *trip)
>> +{
>> +       int ret;
>> +       int prop;
>> +
>> +       ret = of_property_read_u32(node, "temperature", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing temperature property\n");
>> +               return ret;
>> +       }
>> +       trip->temperature = prop;
>> +
>> +       ret = of_property_read_u32(node, "hysteresis", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing hysteresis property\n");
>> +               return ret;
>> +       }
>> +       trip->hysteresis = prop;
>> +
>> +       ret = of_property_read_u32(node, "type", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing type property\n");
>> +               return ret;
>> +       }
>> +       trip->type = prop;
> 
> This will require sanity checking.

OK.

> 
>> +
>> +       return 0;
>> +}
>> +
>> +struct __thermal_zone_device {
>> +       enum thermal_device_mode mode;
>> +       int passive_delay;
>> +       int polling_delay;
>> +       int mask;
>> +       int ntrips;
>> +       char type[THERMAL_NAME_LENGTH];
>> +       struct __thermal_trip *trips;
>> +       struct __thermal_bind_params *bind_params;
>> +       struct thermal_bind_params *tbps;
>> +       struct thermal_zone_params zone_params;
>> +       int (*get_temp)(void *, unsigned long *);
>> +       void *devdata;
>> +};
>> +
>> +static struct __thermal_zone_device *
>> +thermal_of_build_thermal_zone(struct device *dev, struct device_node *node)
>> +{
>> +       struct device_node *child, *gchild;
>> +       struct __thermal_zone_device *tz;
>> +       int ret, i;
>> +       u32 prop;
>> +
>> +       if (!node) {
>> +               dev_err(dev, "no thermal zone node\n");
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>> +       tz = devm_kzalloc(dev, sizeof(*tz), GFP_KERNEL);
>> +       if (!tz) {
>> +               dev_err(dev, "not enough memory for thermal of zone\n");
>> +               return ERR_PTR(-ENOMEM);
>> +       }
>> +
>> +       ret = of_property_read_u32(node, "passive_delay", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing passive_delay property\n");
>> +               return ERR_PTR(ret);
>> +       }
>> +       tz->passive_delay = prop;
>> +
>> +       ret = of_property_read_u32(node, "polling_delay", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing polling_delay property\n");
>> +               return ERR_PTR(ret);
>> +       }
>> +       tz->polling_delay = prop;
>> +
>> +       ret = of_property_read_u32(node, "mask", &prop);
>> +       if (ret < 0) {
>> +               dev_err(dev, "missing mask property\n");
>> +               return ERR_PTR(ret);
>> +       }
>> +       tz->mask = prop;
>> +
>> +       /* assume node name as our zone type */
>> +       strncpy(tz->type, node->name, sizeof(tz->type));
>> +
>> +       /* default policy */
>> +       strncpy(tz->zone_params.governor_name, "step_wise",
>> +               sizeof(tz->zone_params.governor_name));
>> +
>> +       /* trips */
>> +       child = of_find_node_by_name(node, "trips");
> 
> That might not be a child node, as of_find_node_by_name walks over the
> list of all nodes. Use of_get_child_by_name.

Right! This is a bug.

> 
>> +
>> +       /* No trips provided */
>> +       if (!child)
>> +               goto finish;
>> +
>> +       tz->ntrips = of_get_child_count(child);
>> +       tz->trips = devm_kzalloc(dev, tz->ntrips * sizeof(*tz->trips),
>> +                                GFP_KERNEL);
>> +       if (!tz->trips)
>> +               return ERR_PTR(-ENOMEM);
>> +       i = 0;
>> +       for_each_child_of_node(child, gchild)
>> +               thermal_of_populate_trip(dev, gchild, &tz->trips[i++]);
> 
> What if a child node was not a valid trip node? It seems
> thermal_of_populate_trip returns errors you're ignoring.

OK. Adding proper treatment.

> 
>> +
>> +       /* bind_params */
>> +       child = of_find_node_by_name(node, "bind_params");
> 
> Another of_get_child_by_name candidate.

OK.

> 
>> +
>> +       /* No binding info */
>> +       if (!child)
>> +               goto finish;
>> +
>> +       tz->zone_params.num_tbps = of_get_child_count(child);
>> +       tz->bind_params = devm_kzalloc(dev,
>> +                                      tz->zone_params.num_tbps *
>> +                                               sizeof(*tz->bind_params),
>> +                                      GFP_KERNEL);
>> +       if (!tz->bind_params)
>> +               return ERR_PTR(-ENOMEM);
>> +       tz->zone_params.tbp = devm_kzalloc(dev,
>> +                                          tz->zone_params.num_tbps *
>> +                                               sizeof(*tz->zone_params.tbp),
>> +                                          GFP_KERNEL);
>> +       if (!tz->zone_params.tbp)
>> +               return ERR_PTR(-ENOMEM);
>> +       i = 0;
>> +       for_each_child_of_node(child, gchild) {
>> +               thermal_of_populate_bind_params(dev, gchild,
>> +                                               &tz->bind_params[i],
>> +                                               &tz->zone_params.tbp[i],
>> +                                               tz->ntrips);
> 
> Similarly, you're ignoring error conditions here. What if a bind_params
> node was malformed?


Adding proper error handling.

> 
>> +               i++;
>> +       }
>> +
>> +finish:
>> +       tz->mode = THERMAL_DEVICE_ENABLED;
>> +
>> +       return tz;
>> +}
>> +
>> +static
>> +int thermal_of_match(struct thermal_zone_device *tz,
>> +                    struct thermal_cooling_device *cdev)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +       int i;
>> +
>> +       for (i = 0; i < data->zone_params.num_tbps; i++) {
>> +               if (!strncmp(data->bind_params[i].cooling_device,
>> +                            cdev->type,
>> +                            strlen(data->bind_params[i].cooling_device)) &&
>> +                   (data->zone_params.tbp[i].trip_mask & (1 << i)))
>> +                       return 0;
>> +       }
>> +
>> +       return -EINVAL;
>> +}
> 
> I'm not keen on binding to cooling devices using a string, as suddenly a
> name to allow humans to inspect the system is turned into an ABI.

OK. We need to align on how we describe the binding then.

> 
>> +
>> +static
>> +int of_thermal_get_temp(struct thermal_zone_device *tz,
>> +                       unsigned long *temp)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       return data->get_temp(data->devdata, temp);
>> +}
>> +
>> +static
>> +int of_thermal_get_mode(struct thermal_zone_device *tz,
>> +                       enum thermal_device_mode *mode)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       *mode = data->mode;
>> +
>> +       return 0;
>> +}
>> +
>> +static
>> +int of_thermal_set_mode(struct thermal_zone_device *tz,
>> +                       enum thermal_device_mode mode)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       mutex_lock(&tz->lock);
>> +
>> +       if (mode == THERMAL_DEVICE_ENABLED)
>> +               tz->polling_delay = data->polling_delay;
>> +       else
>> +               tz->polling_delay = 0;
>> +
>> +       mutex_unlock(&tz->lock);
>> +
>> +       data->mode = mode;
>> +       thermal_zone_device_update(tz);
>> +
>> +       return 0;
>> +}
>> +
>> +static
>> +int of_thermal_get_trip_type(struct thermal_zone_device *tz, int trip,
>> +                            enum thermal_trip_type *type)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       if (trip >= data->ntrips || trip < 0)
>> +               return -EDOM;
> 
> It's far more common to use -EINVAL.

hmm.. OK.

> 
>> +
>> +       *type = data->trips[trip].type;
>> +
>> +       return 0;
>> +}
>> +
>> +static
>> +int of_thermal_get_trip_temp(struct thermal_zone_device *tz, int trip,
>> +                            unsigned long *temp)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       if (trip >= data->ntrips || trip < 0)
>> +               return -EDOM;
>> +
>> +       *temp = data->trips[trip].temperature;
>> +
>> +       return 0;
>> +}
>> +
>> +static
>> +int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
>> +                            unsigned long temp)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       if (trip >= data->ntrips || trip < 0)
>> +               return -EDOM;
>> +
>> +       /* thermal fw should take care of data->mask & (1 << trip) */
>> +       data->trips[trip].temperature = temp;
>> +
>> +       return 0;
>> +}
>> +
>> +static
>> +int of_thermal_get_trip_hyst(struct thermal_zone_device *tz, int trip,
>> +                            unsigned long *hyst)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       if (trip >= data->ntrips || trip < 0)
>> +               return -EDOM;
>> +
>> +       *hyst = data->trips[trip].hysteresis;
>> +
>> +       return 0;
>> +}
>> +
>> +static
>> +int of_thermal_set_trip_hyst(struct thermal_zone_device *tz, int trip,
>> +                            unsigned long hyst)
>> +{
>> +       struct __thermal_zone_device *data = tz->devdata;
>> +
>> +       if (trip >= data->ntrips || trip < 0)
>> +               return -EDOM;
>> +
>> +       /* thermal fw should take care of data->mask & (1 << trip) */
> 
> Could you elaborate on that comment?

In  case the trip is not assigned to be writable, this function won't be
called.


> 
>> +       data->trips[trip].hysteresis = hyst;
>> +
>> +       return 0;
>> +}
> 
> [...]
> 
> Thanks,
> Mark.
> 
>
Mark Rutland Aug. 27, 2013, 4:23 p.m. UTC | #3
On Tue, Aug 27, 2013 at 02:44:40PM +0100, Eduardo Valentin wrote:
> Hello Mark,
>
>
> First of all, thanks for taking the time to review in such level of
> detail. Lets try to align and have a common understanding. Answers go
> inline. Please let me know if I missed something or if I made it more
> confusing :-)
>
> On 27-08-2013 06:22, Mark Rutland wrote:
> > Hi,
> >
> > On Sat, Aug 24, 2013 at 12:15:43AM +0100, Eduardo Valentin wrote:
> >> In order to be able to build thermal policies
> >> based on generic sensors, like I2C device, that
> >> can be places in different points on different boards,
> >> there is a need to have a way to feed board dependent
> >> data into the thermal framework.
> >>
> >> This patch introduces a thermal data parser for device
> >> tree. The parsed data is used to build thermal zones
> >> and thermal binding parameters. The output data
> >> can then be used to deploy thermal policies.
> >>
> >> This patch adds also documentation regarding this
> >> API and how to define tree nodes to use
> >> this infrastructure.
> >>
> >> Cc: Zhang Rui <rui.zhang@intel.com>
> >> Cc: linux-pm@vger.kernel.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
> >> ---
> >>  .../devicetree/bindings/thermal/thermal.txt        | 160 +++++++
> >>  drivers/thermal/Kconfig                            |  13 +
> >>  drivers/thermal/Makefile                           |   1 +
> >>  drivers/thermal/thermal_dt.c                       | 473 +++++++++++++++++++++
> >>  include/dt-bindings/thermal/thermal.h              |  27 ++
> >>  include/linux/thermal.h                            |   3 +
> >>  6 files changed, 677 insertions(+)
> >>  create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
> >>  create mode 100644 drivers/thermal/thermal_dt.c
> >>  create mode 100644 include/dt-bindings/thermal/thermal.h
> >>
>
>
> General question on device tree documentation. Assuming there is such an
> effort to make it Linux independent, is there other places out of Linux
> tree that these binding must be documented?

At present, no. There is a plan to move the documentation out of the
kernel, but I'm not sure on how that's currently progressing. Some
general cleanup needs to occur before we can actually do that.

>
>
> >> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
> >> new file mode 100644
> >> index 0000000..af20ab0
> >> --- /dev/null
> >> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
> >> @@ -0,0 +1,160 @@
> >> +----------------------------------------
> >> +Thermal Framework Device Tree descriptor
> >> +----------------------------------------
> >> +
> >> +This file describes how to define a thermal structure using device tree.
> >> +A thermal structure includes thermal zones and their components, such
> >> +as name, trip points, polling intervals and cooling devices binding
> >> +descriptors. A binding descriptor may contain information on how to
> >> +react, with a cooling stepped action or a weight on a fair share.
> >> +The target of device tree thermal descriptors is to describe only
> >> +the hardware thermal aspects, not how the system must control or which
> >> +algorithm or policy must take place. It follows a description of each
> >> +type of device tree node.
> >> +
> >> +****
> >> +trip
> >> +****
> >> +
> >> +The trip node is a node to describe a point in the temperature domain
> >> +in which the system takes an action. This node describes just the point,
> >> +not the action.
> >> +
> >> +A node describing a trip must contain:
> >> +- temperature: the trip temperature level, in milliCelsius.
> >> +- hysteresis: a (low) hysteresis value on 'temperature'. This is a relative
> >> +value, in milliCelsius.
> >
> > Presumably this is a single (u32) cell?
>
> Yes, both of them above are a single u32.

Ok. This seems to be the default assumption in many bindings, so I
shan't argue against it here, but in general we should be as precise as
possible as to the format of the devicetree.

>
> >
> >> +- type: the trip type. Here is the type mapping:
> >> +       THERMAL_TRIP_ACTIVE
> >> +       THERMAL_TRIP_PASSIVE
> >> +       THERMAL_TRIP_HOT
> >> +       THERMAL_TRIP_CRITICAL
> >
> > What type is this property?
> >
> > What are these values? Do you need to refer to a header somewhere?
>
> There is a header, introduced in this patch.
> include/dt-bindings/thermal/thermal.h
>
> > Symbolic constants should have an associated value for an ABI.
>
> OK. Sounds reasonable to me. Shall we be descriptive and enlist the
> values in this doc too or just pointing to header is fine?

I believe that referring to the header is fine, though personally I'd
prefer if we were explicit in binding docs. I'll leave Stephen to
comment on what the preferred way of handling header constants with
binding documents is.

>
> >
> >> +
> >> +**********
> >> +bind_param
> >> +**********
> >> +
> >> +The bind_param node is a node to describe how cooling devices get assigned
> >> +to trip points of the zone. The cooling devices are expected to be loaded
> >> +in the target system.
> >> +
> >> +A node describing a bind_param must contain:
> >> +- cooling_device: A string with the cooling device name.
> >
> > Please use '-' rather than '_' in bindings. That's the normal devicetree
> > convention, and there's no point gonig against it and causing more
> > confusion.
>
> Agreed.
>
> >
> > What are valid values of this string, and what do they mean?
>
> These are the name of the cooling devices to match to.

Which are what exactly? What constitutes a "cooling device"?

Is "black-body radiation" [1] a valid cooling device?

How about "copper heat sink"?

>
> >
> > Why not a phandle + cells binding to cooling devices?
> >
>
> Yeah, this could work too. I am going to check how that would look like
> and come back to you.

Ok.

>
> >> +- weight: The 'influence' of a particular cooling device on this zone.
> >> +             This is on a percentage scale. The sum of all these weights
> >> +             (for a particular zone) cannot exceed 100.
> >
> > This is a bit fuzzy, and certainly needs more guidance.
>
> OK. This property is used to describe the amount of cooling contribution
> when activating a cooling device at a certain trip point. I will work on
> a better guidance.
>
> >
> >> +- trip_mask: This is a bit mask that gives the binding relation between
> >> +               this thermal zone and cdev, for a particular trip point.
> >> +               If nth bit is set, then the cdev and thermal zone are bound
> >> +               for trip point n.
> >
> > What type is this?
> >
> > The nth bit from which end?
> >
> > Could you explain what "bound for trip point n" means?
> >
>
> It is just a way to match a cooling device with a set of trip points.
> Say you want to match one specific cooling device with trips 0 and 1,
> you would then create a bind_param with trip_mask = 0x3.
>
> > Just because this is a bitmask in Linux does not mean it needs to be so
> > in the dt.
>
> Indeed.
>
> In case we work with phandle + cell binging parameters, the trip and
> weight could be passed as params.
>
>       cooling-devices = <&devX tripY weightZ>, <... >;

Something like that could certainly work.

>
> >
> >> +- limits: An array integers consisting of 2-tuples items, and each item means
> >> +  lower and upper state limits, like <min-state max-state>.
> >> +
> >> +**************************
> >> +temperature sensor devices
> >> +**************************
> >> +
> >> +Temperature sensor devices have to be linked to a specific thermal zone.
> >> +This is done by means of the 'monitored-zones' list.
> >> +- monitored-zones: A list of thermal zones phandles, identifying which
> >> +thermal zones the temperature sensor device is used to monitor.
> >
> > Why does this need to be described that way around?
> >
> > Can the zone not have a link to the sensor(s) it needs?
> >
>
> Do you see any issues with it?
>
> Idea is that sensor nodes would list which zones they are used.  Same
> idea goes to cooling devices. It can be the other way around, yes. It
> depends on what is the best way to describe these relations. Idea would
> be so that sensors and cooling devices would be composing a thermal zone.

I think that having a zone refer to the sensors that monitor it would be
preferable. Otherwise we're encoding that way a devices output is to be
consumed on the binding for the device itself, which feels odd (it's
arguable as to whether it's a property of the device, however).

>
> >> +
> >> +************
> >> +thermal_zone
> >> +************
> >> +
> >> +The thermal_zone node is the node containing all the required info
> >> +for describing a thermal zone, including its cdev bindings. The thermal_zone
> >> +node must contain, apart from its own properties, one node containing
> >> +trip nodes and one node containing all the zone bind parameters.
> >> +
> >> +Required properties:
> >> +- mask: Bit string: If 'n'th bit is set, then trip point 'n' is writeable.
> >
> > What type is a bit string in devicetree? I'm aware of cells and (byte)
> > strings, but not bit strings.
> >
> > What does 'writeable' mean in this context?
> >
> > There's presumably a better name than 'mask'.
> >
>
> This is used to describe if trip points are writeable under sysfs. I
> think this is a OS implementation leak. I will be removing this from
> device tree and making them RO by default.

Ok.

>
> >> +
> >> +- passive_delay: number of milliseconds to wait between polls when
> >> +  performing passive cooling.
> >> +
> >> +- polling_delay: number of milliseconds to wait between polls when checking.
> >
> > These sound very much like configuration of the driver rather than
> > hardware description. What if my temperature sensor can generate an
> > interrupt at some trigger temperature? Why would I need polling times?
> >
> > Why does this need to be in the dt at all? Surely the OS can choose some
> > sensible polling period, possibly dynamically?
>
> Well, let me answer this one with another question. How the OS will
> determine how fast is the temperature rise on a thermal zone described
> in device tree?

How will the devicetree writer determine this?

It's possible for the thermal management code to dynamically adjust the
rate between some extreme polling intervals based on how big a rise it
detects between polls or possibly something more complex taking into
account some metric for load.

>
> >
> >> +
> >> +- #thermal-cells: An integer that is used to specify how many cells compose
> >> +  sensor ids.
> >
> > I don't see why this is necessary. If the zone itself described its
> > related sensors rather than the other way around, we wouldn't need it at
> > all. Most bindings so far have consumers link to their providers rather
> > than the other way around.
> >
>
> I see.
>
>
> >> +
> >> +Below is an example:
> >> +cpu_thermal: thermal_zone {
> >> +            #thermal-cells = <1>;
> >> +            mask = <0x03>; /* trips writability */
> >> +            passive_delay = <250>; /* milliseconds */
> >> +            polling_delay = <1000>; /* milliseconds */
> >> +            trips {
> >> +                    alert@100000{
> >> +                            temperature = <100000>; /* milliCelsius */
> >> +                            hysteresis = <0>; /* milliCelsius */
> >> +                            type = <THERMAL_TRIP_PASSIVE>;
> >> +                    };
> >> +                    crit@125000{
> >> +                            temperature = <125000>; /* milliCelsius */
> >> +                            hysteresis = <0>; /* milliCelsius */
> >> +                            type = <THERMAL_TRIP_CRITICAL>;
> >> +                    };

I didn't spot this on my first round, but the trips node will need
#address-cells and #size-cells, and the sub nodes unit-addresses should
match their reg properties.

> >> +            };
> >> +            bind_params {
> >> +                    action@0{
> >> +                            cooling_device = "thermal-cpufreq";
> >
> > NAK. That value was not defined anywhere, and represents Linux
> > infrastructure rather than hardware.
> >
>
> Yes, this was another leak as discussed above in the upper part of the
> doc file.
>
> > For this case, we could just as easily describe the thermal points, and
> > if no physical cooling device (e.g. a fan) is present, assume
> > cpufreq-base cooling.  Other OSs could choose to do otherwise, and we
> > could change later...
> >
>
> The way other specifications do (aka ACPI) is to have trip types. In
> case a trip type is passive cooling, it means one is supposed to
> modulate the dissipated power rather than activating a device to remove
> heat (active cooling).
>
> The part of 'assuming' things that bothers me. I would rather have a way
> to really say one wants to have passive cooling at a specific trip point.

I don't see the problem here. If you don't specify a device, your only
options will be some sort of passive management.

>
>
> >> +                            weight = <100>; /* percentage */
> >> +                            mask = <0x01>;
> >
> > This should presumably be trip-mask (assuming my suggested corrections).
> >
> > What is it supposed to mean here?
>
> As stated above, it means the trip points that this binding represents.

If you have more than 32 trip points (which is admittedly unlikely), how
do you represent them? I'm not sure a mask is the best representation
here, as it's difficult for a human to match up with a list of entries.

Additionally, the trip points don't have a logical contiguous numbering
from zero, so how do the bits in the mask correspond to entries in the
list? This seems to rely on the order that the subnodes are in, which
AFAIK is not guaranteed. It would be far nicer to have an explicit
linkage.

>
> >
> >> +                           limits = <
> >> +                             /* lower-state            upper-state */
> >> +                               1                       THERMAL_NO_LIMITS
> >> +                               THERMAL_NO_LIMITS       THERMAL_NO_LIMITS
> >> +                           >;
> >> +                    };
> >> +            };
> >> +};
> >> +
> >> +sensor: adc@0xFFFF {
> >> +       ...
> >> +       monitored-zones = <&cpu_thermal 0>;
> >> +};
> >> +
> >> +cpu@0 {
> >> +       ...
> >> +       cooling-zones = <&cpu_thermal>;
> >> +};
> >> +
> >> +In the example above, the ADC sensor at address 0xFFFF is used to monitor
> >> +the zone 'cpu_thermal' using its the sensor 0. The cpu@0 device is also linked
> >> +to the same thermal zone 'cpu_thermal' as a cooling device.
> >
> > Huh? That seems to imply the the '0' in '<&cpu_thermal 0>' on the sensor
> > node describes a property of the sensor rather than the thermal node,
> > unless I've misunderstood?
> >
>
> It is essentially which sensor id is used to monitor a specific zone. In
> case of sensor IPs that feature several internal sensors, one needs to
> have a way to describe which sensor monitors which zone. Example of this
> type of IPs: lm90 (up to 3 sensors) and TI bandgap IP (can contain up to
> 5 internal sensors). The probe happens for the IP device and not for the
> internal sensors.

Which is exactly why this should be the opposite way round, with thermal
zones referring to sensors in this fashion.

The way you're suggesting means that the phandle+cells specifier refers
half to the consumer and half to the producer. Having thermal zones
refer to sensors would make this refer solely to the producer (the
sensor).

>
>
> >> +
> >> +***************
> >> +cpufreq-cooling
> >> +***************
> >> +
> >> +The generic cpu cooling (freq clipping) can be loaded by the
> >> +generic cpufreq-cpu0 driver in case the device tree node informs
> >> +this need.
> >> +
> >> +In order to load the cpufreq cooling device, it is possible to
> >> +inform that the cpu needs cooling by adding the list of thermal zones
> >> +in the 'cooling-zones' property at the cpu0 phandle.
> >
> > This should be reworded so as to describe the binding rather than the
> > Linux behaviour based on the binding.
>
> OK.
>
> >
> >> +
> >> +Example:
> >> +       cpus {
> >> +               cpu@0 {
> >> +                       operating-points = <
> >> +                               /* kHz    uV */
> >> +                               800000  1313000
> >> +                               1008000 1375000
> >> +                       >;
> >> +                       cooling-zones = <&cpu_thermal>;
> >> +               };
> >> +       ...
> >> +       };
> >> +
> >> +The above will cause the cpufreq-cpu0 driver to understand that
> >> +the cpu will need passive cooling and while probing that node it will
> >> +also load the cpufreq cpu cooling device in that system.
> >
> > The above describe that the CPU is part of a cooling (thermal?) zone and
> > requires cooling.
>
> That says the cpu participates in a thermal zone as a cooling
> device/mechanism, thus cpu cooling (passive cooling) is required in
> target system.

I think you need the rest of the nodes in this example. You don't show
whether or not there's a cooling-device, so it's not clear that the CPU
is required to be passively cooled...

>
> >
> >> +
> >> +However, be advised that this flag will not describe what needs to
> >> +be done or how the cpufreq cooling device will be used. In order to
> >> +accomplish this, one needs to write a phandle for a thermal zone, as
> >> +described in the section 'thermal_zone'.
> >
> > The above already has a phandle to a thermal (cooling?) zone.
> >
> > Please choose either cooling zone or thermal zone, having the two names
> > makes this more confusing than necessary.
>
> I see. Idea is to have thermal zone as a node. cooling-zone is just a
> list of thermal zones that the referred device will be used as cooling
> device.

While I can see a distinction, the nomenclature is horrendously
confusing.

>
> >
> >> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> >> index 7fb16bc..753f0dc 100644
> >> --- a/drivers/thermal/Kconfig
> >> +++ b/drivers/thermal/Kconfig
> >> @@ -29,6 +29,19 @@ config THERMAL_HWMON
> >>           Say 'Y' here if you want all thermal sensors to
> >>           have hwmon sysfs interface too.
> >>
> >> +config THERMAL_OF
> >> +       bool
> >> +       prompt "APIs to parse thermal data out of device tree"
> >> +       depends on OF
> >> +       default y
> >> +       help
> >> +         This options provides helpers to add the support to
> >> +         read and parse thermal data definitions out of the
> >> +         device tree blob.
> >> +
> >> +         Say 'Y' here if you need to build thermal infrastructure
> >> +         based on device tree.
> >> +
> >>  choice
> >>         prompt "Default Thermal governor"
> >>         default THERMAL_DEFAULT_GOV_STEP_WISE
> >> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> >> index 24cb894..eedb273 100644
> >> --- a/drivers/thermal/Makefile
> >> +++ b/drivers/thermal/Makefile
> >> @@ -7,6 +7,7 @@ thermal_sys-y                   += thermal_core.o
> >>
> >>  # interface to/from other layers providing sensors
> >>  thermal_sys-$(CONFIG_THERMAL_HWMON)            += thermal_hwmon.o
> >> +thermal_sys-$(CONFIG_THERMAL_OF)               += thermal_dt.o
> >>
> >>  # governors
> >>  thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE)   += fair_share.o
> >> diff --git a/drivers/thermal/thermal_dt.c b/drivers/thermal/thermal_dt.c
> >> new file mode 100644
> >> index 0000000..cc35485
> >> --- /dev/null
> >> +++ b/drivers/thermal/thermal_dt.c
> >> @@ -0,0 +1,473 @@
> >> +/*
> >> + *  thermal_dt.c - Generic Thermal Management device tree support.
> >> + *
> >> + *  Copyright (C) 2013 Texas Instruments
> >> + *  Copyright (C) 2013 Eduardo Valentin <eduardo.valentin@ti.com>
> >> + *  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> + *
> >> + *  This program is free software; you can redistribute it and/or modify
> >> + *  it under the terms of the GNU General Public License as published by
> >> + *  the Free Software Foundation; version 2 of the License.
> >> + *
> >> + *  This program is distributed in the hope that it will be useful, but
> >> + *  WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >> + *  General Public License for more details.
> >> + *
> >> + *  You should have received a copy of the GNU General Public License along
> >> + *  with this program; if not, write to the Free Software Foundation, Inc.,
> >> + *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
> >> + *
> >> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> + */
> >> +#include <linux/thermal.h>
> >> +#include <linux/slab.h>
> >> +#include <linux/types.h>
> >> +#include <linux/of_device.h>
> >> +#include <linux/of_platform.h>
> >> +#include <linux/err.h>
> >> +#include <linux/export.h>
> >> +#include <linux/string.h>
> >> +
> >> +struct __thermal_bind_params {
> >> +       char cooling_device[THERMAL_NAME_LENGTH];
> >> +};
> >> +
> >> +static
> >> +int thermal_of_match(struct thermal_zone_device *tz,
> >> +                    struct thermal_cooling_device *cdev);
> >> +
> >> +static int thermal_of_populate_bind_params(struct device *dev,
> >> +                                          struct device_node *node,
> >> +                                          struct __thermal_bind_params *__tbp,
> >> +                                          struct thermal_bind_params *tbp,
> >> +                                          int ntrips)
> >> +{
> >> +       const char *cdev_name;
> >> +       u32 *limits;
> >> +       int ret;
> >> +       u32 prop;
> >> +
> >> +       ret = of_property_read_u32(node, "weight", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing weight property\n");
> >> +               return ret;
> >> +       }
> >> +       tbp->weight = prop;
> >> +
> >> +       ret = of_property_read_u32(node, "mask", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing mask property\n");
> >> +               return ret;
> >> +       }
> >> +       tbp->trip_mask = prop;
> >> +
> >> +       /* this will allow us to bind with cooling devices */
> >> +       tbp->match = thermal_of_match;
> >> +
> >> +       ret = of_property_read_string(node, "cooling_device", &cdev_name);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing cooling_device property\n");
> >> +               return ret;
> >> +       }
> >> +       strncpy(__tbp->cooling_device, cdev_name,
> >> +               sizeof(__tbp->cooling_device));
> >> +
> >> +       limits = kzalloc(ntrips * sizeof(*limits) * 2, GFP_KERNEL);
> >
> > Why do you use kzalloc here, but devm_kzalloc elsewhere? That'll lead to
> > problems as the driver gets refactored over time.
>
> Because this memory is used only  within this function and thus released
> at the bottom of it.
>
> >
> >> +       if (!limits) {
> >> +               dev_err(dev, "not enough mem for reading limits\n");
> >> +               return -ENOMEM;
> >> +       }
> >> +       ret = of_property_read_u32_array(node, "limits", limits, 2 * ntrips);
> >> +       if (!ret) {
> >> +               int i = ntrips;
> >> +
> >> +               tbp->binding_limits = devm_kzalloc(dev,
> >> +                                                  ntrips * 2 *
> >> +                                                  sizeof(*tbp->binding_limits),
> >> +                                                  GFP_KERNEL);
> >> +               if (!tbp->binding_limits) {
> >> +                       dev_err(dev, "not enough mem for storing limits\n");
> >> +                       return -ENOMEM;
> >> +               }
> >> +               while (i--)
> >> +                       tbp->binding_limits[i] = limits[i];
> >> +       }
> >> +       kfree(limits);
>
> here.

I saw that, I did say that this may lead to problems in future if the
function's reorganised, not now. :)

>
> >> +
> >> +       return 0;
> >
> > What if you failed to read the array? Surely you should return an error?
>
> Attempting to make this property not mandatory.

That wasn't clear from the binding, and now I've re-read it, I'm still
not entirely clear on what the limits represent, and why you may have
multiple sets of pairs.

>
> >
> >> +}
> >> +
> >> +struct __thermal_trip {
> >> +       unsigned long int temperature;
> >> +       unsigned long int hysteresis;
> >> +       enum thermal_trip_type type;
> >> +};
> >> +
> >> +static
> >> +int thermal_of_populate_trip(struct device *dev,
> >> +                            struct device_node *node,
> >> +                            struct __thermal_trip *trip)
> >> +{
> >> +       int ret;
> >> +       int prop;
> >> +
> >> +       ret = of_property_read_u32(node, "temperature", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing temperature property\n");
> >> +               return ret;
> >> +       }
> >> +       trip->temperature = prop;
> >> +
> >> +       ret = of_property_read_u32(node, "hysteresis", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing hysteresis property\n");
> >> +               return ret;
> >> +       }
> >> +       trip->hysteresis = prop;
> >> +
> >> +       ret = of_property_read_u32(node, "type", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing type property\n");
> >> +               return ret;
> >> +       }
> >> +       trip->type = prop;
> >
> > This will require sanity checking.
>
> OK.
>
> >
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +struct __thermal_zone_device {
> >> +       enum thermal_device_mode mode;
> >> +       int passive_delay;
> >> +       int polling_delay;
> >> +       int mask;
> >> +       int ntrips;
> >> +       char type[THERMAL_NAME_LENGTH];
> >> +       struct __thermal_trip *trips;
> >> +       struct __thermal_bind_params *bind_params;
> >> +       struct thermal_bind_params *tbps;
> >> +       struct thermal_zone_params zone_params;
> >> +       int (*get_temp)(void *, unsigned long *);
> >> +       void *devdata;
> >> +};
> >> +
> >> +static struct __thermal_zone_device *
> >> +thermal_of_build_thermal_zone(struct device *dev, struct device_node *node)
> >> +{
> >> +       struct device_node *child, *gchild;
> >> +       struct __thermal_zone_device *tz;
> >> +       int ret, i;
> >> +       u32 prop;
> >> +
> >> +       if (!node) {
> >> +               dev_err(dev, "no thermal zone node\n");
> >> +               return ERR_PTR(-EINVAL);
> >> +       }
> >> +
> >> +       tz = devm_kzalloc(dev, sizeof(*tz), GFP_KERNEL);
> >> +       if (!tz) {
> >> +               dev_err(dev, "not enough memory for thermal of zone\n");
> >> +               return ERR_PTR(-ENOMEM);
> >> +       }
> >> +
> >> +       ret = of_property_read_u32(node, "passive_delay", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing passive_delay property\n");
> >> +               return ERR_PTR(ret);
> >> +       }
> >> +       tz->passive_delay = prop;
> >> +
> >> +       ret = of_property_read_u32(node, "polling_delay", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing polling_delay property\n");
> >> +               return ERR_PTR(ret);
> >> +       }
> >> +       tz->polling_delay = prop;
> >> +
> >> +       ret = of_property_read_u32(node, "mask", &prop);
> >> +       if (ret < 0) {
> >> +               dev_err(dev, "missing mask property\n");
> >> +               return ERR_PTR(ret);
> >> +       }
> >> +       tz->mask = prop;
> >> +
> >> +       /* assume node name as our zone type */
> >> +       strncpy(tz->type, node->name, sizeof(tz->type));
> >> +
> >> +       /* default policy */
> >> +       strncpy(tz->zone_params.governor_name, "step_wise",
> >> +               sizeof(tz->zone_params.governor_name));
> >> +
> >> +       /* trips */
> >> +       child = of_find_node_by_name(node, "trips");
> >
> > That might not be a child node, as of_find_node_by_name walks over the
> > list of all nodes. Use of_get_child_by_name.
>
> Right! This is a bug.
>
> >
> >> +
> >> +       /* No trips provided */
> >> +       if (!child)
> >> +               goto finish;
> >> +
> >> +       tz->ntrips = of_get_child_count(child);
> >> +       tz->trips = devm_kzalloc(dev, tz->ntrips * sizeof(*tz->trips),
> >> +                                GFP_KERNEL);
> >> +       if (!tz->trips)
> >> +               return ERR_PTR(-ENOMEM);
> >> +       i = 0;
> >> +       for_each_child_of_node(child, gchild)
> >> +               thermal_of_populate_trip(dev, gchild, &tz->trips[i++]);
> >
> > What if a child node was not a valid trip node? It seems
> > thermal_of_populate_trip returns errors you're ignoring.
>
> OK. Adding proper treatment.
>
> >
> >> +
> >> +       /* bind_params */
> >> +       child = of_find_node_by_name(node, "bind_params");
> >
> > Another of_get_child_by_name candidate.
>
> OK.
>
> >
> >> +
> >> +       /* No binding info */
> >> +       if (!child)
> >> +               goto finish;
> >> +
> >> +       tz->zone_params.num_tbps = of_get_child_count(child);
> >> +       tz->bind_params = devm_kzalloc(dev,
> >> +                                      tz->zone_params.num_tbps *
> >> +                                               sizeof(*tz->bind_params),
> >> +                                      GFP_KERNEL);
> >> +       if (!tz->bind_params)
> >> +               return ERR_PTR(-ENOMEM);
> >> +       tz->zone_params.tbp = devm_kzalloc(dev,
> >> +                                          tz->zone_params.num_tbps *
> >> +                                               sizeof(*tz->zone_params.tbp),
> >> +                                          GFP_KERNEL);
> >> +       if (!tz->zone_params.tbp)
> >> +               return ERR_PTR(-ENOMEM);
> >> +       i = 0;
> >> +       for_each_child_of_node(child, gchild) {
> >> +               thermal_of_populate_bind_params(dev, gchild,
> >> +                                               &tz->bind_params[i],
> >> +                                               &tz->zone_params.tbp[i],
> >> +                                               tz->ntrips);
> >
> > Similarly, you're ignoring error conditions here. What if a bind_params
> > node was malformed?
>
>
> Adding proper error handling.
>
> >
> >> +               i++;
> >> +       }
> >> +
> >> +finish:
> >> +       tz->mode = THERMAL_DEVICE_ENABLED;
> >> +
> >> +       return tz;
> >> +}
> >> +
> >> +static
> >> +int thermal_of_match(struct thermal_zone_device *tz,
> >> +                    struct thermal_cooling_device *cdev)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +       int i;
> >> +
> >> +       for (i = 0; i < data->zone_params.num_tbps; i++) {
> >> +               if (!strncmp(data->bind_params[i].cooling_device,
> >> +                            cdev->type,
> >> +                            strlen(data->bind_params[i].cooling_device)) &&
> >> +                   (data->zone_params.tbp[i].trip_mask & (1 << i)))
> >> +                       return 0;
> >> +       }
> >> +
> >> +       return -EINVAL;
> >> +}
> >
> > I'm not keen on binding to cooling devices using a string, as suddenly a
> > name to allow humans to inspect the system is turned into an ABI.
>
> OK. We need to align on how we describe the binding then.

Definitely. We need to ensure that we fully define the relationship too
cooling devices, and don't suddenly expose some piece of internal Linux
infrastructure to the world.

>
> >
> >> +
> >> +static
> >> +int of_thermal_get_temp(struct thermal_zone_device *tz,
> >> +                       unsigned long *temp)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       return data->get_temp(data->devdata, temp);
> >> +}
> >> +
> >> +static
> >> +int of_thermal_get_mode(struct thermal_zone_device *tz,
> >> +                       enum thermal_device_mode *mode)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       *mode = data->mode;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static
> >> +int of_thermal_set_mode(struct thermal_zone_device *tz,
> >> +                       enum thermal_device_mode mode)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       mutex_lock(&tz->lock);
> >> +
> >> +       if (mode == THERMAL_DEVICE_ENABLED)
> >> +               tz->polling_delay = data->polling_delay;
> >> +       else
> >> +               tz->polling_delay = 0;
> >> +
> >> +       mutex_unlock(&tz->lock);
> >> +
> >> +       data->mode = mode;
> >> +       thermal_zone_device_update(tz);
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static
> >> +int of_thermal_get_trip_type(struct thermal_zone_device *tz, int trip,
> >> +                            enum thermal_trip_type *type)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       if (trip >= data->ntrips || trip < 0)
> >> +               return -EDOM;
> >
> > It's far more common to use -EINVAL.
>
> hmm.. OK.
>
> >
> >> +
> >> +       *type = data->trips[trip].type;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static
> >> +int of_thermal_get_trip_temp(struct thermal_zone_device *tz, int trip,
> >> +                            unsigned long *temp)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       if (trip >= data->ntrips || trip < 0)
> >> +               return -EDOM;
> >> +
> >> +       *temp = data->trips[trip].temperature;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static
> >> +int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
> >> +                            unsigned long temp)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       if (trip >= data->ntrips || trip < 0)
> >> +               return -EDOM;
> >> +
> >> +       /* thermal fw should take care of data->mask & (1 << trip) */
> >> +       data->trips[trip].temperature = temp;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static
> >> +int of_thermal_get_trip_hyst(struct thermal_zone_device *tz, int trip,
> >> +                            unsigned long *hyst)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       if (trip >= data->ntrips || trip < 0)
> >> +               return -EDOM;
> >> +
> >> +       *hyst = data->trips[trip].hysteresis;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static
> >> +int of_thermal_set_trip_hyst(struct thermal_zone_device *tz, int trip,
> >> +                            unsigned long hyst)
> >> +{
> >> +       struct __thermal_zone_device *data = tz->devdata;
> >> +
> >> +       if (trip >= data->ntrips || trip < 0)
> >> +               return -EDOM;
> >> +
> >> +       /* thermal fw should take care of data->mask & (1 << trip) */
> >
> > Could you elaborate on that comment?
>
> In  case the trip is not assigned to be writable, this function won't be
> called.

Ah. I misunderstood "fw" as "firmware", rather than "framework", and got
confused. It would be nice if you could expand "fw" to make this
clearer. :)

Thanks,
Mark.

[1] http://en.wikipedia.org/wiki/Black-body_radiation
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Aug. 27, 2013, 6:17 p.m. UTC | #4
On 27-08-2013 12:23, Mark Rutland wrote:
> On Tue, Aug 27, 2013 at 02:44:40PM +0100, Eduardo Valentin wrote:
>> Hello Mark,
>>
>>
>> First of all, thanks for taking the time to review in such level of
>> detail. Lets try to align and have a common understanding. Answers go
>> inline. Please let me know if I missed something or if I made it more
>> confusing :-)
>>
>> On 27-08-2013 06:22, Mark Rutland wrote:
>>> Hi,
>>>
>>> On Sat, Aug 24, 2013 at 12:15:43AM +0100, Eduardo Valentin wrote:
>>>> In order to be able to build thermal policies
>>>> based on generic sensors, like I2C device, that
>>>> can be places in different points on different boards,
>>>> there is a need to have a way to feed board dependent
>>>> data into the thermal framework.
>>>>
>>>> This patch introduces a thermal data parser for device
>>>> tree. The parsed data is used to build thermal zones
>>>> and thermal binding parameters. The output data
>>>> can then be used to deploy thermal policies.
>>>>
>>>> This patch adds also documentation regarding this
>>>> API and how to define tree nodes to use
>>>> this infrastructure.
>>>>
>>>> Cc: Zhang Rui <rui.zhang@intel.com>
>>>> Cc: linux-pm@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
>>>> ---
>>>>  .../devicetree/bindings/thermal/thermal.txt        | 160 +++++++
>>>>  drivers/thermal/Kconfig                            |  13 +
>>>>  drivers/thermal/Makefile                           |   1 +
>>>>  drivers/thermal/thermal_dt.c                       | 473 +++++++++++++++++++++
>>>>  include/dt-bindings/thermal/thermal.h              |  27 ++
>>>>  include/linux/thermal.h                            |   3 +
>>>>  6 files changed, 677 insertions(+)
>>>>  create mode 100644 Documentation/devicetree/bindings/thermal/thermal.txt
>>>>  create mode 100644 drivers/thermal/thermal_dt.c
>>>>  create mode 100644 include/dt-bindings/thermal/thermal.h
>>>>
>>
>>
>> General question on device tree documentation. Assuming there is such an
>> effort to make it Linux independent, is there other places out of Linux
>> tree that these binding must be documented?
> 
> At present, no. There is a plan to move the documentation out of the
> kernel, but I'm not sure on how that's currently progressing. Some
> general cleanup needs to occur before we can actually do that.
> 

OK.

>>
>>
>>>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
>>>> new file mode 100644
>>>> index 0000000..af20ab0
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt
>>>> @@ -0,0 +1,160 @@
>>>> +----------------------------------------
>>>> +Thermal Framework Device Tree descriptor
>>>> +----------------------------------------
>>>> +
>>>> +This file describes how to define a thermal structure using device tree.
>>>> +A thermal structure includes thermal zones and their components, such
>>>> +as name, trip points, polling intervals and cooling devices binding
>>>> +descriptors. A binding descriptor may contain information on how to
>>>> +react, with a cooling stepped action or a weight on a fair share.
>>>> +The target of device tree thermal descriptors is to describe only
>>>> +the hardware thermal aspects, not how the system must control or which
>>>> +algorithm or policy must take place. It follows a description of each
>>>> +type of device tree node.
>>>> +
>>>> +****
>>>> +trip
>>>> +****
>>>> +
>>>> +The trip node is a node to describe a point in the temperature domain
>>>> +in which the system takes an action. This node describes just the point,
>>>> +not the action.
>>>> +
>>>> +A node describing a trip must contain:
>>>> +- temperature: the trip temperature level, in milliCelsius.
>>>> +- hysteresis: a (low) hysteresis value on 'temperature'. This is a relative
>>>> +value, in milliCelsius.
>>>
>>> Presumably this is a single (u32) cell?
>>
>> Yes, both of them above are a single u32.
> 
> Ok. This seems to be the default assumption in many bindings, so I
> shan't argue against it here, but in general we should be as precise as
> possible as to the format of the devicetree.
> 

Yeah, I followed the default assumption. It does not hurt to be
descriptive though.

>>
>>>
>>>> +- type: the trip type. Here is the type mapping:
>>>> +       THERMAL_TRIP_ACTIVE
>>>> +       THERMAL_TRIP_PASSIVE
>>>> +       THERMAL_TRIP_HOT
>>>> +       THERMAL_TRIP_CRITICAL
>>>
>>> What type is this property?
>>>
>>> What are these values? Do you need to refer to a header somewhere?
>>
>> There is a header, introduced in this patch.
>> include/dt-bindings/thermal/thermal.h
>>
>>> Symbolic constants should have an associated value for an ABI.
>>
>> OK. Sounds reasonable to me. Shall we be descriptive and enlist the
>> values in this doc too or just pointing to header is fine?
> 
> I believe that referring to the header is fine, though personally I'd
> prefer if we were explicit in binding docs. I'll leave Stephen to
> comment on what the preferred way of handling header constants with
> binding documents is.
> 

OK. I will do both.

>>
>>>
>>>> +
>>>> +**********
>>>> +bind_param
>>>> +**********
>>>> +
>>>> +The bind_param node is a node to describe how cooling devices get assigned
>>>> +to trip points of the zone. The cooling devices are expected to be loaded
>>>> +in the target system.
>>>> +
>>>> +A node describing a bind_param must contain:
>>>> +- cooling_device: A string with the cooling device name.
>>>
>>> Please use '-' rather than '_' in bindings. That's the normal devicetree
>>> convention, and there's no point gonig against it and causing more
>>> confusion.
>>
>> Agreed.
>>
>>>
>>> What are valid values of this string, and what do they mean?
>>
>> These are the name of the cooling devices to match to.
> 
> Which are what exactly? What constitutes a "cooling device"?
> 
> Is "black-body radiation" [1] a valid cooling device?
> 
> How about "copper heat sink"?
> 

Those are example of cooling devices, yes. But in this context what is
interesting is those devices that allow you to control the amount of
dissipated power, i.e. operating frequency/voltage, lcd brightness,
audio volume, radio transmission power, or if those that you burn power
by turning them on but you remove heat out of your thermal zone, like a
speed controllable fan.

>>
>>>
>>> Why not a phandle + cells binding to cooling devices?
>>>
>>
>> Yeah, this could work too. I am going to check how that would look like
>> and come back to you.
> 
> Ok.
> 
>>
>>>> +- weight: The 'influence' of a particular cooling device on this zone.
>>>> +             This is on a percentage scale. The sum of all these weights
>>>> +             (for a particular zone) cannot exceed 100.
>>>
>>> This is a bit fuzzy, and certainly needs more guidance.
>>
>> OK. This property is used to describe the amount of cooling contribution
>> when activating a cooling device at a certain trip point. I will work on
>> a better guidance.
>>
>>>
>>>> +- trip_mask: This is a bit mask that gives the binding relation between
>>>> +               this thermal zone and cdev, for a particular trip point.
>>>> +               If nth bit is set, then the cdev and thermal zone are bound
>>>> +               for trip point n.
>>>
>>> What type is this?
>>>
>>> The nth bit from which end?
>>>
>>> Could you explain what "bound for trip point n" means?
>>>
>>
>> It is just a way to match a cooling device with a set of trip points.
>> Say you want to match one specific cooling device with trips 0 and 1,
>> you would then create a bind_param with trip_mask = 0x3.
>>
>>> Just because this is a bitmask in Linux does not mean it needs to be so
>>> in the dt.
>>
>> Indeed.
>>
>> In case we work with phandle + cell binging parameters, the trip and
>> weight could be passed as params.
>>
>>       cooling-devices = <&devX tripY weightZ>, <... >;
> 
> Something like that could certainly work.

I see. I will check that option.

> 
>>
>>>
>>>> +- limits: An array integers consisting of 2-tuples items, and each item means
>>>> +  lower and upper state limits, like <min-state max-state>.
>>>> +
>>>> +**************************
>>>> +temperature sensor devices
>>>> +**************************
>>>> +
>>>> +Temperature sensor devices have to be linked to a specific thermal zone.
>>>> +This is done by means of the 'monitored-zones' list.
>>>> +- monitored-zones: A list of thermal zones phandles, identifying which
>>>> +thermal zones the temperature sensor device is used to monitor.
>>>
>>> Why does this need to be described that way around?
>>>
>>> Can the zone not have a link to the sensor(s) it needs?
>>>
>>
>> Do you see any issues with it?
>>
>> Idea is that sensor nodes would list which zones they are used.  Same
>> idea goes to cooling devices. It can be the other way around, yes. It
>> depends on what is the best way to describe these relations. Idea would
>> be so that sensors and cooling devices would be composing a thermal zone.
> 
> I think that having a zone refer to the sensors that monitor it would be
> preferable. Otherwise we're encoding that way a devices output is to be
> consumed on the binding for the device itself, which feels odd (it's
> arguable as to whether it's a property of the device, however).
> 

I got confused now. How adding a monitored-zones property on sensor's
nodes would imply that device's output is to be consumed on the binding
for the device itself?

>>
>>>> +
>>>> +************
>>>> +thermal_zone
>>>> +************
>>>> +
>>>> +The thermal_zone node is the node containing all the required info
>>>> +for describing a thermal zone, including its cdev bindings. The thermal_zone
>>>> +node must contain, apart from its own properties, one node containing
>>>> +trip nodes and one node containing all the zone bind parameters.
>>>> +
>>>> +Required properties:
>>>> +- mask: Bit string: If 'n'th bit is set, then trip point 'n' is writeable.
>>>
>>> What type is a bit string in devicetree? I'm aware of cells and (byte)
>>> strings, but not bit strings.
>>>
>>> What does 'writeable' mean in this context?
>>>
>>> There's presumably a better name than 'mask'.
>>>
>>
>> This is used to describe if trip points are writeable under sysfs. I
>> think this is a OS implementation leak. I will be removing this from
>> device tree and making them RO by default.
> 
> Ok.
> 
>>
>>>> +
>>>> +- passive_delay: number of milliseconds to wait between polls when
>>>> +  performing passive cooling.
>>>> +
>>>> +- polling_delay: number of milliseconds to wait between polls when checking.
>>>
>>> These sound very much like configuration of the driver rather than
>>> hardware description. What if my temperature sensor can generate an
>>> interrupt at some trigger temperature? Why would I need polling times?
>>>
>>> Why does this need to be in the dt at all? Surely the OS can choose some
>>> sensible polling period, possibly dynamically?
>>
>> Well, let me answer this one with another question. How the OS will
>> determine how fast is the temperature rise on a thermal zone described
>> in device tree?
> 
> How will the devicetree writer determine this?
> 
> It's possible for the thermal management code to dynamically adjust the
> rate between some extreme polling intervals based on how big a rise it
> detects between polls or possibly something more complex taking into
> account some metric for load.

It is possible yes, by means of a couple of complex differential
equations implementation.  But still, determining the maximum dT/dt may
require unnecessary polling on low activity use cases. Besides, people
who do transient thermal simulation should have a better estimate on
necessary polling requirement. I don't see why we should not use that
handy info and impose hard software requirements.

> 
>>
>>>
>>>> +
>>>> +- #thermal-cells: An integer that is used to specify how many cells compose
>>>> +  sensor ids.
>>>
>>> I don't see why this is necessary. If the zone itself described its
>>> related sensors rather than the other way around, we wouldn't need it at
>>> all. Most bindings so far have consumers link to their providers rather
>>> than the other way around.
>>>
>>
>> I see.
>>
>>
>>>> +
>>>> +Below is an example:
>>>> +cpu_thermal: thermal_zone {
>>>> +            #thermal-cells = <1>;
>>>> +            mask = <0x03>; /* trips writability */
>>>> +            passive_delay = <250>; /* milliseconds */
>>>> +            polling_delay = <1000>; /* milliseconds */
>>>> +            trips {
>>>> +                    alert@100000{
>>>> +                            temperature = <100000>; /* milliCelsius */
>>>> +                            hysteresis = <0>; /* milliCelsius */
>>>> +                            type = <THERMAL_TRIP_PASSIVE>;
>>>> +                    };
>>>> +                    crit@125000{
>>>> +                            temperature = <125000>; /* milliCelsius */
>>>> +                            hysteresis = <0>; /* milliCelsius */
>>>> +                            type = <THERMAL_TRIP_CRITICAL>;
>>>> +                    };
> 
> I didn't spot this on my first round, but the trips node will need
> #address-cells and #size-cells, and the sub nodes unit-addresses should
> match their reg properties.

You mean, thermal zones must have address and size cells to describe trips?

> 
>>>> +            };
>>>> +            bind_params {
>>>> +                    action@0{
>>>> +                            cooling_device = "thermal-cpufreq";
>>>
>>> NAK. That value was not defined anywhere, and represents Linux
>>> infrastructure rather than hardware.
>>>
>>
>> Yes, this was another leak as discussed above in the upper part of the
>> doc file.
>>
>>> For this case, we could just as easily describe the thermal points, and
>>> if no physical cooling device (e.g. a fan) is present, assume
>>> cpufreq-base cooling.  Other OSs could choose to do otherwise, and we
>>> could change later...
>>>
>>
>> The way other specifications do (aka ACPI) is to have trip types. In
>> case a trip type is passive cooling, it means one is supposed to
>> modulate the dissipated power rather than activating a device to remove
>> heat (active cooling).
>>
>> The part of 'assuming' things that bothers me. I would rather have a way
>> to really say one wants to have passive cooling at a specific trip point.
> 
> I don't see the problem here. If you don't specify a device, your only
> options will be some sort of passive management.

But it does not need to be active all the time. You need to say when (in
temperature domain) that is need. Besides, there are even other means of
managing average power other than cpufreq cooling. idle injections is
another passive cooling device, for instance.

> 
>>
>>
>>>> +                            weight = <100>; /* percentage */
>>>> +                            mask = <0x01>;
>>>
>>> This should presumably be trip-mask (assuming my suggested corrections).
>>>
>>> What is it supposed to mean here?
>>
>> As stated above, it means the trip points that this binding represents.
> 
> If you have more than 32 trip points (which is admittedly unlikely), how
> do you represent them? I'm not sure a mask is the best representation
> here, as it's difficult for a human to match up with a list of entries.
> 
> Additionally, the trip points don't have a logical contiguous numbering
> from zero, so how do the bits in the mask correspond to entries in the
> list? This seems to rely on the order that the subnodes are in, which
> AFAIK is not guaranteed. It would be far nicer to have an explicit
> linkage.


Do you mean some sort of list of trip points property inside binding
params node?

> 
>>
>>>
>>>> +                           limits = <
>>>> +                             /* lower-state            upper-state */
>>>> +                               1                       THERMAL_NO_LIMITS
>>>> +                               THERMAL_NO_LIMITS       THERMAL_NO_LIMITS
>>>> +                           >;
>>>> +                    };
>>>> +            };
>>>> +};
>>>> +
>>>> +sensor: adc@0xFFFF {
>>>> +       ...
>>>> +       monitored-zones = <&cpu_thermal 0>;
>>>> +};
>>>> +
>>>> +cpu@0 {
>>>> +       ...
>>>> +       cooling-zones = <&cpu_thermal>;
>>>> +};
>>>> +
>>>> +In the example above, the ADC sensor at address 0xFFFF is used to monitor
>>>> +the zone 'cpu_thermal' using its the sensor 0. The cpu@0 device is also linked
>>>> +to the same thermal zone 'cpu_thermal' as a cooling device.
>>>
>>> Huh? That seems to imply the the '0' in '<&cpu_thermal 0>' on the sensor
>>> node describes a property of the sensor rather than the thermal node,
>>> unless I've misunderstood?
>>>
>>
>> It is essentially which sensor id is used to monitor a specific zone. In
>> case of sensor IPs that feature several internal sensors, one needs to
>> have a way to describe which sensor monitors which zone. Example of this
>> type of IPs: lm90 (up to 3 sensors) and TI bandgap IP (can contain up to
>> 5 internal sensors). The probe happens for the IP device and not for the
>> internal sensors.
> 
> Which is exactly why this should be the opposite way round, with thermal
> zones referring to sensors in this fashion.
> 
> The way you're suggesting means that the phandle+cells specifier refers
> half to the consumer and half to the producer. Having thermal zones
> refer to sensors would make this refer solely to the producer (the
> sensor).
> 

Are you suggesting having inside thermal zone node:

	sensors = <&sensor0 Y>, ...., <&sensorN Z> ;

Where sensor0 and sensorN are sensor IPs nodes and Y and Z represent
their internal sensor instances?

>>
>>
>>>> +
>>>> +***************
>>>> +cpufreq-cooling
>>>> +***************
>>>> +
>>>> +The generic cpu cooling (freq clipping) can be loaded by the
>>>> +generic cpufreq-cpu0 driver in case the device tree node informs
>>>> +this need.
>>>> +
>>>> +In order to load the cpufreq cooling device, it is possible to
>>>> +inform that the cpu needs cooling by adding the list of thermal zones
>>>> +in the 'cooling-zones' property at the cpu0 phandle.
>>>
>>> This should be reworded so as to describe the binding rather than the
>>> Linux behaviour based on the binding.
>>
>> OK.
>>
>>>
>>>> +
>>>> +Example:
>>>> +       cpus {
>>>> +               cpu@0 {
>>>> +                       operating-points = <
>>>> +                               /* kHz    uV */
>>>> +                               800000  1313000
>>>> +                               1008000 1375000
>>>> +                       >;
>>>> +                       cooling-zones = <&cpu_thermal>;
>>>> +               };
>>>> +       ...
>>>> +       };
>>>> +
>>>> +The above will cause the cpufreq-cpu0 driver to understand that
>>>> +the cpu will need passive cooling and while probing that node it will
>>>> +also load the cpufreq cpu cooling device in that system.
>>>
>>> The above describe that the CPU is part of a cooling (thermal?) zone and
>>> requires cooling.
>>
>> That says the cpu participates in a thermal zone as a cooling
>> device/mechanism, thus cpu cooling (passive cooling) is required in
>> target system.
> 
> I think you need the rest of the nodes in this example. You don't show
> whether or not there's a cooling-device, so it's not clear that the CPU
> is required to be passively cooled...

The property states the cpu is used as cooling device for those thermal
zones in the list 'cooling-zones'

> 
>>
>>>
>>>> +
>>>> +However, be advised that this flag will not describe what needs to
>>>> +be done or how the cpufreq cooling device will be used. In order to
>>>> +accomplish this, one needs to write a phandle for a thermal zone, as
>>>> +described in the section 'thermal_zone'.
>>>
>>> The above already has a phandle to a thermal (cooling?) zone.
>>>
>>> Please choose either cooling zone or thermal zone, having the two names
>>> makes this more confusing than necessary.
>>
>> I see. Idea is to have thermal zone as a node. cooling-zone is just a
>> list of thermal zones that the referred device will be used as cooling
>> device.
> 
> While I can see a distinction, the nomenclature is horrendously
> confusing.


OK. I will try harder to improve it.

> 
>>
>>>
>>>> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
>>>> index 7fb16bc..753f0dc 100644
>>>> --- a/drivers/thermal/Kconfig
>>>> +++ b/drivers/thermal/Kconfig
>>>> @@ -29,6 +29,19 @@ config THERMAL_HWMON
>>>>           Say 'Y' here if you want all thermal sensors to
>>>>           have hwmon sysfs interface too.
>>>>
>>>> +config THERMAL_OF
>>>> +       bool
>>>> +       prompt "APIs to parse thermal data out of device tree"
>>>> +       depends on OF
>>>> +       default y
>>>> +       help
>>>> +         This options provides helpers to add the support to
>>>> +         read and parse thermal data definitions out of the
>>>> +         device tree blob.
>>>> +
>>>> +         Say 'Y' here if you need to build thermal infrastructure
>>>> +         based on device tree.
>>>> +
>>>>  choice
>>>>         prompt "Default Thermal governor"
>>>>         default THERMAL_DEFAULT_GOV_STEP_WISE
>>>> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
>>>> index 24cb894..eedb273 100644
>>>> --- a/drivers/thermal/Makefile
>>>> +++ b/drivers/thermal/Makefile
>>>> @@ -7,6 +7,7 @@ thermal_sys-y                   += thermal_core.o
>>>>
>>>>  # interface to/from other layers providing sensors
>>>>  thermal_sys-$(CONFIG_THERMAL_HWMON)            += thermal_hwmon.o
>>>> +thermal_sys-$(CONFIG_THERMAL_OF)               += thermal_dt.o
>>>>
>>>>  # governors
>>>>  thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE)   += fair_share.o
>>>> diff --git a/drivers/thermal/thermal_dt.c b/drivers/thermal/thermal_dt.c
>>>> new file mode 100644
>>>> index 0000000..cc35485
>>>> --- /dev/null
>>>> +++ b/drivers/thermal/thermal_dt.c
>>>> @@ -0,0 +1,473 @@
>>>> +/*
>>>> + *  thermal_dt.c - Generic Thermal Management device tree support.
>>>> + *
>>>> + *  Copyright (C) 2013 Texas Instruments
>>>> + *  Copyright (C) 2013 Eduardo Valentin <eduardo.valentin@ti.com>
>>>> + *  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> + *
>>>> + *  This program is free software; you can redistribute it and/or modify
>>>> + *  it under the terms of the GNU General Public License as published by
>>>> + *  the Free Software Foundation; version 2 of the License.
>>>> + *
>>>> + *  This program is distributed in the hope that it will be useful, but
>>>> + *  WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> + *  General Public License for more details.
>>>> + *
>>>> + *  You should have received a copy of the GNU General Public License along
>>>> + *  with this program; if not, write to the Free Software Foundation, Inc.,
>>>> + *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
>>>> + *
>>>> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> + */
>>>> +#include <linux/thermal.h>
>>>> +#include <linux/slab.h>
>>>> +#include <linux/types.h>
>>>> +#include <linux/of_device.h>
>>>> +#include <linux/of_platform.h>
>>>> +#include <linux/err.h>
>>>> +#include <linux/export.h>
>>>> +#include <linux/string.h>
>>>> +
>>>> +struct __thermal_bind_params {
>>>> +       char cooling_device[THERMAL_NAME_LENGTH];
>>>> +};
>>>> +
>>>> +static
>>>> +int thermal_of_match(struct thermal_zone_device *tz,
>>>> +                    struct thermal_cooling_device *cdev);
>>>> +
>>>> +static int thermal_of_populate_bind_params(struct device *dev,
>>>> +                                          struct device_node *node,
>>>> +                                          struct __thermal_bind_params *__tbp,
>>>> +                                          struct thermal_bind_params *tbp,
>>>> +                                          int ntrips)
>>>> +{
>>>> +       const char *cdev_name;
>>>> +       u32 *limits;
>>>> +       int ret;
>>>> +       u32 prop;
>>>> +
>>>> +       ret = of_property_read_u32(node, "weight", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing weight property\n");
>>>> +               return ret;
>>>> +       }
>>>> +       tbp->weight = prop;
>>>> +
>>>> +       ret = of_property_read_u32(node, "mask", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing mask property\n");
>>>> +               return ret;
>>>> +       }
>>>> +       tbp->trip_mask = prop;
>>>> +
>>>> +       /* this will allow us to bind with cooling devices */
>>>> +       tbp->match = thermal_of_match;
>>>> +
>>>> +       ret = of_property_read_string(node, "cooling_device", &cdev_name);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing cooling_device property\n");
>>>> +               return ret;
>>>> +       }
>>>> +       strncpy(__tbp->cooling_device, cdev_name,
>>>> +               sizeof(__tbp->cooling_device));
>>>> +
>>>> +       limits = kzalloc(ntrips * sizeof(*limits) * 2, GFP_KERNEL);
>>>
>>> Why do you use kzalloc here, but devm_kzalloc elsewhere? That'll lead to
>>> problems as the driver gets refactored over time.
>>
>> Because this memory is used only  within this function and thus released
>> at the bottom of it.
>>
>>>
>>>> +       if (!limits) {
>>>> +               dev_err(dev, "not enough mem for reading limits\n");
>>>> +               return -ENOMEM;
>>>> +       }
>>>> +       ret = of_property_read_u32_array(node, "limits", limits, 2 * ntrips);
>>>> +       if (!ret) {
>>>> +               int i = ntrips;
>>>> +
>>>> +               tbp->binding_limits = devm_kzalloc(dev,
>>>> +                                                  ntrips * 2 *
>>>> +                                                  sizeof(*tbp->binding_limits),
>>>> +                                                  GFP_KERNEL);
>>>> +               if (!tbp->binding_limits) {
>>>> +                       dev_err(dev, "not enough mem for storing limits\n");
>>>> +                       return -ENOMEM;
>>>> +               }
>>>> +               while (i--)
>>>> +                       tbp->binding_limits[i] = limits[i];
>>>> +       }
>>>> +       kfree(limits);
>>
>> here.
> 
> I saw that, I did say that this may lead to problems in future if the
> function's reorganised, not now. :)

Ahh.. OK. I just didnt want o let the memory be managed and freed only
when the struct device is release.

> 
>>
>>>> +
>>>> +       return 0;
>>>
>>> What if you failed to read the array? Surely you should return an error?
>>
>> Attempting to make this property not mandatory.
> 
> That wasn't clear from the binding, and now I've re-read it, I'm still
> not entirely clear on what the limits represent, and why you may have
> multiple sets of pairs.

Because you may assign cooling state limits to specific trip points,
e.g., a fan with 5 different speeds would operate from speed 0 to 2 in
the first trip point, while from speed 3 to 4 on last trip point.

> 
>>
>>>
>>>> +}
>>>> +
>>>> +struct __thermal_trip {
>>>> +       unsigned long int temperature;
>>>> +       unsigned long int hysteresis;
>>>> +       enum thermal_trip_type type;
>>>> +};
>>>> +
>>>> +static
>>>> +int thermal_of_populate_trip(struct device *dev,
>>>> +                            struct device_node *node,
>>>> +                            struct __thermal_trip *trip)
>>>> +{
>>>> +       int ret;
>>>> +       int prop;
>>>> +
>>>> +       ret = of_property_read_u32(node, "temperature", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing temperature property\n");
>>>> +               return ret;
>>>> +       }
>>>> +       trip->temperature = prop;
>>>> +
>>>> +       ret = of_property_read_u32(node, "hysteresis", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing hysteresis property\n");
>>>> +               return ret;
>>>> +       }
>>>> +       trip->hysteresis = prop;
>>>> +
>>>> +       ret = of_property_read_u32(node, "type", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing type property\n");
>>>> +               return ret;
>>>> +       }
>>>> +       trip->type = prop;
>>>
>>> This will require sanity checking.
>>
>> OK.
>>
>>>
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +struct __thermal_zone_device {
>>>> +       enum thermal_device_mode mode;
>>>> +       int passive_delay;
>>>> +       int polling_delay;
>>>> +       int mask;
>>>> +       int ntrips;
>>>> +       char type[THERMAL_NAME_LENGTH];
>>>> +       struct __thermal_trip *trips;
>>>> +       struct __thermal_bind_params *bind_params;
>>>> +       struct thermal_bind_params *tbps;
>>>> +       struct thermal_zone_params zone_params;
>>>> +       int (*get_temp)(void *, unsigned long *);
>>>> +       void *devdata;
>>>> +};
>>>> +
>>>> +static struct __thermal_zone_device *
>>>> +thermal_of_build_thermal_zone(struct device *dev, struct device_node *node)
>>>> +{
>>>> +       struct device_node *child, *gchild;
>>>> +       struct __thermal_zone_device *tz;
>>>> +       int ret, i;
>>>> +       u32 prop;
>>>> +
>>>> +       if (!node) {
>>>> +               dev_err(dev, "no thermal zone node\n");
>>>> +               return ERR_PTR(-EINVAL);
>>>> +       }
>>>> +
>>>> +       tz = devm_kzalloc(dev, sizeof(*tz), GFP_KERNEL);
>>>> +       if (!tz) {
>>>> +               dev_err(dev, "not enough memory for thermal of zone\n");
>>>> +               return ERR_PTR(-ENOMEM);
>>>> +       }
>>>> +
>>>> +       ret = of_property_read_u32(node, "passive_delay", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing passive_delay property\n");
>>>> +               return ERR_PTR(ret);
>>>> +       }
>>>> +       tz->passive_delay = prop;
>>>> +
>>>> +       ret = of_property_read_u32(node, "polling_delay", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing polling_delay property\n");
>>>> +               return ERR_PTR(ret);
>>>> +       }
>>>> +       tz->polling_delay = prop;
>>>> +
>>>> +       ret = of_property_read_u32(node, "mask", &prop);
>>>> +       if (ret < 0) {
>>>> +               dev_err(dev, "missing mask property\n");
>>>> +               return ERR_PTR(ret);
>>>> +       }
>>>> +       tz->mask = prop;
>>>> +
>>>> +       /* assume node name as our zone type */
>>>> +       strncpy(tz->type, node->name, sizeof(tz->type));
>>>> +
>>>> +       /* default policy */
>>>> +       strncpy(tz->zone_params.governor_name, "step_wise",
>>>> +               sizeof(tz->zone_params.governor_name));
>>>> +
>>>> +       /* trips */
>>>> +       child = of_find_node_by_name(node, "trips");
>>>
>>> That might not be a child node, as of_find_node_by_name walks over the
>>> list of all nodes. Use of_get_child_by_name.
>>
>> Right! This is a bug.
>>
>>>
>>>> +
>>>> +       /* No trips provided */
>>>> +       if (!child)
>>>> +               goto finish;
>>>> +
>>>> +       tz->ntrips = of_get_child_count(child);
>>>> +       tz->trips = devm_kzalloc(dev, tz->ntrips * sizeof(*tz->trips),
>>>> +                                GFP_KERNEL);
>>>> +       if (!tz->trips)
>>>> +               return ERR_PTR(-ENOMEM);
>>>> +       i = 0;
>>>> +       for_each_child_of_node(child, gchild)
>>>> +               thermal_of_populate_trip(dev, gchild, &tz->trips[i++]);
>>>
>>> What if a child node was not a valid trip node? It seems
>>> thermal_of_populate_trip returns errors you're ignoring.
>>
>> OK. Adding proper treatment.
>>
>>>
>>>> +
>>>> +       /* bind_params */
>>>> +       child = of_find_node_by_name(node, "bind_params");
>>>
>>> Another of_get_child_by_name candidate.
>>
>> OK.
>>
>>>
>>>> +
>>>> +       /* No binding info */
>>>> +       if (!child)
>>>> +               goto finish;
>>>> +
>>>> +       tz->zone_params.num_tbps = of_get_child_count(child);
>>>> +       tz->bind_params = devm_kzalloc(dev,
>>>> +                                      tz->zone_params.num_tbps *
>>>> +                                               sizeof(*tz->bind_params),
>>>> +                                      GFP_KERNEL);
>>>> +       if (!tz->bind_params)
>>>> +               return ERR_PTR(-ENOMEM);
>>>> +       tz->zone_params.tbp = devm_kzalloc(dev,
>>>> +                                          tz->zone_params.num_tbps *
>>>> +                                               sizeof(*tz->zone_params.tbp),
>>>> +                                          GFP_KERNEL);
>>>> +       if (!tz->zone_params.tbp)
>>>> +               return ERR_PTR(-ENOMEM);
>>>> +       i = 0;
>>>> +       for_each_child_of_node(child, gchild) {
>>>> +               thermal_of_populate_bind_params(dev, gchild,
>>>> +                                               &tz->bind_params[i],
>>>> +                                               &tz->zone_params.tbp[i],
>>>> +                                               tz->ntrips);
>>>
>>> Similarly, you're ignoring error conditions here. What if a bind_params
>>> node was malformed?
>>
>>
>> Adding proper error handling.
>>
>>>
>>>> +               i++;
>>>> +       }
>>>> +
>>>> +finish:
>>>> +       tz->mode = THERMAL_DEVICE_ENABLED;
>>>> +
>>>> +       return tz;
>>>> +}
>>>> +
>>>> +static
>>>> +int thermal_of_match(struct thermal_zone_device *tz,
>>>> +                    struct thermal_cooling_device *cdev)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +       int i;
>>>> +
>>>> +       for (i = 0; i < data->zone_params.num_tbps; i++) {
>>>> +               if (!strncmp(data->bind_params[i].cooling_device,
>>>> +                            cdev->type,
>>>> +                            strlen(data->bind_params[i].cooling_device)) &&
>>>> +                   (data->zone_params.tbp[i].trip_mask & (1 << i)))
>>>> +                       return 0;
>>>> +       }
>>>> +
>>>> +       return -EINVAL;
>>>> +}
>>>
>>> I'm not keen on binding to cooling devices using a string, as suddenly a
>>> name to allow humans to inspect the system is turned into an ABI.
>>
>> OK. We need to align on how we describe the binding then.
> 
> Definitely. We need to ensure that we fully define the relationship too
> cooling devices, and don't suddenly expose some piece of internal Linux
> infrastructure to the world.
> 

Yeah, agreed, any suggestions?

>>
>>>
>>>> +
>>>> +static
>>>> +int of_thermal_get_temp(struct thermal_zone_device *tz,
>>>> +                       unsigned long *temp)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       return data->get_temp(data->devdata, temp);
>>>> +}
>>>> +
>>>> +static
>>>> +int of_thermal_get_mode(struct thermal_zone_device *tz,
>>>> +                       enum thermal_device_mode *mode)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       *mode = data->mode;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static
>>>> +int of_thermal_set_mode(struct thermal_zone_device *tz,
>>>> +                       enum thermal_device_mode mode)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       mutex_lock(&tz->lock);
>>>> +
>>>> +       if (mode == THERMAL_DEVICE_ENABLED)
>>>> +               tz->polling_delay = data->polling_delay;
>>>> +       else
>>>> +               tz->polling_delay = 0;
>>>> +
>>>> +       mutex_unlock(&tz->lock);
>>>> +
>>>> +       data->mode = mode;
>>>> +       thermal_zone_device_update(tz);
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static
>>>> +int of_thermal_get_trip_type(struct thermal_zone_device *tz, int trip,
>>>> +                            enum thermal_trip_type *type)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       if (trip >= data->ntrips || trip < 0)
>>>> +               return -EDOM;
>>>
>>> It's far more common to use -EINVAL.
>>
>> hmm.. OK.
>>
>>>
>>>> +
>>>> +       *type = data->trips[trip].type;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static
>>>> +int of_thermal_get_trip_temp(struct thermal_zone_device *tz, int trip,
>>>> +                            unsigned long *temp)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       if (trip >= data->ntrips || trip < 0)
>>>> +               return -EDOM;
>>>> +
>>>> +       *temp = data->trips[trip].temperature;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static
>>>> +int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
>>>> +                            unsigned long temp)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       if (trip >= data->ntrips || trip < 0)
>>>> +               return -EDOM;
>>>> +
>>>> +       /* thermal fw should take care of data->mask & (1 << trip) */
>>>> +       data->trips[trip].temperature = temp;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static
>>>> +int of_thermal_get_trip_hyst(struct thermal_zone_device *tz, int trip,
>>>> +                            unsigned long *hyst)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       if (trip >= data->ntrips || trip < 0)
>>>> +               return -EDOM;
>>>> +
>>>> +       *hyst = data->trips[trip].hysteresis;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static
>>>> +int of_thermal_set_trip_hyst(struct thermal_zone_device *tz, int trip,
>>>> +                            unsigned long hyst)
>>>> +{
>>>> +       struct __thermal_zone_device *data = tz->devdata;
>>>> +
>>>> +       if (trip >= data->ntrips || trip < 0)
>>>> +               return -EDOM;
>>>> +
>>>> +       /* thermal fw should take care of data->mask & (1 << trip) */
>>>
>>> Could you elaborate on that comment?
>>
>> In  case the trip is not assigned to be writable, this function won't be
>> called.
> 
> Ah. I misunderstood "fw" as "firmware", rather than "framework", and got
> confused. It would be nice if you could expand "fw" to make this
> clearer. :)

Sorry for that :-)

> 
> Thanks,
> Mark.
> 
> [1] http://en.wikipedia.org/wiki/Black-body_radiation
> 
>
Eduardo Valentin Aug. 29, 2013, 11:19 p.m. UTC | #5
Mark, Pawel and Stephen,


On 27-08-2013 14:17, Eduardo Valentin wrote:
> On 27-08-2013 12:23, Mark Rutland wrote:
>> On Tue, Aug 27, 2013 at 02:44:40PM +0100, Eduardo Valentin wrote:
>>> Hello Mark,

<cut>

I believe now we need to align on thermal bindings, before I propose
next version of this series. So, following the discussions on this
thread, I believe a typical CPU thermal zone would look like the following:
+ #include <dt-bindings/thermal/thermal.h>
+
+ cpu_thermal: cpu_thermal {
+         passive-delay = <250>; /* milliseconds */
+         polling-delay = <1000>; /* milliseconds */
+                 /*
+                  * sensor       ID
+                  */
+         sensors = <&bandgap0     0>;
+                         /*
+                          * cooling
+                          * device       Usage   Trip            lower
bound             upper bound
+                          */
+         cooling-devices = <&cpu0        100     &cpu-alert
THERMAL_NO_LIMITS        THERMAL_NO_LIMITS>;
+         trips {
+                 cpu-alert: cpu-alert {
+                         temperature = <100000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_PASSIVE>;
+                 };
+                 cpu-crit: cpu-crit {
+                         temperature = <125000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_CRITICAL>;
+                 };
+         };
+ };

*Note: THERMAL_NO_LIMIT means, it will be using the cooling device
minimum and maximum limits.

Couple of comments.
1. I am keeping the pooling intervals. A possible alternative way to
describe that would be specifying the maximum dT/dt. Essentially because
I am still convinced that this is part of hw specs.

2. The above follows your suggestion to use consumer pointing to
producers. Although, I still need to figure out how this could be
implemented for Linux. But I think that is another story, at least for
now. We need to first align on the DT binding itself.

3. The link from cooling device specification to trip points, from my
perspective, should be enough, and thus we shall not need the thermal
cells and size for trip points, as you proposed. Let me know what you think.

Below is another example, on a more complex scenario, board specific
case (hypothetical, just for exemplification):

+ #include <dt-bindings/thermal/thermal.h>
+
+ board_thermal: board_thermal {
+         passive-delay = <1000>; /* milliseconds */
+         polling-delay = <2500>; /* milliseconds */
+                 /*
+                  * sensor       ID
+                  */
+         sensors = <&adc-dummy     0>,
+		    <&adc-dummy     1>,
+		    <&adc-dymmy     2>;
+                         /*
+                          * cooling
+                          * device       Usage   Trip		lower	upper
+                          */
+         cooling-devices = <&cpu0       75     &cpu-trip	0      2>,
+			    <&gpu0       40     &gpu-trip	0      2>;
+			    <&lcd0       25     &lcd-trip	5      10>;
+         trips {
+                 cpu-trip: cpu-trip {
+                         temperature = <60000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_PASSIVE>;
+                 };
+                 gpu-trip: gpu-trip {
+                         temperature = <55000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_PASSIVE>;
+                 }
+                 lcd-trip: lcp-trip {
+                         temperature = <53000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_PASSIVE>;
+                 };
+                 crit-trip: crit-trip {
+                         temperature = <68000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_CRITICAL>;
+                 };
+         };
+ };

Now writing the above example, one might want to have a way to say the
sensor correlation equation.

Another example:
+ #include <dt-bindings/thermal/thermal.h>
+
+ dsp_thermal: dsp_thermal {
+         passive-delay = <250>; /* milliseconds */
+         polling-delay = <1000>; /* milliseconds */
+                 /*
+                  * sensor       ID
+                  */
+         sensors = <&bandgap0     1>;
+                         /*
+                          * cooling
+                          * device       Usage   Trip            lower
bound             upper bound
+                          */
+         cooling-devices = <&dsp0        100     &dsp-alert
THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
+         trips {
+                 dsp-alert: dsp-alert {
+                         temperature = <100000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_PASSIVE>;
+                 };
+                 dsp-crit: cdsp-crit {
+                         temperature = <125000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_CRITICAL>;
+                 };
+         };
+ };
+
+ gpu_thermal: gpu_thermal {
+         passive-delay = <500>; /* milliseconds */
+         polling-delay = <1000>; /* milliseconds */
+                 /*
+                  * sensor       ID
+                  */
+         sensors = <&bandgap0     2>;
+                         /*
+                          * cooling
+                          * device       Usage   Trip            lower
bound             upper bound
+                          */
+         cooling-devices = <&gpu0        100     &gpu-alert
THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
+         trips {
+                 gpu-alert: gpu-alert {
+                         temperature = <100000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_PASSIVE>;
+                 };
+                 gpu-crit: gpu-crit {
+                         temperature = <125000>; /* milliCelsius */
+                         hysteresis = <2000>; /* milliCelsius */
+                         type = <THERMAL_TRIP_CRITICAL>;
+                 };
+         };
+ };


Wei, I think the above would cover for the IPs with multiple internal
sensors cases you were looking for, right?

Durga, please also check if I am missing something to cover for your
plans to, in case you will be using DT to describe your HW.

Anyways, Mark and Pawel, let me know if I missed something from our
discussion. I believe the above bindings would look more like regular
standard DT bindings. And also they should be now slim enough, without
Linux implementation details and also without policies.

> 
>
Wei Ni Sept. 2, 2013, 8:14 a.m. UTC | #6
On 08/30/2013 07:19 AM, Eduardo Valentin wrote:
> * PGP Signed: 08/29/2013 at 04:19:46 PM
> 
> Mark, Pawel and Stephen,
> 
> 
> On 27-08-2013 14:17, Eduardo Valentin wrote:
>> On 27-08-2013 12:23, Mark Rutland wrote:
>>> On Tue, Aug 27, 2013 at 02:44:40PM +0100, Eduardo Valentin wrote:
>>>> Hello Mark,
> 
> <cut>
> 
> I believe now we need to align on thermal bindings, before I propose
> next version of this series. So, following the discussions on this
> thread, I believe a typical CPU thermal zone would look like the following:
> + #include <dt-bindings/thermal/thermal.h>
> +
> + cpu_thermal: cpu_thermal {
> +         passive-delay = <250>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     0>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&cpu0        100     &cpu-alert
> THERMAL_NO_LIMITS        THERMAL_NO_LIMITS>;
> +         trips {
> +                 cpu-alert: cpu-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 cpu-crit: cpu-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> *Note: THERMAL_NO_LIMIT means, it will be using the cooling device
> minimum and maximum limits.
> 
> Couple of comments.
> 1. I am keeping the pooling intervals. A possible alternative way to
> describe that would be specifying the maximum dT/dt. Essentially because
> I am still convinced that this is part of hw specs.
> 
> 2. The above follows your suggestion to use consumer pointing to
> producers. Although, I still need to figure out how this could be
> implemented for Linux. But I think that is another story, at least for
> now. We need to first align on the DT binding itself.
> 
> 3. The link from cooling device specification to trip points, from my
> perspective, should be enough, and thus we shall not need the thermal
> cells and size for trip points, as you proposed. Let me know what you think.
> 
> Below is another example, on a more complex scenario, board specific
> case (hypothetical, just for exemplification):
> 
> + #include <dt-bindings/thermal/thermal.h>
> +
> + board_thermal: board_thermal {
> +         passive-delay = <1000>; /* milliseconds */
> +         polling-delay = <2500>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&adc-dummy     0>,
> +		    <&adc-dummy     1>,
> +		    <&adc-dymmy     2>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip		lower	upper
> +                          */
> +         cooling-devices = <&cpu0       75     &cpu-trip	0      2>,
> +			    <&gpu0       40     &gpu-trip	0      2>;
> +			    <&lcd0       25     &lcd-trip	5      10>;
> +         trips {
> +                 cpu-trip: cpu-trip {
> +                         temperature = <60000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 gpu-trip: gpu-trip {
> +                         temperature = <55000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 }
> +                 lcd-trip: lcp-trip {
> +                         temperature = <53000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 crit-trip: crit-trip {
> +                         temperature = <68000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> Now writing the above example, one might want to have a way to say the
> sensor correlation equation.
> 
> Another example:
> + #include <dt-bindings/thermal/thermal.h>
> +
> + dsp_thermal: dsp_thermal {
> +         passive-delay = <250>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     1>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&dsp0        100     &dsp-alert
> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
> +         trips {
> +                 dsp-alert: dsp-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 dsp-crit: cdsp-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> +
> + gpu_thermal: gpu_thermal {
> +         passive-delay = <500>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     2>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&gpu0        100     &gpu-alert
> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
> +         trips {
> +                 gpu-alert: gpu-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 gpu-crit: gpu-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> 
> Wei, I think the above would cover for the IPs with multiple internal
> sensors cases you were looking for, right?
Yes, I think so, and this series is what I'm looking for, thanks again.

> 
> Durga, please also check if I am missing something to cover for your
> plans to, in case you will be using DT to describe your HW.
> 
> Anyways, Mark and Pawel, let me know if I missed something from our
> discussion. I believe the above bindings would look more like regular
> standard DT bindings. And also they should be now slim enough, without
> Linux implementation details and also without policies.
> 
>>
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Sept. 2, 2013, 4:28 p.m. UTC | #7
On 29-08-2013 19:19, Eduardo Valentin wrote:
> Mark, Pawel and Stephen,
> 
> 
> On 27-08-2013 14:17, Eduardo Valentin wrote:
>> On 27-08-2013 12:23, Mark Rutland wrote:
>>> On Tue, Aug 27, 2013 at 02:44:40PM +0100, Eduardo Valentin wrote:
>>>> Hello Mark,
> 
> <cut>
> 
> I believe now we need to align on thermal bindings, before I propose
> next version of this series. So, following the discussions on this
> thread, I believe a typical CPU thermal zone would look like the following:
> + #include <dt-bindings/thermal/thermal.h>
> +
> + cpu_thermal: cpu_thermal {
> +         passive-delay = <250>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     0>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&cpu0        100     &cpu-alert
> THERMAL_NO_LIMITS        THERMAL_NO_LIMITS>;
> +         trips {
> +                 cpu-alert: cpu-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 cpu-crit: cpu-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> *Note: THERMAL_NO_LIMIT means, it will be using the cooling device
> minimum and maximum limits.
> 
> Couple of comments.
> 1. I am keeping the pooling intervals. A possible alternative way to
> describe that would be specifying the maximum dT/dt. Essentially because
> I am still convinced that this is part of hw specs.
> 
> 2. The above follows your suggestion to use consumer pointing to
> producers. Although, I still need to figure out how this could be
> implemented for Linux. But I think that is another story, at least for
> now. We need to first align on the DT binding itself.
> 
> 3. The link from cooling device specification to trip points, from my
> perspective, should be enough, and thus we shall not need the thermal
> cells and size for trip points, as you proposed. Let me know what you think.
> 
> Below is another example, on a more complex scenario, board specific
> case (hypothetical, just for exemplification):
> 
> + #include <dt-bindings/thermal/thermal.h>
> +
> + board_thermal: board_thermal {
> +         passive-delay = <1000>; /* milliseconds */
> +         polling-delay = <2500>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&adc-dummy     0>,
> +		    <&adc-dummy     1>,
> +		    <&adc-dymmy     2>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip		lower	upper
> +                          */
> +         cooling-devices = <&cpu0       75     &cpu-trip	0      2>,
> +			    <&gpu0       40     &gpu-trip	0      2>;
> +			    <&lcd0       25     &lcd-trip	5      10>;
> +         trips {
> +                 cpu-trip: cpu-trip {
> +                         temperature = <60000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 gpu-trip: gpu-trip {
> +                         temperature = <55000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 }
> +                 lcd-trip: lcp-trip {
> +                         temperature = <53000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 crit-trip: crit-trip {
> +                         temperature = <68000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> Now writing the above example, one might want to have a way to say the
> sensor correlation equation.
> 
> Another example:
> + #include <dt-bindings/thermal/thermal.h>
> +
> + dsp_thermal: dsp_thermal {
> +         passive-delay = <250>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     1>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&dsp0        100     &dsp-alert
> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
> +         trips {
> +                 dsp-alert: dsp-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 dsp-crit: cdsp-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> +
> + gpu_thermal: gpu_thermal {
> +         passive-delay = <500>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     2>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&gpu0        100     &gpu-alert
> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
> +         trips {
> +                 gpu-alert: gpu-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 gpu-crit: gpu-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> 
> Wei, I think the above would cover for the IPs with multiple internal
> sensors cases you were looking for, right?
> 
> Durga, please also check if I am missing something to cover for your
> plans to, in case you will be using DT to describe your HW.
> 
> Anyways, Mark and Pawel, let me know if I missed something from our
> discussion. I believe the above bindings would look more like regular
> standard DT bindings. And also they should be now slim enough, without
> Linux implementation details and also without policies.

Any objections on the above binding proposal?

> 
>>
>>
> 
>
Mark Rutland Sept. 3, 2013, 1:15 p.m. UTC | #8
On Fri, Aug 30, 2013 at 12:19:43AM +0100, Eduardo Valentin wrote:
> Mark, Pawel and Stephen,
> 
> 
> On 27-08-2013 14:17, Eduardo Valentin wrote:
> > On 27-08-2013 12:23, Mark Rutland wrote:
> >> On Tue, Aug 27, 2013 at 02:44:40PM +0100, Eduardo Valentin wrote:
> >>> Hello Mark,

Hi, sorry for the delay.

> 
> <cut>
> 
> I believe now we need to align on thermal bindings, before I propose
> next version of this series. So, following the discussions on this
> thread, I believe a typical CPU thermal zone would look like the following:
> + #include <dt-bindings/thermal/thermal.h>
> +
> + cpu_thermal: cpu_thermal {
> +         passive-delay = <250>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */

I'm still not keen on placing these intervals in the dt. Why can the
kernel not choose sensible values?

> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     0>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&cpu0        100     &cpu-alert
> THERMAL_NO_LIMITS        THERMAL_NO_LIMITS>;

I'm not sure it makes sense to group this data within a property. We
might need to extend this in future and it might not be possible to
extend this unambiguously. It may make more sense as a sub-node or
collection of properties.

> +         trips {
> +                 cpu-alert: cpu-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 cpu-crit: cpu-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };

How do the upper and lower bounds in the cooling-devices property
interact with temperatures described in the trips node?

> 
> *Note: THERMAL_NO_LIMIT means, it will be using the cooling device
> minimum and maximum limits.

I'm confused what are the cooling device min and max limits? Where are
they defined if not in the lower bound and upper bound entries of the
cooling-devices property?

> 
> Couple of comments.
> 1. I am keeping the pooling intervals. A possible alternative way to
> describe that would be specifying the maximum dT/dt. Essentially because
> I am still convinced that this is part of hw specs.

I still don't see why it's not possible to do this dynamically. I am not
entirely convinced this is part of the hw spec.

> 
> 2. The above follows your suggestion to use consumer pointing to
> producers. Although, I still need to figure out how this could be
> implemented for Linux. But I think that is another story, at least for
> now. We need to first align on the DT binding itself.

I assume you mean the sensors property? That looks fine to me, though we
may need to specify the way sensors relate to temperatures in trip
points and cooling device properties -- do those temperatures correspond
to an average of sensors readings, min/max, etc?

> 
> 3. The link from cooling device specification to trip points, from my
> perspective, should be enough, and thus we shall not need the thermal
> cells and size for trip points, as you proposed. Let me know what you think.

I'm not keen on the mixed bag of values in the cooling-device property,
as it's difficult to extend in future if necessary. I think we may need
cells for cooling devices -- what if a single fan controller controls
fans in different thermal zones?

I'm also not keen on devices being described as their own cooling
device, as I think this is more difficult to support than not listing a
cooling device -- it'll have to be special-cased anyway.

> Below is another example, on a more complex scenario, board specific
> case (hypothetical, just for exemplification):
> 
> + #include <dt-bindings/thermal/thermal.h>
> +
> + board_thermal: board_thermal {
> +         passive-delay = <1000>; /* milliseconds */
> +         polling-delay = <2500>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&adc-dummy     0>,
> +		    <&adc-dummy     1>,
> +		    <&adc-dymmy     2>;

Here we have a set of sensors, but no relationship between these sensors
and the trips or cooling-devices are described. How does that work?

> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip		lower	upper
> +                          */
> +         cooling-devices = <&cpu0       75     &cpu-trip	0      2>,
> +			    <&gpu0       40     &gpu-trip	0      2>;
> +			    <&lcd0       25     &lcd-trip	5      10>;
> +         trips {
> +                 cpu-trip: cpu-trip {
> +                         temperature = <60000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 gpu-trip: gpu-trip {
> +                         temperature = <55000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 }
> +                 lcd-trip: lcp-trip {
> +                         temperature = <53000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 crit-trip: crit-trip {
> +                         temperature = <68000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> Now writing the above example, one might want to have a way to say the
> sensor correlation equation.
> 
> Another example:
> + #include <dt-bindings/thermal/thermal.h>
> +
> + dsp_thermal: dsp_thermal {
> +         passive-delay = <250>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     1>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&dsp0        100     &dsp-alert
> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
> +         trips {
> +                 dsp-alert: dsp-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 dsp-crit: cdsp-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> +
> + gpu_thermal: gpu_thermal {
> +         passive-delay = <500>; /* milliseconds */
> +         polling-delay = <1000>; /* milliseconds */
> +                 /*
> +                  * sensor       ID
> +                  */
> +         sensors = <&bandgap0     2>;
> +                         /*
> +                          * cooling
> +                          * device       Usage   Trip            lower
> bound             upper bound
> +                          */
> +         cooling-devices = <&gpu0        100     &gpu-alert
> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
> +         trips {
> +                 gpu-alert: gpu-alert {
> +                         temperature = <100000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_PASSIVE>;
> +                 };
> +                 gpu-crit: gpu-crit {
> +                         temperature = <125000>; /* milliCelsius */
> +                         hysteresis = <2000>; /* milliCelsius */
> +                         type = <THERMAL_TRIP_CRITICAL>;
> +                 };
> +         };
> + };
> 
> 
> Wei, I think the above would cover for the IPs with multiple internal
> sensors cases you were looking for, right?
> 
> Durga, please also check if I am missing something to cover for your
> plans to, in case you will be using DT to describe your HW.
> 
> Anyways, Mark and Pawel, let me know if I missed something from our
> discussion. I believe the above bindings would look more like regular
> standard DT bindings. And also they should be now slim enough, without
> Linux implementation details and also without policies.

I think that the above can describe that, but I'd like to see a binding
document so we can consider it in more detail.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eduardo Valentin Sept. 3, 2013, 5:12 p.m. UTC | #9
Hi Mark,

On 03-09-2013 09:15, Mark Rutland wrote:
> On Fri, Aug 30, 2013 at 12:19:43AM +0100, Eduardo Valentin wrote:
>> Mark, Pawel and Stephen,
>>
>>
>> On 27-08-2013 14:17, Eduardo Valentin wrote:
>>> On 27-08-2013 12:23, Mark Rutland wrote:
>>>> On Tue, Aug 27, 2013 at 02:44:40PM +0100, Eduardo Valentin wrote:
>>>>> Hello Mark,
> 
> Hi, sorry for the delay.
> 

No issues,

>>
>> <cut>
>>
>> I believe now we need to align on thermal bindings, before I propose
>> next version of this series. So, following the discussions on this
>> thread, I believe a typical CPU thermal zone would look like the following:
>> + #include <dt-bindings/thermal/thermal.h>
>> +
>> + cpu_thermal: cpu_thermal {
>> +         passive-delay = <250>; /* milliseconds */
>> +         polling-delay = <1000>; /* milliseconds */
> 
> I'm still not keen on placing these intervals in the dt. Why can the
> kernel not choose sensible values?
> 
>> +                 /*
>> +                  * sensor       ID
>> +                  */
>> +         sensors = <&bandgap0     0>;
>> +                         /*
>> +                          * cooling
>> +                          * device       Usage   Trip            lower
>> bound             upper bound
>> +                          */
>> +         cooling-devices = <&cpu0        100     &cpu-alert
>> THERMAL_NO_LIMITS        THERMAL_NO_LIMITS>;
> 
> I'm not sure it makes sense to group this data within a property. We
> might need to extend this in future and it might not be possible to
> extend this unambiguously. It may make more sense as a sub-node or
> collection of properties.
> 

I see. Something like the previous DT binding? My previous proposal was
using sub-nodes.

>> +         trips {
>> +                 cpu-alert: cpu-alert {
>> +                         temperature = <100000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_PASSIVE>;
>> +                 };
>> +                 cpu-crit: cpu-crit {
>> +                         temperature = <125000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_CRITICAL>;
>> +                 };
>> +         };
>> + };
> 
> How do the upper and lower bounds in the cooling-devices property
> interact with temperatures described in the trips node?

By means of the trip phandle. It would mean once the trip is crossed, it
is expected to modulate the cooling device between the specified limits.

> 
>>
>> *Note: THERMAL_NO_LIMIT means, it will be using the cooling device
>> minimum and maximum limits.
> 
> I'm confused what are the cooling device min and max limits? Where are
> they defined if not in the lower bound and upper bound entries of the
> cooling-devices property?

They were not specified in this example, because previous discussions
suggested that we shall not represent cooling devices as dt nodes as
they are virtual. Those would be part of the cooling device node, in the
case we want to have cooling devices nodes and property in DT.

> 
>>
>> Couple of comments.
>> 1. I am keeping the pooling intervals. A possible alternative way to
>> describe that would be specifying the maximum dT/dt. Essentially because
>> I am still convinced that this is part of hw specs.
> 
> I still don't see why it's not possible to do this dynamically. I am not
> entirely convinced this is part of the hw spec.

Because not all hardware have power sensors and deriving power vs.
temperature vs. time is not a simple task, specially based on estimates,
say on cpu load. Remember that we are not only talking about CPU
temperature here, there are other heat sources. Same load has different
consumption on different HW, how SW is supposed to find that out wihtout
proper sensing?

> 
>>
>> 2. The above follows your suggestion to use consumer pointing to
>> producers. Although, I still need to figure out how this could be
>> implemented for Linux. But I think that is another story, at least for
>> now. We need to first align on the DT binding itself.
> 
> I assume you mean the sensors property? That looks fine to me, though we

Sensors and cooling-devices.

> may need to specify the way sensors relate to temperatures in trip
> points and cooling device properties -- do those temperatures correspond
> to an average of sensors readings, min/max, etc?

Those are based on instant measurements.

> 
>>
>> 3. The link from cooling device specification to trip points, from my
>> perspective, should be enough, and thus we shall not need the thermal
>> cells and size for trip points, as you proposed. Let me know what you think.
> 
> I'm not keen on the mixed bag of values in the cooling-device property,
> as it's difficult to extend in future if necessary. I think we may need
> cells for cooling devices -- what if a single fan controller controls
> fans in different thermal zones?

I believe it is just a matter of representation. Using either a list or
subnodes would imply that one would need to add links to fan device
twice, one in each thermal zone (it does not matter if inside a subnode
or inside a list).

> 
> I'm also not keen on devices being described as their own cooling
> device, as I think this is more difficult to support than not listing a
> cooling device -- it'll have to be special-cased anyway.

Then we need to have cooling device nodes, even if they are virtual.

> 
>> Below is another example, on a more complex scenario, board specific
>> case (hypothetical, just for exemplification):
>>
>> + #include <dt-bindings/thermal/thermal.h>
>> +
>> + board_thermal: board_thermal {
>> +         passive-delay = <1000>; /* milliseconds */
>> +         polling-delay = <2500>; /* milliseconds */
>> +                 /*
>> +                  * sensor       ID
>> +                  */
>> +         sensors = <&adc-dummy     0>,
>> +		    <&adc-dummy     1>,
>> +		    <&adc-dymmy     2>;
> 
> Here we have a set of sensors, but no relationship between these sensors
> and the trips or cooling-devices are described. How does that work?

See my query below.

> 
>> +                         /*
>> +                          * cooling
>> +                          * device       Usage   Trip		lower	upper
>> +                          */
>> +         cooling-devices = <&cpu0       75     &cpu-trip	0      2>,
>> +			    <&gpu0       40     &gpu-trip	0      2>;
>> +			    <&lcd0       25     &lcd-trip	5      10>;
>> +         trips {
>> +                 cpu-trip: cpu-trip {
>> +                         temperature = <60000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_PASSIVE>;
>> +                 };
>> +                 gpu-trip: gpu-trip {
>> +                         temperature = <55000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_PASSIVE>;
>> +                 }
>> +                 lcd-trip: lcp-trip {
>> +                         temperature = <53000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_PASSIVE>;
>> +                 };
>> +                 crit-trip: crit-trip {
>> +                         temperature = <68000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_CRITICAL>;
>> +                 };
>> +         };
>> + };
>>
>> Now writing the above example, one might want to have a way to say the
>> sensor correlation equation.

One way to go is to add a property to describe the linear relation
between them.

>>
>> Another example:
>> + #include <dt-bindings/thermal/thermal.h>
>> +
>> + dsp_thermal: dsp_thermal {
>> +         passive-delay = <250>; /* milliseconds */
>> +         polling-delay = <1000>; /* milliseconds */
>> +                 /*
>> +                  * sensor       ID
>> +                  */
>> +         sensors = <&bandgap0     1>;
>> +                         /*
>> +                          * cooling
>> +                          * device       Usage   Trip            lower
>> bound             upper bound
>> +                          */
>> +         cooling-devices = <&dsp0        100     &dsp-alert
>> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
>> +         trips {
>> +                 dsp-alert: dsp-alert {
>> +                         temperature = <100000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_PASSIVE>;
>> +                 };
>> +                 dsp-crit: cdsp-crit {
>> +                         temperature = <125000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_CRITICAL>;
>> +                 };
>> +         };
>> + };
>> +
>> + gpu_thermal: gpu_thermal {
>> +         passive-delay = <500>; /* milliseconds */
>> +         polling-delay = <1000>; /* milliseconds */
>> +                 /*
>> +                  * sensor       ID
>> +                  */
>> +         sensors = <&bandgap0     2>;
>> +                         /*
>> +                          * cooling
>> +                          * device       Usage   Trip            lower
>> bound             upper bound
>> +                          */
>> +         cooling-devices = <&gpu0        100     &gpu-alert
>> THERMAL_NO_LIMIT        THERMAL_NO_LIMIT>;
>> +         trips {
>> +                 gpu-alert: gpu-alert {
>> +                         temperature = <100000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_PASSIVE>;
>> +                 };
>> +                 gpu-crit: gpu-crit {
>> +                         temperature = <125000>; /* milliCelsius */
>> +                         hysteresis = <2000>; /* milliCelsius */
>> +                         type = <THERMAL_TRIP_CRITICAL>;
>> +                 };
>> +         };
>> + };
>>
>>
>> Wei, I think the above would cover for the IPs with multiple internal
>> sensors cases you were looking for, right?
>>
>> Durga, please also check if I am missing something to cover for your
>> plans to, in case you will be using DT to describe your HW.
>>
>> Anyways, Mark and Pawel, let me know if I missed something from our
>> discussion. I believe the above bindings would look more like regular
>> standard DT bindings. And also they should be now slim enough, without
>> Linux implementation details and also without policies.
> 
> I think that the above can describe that, but I'd like to see a binding
> document so we can consider it in more detail.

Well, I can write another RFC based on this discussion, for sure. I just
want to know what really blocks this work, that is why I sent the
examples above. But it does not seam we are progressing, looks like we
are move discussing in circles.

Cooling devices seams to be bouncing back and forward. I can send an RFC
considering:
i) addition of cooling devices virtual device nodes
ii) description of used cooling devices inside thermal zones by means of
sub-nodes, instead of lists

Pooling intervals seams to be also a loose lace. To me it is a matter of
choosing a good way to avoid missing events. We can either get the info
from HW simulation / experimentation based on HW itself or we can rely
on SW and assume SW can handle to see all events based on loose info
(assume that sw can be smart to do a self adaptive closed loop control
system based on no defined set of inputs). Both solutions are doable,
except that one may be closer to reality than the other. Also, one is
based on HW info and the other forces sw to deduce HW info. If you can
provide info that your proposal has strong evidence to be closer to
reality, I am find to signed it off.

All best,


Eduardo.
> 
> Thanks,
> Mark.
> 
>
Eduardo Valentin Sept. 7, 2013, 12:19 a.m. UTC | #10
Hi Mark, Stephen and Pawel,

On 03-09-2013 09:15, Mark Rutland wrote:
> On Fri, Aug 30, 2013 at 12:19:43AM +0100, Eduardo Valentin wrote:

<cut>

> I think that the above can describe that, but I'd like to see a binding
> document so we can consider it in more detail.

Find below another proposal. It is the updated binding document, with
the your comments applied (at least those I agree :-) ). It is obviously
an work in progress, but I think it is getting closer to what we are
trying to achieve, I believe. And of course, much better after using
your suggestions.

As I stated before, I believe it is crucial to first agree on the
bindings, then I can go ahead and update the corresponding code.

The change from the last binding examples I sent is basically on sensors
and cooling devices. This time, as suggested by Mark, I am adding
cooling device nodes (or at least properties to be embedded into
existing nodes). At some point, I remember that Pawel was not so in
favor on this type of node, but lets discuss on top of the document
below. I also added the #cells properties, as needed.

Hopefully we may end with an agreement. :-)

So, the document would look like this:

-----------------------------------------------------------------------
* Thermal Framework Device Tree descriptor

Generic binding to provide a way of defining hardware thermal
structure using device tree. A thermal structure includes thermal
zones and their components, such as trip points, polling intervals,
sensors and cooling devices binding descriptors.

The target of device tree thermal descriptors is to describe only
the hardware thermal aspects, not how the system must control or which
algorithm or policy must be taken in place.

There are five types of nodes involved to describe thermal bindings:
- sensors: used to describe the device source of temperature sensing;
- cooling devices: used to describe devices source of power dissipation
control;
- trip points: used to describe points in temperature domain defined to
make the system aware of hardware limits;
- cooling attachments: used to describe links between trip points and
cooling devices;
- thermal zones: used to describe thermal data within the hardware;

It follows a description of each type of these device tree nodes.

* Sensor devices

Sensor devices are nodes providing temperature sensing capabilities on
thermal
zones. Typical devices are I2C ADC converters and bandgaps. Theses are nodes
providing temperature data to thermal zones. Temperature sensor devices may
control one or more internal sensors.

Required property:
- #sensor-cells:	Used to provide sensor device specific information
			while referring to it. Must be at least 1, in order
			to identify uniquely the sensor instances within
			the IC. See thermal zone binding for more details
			on how consumers refer to sensor devices.

* Cooling device nodes

Cooling devices are nodes providing control on power dissipation. There
are essentially two ways to provide control on power dissipation. First
is by means of regulating device performance, which is known as passive
cooling. Second is by means of activating devices in order to remove
the dissipated heat, which is known as active cooling, e.g. regulating
fan speeds. In both cases, cooling devices shall have a way to determine
the level of cooling.

Required property:
- cooling-min-level:	A unsigned integer indicating the smallest
			cooling level accepted. Typically 0.
- cooling-max-level:	An unsigned integer indicating the largest
			cooling level accepted.
- #cooling-cells:	Used to provide cooling device specific information
			while referring to it. Must be at least 2, in order
			to specify minimum and maximum cooling level used
			in the reference. See Cooling device attachments section
			below for more details on how consumers refer to
			cooling devices.

* Trip points

The trip node is a node to describe a point in the temperature domain
in which the system takes an action. This node describes just the point,
not the action.

Required properties:
- temperature:		the trip temperature level, in milliCelsius.
- hysteresis:		a (low) hysteresis value on 'temperature'. This is a
			relative value, in milliCelsius.
- type:			the trip type. Here is the type mapping:
	THERMAL_TRIP_ACTIVE	0:	A trip point to enable active cooling
	THERMAL_TRIP_PASSIVE	1:	A trip point to enable passive cooling
	THERMAL_TRIP_HOT	2:	A trip point to notify emergency
	THERMAL_TRIP_CRITICAL	3:	Hardware not reliable.

Refer to include/dt-bindings/thermal/thermal.h for definition of these
consts.

* Cooling device attachments

The cooling device attachments node is a node to describe how cooling
devices
get assigned to trip points of the zone. The cooling devices are expected
to be loaded in the target system.

Required properties:
- cooling-device:	A phandle of a cooling device with its parameters,
			referring to which cooling device is used in this
			binding. The required parameters are: the minimum
			cooling level and the maximum cooling level used
			in this attach.
- trip:			A phandle of a trip point node within the same thermal
			zone.

Optional property:
- contribution:		The cooling contribution to the thermal zone of the
			referred cooling device at the referred trip point.
			The contribution is a value from 0 to 100. The sum
			of all cooling contributions within a thermal zone
			must never exceed 100.

Note: Using the THERMAL_NO_LIMIT (-1L) constant in the cooling-device
phandle
limit parameters means:
(i)   - minimum level allowed for minimum cooling level used in the
reference.
(ii)  - maximum level allowed for maximum cooling level used in the
reference.
Refer to include/dt-bindings/thermal/thermal.h for definition of this
constant.

* Thermal zones

The thermal-zone node is the node containing all the required info
for describing a thermal zone, including its cdev bindings. The thermal_zone
node must contain, apart from its own properties, one node containing
trip nodes and one node containing all the zone cooling attachments.

Required properties:
- passive-delay:	The maximum number of milliseconds to wait between polls
			when performing passive cooling.
- polling-delay:	The maximum number of milliseconds to wait between polls
			when checking this thermal zone.
- sensors:		A list of sensor phandles and their parameters. The
			required parameter is the sensor id, in order to
			identify internal sensors when the sensor IC features
			several sensing units.
- trips:		A sub-node containing several trip point nodes required
			to describe the thermal zone.
- cooling-attachments	A sub-node containing several cooling device attaches
			nodes, used to describe the relation between trips
			and cooling devices.

Optional property:
- coefficients:		An array of integers (one signed cell) containing
			coefficients to compose a linear relation between
			the sensors described in the sensors property.
			Coefficients defaults to 1, in case this property
			is not specified. A simple linear polynomial is used:
			Z = c0 * x0 + c1 + x1 + ... + c(n-1) * x(n-1) + cn.

			The coefficients are ordered and they match with sensors
			by means of sensor ID. Additional coefficients are
			interpreted as constant offsets.

Note: The delay properties are bound to the maximum dT/dt (temperature
derivative over time) in two situations for a thermal zone:
(i)  - when active cooling is activated (passive-delay); and
(ii) - when the zone just needs to be monitored (polling-delay).
The maximum dT/dt is highly bound to hardware power consumption and
dissipation
capability.

* Examples

Below are several examples on how to use thermal data descriptors
using device tree bindings:

(a) - CPU thermal zone

The CPU thermal zone example below describes how to setup one thermal zone
using one single sensor as temperature source and many cooling devices and
power dissipation control sources.

#include <dt-bindings/thermal/thermal.h>

cpus {
	cpu0: cpu@0 {
		...
		cooling-min-level = 0;
		cooling-max-level = 3;
		#cooling-cells = <2>; /* min followed by max */
	};
	...
};

&i2c1 {
	...
	fan0: fan@0x48 {
		...
		cooling-min-level = 0;
		cooling-max-level = 9;
		#cooling-cells = <2>; /* min followed by max */
	};
};

bandgap0: bandgap@0x0000ED00 {
	...
	#sensor-cells = <1>;
};

cpu-thermal: cpu-thermal {
	passive-delay = <250>; /* milliseconds */
	polling-delay = <1000>; /* milliseconds */

		/* sensor       ID */
        sensors = <&bandgap0     0>;

        trips {
                cpu-alert0: cpu-alert {
                        temperature = <90000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_ACTIVE>;
                };
                cpu-alert1: cpu-alert {
                        temperature = <100000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_PASSIVE>;
                };
                cpu-crit: cpu-crit {
                        temperature = <125000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_CRITICAL>;
                };
        };

	cooling-attachments {
		attach0 {
			trip = <&cpu-alert0>;
			cooling-devices = <&fan0 THERMAL_NO_LIMITS 4>;
 		};
		attach1 {
			trip = <&cpu-alert1>;
			cooling-device = <&fan0 5 THERMAL_NO_LIMITS>;
		};
		attach2 {
			trip = <&cpu-alert1>;
			cooling-device =
				<&cpu0 THERMAL_NO_LIMITS THERMAL_NO_LIMITS>;
		};
	};
};

In the example above, the ADC sensor at address 0x0000ED00 is used to
monitor
the zone 'cpu-thermal' using its the sensor 0. The fan0, a fan device
controlled
via I2C bus 1, at adress 0x48, is used to remove the heat out of the thermal
zone 'cpu-thermal' using its cooling levels from its minimum to 4, when it
reaches trip point 'cpu-alert0' at 90C, as an example of active cooling. The
same cooling device is used at 'cpu-alert1', but from 5 to its maximum
level.
The cpu@0 device is also linked to the same thermal zone, 'cpu-thermal',
as a
passive cooling device, using all its cooling levels at trip point
'cpu-alert1',
which is a trip point at 100C.

(b) - IC with several internal sensors

The example below describes how to deploy several thermal zones based off a
single sensor IC, assuming it has several internal sensors. This is a common
case on SoC designs with several internal IPs that may need different
thermal
requirements, and thus may have their own sensor to monitor or detect
internal
hotspots in their silicon.

#include <dt-bindings/thermal/thermal.h>

bandgap0: bandgap@0x0000ED00 {
	...
	#sensor-cells = <1>;
};

cpu-thermal: cpu-thermal {
	passive-delay = <250>; /* milliseconds */
	polling-delay = <1000>; /* milliseconds */

		/* sensor       ID */
        sensors = <&bandgap0     0>;

        trips {
		/* each zone within the SoC may have its own trips */
                cpu-alert: cpu-alert {
                        temperature = <100000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_PASSIVE>;
                };
                cpu-crit: cpu-crit {
                        temperature = <125000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_CRITICAL>;
                };
        };

	cooling-attachments {
		/* each zone within the SoC may have its own cooling */
		...
	};
};

gpu-thermal: gpu-thermal {
	passive-delay = <120>; /* milliseconds */
	polling-delay = <1000>; /* milliseconds */

		/* sensor       ID */
        sensors = <&bandgap0     1>;

        trips {
		/* each zone within the SoC may have its own trips */
                gpu-alert: gpu-alert {
                        temperature = <90000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_PASSIVE>;
                };
                gpu-crit: gpu-crit {
                        temperature = <105000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_CRITICAL>;
                };
        };

	cooling-attachments {
		/* each zone within the SoC may have its own cooling */
		...
	};
};

dsp-thermal: dsp-thermal {
	passive-delay = <50>; /* milliseconds */
	polling-delay = <1000>; /* milliseconds */

		/* sensor       ID */
        sensors = <&bandgap0     2>;

        trips {
		/* each zone within the SoC may have its own trips */
                dsp-alert: gpu-alert {
                        temperature = <90000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_PASSIVE>;
                };
                dsp-crit: gpu-crit {
                        temperature = <135000>; /* milliCelsius */
                        hysteresis = <2000>; /* milliCelsius */
                        type = <THERMAL_TRIP_CRITICAL>;
                };
        };

	cooling-attachments {
		/* each zone within the SoC may have its own cooling */
		...
	};
};

In the example above there is one bandgap IC which has the capability to
monitor three sensors. The hardware has been designed so that sensors are
placed on different places in the DIE to monitor different temperature
hotspots: one for CPU thermal zone, one for GPU thermal zone and the
other to monitor a DSP thermal zone.

Thus, there is a need to assign each sensor provided by the bandgap IC
to different thermal zones. This is achieved by means of using the
#sensor-cells property and using the first parameter as sensor ID.
In the example, then, bandgap.sensor0 is used to monitor CPU thermal zone,
bandgap.sensor1 is used to monitor GPU thermal zone and bandgap.sensor2
is used to monitor DSP thermal zone. Each zone may be uncorrelated,
having its own dT/dt requirements, trips and cooling attachments.

(c) - Several sensors within one single thermal zone

The example below illustrates how to use more than one sensor within
one thermal zone.

#include <dt-bindings/thermal/thermal.h>

&i2c1 {
	...
	adc: sensor@0x49 {
		...
		#sensor-cells = <1>;
	};
};

bandgap0: bandgap@0x0000ED00 {
	...
	#sensor-cells = <1>;
};

cpu-thermal: cpu-thermal {
	passive-delay = <250>; /* milliseconds */
	polling-delay = <1000>; /* milliseconds */

		/* sensor       ID */
        sensors = <&bandgap0	0>,
		  <&adc		0>;

		/* hotspot = 100 * bandgap - 120 * adc + 484 */
 	coefficients = 		<100	-120	484>;

        trips {
		...
        };

	cooling-attachments {
		...
	};
};

In some cases, there is a need to use more than one sensor to extrapolate
a thermal hotspot in the silicon. The above example illustrate this
situation.
For instance, it may be the case that a sensor external to CPU IP may be
place
close to CPU hotspot and together with internal CPU sensor, it is used
to determine the hotspot. The hyppotetical extrapolation rule would be:
		hotspot = 100 * bandgap - 120 * adc + 484

The same idea can be used to add fixed offset:
	passive-delay = <1000>; /* milliseconds */
	polling-delay = <2500>; /* milliseconds */
		hotspot = 1 * adc + 6000

In the above equation, the hotspot is always 6C higher than what is read
from the sensor ADC. The binding would be then:
		/* sensor       ID */
        sensors =  <&adc	0>;

		/* hotspot = 1 * adc + 6000 */
	coefficients = 		<1	6000>;

(d) - Board thermal

The board thermal example below illustrates how to setup one thermal zone
with many sensors and many cooling devices.

#include <dt-bindings/thermal/thermal.h>

&i2c1 {
 	...
	adc-dummy: sensor@0x50 {
		...
		#sensor-cells = <1>; /* sensor internal ID */
	};
};

batt-thermal {
	passive-delay = <500>; /* milliseconds */
	polling-delay = <2500>; /* milliseconds */

	/* sensor       ID */
	sensors = <&adc-dummy     4>;

	trips {
		...
	};

	cooling-attachments {
		...
	};
};

board-thermal: board-thermal {
	passive-delay = <1000>; /* milliseconds */
	polling-delay = <2500>; /* milliseconds */

		/* sensor       ID */
	sensors = <&adc-dummy     0>,
		  <&adc-dummy     1>,
		  <&adc-dymmy     2>;
			/*
			 * An array of coefficients describing the sensor
			 * linear relation. E.g.:
			 * z = c1*x1 + c2*x2 + c3*x3
			 */
	coefficients =		<1200	-345	890>;

	trips {
		/* Trips are based on resulting linear equation */
		cpu-trip: cpu-trip {
			temperature = <60000>; /* milliCelsius */
			hysteresis = <2000>; /* milliCelsius */
			type = <THERMAL_TRIP_PASSIVE>;
		};
		gpu-trip: gpu-trip {
			temperature = <55000>; /* milliCelsius */
			hysteresis = <2000>; /* milliCelsius */
			type = <THERMAL_TRIP_PASSIVE>;
		}
		lcd-trip: lcp-trip {
			temperature = <53000>; /* milliCelsius */
			hysteresis = <2000>; /* milliCelsius */
			type = <THERMAL_TRIP_PASSIVE>;
		};
		crit-trip: crit-trip {
			temperature = <68000>; /* milliCelsius */
			hysteresis = <2000>; /* milliCelsius */
 			type = <THERMAL_TRIP_CRITICAL>;
		};
	};

	cooling-attachments {
		attach0 {
			trip = <&cpu-trip>;
			cooling-device = <&cpu0 0 2>;
			contribution = <55>;
		};
		attach1 {
			trip = <&gpu-trip>;
			cooling-device = <&gpu0 0 2>;
			contribution = <20>;
		};
 		attach2 {
			trip = <&lcd-trip>;
			cooling-device = <&lcd0 5 10>;
			contribution = <15>;
		};
	};
};

The above example is a mix of previous examples, a sensor IP with
several internal
sensors used to monitor different zones, one of them is composed by
several sensors and
with different cooling devices.
-----------------------------------------------------------------------

All best,
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
new file mode 100644
index 0000000..af20ab0
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/thermal.txt
@@ -0,0 +1,160 @@ 
+----------------------------------------
+Thermal Framework Device Tree descriptor
+----------------------------------------
+
+This file describes how to define a thermal structure using device tree.
+A thermal structure includes thermal zones and their components, such
+as name, trip points, polling intervals and cooling devices binding
+descriptors. A binding descriptor may contain information on how to
+react, with a cooling stepped action or a weight on a fair share.
+The target of device tree thermal descriptors is to describe only
+the hardware thermal aspects, not how the system must control or which
+algorithm or policy must take place. It follows a description of each
+type of device tree node.
+
+****
+trip
+****
+
+The trip node is a node to describe a point in the temperature domain
+in which the system takes an action. This node describes just the point,
+not the action.
+
+A node describing a trip must contain:
+- temperature: the trip temperature level, in milliCelsius.
+- hysteresis: a (low) hysteresis value on 'temperature'. This is a relative
+value, in milliCelsius.
+- type: the trip type. Here is the type mapping:
+	THERMAL_TRIP_ACTIVE
+	THERMAL_TRIP_PASSIVE
+	THERMAL_TRIP_HOT
+	THERMAL_TRIP_CRITICAL
+
+**********
+bind_param
+**********
+
+The bind_param node is a node to describe how cooling devices get assigned
+to trip points of the zone. The cooling devices are expected to be loaded
+in the target system.
+
+A node describing a bind_param must contain:
+- cooling_device: A string with the cooling device name.
+- weight: The 'influence' of a particular cooling device on this zone.
+             This is on a percentage scale. The sum of all these weights
+             (for a particular zone) cannot exceed 100.
+- trip_mask: This is a bit mask that gives the binding relation between
+               this thermal zone and cdev, for a particular trip point.
+               If nth bit is set, then the cdev and thermal zone are bound
+               for trip point n.
+- limits: An array integers consisting of 2-tuples items, and each item means
+  lower and upper state limits, like <min-state max-state>.
+
+**************************
+temperature sensor devices
+**************************
+
+Temperature sensor devices have to be linked to a specific thermal zone.
+This is done by means of the 'monitored-zones' list.
+- monitored-zones: A list of thermal zones phandles, identifying which
+thermal zones the temperature sensor device is used to monitor.
+
+************
+thermal_zone
+************
+
+The thermal_zone node is the node containing all the required info
+for describing a thermal zone, including its cdev bindings. The thermal_zone
+node must contain, apart from its own properties, one node containing
+trip nodes and one node containing all the zone bind parameters.
+
+Required properties:
+- mask: Bit string: If 'n'th bit is set, then trip point 'n' is writeable.
+
+- passive_delay: number of milliseconds to wait between polls when
+  performing passive cooling.
+
+- polling_delay: number of milliseconds to wait between polls when checking.
+
+- #thermal-cells: An integer that is used to specify how many cells compose
+  sensor ids.
+
+Below is an example:
+cpu_thermal: thermal_zone {
+            #thermal-cells = <1>;
+            mask = <0x03>; /* trips writability */
+            passive_delay = <250>; /* milliseconds */
+            polling_delay = <1000>; /* milliseconds */
+            trips {
+                    alert@100000{
+                            temperature = <100000>; /* milliCelsius */
+                            hysteresis = <0>; /* milliCelsius */
+                            type = <THERMAL_TRIP_PASSIVE>;
+                    };
+                    crit@125000{
+                            temperature = <125000>; /* milliCelsius */
+                            hysteresis = <0>; /* milliCelsius */
+                            type = <THERMAL_TRIP_CRITICAL>;
+                    };
+            };
+            bind_params {
+                    action@0{
+                            cooling_device = "thermal-cpufreq";
+                            weight = <100>; /* percentage */
+                            mask = <0x01>;
+			    limits = <
+			      /* lower-state		upper-state */
+				1			THERMAL_NO_LIMITS
+				THERMAL_NO_LIMITS	THERMAL_NO_LIMITS
+			    >;
+                    };
+            };
+};
+
+sensor: adc@0xFFFF {
+	...
+	monitored-zones = <&cpu_thermal 0>;
+};
+
+cpu@0 {
+	...
+	cooling-zones = <&cpu_thermal>;
+};
+
+In the example above, the ADC sensor at address 0xFFFF is used to monitor
+the zone 'cpu_thermal' using its the sensor 0. The cpu@0 device is also linked
+to the same thermal zone 'cpu_thermal' as a cooling device.
+
+***************
+cpufreq-cooling
+***************
+
+The generic cpu cooling (freq clipping) can be loaded by the
+generic cpufreq-cpu0 driver in case the device tree node informs
+this need.
+
+In order to load the cpufreq cooling device, it is possible to
+inform that the cpu needs cooling by adding the list of thermal zones
+in the 'cooling-zones' property at the cpu0 phandle.
+
+Example:
+	cpus {
+		cpu@0 {
+			operating-points = <
+				/* kHz    uV */
+				800000  1313000
+				1008000 1375000
+			>;
+			cooling-zones = <&cpu_thermal>;
+		};
+	...
+	};
+
+The above will cause the cpufreq-cpu0 driver to understand that
+the cpu will need passive cooling and while probing that node it will
+also load the cpufreq cpu cooling device in that system.
+
+However, be advised that this flag will not describe what needs to
+be done or how the cpufreq cooling device will be used. In order to
+accomplish this, one needs to write a phandle for a thermal zone, as
+described in the section 'thermal_zone'.
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 7fb16bc..753f0dc 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -29,6 +29,19 @@  config THERMAL_HWMON
 	  Say 'Y' here if you want all thermal sensors to
 	  have hwmon sysfs interface too.
 
+config THERMAL_OF
+	bool
+	prompt "APIs to parse thermal data out of device tree"
+	depends on OF
+	default y
+	help
+	  This options provides helpers to add the support to
+	  read and parse thermal data definitions out of the
+	  device tree blob.
+
+	  Say 'Y' here if you need to build thermal infrastructure
+	  based on device tree.
+
 choice
 	prompt "Default Thermal governor"
 	default THERMAL_DEFAULT_GOV_STEP_WISE
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 24cb894..eedb273 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -7,6 +7,7 @@  thermal_sys-y			+= thermal_core.o
 
 # interface to/from other layers providing sensors
 thermal_sys-$(CONFIG_THERMAL_HWMON)		+= thermal_hwmon.o
+thermal_sys-$(CONFIG_THERMAL_OF)		+= thermal_dt.o
 
 # governors
 thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE)	+= fair_share.o
diff --git a/drivers/thermal/thermal_dt.c b/drivers/thermal/thermal_dt.c
new file mode 100644
index 0000000..cc35485
--- /dev/null
+++ b/drivers/thermal/thermal_dt.c
@@ -0,0 +1,473 @@ 
+/*
+ *  thermal_dt.c - Generic Thermal Management device tree support.
+ *
+ *  Copyright (C) 2013 Texas Instruments
+ *  Copyright (C) 2013 Eduardo Valentin <eduardo.valentin@ti.com>
+ *  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, write to the Free Software Foundation, Inc.,
+ *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+#include <linux/thermal.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/of_device.h>
+#include <linux/of_platform.h>
+#include <linux/err.h>
+#include <linux/export.h>
+#include <linux/string.h>
+
+struct __thermal_bind_params {
+	char cooling_device[THERMAL_NAME_LENGTH];
+};
+
+static
+int thermal_of_match(struct thermal_zone_device *tz,
+		     struct thermal_cooling_device *cdev);
+
+static int thermal_of_populate_bind_params(struct device *dev,
+					   struct device_node *node,
+					   struct __thermal_bind_params *__tbp,
+					   struct thermal_bind_params *tbp,
+					   int ntrips)
+{
+	const char *cdev_name;
+	u32 *limits;
+	int ret;
+	u32 prop;
+
+	ret = of_property_read_u32(node, "weight", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing weight property\n");
+		return ret;
+	}
+	tbp->weight = prop;
+
+	ret = of_property_read_u32(node, "mask", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing mask property\n");
+		return ret;
+	}
+	tbp->trip_mask = prop;
+
+	/* this will allow us to bind with cooling devices */
+	tbp->match = thermal_of_match;
+
+	ret = of_property_read_string(node, "cooling_device", &cdev_name);
+	if (ret < 0) {
+		dev_err(dev, "missing cooling_device property\n");
+		return ret;
+	}
+	strncpy(__tbp->cooling_device, cdev_name,
+		sizeof(__tbp->cooling_device));
+
+	limits = kzalloc(ntrips * sizeof(*limits) * 2, GFP_KERNEL);
+	if (!limits) {
+		dev_err(dev, "not enough mem for reading limits\n");
+		return -ENOMEM;
+	}
+	ret = of_property_read_u32_array(node, "limits", limits, 2 * ntrips);
+	if (!ret) {
+		int i = ntrips;
+
+		tbp->binding_limits = devm_kzalloc(dev,
+						   ntrips * 2 *
+						   sizeof(*tbp->binding_limits),
+						   GFP_KERNEL);
+		if (!tbp->binding_limits) {
+			dev_err(dev, "not enough mem for storing limits\n");
+			return -ENOMEM;
+		}
+		while (i--)
+			tbp->binding_limits[i] = limits[i];
+	}
+	kfree(limits);
+
+	return 0;
+}
+
+struct __thermal_trip {
+	unsigned long int temperature;
+	unsigned long int hysteresis;
+	enum thermal_trip_type type;
+};
+
+static
+int thermal_of_populate_trip(struct device *dev,
+			     struct device_node *node,
+			     struct __thermal_trip *trip)
+{
+	int ret;
+	int prop;
+
+	ret = of_property_read_u32(node, "temperature", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing temperature property\n");
+		return ret;
+	}
+	trip->temperature = prop;
+
+	ret = of_property_read_u32(node, "hysteresis", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing hysteresis property\n");
+		return ret;
+	}
+	trip->hysteresis = prop;
+
+	ret = of_property_read_u32(node, "type", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing type property\n");
+		return ret;
+	}
+	trip->type = prop;
+
+	return 0;
+}
+
+struct __thermal_zone_device {
+	enum thermal_device_mode mode;
+	int passive_delay;
+	int polling_delay;
+	int mask;
+	int ntrips;
+	char type[THERMAL_NAME_LENGTH];
+	struct __thermal_trip *trips;
+	struct __thermal_bind_params *bind_params;
+	struct thermal_bind_params *tbps;
+	struct thermal_zone_params zone_params;
+	int (*get_temp)(void *, unsigned long *);
+	void *devdata;
+};
+
+static struct __thermal_zone_device *
+thermal_of_build_thermal_zone(struct device *dev, struct device_node *node)
+{
+	struct device_node *child, *gchild;
+	struct __thermal_zone_device *tz;
+	int ret, i;
+	u32 prop;
+
+	if (!node) {
+		dev_err(dev, "no thermal zone node\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	tz = devm_kzalloc(dev, sizeof(*tz), GFP_KERNEL);
+	if (!tz) {
+		dev_err(dev, "not enough memory for thermal of zone\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret = of_property_read_u32(node, "passive_delay", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing passive_delay property\n");
+		return ERR_PTR(ret);
+	}
+	tz->passive_delay = prop;
+
+	ret = of_property_read_u32(node, "polling_delay", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing polling_delay property\n");
+		return ERR_PTR(ret);
+	}
+	tz->polling_delay = prop;
+
+	ret = of_property_read_u32(node, "mask", &prop);
+	if (ret < 0) {
+		dev_err(dev, "missing mask property\n");
+		return ERR_PTR(ret);
+	}
+	tz->mask = prop;
+
+	/* assume node name as our zone type */
+	strncpy(tz->type, node->name, sizeof(tz->type));
+
+	/* default policy */
+	strncpy(tz->zone_params.governor_name, "step_wise",
+		sizeof(tz->zone_params.governor_name));
+
+	/* trips */
+	child = of_find_node_by_name(node, "trips");
+
+	/* No trips provided */
+	if (!child)
+		goto finish;
+
+	tz->ntrips = of_get_child_count(child);
+	tz->trips = devm_kzalloc(dev, tz->ntrips * sizeof(*tz->trips),
+				 GFP_KERNEL);
+	if (!tz->trips)
+		return ERR_PTR(-ENOMEM);
+	i = 0;
+	for_each_child_of_node(child, gchild)
+		thermal_of_populate_trip(dev, gchild, &tz->trips[i++]);
+
+	/* bind_params */
+	child = of_find_node_by_name(node, "bind_params");
+
+	/* No binding info */
+	if (!child)
+		goto finish;
+
+	tz->zone_params.num_tbps = of_get_child_count(child);
+	tz->bind_params = devm_kzalloc(dev,
+				       tz->zone_params.num_tbps *
+						sizeof(*tz->bind_params),
+				       GFP_KERNEL);
+	if (!tz->bind_params)
+		return ERR_PTR(-ENOMEM);
+	tz->zone_params.tbp = devm_kzalloc(dev,
+					   tz->zone_params.num_tbps *
+						sizeof(*tz->zone_params.tbp),
+					   GFP_KERNEL);
+	if (!tz->zone_params.tbp)
+		return ERR_PTR(-ENOMEM);
+	i = 0;
+	for_each_child_of_node(child, gchild) {
+		thermal_of_populate_bind_params(dev, gchild,
+						&tz->bind_params[i],
+						&tz->zone_params.tbp[i],
+						tz->ntrips);
+		i++;
+	}
+
+finish:
+	tz->mode = THERMAL_DEVICE_ENABLED;
+
+	return tz;
+}
+
+static
+int thermal_of_match(struct thermal_zone_device *tz,
+		     struct thermal_cooling_device *cdev)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+	int i;
+
+	for (i = 0; i < data->zone_params.num_tbps; i++) {
+		if (!strncmp(data->bind_params[i].cooling_device,
+			     cdev->type,
+			     strlen(data->bind_params[i].cooling_device)) &&
+		    (data->zone_params.tbp[i].trip_mask & (1 << i)))
+			return 0;
+	}
+
+	return -EINVAL;
+}
+
+static
+int of_thermal_get_temp(struct thermal_zone_device *tz,
+			unsigned long *temp)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	return data->get_temp(data->devdata, temp);
+}
+
+static
+int of_thermal_get_mode(struct thermal_zone_device *tz,
+			enum thermal_device_mode *mode)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	*mode = data->mode;
+
+	return 0;
+}
+
+static
+int of_thermal_set_mode(struct thermal_zone_device *tz,
+			enum thermal_device_mode mode)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	mutex_lock(&tz->lock);
+
+	if (mode == THERMAL_DEVICE_ENABLED)
+		tz->polling_delay = data->polling_delay;
+	else
+		tz->polling_delay = 0;
+
+	mutex_unlock(&tz->lock);
+
+	data->mode = mode;
+	thermal_zone_device_update(tz);
+
+	return 0;
+}
+
+static
+int of_thermal_get_trip_type(struct thermal_zone_device *tz, int trip,
+			     enum thermal_trip_type *type)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	*type = data->trips[trip].type;
+
+	return 0;
+}
+
+static
+int of_thermal_get_trip_temp(struct thermal_zone_device *tz, int trip,
+			     unsigned long *temp)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	*temp = data->trips[trip].temperature;
+
+	return 0;
+}
+
+static
+int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
+			     unsigned long temp)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	/* thermal fw should take care of data->mask & (1 << trip) */
+	data->trips[trip].temperature = temp;
+
+	return 0;
+}
+
+static
+int of_thermal_get_trip_hyst(struct thermal_zone_device *tz, int trip,
+			     unsigned long *hyst)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	*hyst = data->trips[trip].hysteresis;
+
+	return 0;
+}
+
+static
+int of_thermal_set_trip_hyst(struct thermal_zone_device *tz, int trip,
+			     unsigned long hyst)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+
+	if (trip >= data->ntrips || trip < 0)
+		return -EDOM;
+
+	/* thermal fw should take care of data->mask & (1 << trip) */
+	data->trips[trip].hysteresis = hyst;
+
+	return 0;
+}
+
+static int
+of_thermal_get_crit_temp(struct thermal_zone_device *tz, unsigned long *temp)
+{
+	struct __thermal_zone_device *data = tz->devdata;
+	int i;
+
+	for (i = 0; i < data->ntrips; i++)
+		if (data->trips[i].type == THERMAL_TRIP_CRITICAL) {
+			*temp = data->trips[i].temperature;
+			return 0;
+		}
+
+	return -EINVAL;
+}
+
+static const
+struct thermal_zone_device_ops of_thermal_ops = {
+	.get_temp = of_thermal_get_temp,
+	.get_mode = of_thermal_get_mode,
+	.set_mode = of_thermal_set_mode,
+	.get_trip_type = of_thermal_get_trip_type,
+	.get_trip_temp = of_thermal_get_trip_temp,
+	.set_trip_temp = of_thermal_set_trip_temp,
+	.get_trip_hyst = of_thermal_get_trip_hyst,
+	.set_trip_hyst = of_thermal_set_trip_hyst,
+	.get_crit_temp = of_thermal_get_crit_temp,
+};
+
+/**
+ * thermal_zone_of_device_register() - registers a thermal zone based on DT
+ * @device: a valid struct device pointer containing device tree node
+ *          with thermal_zone descriptor.
+ * @data: a private pointer (owned by the caller) that will be passed
+ *        back, when a temperature reading is needed.
+ * @get_temp: a pointer to a function that returns the thermal zone
+ *            temperature.
+ * @add_hwmon:	a boolean to indicate if the thermal to hwmon sysfs interface
+ *		is required. When add_hwmon == true, a hwmon sysfs interface
+ *		will be created. When add_hwmon == false, nothing will be done
+ *
+ * This function will parse the device tree node associated with the
+ * @dev struct and will attempt to build a thermal zone device based
+ * upon the thermal data provided and described by DT.
+ *
+ * The thermal zone temperature is provided by the @get_temp function
+ * pointer. When called, it will have the private pointer @data back.
+ *
+ * Return: On success returns a valid struct thermal_zone_device,
+ * otherwise, it returns a corresponding ERR_PTR(). Caller must
+ * check the return value with help of IS_ERR() helper.
+ */
+struct thermal_zone_device *
+thermal_zone_of_device_register(struct device *dev, int id, bool add_hwmon,
+			void *data, int (*get_temp)(void *, unsigned long *))
+{
+	struct __thermal_zone_device *tz;
+	struct thermal_zone_device_ops *ops;
+	struct of_phandle_args thermal_spec;
+	int ret;
+
+	ret = of_parse_phandle_with_args(dev->of_node,
+					 "monitored-zones",
+					 "#thermal-cells",
+					 id,
+					 &thermal_spec);
+	if (ret) {
+		dev_err(dev, "unable to find desired monitored zone %d\n", id);
+		return ERR_PTR(ret);
+	}
+
+	tz = thermal_of_build_thermal_zone(dev, thermal_spec.np);
+	if (IS_ERR(tz)) {
+		dev_err(dev, "failed to build thermal zone\n");
+		return ERR_CAST(tz);
+	}
+	ops = devm_kzalloc(dev, sizeof(*ops), GFP_KERNEL);
+	if (!ops) {
+		dev_err(dev, "no memory available for thermal ops\n");
+		return ERR_PTR(-ENOMEM);
+	}
+	memcpy(ops, &of_thermal_ops, sizeof(*ops));
+	tz->get_temp = get_temp;
+	tz->devdata = data;
+
+	return thermal_zone_device_register(tz->type, tz->ntrips, tz->mask, tz,
+					    ops, &tz->zone_params,
+					    tz->passive_delay,
+					    tz->polling_delay,
+					    add_hwmon);
+}
+EXPORT_SYMBOL_GPL(thermal_zone_of_device_register);
diff --git a/include/dt-bindings/thermal/thermal.h b/include/dt-bindings/thermal/thermal.h
new file mode 100644
index 0000000..6dd6ccd
--- /dev/null
+++ b/include/dt-bindings/thermal/thermal.h
@@ -0,0 +1,27 @@ 
+/*
+ * This header provides constants for most thermal bindings.
+ *
+ * Copyright (C) 2013 Texas Instruments
+ *	Eduardo Valentin <eduardo.valentin@ti.com>
+ *
+ * GPLv2 only
+ */
+
+#ifndef _DT_BINDINGS_THERMAL_THERMAL_H
+#define _DT_BINDINGS_THERMAL_THERMAL_H
+
+/*
+ * Here are the thermal trip types. This must
+ * match with enum thermal_trip_type at
+ * include/linux/thermal.h
+ */
+#define THERMAL_TRIP_ACTIVE		0
+#define THERMAL_TRIP_PASSIVE		1
+#define THERMAL_TRIP_HOT		2
+#define THERMAL_TRIP_CRITICAL		3
+
+/* On cooling devices upper and lower limits */
+#define THERMAL_NO_LIMIT		(-1UL)
+
+#endif
+
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 7c2769c..501b2fe 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -234,6 +234,9 @@  struct thermal_genl_event {
 };
 
 /* Function declarations */
+struct thermal_zone_device
+*thermal_zone_of_device_register(struct device *, int id, bool add_hwmon,
+	void *data, int (*get_temp) (void *, unsigned long *));
 struct thermal_zone_device *thermal_zone_device_register(const char *, int, int,
 		void *, const struct thermal_zone_device_ops *,
 		const struct thermal_zone_params *, int, int, bool);