diff mbox series

thermal/core: Introduce user trip points

Message ID 20240627085451.3813989-1-daniel.lezcano@linaro.org (mailing list archive)
State Changes Requested, archived
Headers show
Series thermal/core: Introduce user trip points | expand

Commit Message

Daniel Lezcano June 27, 2024, 8:54 a.m. UTC
Currently the thermal framework has 4 trip point types:

- active : basically for fans (or anything requiring energy to cool
  down)

- passive : a performance limiter

- hot : for a last action before reaching critical

- critical : a without return threshold leading to a system shutdown

A thermal zone monitors the temperature regarding these trip
points. The old way to do that is actively polling the temperature
which is very bad for embedded systems, especially mobile and it is
even worse today as we can have more than fifty thermal zones. The
modern way is to rely on the driver to send an interrupt when the trip
points are crossed, so the system can sleep while the temperature
monitoring is offloaded to a dedicated hardware.

However, the thermal aspect is also managed from userspace to protect
the user, especially tracking down the skin temperature sensor. The
logic is more complex than what we found in the kernel because it
needs multiple sources indicating the thermal situation of the entire
system.

For this reason it needs to setup trip points at different levels in
order to get informed about what is going on with some thermal zones
when running some specific application.

For instance, the skin temperature must be limited to 43°C on a long
run but can go to 48°C for 10 minutes, or 60°C for 1 minute.

The thermal engine must then rely on trip points to monitor those
temperatures. Unfortunately, today there is only 'active' and
'passive' trip points which has a specific meaning for the kernel, not
the userspace. That leads to hacks in different platforms for mobile
and embedded systems where 'active' trip points are used to send
notification to the userspace. This is obviously not right because
these trip are handled by the kernel.

This patch introduces the 'user' trip point type where its semantic is
simple: do nothing at the kernel level, just send a notification to
the user space.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
 drivers/thermal/thermal_core.c                            | 8 ++++++++
 drivers/thermal/thermal_of.c                              | 1 +
 drivers/thermal/thermal_trace.h                           | 4 +++-
 drivers/thermal/thermal_trip.c                            | 1 +
 include/uapi/linux/thermal.h                              | 1 +
 6 files changed, 15 insertions(+), 1 deletion(-)

Comments

Rafael J. Wysocki June 28, 2024, 1:56 p.m. UTC | #1
On Thu, Jun 27, 2024 at 10:55 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> Currently the thermal framework has 4 trip point types:
>
> - active : basically for fans (or anything requiring energy to cool
>   down)
>
> - passive : a performance limiter
>
> - hot : for a last action before reaching critical
>
> - critical : a without return threshold leading to a system shutdown
>
> A thermal zone monitors the temperature regarding these trip
> points. The old way to do that is actively polling the temperature
> which is very bad for embedded systems, especially mobile and it is
> even worse today as we can have more than fifty thermal zones. The
> modern way is to rely on the driver to send an interrupt when the trip
> points are crossed, so the system can sleep while the temperature
> monitoring is offloaded to a dedicated hardware.
>
> However, the thermal aspect is also managed from userspace to protect
> the user, especially tracking down the skin temperature sensor. The
> logic is more complex than what we found in the kernel because it
> needs multiple sources indicating the thermal situation of the entire
> system.
>
> For this reason it needs to setup trip points at different levels in
> order to get informed about what is going on with some thermal zones
> when running some specific application.
>
> For instance, the skin temperature must be limited to 43°C on a long
> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
>
> The thermal engine must then rely on trip points to monitor those
> temperatures. Unfortunately, today there is only 'active' and
> 'passive' trip points which has a specific meaning for the kernel, not
> the userspace. That leads to hacks in different platforms for mobile
> and embedded systems where 'active' trip points are used to send
> notification to the userspace. This is obviously not right because
> these trip are handled by the kernel.
>
> This patch introduces the 'user' trip point type where its semantic is
> simple: do nothing at the kernel level, just send a notification to
> the user space.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
>  drivers/thermal/thermal_core.c                            | 8 ++++++++
>  drivers/thermal/thermal_of.c                              | 1 +
>  drivers/thermal/thermal_trace.h                           | 4 +++-
>  drivers/thermal/thermal_trip.c                            | 1 +
>  include/uapi/linux/thermal.h                              | 1 +
>  6 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> index 68398e7e8655..cb9ea54a192e 100644
> --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> @@ -153,6 +153,7 @@ patternProperties:
>                type:
>                  $ref: /schemas/types.yaml#/definitions/string
>                  enum:
> +                  - user     # enable user notification
>                    - active   # enable active cooling e.g. fans
>                    - passive  # enable passive cooling e.g. throttling cpu
>                    - hot      # send notification to driver
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 2aa04c46a425..506f880d9aa9 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -734,6 +734,14 @@ int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
>         if (tz != pos1 || cdev != pos2)
>                 return -EINVAL;
>
> +       /*
> +        * It is not allowed to bind a cooling device with a trip
> +        * point user type because no mitigation should happen from
> +        * the kernel with these trip points
> +        */
> +       if (trip->type == THERMAL_TRIP_USER)
> +               return -EINVAL;

Maybe print a debug message when bailing out here?

A check for "user" trips would need to be added to
thermal_governor_trip_crossed() and to the .manage() callbacks in the
power allocator, step-wise and fair-share governors, if I'm not
mistaken.  Especially fair-share and power allocator should not take
them into account IMV.

> +
>         /* lower default 0, upper default max_state */
>         lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
>
> diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
> index aa34b6e82e26..f6daf921a136 100644
> --- a/drivers/thermal/thermal_of.c
> +++ b/drivers/thermal/thermal_of.c
> @@ -60,6 +60,7 @@ static const char * const trip_types[] = {
>         [THERMAL_TRIP_PASSIVE]  = "passive",
>         [THERMAL_TRIP_HOT]      = "hot",
>         [THERMAL_TRIP_CRITICAL] = "critical",
> +       [THERMAL_TRIP_USER]     = "user",
>  };
>
>  /**
> diff --git a/drivers/thermal/thermal_trace.h b/drivers/thermal/thermal_trace.h
> index df8f4edd6068..739228ecc2e2 100644
> --- a/drivers/thermal/thermal_trace.h
> +++ b/drivers/thermal/thermal_trace.h
> @@ -15,13 +15,15 @@ TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
>  TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
>  TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
>  TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
> +TRACE_DEFINE_ENUM(THERMAL_TRIP_USER);
>
>  #define show_tzt_type(type)                                    \
>         __print_symbolic(type,                                  \
>                          { THERMAL_TRIP_CRITICAL, "CRITICAL"},  \
>                          { THERMAL_TRIP_HOT,      "HOT"},       \
>                          { THERMAL_TRIP_PASSIVE,  "PASSIVE"},   \
> -                        { THERMAL_TRIP_ACTIVE,   "ACTIVE"})
> +                        { THERMAL_TRIP_ACTIVE,   "ACTIVE"}),   \
> +                        { THERMAL_TRIP_USER,     "USER"})
>
>  TRACE_EVENT(thermal_temperature,
>
> diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
> index 2a876d3b93aa..a0780bb4ff0d 100644
> --- a/drivers/thermal/thermal_trip.c
> +++ b/drivers/thermal/thermal_trip.c
> @@ -10,6 +10,7 @@
>  #include "thermal_core.h"
>
>  static const char *trip_type_names[] = {
> +       [THERMAL_TRIP_USER] = "user",
>         [THERMAL_TRIP_ACTIVE] = "active",
>         [THERMAL_TRIP_PASSIVE] = "passive",
>         [THERMAL_TRIP_HOT] = "hot",
> diff --git a/include/uapi/linux/thermal.h b/include/uapi/linux/thermal.h
> index fc78bf3aead7..84e556ace5f5 100644
> --- a/include/uapi/linux/thermal.h
> +++ b/include/uapi/linux/thermal.h
> @@ -14,6 +14,7 @@ enum thermal_trip_type {
>         THERMAL_TRIP_PASSIVE,
>         THERMAL_TRIP_HOT,
>         THERMAL_TRIP_CRITICAL,
> +       THERMAL_TRIP_USER,
>  };
>
>  /* Adding event notification support elements */
> --
> 2.43.0
>
Daniel Lezcano July 1, 2024, 1:17 p.m. UTC | #2
On 28/06/2024 15:56, Rafael J. Wysocki wrote:
> On Thu, Jun 27, 2024 at 10:55 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> Currently the thermal framework has 4 trip point types:
>>
>> - active : basically for fans (or anything requiring energy to cool
>>    down)
>>
>> - passive : a performance limiter
>>
>> - hot : for a last action before reaching critical
>>
>> - critical : a without return threshold leading to a system shutdown
>>
>> A thermal zone monitors the temperature regarding these trip
>> points. The old way to do that is actively polling the temperature
>> which is very bad for embedded systems, especially mobile and it is
>> even worse today as we can have more than fifty thermal zones. The
>> modern way is to rely on the driver to send an interrupt when the trip
>> points are crossed, so the system can sleep while the temperature
>> monitoring is offloaded to a dedicated hardware.
>>
>> However, the thermal aspect is also managed from userspace to protect
>> the user, especially tracking down the skin temperature sensor. The
>> logic is more complex than what we found in the kernel because it
>> needs multiple sources indicating the thermal situation of the entire
>> system.
>>
>> For this reason it needs to setup trip points at different levels in
>> order to get informed about what is going on with some thermal zones
>> when running some specific application.
>>
>> For instance, the skin temperature must be limited to 43°C on a long
>> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
>>
>> The thermal engine must then rely on trip points to monitor those
>> temperatures. Unfortunately, today there is only 'active' and
>> 'passive' trip points which has a specific meaning for the kernel, not
>> the userspace. That leads to hacks in different platforms for mobile
>> and embedded systems where 'active' trip points are used to send
>> notification to the userspace. This is obviously not right because
>> these trip are handled by the kernel.
>>
>> This patch introduces the 'user' trip point type where its semantic is
>> simple: do nothing at the kernel level, just send a notification to
>> the user space.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> ---
>>   .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
>>   drivers/thermal/thermal_core.c                            | 8 ++++++++
>>   drivers/thermal/thermal_of.c                              | 1 +
>>   drivers/thermal/thermal_trace.h                           | 4 +++-
>>   drivers/thermal/thermal_trip.c                            | 1 +
>>   include/uapi/linux/thermal.h                              | 1 +
>>   6 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>> index 68398e7e8655..cb9ea54a192e 100644
>> --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>> @@ -153,6 +153,7 @@ patternProperties:
>>                 type:
>>                   $ref: /schemas/types.yaml#/definitions/string
>>                   enum:
>> +                  - user     # enable user notification
>>                     - active   # enable active cooling e.g. fans
>>                     - passive  # enable passive cooling e.g. throttling cpu
>>                     - hot      # send notification to driver
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index 2aa04c46a425..506f880d9aa9 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -734,6 +734,14 @@ int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
>>          if (tz != pos1 || cdev != pos2)
>>                  return -EINVAL;
>>
>> +       /*
>> +        * It is not allowed to bind a cooling device with a trip
>> +        * point user type because no mitigation should happen from
>> +        * the kernel with these trip points
>> +        */
>> +       if (trip->type == THERMAL_TRIP_USER)
>> +               return -EINVAL;
> 
> Maybe print a debug message when bailing out here?
> 
> A check for "user" trips would need to be added to
> thermal_governor_trip_crossed() and to the .manage() callbacks in the
> power allocator, step-wise and fair-share governors, if I'm not
> mistaken.  Especially fair-share and power allocator should not take
> them into account IMV.

I'm not sure the power_allocator needs to change anything. The trip 
point used is switch_on which is only derived from passive or active 
trip point, so it is not possible to have a user trip point used in the 
manage callback.

Did I miss something ?
Rafael J. Wysocki July 1, 2024, 1:36 p.m. UTC | #3
On Mon, Jul 1, 2024 at 3:17 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
> On 28/06/2024 15:56, Rafael J. Wysocki wrote:
> > On Thu, Jun 27, 2024 at 10:55 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> Currently the thermal framework has 4 trip point types:
> >>
> >> - active : basically for fans (or anything requiring energy to cool
> >>    down)
> >>
> >> - passive : a performance limiter
> >>
> >> - hot : for a last action before reaching critical
> >>
> >> - critical : a without return threshold leading to a system shutdown
> >>
> >> A thermal zone monitors the temperature regarding these trip
> >> points. The old way to do that is actively polling the temperature
> >> which is very bad for embedded systems, especially mobile and it is
> >> even worse today as we can have more than fifty thermal zones. The
> >> modern way is to rely on the driver to send an interrupt when the trip
> >> points are crossed, so the system can sleep while the temperature
> >> monitoring is offloaded to a dedicated hardware.
> >>
> >> However, the thermal aspect is also managed from userspace to protect
> >> the user, especially tracking down the skin temperature sensor. The
> >> logic is more complex than what we found in the kernel because it
> >> needs multiple sources indicating the thermal situation of the entire
> >> system.
> >>
> >> For this reason it needs to setup trip points at different levels in
> >> order to get informed about what is going on with some thermal zones
> >> when running some specific application.
> >>
> >> For instance, the skin temperature must be limited to 43°C on a long
> >> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
> >>
> >> The thermal engine must then rely on trip points to monitor those
> >> temperatures. Unfortunately, today there is only 'active' and
> >> 'passive' trip points which has a specific meaning for the kernel, not
> >> the userspace. That leads to hacks in different platforms for mobile
> >> and embedded systems where 'active' trip points are used to send
> >> notification to the userspace. This is obviously not right because
> >> these trip are handled by the kernel.
> >>
> >> This patch introduces the 'user' trip point type where its semantic is
> >> simple: do nothing at the kernel level, just send a notification to
> >> the user space.
> >>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> >> ---
> >>   .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
> >>   drivers/thermal/thermal_core.c                            | 8 ++++++++
> >>   drivers/thermal/thermal_of.c                              | 1 +
> >>   drivers/thermal/thermal_trace.h                           | 4 +++-
> >>   drivers/thermal/thermal_trip.c                            | 1 +
> >>   include/uapi/linux/thermal.h                              | 1 +
> >>   6 files changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >> index 68398e7e8655..cb9ea54a192e 100644
> >> --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >> @@ -153,6 +153,7 @@ patternProperties:
> >>                 type:
> >>                   $ref: /schemas/types.yaml#/definitions/string
> >>                   enum:
> >> +                  - user     # enable user notification
> >>                     - active   # enable active cooling e.g. fans
> >>                     - passive  # enable passive cooling e.g. throttling cpu
> >>                     - hot      # send notification to driver
> >> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> >> index 2aa04c46a425..506f880d9aa9 100644
> >> --- a/drivers/thermal/thermal_core.c
> >> +++ b/drivers/thermal/thermal_core.c
> >> @@ -734,6 +734,14 @@ int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
> >>          if (tz != pos1 || cdev != pos2)
> >>                  return -EINVAL;
> >>
> >> +       /*
> >> +        * It is not allowed to bind a cooling device with a trip
> >> +        * point user type because no mitigation should happen from
> >> +        * the kernel with these trip points
> >> +        */
> >> +       if (trip->type == THERMAL_TRIP_USER)
> >> +               return -EINVAL;
> >
> > Maybe print a debug message when bailing out here?
> >
> > A check for "user" trips would need to be added to
> > thermal_governor_trip_crossed() and to the .manage() callbacks in the
> > power allocator, step-wise and fair-share governors, if I'm not
> > mistaken.  Especially fair-share and power allocator should not take
> > them into account IMV.
>
> I'm not sure the power_allocator needs to change anything. The trip
> point used is switch_on which is only derived from passive or active
> trip point, so it is not possible to have a user trip point used in the
> manage callback.

OK, it checks for "active" specifically.

> Did I miss something ?

No, I don't think so.
Daniel Lezcano July 1, 2024, 3:13 p.m. UTC | #4
On 28/06/2024 15:56, Rafael J. Wysocki wrote:
> On Thu, Jun 27, 2024 at 10:55 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> Currently the thermal framework has 4 trip point types:
>>
>> - active : basically for fans (or anything requiring energy to cool
>>    down)
>>
>> - passive : a performance limiter
>>
>> - hot : for a last action before reaching critical
>>
>> - critical : a without return threshold leading to a system shutdown
>>
>> A thermal zone monitors the temperature regarding these trip
>> points. The old way to do that is actively polling the temperature
>> which is very bad for embedded systems, especially mobile and it is
>> even worse today as we can have more than fifty thermal zones. The
>> modern way is to rely on the driver to send an interrupt when the trip
>> points are crossed, so the system can sleep while the temperature
>> monitoring is offloaded to a dedicated hardware.
>>
>> However, the thermal aspect is also managed from userspace to protect
>> the user, especially tracking down the skin temperature sensor. The
>> logic is more complex than what we found in the kernel because it
>> needs multiple sources indicating the thermal situation of the entire
>> system.
>>
>> For this reason it needs to setup trip points at different levels in
>> order to get informed about what is going on with some thermal zones
>> when running some specific application.
>>
>> For instance, the skin temperature must be limited to 43°C on a long
>> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
>>
>> The thermal engine must then rely on trip points to monitor those
>> temperatures. Unfortunately, today there is only 'active' and
>> 'passive' trip points which has a specific meaning for the kernel, not
>> the userspace. That leads to hacks in different platforms for mobile
>> and embedded systems where 'active' trip points are used to send
>> notification to the userspace. This is obviously not right because
>> these trip are handled by the kernel.
>>
>> This patch introduces the 'user' trip point type where its semantic is
>> simple: do nothing at the kernel level, just send a notification to
>> the user space.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> ---
>>   .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
>>   drivers/thermal/thermal_core.c                            | 8 ++++++++
>>   drivers/thermal/thermal_of.c                              | 1 +
>>   drivers/thermal/thermal_trace.h                           | 4 +++-
>>   drivers/thermal/thermal_trip.c                            | 1 +
>>   include/uapi/linux/thermal.h                              | 1 +
>>   6 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>> index 68398e7e8655..cb9ea54a192e 100644
>> --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>> @@ -153,6 +153,7 @@ patternProperties:
>>                 type:
>>                   $ref: /schemas/types.yaml#/definitions/string
>>                   enum:
>> +                  - user     # enable user notification
>>                     - active   # enable active cooling e.g. fans
>>                     - passive  # enable passive cooling e.g. throttling cpu
>>                     - hot      # send notification to driver
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index 2aa04c46a425..506f880d9aa9 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -734,6 +734,14 @@ int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
>>          if (tz != pos1 || cdev != pos2)
>>                  return -EINVAL;
>>
>> +       /*
>> +        * It is not allowed to bind a cooling device with a trip
>> +        * point user type because no mitigation should happen from
>> +        * the kernel with these trip points
>> +        */
>> +       if (trip->type == THERMAL_TRIP_USER)
>> +               return -EINVAL;
> 
> Maybe print a debug message when bailing out here?

After thinking a bit about the message, it sounds to me that is a really 
an error in the firmware if we end up binding an 'user' trip point.

What about the following message:

dev_err(tz->device, "Trying to bind the cooling device '%s' with an 
'user' trip point id=%d", cdev->type, trip->id);
Rafael J. Wysocki July 1, 2024, 3:47 p.m. UTC | #5
On Mon, Jul 1, 2024 at 5:13 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
> On 28/06/2024 15:56, Rafael J. Wysocki wrote:
> > On Thu, Jun 27, 2024 at 10:55 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> Currently the thermal framework has 4 trip point types:
> >>
> >> - active : basically for fans (or anything requiring energy to cool
> >>    down)
> >>
> >> - passive : a performance limiter
> >>
> >> - hot : for a last action before reaching critical
> >>
> >> - critical : a without return threshold leading to a system shutdown
> >>
> >> A thermal zone monitors the temperature regarding these trip
> >> points. The old way to do that is actively polling the temperature
> >> which is very bad for embedded systems, especially mobile and it is
> >> even worse today as we can have more than fifty thermal zones. The
> >> modern way is to rely on the driver to send an interrupt when the trip
> >> points are crossed, so the system can sleep while the temperature
> >> monitoring is offloaded to a dedicated hardware.
> >>
> >> However, the thermal aspect is also managed from userspace to protect
> >> the user, especially tracking down the skin temperature sensor. The
> >> logic is more complex than what we found in the kernel because it
> >> needs multiple sources indicating the thermal situation of the entire
> >> system.
> >>
> >> For this reason it needs to setup trip points at different levels in
> >> order to get informed about what is going on with some thermal zones
> >> when running some specific application.
> >>
> >> For instance, the skin temperature must be limited to 43°C on a long
> >> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
> >>
> >> The thermal engine must then rely on trip points to monitor those
> >> temperatures. Unfortunately, today there is only 'active' and
> >> 'passive' trip points which has a specific meaning for the kernel, not
> >> the userspace. That leads to hacks in different platforms for mobile
> >> and embedded systems where 'active' trip points are used to send
> >> notification to the userspace. This is obviously not right because
> >> these trip are handled by the kernel.
> >>
> >> This patch introduces the 'user' trip point type where its semantic is
> >> simple: do nothing at the kernel level, just send a notification to
> >> the user space.
> >>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> >> ---
> >>   .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
> >>   drivers/thermal/thermal_core.c                            | 8 ++++++++
> >>   drivers/thermal/thermal_of.c                              | 1 +
> >>   drivers/thermal/thermal_trace.h                           | 4 +++-
> >>   drivers/thermal/thermal_trip.c                            | 1 +
> >>   include/uapi/linux/thermal.h                              | 1 +
> >>   6 files changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >> index 68398e7e8655..cb9ea54a192e 100644
> >> --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >> @@ -153,6 +153,7 @@ patternProperties:
> >>                 type:
> >>                   $ref: /schemas/types.yaml#/definitions/string
> >>                   enum:
> >> +                  - user     # enable user notification
> >>                     - active   # enable active cooling e.g. fans
> >>                     - passive  # enable passive cooling e.g. throttling cpu
> >>                     - hot      # send notification to driver
> >> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> >> index 2aa04c46a425..506f880d9aa9 100644
> >> --- a/drivers/thermal/thermal_core.c
> >> +++ b/drivers/thermal/thermal_core.c
> >> @@ -734,6 +734,14 @@ int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
> >>          if (tz != pos1 || cdev != pos2)
> >>                  return -EINVAL;
> >>
> >> +       /*
> >> +        * It is not allowed to bind a cooling device with a trip
> >> +        * point user type because no mitigation should happen from
> >> +        * the kernel with these trip points
> >> +        */
> >> +       if (trip->type == THERMAL_TRIP_USER)
> >> +               return -EINVAL;
> >
> > Maybe print a debug message when bailing out here?
>
> After thinking a bit about the message, it sounds to me that is a really
> an error in the firmware if we end up binding an 'user' trip point.
>
> What about the following message:
>
> dev_err(tz->device, "Trying to bind the cooling device '%s' with an
> 'user' trip point id=%d", cdev->type, trip->id);

s/an// I think.

Also I wouldn't use dev_err() as it indicates a kernel issue.  Maybe
dev_info(tz->device, FW_BUG ...)?
Daniel Lezcano July 1, 2024, 3:50 p.m. UTC | #6
On 01/07/2024 17:47, Rafael J. Wysocki wrote:
> On Mon, Jul 1, 2024 at 5:13 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>
>> On 28/06/2024 15:56, Rafael J. Wysocki wrote:
>>> On Thu, Jun 27, 2024 at 10:55 AM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> Currently the thermal framework has 4 trip point types:
>>>>
>>>> - active : basically for fans (or anything requiring energy to cool
>>>>     down)
>>>>
>>>> - passive : a performance limiter
>>>>
>>>> - hot : for a last action before reaching critical
>>>>
>>>> - critical : a without return threshold leading to a system shutdown
>>>>
>>>> A thermal zone monitors the temperature regarding these trip
>>>> points. The old way to do that is actively polling the temperature
>>>> which is very bad for embedded systems, especially mobile and it is
>>>> even worse today as we can have more than fifty thermal zones. The
>>>> modern way is to rely on the driver to send an interrupt when the trip
>>>> points are crossed, so the system can sleep while the temperature
>>>> monitoring is offloaded to a dedicated hardware.
>>>>
>>>> However, the thermal aspect is also managed from userspace to protect
>>>> the user, especially tracking down the skin temperature sensor. The
>>>> logic is more complex than what we found in the kernel because it
>>>> needs multiple sources indicating the thermal situation of the entire
>>>> system.
>>>>
>>>> For this reason it needs to setup trip points at different levels in
>>>> order to get informed about what is going on with some thermal zones
>>>> when running some specific application.
>>>>
>>>> For instance, the skin temperature must be limited to 43°C on a long
>>>> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
>>>>
>>>> The thermal engine must then rely on trip points to monitor those
>>>> temperatures. Unfortunately, today there is only 'active' and
>>>> 'passive' trip points which has a specific meaning for the kernel, not
>>>> the userspace. That leads to hacks in different platforms for mobile
>>>> and embedded systems where 'active' trip points are used to send
>>>> notification to the userspace. This is obviously not right because
>>>> these trip are handled by the kernel.
>>>>
>>>> This patch introduces the 'user' trip point type where its semantic is
>>>> simple: do nothing at the kernel level, just send a notification to
>>>> the user space.
>>>>
>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>> ---
>>>>    .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
>>>>    drivers/thermal/thermal_core.c                            | 8 ++++++++
>>>>    drivers/thermal/thermal_of.c                              | 1 +
>>>>    drivers/thermal/thermal_trace.h                           | 4 +++-
>>>>    drivers/thermal/thermal_trip.c                            | 1 +
>>>>    include/uapi/linux/thermal.h                              | 1 +
>>>>    6 files changed, 15 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>>>> index 68398e7e8655..cb9ea54a192e 100644
>>>> --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>>>> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
>>>> @@ -153,6 +153,7 @@ patternProperties:
>>>>                  type:
>>>>                    $ref: /schemas/types.yaml#/definitions/string
>>>>                    enum:
>>>> +                  - user     # enable user notification
>>>>                      - active   # enable active cooling e.g. fans
>>>>                      - passive  # enable passive cooling e.g. throttling cpu
>>>>                      - hot      # send notification to driver
>>>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>>>> index 2aa04c46a425..506f880d9aa9 100644
>>>> --- a/drivers/thermal/thermal_core.c
>>>> +++ b/drivers/thermal/thermal_core.c
>>>> @@ -734,6 +734,14 @@ int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
>>>>           if (tz != pos1 || cdev != pos2)
>>>>                   return -EINVAL;
>>>>
>>>> +       /*
>>>> +        * It is not allowed to bind a cooling device with a trip
>>>> +        * point user type because no mitigation should happen from
>>>> +        * the kernel with these trip points
>>>> +        */
>>>> +       if (trip->type == THERMAL_TRIP_USER)
>>>> +               return -EINVAL;
>>>
>>> Maybe print a debug message when bailing out here?
>>
>> After thinking a bit about the message, it sounds to me that is a really
>> an error in the firmware if we end up binding an 'user' trip point.
>>
>> What about the following message:
>>
>> dev_err(tz->device, "Trying to bind the cooling device '%s' with an
>> 'user' trip point id=%d", cdev->type, trip->id);
> 
> s/an// I think.
> 
> Also I wouldn't use dev_err() as it indicates a kernel issue.  Maybe
> dev_info(tz->device, FW_BUG ...)?

Right, thanks
Rob Herring July 1, 2024, 4:26 p.m. UTC | #7
On Thu, Jun 27, 2024 at 10:54:50AM +0200, Daniel Lezcano wrote:
> Currently the thermal framework has 4 trip point types:
> 
> - active : basically for fans (or anything requiring energy to cool
>   down)
> 
> - passive : a performance limiter
> 
> - hot : for a last action before reaching critical
> 
> - critical : a without return threshold leading to a system shutdown
> 
> A thermal zone monitors the temperature regarding these trip
> points. The old way to do that is actively polling the temperature
> which is very bad for embedded systems, especially mobile and it is
> even worse today as we can have more than fifty thermal zones. The
> modern way is to rely on the driver to send an interrupt when the trip
> points are crossed, so the system can sleep while the temperature
> monitoring is offloaded to a dedicated hardware.
> 
> However, the thermal aspect is also managed from userspace to protect
> the user, especially tracking down the skin temperature sensor. The
> logic is more complex than what we found in the kernel because it
> needs multiple sources indicating the thermal situation of the entire
> system.
> 
> For this reason it needs to setup trip points at different levels in
> order to get informed about what is going on with some thermal zones
> when running some specific application.
> 
> For instance, the skin temperature must be limited to 43°C on a long
> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
> 
> The thermal engine must then rely on trip points to monitor those
> temperatures. Unfortunately, today there is only 'active' and
> 'passive' trip points which has a specific meaning for the kernel, not
> the userspace. That leads to hacks in different platforms for mobile
> and embedded systems where 'active' trip points are used to send
> notification to the userspace. This is obviously not right because
> these trip are handled by the kernel.
> 
> This patch introduces the 'user' trip point type where its semantic is
> simple: do nothing at the kernel level, just send a notification to
> the user space.

Sounds like OS behavior/policy though I guess the existing ones kind are 
too. Maybe we should have defined *what* action to take and then the OS 
could decide whether what actions to handle vs. pass it up a level.

Why can't userspace just ask to be notified at a trip point it 
defines?

If we keep this in DT, perhaps 'notice' would be a better name that 
doesn't encode the OS architecture details.


BTW, can we decide what to do about 'trips' node being required or not? 
That's nearly the only DT warning left for some platforms.

> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +

Please make bindings a separate patch.

>  drivers/thermal/thermal_core.c                            | 8 ++++++++
>  drivers/thermal/thermal_of.c                              | 1 +
>  drivers/thermal/thermal_trace.h                           | 4 +++-
>  drivers/thermal/thermal_trip.c                            | 1 +
>  include/uapi/linux/thermal.h                              | 1 +
>  6 files changed, 15 insertions(+), 1 deletion(-)
Daniel Lezcano July 2, 2024, 9:29 a.m. UTC | #8
On 01/07/2024 18:26, Rob Herring wrote:
> On Thu, Jun 27, 2024 at 10:54:50AM +0200, Daniel Lezcano wrote:
>> Currently the thermal framework has 4 trip point types:
>>
>> - active : basically for fans (or anything requiring energy to cool
>>    down)
>>
>> - passive : a performance limiter
>>
>> - hot : for a last action before reaching critical
>>
>> - critical : a without return threshold leading to a system shutdown
>>
>> A thermal zone monitors the temperature regarding these trip
>> points. The old way to do that is actively polling the temperature
>> which is very bad for embedded systems, especially mobile and it is
>> even worse today as we can have more than fifty thermal zones. The
>> modern way is to rely on the driver to send an interrupt when the trip
>> points are crossed, so the system can sleep while the temperature
>> monitoring is offloaded to a dedicated hardware.
>>
>> However, the thermal aspect is also managed from userspace to protect
>> the user, especially tracking down the skin temperature sensor. The
>> logic is more complex than what we found in the kernel because it
>> needs multiple sources indicating the thermal situation of the entire
>> system.
>>
>> For this reason it needs to setup trip points at different levels in
>> order to get informed about what is going on with some thermal zones
>> when running some specific application.
>>
>> For instance, the skin temperature must be limited to 43°C on a long
>> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
>>
>> The thermal engine must then rely on trip points to monitor those
>> temperatures. Unfortunately, today there is only 'active' and
>> 'passive' trip points which has a specific meaning for the kernel, not
>> the userspace. That leads to hacks in different platforms for mobile
>> and embedded systems where 'active' trip points are used to send
>> notification to the userspace. This is obviously not right because
>> these trip are handled by the kernel.
>>
>> This patch introduces the 'user' trip point type where its semantic is
>> simple: do nothing at the kernel level, just send a notification to
>> the user space.
> 
> Sounds like OS behavior/policy though I guess the existing ones kind are
> too. Maybe we should have defined *what* action to take and then the OS
> could decide whether what actions to handle vs. pass it up a level.

Right

> Why can't userspace just ask to be notified at a trip point it
> defines?

Yes I think it is possible to create a netlink message to create a trip 
point which will return a trip id.

Rafael what do you think ?

> If we keep this in DT, perhaps 'notice' would be a better name that
> doesn't encode the OS architecture details.

[ ... ]

> BTW, can we decide what to do about 'trips' node being required or not?
> That's nearly the only DT warning left for some platforms.

A thermal zone is a combination of a sensor, a mitigation logic (user or 
kernel), hardware limits with trip points to activate the logic. Without 
trip points, this logic can not operate, consequently the thermal zone 
description is incomplete.

I guess those thermal zones are set to have the sensor exported in 
/sys/class/thermal, so the userspace can access the temperature.

However, existing thermal zone description should have at least a 'hot' 
trip point and a 'critical' trip point.

On the other hand, now that we are introducing the 'user' trip point, 
those thermal zone can exist without trip points because we can create 
them at any time from userspace.

So at the first glance, I would say we can drop the "required" 
constraint for the trip points in the thermal zone description.


>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> ---
>>   .../devicetree/bindings/thermal/thermal-zones.yaml        | 1 +
> 
> Please make bindings a separate patch.
> 
>>   drivers/thermal/thermal_core.c                            | 8 ++++++++
>>   drivers/thermal/thermal_of.c                              | 1 +
>>   drivers/thermal/thermal_trace.h                           | 4 +++-
>>   drivers/thermal/thermal_trip.c                            | 1 +
>>   include/uapi/linux/thermal.h                              | 1 +
>>   6 files changed, 15 insertions(+), 1 deletion(-)
>
Rafael J. Wysocki July 2, 2024, 10:22 a.m. UTC | #9
On Tue, Jul 2, 2024 at 11:29 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 01/07/2024 18:26, Rob Herring wrote:
> > On Thu, Jun 27, 2024 at 10:54:50AM +0200, Daniel Lezcano wrote:
> >> Currently the thermal framework has 4 trip point types:
> >>
> >> - active : basically for fans (or anything requiring energy to cool
> >>    down)
> >>
> >> - passive : a performance limiter
> >>
> >> - hot : for a last action before reaching critical
> >>
> >> - critical : a without return threshold leading to a system shutdown
> >>
> >> A thermal zone monitors the temperature regarding these trip
> >> points. The old way to do that is actively polling the temperature
> >> which is very bad for embedded systems, especially mobile and it is
> >> even worse today as we can have more than fifty thermal zones. The
> >> modern way is to rely on the driver to send an interrupt when the trip
> >> points are crossed, so the system can sleep while the temperature
> >> monitoring is offloaded to a dedicated hardware.
> >>
> >> However, the thermal aspect is also managed from userspace to protect
> >> the user, especially tracking down the skin temperature sensor. The
> >> logic is more complex than what we found in the kernel because it
> >> needs multiple sources indicating the thermal situation of the entire
> >> system.
> >>
> >> For this reason it needs to setup trip points at different levels in
> >> order to get informed about what is going on with some thermal zones
> >> when running some specific application.
> >>
> >> For instance, the skin temperature must be limited to 43°C on a long
> >> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
> >>
> >> The thermal engine must then rely on trip points to monitor those
> >> temperatures. Unfortunately, today there is only 'active' and
> >> 'passive' trip points which has a specific meaning for the kernel, not
> >> the userspace. That leads to hacks in different platforms for mobile
> >> and embedded systems where 'active' trip points are used to send
> >> notification to the userspace. This is obviously not right because
> >> these trip are handled by the kernel.
> >>
> >> This patch introduces the 'user' trip point type where its semantic is
> >> simple: do nothing at the kernel level, just send a notification to
> >> the user space.
> >
> > Sounds like OS behavior/policy though I guess the existing ones kind are
> > too. Maybe we should have defined *what* action to take and then the OS
> > could decide whether what actions to handle vs. pass it up a level.
>
> Right
>
> > Why can't userspace just ask to be notified at a trip point it
> > defines?
>
> Yes I think it is possible to create a netlink message to create a trip
> point which will return a trip id.
>
> Rafael what do you think ?

Trips cannot be created on the fly ATM.

What can be done is to create trips that are invalid to start with and
then set their temperature via sysfs.  This has been done already for
quite a while AFAICS.

> > If we keep this in DT, perhaps 'notice' would be a better name that
> > doesn't encode the OS architecture details.
>
> [ ... ]
>
> > BTW, can we decide what to do about 'trips' node being required or not?
> > That's nearly the only DT warning left for some platforms.
>
> A thermal zone is a combination of a sensor, a mitigation logic (user or
> kernel), hardware limits with trip points to activate the logic. Without
> trip points, this logic can not operate, consequently the thermal zone
> description is incomplete.

Well, there is a concept of a tripless thermal zone which simply
represents a sensor.

> I guess those thermal zones are set to have the sensor exported in
> /sys/class/thermal, so the userspace can access the temperature.

I think so.

> However, existing thermal zone description should have at least a 'hot'
> trip point and a 'critical' trip point.
>
> On the other hand, now that we are introducing the 'user' trip point,
> those thermal zone can exist without trip points because we can create
> them at any time from userspace.

No, they cannot be created at any time.

> So at the first glance, I would say we can drop the "required"
> constraint for the trip points in the thermal zone description.

That's correct, but for other reasons.
Daniel Lezcano July 2, 2024, 10:56 a.m. UTC | #10
On 02/07/2024 12:22, Rafael J. Wysocki wrote:
> On Tue, Jul 2, 2024 at 11:29 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 01/07/2024 18:26, Rob Herring wrote:
>>> On Thu, Jun 27, 2024 at 10:54:50AM +0200, Daniel Lezcano wrote:
>>>> Currently the thermal framework has 4 trip point types:
>>>>
>>>> - active : basically for fans (or anything requiring energy to cool
>>>>     down)
>>>>
>>>> - passive : a performance limiter
>>>>
>>>> - hot : for a last action before reaching critical
>>>>
>>>> - critical : a without return threshold leading to a system shutdown
>>>>
>>>> A thermal zone monitors the temperature regarding these trip
>>>> points. The old way to do that is actively polling the temperature
>>>> which is very bad for embedded systems, especially mobile and it is
>>>> even worse today as we can have more than fifty thermal zones. The
>>>> modern way is to rely on the driver to send an interrupt when the trip
>>>> points are crossed, so the system can sleep while the temperature
>>>> monitoring is offloaded to a dedicated hardware.
>>>>
>>>> However, the thermal aspect is also managed from userspace to protect
>>>> the user, especially tracking down the skin temperature sensor. The
>>>> logic is more complex than what we found in the kernel because it
>>>> needs multiple sources indicating the thermal situation of the entire
>>>> system.
>>>>
>>>> For this reason it needs to setup trip points at different levels in
>>>> order to get informed about what is going on with some thermal zones
>>>> when running some specific application.
>>>>
>>>> For instance, the skin temperature must be limited to 43°C on a long
>>>> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
>>>>
>>>> The thermal engine must then rely on trip points to monitor those
>>>> temperatures. Unfortunately, today there is only 'active' and
>>>> 'passive' trip points which has a specific meaning for the kernel, not
>>>> the userspace. That leads to hacks in different platforms for mobile
>>>> and embedded systems where 'active' trip points are used to send
>>>> notification to the userspace. This is obviously not right because
>>>> these trip are handled by the kernel.
>>>>
>>>> This patch introduces the 'user' trip point type where its semantic is
>>>> simple: do nothing at the kernel level, just send a notification to
>>>> the user space.
>>>
>>> Sounds like OS behavior/policy though I guess the existing ones kind are
>>> too. Maybe we should have defined *what* action to take and then the OS
>>> could decide whether what actions to handle vs. pass it up a level.
>>
>> Right
>>
>>> Why can't userspace just ask to be notified at a trip point it
>>> defines?
>>
>> Yes I think it is possible to create a netlink message to create a trip
>> point which will return a trip id.
>>
>> Rafael what do you think ?
> 
> Trips cannot be created on the fly ATM.
> 
> What can be done is to create trips that are invalid to start with and
> then set their temperature via sysfs.  This has been done already for
> quite a while AFAICS.

Yes, I remember that.

I would like to avoid introducing more weirdness in the thermal 
framework which deserve a clear ABI.

What is missing to create new trip points on the fly ?
Rafael J. Wysocki July 2, 2024, 11:03 a.m. UTC | #11
On Tue, Jul 2, 2024 at 12:56 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 02/07/2024 12:22, Rafael J. Wysocki wrote:
> > On Tue, Jul 2, 2024 at 11:29 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 01/07/2024 18:26, Rob Herring wrote:
> >>> On Thu, Jun 27, 2024 at 10:54:50AM +0200, Daniel Lezcano wrote:
> >>>> Currently the thermal framework has 4 trip point types:
> >>>>
> >>>> - active : basically for fans (or anything requiring energy to cool
> >>>>     down)
> >>>>
> >>>> - passive : a performance limiter
> >>>>
> >>>> - hot : for a last action before reaching critical
> >>>>
> >>>> - critical : a without return threshold leading to a system shutdown
> >>>>
> >>>> A thermal zone monitors the temperature regarding these trip
> >>>> points. The old way to do that is actively polling the temperature
> >>>> which is very bad for embedded systems, especially mobile and it is
> >>>> even worse today as we can have more than fifty thermal zones. The
> >>>> modern way is to rely on the driver to send an interrupt when the trip
> >>>> points are crossed, so the system can sleep while the temperature
> >>>> monitoring is offloaded to a dedicated hardware.
> >>>>
> >>>> However, the thermal aspect is also managed from userspace to protect
> >>>> the user, especially tracking down the skin temperature sensor. The
> >>>> logic is more complex than what we found in the kernel because it
> >>>> needs multiple sources indicating the thermal situation of the entire
> >>>> system.
> >>>>
> >>>> For this reason it needs to setup trip points at different levels in
> >>>> order to get informed about what is going on with some thermal zones
> >>>> when running some specific application.
> >>>>
> >>>> For instance, the skin temperature must be limited to 43°C on a long
> >>>> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
> >>>>
> >>>> The thermal engine must then rely on trip points to monitor those
> >>>> temperatures. Unfortunately, today there is only 'active' and
> >>>> 'passive' trip points which has a specific meaning for the kernel, not
> >>>> the userspace. That leads to hacks in different platforms for mobile
> >>>> and embedded systems where 'active' trip points are used to send
> >>>> notification to the userspace. This is obviously not right because
> >>>> these trip are handled by the kernel.
> >>>>
> >>>> This patch introduces the 'user' trip point type where its semantic is
> >>>> simple: do nothing at the kernel level, just send a notification to
> >>>> the user space.
> >>>
> >>> Sounds like OS behavior/policy though I guess the existing ones kind are
> >>> too. Maybe we should have defined *what* action to take and then the OS
> >>> could decide whether what actions to handle vs. pass it up a level.
> >>
> >> Right
> >>
> >>> Why can't userspace just ask to be notified at a trip point it
> >>> defines?
> >>
> >> Yes I think it is possible to create a netlink message to create a trip
> >> point which will return a trip id.
> >>
> >> Rafael what do you think ?
> >
> > Trips cannot be created on the fly ATM.
> >
> > What can be done is to create trips that are invalid to start with and
> > then set their temperature via sysfs.  This has been done already for
> > quite a while AFAICS.
>
> Yes, I remember that.
>
> I would like to avoid introducing more weirdness in the thermal
> framework which deserve a clear ABI.
>
> What is missing to create new trip points on the fly ?

A different data structure to store them (essentially, a list instead
of an array).

I doubt it's worth the hassle.

What's wrong with the current approach mentioned above?  It will need
to be supported going forward anyway.
Rafael J. Wysocki July 2, 2024, 11:17 a.m. UTC | #12
On Tue, Jul 2, 2024 at 1:03 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Jul 2, 2024 at 12:56 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
> >
> > On 02/07/2024 12:22, Rafael J. Wysocki wrote:
> > > On Tue, Jul 2, 2024 at 11:29 AM Daniel Lezcano
> > > <daniel.lezcano@linaro.org> wrote:
> > >>
> > >> On 01/07/2024 18:26, Rob Herring wrote:
> > >>> On Thu, Jun 27, 2024 at 10:54:50AM +0200, Daniel Lezcano wrote:
> > >>>> Currently the thermal framework has 4 trip point types:
> > >>>>
> > >>>> - active : basically for fans (or anything requiring energy to cool
> > >>>>     down)
> > >>>>
> > >>>> - passive : a performance limiter
> > >>>>
> > >>>> - hot : for a last action before reaching critical
> > >>>>
> > >>>> - critical : a without return threshold leading to a system shutdown
> > >>>>
> > >>>> A thermal zone monitors the temperature regarding these trip
> > >>>> points. The old way to do that is actively polling the temperature
> > >>>> which is very bad for embedded systems, especially mobile and it is
> > >>>> even worse today as we can have more than fifty thermal zones. The
> > >>>> modern way is to rely on the driver to send an interrupt when the trip
> > >>>> points are crossed, so the system can sleep while the temperature
> > >>>> monitoring is offloaded to a dedicated hardware.
> > >>>>
> > >>>> However, the thermal aspect is also managed from userspace to protect
> > >>>> the user, especially tracking down the skin temperature sensor. The
> > >>>> logic is more complex than what we found in the kernel because it
> > >>>> needs multiple sources indicating the thermal situation of the entire
> > >>>> system.
> > >>>>
> > >>>> For this reason it needs to setup trip points at different levels in
> > >>>> order to get informed about what is going on with some thermal zones
> > >>>> when running some specific application.
> > >>>>
> > >>>> For instance, the skin temperature must be limited to 43°C on a long
> > >>>> run but can go to 48°C for 10 minutes, or 60°C for 1 minute.
> > >>>>
> > >>>> The thermal engine must then rely on trip points to monitor those
> > >>>> temperatures. Unfortunately, today there is only 'active' and
> > >>>> 'passive' trip points which has a specific meaning for the kernel, not
> > >>>> the userspace. That leads to hacks in different platforms for mobile
> > >>>> and embedded systems where 'active' trip points are used to send
> > >>>> notification to the userspace. This is obviously not right because
> > >>>> these trip are handled by the kernel.
> > >>>>
> > >>>> This patch introduces the 'user' trip point type where its semantic is
> > >>>> simple: do nothing at the kernel level, just send a notification to
> > >>>> the user space.
> > >>>
> > >>> Sounds like OS behavior/policy though I guess the existing ones kind are
> > >>> too. Maybe we should have defined *what* action to take and then the OS
> > >>> could decide whether what actions to handle vs. pass it up a level.
> > >>
> > >> Right
> > >>
> > >>> Why can't userspace just ask to be notified at a trip point it
> > >>> defines?
> > >>
> > >> Yes I think it is possible to create a netlink message to create a trip
> > >> point which will return a trip id.
> > >>
> > >> Rafael what do you think ?
> > >
> > > Trips cannot be created on the fly ATM.
> > >
> > > What can be done is to create trips that are invalid to start with and
> > > then set their temperature via sysfs.  This has been done already for
> > > quite a while AFAICS.
> >
> > Yes, I remember that.
> >
> > I would like to avoid introducing more weirdness in the thermal
> > framework which deserve a clear ABI.
> >
> > What is missing to create new trip points on the fly ?
>
> A different data structure to store them (essentially, a list instead
> of an array).
>
> I doubt it's worth the hassle.
>
> What's wrong with the current approach mentioned above?  It will need
> to be supported going forward anyway.

BTW, there are two different concepts that seem to be mixed here.

One of them is a "trigger" that will cause a netlink message to be
sent to user space when a given temperature level is crossed (either
way) and nothing more.  This in principle can be added to any thermal
zone (even tripless) and should be possible to implement as a separate
mechanism independent of trip points.

The other one is a pair of trip points that can be set "around" the
current zone temperature so that the .set_trips() callback uses them
to program interrupts to trigger when one of them is crossed.  This at
least requires the thermal zone to provide a .set_trips() callback, so
it depends on the driver registering the thermal zone.  Arguably, the
driver in question can reserve a pair of "trip slots" in the trip
table passed to the zone registration function.
Daniel Lezcano July 2, 2024, 4:31 p.m. UTC | #13
On 02/07/2024 13:03, Rafael J. Wysocki wrote:

[ ... ]

>>> Trips cannot be created on the fly ATM.
>>>
>>> What can be done is to create trips that are invalid to start with and
>>> then set their temperature via sysfs.  This has been done already for
>>> quite a while AFAICS.
>>
>> Yes, I remember that.
>>
>> I would like to avoid introducing more weirdness in the thermal
>> framework which deserve a clear ABI.
>>
>> What is missing to create new trip points on the fly ?
> 
> A different data structure to store them (essentially, a list instead
> of an array).
> 
> I doubt it's worth the hassle.
> 
> What's wrong with the current approach mentioned above?  It will need
> to be supported going forward anyway.

So when the "user trip point" option will be set, a thermal zone will 
have ~ten(?) user trip points initialized to an invalid temperature ?
Rafael J. Wysocki July 2, 2024, 5:15 p.m. UTC | #14
On Tue, Jul 2, 2024 at 6:31 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
> On 02/07/2024 13:03, Rafael J. Wysocki wrote:
>
> [ ... ]
>
> >>> Trips cannot be created on the fly ATM.
> >>>
> >>> What can be done is to create trips that are invalid to start with and
> >>> then set their temperature via sysfs.  This has been done already for
> >>> quite a while AFAICS.
> >>
> >> Yes, I remember that.
> >>
> >> I would like to avoid introducing more weirdness in the thermal
> >> framework which deserve a clear ABI.
> >>
> >> What is missing to create new trip points on the fly ?
> >
> > A different data structure to store them (essentially, a list instead
> > of an array).
> >
> > I doubt it's worth the hassle.
> >
> > What's wrong with the current approach mentioned above?  It will need
> > to be supported going forward anyway.
>
> So when the "user trip point" option will be set, a thermal zone will
> have ~ten(?) user trip points initialized to an invalid temperature ?

If a thermal zone is registered with 10 invalid trip points, htat can
happen already today.

Let's talk about the usage model, though.

IIUC, this would be something like "triggers" I mentioned before: If a
certain temperature level is reached, a notification is sent to user
space, and there are multiple (possibly many) levels like this.  They
can be added and deleted at any time.

There can be an interface for this, as simple as a pair of sysfs
attributes under a thermal zone: add_trigger and remove_trigger.  If
root (or equivalent) writes a (valid) temperature value to
add_trigger, a new trigger is added (up to a limit and provided that
enough memory can be allocated).  Conversely, if a temperature value
is written to remove_trigger and there is a trigger with that
temperature, it will be deleted.

Internally, the triggers can be stored in a sorted list (with some
optimizations, so it need not be walked every time the zone
temperature changes) or a tree, independent of the trips table (if
any).  Every time the zone temperature changes, the triggers list is
consulted (in addition to the trips table) and if any of them have
been crossed, notifications are sent to user space.

If polling is used, this would just work, but without polling the
driver needs to support setting a pair (at least) of temperature
levels causing an interrupt to occur.  If a specific callback, say
.set_triggers(), is provided by the driver, it can be used for setting
those temperature levels to the triggers right above and right below
the current zone temperature, in analogy with .set_trips().

Does this reflect what you are after?
Daniel Lezcano July 2, 2024, 10:49 p.m. UTC | #15
On 02/07/2024 19:15, Rafael J. Wysocki wrote:
> On Tue, Jul 2, 2024 at 6:31 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>
>> On 02/07/2024 13:03, Rafael J. Wysocki wrote:
>>
>> [ ... ]
>>
>>>>> Trips cannot be created on the fly ATM.
>>>>>
>>>>> What can be done is to create trips that are invalid to start with and
>>>>> then set their temperature via sysfs.  This has been done already for
>>>>> quite a while AFAICS.
>>>>
>>>> Yes, I remember that.
>>>>
>>>> I would like to avoid introducing more weirdness in the thermal
>>>> framework which deserve a clear ABI.
>>>>
>>>> What is missing to create new trip points on the fly ?
>>>
>>> A different data structure to store them (essentially, a list instead
>>> of an array).
>>>
>>> I doubt it's worth the hassle.
>>>
>>> What's wrong with the current approach mentioned above?  It will need
>>> to be supported going forward anyway.
>>
>> So when the "user trip point" option will be set, a thermal zone will
>> have ~ten(?) user trip points initialized to an invalid temperature ?
> 
> If a thermal zone is registered with 10 invalid trip points, htat can
> happen already today.

IINW, this is the case for a particular driver (int340x_thermal_zone?), 
may be for a thermal zone. But in the general case where we can have 
more the 50 thermal zones it is not adequate as we will end up with more 
than 500 trip points overall.

Assuming it is the int340x_thermal_zone driver, it is active trip 
points, so that assumes the associated cooling device will be active. 
TBH, it is fuzzy regarding a notification mechanism

> Let's talk about the usage model, though.

Sure

> IIUC, this would be something like "triggers" I mentioned before: If a
> certain temperature level is reached, a notification is sent to user
> space, and there are multiple (possibly many) levels like this.  They
> can be added and deleted at any time.

Yes, except I don't think the usage will be to often creating trip 
points. More likely, depending on the kind of sensors and the associated 
logic, a number of trip points will created for a specific profile and 
then modified on the fly.

> There can be an interface for this, as simple as a pair of sysfs
> attributes under a thermal zone: add_trigger and remove_trigger.  If
> root (or equivalent) writes a (valid) temperature value to
> add_trigger, a new trigger is added (up to a limit and provided that
> enough memory can be allocated).  Conversely, if a temperature value
> is written to remove_trigger and there is a trigger with that
> temperature, it will be deleted.

A hysteresis would be needed too. IMO, netlinks are more adequate for 
this purpose.

> Internally, the triggers can be stored in a sorted list (with some
> optimizations, so it need not be walked every time the zone
> temperature changes) or a tree, independent of the trips table (if
> any).  Every time the zone temperature changes, the triggers list is
> consulted (in addition to the trips table) and if any of them have
> been crossed, notifications are sent to user space.

So basically, thermal_zone_device_update() will browse two lists, 
triggers + trip points, right ?

> If polling is used, this would just work, but without polling the
> driver needs to support setting a pair (at least) of temperature
> levels causing an interrupt to occur.  

I'm missing this point, can you elaborate ?

> If a specific callback, say
> .set_triggers(), is provided by the driver, it can be used for setting
> those temperature levels to the triggers right above and right below
> the current zone temperature, in analogy with .set_trips().
> 
> Does this reflect what you are after?

At the first glance I would say yes, but I don't get why it is more 
complicate to just add 'triggers' with the trip points (formerly 'user' 
trip points)
Rafael J. Wysocki July 3, 2024, 11:20 a.m. UTC | #16
On Wed, Jul 3, 2024 at 12:49 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 02/07/2024 19:15, Rafael J. Wysocki wrote:
> > On Tue, Jul 2, 2024 at 6:31 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 02/07/2024 13:03, Rafael J. Wysocki wrote:
> >>
> >> [ ... ]
> >>
> >>>>> Trips cannot be created on the fly ATM.
> >>>>>
> >>>>> What can be done is to create trips that are invalid to start with and
> >>>>> then set their temperature via sysfs.  This has been done already for
> >>>>> quite a while AFAICS.
> >>>>
> >>>> Yes, I remember that.
> >>>>
> >>>> I would like to avoid introducing more weirdness in the thermal
> >>>> framework which deserve a clear ABI.
> >>>>
> >>>> What is missing to create new trip points on the fly ?
> >>>
> >>> A different data structure to store them (essentially, a list instead
> >>> of an array).
> >>>
> >>> I doubt it's worth the hassle.
> >>>
> >>> What's wrong with the current approach mentioned above?  It will need
> >>> to be supported going forward anyway.
> >>
> >> So when the "user trip point" option will be set, a thermal zone will
> >> have ~ten(?) user trip points initialized to an invalid temperature ?
> >
> > If a thermal zone is registered with 10 invalid trip points, htat can
> > happen already today.
>
> IINW, this is the case for a particular driver (int340x_thermal_zone?),
> may be for a thermal zone. But in the general case where we can have
> more the 50 thermal zones it is not adequate as we will end up with more
> than 500 trip points overall.
>
> Assuming it is the int340x_thermal_zone driver, it is active trip
> points, so that assumes the associated cooling device will be active.
> TBH, it is fuzzy regarding a notification mechanism

The trip points that are invalid to start with are passive IIRC, but
the point I wanted to make was that the number of trip points, their
type and whether or not they were invalid to start with depended on
the driver registering a thermal zone.  Drivers put whatever they like
into the trips table today.

> > Let's talk about the usage model, though.
>
> Sure
>
> > IIUC, this would be something like "triggers" I mentioned before: If a
> > certain temperature level is reached, a notification is sent to user
> > space, and there are multiple (possibly many) levels like this.  They
> > can be added and deleted at any time.
>
> Yes, except I don't think the usage will be to often creating trip
> points. More likely, depending on the kind of sensors and the associated
> logic, a number of trip points will created for a specific profile and
> then modified on the fly.

So I gather that you'd like the initial set to come from DT.

> > There can be an interface for this, as simple as a pair of sysfs
> > attributes under a thermal zone: add_trigger and remove_trigger.  If
> > root (or equivalent) writes a (valid) temperature value to
> > add_trigger, a new trigger is added (up to a limit and provided that
> > enough memory can be allocated).  Conversely, if a temperature value
> > is written to remove_trigger and there is a trigger with that
> > temperature, it will be deleted.
>
> A hysteresis would be needed too. IMO, netlinks are more adequate for
> this purpose.

That depends.

One way to implement hysteresis is to add a new trigger when an
existing one is crossed, either below (on the way up) or above (on the
way down) it, and remove it.  Then you don't need an additional
hysteresis value.

> > Internally, the triggers can be stored in a sorted list (with some
> > optimizations, so it need not be walked every time the zone
> > temperature changes) or a tree, independent of the trips table (if
> > any).  Every time the zone temperature changes, the triggers list is
> > consulted (in addition to the trips table) and if any of them have
> > been crossed, notifications are sent to user space.
>
> So basically, thermal_zone_device_update() will browse two lists,
> triggers + trip points, right ?

Right.

> > If polling is used, this would just work, but without polling the
> > driver needs to support setting a pair (at least) of temperature
> > levels causing an interrupt to occur.
>
> I'm missing this point, can you elaborate ?

Polling guarantees that __thermal_zone_device_update() will be
executed periodically and so it guarantees detection of crossing trip
points (or trigger temperature levels).

If there is no polling, interrupts (or equivalent) need to be used to
invoke __thermal_zone_device_update() when trips are crossed.
Basically, if any of them is crossed, you need an interrupt.

Usually, however, hardware supports a limited number of temperature
levels that can trigger interrupts and I wanted to make the point that
it was sufficient for it to support two of them for the usage model in
question.

> > If a specific callback, say
> > .set_triggers(), is provided by the driver, it can be used for setting
> > those temperature levels to the triggers right above and right below
> > the current zone temperature, in analogy with .set_trips().
> >
> > Does this reflect what you are after?
>
> At the first glance I would say yes, but I don't get why it is more
> complicate to just add 'triggers' with the trip points (formerly 'user'
> trip points)

The most problematic part is the requirement to be able to add and
remove "triggers" on the fly from user space.  This requires two
things: (a) a user space interface for that and (b) a data structure
suitable for adding and removing entries (ideally, a sorted one).
None of these things exist for trip points today and the trip points
in use today don't require any of them either.  Adding more complexity
to the already complex trip point implementation doesn't look
particularly attractive to me.
diff mbox series

Patch

diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
index 68398e7e8655..cb9ea54a192e 100644
--- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
+++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
@@ -153,6 +153,7 @@  patternProperties:
               type:
                 $ref: /schemas/types.yaml#/definitions/string
                 enum:
+                  - user     # enable user notification
                   - active   # enable active cooling e.g. fans
                   - passive  # enable passive cooling e.g. throttling cpu
                   - hot      # send notification to driver
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2aa04c46a425..506f880d9aa9 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -734,6 +734,14 @@  int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
 	if (tz != pos1 || cdev != pos2)
 		return -EINVAL;
 
+	/*
+	 * It is not allowed to bind a cooling device with a trip
+	 * point user type because no mitigation should happen from
+	 * the kernel with these trip points
+	 */
+	if (trip->type == THERMAL_TRIP_USER)
+		return -EINVAL;
+
 	/* lower default 0, upper default max_state */
 	lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
 
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index aa34b6e82e26..f6daf921a136 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -60,6 +60,7 @@  static const char * const trip_types[] = {
 	[THERMAL_TRIP_PASSIVE]	= "passive",
 	[THERMAL_TRIP_HOT]	= "hot",
 	[THERMAL_TRIP_CRITICAL]	= "critical",
+	[THERMAL_TRIP_USER]	= "user",
 };
 
 /**
diff --git a/drivers/thermal/thermal_trace.h b/drivers/thermal/thermal_trace.h
index df8f4edd6068..739228ecc2e2 100644
--- a/drivers/thermal/thermal_trace.h
+++ b/drivers/thermal/thermal_trace.h
@@ -15,13 +15,15 @@  TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
 TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
 TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
 TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
+TRACE_DEFINE_ENUM(THERMAL_TRIP_USER);
 
 #define show_tzt_type(type)					\
 	__print_symbolic(type,					\
 			 { THERMAL_TRIP_CRITICAL, "CRITICAL"},	\
 			 { THERMAL_TRIP_HOT,      "HOT"},	\
 			 { THERMAL_TRIP_PASSIVE,  "PASSIVE"},	\
-			 { THERMAL_TRIP_ACTIVE,   "ACTIVE"})
+			 { THERMAL_TRIP_ACTIVE,   "ACTIVE"}),	\
+			 { THERMAL_TRIP_USER,     "USER"})
 
 TRACE_EVENT(thermal_temperature,
 
diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
index 2a876d3b93aa..a0780bb4ff0d 100644
--- a/drivers/thermal/thermal_trip.c
+++ b/drivers/thermal/thermal_trip.c
@@ -10,6 +10,7 @@ 
 #include "thermal_core.h"
 
 static const char *trip_type_names[] = {
+	[THERMAL_TRIP_USER] = "user",
 	[THERMAL_TRIP_ACTIVE] = "active",
 	[THERMAL_TRIP_PASSIVE] = "passive",
 	[THERMAL_TRIP_HOT] = "hot",
diff --git a/include/uapi/linux/thermal.h b/include/uapi/linux/thermal.h
index fc78bf3aead7..84e556ace5f5 100644
--- a/include/uapi/linux/thermal.h
+++ b/include/uapi/linux/thermal.h
@@ -14,6 +14,7 @@  enum thermal_trip_type {
 	THERMAL_TRIP_PASSIVE,
 	THERMAL_TRIP_HOT,
 	THERMAL_TRIP_CRITICAL,
+	THERMAL_TRIP_USER,
 };
 
 /* Adding event notification support elements */