mbox series

[v2,0/7] Add thermal thresholds support

Message ID 20240816081241.1925221-1-daniel.lezcano@linaro.org (mailing list archive)
Headers show
Series Add thermal thresholds support | expand

Message

Daniel Lezcano Aug. 16, 2024, 8:12 a.m. UTC
The trip points are a firmware description of the temperature limits
of a specific thermal zone where we associate an action which is done
by the kernel. The time resolution is low.

The userspace has to deal with a more complex thermal management based
on heuristics from different information coming from different
places. The logic is much more complex but based on a bigger time
resolution, usually one second based.

The purpose of the userspace is to monitor the temperatures from
different places and take actions. However, it can not be constantly
reading the temperature to detect when a temperature threshold has
been reached. This is especially bad for mobile or embedded system as
that will lead to an unacceptable number of wakeup to check the
temperature with nothing to do.

On the other side, the sensors are now most of the time interrupt
driven. That means the thermal framework will use the temperature trip
points to program the sensor to trigger an interrupt when a
temperature limit is crossed.

Unfortunately, the userspace can not benefit this feature and current
solutions found here and there, iow out-of-tree, are to add fake trip
points in the firmware and enable the writable trip points.

This is bad for different reasons, the trip points are for in-kernel
actions, the semantic of their types is used by the thermal framework
and by adding trip points in the device tree is a way to overcome the
current limitation but tampering with how the thermal framework is
supposed to work. The writable trip points is a way to adjust a
temperature limit given a specific platform if the firmware is not
accurate enough and TBH it is more a debug feature from my POV.

The thresholds mechanism is a way to have the userspace to tell
thermal framework to send a notification when a temperature limit is
crossed. There is no id, no hysteresis, just the temperature and the
direction of the limit crossing. That means we can be notified when a
threshold is crossed the way up only, or the way down only or both
ways. That allows to create hysteresis values if it is needed.

A threshold can be added, deleted or flushed. The latter means all
thresholds belonging to a thermal zone will be deleted.
    
When a threshold is added:

 - if the same threshold (temperature and direction) exists, an error
   is returned

 - if a threshold is specified with the same temperature but a
   different direction, the specified direction is added

 - if there is no threshold with the same temperature then it is
   created

When a threshold is deleted:

 - if the same threshold (temperature and direction) exists, it is
   deleted

 - if a threshold is specified with the same temperature but a
   different direction, the specified direction is removed

 - if there is no threshold with the same temperature, then an error
   is returned

When the threshold are flushed:

 - All thresholds related to a thermal zone are deleted

When a threshold is crossed:

 - the userspace does not need to know which threshold(s) have been
   crossed, it will be notified with the thermal zone identifier, the
   current temperature and the previous temperature

 - if multiple thresholds have been crossed between two updates only
   one notification will be send to the userspace, it is pointless to
   send a notification per thresholds crossed as the userspace can
   handle that easily when it has the temperature delta information

All aforementioned actions and events lead to a notification to the
userspace. A threshold change (add, delete and flush) is notified to
the userspace with the process id responsible of the action.

Along with the kernel changes, the thermal library has been extended
to provide the different API to deal with the new threshold netlink
events and commands.

In addition, the thermal-engine skeleton uses these new API by
flushing and adding thresholds as well as getting the notification
about these actions.

Overall the series has been tested with the thermal-engine skeleton
and some selftests which are not part of this series.

Changelog:
  V2:
    - Compute min and max in thermal_zone_device_update() but keep
      the loop as it is (Rafael)

    - Include slab.h to fix compilation warnings on some architectures
      with kmalloc and kfree (kernel test robot)

Daniel Lezcano (7):
  thermal/core: Compute low and high boundaries in
    thermal_zone_device_update()
  thermal/core: Add thresholds support
  thermal/core: Connect the threshold with the core
  thermal/netlink: Add the commands and the events for the thresholds
  tools/lib/thermal: Make more generic the command encoding function
  tools/lib/thermal: Add the threshold netlink ABI
  tools/thermal/thermal-engine: Take into account the thresholds API

 drivers/thermal/Kconfig                       |  15 ++
 drivers/thermal/Makefile                      |   3 +
 drivers/thermal/thermal_core.c                |  21 +-
 drivers/thermal/thermal_core.h                |   6 +-
 drivers/thermal/thermal_netlink.c             | 239 ++++++++++++++++-
 drivers/thermal/thermal_netlink.h             |   7 +
 drivers/thermal/thermal_thresholds.c          | 247 ++++++++++++++++++
 drivers/thermal/thermal_thresholds.h          |  54 ++++
 drivers/thermal/thermal_trip.c                |  27 +-
 include/linux/thermal.h                       |   3 +
 include/uapi/linux/thermal.h                  |  30 ++-
 tools/lib/thermal/commands.c                  | 167 +++++++++++-
 tools/lib/thermal/events.c                    |  58 +++-
 tools/lib/thermal/include/thermal.h           |  40 +++
 tools/lib/thermal/libthermal.map              |   5 +
 tools/lib/thermal/thermal.c                   |  17 ++
 tools/thermal/lib/Makefile                    |   2 +-
 tools/thermal/thermal-engine/thermal-engine.c | 109 +++++++-
 18 files changed, 972 insertions(+), 78 deletions(-)
 create mode 100644 drivers/thermal/thermal_thresholds.c
 create mode 100644 drivers/thermal/thermal_thresholds.h

Comments

Rafael J. Wysocki Aug. 21, 2024, 7:06 p.m. UTC | #1
On Fri, Aug 16, 2024 at 10:12 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> The trip points are a firmware description of the temperature limits
> of a specific thermal zone where we associate an action which is done
> by the kernel. The time resolution is low.
>
> The userspace has to deal with a more complex thermal management based
> on heuristics from different information coming from different
> places. The logic is much more complex but based on a bigger time
> resolution, usually one second based.
>
> The purpose of the userspace is to monitor the temperatures from
> different places and take actions. However, it can not be constantly
> reading the temperature to detect when a temperature threshold has
> been reached. This is especially bad for mobile or embedded system as
> that will lead to an unacceptable number of wakeup to check the
> temperature with nothing to do.
>
> On the other side, the sensors are now most of the time interrupt
> driven. That means the thermal framework will use the temperature trip
> points to program the sensor to trigger an interrupt when a
> temperature limit is crossed.
>
> Unfortunately, the userspace can not benefit this feature and current
> solutions found here and there, iow out-of-tree, are to add fake trip
> points in the firmware and enable the writable trip points.
>
> This is bad for different reasons, the trip points are for in-kernel
> actions, the semantic of their types is used by the thermal framework
> and by adding trip points in the device tree is a way to overcome the
> current limitation but tampering with how the thermal framework is
> supposed to work. The writable trip points is a way to adjust a
> temperature limit given a specific platform if the firmware is not
> accurate enough and TBH it is more a debug feature from my POV.
>
> The thresholds mechanism is a way to have the userspace to tell
> thermal framework to send a notification when a temperature limit is
> crossed. There is no id, no hysteresis, just the temperature and the
> direction of the limit crossing. That means we can be notified when a
> threshold is crossed the way up only, or the way down only or both
> ways. That allows to create hysteresis values if it is needed.

First off, I'd prefer these things to be referred to as "user
thresholds" to avoid confusion with trip thresholds.

> A threshold can be added, deleted or flushed. The latter means all
> thresholds belonging to a thermal zone will be deleted.
>
> When a threshold is added:
>
>  - if the same threshold (temperature and direction) exists, an error
>    is returned

Why is it useful to return an error in this case?  It appears that the
existing threshold could be used just fine.

>  - if a threshold is specified with the same temperature but a
>    different direction, the specified direction is added
>
>  - if there is no threshold with the same temperature then it is
>    created
>
> When a threshold is deleted:
>
>  - if the same threshold (temperature and direction) exists, it is
>    deleted
>
>  - if a threshold is specified with the same temperature but a
>    different direction, the specified direction is removed
>
>  - if there is no threshold with the same temperature, then an error
>    is returned
>
> When the threshold are flushed:
>
>  - All thresholds related to a thermal zone are deleted
>
> When a threshold is crossed:
>
>  - the userspace does not need to know which threshold(s) have been
>    crossed, it will be notified with the thermal zone identifier, the
>    current temperature and the previous temperature

This has a bit of a warm-up issue when the zone temperature is above a
threshold to start with.

Or is it not a problem because thresholds can only be added when the
zone temperature is known?

>  - if multiple thresholds have been crossed between two updates only
>    one notification will be send to the userspace, it is pointless to
>    send a notification per thresholds crossed as the userspace can
>    handle that easily when it has the temperature delta information

That's clever.

> All aforementioned actions and events lead to a notification to the
> userspace. A threshold change (add, delete and flush) is notified to
> the userspace with the process id responsible of the action.
>
> Along with the kernel changes, the thermal library has been extended
> to provide the different API to deal with the new threshold netlink
> events and commands.
>
> In addition, the thermal-engine skeleton uses these new API by
> flushing and adding thresholds as well as getting the notification
> about these actions.
>
> Overall the series has been tested with the thermal-engine skeleton
> and some selftests which are not part of this series.
Pandruvada, Srinivas Aug. 21, 2024, 8:04 p.m. UTC | #2
Hi Daniel,

On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
> The trip points are a firmware description of the temperature limits
> of a specific thermal zone where we associate an action which is done
> by the kernel. The time resolution is low.
> 
> The userspace has to deal with a more complex thermal management
> based
> on heuristics from different information coming from different
> places. The logic is much more complex but based on a bigger time
> resolution, usually one second based.
> 
> The purpose of the userspace is to monitor the temperatures from
> different places and take actions. However, it can not be constantly
> reading the temperature to detect when a temperature threshold has
> been reached. This is especially bad for mobile or embedded system as
> that will lead to an unacceptable number of wakeup to check the
> temperature with nothing to do.
> 
> On the other side, the sensors are now most of the time interrupt
> driven. That means the thermal framework will use the temperature
> trip
> points to program the sensor to trigger an interrupt when a
> temperature limit is crossed.
> 
> Unfortunately, the userspace can not benefit this feature and current
> solutions found here and there, iow out-of-tree, are to add fake trip
> points in the firmware and enable the writable trip points.
> 
> This is bad for different reasons, the trip points are for in-kernel
> actions, the semantic of their types is used by the thermal framework
> and by adding trip points in the device tree is a way to overcome the
> current limitation but tampering with how the thermal framework is
> supposed to work. The writable trip points is a way to adjust a
> temperature limit given a specific platform if the firmware is not
> accurate enough and TBH it is more a debug feature from my POV.
> 
> The thresholds mechanism is a way to have the userspace to tell
> thermal framework to send a notification when a temperature limit is
> crossed. There is no id, no hysteresis, just the temperature and the
> direction of the limit crossing. That means we can be notified when a
> threshold is crossed the way up only, or the way down only or both
> ways. That allows to create hysteresis values if it is needed.
> 
> A threshold can be added, deleted or flushed. The latter means all
> thresholds belonging to a thermal zone will be deleted.
>     

So you are proposing to add threshold via netlink, not adding any new
sysfs attribute? That is not clear here.

I think you are adding"
THERMAL_GENL_CMD_THRESHOLD_GET
THERMAL_GENL_CMD_THRESHOLD_ADD
THERMAL_GENL_CMD_THRESHOLD_DELETE
THERMAL_GENL_CMD_THRESHOLD_FLUSH

We need to document our netlink messages including old ones.

Also we should add "MODIFY" as we tend to change them quite often.

Also no hysteresis, that is practically we can't use. Temperature
changes so much that that will flood user space. You will get 100s of
events on CPU temperature if you set temperature threshold in CPU.
We have a whole filtering in driver to avoid this.
You need a rate limit here.

There are multiple user processes can add threshold and there is no
ownership. So one process can cause too much noise to others as it is
multicast.

We worked on a change to filer these as discussed during last LPC, but
not posted yet. This will really need this as this will be too many
messages.

Thanks,
Srinivas



> When a threshold is added:
> 
>  - if the same threshold (temperature and direction) exists, an error
>    is returned
> 
>  - if a threshold is specified with the same temperature but a
>    different direction, the specified direction is added
> 
>  - if there is no threshold with the same temperature then it is
>    created
> 
> When a threshold is deleted:
> 
>  - if the same threshold (temperature and direction) exists, it is
>    deleted
> 
>  - if a threshold is specified with the same temperature but a
>    different direction, the specified direction is removed
> 
>  - if there is no threshold with the same temperature, then an error
>    is returned
> 
> When the threshold are flushed:
> 
>  - All thresholds related to a thermal zone are deleted
> 
> When a threshold is crossed:
> 
>  - the userspace does not need to know which threshold(s) have been
>    crossed, it will be notified with the thermal zone identifier, the
>    current temperature and the previous temperature
> 
>  - if multiple thresholds have been crossed between two updates only
>    one notification will be send to the userspace, it is pointless to
>    send a notification per thresholds crossed as the userspace can
>    handle that easily when it has the temperature delta information
> 
> All aforementioned actions and events lead to a notification to the
> userspace. A threshold change (add, delete and flush) is notified to
> the userspace with the process id responsible of the action.
> 
> Along with the kernel changes, the thermal library has been extended
> to provide the different API to deal with the new threshold netlink
> events and commands.
> 
> In addition, the thermal-engine skeleton uses these new API by
> flushing and adding thresholds as well as getting the notification
> about these actions.
> 
> Overall the series has been tested with the thermal-engine skeleton
> and some selftests which are not part of this series.
> 
> Changelog:
>   V2:
>     - Compute min and max in thermal_zone_device_update() but keep
>       the loop as it is (Rafael)
> 
>     - Include slab.h to fix compilation warnings on some
> architectures
>       with kmalloc and kfree (kernel test robot)
> 
> Daniel Lezcano (7):
>   thermal/core: Compute low and high boundaries in
>     thermal_zone_device_update()
>   thermal/core: Add thresholds support
>   thermal/core: Connect the threshold with the core
>   thermal/netlink: Add the commands and the events for the thresholds
>   tools/lib/thermal: Make more generic the command encoding function
>   tools/lib/thermal: Add the threshold netlink ABI
>   tools/thermal/thermal-engine: Take into account the thresholds API
> 
>  drivers/thermal/Kconfig                       |  15 ++
>  drivers/thermal/Makefile                      |   3 +
>  drivers/thermal/thermal_core.c                |  21 +-
>  drivers/thermal/thermal_core.h                |   6 +-
>  drivers/thermal/thermal_netlink.c             | 239
> ++++++++++++++++-
>  drivers/thermal/thermal_netlink.h             |   7 +
>  drivers/thermal/thermal_thresholds.c          | 247
> ++++++++++++++++++
>  drivers/thermal/thermal_thresholds.h          |  54 ++++
>  drivers/thermal/thermal_trip.c                |  27 +-
>  include/linux/thermal.h                       |   3 +
>  include/uapi/linux/thermal.h                  |  30 ++-
>  tools/lib/thermal/commands.c                  | 167 +++++++++++-
>  tools/lib/thermal/events.c                    |  58 +++-
>  tools/lib/thermal/include/thermal.h           |  40 +++
>  tools/lib/thermal/libthermal.map              |   5 +
>  tools/lib/thermal/thermal.c                   |  17 ++
>  tools/thermal/lib/Makefile                    |   2 +-
>  tools/thermal/thermal-engine/thermal-engine.c | 109 +++++++-
>  18 files changed, 972 insertions(+), 78 deletions(-)
>  create mode 100644 drivers/thermal/thermal_thresholds.c
>  create mode 100644 drivers/thermal/thermal_thresholds.h
>
Rafael J. Wysocki Aug. 21, 2024, 8:20 p.m. UTC | #3
Hi Srinivas,

On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas
<srinivas.pandruvada@intel.com> wrote:
>
> Hi Daniel,
>
> On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
> > The trip points are a firmware description of the temperature limits
> > of a specific thermal zone where we associate an action which is done
> > by the kernel. The time resolution is low.
> >
> > The userspace has to deal with a more complex thermal management
> > based
> > on heuristics from different information coming from different
> > places. The logic is much more complex but based on a bigger time
> > resolution, usually one second based.
> >
> > The purpose of the userspace is to monitor the temperatures from
> > different places and take actions. However, it can not be constantly
> > reading the temperature to detect when a temperature threshold has
> > been reached. This is especially bad for mobile or embedded system as
> > that will lead to an unacceptable number of wakeup to check the
> > temperature with nothing to do.
> >
> > On the other side, the sensors are now most of the time interrupt
> > driven. That means the thermal framework will use the temperature
> > trip
> > points to program the sensor to trigger an interrupt when a
> > temperature limit is crossed.
> >
> > Unfortunately, the userspace can not benefit this feature and current
> > solutions found here and there, iow out-of-tree, are to add fake trip
> > points in the firmware and enable the writable trip points.
> >
> > This is bad for different reasons, the trip points are for in-kernel
> > actions, the semantic of their types is used by the thermal framework
> > and by adding trip points in the device tree is a way to overcome the
> > current limitation but tampering with how the thermal framework is
> > supposed to work. The writable trip points is a way to adjust a
> > temperature limit given a specific platform if the firmware is not
> > accurate enough and TBH it is more a debug feature from my POV.
> >
> > The thresholds mechanism is a way to have the userspace to tell
> > thermal framework to send a notification when a temperature limit is
> > crossed. There is no id, no hysteresis, just the temperature and the
> > direction of the limit crossing. That means we can be notified when a
> > threshold is crossed the way up only, or the way down only or both
> > ways. That allows to create hysteresis values if it is needed.
> >
> > A threshold can be added, deleted or flushed. The latter means all
> > thresholds belonging to a thermal zone will be deleted.
> >
>
> So you are proposing to add threshold via netlink, not adding any new
> sysfs attribute? That is not clear here.
>
> I think you are adding"
> THERMAL_GENL_CMD_THRESHOLD_GET
> THERMAL_GENL_CMD_THRESHOLD_ADD
> THERMAL_GENL_CMD_THRESHOLD_DELETE
> THERMAL_GENL_CMD_THRESHOLD_FLUSH
>
> We need to document our netlink messages including old ones.
>
> Also we should add "MODIFY" as we tend to change them quite often.
>
> Also no hysteresis, that is practically we can't use.

The direction thing is equivalent to hysteresis though.

Instead of using one threshold with a given hysteresis value, use two
of them with different temperature values and different directions.

> Temperature changes so much that that will flood user space. You will get 100s of
> events on CPU temperature if you set temperature threshold in CPU.

Events only trigger when thresholds are crossed in a specific
direction, but overall you have a point.

> We have a whole filtering in driver to avoid this.
> You need a rate limit here.
>
> There are multiple user processes can add threshold and there is no
> ownership. So one process can cause too much noise to others as it is
> multicast.
>
> We worked on a change to filer these as discussed during last LPC, but
> not posted yet. This will really need this as this will be too many
> messages.

Unless there is a way to limit your subscription to events regarding a
specific thermal zone, for instance.

Anyway, I have to admit ignorance regarding the user space usage model
related to this.  For example, is it expected that there will be one
user space entity managing the thresholds or there can be many.
Pandruvada, Srinivas Aug. 21, 2024, 10:16 p.m. UTC | #4
On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote:
> Hi Srinivas,
> 
> On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas
> <srinivas.pandruvada@intel.com> wrote:
> > 
> > Hi Daniel,
> > 
> > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
> > > The trip points are a firmware description of the temperature
> > > limits
> > > of a specific thermal zone where we associate an action which is
> > > done
> > > by the kernel. The time resolution is low.
> > > 
> > > The userspace has to deal with a more complex thermal management
> > > based
> > > on heuristics from different information coming from different
> > > places. The logic is much more complex but based on a bigger time
> > > resolution, usually one second based.
> > > 
> > > The purpose of the userspace is to monitor the temperatures from
> > > different places and take actions. However, it can not be
> > > constantly
> > > reading the temperature to detect when a temperature threshold
> > > has
> > > been reached. This is especially bad for mobile or embedded
> > > system as
> > > that will lead to an unacceptable number of wakeup to check the
> > > temperature with nothing to do.
> > > 
> > > On the other side, the sensors are now most of the time interrupt
> > > driven. That means the thermal framework will use the temperature
> > > trip
> > > points to program the sensor to trigger an interrupt when a
> > > temperature limit is crossed.
> > > 
> > > Unfortunately, the userspace can not benefit this feature and
> > > current
> > > solutions found here and there, iow out-of-tree, are to add fake
> > > trip
> > > points in the firmware and enable the writable trip points.
> > > 
> > > This is bad for different reasons, the trip points are for in-
> > > kernel
> > > actions, the semantic of their types is used by the thermal
> > > framework
> > > and by adding trip points in the device tree is a way to overcome
> > > the
> > > current limitation but tampering with how the thermal framework
> > > is
> > > supposed to work. The writable trip points is a way to adjust a
> > > temperature limit given a specific platform if the firmware is
> > > not
> > > accurate enough and TBH it is more a debug feature from my POV.
> > > 
> > > The thresholds mechanism is a way to have the userspace to tell
> > > thermal framework to send a notification when a temperature limit
> > > is
> > > crossed. There is no id, no hysteresis, just the temperature and
> > > the
> > > direction of the limit crossing. That means we can be notified
> > > when a
> > > threshold is crossed the way up only, or the way down only or
> > > both
> > > ways. That allows to create hysteresis values if it is needed.
> > > 
> > > A threshold can be added, deleted or flushed. The latter means
> > > all
> > > thresholds belonging to a thermal zone will be deleted.
> > > 
> > 
> > So you are proposing to add threshold via netlink, not adding any
> > new
> > sysfs attribute? That is not clear here.
> > 
> > I think you are adding"
> > THERMAL_GENL_CMD_THRESHOLD_GET
> > THERMAL_GENL_CMD_THRESHOLD_ADD
> > THERMAL_GENL_CMD_THRESHOLD_DELETE
> > THERMAL_GENL_CMD_THRESHOLD_FLUSH
> > 
> > We need to document our netlink messages including old ones.
> > 
> > Also we should add "MODIFY" as we tend to change them quite often.
> > 
> > Also no hysteresis, that is practically we can't use.
> 
> The direction thing is equivalent to hysteresis though.
> 
> Instead of using one threshold with a given hysteresis value, use two
> of them with different temperature values and different directions.
> 
> > Temperature changes so much that that will flood user space. You
> > will get 100s of
> > events on CPU temperature if you set temperature threshold in CPU.
> 
> Events only trigger when thresholds are crossed in a specific
> direction, but overall you have a point.

This is good enough for some sensor which changes slowly like skin. But
some sensors like CPU and board are not like that. We publish both raw
and filtered event count in debugfs for x86 package temp.
For example, I just ran a sample on a client for one threshold.

Total notification from hardware: 224
Notified to user space: 16



> 
> > We have a whole filtering in driver to avoid this.
> > You need a rate limit here.
> > 
> > There are multiple user processes can add threshold and there is no
> > ownership. So one process can cause too much noise to others as it
> > is
> > multicast.
> > 
> > We worked on a change to filer these as discussed during last LPC,
> > but
> > not posted yet. This will really need this as this will be too many
> > messages.
> 
> Unless there is a way to limit your subscription to events regarding
> a
> specific thermal zone, for instance.
That's what this change does.

> 
> Anyway, I have to admit ignorance regarding the user space usage
> model
> related to this.  For example, is it expected that there will be one
> user space entity managing the thresholds or there can be many.

Good question. That's why we didn't send this change. If there is one
agent this is fine.

But in embedded space, there may be more than one. Hence they needed
netlink multicast instead of just one char device.

Thanks,
Srinivas
Rafael J. Wysocki Aug. 22, 2024, 9:41 a.m. UTC | #5
On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas
<srinivas.pandruvada@intel.com> wrote:
>
> On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote:
> > Hi Srinivas,
> >
> > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas
> > <srinivas.pandruvada@intel.com> wrote:
> > >
> > > Hi Daniel,
> > >
> > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
> > > > The trip points are a firmware description of the temperature
> > > > limits
> > > > of a specific thermal zone where we associate an action which is
> > > > done
> > > > by the kernel. The time resolution is low.
> > > >
> > > > The userspace has to deal with a more complex thermal management
> > > > based
> > > > on heuristics from different information coming from different
> > > > places. The logic is much more complex but based on a bigger time
> > > > resolution, usually one second based.
> > > >
> > > > The purpose of the userspace is to monitor the temperatures from
> > > > different places and take actions. However, it can not be
> > > > constantly
> > > > reading the temperature to detect when a temperature threshold
> > > > has
> > > > been reached. This is especially bad for mobile or embedded
> > > > system as
> > > > that will lead to an unacceptable number of wakeup to check the
> > > > temperature with nothing to do.
> > > >
> > > > On the other side, the sensors are now most of the time interrupt
> > > > driven. That means the thermal framework will use the temperature
> > > > trip
> > > > points to program the sensor to trigger an interrupt when a
> > > > temperature limit is crossed.
> > > >
> > > > Unfortunately, the userspace can not benefit this feature and
> > > > current
> > > > solutions found here and there, iow out-of-tree, are to add fake
> > > > trip
> > > > points in the firmware and enable the writable trip points.
> > > >
> > > > This is bad for different reasons, the trip points are for in-
> > > > kernel
> > > > actions, the semantic of their types is used by the thermal
> > > > framework
> > > > and by adding trip points in the device tree is a way to overcome
> > > > the
> > > > current limitation but tampering with how the thermal framework
> > > > is
> > > > supposed to work. The writable trip points is a way to adjust a
> > > > temperature limit given a specific platform if the firmware is
> > > > not
> > > > accurate enough and TBH it is more a debug feature from my POV.
> > > >
> > > > The thresholds mechanism is a way to have the userspace to tell
> > > > thermal framework to send a notification when a temperature limit
> > > > is
> > > > crossed. There is no id, no hysteresis, just the temperature and
> > > > the
> > > > direction of the limit crossing. That means we can be notified
> > > > when a
> > > > threshold is crossed the way up only, or the way down only or
> > > > both
> > > > ways. That allows to create hysteresis values if it is needed.
> > > >
> > > > A threshold can be added, deleted or flushed. The latter means
> > > > all
> > > > thresholds belonging to a thermal zone will be deleted.
> > > >
> > >
> > > So you are proposing to add threshold via netlink, not adding any
> > > new
> > > sysfs attribute? That is not clear here.
> > >
> > > I think you are adding"
> > > THERMAL_GENL_CMD_THRESHOLD_GET
> > > THERMAL_GENL_CMD_THRESHOLD_ADD
> > > THERMAL_GENL_CMD_THRESHOLD_DELETE
> > > THERMAL_GENL_CMD_THRESHOLD_FLUSH
> > >
> > > We need to document our netlink messages including old ones.
> > >
> > > Also we should add "MODIFY" as we tend to change them quite often.
> > >
> > > Also no hysteresis, that is practically we can't use.
> >
> > The direction thing is equivalent to hysteresis though.
> >
> > Instead of using one threshold with a given hysteresis value, use two
> > of them with different temperature values and different directions.
> >
> > > Temperature changes so much that that will flood user space. You
> > > will get 100s of
> > > events on CPU temperature if you set temperature threshold in CPU.
> >
> > Events only trigger when thresholds are crossed in a specific
> > direction, but overall you have a point.
>
> This is good enough for some sensor which changes slowly like skin.

Right, and I think that this is the driving use case.

> But some sensors like CPU and board are not like that.

Sure, so this interface will not be for them.

> We publish both raw and filtered event count in debugfs for x86 package temp.
> For example, I just ran a sample on a client for one threshold.
>
> Total notification from hardware: 224
> Notified to user space: 16

So did you apply the Daniel's patches and run the test or did you do
something else?

> >
> > > We have a whole filtering in driver to avoid this.
> > > You need a rate limit here.
> > >
> > > There are multiple user processes can add threshold and there is no
> > > ownership. So one process can cause too much noise to others as it
> > > is
> > > multicast.
> > >
> > > We worked on a change to filer these as discussed during last LPC,
> > > but
> > > not posted yet. This will really need this as this will be too many
> > > messages.
> >
> > Unless there is a way to limit your subscription to events regarding
> > a
> > specific thermal zone, for instance.
> That's what this change does.
>
> >
> > Anyway, I have to admit ignorance regarding the user space usage
> > model
> > related to this.  For example, is it expected that there will be one
> > user space entity managing the thresholds or there can be many.
>
> Good question. That's why we didn't send this change. If there is one
> agent this is fine.

I think that the addition and deletion of a user threshold should be
privileged operations.

If that is the case, it all boils down to proper coordination in user space.

> But in embedded space, there may be more than one. Hence they needed
> netlink multicast instead of just one char device.

I see.
Pandruvada, Srinivas Aug. 22, 2024, 12:11 p.m. UTC | #6
On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas
> <srinivas.pandruvada@intel.com> wrote:
> > 
> > On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote:
> > > Hi Srinivas,
> > > 
> > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas
> > > <srinivas.pandruvada@intel.com> wrote:
> > > > 
> > > > Hi Daniel,
> > > > 
> > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
> > > > > The trip points are a firmware description of the temperature
> > > > > limits
> > > > > of a specific thermal zone where we associate an action which
> > > > > is
> > > > > done
> > > > > by the kernel. The time resolution is low.
> > > > > 
> > > > > The userspace has to deal with a more complex thermal
> > > > > management
> > > > > based
> > > > > on heuristics from different information coming from
> > > > > different
> > > > > places. The logic is much more complex but based on a bigger
> > > > > time
> > > > > resolution, usually one second based.
> > > > > 
> > > > > The purpose of the userspace is to monitor the temperatures
> > > > > from
> > > > > different places and take actions. However, it can not be
> > > > > constantly
> > > > > reading the temperature to detect when a temperature
> > > > > threshold
> > > > > has
> > > > > been reached. This is especially bad for mobile or embedded
> > > > > system as
> > > > > that will lead to an unacceptable number of wakeup to check
> > > > > the
> > > > > temperature with nothing to do.
> > > > > 
> > > > > On the other side, the sensors are now most of the time
> > > > > interrupt
> > > > > driven. That means the thermal framework will use the
> > > > > temperature
> > > > > trip
> > > > > points to program the sensor to trigger an interrupt when a
> > > > > temperature limit is crossed.
> > > > > 
> > > > > Unfortunately, the userspace can not benefit this feature and
> > > > > current
> > > > > solutions found here and there, iow out-of-tree, are to add
> > > > > fake
> > > > > trip
> > > > > points in the firmware and enable the writable trip points.
> > > > > 
> > > > > This is bad for different reasons, the trip points are for
> > > > > in-
> > > > > kernel
> > > > > actions, the semantic of their types is used by the thermal
> > > > > framework
> > > > > and by adding trip points in the device tree is a way to
> > > > > overcome
> > > > > the
> > > > > current limitation but tampering with how the thermal
> > > > > framework
> > > > > is
> > > > > supposed to work. The writable trip points is a way to adjust
> > > > > a
> > > > > temperature limit given a specific platform if the firmware
> > > > > is
> > > > > not
> > > > > accurate enough and TBH it is more a debug feature from my
> > > > > POV.
> > > > > 
> > > > > The thresholds mechanism is a way to have the userspace to
> > > > > tell
> > > > > thermal framework to send a notification when a temperature
> > > > > limit
> > > > > is
> > > > > crossed. There is no id, no hysteresis, just the temperature
> > > > > and
> > > > > the
> > > > > direction of the limit crossing. That means we can be
> > > > > notified
> > > > > when a
> > > > > threshold is crossed the way up only, or the way down only or
> > > > > both
> > > > > ways. That allows to create hysteresis values if it is
> > > > > needed.
> > > > > 
> > > > > A threshold can be added, deleted or flushed. The latter
> > > > > means
> > > > > all
> > > > > thresholds belonging to a thermal zone will be deleted.
> > > > > 
> > > > 
> > > > So you are proposing to add threshold via netlink, not adding
> > > > any
> > > > new
> > > > sysfs attribute? That is not clear here.
> > > > 
> > > > I think you are adding"
> > > > THERMAL_GENL_CMD_THRESHOLD_GET
> > > > THERMAL_GENL_CMD_THRESHOLD_ADD
> > > > THERMAL_GENL_CMD_THRESHOLD_DELETE
> > > > THERMAL_GENL_CMD_THRESHOLD_FLUSH
> > > > 
> > > > We need to document our netlink messages including old ones.
> > > > 
> > > > Also we should add "MODIFY" as we tend to change them quite
> > > > often.
> > > > 
> > > > Also no hysteresis, that is practically we can't use.
> > > 
> > > The direction thing is equivalent to hysteresis though.
> > > 
> > > Instead of using one threshold with a given hysteresis value, use
> > > two
> > > of them with different temperature values and different
> > > directions.
> > > 
> > > > Temperature changes so much that that will flood user space.
> > > > You
> > > > will get 100s of
> > > > events on CPU temperature if you set temperature threshold in
> > > > CPU.
> > > 
> > > Events only trigger when thresholds are crossed in a specific
> > > direction, but overall you have a point.
> > 
> > This is good enough for some sensor which changes slowly like skin.
> 
> Right, and I think that this is the driving use case.
> 
> > But some sensors like CPU and board are not like that.
> 
> Sure, so this interface will not be for them.
But when we design an interface, I think this shouldn't be sensor
dependent.

> 
> > We publish both raw and filtered event count in debugfs for x86
> > package temp.
> > For example, I just ran a sample on a client for one threshold.
> > 
> > Total notification from hardware: 224
> > Notified to user space: 16
> 
> So did you apply the Daniel's patches and run the test or did you do
> something else?
We already use netlink to send notification to user space via writable
trip. So didn't apply any patches. I don't see these patches will do
any different.


> 
> > > 
> > > > We have a whole filtering in driver to avoid this.
> > > > You need a rate limit here.
> > > > 
> > > > There are multiple user processes can add threshold and there
> > > > is no
> > > > ownership. So one process can cause too much noise to others as
> > > > it
> > > > is
> > > > multicast.
> > > > 
> > > > We worked on a change to filer these as discussed during last
> > > > LPC,
> > > > but
> > > > not posted yet. This will really need this as this will be too
> > > > many
> > > > messages.
> > > 
> > > Unless there is a way to limit your subscription to events
> > > regarding
> > > a
> > > specific thermal zone, for instance.
> > That's what this change does.
> > 
> > > 
> > > Anyway, I have to admit ignorance regarding the user space usage
> > > model
> > > related to this.  For example, is it expected that there will be
> > > one
> > > user space entity managing the thresholds or there can be many.
> > 
> > Good question. That's why we didn't send this change. If there is
> > one
> > agent this is fine.
> 
> I think that the addition and deletion of a user threshold should be
> privileged operations.
> 
> If that is the case, it all boils down to proper coordination in user
> space.
> 
Some kernel API use some cookie as parameter. Here also something like
that can be used to match.

> > But in embedded space, there may be more than one. Hence they
> > needed
> > netlink multicast instead of just one char device.
> 
> I see.

Thanks,
Srinivas
Pandruvada, Srinivas Aug. 22, 2024, 12:52 p.m. UTC | #7
On Thu, 2024-08-22 at 12:11 +0000, Pandruvada, Srinivas wrote:
> On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote:
> > On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas
> > <srinivas.pandruvada@intel.com> wrote:
> > > 
> > > On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote:
> > > > Hi Srinivas,
> > > > 
> > > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas
> > > > <srinivas.pandruvada@intel.com> wrote:
> > > > > 
> > > > > Hi Daniel,
> > > > > 
> > > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
> > > > > > The trip points are a firmware description of the
> > > > > > temperature
> > > > > > limits
> > > > > > of a specific thermal zone where we associate an action
> > > > > > whichcan't
> > > > > > is
> > > > > > done
> > > > > > by the kernel. The time resolution is low.
> > > > > > 
> > > > > > The userspace has to deal with a more complex thermal
> > > > > > management
> > > > > > based
> > > > > > on heuristics from different information coming from
> > > > > > different
> > > > > > places. The logic is much more complex but based on a
> > > > > > bigger
> > > > > > time
> > > > > > resolution, usually one second based.
> > > > > > 
> > > > > > The purpose of the userspace is to monitor the temperatures
> > > > > > from
> > > > > > different places and take actions. However, it can not be
> > > > > > constantly
> > > > > > reading the temperature to detect when a temperature
> > > > > > threshold
> > > > > > has
> > > > > > been reached. This is especially bad for mobile or embedded
> > > > > > system as
> > > > > > that will lead to an unacceptable number of wakeup to check
> > > > > > the
> > > > > > temperature with nothing to do.
> > > > > > 
> > > > > > On the other side, the sensors are now most of the time
> > > > > > interrupt
> > > > > > driven. That means the thermal framework will use the
> > > > > > temperature
> > > > > > trip
> > > > > > points to program the sensor to trigger an interrupt when a
> > > > > > temperature limit is crossed.
> > > > > > 
> > > > > > Unfortunately, the userspace can not benefit this feature
> > > > > > and
> > > > > > current
> > > > > > solutions found here and there, iow out-of-tree, are to add
> > > > > > fake
> > > > > > trip
> > > > > > points in the firmware and enable the writable trip points.
> > > > > > 
> > > > > > This is bad for different reasons, the trip points are for
> > > > > > in-
> > > > > > kernel
> > > > > > actions, the semantic of their types is used by the thermal
> > > > > > framework
> > > > > > and by adding trip points in the device tree is a way to
> > > > > > overcome
> > > > > > the
> > > > > > current limitation but tampering with how the thermal
> > > > > > framework
> > > > > > is
> > > > > > supposed to work. The writable trip points is a way to
> > > > > > adjust
> > > > > > a
> > > > > > temperature limit given a specific platform if the firmware
> > > > > > is
> > > > > > not
> > > > > > accurate enough and TBH it is more a debug feature from my
> > > > > > POV.
> > > > > > 
> > > > > > The thresholds mechanism is a way to have the userspace to
> > > > > > tell
> > > > > > thermal framework to send a notification when a temperature
> > > > > > limit
> > > > > > is
> > > > > > crossed. There is no id, no hysteresis, just the
> > > > > > temperature
> > > > > > and
> > > > > > the
> > > > > > direction of the limit crossing. That means we can be
> > > > > > notified
> > > > > > when a
> > > > > > threshold is crossed the way up only, or the way down only
> > > > > > or
> > > > > > both
> > > > > > ways. That allows to create hysteresis values if it is
> > > > > > needed.
> > > > > > 
> > > > > > A threshold can be added, deleted or flushed. The latter
> > > > > > means
> > > > > > all
> > > > > > thresholds belonging to a thermal zone will be deleted.
> > > > > > 
> > > > > 
> > > > > So you are proposing to add threshold via netlink, not adding
> > > > > any
> > > > > new
> > > > > sysfs attribute? That is not clear here.
> > > > > 
> > > > > I think you are adding"
> > > > > THERMAL_GENL_CMD_THRESHOLD_GET
> > > > > THERMAL_GENL_CMD_THRESHOLD_ADD
> > > > > THERMAL_GENL_CMD_THRESHOLD_DELETE
> > > > > THERMAL_GENL_CMD_THRESHOLD_FLUSH
> > > > > 
> > > > > We need to document our netlink messages including old ones.
> > > > > 
> > > > > Also we should add "MODIFY" as we tend to change them quite
> > > > > often.
> > > > > 
> > > > > Also no hysteresis, that is practically we can't use.
> > > > 
> > > > The direction thing is equivalent to hysteresis though.
> > > > 
> > > > Instead of using one threshold with a given hysteresis value,
> > > > use
> > > > two
> > > > of them with different temperature values and different
> > > > directions.
> > > > 
> > > > > Temperature changes so much that that will flood user space.
> > > > > You
> > > > > will get 100s of
> > > > > events on CPU temperature if you set temperature threshold in
> > > > > CPU.
> > > > 
> > > > Events only trigger when thresholds are crossed in a specific
> > > > direction, but overall you have a point.
> > > 
> > > This is good enough for some sensor which changes slowly like
> > > skin.
> > 
> > Right, and I think that this is the driving use case.
> > 
> > > But some sensors like CPU and board are not like that.
> > 
> > Sure, so this interface will not be for them.
> But when we design an interface, I think this shouldn't be sensor
> dependent.
> 
> > 
> > > We publish both raw and filtered event count in debugfs for x86
> > > package temp.
> > > For example, I just ran a sample on a client for one threshold.
> > > 
> > > Total notification from hardware: 224
> > > Notified to user space: 16
> > 
> > So did you apply the Daniel's patches and run the test or did you
> > do
> > something else?
> We already use netlink to send notification to user space via
> writable
> trip. So didn't apply any patches. I don't see these patches will do
> any different.

I mean without rate limit here all 224 notifications will be passed to
user space (not 16). Temperature goes up and down over threshold for
100s of times before settling.

I can give  a try with this series to confirm.

Thanks,
Srinivas

> 
> 
> > 
> > > > 
> > > > > We have a whole filtering in driver to avoid this.
> > > > > You need a rate limit here.
> > > > > 
> > > > > There are multiple user processes can add threshold and there
> > > > > is no
> > > > > ownership. So one process can cause too much noise to others
> > > > > as
> > > > > it
> > > > > is
> > > > > multicast.
> > > > > 
> > > > > We worked on a change to filer these as discussed during last
> > > > > LPC,
> > > > > but
> > > > > not posted yet. This will really need this as this will be
> > > > > too
> > > > > many
> > > > > messages.
> > > > 
> > > > Unless there is a way to limit your subscription to events
> > > > regarding
> > > > a
> > > > specific thermal zone, for instance.
> > > That's what this change does.
> > > 
> > > > 
> > > > Anyway, I have to admit ignorance regarding the user space
> > > > usage
> > > > model
> > > > related to this.  For example, is it expected that there will
> > > > be
> > > > one
> > > > user space entity managing the thresholds or there can be many.
> > > 
> > > Good question. That's why we didn't send this change. If there is
> > > one
> > > agent this is fine.
> > 
> > I think that the addition and deletion of a user threshold should
> > be
> > privileged operations.
> > 
> > If that is the case, it all boils down to proper coordination in
> > user
> > space.
> > 
> Some kernel API use some cookie as parameter. Here also something
> like
> that can be used to match.
> 
> > > But in embedded space, there may be more than one. Hence they
> > > needed
> > > netlink multicast instead of just one char device.
> > 
> > I see.
> 
> Thanks,
> Srinivas
>
Rafael J. Wysocki Aug. 22, 2024, 1:01 p.m. UTC | #8
On Thu, Aug 22, 2024 at 2:52 PM Pandruvada, Srinivas
<srinivas.pandruvada@intel.com> wrote:
>
> On Thu, 2024-08-22 at 12:11 +0000, Pandruvada, Srinivas wrote:
> > On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote:
> > > On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas
> > > <srinivas.pandruvada@intel.com> wrote:
> > > >
> > > > On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote:
> > > > > Hi Srinivas,
> > > > >
> > > > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas
> > > > > <srinivas.pandruvada@intel.com> wrote:
> > > > > >
> > > > > > Hi Daniel,
> > > > > >
> > > > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
> > > > > > > The trip points are a firmware description of the
> > > > > > > temperature
> > > > > > > limits
> > > > > > > of a specific thermal zone where we associate an action
> > > > > > > whichcan't
> > > > > > > is
> > > > > > > done
> > > > > > > by the kernel. The time resolution is low.
> > > > > > >
> > > > > > > The userspace has to deal with a more complex thermal
> > > > > > > management
> > > > > > > based
> > > > > > > on heuristics from different information coming from
> > > > > > > different
> > > > > > > places. The logic is much more complex but based on a
> > > > > > > bigger
> > > > > > > time
> > > > > > > resolution, usually one second based.
> > > > > > >
> > > > > > > The purpose of the userspace is to monitor the temperatures
> > > > > > > from
> > > > > > > different places and take actions. However, it can not be
> > > > > > > constantly
> > > > > > > reading the temperature to detect when a temperature
> > > > > > > threshold
> > > > > > > has
> > > > > > > been reached. This is especially bad for mobile or embedded
> > > > > > > system as
> > > > > > > that will lead to an unacceptable number of wakeup to check
> > > > > > > the
> > > > > > > temperature with nothing to do.
> > > > > > >
> > > > > > > On the other side, the sensors are now most of the time
> > > > > > > interrupt
> > > > > > > driven. That means the thermal framework will use the
> > > > > > > temperature
> > > > > > > trip
> > > > > > > points to program the sensor to trigger an interrupt when a
> > > > > > > temperature limit is crossed.
> > > > > > >
> > > > > > > Unfortunately, the userspace can not benefit this feature
> > > > > > > and
> > > > > > > current
> > > > > > > solutions found here and there, iow out-of-tree, are to add
> > > > > > > fake
> > > > > > > trip
> > > > > > > points in the firmware and enable the writable trip points.
> > > > > > >
> > > > > > > This is bad for different reasons, the trip points are for
> > > > > > > in-
> > > > > > > kernel
> > > > > > > actions, the semantic of their types is used by the thermal
> > > > > > > framework
> > > > > > > and by adding trip points in the device tree is a way to
> > > > > > > overcome
> > > > > > > the
> > > > > > > current limitation but tampering with how the thermal
> > > > > > > framework
> > > > > > > is
> > > > > > > supposed to work. The writable trip points is a way to
> > > > > > > adjust
> > > > > > > a
> > > > > > > temperature limit given a specific platform if the firmware
> > > > > > > is
> > > > > > > not
> > > > > > > accurate enough and TBH it is more a debug feature from my
> > > > > > > POV.
> > > > > > >
> > > > > > > The thresholds mechanism is a way to have the userspace to
> > > > > > > tell
> > > > > > > thermal framework to send a notification when a temperature
> > > > > > > limit
> > > > > > > is
> > > > > > > crossed. There is no id, no hysteresis, just the
> > > > > > > temperature
> > > > > > > and
> > > > > > > the
> > > > > > > direction of the limit crossing. That means we can be
> > > > > > > notified
> > > > > > > when a
> > > > > > > threshold is crossed the way up only, or the way down only
> > > > > > > or
> > > > > > > both
> > > > > > > ways. That allows to create hysteresis values if it is
> > > > > > > needed.
> > > > > > >
> > > > > > > A threshold can be added, deleted or flushed. The latter
> > > > > > > means
> > > > > > > all
> > > > > > > thresholds belonging to a thermal zone will be deleted.
> > > > > > >
> > > > > >
> > > > > > So you are proposing to add threshold via netlink, not adding
> > > > > > any
> > > > > > new
> > > > > > sysfs attribute? That is not clear here.
> > > > > >
> > > > > > I think you are adding"
> > > > > > THERMAL_GENL_CMD_THRESHOLD_GET
> > > > > > THERMAL_GENL_CMD_THRESHOLD_ADD
> > > > > > THERMAL_GENL_CMD_THRESHOLD_DELETE
> > > > > > THERMAL_GENL_CMD_THRESHOLD_FLUSH
> > > > > >
> > > > > > We need to document our netlink messages including old ones.
> > > > > >
> > > > > > Also we should add "MODIFY" as we tend to change them quite
> > > > > > often.
> > > > > >
> > > > > > Also no hysteresis, that is practically we can't use.
> > > > >
> > > > > The direction thing is equivalent to hysteresis though.
> > > > >
> > > > > Instead of using one threshold with a given hysteresis value,
> > > > > use
> > > > > two
> > > > > of them with different temperature values and different
> > > > > directions.
> > > > >
> > > > > > Temperature changes so much that that will flood user space.
> > > > > > You
> > > > > > will get 100s of
> > > > > > events on CPU temperature if you set temperature threshold in
> > > > > > CPU.
> > > > >
> > > > > Events only trigger when thresholds are crossed in a specific
> > > > > direction, but overall you have a point.
> > > >
> > > > This is good enough for some sensor which changes slowly like
> > > > skin.
> > >
> > > Right, and I think that this is the driving use case.
> > >
> > > > But some sensors like CPU and board are not like that.
> > >
> > > Sure, so this interface will not be for them.
> > But when we design an interface, I think this shouldn't be sensor
> > dependent.
> >
> > >
> > > > We publish both raw and filtered event count in debugfs for x86
> > > > package temp.
> > > > For example, I just ran a sample on a client for one threshold.
> > > >
> > > > Total notification from hardware: 224
> > > > Notified to user space: 16
> > >
> > > So did you apply the Daniel's patches and run the test or did you
> > > do
> > > something else?
> > We already use netlink to send notification to user space via
> > writable
> > trip. So didn't apply any patches. I don't see these patches will do
> > any different.
>
> I mean without rate limit here all 224 notifications will be passed to
> user space (not 16). Temperature goes up and down over threshold for
> 100s of times before settling.

Yeah, that needs a real trip point with hysteresis.

Or the threshold could be removed once crossed and then added back
when the temperature falls down sufficiently.

> I can give  a try with this series to confirm.

That would be premature at this point.
Daniel Lezcano Aug. 22, 2024, 4:51 p.m. UTC | #9
Hi Srinivas,

On 21/08/2024 22:04, Pandruvada, Srinivas wrote:
> Hi Daniel,
> 
> On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote:
>> The trip points are a firmware description of the temperature limits
>> of a specific thermal zone where we associate an action which is done
>> by the kernel. The time resolution is low.
>>
>> The userspace has to deal with a more complex thermal management
>> based
>> on heuristics from different information coming from different
>> places. The logic is much more complex but based on a bigger time
>> resolution, usually one second based.
>>
>> The purpose of the userspace is to monitor the temperatures from
>> different places and take actions. However, it can not be constantly
>> reading the temperature to detect when a temperature threshold has
>> been reached. This is especially bad for mobile or embedded system as
>> that will lead to an unacceptable number of wakeup to check the
>> temperature with nothing to do.
>>
>> On the other side, the sensors are now most of the time interrupt
>> driven. That means the thermal framework will use the temperature
>> trip
>> points to program the sensor to trigger an interrupt when a
>> temperature limit is crossed.
>>
>> Unfortunately, the userspace can not benefit this feature and current
>> solutions found here and there, iow out-of-tree, are to add fake trip
>> points in the firmware and enable the writable trip points.
>>
>> This is bad for different reasons, the trip points are for in-kernel
>> actions, the semantic of their types is used by the thermal framework
>> and by adding trip points in the device tree is a way to overcome the
>> current limitation but tampering with how the thermal framework is
>> supposed to work. The writable trip points is a way to adjust a
>> temperature limit given a specific platform if the firmware is not
>> accurate enough and TBH it is more a debug feature from my POV.
>>
>> The thresholds mechanism is a way to have the userspace to tell
>> thermal framework to send a notification when a temperature limit is
>> crossed. There is no id, no hysteresis, just the temperature and the
>> direction of the limit crossing. That means we can be notified when a
>> threshold is crossed the way up only, or the way down only or both
>> ways. That allows to create hysteresis values if it is needed.
>>
>> A threshold can be added, deleted or flushed. The latter means all
>> thresholds belonging to a thermal zone will be deleted.
>>      
> 
> So you are proposing to add threshold via netlink, not adding any new
> sysfs attribute? That is not clear here.
> 
> I think you are adding"
> THERMAL_GENL_CMD_THRESHOLD_GET
> THERMAL_GENL_CMD_THRESHOLD_ADD
> THERMAL_GENL_CMD_THRESHOLD_DELETE
> THERMAL_GENL_CMD_THRESHOLD_FLUSH
> 
> We need to document our netlink messages including old ones.

We can do that but I would prefer to do that in a separate series.

> Also we should add "MODIFY" as we tend to change them quite often.

We can not change a threshold, only delete or add it.

> Also no hysteresis, that is practically we can't use. Temperature
> changes so much that that will flood user space. 

A threshold with a direction allows to create an hysteresis. Instead of 
one threshold, you will have two with different directions and temperature.

> You will get 100s of
> events on CPU temperature if you set temperature threshold in CPU.
> We have a whole filtering in driver to avoid this.
> You need a rate limit here.

I disagree, the user space is not supposed to monitor at a high rate a 
particular thermal zone. If the CPU is sending 100s notifications per 
second that means the threshold is set close to the mitigation trip 
point temperature which is a bad user space configuration.

> There are multiple user processes can add threshold and there is no
> ownership. So one process can cause too much noise to others as it is
> multicast.

ownership is something we may want to handle later.

What about restricting the thermal netlink to the 'root' user only ATM?

> We worked on a change to filer these as discussed during last LPC, but
> not posted yet. This will really need this as this will be too many
> messages.

With a correct configuration, that would not happen because it is 
different from receiving events from the trip points.
Daniel Lezcano Aug. 22, 2024, 5:08 p.m. UTC | #10
On 22/08/2024 14:11, Pandruvada, Srinivas wrote:
> On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote:
>> On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas
>> <srinivas.pandruvada@intel.com> wrote:

[ ... ]

>> So did you apply the Daniel's patches and run the test or did you do
>> something else?
> We already use netlink to send notification to user space via writable
> trip. So didn't apply any patches. I don't see these patches will do
> any different.

Actually you are missing the point of the thresholds approach.

The goal is to track the temperature easily from userspace without 
constantly polling the temperatures in all the places.

The trip points are a firmware descriptions. Their number is fixed. They 
are designed for in-kernel thermal framework. They have a type. A 
governor is supposed to be tied with it. A cooling device also.

Writable trip points means you should be able to add trip points 
dedicated to the userspace to the firmware which is not possible. Then 
reuse them from userspace which is unrelated to the in-kernel thermal 
management. It is difficult to deal with because of the need of tracking 
the low and high limits from userspace.

The thresholds are there to allow the userspace to have the benefit of 
the interrupt driven temperature monitoring.

Obviously if you set the thresholds to a temperature equal to a 
mitigation trip point then when the limit is reached you will receive 
notifications for the trip point going back and forth as well as 
notifications for the thresholds.
Pandruvada, Srinivas Aug. 22, 2024, 6:12 p.m. UTC | #11
On Thu, 2024-08-22 at 19:08 +0200, Daniel Lezcano wrote:
> On 22/08/2024 14:11, Pandruvada, Srinivas wrote:
> > On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote:
> > > On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas
> > > <srinivas.pandruvada@intel.com> wrote:
> 
> [ ... ]
> 
> > > So did you apply the Daniel's patches and run the test or did you
> > > do
> > > something else?
> > We already use netlink to send notification to user space via
> > writable
> > trip. So didn't apply any patches. I don't see these patches will
> > do
> > any different.
> 
> Actually you are missing the point of the thresholds approach.

I very well understand. I proposed similar approach without netlink
several years back. Also submitted patches to use IIO.


> 
> The goal is to track the temperature easily from userspace without 
> constantly polling the temperatures in all the places.
> 
Exactly.


> The trip points are a firmware descriptions. Their number is fixed.
> They 
> are designed for in-kernel thermal framework. They have a type. A 
> governor is supposed to be tied with it. A cooling device also.
> 
Yes, I understand the whole approach. trips are trip where you want
governors to take action.

> Writable trip points means you should be able to add trip points 
> dedicated to the userspace to the firmware which is not possible.
> Then 
> reuse them from userspace which is unrelated to the in-kernel thermal
> management. It is difficult to deal with because of the need of
> tracking 
> the low and high limits from userspace.
> 
Yes.

> The thresholds are there to allow the userspace to have the benefit
> of 
> the interrupt driven temperature monitoring.
> 
Yes. I am not denying benefits.

> Obviously if you set the thresholds to a temperature equal to a 
> mitigation trip point then when the limit is reached you will receive
> notifications for the trip point going back and forth as well as 
> notifications for the thresholds.
> 
Not correct. It is not at mitigation point. This is how several sensors
several sensors work. I submitted a range of sensor drivers to IIO
framework, several have this issue.

To make it useful, you need to have some rate limiting. Netlink is not
a low overhead user-kernel interface.

Looks like you finalized design and just looking for patch reviews!


> 
> 
> 
>