Message ID | 20240816081241.1925221-1-daniel.lezcano@linaro.org (mailing list archive) |
---|---|
Headers | show |
Series | Add thermal thresholds support | expand |
On Fri, Aug 16, 2024 at 10:12 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > The trip points are a firmware description of the temperature limits > of a specific thermal zone where we associate an action which is done > by the kernel. The time resolution is low. > > The userspace has to deal with a more complex thermal management based > on heuristics from different information coming from different > places. The logic is much more complex but based on a bigger time > resolution, usually one second based. > > The purpose of the userspace is to monitor the temperatures from > different places and take actions. However, it can not be constantly > reading the temperature to detect when a temperature threshold has > been reached. This is especially bad for mobile or embedded system as > that will lead to an unacceptable number of wakeup to check the > temperature with nothing to do. > > On the other side, the sensors are now most of the time interrupt > driven. That means the thermal framework will use the temperature trip > points to program the sensor to trigger an interrupt when a > temperature limit is crossed. > > Unfortunately, the userspace can not benefit this feature and current > solutions found here and there, iow out-of-tree, are to add fake trip > points in the firmware and enable the writable trip points. > > This is bad for different reasons, the trip points are for in-kernel > actions, the semantic of their types is used by the thermal framework > and by adding trip points in the device tree is a way to overcome the > current limitation but tampering with how the thermal framework is > supposed to work. The writable trip points is a way to adjust a > temperature limit given a specific platform if the firmware is not > accurate enough and TBH it is more a debug feature from my POV. > > The thresholds mechanism is a way to have the userspace to tell > thermal framework to send a notification when a temperature limit is > crossed. There is no id, no hysteresis, just the temperature and the > direction of the limit crossing. That means we can be notified when a > threshold is crossed the way up only, or the way down only or both > ways. That allows to create hysteresis values if it is needed. First off, I'd prefer these things to be referred to as "user thresholds" to avoid confusion with trip thresholds. > A threshold can be added, deleted or flushed. The latter means all > thresholds belonging to a thermal zone will be deleted. > > When a threshold is added: > > - if the same threshold (temperature and direction) exists, an error > is returned Why is it useful to return an error in this case? It appears that the existing threshold could be used just fine. > - if a threshold is specified with the same temperature but a > different direction, the specified direction is added > > - if there is no threshold with the same temperature then it is > created > > When a threshold is deleted: > > - if the same threshold (temperature and direction) exists, it is > deleted > > - if a threshold is specified with the same temperature but a > different direction, the specified direction is removed > > - if there is no threshold with the same temperature, then an error > is returned > > When the threshold are flushed: > > - All thresholds related to a thermal zone are deleted > > When a threshold is crossed: > > - the userspace does not need to know which threshold(s) have been > crossed, it will be notified with the thermal zone identifier, the > current temperature and the previous temperature This has a bit of a warm-up issue when the zone temperature is above a threshold to start with. Or is it not a problem because thresholds can only be added when the zone temperature is known? > - if multiple thresholds have been crossed between two updates only > one notification will be send to the userspace, it is pointless to > send a notification per thresholds crossed as the userspace can > handle that easily when it has the temperature delta information That's clever. > All aforementioned actions and events lead to a notification to the > userspace. A threshold change (add, delete and flush) is notified to > the userspace with the process id responsible of the action. > > Along with the kernel changes, the thermal library has been extended > to provide the different API to deal with the new threshold netlink > events and commands. > > In addition, the thermal-engine skeleton uses these new API by > flushing and adding thresholds as well as getting the notification > about these actions. > > Overall the series has been tested with the thermal-engine skeleton > and some selftests which are not part of this series.
Hi Daniel, On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: > The trip points are a firmware description of the temperature limits > of a specific thermal zone where we associate an action which is done > by the kernel. The time resolution is low. > > The userspace has to deal with a more complex thermal management > based > on heuristics from different information coming from different > places. The logic is much more complex but based on a bigger time > resolution, usually one second based. > > The purpose of the userspace is to monitor the temperatures from > different places and take actions. However, it can not be constantly > reading the temperature to detect when a temperature threshold has > been reached. This is especially bad for mobile or embedded system as > that will lead to an unacceptable number of wakeup to check the > temperature with nothing to do. > > On the other side, the sensors are now most of the time interrupt > driven. That means the thermal framework will use the temperature > trip > points to program the sensor to trigger an interrupt when a > temperature limit is crossed. > > Unfortunately, the userspace can not benefit this feature and current > solutions found here and there, iow out-of-tree, are to add fake trip > points in the firmware and enable the writable trip points. > > This is bad for different reasons, the trip points are for in-kernel > actions, the semantic of their types is used by the thermal framework > and by adding trip points in the device tree is a way to overcome the > current limitation but tampering with how the thermal framework is > supposed to work. The writable trip points is a way to adjust a > temperature limit given a specific platform if the firmware is not > accurate enough and TBH it is more a debug feature from my POV. > > The thresholds mechanism is a way to have the userspace to tell > thermal framework to send a notification when a temperature limit is > crossed. There is no id, no hysteresis, just the temperature and the > direction of the limit crossing. That means we can be notified when a > threshold is crossed the way up only, or the way down only or both > ways. That allows to create hysteresis values if it is needed. > > A threshold can be added, deleted or flushed. The latter means all > thresholds belonging to a thermal zone will be deleted. > So you are proposing to add threshold via netlink, not adding any new sysfs attribute? That is not clear here. I think you are adding" THERMAL_GENL_CMD_THRESHOLD_GET THERMAL_GENL_CMD_THRESHOLD_ADD THERMAL_GENL_CMD_THRESHOLD_DELETE THERMAL_GENL_CMD_THRESHOLD_FLUSH We need to document our netlink messages including old ones. Also we should add "MODIFY" as we tend to change them quite often. Also no hysteresis, that is practically we can't use. Temperature changes so much that that will flood user space. You will get 100s of events on CPU temperature if you set temperature threshold in CPU. We have a whole filtering in driver to avoid this. You need a rate limit here. There are multiple user processes can add threshold and there is no ownership. So one process can cause too much noise to others as it is multicast. We worked on a change to filer these as discussed during last LPC, but not posted yet. This will really need this as this will be too many messages. Thanks, Srinivas > When a threshold is added: > > - if the same threshold (temperature and direction) exists, an error > is returned > > - if a threshold is specified with the same temperature but a > different direction, the specified direction is added > > - if there is no threshold with the same temperature then it is > created > > When a threshold is deleted: > > - if the same threshold (temperature and direction) exists, it is > deleted > > - if a threshold is specified with the same temperature but a > different direction, the specified direction is removed > > - if there is no threshold with the same temperature, then an error > is returned > > When the threshold are flushed: > > - All thresholds related to a thermal zone are deleted > > When a threshold is crossed: > > - the userspace does not need to know which threshold(s) have been > crossed, it will be notified with the thermal zone identifier, the > current temperature and the previous temperature > > - if multiple thresholds have been crossed between two updates only > one notification will be send to the userspace, it is pointless to > send a notification per thresholds crossed as the userspace can > handle that easily when it has the temperature delta information > > All aforementioned actions and events lead to a notification to the > userspace. A threshold change (add, delete and flush) is notified to > the userspace with the process id responsible of the action. > > Along with the kernel changes, the thermal library has been extended > to provide the different API to deal with the new threshold netlink > events and commands. > > In addition, the thermal-engine skeleton uses these new API by > flushing and adding thresholds as well as getting the notification > about these actions. > > Overall the series has been tested with the thermal-engine skeleton > and some selftests which are not part of this series. > > Changelog: > V2: > - Compute min and max in thermal_zone_device_update() but keep > the loop as it is (Rafael) > > - Include slab.h to fix compilation warnings on some > architectures > with kmalloc and kfree (kernel test robot) > > Daniel Lezcano (7): > thermal/core: Compute low and high boundaries in > thermal_zone_device_update() > thermal/core: Add thresholds support > thermal/core: Connect the threshold with the core > thermal/netlink: Add the commands and the events for the thresholds > tools/lib/thermal: Make more generic the command encoding function > tools/lib/thermal: Add the threshold netlink ABI > tools/thermal/thermal-engine: Take into account the thresholds API > > drivers/thermal/Kconfig | 15 ++ > drivers/thermal/Makefile | 3 + > drivers/thermal/thermal_core.c | 21 +- > drivers/thermal/thermal_core.h | 6 +- > drivers/thermal/thermal_netlink.c | 239 > ++++++++++++++++- > drivers/thermal/thermal_netlink.h | 7 + > drivers/thermal/thermal_thresholds.c | 247 > ++++++++++++++++++ > drivers/thermal/thermal_thresholds.h | 54 ++++ > drivers/thermal/thermal_trip.c | 27 +- > include/linux/thermal.h | 3 + > include/uapi/linux/thermal.h | 30 ++- > tools/lib/thermal/commands.c | 167 +++++++++++- > tools/lib/thermal/events.c | 58 +++- > tools/lib/thermal/include/thermal.h | 40 +++ > tools/lib/thermal/libthermal.map | 5 + > tools/lib/thermal/thermal.c | 17 ++ > tools/thermal/lib/Makefile | 2 +- > tools/thermal/thermal-engine/thermal-engine.c | 109 +++++++- > 18 files changed, 972 insertions(+), 78 deletions(-) > create mode 100644 drivers/thermal/thermal_thresholds.c > create mode 100644 drivers/thermal/thermal_thresholds.h >
Hi Srinivas, On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas <srinivas.pandruvada@intel.com> wrote: > > Hi Daniel, > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: > > The trip points are a firmware description of the temperature limits > > of a specific thermal zone where we associate an action which is done > > by the kernel. The time resolution is low. > > > > The userspace has to deal with a more complex thermal management > > based > > on heuristics from different information coming from different > > places. The logic is much more complex but based on a bigger time > > resolution, usually one second based. > > > > The purpose of the userspace is to monitor the temperatures from > > different places and take actions. However, it can not be constantly > > reading the temperature to detect when a temperature threshold has > > been reached. This is especially bad for mobile or embedded system as > > that will lead to an unacceptable number of wakeup to check the > > temperature with nothing to do. > > > > On the other side, the sensors are now most of the time interrupt > > driven. That means the thermal framework will use the temperature > > trip > > points to program the sensor to trigger an interrupt when a > > temperature limit is crossed. > > > > Unfortunately, the userspace can not benefit this feature and current > > solutions found here and there, iow out-of-tree, are to add fake trip > > points in the firmware and enable the writable trip points. > > > > This is bad for different reasons, the trip points are for in-kernel > > actions, the semantic of their types is used by the thermal framework > > and by adding trip points in the device tree is a way to overcome the > > current limitation but tampering with how the thermal framework is > > supposed to work. The writable trip points is a way to adjust a > > temperature limit given a specific platform if the firmware is not > > accurate enough and TBH it is more a debug feature from my POV. > > > > The thresholds mechanism is a way to have the userspace to tell > > thermal framework to send a notification when a temperature limit is > > crossed. There is no id, no hysteresis, just the temperature and the > > direction of the limit crossing. That means we can be notified when a > > threshold is crossed the way up only, or the way down only or both > > ways. That allows to create hysteresis values if it is needed. > > > > A threshold can be added, deleted or flushed. The latter means all > > thresholds belonging to a thermal zone will be deleted. > > > > So you are proposing to add threshold via netlink, not adding any new > sysfs attribute? That is not clear here. > > I think you are adding" > THERMAL_GENL_CMD_THRESHOLD_GET > THERMAL_GENL_CMD_THRESHOLD_ADD > THERMAL_GENL_CMD_THRESHOLD_DELETE > THERMAL_GENL_CMD_THRESHOLD_FLUSH > > We need to document our netlink messages including old ones. > > Also we should add "MODIFY" as we tend to change them quite often. > > Also no hysteresis, that is practically we can't use. The direction thing is equivalent to hysteresis though. Instead of using one threshold with a given hysteresis value, use two of them with different temperature values and different directions. > Temperature changes so much that that will flood user space. You will get 100s of > events on CPU temperature if you set temperature threshold in CPU. Events only trigger when thresholds are crossed in a specific direction, but overall you have a point. > We have a whole filtering in driver to avoid this. > You need a rate limit here. > > There are multiple user processes can add threshold and there is no > ownership. So one process can cause too much noise to others as it is > multicast. > > We worked on a change to filer these as discussed during last LPC, but > not posted yet. This will really need this as this will be too many > messages. Unless there is a way to limit your subscription to events regarding a specific thermal zone, for instance. Anyway, I have to admit ignorance regarding the user space usage model related to this. For example, is it expected that there will be one user space entity managing the thresholds or there can be many.
On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote: > Hi Srinivas, > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas > <srinivas.pandruvada@intel.com> wrote: > > > > Hi Daniel, > > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: > > > The trip points are a firmware description of the temperature > > > limits > > > of a specific thermal zone where we associate an action which is > > > done > > > by the kernel. The time resolution is low. > > > > > > The userspace has to deal with a more complex thermal management > > > based > > > on heuristics from different information coming from different > > > places. The logic is much more complex but based on a bigger time > > > resolution, usually one second based. > > > > > > The purpose of the userspace is to monitor the temperatures from > > > different places and take actions. However, it can not be > > > constantly > > > reading the temperature to detect when a temperature threshold > > > has > > > been reached. This is especially bad for mobile or embedded > > > system as > > > that will lead to an unacceptable number of wakeup to check the > > > temperature with nothing to do. > > > > > > On the other side, the sensors are now most of the time interrupt > > > driven. That means the thermal framework will use the temperature > > > trip > > > points to program the sensor to trigger an interrupt when a > > > temperature limit is crossed. > > > > > > Unfortunately, the userspace can not benefit this feature and > > > current > > > solutions found here and there, iow out-of-tree, are to add fake > > > trip > > > points in the firmware and enable the writable trip points. > > > > > > This is bad for different reasons, the trip points are for in- > > > kernel > > > actions, the semantic of their types is used by the thermal > > > framework > > > and by adding trip points in the device tree is a way to overcome > > > the > > > current limitation but tampering with how the thermal framework > > > is > > > supposed to work. The writable trip points is a way to adjust a > > > temperature limit given a specific platform if the firmware is > > > not > > > accurate enough and TBH it is more a debug feature from my POV. > > > > > > The thresholds mechanism is a way to have the userspace to tell > > > thermal framework to send a notification when a temperature limit > > > is > > > crossed. There is no id, no hysteresis, just the temperature and > > > the > > > direction of the limit crossing. That means we can be notified > > > when a > > > threshold is crossed the way up only, or the way down only or > > > both > > > ways. That allows to create hysteresis values if it is needed. > > > > > > A threshold can be added, deleted or flushed. The latter means > > > all > > > thresholds belonging to a thermal zone will be deleted. > > > > > > > So you are proposing to add threshold via netlink, not adding any > > new > > sysfs attribute? That is not clear here. > > > > I think you are adding" > > THERMAL_GENL_CMD_THRESHOLD_GET > > THERMAL_GENL_CMD_THRESHOLD_ADD > > THERMAL_GENL_CMD_THRESHOLD_DELETE > > THERMAL_GENL_CMD_THRESHOLD_FLUSH > > > > We need to document our netlink messages including old ones. > > > > Also we should add "MODIFY" as we tend to change them quite often. > > > > Also no hysteresis, that is practically we can't use. > > The direction thing is equivalent to hysteresis though. > > Instead of using one threshold with a given hysteresis value, use two > of them with different temperature values and different directions. > > > Temperature changes so much that that will flood user space. You > > will get 100s of > > events on CPU temperature if you set temperature threshold in CPU. > > Events only trigger when thresholds are crossed in a specific > direction, but overall you have a point. This is good enough for some sensor which changes slowly like skin. But some sensors like CPU and board are not like that. We publish both raw and filtered event count in debugfs for x86 package temp. For example, I just ran a sample on a client for one threshold. Total notification from hardware: 224 Notified to user space: 16 > > > We have a whole filtering in driver to avoid this. > > You need a rate limit here. > > > > There are multiple user processes can add threshold and there is no > > ownership. So one process can cause too much noise to others as it > > is > > multicast. > > > > We worked on a change to filer these as discussed during last LPC, > > but > > not posted yet. This will really need this as this will be too many > > messages. > > Unless there is a way to limit your subscription to events regarding > a > specific thermal zone, for instance. That's what this change does. > > Anyway, I have to admit ignorance regarding the user space usage > model > related to this. For example, is it expected that there will be one > user space entity managing the thresholds or there can be many. Good question. That's why we didn't send this change. If there is one agent this is fine. But in embedded space, there may be more than one. Hence they needed netlink multicast instead of just one char device. Thanks, Srinivas
On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas <srinivas.pandruvada@intel.com> wrote: > > On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote: > > Hi Srinivas, > > > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas > > <srinivas.pandruvada@intel.com> wrote: > > > > > > Hi Daniel, > > > > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: > > > > The trip points are a firmware description of the temperature > > > > limits > > > > of a specific thermal zone where we associate an action which is > > > > done > > > > by the kernel. The time resolution is low. > > > > > > > > The userspace has to deal with a more complex thermal management > > > > based > > > > on heuristics from different information coming from different > > > > places. The logic is much more complex but based on a bigger time > > > > resolution, usually one second based. > > > > > > > > The purpose of the userspace is to monitor the temperatures from > > > > different places and take actions. However, it can not be > > > > constantly > > > > reading the temperature to detect when a temperature threshold > > > > has > > > > been reached. This is especially bad for mobile or embedded > > > > system as > > > > that will lead to an unacceptable number of wakeup to check the > > > > temperature with nothing to do. > > > > > > > > On the other side, the sensors are now most of the time interrupt > > > > driven. That means the thermal framework will use the temperature > > > > trip > > > > points to program the sensor to trigger an interrupt when a > > > > temperature limit is crossed. > > > > > > > > Unfortunately, the userspace can not benefit this feature and > > > > current > > > > solutions found here and there, iow out-of-tree, are to add fake > > > > trip > > > > points in the firmware and enable the writable trip points. > > > > > > > > This is bad for different reasons, the trip points are for in- > > > > kernel > > > > actions, the semantic of their types is used by the thermal > > > > framework > > > > and by adding trip points in the device tree is a way to overcome > > > > the > > > > current limitation but tampering with how the thermal framework > > > > is > > > > supposed to work. The writable trip points is a way to adjust a > > > > temperature limit given a specific platform if the firmware is > > > > not > > > > accurate enough and TBH it is more a debug feature from my POV. > > > > > > > > The thresholds mechanism is a way to have the userspace to tell > > > > thermal framework to send a notification when a temperature limit > > > > is > > > > crossed. There is no id, no hysteresis, just the temperature and > > > > the > > > > direction of the limit crossing. That means we can be notified > > > > when a > > > > threshold is crossed the way up only, or the way down only or > > > > both > > > > ways. That allows to create hysteresis values if it is needed. > > > > > > > > A threshold can be added, deleted or flushed. The latter means > > > > all > > > > thresholds belonging to a thermal zone will be deleted. > > > > > > > > > > So you are proposing to add threshold via netlink, not adding any > > > new > > > sysfs attribute? That is not clear here. > > > > > > I think you are adding" > > > THERMAL_GENL_CMD_THRESHOLD_GET > > > THERMAL_GENL_CMD_THRESHOLD_ADD > > > THERMAL_GENL_CMD_THRESHOLD_DELETE > > > THERMAL_GENL_CMD_THRESHOLD_FLUSH > > > > > > We need to document our netlink messages including old ones. > > > > > > Also we should add "MODIFY" as we tend to change them quite often. > > > > > > Also no hysteresis, that is practically we can't use. > > > > The direction thing is equivalent to hysteresis though. > > > > Instead of using one threshold with a given hysteresis value, use two > > of them with different temperature values and different directions. > > > > > Temperature changes so much that that will flood user space. You > > > will get 100s of > > > events on CPU temperature if you set temperature threshold in CPU. > > > > Events only trigger when thresholds are crossed in a specific > > direction, but overall you have a point. > > This is good enough for some sensor which changes slowly like skin. Right, and I think that this is the driving use case. > But some sensors like CPU and board are not like that. Sure, so this interface will not be for them. > We publish both raw and filtered event count in debugfs for x86 package temp. > For example, I just ran a sample on a client for one threshold. > > Total notification from hardware: 224 > Notified to user space: 16 So did you apply the Daniel's patches and run the test or did you do something else? > > > > > We have a whole filtering in driver to avoid this. > > > You need a rate limit here. > > > > > > There are multiple user processes can add threshold and there is no > > > ownership. So one process can cause too much noise to others as it > > > is > > > multicast. > > > > > > We worked on a change to filer these as discussed during last LPC, > > > but > > > not posted yet. This will really need this as this will be too many > > > messages. > > > > Unless there is a way to limit your subscription to events regarding > > a > > specific thermal zone, for instance. > That's what this change does. > > > > > Anyway, I have to admit ignorance regarding the user space usage > > model > > related to this. For example, is it expected that there will be one > > user space entity managing the thresholds or there can be many. > > Good question. That's why we didn't send this change. If there is one > agent this is fine. I think that the addition and deletion of a user threshold should be privileged operations. If that is the case, it all boils down to proper coordination in user space. > But in embedded space, there may be more than one. Hence they needed > netlink multicast instead of just one char device. I see.
On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote: > On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas > <srinivas.pandruvada@intel.com> wrote: > > > > On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote: > > > Hi Srinivas, > > > > > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas > > > <srinivas.pandruvada@intel.com> wrote: > > > > > > > > Hi Daniel, > > > > > > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: > > > > > The trip points are a firmware description of the temperature > > > > > limits > > > > > of a specific thermal zone where we associate an action which > > > > > is > > > > > done > > > > > by the kernel. The time resolution is low. > > > > > > > > > > The userspace has to deal with a more complex thermal > > > > > management > > > > > based > > > > > on heuristics from different information coming from > > > > > different > > > > > places. The logic is much more complex but based on a bigger > > > > > time > > > > > resolution, usually one second based. > > > > > > > > > > The purpose of the userspace is to monitor the temperatures > > > > > from > > > > > different places and take actions. However, it can not be > > > > > constantly > > > > > reading the temperature to detect when a temperature > > > > > threshold > > > > > has > > > > > been reached. This is especially bad for mobile or embedded > > > > > system as > > > > > that will lead to an unacceptable number of wakeup to check > > > > > the > > > > > temperature with nothing to do. > > > > > > > > > > On the other side, the sensors are now most of the time > > > > > interrupt > > > > > driven. That means the thermal framework will use the > > > > > temperature > > > > > trip > > > > > points to program the sensor to trigger an interrupt when a > > > > > temperature limit is crossed. > > > > > > > > > > Unfortunately, the userspace can not benefit this feature and > > > > > current > > > > > solutions found here and there, iow out-of-tree, are to add > > > > > fake > > > > > trip > > > > > points in the firmware and enable the writable trip points. > > > > > > > > > > This is bad for different reasons, the trip points are for > > > > > in- > > > > > kernel > > > > > actions, the semantic of their types is used by the thermal > > > > > framework > > > > > and by adding trip points in the device tree is a way to > > > > > overcome > > > > > the > > > > > current limitation but tampering with how the thermal > > > > > framework > > > > > is > > > > > supposed to work. The writable trip points is a way to adjust > > > > > a > > > > > temperature limit given a specific platform if the firmware > > > > > is > > > > > not > > > > > accurate enough and TBH it is more a debug feature from my > > > > > POV. > > > > > > > > > > The thresholds mechanism is a way to have the userspace to > > > > > tell > > > > > thermal framework to send a notification when a temperature > > > > > limit > > > > > is > > > > > crossed. There is no id, no hysteresis, just the temperature > > > > > and > > > > > the > > > > > direction of the limit crossing. That means we can be > > > > > notified > > > > > when a > > > > > threshold is crossed the way up only, or the way down only or > > > > > both > > > > > ways. That allows to create hysteresis values if it is > > > > > needed. > > > > > > > > > > A threshold can be added, deleted or flushed. The latter > > > > > means > > > > > all > > > > > thresholds belonging to a thermal zone will be deleted. > > > > > > > > > > > > > So you are proposing to add threshold via netlink, not adding > > > > any > > > > new > > > > sysfs attribute? That is not clear here. > > > > > > > > I think you are adding" > > > > THERMAL_GENL_CMD_THRESHOLD_GET > > > > THERMAL_GENL_CMD_THRESHOLD_ADD > > > > THERMAL_GENL_CMD_THRESHOLD_DELETE > > > > THERMAL_GENL_CMD_THRESHOLD_FLUSH > > > > > > > > We need to document our netlink messages including old ones. > > > > > > > > Also we should add "MODIFY" as we tend to change them quite > > > > often. > > > > > > > > Also no hysteresis, that is practically we can't use. > > > > > > The direction thing is equivalent to hysteresis though. > > > > > > Instead of using one threshold with a given hysteresis value, use > > > two > > > of them with different temperature values and different > > > directions. > > > > > > > Temperature changes so much that that will flood user space. > > > > You > > > > will get 100s of > > > > events on CPU temperature if you set temperature threshold in > > > > CPU. > > > > > > Events only trigger when thresholds are crossed in a specific > > > direction, but overall you have a point. > > > > This is good enough for some sensor which changes slowly like skin. > > Right, and I think that this is the driving use case. > > > But some sensors like CPU and board are not like that. > > Sure, so this interface will not be for them. But when we design an interface, I think this shouldn't be sensor dependent. > > > We publish both raw and filtered event count in debugfs for x86 > > package temp. > > For example, I just ran a sample on a client for one threshold. > > > > Total notification from hardware: 224 > > Notified to user space: 16 > > So did you apply the Daniel's patches and run the test or did you do > something else? We already use netlink to send notification to user space via writable trip. So didn't apply any patches. I don't see these patches will do any different. > > > > > > > > We have a whole filtering in driver to avoid this. > > > > You need a rate limit here. > > > > > > > > There are multiple user processes can add threshold and there > > > > is no > > > > ownership. So one process can cause too much noise to others as > > > > it > > > > is > > > > multicast. > > > > > > > > We worked on a change to filer these as discussed during last > > > > LPC, > > > > but > > > > not posted yet. This will really need this as this will be too > > > > many > > > > messages. > > > > > > Unless there is a way to limit your subscription to events > > > regarding > > > a > > > specific thermal zone, for instance. > > That's what this change does. > > > > > > > > Anyway, I have to admit ignorance regarding the user space usage > > > model > > > related to this. For example, is it expected that there will be > > > one > > > user space entity managing the thresholds or there can be many. > > > > Good question. That's why we didn't send this change. If there is > > one > > agent this is fine. > > I think that the addition and deletion of a user threshold should be > privileged operations. > > If that is the case, it all boils down to proper coordination in user > space. > Some kernel API use some cookie as parameter. Here also something like that can be used to match. > > But in embedded space, there may be more than one. Hence they > > needed > > netlink multicast instead of just one char device. > > I see. Thanks, Srinivas
On Thu, 2024-08-22 at 12:11 +0000, Pandruvada, Srinivas wrote: > On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote: > > On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas > > <srinivas.pandruvada@intel.com> wrote: > > > > > > On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote: > > > > Hi Srinivas, > > > > > > > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas > > > > <srinivas.pandruvada@intel.com> wrote: > > > > > > > > > > Hi Daniel, > > > > > > > > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: > > > > > > The trip points are a firmware description of the > > > > > > temperature > > > > > > limits > > > > > > of a specific thermal zone where we associate an action > > > > > > whichcan't > > > > > > is > > > > > > done > > > > > > by the kernel. The time resolution is low. > > > > > > > > > > > > The userspace has to deal with a more complex thermal > > > > > > management > > > > > > based > > > > > > on heuristics from different information coming from > > > > > > different > > > > > > places. The logic is much more complex but based on a > > > > > > bigger > > > > > > time > > > > > > resolution, usually one second based. > > > > > > > > > > > > The purpose of the userspace is to monitor the temperatures > > > > > > from > > > > > > different places and take actions. However, it can not be > > > > > > constantly > > > > > > reading the temperature to detect when a temperature > > > > > > threshold > > > > > > has > > > > > > been reached. This is especially bad for mobile or embedded > > > > > > system as > > > > > > that will lead to an unacceptable number of wakeup to check > > > > > > the > > > > > > temperature with nothing to do. > > > > > > > > > > > > On the other side, the sensors are now most of the time > > > > > > interrupt > > > > > > driven. That means the thermal framework will use the > > > > > > temperature > > > > > > trip > > > > > > points to program the sensor to trigger an interrupt when a > > > > > > temperature limit is crossed. > > > > > > > > > > > > Unfortunately, the userspace can not benefit this feature > > > > > > and > > > > > > current > > > > > > solutions found here and there, iow out-of-tree, are to add > > > > > > fake > > > > > > trip > > > > > > points in the firmware and enable the writable trip points. > > > > > > > > > > > > This is bad for different reasons, the trip points are for > > > > > > in- > > > > > > kernel > > > > > > actions, the semantic of their types is used by the thermal > > > > > > framework > > > > > > and by adding trip points in the device tree is a way to > > > > > > overcome > > > > > > the > > > > > > current limitation but tampering with how the thermal > > > > > > framework > > > > > > is > > > > > > supposed to work. The writable trip points is a way to > > > > > > adjust > > > > > > a > > > > > > temperature limit given a specific platform if the firmware > > > > > > is > > > > > > not > > > > > > accurate enough and TBH it is more a debug feature from my > > > > > > POV. > > > > > > > > > > > > The thresholds mechanism is a way to have the userspace to > > > > > > tell > > > > > > thermal framework to send a notification when a temperature > > > > > > limit > > > > > > is > > > > > > crossed. There is no id, no hysteresis, just the > > > > > > temperature > > > > > > and > > > > > > the > > > > > > direction of the limit crossing. That means we can be > > > > > > notified > > > > > > when a > > > > > > threshold is crossed the way up only, or the way down only > > > > > > or > > > > > > both > > > > > > ways. That allows to create hysteresis values if it is > > > > > > needed. > > > > > > > > > > > > A threshold can be added, deleted or flushed. The latter > > > > > > means > > > > > > all > > > > > > thresholds belonging to a thermal zone will be deleted. > > > > > > > > > > > > > > > > So you are proposing to add threshold via netlink, not adding > > > > > any > > > > > new > > > > > sysfs attribute? That is not clear here. > > > > > > > > > > I think you are adding" > > > > > THERMAL_GENL_CMD_THRESHOLD_GET > > > > > THERMAL_GENL_CMD_THRESHOLD_ADD > > > > > THERMAL_GENL_CMD_THRESHOLD_DELETE > > > > > THERMAL_GENL_CMD_THRESHOLD_FLUSH > > > > > > > > > > We need to document our netlink messages including old ones. > > > > > > > > > > Also we should add "MODIFY" as we tend to change them quite > > > > > often. > > > > > > > > > > Also no hysteresis, that is practically we can't use. > > > > > > > > The direction thing is equivalent to hysteresis though. > > > > > > > > Instead of using one threshold with a given hysteresis value, > > > > use > > > > two > > > > of them with different temperature values and different > > > > directions. > > > > > > > > > Temperature changes so much that that will flood user space. > > > > > You > > > > > will get 100s of > > > > > events on CPU temperature if you set temperature threshold in > > > > > CPU. > > > > > > > > Events only trigger when thresholds are crossed in a specific > > > > direction, but overall you have a point. > > > > > > This is good enough for some sensor which changes slowly like > > > skin. > > > > Right, and I think that this is the driving use case. > > > > > But some sensors like CPU and board are not like that. > > > > Sure, so this interface will not be for them. > But when we design an interface, I think this shouldn't be sensor > dependent. > > > > > > We publish both raw and filtered event count in debugfs for x86 > > > package temp. > > > For example, I just ran a sample on a client for one threshold. > > > > > > Total notification from hardware: 224 > > > Notified to user space: 16 > > > > So did you apply the Daniel's patches and run the test or did you > > do > > something else? > We already use netlink to send notification to user space via > writable > trip. So didn't apply any patches. I don't see these patches will do > any different. I mean without rate limit here all 224 notifications will be passed to user space (not 16). Temperature goes up and down over threshold for 100s of times before settling. I can give a try with this series to confirm. Thanks, Srinivas > > > > > > > > > > > > > We have a whole filtering in driver to avoid this. > > > > > You need a rate limit here. > > > > > > > > > > There are multiple user processes can add threshold and there > > > > > is no > > > > > ownership. So one process can cause too much noise to others > > > > > as > > > > > it > > > > > is > > > > > multicast. > > > > > > > > > > We worked on a change to filer these as discussed during last > > > > > LPC, > > > > > but > > > > > not posted yet. This will really need this as this will be > > > > > too > > > > > many > > > > > messages. > > > > > > > > Unless there is a way to limit your subscription to events > > > > regarding > > > > a > > > > specific thermal zone, for instance. > > > That's what this change does. > > > > > > > > > > > Anyway, I have to admit ignorance regarding the user space > > > > usage > > > > model > > > > related to this. For example, is it expected that there will > > > > be > > > > one > > > > user space entity managing the thresholds or there can be many. > > > > > > Good question. That's why we didn't send this change. If there is > > > one > > > agent this is fine. > > > > I think that the addition and deletion of a user threshold should > > be > > privileged operations. > > > > If that is the case, it all boils down to proper coordination in > > user > > space. > > > Some kernel API use some cookie as parameter. Here also something > like > that can be used to match. > > > > But in embedded space, there may be more than one. Hence they > > > needed > > > netlink multicast instead of just one char device. > > > > I see. > > Thanks, > Srinivas >
On Thu, Aug 22, 2024 at 2:52 PM Pandruvada, Srinivas <srinivas.pandruvada@intel.com> wrote: > > On Thu, 2024-08-22 at 12:11 +0000, Pandruvada, Srinivas wrote: > > On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote: > > > On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas > > > <srinivas.pandruvada@intel.com> wrote: > > > > > > > > On Wed, 2024-08-21 at 22:20 +0200, Rafael J. Wysocki wrote: > > > > > Hi Srinivas, > > > > > > > > > > On Wed, Aug 21, 2024 at 10:04 PM Pandruvada, Srinivas > > > > > <srinivas.pandruvada@intel.com> wrote: > > > > > > > > > > > > Hi Daniel, > > > > > > > > > > > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: > > > > > > > The trip points are a firmware description of the > > > > > > > temperature > > > > > > > limits > > > > > > > of a specific thermal zone where we associate an action > > > > > > > whichcan't > > > > > > > is > > > > > > > done > > > > > > > by the kernel. The time resolution is low. > > > > > > > > > > > > > > The userspace has to deal with a more complex thermal > > > > > > > management > > > > > > > based > > > > > > > on heuristics from different information coming from > > > > > > > different > > > > > > > places. The logic is much more complex but based on a > > > > > > > bigger > > > > > > > time > > > > > > > resolution, usually one second based. > > > > > > > > > > > > > > The purpose of the userspace is to monitor the temperatures > > > > > > > from > > > > > > > different places and take actions. However, it can not be > > > > > > > constantly > > > > > > > reading the temperature to detect when a temperature > > > > > > > threshold > > > > > > > has > > > > > > > been reached. This is especially bad for mobile or embedded > > > > > > > system as > > > > > > > that will lead to an unacceptable number of wakeup to check > > > > > > > the > > > > > > > temperature with nothing to do. > > > > > > > > > > > > > > On the other side, the sensors are now most of the time > > > > > > > interrupt > > > > > > > driven. That means the thermal framework will use the > > > > > > > temperature > > > > > > > trip > > > > > > > points to program the sensor to trigger an interrupt when a > > > > > > > temperature limit is crossed. > > > > > > > > > > > > > > Unfortunately, the userspace can not benefit this feature > > > > > > > and > > > > > > > current > > > > > > > solutions found here and there, iow out-of-tree, are to add > > > > > > > fake > > > > > > > trip > > > > > > > points in the firmware and enable the writable trip points. > > > > > > > > > > > > > > This is bad for different reasons, the trip points are for > > > > > > > in- > > > > > > > kernel > > > > > > > actions, the semantic of their types is used by the thermal > > > > > > > framework > > > > > > > and by adding trip points in the device tree is a way to > > > > > > > overcome > > > > > > > the > > > > > > > current limitation but tampering with how the thermal > > > > > > > framework > > > > > > > is > > > > > > > supposed to work. The writable trip points is a way to > > > > > > > adjust > > > > > > > a > > > > > > > temperature limit given a specific platform if the firmware > > > > > > > is > > > > > > > not > > > > > > > accurate enough and TBH it is more a debug feature from my > > > > > > > POV. > > > > > > > > > > > > > > The thresholds mechanism is a way to have the userspace to > > > > > > > tell > > > > > > > thermal framework to send a notification when a temperature > > > > > > > limit > > > > > > > is > > > > > > > crossed. There is no id, no hysteresis, just the > > > > > > > temperature > > > > > > > and > > > > > > > the > > > > > > > direction of the limit crossing. That means we can be > > > > > > > notified > > > > > > > when a > > > > > > > threshold is crossed the way up only, or the way down only > > > > > > > or > > > > > > > both > > > > > > > ways. That allows to create hysteresis values if it is > > > > > > > needed. > > > > > > > > > > > > > > A threshold can be added, deleted or flushed. The latter > > > > > > > means > > > > > > > all > > > > > > > thresholds belonging to a thermal zone will be deleted. > > > > > > > > > > > > > > > > > > > So you are proposing to add threshold via netlink, not adding > > > > > > any > > > > > > new > > > > > > sysfs attribute? That is not clear here. > > > > > > > > > > > > I think you are adding" > > > > > > THERMAL_GENL_CMD_THRESHOLD_GET > > > > > > THERMAL_GENL_CMD_THRESHOLD_ADD > > > > > > THERMAL_GENL_CMD_THRESHOLD_DELETE > > > > > > THERMAL_GENL_CMD_THRESHOLD_FLUSH > > > > > > > > > > > > We need to document our netlink messages including old ones. > > > > > > > > > > > > Also we should add "MODIFY" as we tend to change them quite > > > > > > often. > > > > > > > > > > > > Also no hysteresis, that is practically we can't use. > > > > > > > > > > The direction thing is equivalent to hysteresis though. > > > > > > > > > > Instead of using one threshold with a given hysteresis value, > > > > > use > > > > > two > > > > > of them with different temperature values and different > > > > > directions. > > > > > > > > > > > Temperature changes so much that that will flood user space. > > > > > > You > > > > > > will get 100s of > > > > > > events on CPU temperature if you set temperature threshold in > > > > > > CPU. > > > > > > > > > > Events only trigger when thresholds are crossed in a specific > > > > > direction, but overall you have a point. > > > > > > > > This is good enough for some sensor which changes slowly like > > > > skin. > > > > > > Right, and I think that this is the driving use case. > > > > > > > But some sensors like CPU and board are not like that. > > > > > > Sure, so this interface will not be for them. > > But when we design an interface, I think this shouldn't be sensor > > dependent. > > > > > > > > > We publish both raw and filtered event count in debugfs for x86 > > > > package temp. > > > > For example, I just ran a sample on a client for one threshold. > > > > > > > > Total notification from hardware: 224 > > > > Notified to user space: 16 > > > > > > So did you apply the Daniel's patches and run the test or did you > > > do > > > something else? > > We already use netlink to send notification to user space via > > writable > > trip. So didn't apply any patches. I don't see these patches will do > > any different. > > I mean without rate limit here all 224 notifications will be passed to > user space (not 16). Temperature goes up and down over threshold for > 100s of times before settling. Yeah, that needs a real trip point with hysteresis. Or the threshold could be removed once crossed and then added back when the temperature falls down sufficiently. > I can give a try with this series to confirm. That would be premature at this point.
Hi Srinivas, On 21/08/2024 22:04, Pandruvada, Srinivas wrote: > Hi Daniel, > > On Fri, 2024-08-16 at 10:12 +0200, Daniel Lezcano wrote: >> The trip points are a firmware description of the temperature limits >> of a specific thermal zone where we associate an action which is done >> by the kernel. The time resolution is low. >> >> The userspace has to deal with a more complex thermal management >> based >> on heuristics from different information coming from different >> places. The logic is much more complex but based on a bigger time >> resolution, usually one second based. >> >> The purpose of the userspace is to monitor the temperatures from >> different places and take actions. However, it can not be constantly >> reading the temperature to detect when a temperature threshold has >> been reached. This is especially bad for mobile or embedded system as >> that will lead to an unacceptable number of wakeup to check the >> temperature with nothing to do. >> >> On the other side, the sensors are now most of the time interrupt >> driven. That means the thermal framework will use the temperature >> trip >> points to program the sensor to trigger an interrupt when a >> temperature limit is crossed. >> >> Unfortunately, the userspace can not benefit this feature and current >> solutions found here and there, iow out-of-tree, are to add fake trip >> points in the firmware and enable the writable trip points. >> >> This is bad for different reasons, the trip points are for in-kernel >> actions, the semantic of their types is used by the thermal framework >> and by adding trip points in the device tree is a way to overcome the >> current limitation but tampering with how the thermal framework is >> supposed to work. The writable trip points is a way to adjust a >> temperature limit given a specific platform if the firmware is not >> accurate enough and TBH it is more a debug feature from my POV. >> >> The thresholds mechanism is a way to have the userspace to tell >> thermal framework to send a notification when a temperature limit is >> crossed. There is no id, no hysteresis, just the temperature and the >> direction of the limit crossing. That means we can be notified when a >> threshold is crossed the way up only, or the way down only or both >> ways. That allows to create hysteresis values if it is needed. >> >> A threshold can be added, deleted or flushed. The latter means all >> thresholds belonging to a thermal zone will be deleted. >> > > So you are proposing to add threshold via netlink, not adding any new > sysfs attribute? That is not clear here. > > I think you are adding" > THERMAL_GENL_CMD_THRESHOLD_GET > THERMAL_GENL_CMD_THRESHOLD_ADD > THERMAL_GENL_CMD_THRESHOLD_DELETE > THERMAL_GENL_CMD_THRESHOLD_FLUSH > > We need to document our netlink messages including old ones. We can do that but I would prefer to do that in a separate series. > Also we should add "MODIFY" as we tend to change them quite often. We can not change a threshold, only delete or add it. > Also no hysteresis, that is practically we can't use. Temperature > changes so much that that will flood user space. A threshold with a direction allows to create an hysteresis. Instead of one threshold, you will have two with different directions and temperature. > You will get 100s of > events on CPU temperature if you set temperature threshold in CPU. > We have a whole filtering in driver to avoid this. > You need a rate limit here. I disagree, the user space is not supposed to monitor at a high rate a particular thermal zone. If the CPU is sending 100s notifications per second that means the threshold is set close to the mitigation trip point temperature which is a bad user space configuration. > There are multiple user processes can add threshold and there is no > ownership. So one process can cause too much noise to others as it is > multicast. ownership is something we may want to handle later. What about restricting the thermal netlink to the 'root' user only ATM? > We worked on a change to filer these as discussed during last LPC, but > not posted yet. This will really need this as this will be too many > messages. With a correct configuration, that would not happen because it is different from receiving events from the trip points.
On 22/08/2024 14:11, Pandruvada, Srinivas wrote: > On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote: >> On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas >> <srinivas.pandruvada@intel.com> wrote: [ ... ] >> So did you apply the Daniel's patches and run the test or did you do >> something else? > We already use netlink to send notification to user space via writable > trip. So didn't apply any patches. I don't see these patches will do > any different. Actually you are missing the point of the thresholds approach. The goal is to track the temperature easily from userspace without constantly polling the temperatures in all the places. The trip points are a firmware descriptions. Their number is fixed. They are designed for in-kernel thermal framework. They have a type. A governor is supposed to be tied with it. A cooling device also. Writable trip points means you should be able to add trip points dedicated to the userspace to the firmware which is not possible. Then reuse them from userspace which is unrelated to the in-kernel thermal management. It is difficult to deal with because of the need of tracking the low and high limits from userspace. The thresholds are there to allow the userspace to have the benefit of the interrupt driven temperature monitoring. Obviously if you set the thresholds to a temperature equal to a mitigation trip point then when the limit is reached you will receive notifications for the trip point going back and forth as well as notifications for the thresholds.
On Thu, 2024-08-22 at 19:08 +0200, Daniel Lezcano wrote: > On 22/08/2024 14:11, Pandruvada, Srinivas wrote: > > On Thu, 2024-08-22 at 11:41 +0200, Rafael J. Wysocki wrote: > > > On Thu, Aug 22, 2024 at 12:16 AM Pandruvada, Srinivas > > > <srinivas.pandruvada@intel.com> wrote: > > [ ... ] > > > > So did you apply the Daniel's patches and run the test or did you > > > do > > > something else? > > We already use netlink to send notification to user space via > > writable > > trip. So didn't apply any patches. I don't see these patches will > > do > > any different. > > Actually you are missing the point of the thresholds approach. I very well understand. I proposed similar approach without netlink several years back. Also submitted patches to use IIO. > > The goal is to track the temperature easily from userspace without > constantly polling the temperatures in all the places. > Exactly. > The trip points are a firmware descriptions. Their number is fixed. > They > are designed for in-kernel thermal framework. They have a type. A > governor is supposed to be tied with it. A cooling device also. > Yes, I understand the whole approach. trips are trip where you want governors to take action. > Writable trip points means you should be able to add trip points > dedicated to the userspace to the firmware which is not possible. > Then > reuse them from userspace which is unrelated to the in-kernel thermal > management. It is difficult to deal with because of the need of > tracking > the low and high limits from userspace. > Yes. > The thresholds are there to allow the userspace to have the benefit > of > the interrupt driven temperature monitoring. > Yes. I am not denying benefits. > Obviously if you set the thresholds to a temperature equal to a > mitigation trip point then when the limit is reached you will receive > notifications for the trip point going back and forth as well as > notifications for the thresholds. > Not correct. It is not at mitigation point. This is how several sensors several sensors work. I submitted a range of sensor drivers to IIO framework, several have this issue. To make it useful, you need to have some rate limiting. Netlink is not a low overhead user-kernel interface. Looks like you finalized design and just looking for patch reviews! > > > >