Message ID | 20241014094309.1430126-1-daniel.lezcano@linaro.org (mailing list archive) |
---|---|
Headers | show |
Series | Add thermal user thresholds support | expand |
Hi, as there are no more comments and I think I took into account all the feedback, is it possible to merge this series so I can get rid of monitoring its status ? Thanks On 14/10/2024 11:43, Daniel Lezcano wrote: > The trip points are a firmware description of the temperature limits > of a specific thermal zone where we associate an action which is done > by the kernel. The time resolution is low. > > The userspace has to deal with a more complex thermal management based > on heuristics from different information coming from different > places. The logic is much more complex but based on a bigger time > resolution, usually one second based. > > The purpose of the userspace is to monitor the temperatures from > different places and take actions. However, it can not be constantly > reading the temperature to detect when a temperature threshold has > been reached. This is especially bad for mobile or embedded system as > that will lead to an unacceptable number of wakeup to check the > temperature with nothing to do. > > On the other side, the sensors are now most of the time interrupt > driven. That means the thermal framework will use the temperature trip > points to program the sensor to trigger an interrupt when a > temperature limit is crossed. > > Unfortunately, the userspace can not benefit this feature and current > solutions found here and there, iow out-of-tree, are to add fake trip > points in the firmware and enable the writable trip points. > > This is bad for different reasons, the trip points are for in-kernel > actions, the semantic of their types is used by the thermal framework > and by adding trip points in the device tree is a way to overcome the > current limitation but tampering with how the thermal framework is > supposed to work. The writable trip points is a way to adjust a > temperature limit given a specific platform if the firmware is not > accurate enough and TBH it is more a debug feature from my POV. > > The user thresholds mechanism is a way to have the userspace to tell > thermal framework to send a notification when a temperature limit is > crossed. There is no id, no hysteresis, just the temperature and the > direction of the limit crossing. That means we can be notified when a > temperature threshold is crossed the way up only, or the way down only > or both ways. That allows to create hysteresis values if it is needed. > > Those thresholds are refered as user thresholds in order to do the > difference with the trip points which are similar. > > An user threshold can be added, deleted or flushed. The latter means > all user thresholds belonging to a thermal zone will be deleted. > > When one or several user thresholds are crossed, an event is sent to > the userspace. > > All aforementioned actions and events lead to a notification to the > userspace. > > Along with the kernel changes, the thermal library has been extended > to provide the different API to deal with the new user threshold > netlink events and commands. > > In addition, the thermal-engine skeleton uses these new API by > flushing and adding user thresholds as well as getting the > notification about these actions. > > Overall the series has been tested with the thermal-engine skeleton > and some selftests which are not part of this series. > > Changelog: > V5: > - Added CAP_SYS_ADMIN needed capability when adding, deleting and > flushing a threshold (Rafael) > > - Remove the pid information to prevent leaking pid inside > containers. Also the information is not really needed (Rafael) > > - Renamed "THERMAL_GENL_ATTR_THRESHOLD_WAY" to > "THERMAL_GENL_ATTR_THRESHOLD_DIRECTION". Did not used '*_DIR' as > suggested initially because it can be ambiguous with 'directory' > (Rafael) > > - Renamed 'last_temp' to 'prev_temp' (Rafael) > > - Used CLASS constructor/destructor to get / put the thermal > zone's device refcount (Rafael) > > - Moved locking inside thermal_thresholds_for_each() (Rafael) > > - Reflected the changes above in the thermal library and the > thermal engine skeleton > > > V4: > - Fix missing stubs when THERMAL_NETLINK=n (kernel test robot) > > V3: > - the first patch of the v2 series has been merged > > - Modified the description to split the information between the > cover letter and the patch 1 description (Rafael) > > - Made the thresholds code as part of the core (Rafael) > > - Converted the thresholds into a list and directly declared in > the thermal zone device structure (Rafael) > > - Changed the name of the field in the thermal zone device > structure to user_thresholds (Rafael) > > - Added #include "thermal_thresholds.h" (Rafael) > > - Combined the conditions in the function > __thermal_threshold_is_crossed (Rafael) > > - Moved the function thermal_thresholds_flush() before > thermal_thresholds_exit() (Rafael) > > - Change thermal_thresholds_handle() to return void (Rafael) > > - Move the list field on top the of the structure threshold and > renamed it list_node (Rafael) > > - Changed THERMAL_THRESHOLD_* notifications to > THERMAL_TZ_THRESHOLD_* (Rafael) > > V2: > - Compute min and max in thermal_zone_device_update() but keep > the loop as it is (Rafael) > > - Include slab.h to fix compilation warnings on some architectures > with kmalloc and kfree (kernel test robot) > > Daniel Lezcano (4): > thermal/netlink: Add the commands and the events for the thresholds > tools/lib/thermal: Make more generic the command encoding function > tools/lib/thermal: Add the threshold netlink ABI > tools/thermal/thermal-engine: Take into account the thresholds API > > drivers/thermal/thermal_netlink.c | 236 +++++++++++++++++- > drivers/thermal/thermal_netlink.h | 34 +++ > drivers/thermal/thermal_thresholds.c | 36 +-- > drivers/thermal/thermal_thresholds.h | 2 +- > include/uapi/linux/thermal.h | 27 +- > tools/lib/thermal/commands.c | 167 ++++++++++++- > tools/lib/thermal/events.c | 55 +++- > tools/lib/thermal/include/thermal.h | 40 +++ > tools/lib/thermal/libthermal.map | 5 + > tools/lib/thermal/thermal.c | 17 ++ > tools/thermal/lib/Makefile | 2 +- > tools/thermal/thermal-engine/thermal-engine.c | 105 +++++++- > 12 files changed, 662 insertions(+), 64 deletions(-) >
Hi Daniel, On 10/21/24 09:28, Daniel Lezcano wrote: > > Hi, > > as there are no more comments and I think I took into account all the > feedback, is it possible to merge this series so I can get rid of > monitoring its status ? > My apologies, give me 1 day. Today I will test it and review. I've seen some previous version and haven't seen any major issues, so this one should be good IMO. Regards, Lukasz
Hi Daniel, On Mon, Oct 21, 2024 at 10:28 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > > Hi, > > as there are no more comments and I think I took into account all the > feedback, is it possible to merge this series so I can get rid of > monitoring its status ? Please see my reply to the first patch. Apart from a locking issue in thermal_thresholds_for_each() that will go away when the thermal_zone guard is used in this function, this looks good to me. Of course, I'll give Lukasz the time to review and test it before it gets applied, though. > On 14/10/2024 11:43, Daniel Lezcano wrote: > > The trip points are a firmware description of the temperature limits > > of a specific thermal zone where we associate an action which is done > > by the kernel. The time resolution is low. > > > > The userspace has to deal with a more complex thermal management based > > on heuristics from different information coming from different > > places. The logic is much more complex but based on a bigger time > > resolution, usually one second based. > > > > The purpose of the userspace is to monitor the temperatures from > > different places and take actions. However, it can not be constantly > > reading the temperature to detect when a temperature threshold has > > been reached. This is especially bad for mobile or embedded system as > > that will lead to an unacceptable number of wakeup to check the > > temperature with nothing to do. > > > > On the other side, the sensors are now most of the time interrupt > > driven. That means the thermal framework will use the temperature trip > > points to program the sensor to trigger an interrupt when a > > temperature limit is crossed. > > > > Unfortunately, the userspace can not benefit this feature and current > > solutions found here and there, iow out-of-tree, are to add fake trip > > points in the firmware and enable the writable trip points. > > > > This is bad for different reasons, the trip points are for in-kernel > > actions, the semantic of their types is used by the thermal framework > > and by adding trip points in the device tree is a way to overcome the > > current limitation but tampering with how the thermal framework is > > supposed to work. The writable trip points is a way to adjust a > > temperature limit given a specific platform if the firmware is not > > accurate enough and TBH it is more a debug feature from my POV. > > > > The user thresholds mechanism is a way to have the userspace to tell > > thermal framework to send a notification when a temperature limit is > > crossed. There is no id, no hysteresis, just the temperature and the > > direction of the limit crossing. That means we can be notified when a > > temperature threshold is crossed the way up only, or the way down only > > or both ways. That allows to create hysteresis values if it is needed. > > > > Those thresholds are refered as user thresholds in order to do the > > difference with the trip points which are similar. > > > > An user threshold can be added, deleted or flushed. The latter means > > all user thresholds belonging to a thermal zone will be deleted. > > > > When one or several user thresholds are crossed, an event is sent to > > the userspace. > > > > All aforementioned actions and events lead to a notification to the > > userspace. > > > > Along with the kernel changes, the thermal library has been extended > > to provide the different API to deal with the new user threshold > > netlink events and commands. > > > > In addition, the thermal-engine skeleton uses these new API by > > flushing and adding user thresholds as well as getting the > > notification about these actions. > > > > Overall the series has been tested with the thermal-engine skeleton > > and some selftests which are not part of this series. > > > > Changelog: > > V5: > > - Added CAP_SYS_ADMIN needed capability when adding, deleting and > > flushing a threshold (Rafael) > > > > - Remove the pid information to prevent leaking pid inside > > containers. Also the information is not really needed (Rafael) > > > > - Renamed "THERMAL_GENL_ATTR_THRESHOLD_WAY" to > > "THERMAL_GENL_ATTR_THRESHOLD_DIRECTION". Did not used '*_DIR' as > > suggested initially because it can be ambiguous with 'directory' > > (Rafael) > > > > - Renamed 'last_temp' to 'prev_temp' (Rafael) > > > > - Used CLASS constructor/destructor to get / put the thermal > > zone's device refcount (Rafael) > > > > - Moved locking inside thermal_thresholds_for_each() (Rafael) > > > > - Reflected the changes above in the thermal library and the > > thermal engine skeleton > > > > > > V4: > > - Fix missing stubs when THERMAL_NETLINK=n (kernel test robot) > > > > V3: > > - the first patch of the v2 series has been merged > > > > - Modified the description to split the information between the > > cover letter and the patch 1 description (Rafael) > > > > - Made the thresholds code as part of the core (Rafael) > > > > - Converted the thresholds into a list and directly declared in > > the thermal zone device structure (Rafael) > > > > - Changed the name of the field in the thermal zone device > > structure to user_thresholds (Rafael) > > > > - Added #include "thermal_thresholds.h" (Rafael) > > > > - Combined the conditions in the function > > __thermal_threshold_is_crossed (Rafael) > > > > - Moved the function thermal_thresholds_flush() before > > thermal_thresholds_exit() (Rafael) > > > > - Change thermal_thresholds_handle() to return void (Rafael) > > > > - Move the list field on top the of the structure threshold and > > renamed it list_node (Rafael) > > > > - Changed THERMAL_THRESHOLD_* notifications to > > THERMAL_TZ_THRESHOLD_* (Rafael) > > > > V2: > > - Compute min and max in thermal_zone_device_update() but keep > > the loop as it is (Rafael) > > > > - Include slab.h to fix compilation warnings on some architectures > > with kmalloc and kfree (kernel test robot) > > > > Daniel Lezcano (4): > > thermal/netlink: Add the commands and the events for the thresholds > > tools/lib/thermal: Make more generic the command encoding function > > tools/lib/thermal: Add the threshold netlink ABI > > tools/thermal/thermal-engine: Take into account the thresholds API > > > > drivers/thermal/thermal_netlink.c | 236 +++++++++++++++++- > > drivers/thermal/thermal_netlink.h | 34 +++ > > drivers/thermal/thermal_thresholds.c | 36 +-- > > drivers/thermal/thermal_thresholds.h | 2 +- > > include/uapi/linux/thermal.h | 27 +- > > tools/lib/thermal/commands.c | 167 ++++++++++++- > > tools/lib/thermal/events.c | 55 +++- > > tools/lib/thermal/include/thermal.h | 40 +++ > > tools/lib/thermal/libthermal.map | 5 + > > tools/lib/thermal/thermal.c | 17 ++ > > tools/thermal/lib/Makefile | 2 +- > > tools/thermal/thermal-engine/thermal-engine.c | 105 +++++++- > > 12 files changed, 662 insertions(+), 64 deletions(-) > > > > > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog >